H2GB.sampler.get_HGTloader

sampler.get_HGTloader(batch_size, shuffle=True, split='train')

A heterogeneous graph sampler from the “Heterogeneous Graph Transformer” paper. This loader enables mini-batch training of GNNs on large-scale heterogeneous graphs where full-batch training is not feasible.

The sampler tries to (1) keep a similar number of nodes and edges for each type and (2) keep the sampled sub-graph dense, in order to minimize information loss and reduce sampling variance.

Methodologically, HGSampler keeps track of a node budget for each node type, which is then used to determine the sampling probability of a node. In particular, the probability of sampling a node is determined by the number of its connections to already sampled nodes and by the degrees of those sampled nodes. With this, HGSampler samples a fixed number of neighbors for each node type in each iteration, as given by the neighbor_sizes argument in the configuration.
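The sketch below illustrates the budget mechanism described above. It is not HGSampler's internal code: the helper names (update_budget, sample_per_type), the neighbors_by_type structure, and the degree mapping are illustrative assumptions.

    import numpy as np

    # Illustrative sketch of budget-based sampling (not HGSampler's actual code).
    # budget[node_type] maps a candidate node id -> accumulated importance score.
    def update_budget(budget, sampled_node, neighbors_by_type, degree):
        # Each newly sampled node adds 1 / degree(sampled_node) to the budget of
        # its neighbors, so candidates connected to many already sampled nodes
        # accumulate a larger score.
        for node_type, neighbors in neighbors_by_type[sampled_node].items():
            for nbr in neighbors:
                budget[node_type][nbr] = (
                    budget[node_type].get(nbr, 0.0) + 1.0 / degree[sampled_node]
                )

    def sample_per_type(budget, node_type, num_samples, rng=None):
        # Sampling probability is proportional to the squared budget, which keeps
        # the sampled sub-graph dense and reduces sampling variance.
        rng = rng or np.random.default_rng()
        nodes = list(budget[node_type])
        if not nodes:
            return []
        scores = np.array([budget[node_type][n] for n in nodes]) ** 2
        probs = scores / scores.sum()
        k = min(num_samples, len(nodes))
        return rng.choice(nodes, size=k, replace=False, p=probs).tolist()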

Sampled nodes are sorted based on the order in which they were sampled. In particular, the first batch_size nodes represent the set of original mini-batch nodes.

Parameters:
  • dataset (Any) – An InMemoryDataset object.

  • batch_size (int) – The number of seed nodes (first nodes in the batch).

  • shuffle (bool) – Whether to shuffle the data or not (default: True).

  • split (str) – Specifies which data split (train, val, or test) this sampler serves. This determines sampling parameters loaded from the configuration file, such as iter_per_epoch.
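
A minimal usage sketch is given below. It assumes each yielded batch is a PyG-style HeteroData mini-batch whose first batch_size nodes of the seed type are the original mini-batch nodes; sampler, model, loss_fn, and the seed node type 'paper' are placeholders, not names defined by H2GB.

    # Hypothetical usage; keyword arguments follow the signature documented above.
    batch_size = 128
    loader = sampler.get_HGTloader(batch_size=batch_size, shuffle=True, split='train')

    for batch in loader:
        out = model(batch.x_dict, batch.edge_index_dict)
        # The first batch_size nodes of the seed type are the original
        # mini-batch (seed) nodes, so the loss is computed only on them.
        loss = loss_fn(out['paper'][:batch_size], batch['paper'].y[:batch_size])
        loss.backward()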