H2GB.sampler.get_HGTloader
- sampler.get_HGTloader(batch_size, shuffle=True, split='train')
A heterogeneous graph sampler from the “Heterogeneous Graph Transformer” paper. This loader allows for mini-batch training of GNNs on large-scale graphs where full-batch training is not feasible.
The sampler tries to (1) keep a similar number of nodes and edges for each type and (2) keep the sampled sub-graph dense to minimize the information loss and reduce the sample variance.
Methodically, HGSampler keeps track of a node budget for each node type, which is then used to determine the sampling probability of a node. In particular, the probability of sampling a node is determined by the number of connections to already sampled nodes and their node degrees. With this, HGSampler will sample a fixed amount of neighbors for each node type in each iteration, as given by the neighbor_sizes argument from the configuration.
Sampled nodes are sorted based on the order in which they were sampled. In particular, the first batch_size nodes represent the set of original mini-batch nodes.
- Parameters:
  - dataset (Any) – An InMemoryDataset dataset object.
  - batch_size (int) – The number of seed nodes (the first nodes in the batch).
  - shuffle (bool) – Whether to shuffle the data (default: True).
  - split (str) – Which data split (train, val, or test) this sampler serves. This determines some of the sampling parameters loaded from the configuration file, such as iter_per_epoch.
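
The budget-based sampling described above can be illustrated with a minimal pure-Python sketch. This is not the implementation used by H2GB or PyTorch Geometric. The function names (hgt_style_sample, add_to_budget, pick_without_replacement), the toy author/paper graph, and the use of a plain list for neighbor_sizes are assumptions made for illustration. The squared-budget weighting follows the description in the HGT paper.

```python
import random
from collections import defaultdict


def add_to_budget(budget, sampled, in_neighbors, node_type, node_id):
    """Credit the not-yet-sampled in-neighbors of one sampled node."""
    for (src_type, dst_type), adjacency in in_neighbors.items():
        if dst_type != node_type:
            continue
        neighbors = adjacency.get(node_id, [])
        for src in neighbors:
            if src not in sampled[src_type]:
                # Normalized degree: neighbors of low-degree nodes get more credit.
                budget[src_type][src] += 1.0 / len(neighbors)


def pick_without_replacement(rng, weights, k):
    """Draw up to k distinct keys, each draw proportional to its weight."""
    pool = dict(weights)
    picks = []
    for _ in range(min(k, len(pool))):
        keys = list(pool)
        choice = rng.choices(keys, weights=[pool[key] for key in keys], k=1)[0]
        picks.append(choice)
        del pool[choice]
    return picks


def hgt_style_sample(in_neighbors, seed_type, seeds, neighbor_sizes, rng):
    """Return {node_type: [node ids in sampling order]} (illustrative only)."""
    sampled = defaultdict(list)
    sampled[seed_type].extend(seeds)  # seed nodes always come first
    budget = defaultdict(lambda: defaultdict(float))
    for node_id in seeds:
        add_to_budget(budget, sampled, in_neighbors, seed_type, node_id)

    for num_per_type in neighbor_sizes:  # one entry per sampling iteration
        for node_type in list(budget):  # snapshot: new types may appear below
            # Sampling probability is proportional to the squared budget.
            weights = {n: b ** 2 for n, b in budget[node_type].items()}
            picks = pick_without_replacement(rng, weights, num_per_type)
            for node_id in picks:
                sampled[node_type].append(node_id)
                del budget[node_type][node_id]
            for node_id in picks:
                add_to_budget(budget, sampled, in_neighbors, node_type, node_id)
    return dict(sampled)


# Hypothetical toy graph: (src_type, dst_type) -> {dst_id: [src_ids]}.
in_neighbors = {
    ("author", "paper"): {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3]},
    ("paper", "author"): {0: [0], 1: [0, 1], 2: [1, 2], 3: [2, 3]},
}
batch_size = 2
subgraph = hgt_style_sample(
    in_neighbors,
    seed_type="paper",
    seeds=[0, 1],
    neighbor_sizes=[2, 2],
    rng=random.Random(0),
)
print(subgraph)
```

In this sketch, subgraph["paper"][:batch_size] is always the seed list, mirroring the ordering guarantee stated above: the first batch_size nodes are the original mini-batch nodes.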