H2GB.datasets.MAGDataset

class MAGDataset(root: str, name: str, rand_split: bool = False, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

Bases: InMemoryDataset

The ogbn-mag and modified mag-year datasets from the “Open Graph Benchmark: Datasets for Machine Learning on Graphs” paper.

ogbn-mag and mag-year are heterogeneous graphs composed of a subset of the Microsoft Academic Graph (MAG). They contain four types of entities: papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes), as well as four types of directed relations, each connecting two types of entities. Each paper is associated with a 128-dimensional word2vec feature vector; the other entity types are not originally associated with input node features. We average the node features of all papers published by an author to obtain that author's feature.
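The author-feature averaging described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the dataset's actual preprocessing code: the function name `average_author_features`, the `(2, num_edges)` edge layout (author ids in row 0, paper ids in row 1), and the choice to give authors without papers a zero vector are all hypothetical.

```python
import numpy as np


def average_author_features(paper_x, writes_edge_index, num_authors):
    """Return one feature vector per author: the mean over that author's papers.

    paper_x: (num_papers, dim) float array of paper features.
    writes_edge_index: (2, num_edges) int array; row 0 holds author ids,
        row 1 holds paper ids (assumed layout).
    """
    author_ids, paper_ids = writes_edge_index
    sums = np.zeros((num_authors, paper_x.shape[1]), dtype=paper_x.dtype)
    # Unbuffered scatter-add, so repeated author ids accumulate correctly.
    np.add.at(sums, author_ids, paper_x[paper_ids])
    counts = np.bincount(author_ids, minlength=num_authors)
    # Assumption: authors with no papers keep a zero vector.
    return sums / np.maximum(counts, 1)[:, None]


# Toy graph: author 0 wrote papers 0 and 1, author 1 wrote paper 2,
# author 2 wrote nothing.
paper_x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
writes = np.array([[0, 0, 1], [0, 1, 2]])
author_x = average_author_features(paper_x, writes, num_authors=3)
```

In the toy graph, author 0 receives the mean of the first two paper vectors, author 1 receives the third paper's vector unchanged, and author 2 receives zeros.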

The task of ogbn-mag is to predict the venue (conference or journal) of each paper. In total, there are 349 different venues. The task of mag-year is to predict the year in which each paper was published. The five classes are chosen by partitioning the publication years so that the class ratios are approximately balanced.
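One plausible way to obtain approximately balanced year classes is to cut at quantiles of the publication-year distribution. The sketch below shows that idea only; the text does not state how the mag-year boundaries were actually chosen, so the quantile rule and the helper name `years_to_classes` are assumptions.

```python
import numpy as np


def years_to_classes(years, num_classes=5):
    """Bucket years into num_classes bins using interior quantiles as cut points."""
    # Interior quantile levels, e.g. 0.2, 0.4, 0.6, 0.8 for five classes.
    levels = np.linspace(0, 1, num_classes + 1)[1:-1]
    cuts = np.quantile(years, levels)
    # right=True places a year equal to a cut point in the lower class.
    labels = np.digitize(years, cuts, right=True)
    return labels, cuts


# Toy data: 20 distinct years, one paper each.
years = np.arange(2000, 2020)
labels, cuts = years_to_classes(years)
class_sizes = np.bincount(labels, minlength=5)
```

With real data, many papers share the same year, so the resulting classes can only be approximately equal in size.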

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • name (str) – The name of the dataset (one of "ogbn-mag", "mag-year").

  • rand_split (bool, optional) – Whether to randomly re-split the dataset. This option is only applicable to mag-year. (default: False)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
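A minimal usage sketch based on the parameters above. The wrapper `load_mag` and the constant `VALID_NAMES` are hypothetical helpers, not part of H2GB. The import path is taken from the page title, and indexing the dataset with `[0]` assumes the usual InMemoryDataset convention of returning a single graph object. Rejecting `rand_split=True` for ogbn-mag is a choice made by this wrapper; the library itself may simply ignore the flag.

```python
VALID_NAMES = ("ogbn-mag", "mag-year")


def load_mag(root, name="ogbn-mag", rand_split=False):
    """Hypothetical convenience wrapper around MAGDataset."""
    if name not in VALID_NAMES:
        raise ValueError(f"name must be one of {VALID_NAMES}, got {name!r}")
    if rand_split and name != "mag-year":
        # Wrapper-level check: rand_split is documented as mag-year only.
        raise ValueError("rand_split is only applicable to 'mag-year'")
    # Imported lazily so the argument checks above run without H2GB installed.
    from H2GB.datasets import MAGDataset

    dataset = MAGDataset(root=root, name=name, rand_split=rand_split)
    # Assumed to be a torch_geometric.data.HeteroData object.
    return dataset[0]
```

A call such as `load_mag("data/mag", name="mag-year", rand_split=True)` would then return the graph with a random re-split. The directory `data/mag` is a placeholder for wherever the dataset should be saved.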