H2GB.datasets.MAGDataset
- class MAGDataset(root: str, name: str, rand_split: bool = False, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)
Bases: InMemoryDataset

The ogbn-mag and modified mag-year datasets from the “Open Graph Benchmark: Datasets for Machine Learning on Graphs” paper. ogbn-mag and mag-year are heterogeneous graphs composed of a subset of the Microsoft Academic Graph (MAG). Each contains four types of entities: papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes), as well as four types of directed relations, each connecting two types of entities. Each paper is associated with a 128-dimensional word2vec feature vector; the other entity types originally have no input node features. We average the node features of all the papers an author has published to obtain the author feature.

The task of ogbn-mag is to predict the venue (conference or journal) of each paper. In total, there are 349 different venues. The task of mag-year is to predict the year in which each paper was published. The five classes are chosen by partitioning the publication years so that class ratios are approximately balanced.

- Parameters:
root (str) – Root directory where the dataset should be saved.
name (str) – The name of the dataset (one of "ogbn-mag", "mag-year").
rand_split (bool, optional) – Whether to randomly re-split the dataset. This option is only applicable to mag-year. (default: False)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
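
A minimal usage sketch for the parameters above, assuming the H2GB package is installed and the class is importable from H2GB.datasets as the page title suggests. The load_mag helper, its argument checks, and the "data/mag" root path are hypothetical and only for illustration; in particular, raising an error when rand_split is used with ogbn-mag is a choice made here, not documented library behavior.

```python
from typing import Callable, Optional

VALID_NAMES = ("ogbn-mag", "mag-year")


def load_mag(root: str, name: str, rand_split: bool = False,
             transform: Optional[Callable] = None,
             pre_transform: Optional[Callable] = None):
    """Hypothetical helper: check the arguments, then build a MAGDataset."""
    if name not in VALID_NAMES:
        raise ValueError(f"name must be one of {VALID_NAMES}, got {name!r}")
    if rand_split and name != "mag-year":
        # The documentation says rand_split only applies to mag-year;
        # rejecting it here is this sketch's own choice.
        raise ValueError("rand_split is only applicable to 'mag-year'")
    # Imported lazily so the checks above work even without H2GB installed.
    from H2GB.datasets import MAGDataset
    return MAGDataset(root=root, name=name, rand_split=rand_split,
                      transform=transform, pre_transform=pre_transform)


# Example call (assumes H2GB is installed; "data/mag" is a placeholder path):
# dataset = load_mag("data/mag", "mag-year", rand_split=True)
# data = dataset[0]  # expected to be a torch_geometric.data.HeteroData object
```

Keeping the import inside the helper lets the argument checks fail fast before any dataset files are touched.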
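
The author-feature averaging mentioned in the description can be sketched with NumPy as follows. This is an illustration of the idea, not the library's own preprocessing code: the function name, the (2, num_edges) edge layout with author ids in the first row, and the zero vector for authors without papers are all assumptions.

```python
import numpy as np


def average_author_features(paper_x: np.ndarray,
                            writes_edges: np.ndarray,
                            num_authors: int) -> np.ndarray:
    """Return the mean paper feature vector for each author.

    paper_x:      (num_papers, dim) paper feature matrix.
    writes_edges: (2, num_edges) integer array; row 0 holds author ids,
                  row 1 holds the ids of the papers they wrote (assumed layout).
    """
    author_ids, paper_ids = writes_edges
    sums = np.zeros((num_authors, paper_x.shape[1]), dtype=paper_x.dtype)
    # Unbuffered scatter-add, so repeated author ids accumulate correctly.
    np.add.at(sums, author_ids, paper_x[paper_ids])
    counts = np.bincount(author_ids, minlength=num_authors)
    # Authors with no papers keep a zero vector; clip avoids division by zero.
    return sums / np.clip(counts, 1, None)[:, None]


# Toy data: author 0 wrote papers 0 and 1, author 1 wrote paper 2,
# author 2 wrote nothing.
paper_x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
writes = np.array([[0, 0, 1],   # author ids
                   [0, 1, 2]])  # paper ids
author_x = average_author_features(paper_x, writes, num_authors=3)
```

In the real dataset the paper features are 128-dimensional, so author_x would have shape (1134649, 128).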
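
The description does not say exactly how publication years are partitioned into the five mag-year classes. One plausible approach, shown here purely as an assumption, is to cut at evenly spaced quantiles of the year distribution; years_to_classes is a hypothetical name.

```python
import numpy as np


def years_to_classes(years: np.ndarray, num_classes: int = 5) -> np.ndarray:
    """Map publication years to class ids using quantile cut points."""
    # Interior quantiles, e.g. 20%, 40%, 60%, 80% for five classes.
    qs = np.linspace(0, 1, num_classes + 1)[1:-1]
    boundaries = np.quantile(years, qs)
    # side="right": a year equal to a boundary falls into the upper class.
    return np.searchsorted(boundaries, years, side="right")


# Toy data: one paper per year over 20 consecutive years.
years = np.arange(2000, 2020)
labels = years_to_classes(years)
class_sizes = np.bincount(labels, minlength=5)
```

With real data many papers share the same year, so the resulting classes can only be approximately balanced, which matches the wording of the description.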