H2GB.datasets.MAGDataset
- class MAGDataset(root: str, name: str, rand_split: bool = False, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)
Bases: InMemoryDataset
The ogbn-mag and modified mag-year datasets from the “Open Graph Benchmark: Datasets for Machine Learning on Graphs” paper.

ogbn-mag and mag-year are heterogeneous graphs composed of a subset of the Microsoft Academic Graph (MAG). Each contains four types of entities: papers (736,389 nodes), authors (1,134,649 nodes), institutions (8,740 nodes), and fields of study (59,965 nodes), as well as four types of directed relations, each connecting two types of entities. Each paper is associated with a 128-dimensional word2vec feature vector; the other entity types originally have no input node features. We average the features of all papers published by an author to obtain that author's feature vector.

The task of ogbn-mag is to predict the venue (conference or journal) of each paper. In total, there are 349 different venues. The task of mag-year is to predict the year in which each paper was published. The five classes are chosen by partitioning the publication years so that class ratios are approximately balanced.

- Parameters:
root (str) – Root directory where the dataset should be saved.
name (str) – The name of the dataset (one of "ogbn-mag", "mag-year").
rand_split (bool, optional) – Whether to randomly re-split the dataset. This option is only applicable to mag-year. (default: False)
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
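
The parameters above might be used as in the following minimal sketch. The wrapper load_mag and the VALID_NAMES tuple are hypothetical names introduced only for illustration; the import path H2GB.datasets is taken from this page's title. Rejecting rand_split for ogbn-mag is a choice made by this sketch, not documented library behavior.

```python
from typing import Callable, Optional

# The two dataset names listed in the parameter description.
VALID_NAMES = ("ogbn-mag", "mag-year")


def load_mag(root: str, name: str, rand_split: bool = False,
             transform: Optional[Callable] = None,
             pre_transform: Optional[Callable] = None):
    """Hypothetical helper: validate the arguments, then build a MAGDataset."""
    if name not in VALID_NAMES:
        raise ValueError(f"name must be one of {VALID_NAMES}, got {name!r}")
    if rand_split and name != "mag-year":
        # The documentation says rand_split only applies to mag-year;
        # raising here is this sketch's own choice (an assumption).
        raise ValueError("rand_split is only applicable to 'mag-year'")
    # Imported lazily so the checks above run even without H2GB installed.
    from H2GB.datasets import MAGDataset
    return MAGDataset(root=root, name=name, rand_split=rand_split,
                      transform=transform, pre_transform=pre_transform)
```

Calling load_mag("data/mag", "mag-year", rand_split=True) (the root path is only an example) would construct the dataset, and indexing the result with [0] should then yield the torch_geometric.data.HeteroData object that transform and pre_transform operate on.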
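
The author features mentioned in the description (the mean of the features of each author's papers) can be illustrated with a small NumPy sketch. This is not the dataset's actual preprocessing code: the function name, the (2, num_edges) author-to-paper edge layout, and the zero vector for authors without papers are all assumptions made for this example.

```python
import numpy as np


def average_author_features(paper_x: np.ndarray,
                            writes_edge_index: np.ndarray,
                            num_authors: int) -> np.ndarray:
    """Return, for each author, the mean feature vector of their papers.

    paper_x: (num_papers, dim) paper features.
    writes_edge_index: (2, num_edges); row 0 holds author ids, row 1 paper ids
        (an assumed layout for this sketch).
    Authors with no papers receive an all-zero vector (a choice of this sketch).
    """
    author_ids, paper_ids = writes_edge_index
    dim = paper_x.shape[1]
    sums = np.zeros((num_authors, dim), dtype=np.float64)
    counts = np.zeros(num_authors, dtype=np.int64)
    # np.add.at accumulates correctly when an author id appears several times.
    np.add.at(sums, author_ids, paper_x[paper_ids])
    np.add.at(counts, author_ids, 1)
    # Clamp counts to 1 to avoid dividing by zero for authors without papers.
    return sums / np.maximum(counts, 1)[:, None]


# Toy example: author 0 wrote papers 0 and 1, author 1 wrote paper 2,
# and author 2 has no papers.
paper_x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
writes = np.array([[0, 0, 1],   # author ids
                   [0, 1, 2]])  # paper ids
author_x = average_author_features(paper_x, writes, num_authors=3)
```

In the toy example, author 0's feature is the mean of the first two paper rows, author 1 copies the third row, and author 2 stays at zero.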