H2GB.datasets.RCDDDataset

class RCDDDataset(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, force_reload: bool = False)

Bases: InMemoryDataset

The risk commodity detection dataset (RCDD) from the “Datasets and Interfaces for Benchmarking Heterogeneous Graph Neural Networks” paper. RCDD is an industrial-scale heterogeneous graph dataset based on a real risk detection scenario from Alibaba’s e-commerce platform. It contains 13,806,619 nodes spanning 7 node types and 157,814,864 edges spanning 7 edge types.

Note

The original RCDD dataset shipped with PyG has node-numbering bugs. This version fixes them, following our bug report in the PyG GitHub issues.

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
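
Example

The parameters above can be exercised with a minimal sketch like the following. It assumes that H2GB and PyG are installed, that the class is importable as `H2GB.datasets.RCDDDataset` (per the page title), and that indexing the dataset returns a single `HeteroData` graph. The helper names `load_rcdd` and `main` and the path `data/RCDD` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional


def load_rcdd(root: str,
              transform: Optional[Callable] = None,
              pre_transform: Optional[Callable] = None,
              force_reload: bool = False):
    """Hypothetical helper: build RCDDDataset with the documented arguments."""
    # Imported lazily so this module can be loaded without H2GB installed.
    from H2GB.datasets import RCDDDataset

    return RCDDDataset(
        root=root,                    # directory where the dataset is saved
        transform=transform,          # applied on every access
        pre_transform=pre_transform,  # applied once, before saving to disk
        force_reload=force_reload,    # True re-processes the dataset
    )


def main() -> None:
    """Load the dataset and print its node and edge types."""
    dataset = load_rcdd("data/RCDD")  # hypothetical root directory
    data = dataset[0]                 # assumed to be a HeteroData object
    print(data.node_types)
    print(data.edge_types)
```

Calling `main()` triggers the first-time processing, which may take a while and a large amount of memory given the size of the graph. Passing `force_reload=True` repeats the processing step even if a processed copy already exists under `root`.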