H2GB.datasets.RCDDDataset
- class RCDDDataset(root: str, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None, force_reload: bool = False)
Bases: InMemoryDataset
The risk commodity detection dataset (RCDD) from the “Datasets and Interfaces for Benchmarking Heterogeneous Graph Neural Networks” paper. RCDD is an industrial-scale heterogeneous graph dataset based on a real risk detection scenario from Alibaba’s e-commerce platform. It consists of 13,806,619 nodes of 7 types and 157,814,864 edges of 7 types.
Note
The original RCDD dataset from PyG has node-numbering bugs. This version fixes them, following our bug report in the PyG GitHub issues.
- Parameters:
root (str) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
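Example

The following is a minimal usage sketch, not an official recipe. It assumes the class is importable as `H2GB.datasets.RCDDDataset` (as the page title suggests) and that, as with other single-graph `InMemoryDataset` subclasses, `dataset[0]` returns the `HeteroData` graph. The helper names `count_by_type`, `load_rcdd` and `print_rcdd_summary`, and the root path `data/RCDD`, are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, Optional


def count_by_type(data: Any) -> Dict[str, Dict[Any, int]]:
    """Count nodes and edges per type in a HeteroData-like object."""
    # node_types / edge_types and the per-type num_nodes / num_edges
    # attributes are part of the torch_geometric HeteroData interface.
    nodes = {nt: int(data[nt].num_nodes) for nt in data.node_types}
    edges = {et: int(data[et].num_edges) for et in data.edge_types}
    return {"nodes": nodes, "edges": edges}


def load_rcdd(
    root: str,
    transform: Optional[Callable] = None,
    pre_transform: Optional[Callable] = None,
    force_reload: bool = False,
) -> Any:
    """Build the dataset and return its graph (hypothetical helper)."""
    # Imported lazily so this module can be imported without H2GB installed.
    from H2GB.datasets import RCDDDataset

    dataset = RCDDDataset(
        root=root,
        transform=transform,          # applied on every access
        pre_transform=pre_transform,  # applied once, before saving to disk
        force_reload=force_reload,    # True re-runs processing
    )
    # Assumption: the dataset holds a single HeteroData graph at index 0.
    return dataset[0]


def print_rcdd_summary(root: str = "data/RCDD") -> None:
    """Load RCDD from a hypothetical root directory and print totals."""
    counts = count_by_type(load_rcdd(root))
    print("nodes:", sum(counts["nodes"].values()))
    print("edges:", sum(counts["edges"].values()))
```

Calling `print_rcdd_summary()` builds the dataset under the given root directory; given the size of the graph, expect this to need substantial disk space and memory. The totals it prints can be compared against the node and edge counts stated in the description above.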