H2GB.datasets.IeeeCisDataset

class IeeeCisDataset(root: str, non_target_node_types: Optional[List[str]] = None, target_cat_feat_cols: Optional[List[str]] = None, transform: Optional[Callable] = None, pre_transform: Optional[Callable] = None)

Bases: InMemoryDataset

IEEE-CIS-G is a heterogeneous financial network extracted from the tabular transaction dataset of the IEEE-CIS Fraud Detection Kaggle competition.

The original dataset contains credit card transactions provided by Vesta Corporation, a leading payment service company whose data consists of verified transactions. We define a bipartite graph structure from the information linked to each credit card transaction, such as product code, card information, and purchaser and recipient email domains. The graph therefore contains 12 entity types, comprising the transaction node and the transaction information nodes. It has 22 relation types, each connecting the transaction node to an information node. Each transaction is associated with a 4823-dimensional feature vector extracted from its categorical and numerical features. A fuller description of the features can be found in the Kaggle discussion. Each transaction node carries a binary label indicating whether the transaction is fraudulent; around 4% of transactions are fraudulent. The dataset is split by transaction time.
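The construction described above can be illustrated with a minimal pure-Python sketch. It is not the dataset's actual processing code: the rows, the column names (`product_code`, `card_type`, `purchaser_email`), the relation names, and the split fractions are all hypothetical. Adding one reverse relation per information type is also an assumption, although it would be consistent with 11 information node types yielding 22 relation types.

```python
from collections import defaultdict

# Hypothetical transaction rows; column names and values are illustrative only.
rows = [
    {"time": 10, "product_code": "W", "card_type": "visa", "purchaser_email": "a.com", "is_fraud": 0},
    {"time": 20, "product_code": "C", "card_type": "visa", "purchaser_email": "b.com", "is_fraud": 1},
    {"time": 30, "product_code": "W", "card_type": "mastercard", "purchaser_email": "a.com", "is_fraud": 0},
    {"time": 40, "product_code": "H", "card_type": "visa", "purchaser_email": "a.com", "is_fraud": 0},
]
info_columns = ["product_code", "card_type", "purchaser_email"]


def build_bipartite_graph(rows, info_columns):
    """Turn each distinct column value into an information node linked to its transactions."""
    node_ids = {col: {} for col in info_columns}  # column -> {value: node index}
    edges = defaultdict(list)  # (source type, relation, target type) -> [(source, target)]
    for txn_idx, row in enumerate(rows):
        for col in info_columns:
            # Reuse the node index if this value was seen before, else assign the next one.
            info_idx = node_ids[col].setdefault(row[col], len(node_ids[col]))
            edges[("transaction", "to", col)].append((txn_idx, info_idx))
            # Assumed reverse relation, so messages can flow in both directions.
            edges[(col, "rev_to", "transaction")].append((info_idx, txn_idx))
    return node_ids, dict(edges)


def temporal_split(rows, train_frac=0.5, val_frac=0.25):
    """Split transaction indices chronologically; the fractions are placeholders."""
    order = sorted(range(len(rows)), key=lambda i: rows[i]["time"])
    n_train = int(len(order) * train_frac)
    n_val = int(len(order) * val_frac)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


node_ids, edges = build_bipartite_graph(rows, info_columns)
train_idx, val_idx, test_idx = temporal_split(rows)
print(len(edges), "relation types")
print("train/val/test:", train_idx, val_idx, test_idx)
```

Because every edge touches exactly one transaction node, the graph stays bipartite between transactions and information nodes, and splitting by time keeps later transactions out of the training set.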

Parameters:
  • root (str) – Root directory where the dataset should be saved.

  • non_target_node_types (List[str], optional) – Define all other node types besides the transaction node. (default: None)

  • target_cat_feat_cols (List[str], optional) – Define the categorical feature columns for the transaction node. (default: None)

  • transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
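
A minimal usage sketch, assuming H2GB and PyTorch Geometric are installed and that the raw data can be obtained under `root`. The helper name `load_ieee_cis` and the path `data/IEEE-CIS` are hypothetical, and treating index 0 as the single heterogeneous graph is an assumption based on the usual InMemoryDataset pattern.

```python
def load_ieee_cis(root="data/IEEE-CIS"):
    """Load IEEE-CIS-G and return the dataset together with its first graph."""
    # Imported lazily so the helper can be defined without H2GB installed.
    from H2GB.datasets import IeeeCisDataset

    # Only `root` is required; the remaining parameters keep their defaults.
    dataset = IeeeCisDataset(root=root)
    # Assumption: the dataset holds one HeteroData object at index 0.
    data = dataset[0]
    return dataset, data


# Example call (not executed here, since it needs the library and the data):
# dataset, data = load_ieee_cis()
# print(data.node_types)  # node types of the HeteroData object
# print(data.edge_types)  # relation triples (source, relation, target)
```

Inspecting `node_types` and `edge_types` is a quick way to check the loaded graph against the entity and relation counts described above.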