DGFraud Tool Analysis

Published: 2020-12-07

Datasets

DBLP

We use the pre-processed DBLP dataset from Jhy1993/HAN. You can run FdGars, Player2Vec, GeniePath and GEM on the DBLP dataset. Unzip the archive before using the dataset:

cd dataset
unzip DBLP4057_GAT_with_idx_tra200_val_800.zip

Dataset: We extract a subset of DBLP which contains 14328 papers (P), 4057 authors (A), 20 conferences (C) and 8789 terms (T). The authors are divided into four areas: database, data mining, machine learning and information retrieval. Each author's research area is labeled according to the conferences they submitted to. Author features are the elements of a bag-of-words representation of their keywords.
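To make the bag-of-words author features concrete, here is a minimal sketch in plain Python. The vocabulary, the keywords and the helper name bag_of_words are hypothetical, chosen only for illustration. The source does not say whether the real features are counts or binary indicators; counts are assumed here.

```python
def bag_of_words(keywords, vocabulary):
    """Return a count vector over `vocabulary` for one author's keywords."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for word in keywords:
        i = index.get(word)
        if i is not None:  # keywords outside the vocabulary are ignored
            vector[i] += 1
    return vector


# Hypothetical toy vocabulary; the real dataset has 8789 terms.
vocabulary = ["graph", "query", "index", "kernel"]
# Hypothetical keywords collected from one author's papers.
author_keywords = ["graph", "kernel", "graph", "ranking"]

features = bag_of_words(author_keywords, vocabulary)
print(features)  # [2, 0, 0, 1]
```

Each author then corresponds to one such vector, and stacking the vectors row by row gives an author feature matrix.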

It contains four types of entities: papers, authors, conferences and terms.

Contents of the .mat file

label indicates which research area each author belongs to; it is the target of our multi-class classification.

Example dataset

We implement example graphs for SemiGNN, GAS and GEM in data_loader.py, because those models require unique graph structures or node types that cannot be found in open-source datasets.

Yelp dataset

For GraphConsis, we preprocessed the Yelp Spam Review Dataset with reviews as nodes and three relations as edges.

The dataset with .mat format is located at /dataset/YelpChi.zip. The .mat file includes:

net_rur, net_rtr, net_rsr: three sparse matrices representing the three homo-graphs defined in the GraphConsis paper;
features: a sparse matrix of 100-dimensional bag-of-words features;
label: a numpy array with the ground truth of the nodes. 1 represents spam and 0 represents benign.
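One way to check a file with this layout is sketched below using scipy.io.loadmat. The variable names (net_rur, net_rtr, net_rsr, features, label) come from the list above. The helpers summarize_mat and label_counts and the small synthetic file are assumptions made so the example runs without the real archive. To inspect the real data, pass the path of the .mat file extracted from YelpChi.zip instead.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp
from scipy.io import loadmat, savemat


def summarize_mat(path):
    """Return {name: (kind, shape)} for every user variable in a .mat file."""
    summary = {}
    for key, value in loadmat(path).items():
        if key.startswith("__"):  # skip loadmat metadata such as __header__
            continue
        kind = "sparse" if sp.issparse(value) else "dense"
        summary[key] = (kind, value.shape)
    return summary


def label_counts(path, key="label"):
    """Count benign (0) and spam (1) nodes in the label array."""
    labels = loadmat(path)[key].ravel().astype(int)
    return {"benign": int((labels == 0).sum()), "spam": int((labels == 1).sum())}


# Synthetic stand-in (assumption): 6 review nodes, identity adjacency matrices.
n = 6
rng = np.random.default_rng(0)
toy = {
    "net_rur": sp.identity(n, format="csr"),
    "net_rtr": sp.identity(n, format="csr"),
    "net_rsr": sp.identity(n, format="csr"),
    "features": sp.csr_matrix(rng.random((n, 100))),
    "label": np.array([0, 0, 1, 0, 1, 0]),
}
toy_path = os.path.join(tempfile.mkdtemp(), "toy_yelp.mat")
savemat(toy_path, toy)

summary = summarize_mat(toy_path)
counts = label_counts(toy_path)
print(summary)
print(counts)
```

Note that savemat stores a one-dimensional array as a row vector by default, which is why label_counts flattens the array with ravel() before counting.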

To get the complete metadata of the Yelp dataset, please send an email to ytongdou@gmail.com for inquiry.

CarlYoung

http://yc1999.github.io/2020/12/07/dgfraud-gong-ju-fen-xi/