DGFraud Toolkit Analysis


Datasets

DBLP

We use the pre-processed DBLP dataset from Jhy1993/HAN. You can run FdGars, Player2Vec, GeniePath and GEM on the DBLP dataset. Unzip the archive before using the dataset:

cd dataset
unzip DBLP4057_GAT_with_idx_tra200_val_800.zip
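After unzipping, it helps to list which variables the .mat file actually contains before writing any loader. The sketch below uses scipy.io.loadmat and does not assume any particular key names. The function name describe_mat is hypothetical. The filename in the usage line is an assumption based on the archive name.

```python
import scipy.io as sio


def describe_mat(path):
    """Return {variable_name: (type_name, shape)} for the user variables in a .mat file."""
    data = sio.loadmat(path)
    summary = {}
    for key, value in data.items():
        # loadmat also returns MATLAB metadata entries such as
        # __header__, __version__ and __globals__; skip them.
        if key.startswith("__"):
            continue
        summary[key] = (type(value).__name__, getattr(value, "shape", None))
    return summary
```

Usage (filename assumed): `describe_mat("DBLP4057_GAT_with_idx_tra200_val_800.mat")`. Sparse adjacency matrices come back as scipy sparse objects, and dense arrays come back as NumPy arrays, so the type name shows which is which.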

Dataset: We extract a subset of DBLP that contains 14328 papers (P), 4057 authors (A), 20 conferences (C) and 8789 terms (T). The authors are divided into four areas: database, data mining, machine learning and information retrieval. We label each author's research area according to the conferences they submitted to. Author features are the elements of a bag-of-words representation of keywords.
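The two preprocessing ideas in this paragraph, labeling an author by the conferences they submit to and encoding keywords as bag-of-words, can be illustrated with a small sketch. It is not the HAN preprocessing code. The conference-to-area mapping is hypothetical (the post does not list the 20 conferences or their areas). A majority vote and a binary (presence/absence) bag-of-words are both assumptions.

```python
from collections import Counter

# Hypothetical conference -> area mapping, for illustration only.
CONF_TO_AREA = {
    "SIGMOD": "database",
    "VLDB": "database",
    "KDD": "data mining",
    "ICDM": "data mining",
    "ICML": "machine learning",
    "NIPS": "machine learning",
    "SIGIR": "information retrieval",
    "CIKM": "information retrieval",
}


def label_author(conferences, conf_to_area=CONF_TO_AREA):
    """Assign the area that occurs most often among an author's conferences.

    Returns None if none of the conferences is in the mapping. On ties,
    Counter.most_common keeps the area that was counted first.
    """
    votes = Counter(conf_to_area[c] for c in conferences if c in conf_to_area)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def bag_of_words(keywords, vocabulary):
    """Binary bag-of-words vector: 1 if the vocabulary term appears, else 0."""
    present = set(keywords)
    return [1 if term in present else 0 for term in vocabulary]
```

For example, an author with papers at KDD, ICDM and SIGMOD would be labeled "data mining" under this toy mapping.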

It contains four types of entities: papers, authors, conferences and terms.

Contents of the .mat file

  • label: the research area each author belongs to; this is the target of our multi-class classification.
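Multi-class training code usually needs integer class ids. The post does not say how label is stored in this file, so the sketch below handles both layouts. If label is a one-hot matrix of shape (n_authors, 4), the sketch takes the argmax of each row. If label is already an integer vector (possibly stored as a 2-D column or row, as MATLAB does), the sketch flattens it. The function name to_class_indices is hypothetical.

```python
import numpy as np


def to_class_indices(label):
    """Convert a label array into a 1-D vector of integer class ids.

    Assumes either a one-hot matrix of shape (n, n_classes) or an integer
    vector stored as shape (n,), (n, 1) or (1, n).
    """
    label = np.asarray(label)
    if label.ndim == 2 and min(label.shape) > 1:
        # One-hot rows: the position of the 1 is the class id.
        return label.argmax(axis=1)
    return label.ravel().astype(int)
```

`np.bincount(to_class_indices(label), minlength=4)` then gives the number of authors in each of the four areas, which is a quick way to check for class imbalance.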

Example dataset

We implement example graphs for SemiGNN, GAS and GEM in data_loader.py, because those models require unique graph structures or node types that cannot be found in open-source datasets.

Yelp dataset

For GraphConsis, we preprocessed the Yelp Spam Review Dataset with reviews as nodes and three relations as edges.
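The post does not define the three relations. The sketch below follows the definitions that, to my understanding, accompany this YelpChi graph in the GraphConsis/CARE-GNN line of work. R-U-R connects reviews posted by the same user. R-S-R connects reviews of the same product with the same star rating. R-T-R connects reviews of the same product posted in the same month. Treat these definitions as an assumption. The full metadata is only available on request, so the Review record and its fields are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Review(NamedTuple):
    # Hypothetical metadata fields; the real schema is not public.
    review_id: int
    user_id: str
    product_id: str
    stars: int
    month: str  # e.g. "2012-05"


def build_relations(reviews):
    """Return undirected edge sets (a, b) with a < b for each assumed relation."""
    groupers = {
        "net_rur": lambda r: r.user_id,                  # same user
        "net_rsr": lambda r: (r.product_id, r.stars),    # same product, same stars
        "net_rtr": lambda r: (r.product_id, r.month),    # same product, same month
    }
    relations = {}
    for name, key_fn in groupers.items():
        groups = defaultdict(list)
        for r in reviews:
            groups[key_fn(r)].append(r.review_id)
        edges = set()
        for members in groups.values():
            # Every pair of reviews in the same group is connected.
            edges.update(combinations(sorted(members), 2))
        relations[name] = edges
    return relations
```

Connecting every pair within a group is quadratic in the group size, so popular products produce very dense R-S-R and R-T-R graphs. The edge sets can then be turned into the sparse adjacency matrices stored in the .mat file.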

The dataset in .mat format is located at /dataset/YelpChi.zip. The .mat file includes:

  • net_rur, net_rtr, net_rsr: three sparse matrices representing the three homogeneous graphs defined in the GraphConsis paper;
  • features: a sparse matrix of 100-dimensional bag-of-words features;
  • label: a NumPy array with the ground truth of the nodes, where 1 represents spam and 0 represents benign.
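The variables listed above can be loaded with scipy.io.loadmat. The sketch below assumes the archive has been unzipped to a file named YelpChi.mat (the filename is an assumption). MATLAB stores vectors as 2-D arrays, so the sketch flattens label. The helper names load_yelpchi and spam_ratio are hypothetical.

```python
import numpy as np
import scipy.io as sio


def load_yelpchi(path):
    """Load the three relation graphs, the features and the labels from the .mat file."""
    data = sio.loadmat(path)
    relations = {name: data[name] for name in ("net_rur", "net_rtr", "net_rsr")}
    features = data["features"]  # sparse bag-of-words matrix
    # label may come back as shape (1, n) or (n, 1); flatten to (n,).
    labels = np.asarray(data["label"]).ravel().astype(int)
    return relations, features, labels


def spam_ratio(labels):
    """Fraction of nodes labeled 1 (spam)."""
    return float(np.mean(labels == 1))
```

Checking spam_ratio before training is worthwhile, because spam reviews are typically a minority class and plain accuracy can be misleading on imbalanced labels.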

To get the complete metadata of the Yelp dataset, please send an email to ytongdou@gmail.com for inquiry.


Author: CarlYoung
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source (CarlYoung) when reposting.