jpfeiffer at purdue dot edu
Attributed Graph Models
Under Construction -- working on putting this together.
This page distributes a number of networks sampled through the Attributed Graph Model (AGM) framework. AGM samples
a set of edges conditioned on the attributes of their endpoints, so the resulting (randomized) networks have the clustering,
graph distances, degree distributions, etc., prescribed by their corresponding structural graph model, while also having
vertex attributes that correlate across the edges. When using or analyzing the sampled networks, please cite the following:
Attributed Graph Models: Modeling network structure with correlated attributes
Joseph J. Pfeiffer III, Sebastian Moreno, Timothy La Fond, Jennifer Neville and Brian Gallagher
In Proceedings of the 23rd International World Wide Web Conference (WWW 2014), 2014
[PDF]
[BibTeX]
@inproceedings {www2014,
author = {Joseph J. {Pfeiffer III} and Sebastian Moreno and Timothy {La Fond} and Jennifer Neville and Brian Gallagher},
title = {Attributed Graph Models: Modeling network structure with correlated attributes.},
year = {2014},
booktitle = {Proceedings of the 23rd International World Wide Web Conference (WWW 2014)}
}
In addition to the above citation, each (a) structural model and (b) original dataset should be cited, when applicable. As the original
datasets are the property of their original authors, we do not distribute them (unless the authors request it); rather, we provide links
to locations where the datasets can be found (if they are publicly available).
Synthetic Dataset Downloads
Currently, all local models of the attributes P(X) are Naive Bayes, with attributes conditioned on the label, rather than the full joint distribution (with only 1 or 2 attributes the two are equivalent).
Conversely, the P(f(X)|E) vary depending on the dataset and are generally more complex. We made this choice because our goal is to demonstrate how to model correlation across edges between different variables.
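As a rough illustration of the two pieces described above, the following Python sketch first draws node attributes from a Naive Bayes model (label first, then each attribute given the label) and then samples edges by accept-reject against a Chung-Lu style proposal. It is not the code used to generate the datasets on this page: the function names, the binary attributes, the choice of f as the unordered pair of endpoint labels, and every probability value are assumptions made up for the example.

```python
import random
from itertools import accumulate


def sample_attributes(n, p_label, p_attr_given_label, rng):
    """Naive Bayes draw: a binary label first, then each binary attribute
    independently given that label. Returns a list of (label, attrs)."""
    nodes = []
    for _ in range(n):
        label = 1 if rng.random() < p_label else 0
        attrs = tuple(
            1 if rng.random() < p_by_label[label] else 0
            for p_by_label in p_attr_given_label
        )
        nodes.append((label, attrs))
    return nodes


def sample_edges(nodes, weights, accept, n_edges, rng):
    """Accept-reject edge sampling against a Chung-Lu style proposal."""
    ids = range(len(nodes))
    cum = list(accumulate(weights))  # cumulative weights, computed once
    edges = set()
    while len(edges) < n_edges:
        # Proposal: both endpoints drawn proportionally to expected degree.
        i, j = rng.choices(ids, cum_weights=cum, k=2)
        if i == j:
            continue  # skip self-loops
        # f(x_i, x_j): here simply the unordered pair of endpoint labels
        # (an assumption for this sketch).
        key = tuple(sorted((nodes[i][0], nodes[j][0])))
        # Keep the proposed edge with the probability assigned to its
        # attribute combination; duplicates collapse in the set.
        if rng.random() < accept[key]:
            edges.add((min(i, j), max(i, j)))
    return edges


rng = random.Random(0)
# One attribute whose probability of being 1 is 0.2 under label 0 and
# 0.7 under label 1 (made-up values).
nodes = sample_attributes(200, p_label=0.5, p_attr_given_label=[(0.2, 0.7)], rng=rng)
weights = [rng.randint(1, 10) for _ in nodes]  # hypothetical expected degrees
# Hypothetical acceptance probabilities favouring same-label edges.
accept = {(0, 0): 1.0, (0, 1): 0.2, (1, 1): 1.0}
edges = sample_edges(nodes, weights, accept, n_edges=300, rng=rng)
same = sum(nodes[i][0] == nodes[j][0] for i, j in edges) / len(edges)
print(f"{len(edges)} edges, fraction joining same-label endpoints: {same:.2f}")
```

Because rejected proposals are simply redrawn, the degree structure continues to come from the proposal model, while the acceptance table controls how often each attribute combination appears across an edge. In practice the acceptance probabilities would be estimated from data rather than set by hand as they are here.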
Dataset | Nodes | Edges | Attributes (Total/Modeled) | Data Cite | Struct Cite | Description |
cora_agm_fcl | 11,258 | 31,482 | 1/1 | CoRA [4] | FCL [1] | CoRA citations dataset. FCL model used as proposal distribution. Attribute modeled is whether the topic is AI or not. |
cora_agm_tcl | 11,258 | 31,482 | 1/1 | CoRA [4] | TCL [2] | CoRA citations dataset. TCL model used as proposal distribution. Attribute modeled is whether the topic is AI or not. |
cora_agm_kpgm2x2 | 16,384 | 33,699 | 1/1 | CoRA [4] | KPGM [3] | CoRA citations dataset. KPGM with 2x2 initiator matrix used as proposal distribution. Attribute modeled is whether the topic is AI or not. |
cora_agm_kpgm3x3 | 19,683 | 33,137 | 1/1 | CoRA [4] | KPGM [3] | CoRA citations dataset. KPGM with 3x3 initiator matrix used as proposal distribution. Attribute modeled is whether the topic is AI or not. |
facebook_agm_large_fcl | 444,817 | 1,016,621 | 2/2 | N/A | FCL [1] | Facebook wall posting dataset. FCL model used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used. |
facebook_agm_large_tcl | 444,817 | 1,016,621 | 2/2 | N/A | TCL [2] | Facebook wall posting dataset. TCL model used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used. |
facebook_agm_large_kpgm2x2 | 524,288 | 924,759 | 2/2 | N/A | KPGM [3] | Facebook wall posting dataset. KPGM with 2x2 initiator matrix used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used. |
facebook_agm_large_kpgm3x3 | 531,441 | 1,303,771 | 2/2 | N/A | KPGM [3] | Facebook wall posting dataset. KPGM with 3x3 initiator matrix used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used. |
facebook_agm_small_fcl | 5,906 | 36,685 | 3/3 | N/A | FCL [1] | Facebook friendships. Religion, conservative and gender are jointly modeled. |
facebook_agm_small_tcl | 5,906 | 36,685 | 3/3 | N/A | TCL [2] | Facebook friendships. Religion, conservative and gender are jointly modeled. |
amazon_agm_DVD_fcl | 16,118 | 37,798 | 28/1 | Amazon [5] | FCL [1] | Amazon DVD copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000. |
amazon_agm_DVD_tcl | 16,118 | 37,798 | 28/1 | Amazon [5] | TCL [2] | Amazon DVD copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000. |
amazon_agm_Music_fcl | 56,891 | 136,272 | 26/1 | Amazon [5] | FCL [1] | Amazon Music copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000. |
amazon_agm_Music_tcl | 56,891 | 136,272 | 26/1 | Amazon [5] | TCL [2] | Amazon Music copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000. |
Related Work
Citation Number | Citation Information | Further Information |
[1] | The average distances in random graphs with given expected degrees. F. Chung and L. Lu. Internet Mathematics, 1, 2002 | |
[2] | Fast Generation of Large Scale Social Networks While Incorporating Transitive Closures. J. J. Pfeiffer III, T. La Fond, S. Moreno and J. Neville. In Proceedings of the Fourth ASE/IEEE International Conference on Social Computing, 2012 | |
[3] | Kronecker Graphs: An Approach to Modeling Networks. J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos and Z. Ghahramani. In Journal of Machine Learning Research 11 (2010), Pages 985-1042 | |
[4] | Automating the Construction of Internet Portals with Machine Learning. A. McCallum, K. Nigam, J. Rennie and K. Seymore. In Journal of Information Retrieval 3, Issue 2 (2000), Pages 127-163 | Dataset Download |
[5] | The Dynamics of Viral Marketing. J. Leskovec, L. Adamic and B. Huberman. ACM Transactions on the Web (ACM TWEB), 1(1), 2007 | Dataset Download |