Joel Pfeiffer
Joseph J. Pfeiffer, III
jpfeiffer at purdue dot edu

Lawson 2149 #20
Purdue University
Department of Computer Science
305 North University Street
West Lafayette, IN 47907-2066



Attributed Graph Models
Under Construction -- working on putting this together.

This page distributes a number of networks sampled through the Attributed Graph Model (AGM) framework. AGM samples a set of edges conditioned on the attributes of their endpoints, so the resulting (randomized) networks have the clustering, graph distances, degree distributions, etc. prescribed by their corresponding structural graph model, while their vertex attributes are correlated across the edges. When using or analyzing the sampled networks, please cite the following:

Attributed Graph Models: Modeling network structure with correlated attributes
Joseph J. Pfeiffer III, Sebastian Moreno, Timothy La Fond, Jennifer Neville and Brian Gallagher
In Proceedings of the 23rd International World Wide Web Conference (WWW 2014), 2014
[PDF] [BibTeX]

In addition to the above citation, each (a) structural model and (b) original dataset should be cited, when applicable. As the original datasets are the property of their original authors, we do not distribute them (unless the authors request it); rather, we provide links to locations where the datasets can be found (if they are publicly available).
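
To make the idea of sampling edges conditioned on endpoint attributes concrete, here is a minimal sketch of an accept-reject sampler. It is an illustration under stated assumptions, not the code used to generate the networks on this page: the proposal is a simple degree-proportional (Chung-Lu style) draw, the function names (chung_lu_proposal, edge_type, acceptance_probs, sample_edges) are hypothetical, and the target edge-type proportions are supplied by the caller.

```python
import random
from collections import Counter


def chung_lu_proposal(degrees, rng):
    """Propose one edge by drawing both endpoints proportionally to degree."""
    nodes = range(len(degrees))
    u, v = rng.choices(nodes, weights=degrees, k=2)
    return u, v


def edge_type(x_u, x_v):
    """f(x_u, x_v): the unordered pair of endpoint attribute values."""
    return tuple(sorted((x_u, x_v)))


def acceptance_probs(target, proposed):
    """Ratio of target to proposed edge-type frequency, scaled so the max is 1."""
    ratio = {t: target[t] / proposed[t] for t in target if proposed.get(t, 0) > 0}
    top = max(ratio.values())
    return {t: r / top for t, r in ratio.items()}


def sample_edges(degrees, attrs, target, n_edges, seed=0, n_pilot=5000):
    """Sample n_edges distinct edges whose edge types roughly follow `target`."""
    rng = random.Random(seed)

    # Pilot run: estimate how often the structural proposal alone
    # produces each edge type (assumption: a sampled estimate is good enough).
    pilot = Counter()
    for _ in range(n_pilot):
        u, v = chung_lu_proposal(degrees, rng)
        if u != v:
            pilot[edge_type(attrs[u], attrs[v])] += 1
    total = sum(pilot.values())
    proposed = {t: c / total for t, c in pilot.items()}

    accept = acceptance_probs(target, proposed)

    # Accept-reject: keep a proposed edge with the probability of its type.
    edges = set()
    while len(edges) < n_edges:
        u, v = chung_lu_proposal(degrees, rng)
        if u == v:
            continue  # no self-loops
        t = edge_type(attrs[u], attrs[v])
        if rng.random() < accept.get(t, 0.0):
            edges.add((min(u, v), max(u, v)))
    return edges
```

Because only the acceptance step looks at attributes, the degree structure still comes from the proposal, while the mix of edge types is pulled toward the target proportions.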

Synthetic Dataset Downloads

Currently all local models of the attributes P(X) are Naive Bayes, with attributes conditioned on the label, rather than the full joint (with only 1 or 2 attributes the two are equivalent). In contrast, the P(f(X)|E) vary depending on the dataset and are generally more complex. This was done because our goal is to demonstrate how to model correlation across edges between different variables.
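
As a rough illustration of the Naive Bayes attribute model described above, the sketch below draws a label and then draws each attribute independently given that label. The function names (sample_node, nb_joint) and the probability-table layout are assumptions made for this example, and any numbers passed in are hypothetical rather than taken from the datasets.

```python
import random


def sample_node(p_label, p_attr_given_label, rng):
    """Draw y ~ P(y), then each attribute x_k ~ P(x_k | y) independently.

    p_label: dict mapping label -> probability.
    p_attr_given_label: list with one table per attribute, where
        table[label][value] is P(x_k = value | y = label).
    """
    labels = list(p_label)
    y = rng.choices(labels, weights=[p_label[l] for l in labels], k=1)[0]
    xs = []
    for table in p_attr_given_label:
        values = list(table[y])
        weights = [table[y][v] for v in values]
        xs.append(rng.choices(values, weights=weights, k=1)[0])
    return y, tuple(xs)


def nb_joint(p_label, p_attr_given_label, y, xs):
    """P(y, x_1, ..., x_K) under the Naive Bayes factorization."""
    p = p_label[y]
    for table, x in zip(p_attr_given_label, xs):
        p *= table[y][x]
    return p
```

With a label and a single attribute, P(y) P(x | y) is exactly the joint P(x, y), which is why the Naive Bayes model and the full joint coincide in that case; with more attributes the factorization drops any dependence between attributes that is not explained by the label.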

Dataset | Nodes | Edges | Attrs (Total/Modeled) | Data Cite | Struct Cite | Description
cora_agm_fcl | 11,258 | 31,482 | 1/1 | CoRA [4] | FCL [1] | CoRA citations dataset. FCL model used as proposal distribution. Attribute modeled is whether the topic is AI or not.
cora_agm_tcl | 11,258 | 31,482 | 1/1 | CoRA [4] | TCL [2] | CoRA citations dataset. TCL model used as proposal distribution. Attribute modeled is whether the topic is AI or not.
cora_agm_kpgm2x2 | 16,384 | 33,699 | 1/1 | CoRA [4] | KPGM [3] | CoRA citations dataset. KPGM 2x2 model used as proposal distribution. Attribute modeled is whether the topic is AI or not.
cora_agm_kpgm3x3 | 19,683 | 33,137 | 1/1 | CoRA [4] | KPGM [3] | CoRA citations dataset. KPGM 3x3 model used as proposal distribution. Attribute modeled is whether the topic is AI or not.
facebook_agm_large_fcl | 444,817 | 1,016,621 | 2/2 | N/A | FCL [1] | Facebook wall posting dataset. FCL model used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used.
facebook_agm_large_tcl | 444,817 | 1,016,621 | 2/2 | N/A | TCL [2] | Facebook wall posting dataset. TCL model used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used.
facebook_agm_large_kpgm2x2 | 524,288 | 924,759 | 2/2 | N/A | KPGM [3] | Facebook wall posting dataset. KPGM with 2x2 initiator matrix used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used.
facebook_agm_large_kpgm3x3 | 531,441 | 1,303,771 | 2/2 | N/A | KPGM [3] | Facebook wall posting dataset. KPGM with 3x3 initiator matrix used as proposal distribution. Joint distribution of religion (label) and conservative (attr) is used.
facebook_agm_small_fcl | 5,906 | 36,685 | 3/3 | N/A | FCL [1] | Facebook friendships. Religion, conservative and gender are jointly modeled.
facebook_agm_small_tcl | 5,906 | 36,685 | 3/3 | N/A | TCL [2] | Facebook friendships. Religion, conservative and gender are jointly modeled.
amazon_agm_DVD_fcl | 16,118 | 37,798 | 28/1 | Amazon [5] | FCL [1] | Amazon DVD copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000.
amazon_agm_DVD_tcl | 16,118 | 37,798 | 28/1 | Amazon [5] | TCL [2] | Amazon DVD copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000.
amazon_agm_Music_fcl | 56,891 | 136,272 | 26/1 | Amazon [5] | FCL [1] | Amazon Music copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000.
amazon_agm_Music_tcl | 56,891 | 136,272 | 26/1 | Amazon [5] | TCL [2] | Amazon Music copurchases. Attributes drawn from NB (conditioned on the label). Edges conditioned on label. Label is whether salesrank is better than 10,000.
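
The KPGM rows list more nodes than the FCL and TCL rows for the same source data. A Kronecker product graph built from a b x b initiator matrix has b^k nodes, and the counts in the table are consistent with choosing the smallest power of b that covers the original node count. The small helper below (kpgm_node_count is a hypothetical name, and the rounding rule is an inference from the table rather than a documented choice) reproduces those figures.

```python
def kpgm_node_count(n_nodes, b):
    """Return (b**k, k) for the smallest k such that b**k >= n_nodes."""
    k, size = 0, 1
    while size < n_nodes:
        size *= b
        k += 1
    return size, k


# CoRA has 11,258 nodes; the Facebook wall posting data has 444,817.
print(kpgm_node_count(11258, 2))   # 2x2 initiator
print(kpgm_node_count(11258, 3))   # 3x3 initiator
print(kpgm_node_count(444817, 2))  # 2x2 initiator
print(kpgm_node_count(444817, 3))  # 3x3 initiator
```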

Related Work
Citation Number Citation Information Further Information
[1]
The average distances in random graphs with given expected degrees.
F. Chung and L. Lu
Internet Mathematics, 1, 2002
[2]
Fast Generation of Large Scale Social Networks While Incorporating Transitive Closures.
J. J. Pfeiffer III, T. La Fond, S. Moreno and J. Neville
In Proceedings of the Fourth ASE/IEEE International Conference on Social Computing, 2012
[3]
Kronecker Graphs: An Approach to Modeling Networks.
J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos and Z. Ghahramani
In Journal of Machine Learning Research 11 (2010), Pages 985-1042
[4]
Automating the Construction of Internet Portals with Machine Learning.
A. McCallum, K. Nigam, J. Rennie and K. Seymore
In Journal of Information Retrieval 3, Issue 2 (2000), Pages 127-163
Dataset Download
[5]
The Dynamics of Viral Marketing.
J. Leskovec, L. A. Adamic and B. A. Huberman
ACM Transactions on the Web (ACM TWEB), 1(1), 2007
Dataset Download