
Coauthorship network with ground truth (for overlapping community detection)

We construct coauthorship networks from DBLP and the Microsoft Academic Graph (MAG). Please see our paper for the citations to these two datasets. For DBLP, each community is a group of conferences; for MAG, each community is denoted by a "field of study" (FOS) tag. Each author's ground-truth community distribution (the \(\mathbf{\theta}\) vector) is constructed by normalizing the number of papers he/she has published in the conferences of each subfield (or the number of his/her papers that carry each FOS tag). We also construct bipartite versions of the DBLP networks, in which each node is either an author or a paper and every edge joins an author to a paper. Please read our paper for details.

Community Structure for Different Networks

DBLP1 has 6 communities:
  Machine Learning: NIPS, ICML, AISTATS, UAI
  Theoretical Computer Science: STOC, FOCS, SODA, COLT, ITCS, RANDOM, ICALP, ISAAC
  Data Mining: KDD, ICDM, CIKM, SDM, WSDM, RecSys
  Computer Vision: CVPR, ICCV, ECCV, ICIP
  Artificial Intelligence: AAAI, IJCAI
  Natural Language Processing: ACL, NAACL, EMNLP, CONLL, COLING, EACL, SIGIR

DBLP2 has 3 communities:
  Networking and Communications: INFOCOM, GLOBECOM, ICC
  Systems: OSDI, SOSP, NSDI, SIGCOMM, MOBICOM, MOBISYS, CONEXT, ATC
  Information Theory: ISIT, ITA, SIGMETRICS, MOBIHOC

DBLP3 has 3 communities:
  Databases: VLDB, SIGMOD, PODS, CIKM, ICDE
  Data Mining: KDD, ICDM, SDM, SIGIR
  World Wide Web: WWW, WSDM, WINE, ICWSM

DBLP4 has 3 communities:
  Programming Languages: PLDI, POPL, OOPSLA, ICLP, ESOP, ICFP
  Software Engineering: FSE, ICSE, ASE/KBSE
  Formal Methods: CAV, FM, SAS, FMSD, IFM, ICFEM, FORTE, CADE, TABLEAUX, LPAR

DBLP5 has 4 communities:
  Computer Architecture: ASPLOS, ISCA, MICRO, HPCA
  Computer Hardware: FPGA, CHES, ICCD, ISLPED, ASAP, ISPD
  Real-Time and Embedded Systems: RTSS, RTAS, ECRTS, MODELS, LCTRTS, CASES, EMSOFT, SCOPES
  Computer-Aided Design: DAC, ICCAD, DATE, ASPDAC

MAG1 has 3 communities:
  Computational Biology and Bioinformatics
  Organic Chemistry
  Genetics

MAG2 has 3 communities:
  Machine Learning
  Artificial Intelligence
  Mathematical Optimization

Data Format

For each network, there are two txt files, where \(n\) is the number of authors and \(K\) is the number of communities:
Adjacency Matrix \(\mathbf{A}\in\mathbb{R}^{n\times n}\):
Community Ground Truth \(\mathbf{\Theta}\in\mathbb{R}^{n\times K}\):
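As described above, each row of \(\mathbf{\Theta}\) is an author's normalized paper counts: if \(c_{ik}\) is the number of papers author \(i\) published in community \(k\), then \(\Theta_{ik} = c_{ik} / \sum_{l=1}^{K} c_{il}\). The Python sketch below builds both matrices from a toy list of papers; the same author-paper pairs are what the bipartite version uses as its edges. It is illustrative only: the function `build_network`, the toy author names, and the choice of an unweighted edge between any two authors who share at least one paper are assumptions of this sketch rather than the exact pipeline from our paper, and it does not parse the released txt files.

```python
from itertools import combinations


def build_network(papers, num_communities):
    """Hypothetical helper: build a coauthorship adjacency matrix A and a
    ground-truth membership matrix Theta from a list of papers.

    papers: list of (author_list, community_index) pairs; the community index
    stands for the conference group (or FOS tag) of the paper.
    Returns (authors, A, Theta), with authors sorted by name.
    """
    authors = sorted({a for author_list, _ in papers for a in author_list})
    index = {a: i for i, a in enumerate(authors)}
    n = len(authors)

    # Assumption of this sketch: unweighted, symmetric A with a zero diagonal,
    # where an edge joins two authors who share at least one paper.
    A = [[0] * n for _ in range(n)]
    # counts[i][k]: number of papers author i published in community k.
    counts = [[0] * num_communities for _ in range(n)]

    for author_list, k in papers:
        ids = sorted({index[a] for a in author_list})
        for i in ids:
            counts[i][k] += 1
        for i, j in combinations(ids, 2):
            A[i][j] = A[j][i] = 1

    # Normalize each row of counts so that every theta vector sums to one.
    Theta = []
    for row in counts:
        total = sum(row)
        Theta.append([c / total if total else 0.0 for c in row])
    return authors, A, Theta


# Hypothetical toy input, not taken from the released data.
papers = [
    (["alice", "bob"], 0),
    (["bob", "carol"], 1),
    (["alice", "carol", "dave"], 0),
    (["bob"], 1),
]
authors, A, Theta = build_network(papers, num_communities=2)
print(authors)  # ['alice', 'bob', 'carol', 'dave']
for name, row in zip(authors, Theta):
    print(name, row)
```

For the toy input, bob has one paper in community 0 and two in community 1, so his row of Theta is [1/3, 2/3], while dave, whose only paper is in community 0, gets [1.0, 0.0].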
Download

The data can be downloaded from here. The bipartite versions of the DBLP networks can be downloaded from here. Separate files:
Code
Citation

Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti, "On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations", in Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2324-2333, 2017. [BibTeX]

Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti, "Estimating Mixed Memberships with Sharp Eigenvector Deviations", Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1751645 [BibTeX]