
ACL Anthology Reference Corpus

This is the home page of the ACL Anthology Reference Corpus, a corpus of scholarly publications about Computational Linguistics. This corpus has two versions; both are canonicalized subsets of the ACL Anthology. The newer version includes all ACL Anthology files whose copyright belongs to the ACL (excluding COLING, LREC, etc.), up to December 2015, consisting of 22,878 articles. We hope this frozen corpus will be used for benchmarking applications for scholarly and bibliometric data processing.


Members for the 2016 versions.

Members for the previous 2009 versions.

The following links give information about the corpus itself, alternative and related corpora, and specific tools to process it. They pertain to the earlier versions from 2009.

Here we list some related tools for bibliographic processing, and related sites for bibliographic research.

Our efforts have been supported by the grassroots initiative call made by the ACL Exec at the 2007 ACL annual meeting in Prague. We would like to acknowledge the support of the ACL Exec in encouraging this form of collaboration.

Thanks also go to Behrang Qasemizadeh, PhD student in the Unit for Natural Language Processing, Digital Enterprise Research Institute of the National University of Ireland, Galway (funded by Science Foundation Ireland), for his work on the SEPID ARC format, and to Martin Helmout of Southampton for his work on proof-checking the XML files and their schema.