ACL Reference Anthology


This file contains documentation on the ACL Reference Anthology, Linguistic Data Consortium (LDC) catalog number LDC2009T29 and ISBN 1-58563-531-6.

The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. The anthology serves as a reference repository and provides structured text for research tasks such as OCR benchmarking and bibliometric studies. The corpus can provide a standard reference text set for collaborative research and evaluations. It covers most of the papers that appear on the ACL Anthology website up to February 2007.

The material in this corpus was scanned at 600 dpi grayscale for archival storage, then down-sampled to 300 dpi black-and-white, assembled into articles, and stored in the ‘PDF Image with Hidden Text’ format. Author and title metadata were extracted from the OCRed text and used to build HTML index pages. Older materials, such as conference proceedings from the 1960s and early volumes of the Journal of Computational Linguistics, were manually digitized from microfiche slides.

The Association for Computational Linguistics is the international scientific and professional society for scholars working on problems involving natural language and computation. Membership includes the ACL quarterly journal, Computational Linguistics, reduced registration at most ACL-sponsored conferences, discounts on ACL-sponsored publications, and participation in ACL Special Interest Groups. Since 1988, the ACL journal, Computational Linguistics, has been the primary forum for research on computational linguistics and natural language processing.


This anthology includes:

Metadata includes a unique article ID, author(s), title, publication venue and year of publication.
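
As a rough illustration of the metadata fields listed above, the following Python sketch models one record and groups records by year, as a bibliometric study might. The class name, field names, helper function, and sample values are all assumptions made for this example; they are not the corpus's actual schema or file format, which should be checked against the data files themselves.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ArticleRecord:
    """One metadata entry. Field names are illustrative, not the corpus schema."""
    article_id: str     # unique article ID
    authors: List[str]  # author(s)
    title: str
    venue: str          # publication venue
    year: int           # year of publication


def records_by_year(records: List[ArticleRecord]) -> Dict[int, List[ArticleRecord]]:
    """Group records by publication year."""
    grouped: Dict[int, List[ArticleRecord]] = {}
    for rec in records:
        grouped.setdefault(rec.year, []).append(rec)
    return grouped


# Invented placeholder values, not taken from the corpus.
sample = [
    ArticleRecord("X00-0001", ["A. Author"], "An Example Title", "Example Venue", 2000),
    ArticleRecord("X00-0002", ["B. Author", "C. Author"], "Another Title", "Example Venue", 2000),
    ArticleRecord("X01-0001", ["A. Author"], "A Third Title", "Example Venue", 2001),
]

# Number of sample articles per year.
counts = {year: len(recs) for year, recs in records_by_year(sample).items()}
```

Keeping the record immutable (frozen=True) is a design choice for this sketch only; it prevents accidental edits to reference metadata during analysis.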

Please see file.tbl for the directory structure of this publication, as well as a complete list of files.

Please go to data for a listing of data files.

Other documentation files are:



Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus, LDC2009T29.

Content Copyright


© 2009 Linguistic Data Consortium, Trustees of the University of Pennsylvania. All Rights Reserved.