You are invited to participate in the 4th Joint Workshop on Bibliometric-enhanced IR and NLP for Digital Libraries (BIRNDL).
This is the 4th BIRNDL workshop, following a series of successful BIRNDL and BIR workshops at other premier NLP/IR/DL venues. In conjunction with the BIRNDL workshop, we will also hold the 5th CL-SciSumm Shared Task in Scientific Document Summarization.
Reports from the shared task systems will be featured as part of a session at the workshop.
The goal of the BIRNDL workshop at SIGIR 2019 is to engage the IR community in the open problems in Big Science. Big Science refers to the large, cross-domain digital repositories which index research papers, such as the ACL Anthology, ArXiv, ACM Digital Library, PubMed, IEEE database, Web of Science and Google Scholar. Currently, digital libraries collect and allow access to digital papers and their metadata, including citations, but mostly do not analyze the items they index. The scale of growth in scholarly publishing poses a challenge for scholars in their search for relevant literature. Finding relevant scholarly literature is the key focus of the workshop and sets the agenda for methods and approaches to be discussed and evaluated at BIRNDL.
We invite papers and presentations that incorporate insights from IR, bibliometrics and NLP to develop new techniques to address the open problems in Big Science, such as evidence-based searching, measurement of research quality, relevance and impact, the emergence and decline of research problems, identification of scholarly relationships and influences, and applied problems such as language translation, question-answering and summarization. For your reference, please see the proceedings of the 3rd BIRNDL workshop and a recent report in SIGIR Forum: http://sigir.org/wp-content/uploads/2019/01/p105.pdf.
By design, BIRNDL is an inclusive and diverse venue, in terms of both constituency and research. To promote a diverse constituency, we explicitly encourage female first authors. We invite stimulating research on topics including, but not limited to, full-text analysis, including multilingual analysis, IR methods for DL, and applications of citation-based NLP. Specific examples of fields of interest include:
Importantly, to address the scarcity of validated datasets in this area, we also invite papers describing new and pre-existing datasets. Submissions in this track will include instructions for accessing the data; metadata and documentation on its organization, content, and quality; and descriptions of possible use cases. We also invite descriptions of running projects and ongoing work as well as contributions from industry. Papers that investigate multiple themes directly are especially welcome.
The CLSciSumm19 corpus is expected to be of interest to a broad community including those working in natural language processing, machine learning, computational linguistics, text summarization, discourse structure in scholarly discourse, paraphrase, textual entailment and text simplification.
The task is automatic scientific paper summarization in the Computational Linguistics (CL) domain. The output summaries will be of two types: faceted summaries of the traditional self-summary (the abstract) and the community summary (the collection of citation sentences, or "citances"). We also propose to group the citances by the facets of the text that they refer to.
At SIGIR 2019, we will hold the 5th Computational Linguistics (CL) Scientific Summarization Shared Task, CL-SciSumm 2019, which is sponsored by SRI International and the Chan-Zuckerberg Initiative (CZI). This task follows up on the successful CLSciSumm series held since 2016 and a pilot task at TAC 2014.
|Notification||June 18, 2019 |
|Camera Ready Contributions|
|Workshop||July 25, 2019|
Check the CL-SciSumm 2019 Shared Task homepage for details on dates with respect to the shared task. The dates are coordinated.
All deadlines for the BIRNDL workshop are calculated as 11:59pm Baker Island Time (BIT: UTC/GMT-12).
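For authors in other time zones, the deadline rule above can be converted to UTC with a few lines of Python. This is only a minimal sketch: the function name `deadline_in_utc` and the example date are illustrative assumptions, not part of the workshop announcement, and the date shown is a placeholder rather than an actual BIRNDL deadline.

```python
from datetime import date, datetime, time, timedelta, timezone

# Baker Island Time is a fixed offset of UTC-12 with no daylight saving time.
BIT = timezone(timedelta(hours=-12), name="BIT")


def deadline_in_utc(day: date) -> datetime:
    """Return the instant 11:59pm BIT on `day`, expressed in UTC.

    Hypothetical helper for illustration only.
    """
    local_deadline = datetime.combine(day, time(23, 59), tzinfo=BIT)
    return local_deadline.astimezone(timezone.utc)


# Placeholder date for illustration, NOT an actual BIRNDL deadline.
example_day = date(2019, 1, 1)
print(deadline_in_utc(example_day).isoformat())  # 2019-01-02T11:59:00+00:00
```

In other words, a deadline stated for a given calendar day closes at 11:59 UTC on the following day.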
Abstract: Computational discourse processing has come a long way in the 10 years since my talk at ACL'2009 on "Discourse: Early problems, current successes, future challenges". I would attribute much of this progress to a weakening of doctrinal commitments that could not stand up to or deal with the vast amounts of textual data that we wanted to be able to use in information extraction, sentiment analysis, question answering, etc. Instead, progress has followed from a greater willingness to consider what can be learned from actual texts and various forms of annotation, in English and other languages as well. In this talk, I will review changing assumptions about discourse structure, summarize recent work on lexico-syntactic grounding of low-level discourse structure and genre-based frameworks for higher-level discourse structure, and end with some directions for addressing some of the remaining challenges.
BIO: Bonnie Webber received her PhD from Harvard University and taught at the University of Pennsylvania in Philadelphia for 20 years before joining the School of Informatics at the University of Edinburgh, where she is now professor emeritus. Known for her research on discourse anaphora and discourse relations, she has served as President of the Association for Computational Linguistics (ACL) and Deputy Chair of the European COST action IS1312, "TextLink: Structuring Discourse in Multilingual Europe". Along with Aravind Joshi, Rashmi Prasad, Alan Lee and Eleni Miltsakaki, she is co-developer of the Penn Discourse TreeBank (both the 2008 Version 2 release and the about-to-be-released Version 3). She is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), the Association for Computational Linguistics (ACL) and the Royal Society of Edinburgh (RSE), where she is also convenor of the Research Awards Committee. She works towards promoting women to more prominent positions in the NLP community and in Science and Technology more generally.
Abstract: The Chan Zuckerberg Initiative's Meta discovery system is designed to help the biomedical research community stay up to date with the latest and most important papers and preprints, through feeds and search. Meta can generate a personalized feed of newly published papers specific and relevant to each user's scientific interests by leveraging state of the art embeddings and clustering techniques. Meta further calculates an article-level predicted Eigenfactor which is used in ranking the papers within each feed. This talk will demonstrate the Meta application and will cover some of the recent bibliometric approaches to query formulation and ranking to improve retrieval of recently published academic publications.
BIO: Alex Wade currently works with the Chan Zuckerberg Initiative as technical program manager for Meta. Previously Wade served as the Director for Scholarly Communication for Microsoft Research, focused on Microsoft Academic, a semantic knowledge graph of academic research publications, people, and institutions. During his career at Microsoft, Wade managed the corporate search and taxonomy management services and served as Senior Program Manager for Windows Search. Prior to joining Microsoft, he held Systems Librarian, Engineering Librarian, Philosophy Librarian, and technical library positions at the University of Washington, the University of Michigan, and the University of California, Berkeley. Wade holds a bachelor's degree in Philosophy from the University of California, Berkeley, and a Master of Librarianship degree from the University of Washington.
|8:30||Registration at the workshop and Poster Setup|
|9:00||Introduction to the workshop [paper]||Philipp Mayr and Muthu Kumar Chandrasekaran|
|9:10||Keynote: Personalized feed/query-formulation, predictive impact, and ranking [paper]||Alex Wade, Meta@Chan-Zuckerberg Initiative, CA, USA|
|9:50||Distant supervision for silver label generation of software mentions in social scientific publications [paper]||Katarina Boland and Frank Krüger|
|10:10||Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model [paper]||Na Pang, Li Qian, Weimin Lyu and Jin-Dong Yang|
|11:00||Overview and Results: CL-SciSumm 2019 [paper]||Muthu Kumar Chandrasekaran|
|11:15||CL-SciSumm Winner Talk: University of Manchester [paper]||Chrysoula Zerva, Minh-Quoc Nghiem, Nhung Nguyen and Sophia Ananiadou|
|11:30||Invited Paper: Talksumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks [paper]||David Konopnicki, IBM Research - Israel|
|11:50||Supervised Learning for Automated Literature Review [paper]||Jason Portenoy and Jevin West|
|12:00||Revaluating Semantometrics from Computer Science Publications [paper]||Christin Katharina Kreutz, Premtim Sahitaj and Ralf Schenkel|
|13:30||Can Models of Author Intention Support Quality Assessment of Content? [paper]||Arlene Casey, Bonnie Webber and Dorota Glowacka|
|13:40||Comparing Word Embeddings for Active Learning-based Document Screening [paper]||Andres Carvallo and Denis Parra|
|13:50||Extracting and matching patent in-text references to scientific publications [paper]||Suzan Verberne, Ioannis Chios and Jian Wang|
|14:10||Towards Formula Concept Discovery and Recognition [paper]||Philipp Scharpf, Moritz Schubotz, Howard Cohl and Bela Gipp|
|14:20||Poster Pitches and Poster Session|
|CiteTracked: A Longitudinal Dataset of Peer Reviews and Citations [paper]||Barbara Plank and Reinard van Dalen|
|Measuring the Prevalence of Open Citation Data [paper]||Chifumi Nishioka and Michael Färber|
|Why Machines Cannot Learn Mathematics, Yet [paper]||André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky and Bela Gipp|
|Thinking the citation space using End-to-end Neural Coreference Resolution Model [paper]||Marc Bertin, Pierre Jonin, Frédéric Armetta and Iana Atanassova|
|Why are you calling my name? Analysing author name mentions of highly cited references in their citation contexts [paper]||Rajesh Piryani, Wolfgang Otto, Philipp Mayr and Vivek Kumar Singh|
|IR&TM-NJUST @ CLSciSumm-19 [paper]||Shutian Ma, Heng Zhang, Tianxiang Xu, Jin Xu, Shaohu Hu and Chengzhi Zhang|
|CIST@CLSciSumm-19: Automatic Scientific Paper Summarization With Citances and Facets [paper]||Lei Li, Yingqi Zhu, Wei Liu, Zuying Huang, Yang Xie, Yinan Liu and Xingyuan Li|
|IRIT-IRIS @ CL-SciSumm 2019: Matching Citances with their Intended Reference Text Spans from the Scientific Literature||Yoann Pitarch, Karen Pinel-Sauvagnat, Gilles Hubert, Guillaume Cabanac and Ophélie Fraisier-Vannier|
|Transfer learning for effective scientific research comprehension||Bakhtiyar Syed, Vijayasaradhi Indurthi, Balaji Vasan Srinivasan and Vasudeva Varma|
|LaSTUS/TALN+INCO @ CL-SciSumm 2019||Ahmed Ghassan Tawfiq Abura'Ed, Àlex Bravo, Luis Chiruzzo and Horacio Saggion|
|Poli2Sum@CL-SciSumm 2019: identify, classify, and summarize cited text spans by means of ensembles of supervised models||Moreno La Quatra, Luca Cagliero and Elena Baralis|
|Ranking-based Identification of Cited Text with Deep Learning||Hyonil Kim and Shiyan Ou|
|Siamese recurrent bi-directional neural network for scientific summarization @ CL-SciSumm 2019||Aris Fergadis, Dimitris Pappas and Haris Papageorgiou|
|15:30||SRI Distinguished Keynote: Discourse Processing for Text Analysis (2009--2019): Recent successes, remaining challenges||Bonnie Webber, Univ. of Edinburgh|
|16:10||HITS hits Scholarly Publishing: Validating Peer Review Alternatives using Bias and Network Analysis||Michael Soprano, Kevin Roitero and Stefano Mizzaro|
|16:30||Summary, Outlook and Discussion||Philipp Mayr and Muthu Kumar Chandrasekaran|
|17:00||END OF WORKSHOP|
Regular Research paper track: All submissions must be written in English, following the Springer LNCS author guidelines (max. 6 pages for short and 12 pages for full papers, excluding references, which are unlimited) and should be submitted as PDF files to EasyChair. All submissions will be reviewed by at least two independent reviewers. Please note that at least one author per paper must register for the workshop and attend to present the work. In case of a no-show, the paper (even if accepted) will be deleted from the proceedings and from the program. Submissions and reviewing will be managed by the EasyChair conference management system.
Poster track: We welcome submissions detailing original, early findings, works in progress and industrial applications of bibliometrics and IR for a special poster session, possibly with a 2-minute presentation in the main session. Some research track papers will also be invited to the poster track instead, although there will be no difference in the final proceedings between poster and research track submissions. These papers should follow the same format as the research track papers.
Shared Task: Teams that wish to participate in the CL Shared Task track at BIRNDL 2019 are invited to register on EasyChair by April 15th with a title and a tentative abstract describing their approach. Participants are advised to register as soon as possible in order to receive timely access to evaluation resources, including development and testing data. Registration for the task does not commit you to participation, but it helps the organizers with planning. All participants who submit system runs are welcome to present their system at the BIRNDL Workshop in the poster session, while the best performing system will be invited to present their paper in the main session. Dissemination of CL-SciSumm work and results other than in the workshop proceedings is welcomed, but the conditions of participation specifically preclude any advertising claims based on these results. Any questions about conference participation may be sent to the organizers mentioned below.
Workshop proceedings will be deposited online in the CEUR workshop proceedings publication service (ISSN 1613-0073) and on the ACL Anthology. This way the proceedings will be permanently available and citable (digital persistent identifiers and long-term preservation).
is an Advanced Computer Scientist, Machine Learning at SRI International's Artificial Intelligence Center. Previously he was a Ph.D. student at NUS School of Computing. He is broadly interested in natural language processing, machine learning and their applications to information retrieval; specifically, in retrieving and organising information from asynchronous conversation media such as scholarly publications and discussion forums. He has been co-organizing the CL-SciSumm Shared Task series and the BIRNDL workshop series since 2014. He also reviews for ACL, EMNLP, NAACL and JCDL conferences. During his PhD he also spent time at the Allen Institute for Artificial Intelligence's Semantic Scholar research and National Institute of Informatics, Tokyo.
Philipp Mayr is a deputy department head and a team leader at the GESIS -- Leibniz-Institute for the Social Sciences department Knowledge Technologies for the Social Sciences (WTS). He was a visiting professor for knowledge representation at the University of Applied Sciences in Darmstadt, Department of Information Science and Engineering, from 2009 to 2011. Philipp Mayr received his PhD in applied informetrics and information retrieval from the Berlin School of Library and Information Science at Humboldt University Berlin in 2009. To date, he has been awarded substantial research funding (PI, Co-PI) from national and European funding agencies. Philipp Mayr has published in top conferences and prestigious journals in the areas of informetrics, information retrieval and digital libraries. His research group focuses on methods and techniques for interactive information retrieval. Philipp Mayr was the main organizer of the Combining Bibliometrics and Information Retrieval workshop at ISSI 2013, the BIR workshops at ECIR 2014, 2015 and 2016 and the first BIRNDL workshop at JCDL 2016.
Dayne Freitag is the director of the Advanced Analytics group in SRI's Artificial Intelligence Center. His research seeks to apply artificial intelligence to information assimilation, management and exploitation. Freitag has served as principal investigator for a number of research projects including several large, multi-institutional efforts. His research goals have focused on the automation of data science; the automatic extension of mechanistic models through machine reading; knowledge federation over diverse information sources through data analytics and natural language processing; explaining the spread of ideas through online communities; and novel approaches to institutional knowledge management using controlled English. Freitag holds a B.A. in English literature from Reed College, and a Ph.D. in computer science from Carnegie Mellon University.
I am an Associate Professor at School of Computing, Singapore. My research interests fall under the areas of digital libraries, natural language processing, information retrieval and human-computer interaction. Specifically, they include document structure acquisition, verb analysis, digital library resource annotation and applied text summarization. My research goal is to investigate how natural language processing and information retrieval can be applied to improve scholarly publication and knowledge discovery.
The main organizers will be supported by our previous co-organizers:
The following committee members support the workshop series and will form our reviewer pool: