Building APIs to Leverage Knowledge Graphs

: Academia, NGO
: 2022

Digital libraries are an indispensable source of information in the modern era. However, given advances in technology and the rapid rate at which information is created, it is increasingly important to equip digital libraries with intelligent solutions that help readers navigate the constantly growing and evolving body of scholarly publications. To that end, we aim to facilitate the labeling of publications in the IEEE Xplore Digital Library. We developed a proof-of-concept, end-to-end pipeline based on Natural Language Processing that dynamically identifies validated concepts and associates them with academic papers. We tested a few popular methodologies for both tasks, concept extraction and tagging, on a representative sample of the IEEE scholarly data, and built the pipeline based on those results. The resulting pipeline takes scholarly publication text as input, extracts validated topics from standard external references, and tags each paper with the topics that have the highest relevance scores. To extract validated topics, it integrates the IEEE internal concept repository with popular external standard libraries to create a stand-alone set of valid concepts. To tag papers, it generates transformer-based embeddings of the text and assigns the topics with high relevance, where relevance is measured as the cosine similarity between the paper and topic embeddings produced by the in-house transformer-based language model. The pipeline performs well at identifying niche tags but needs improvement at tagging generic topics.
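The two steps described above, building a set of valid concepts and tagging by embedding similarity, can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the `top_k` and `min_score` parameters, the use of a plain normalized union to merge concept sources, and the toy three-dimensional vectors are all assumptions made for illustration. In the actual pipeline the vectors would come from the in-house transformer-based language model.

```python
import math


def build_concept_set(internal_concepts, external_concepts):
    """Merge two concept lists into one de-duplicated set.

    Assumption: a case-insensitive union is used here for illustration;
    the source does not say how the repositories are integrated.
    """
    merged = set()
    for concept in list(internal_concepts) + list(external_concepts):
        normalized = concept.strip().lower()
        if normalized:
            merged.add(normalized)
    return merged


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def tag_paper(paper_vec, topic_vecs, top_k=2, min_score=0.5):
    """Return up to top_k (topic, score) pairs, highest score first.

    top_k and min_score are hypothetical tuning knobs.
    """
    scored = [
        (topic, cosine_similarity(paper_vec, vec))
        for topic, vec in topic_vecs.items()
    ]
    relevant = [pair for pair in scored if pair[1] >= min_score]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:top_k]


# Toy vectors standing in for transformer embeddings (made-up values).
topic_vecs = {
    "knowledge graphs": [0.9, 0.1, 0.0],
    "natural language processing": [0.7, 0.7, 0.1],
    "power electronics": [0.0, 0.1, 0.9],
}
paper_vec = [0.8, 0.3, 0.05]

concepts = build_concept_set(["Knowledge Graphs "], ["knowledge graphs", "NLP"])
tags = tag_paper(paper_vec, topic_vecs)
for topic, score in tags:
    print(f"{topic}: {score:.3f}")
```

With these toy vectors the paper is tagged with the two topics whose vectors point in nearly the same direction as its own, while the unrelated topic falls below the threshold. A fixed threshold of this kind is only one possible design choice for turning similarity scores into tags.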