Subjects: Condensed Matter > Materials Science (cond-mat.mtrl-sci)
[Submitted on 17 Sep 2022 (this version), latest version 15 Aug 2023 (v3)]
Title: ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data
Abstract: Natural language processing (NLP) has immense potential to aid materials design processes. While there have been several advancements in this field, a complete and integrated framework with well-curated datasets and tools for applying NLP is still needed. In this work, we present the ChemNLP library and an accompanying web app that can be used to analyze important materials chemistry information. We use the publicly available arXiv dataset, which has been collected over 34 years and contains ~1.8 million articles. First, we analyze the article publication trends, categorizations, and common phrases in the arXiv dataset. Then, we develop a user-friendly, interactive web app to retrieve articles for a given chemical compound. Furthermore, we demonstrate the effectiveness of the proposed framework in accelerating the identification of superconducting materials. We determine the overlap between density functional theory and text-based databases for superconductors. Finally, we perform machine-learning-based clustering and classification tasks to quickly categorize scholarly articles from their title information, with accuracy of up to 81.2%. ChemNLP is available at the websites: this https URL and this https URL.
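The compound-based article retrieval described in the abstract can be illustrated with a minimal sketch. This is not the ChemNLP API; it assumes that articles are matched by parsing formula-like words in their titles into reduced element compositions, so that MgB2, B2Mg and Mg2B4 all resolve to the same lookup key. The element subset, the helper names (`parse_formula`, `build_index`, `search`) and the sample titles are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from functools import reduce
from math import gcd

# Hypothetical: a small subset of element symbols, enough for this demo.
ELEMENTS = {
    "H", "B", "C", "N", "O", "F", "Mg", "Al", "Si", "P", "S", "Ca", "Ti",
    "Fe", "Cu", "Zn", "Ga", "As", "Se", "Sr", "Y", "Nb", "La", "Ba", "Hg", "Bi",
}
# One element symbol (uppercase + optional lowercase) with an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
# A whole word made only of such symbol/count tokens, e.g. "MgB2" or "GaAs".
FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(word):
    """Return a reduced composition key for `word`, or None if it is not a compound."""
    if not FORMULA.fullmatch(word):
        return None
    counts = Counter()
    for symbol, num in TOKEN.findall(word):
        if symbol not in ELEMENTS:
            return None
        counts[symbol] += int(num) if num else 1
    if len(counts) < 2:  # skip single-element words such as "As" or "In"
        return None
    # Divide by the GCD so formula-unit multiples (Mg2B4 vs MgB2) share one key.
    divisor = reduce(gcd, counts.values())
    return tuple(sorted((el, n // divisor) for el, n in counts.items()))


def build_index(titles):
    """Map each reduced composition to the set of title indices that mention it."""
    index = {}
    for i, title in enumerate(titles):
        for word in re.findall(r"[A-Za-z0-9]+", title):
            key = parse_formula(word)
            if key is not None:
                index.setdefault(key, set()).add(i)
    return index


def search(index, formula):
    """Return sorted indices of titles mentioning the same compound as `formula`."""
    key = parse_formula(formula)
    return sorted(index.get(key, set())) if key else []


# Illustrative titles (invented, not taken from the arXiv dataset).
titles = [
    "Superconductivity at 39 K in MgB2",
    "Phonon dispersion of B2Mg from first principles",
    "Electronic structure of GaAs nanowires",
    "Magnetism in Fe thin films",
]
index = build_index(titles)
print(search(index, "Mg2B4"))  # [0, 1]
```

Reducing by the greatest common divisor treats formula-unit multiples as the same compound, and the two-element minimum keeps ordinary words such as "As" or "In" out of the index, although acronyms built entirely from element symbols (e.g. "BCS") would still be indexed as compounds.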
Submission history
From: Kamal Choudhary
[v1] Sat, 17 Sep 2022 00:27:50 UTC (2,115 KB)
[v2] Wed, 29 Mar 2023 21:05:07 UTC (3,933 KB)
[v3] Tue, 15 Aug 2023 16:05:56 UTC (2,261 KB)