
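IBM and NASA have jointly built a suite of transformer models designed specifically for scientific literature, covering tasks such as classification and entity extraction. The models are now open source and are intended to help the research and academic communities push the limits of information processing.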
IBM and NASA have developed a suite of transformer-based language models trained specifically on scientific literature. The models target natural language understanding tasks such as classification, entity extraction, question answering, and information retrieval. They have been open-sourced on Hugging Face, making them accessible to the broader scientific and academic communities.
The IBM-NASA models were trained on a corpus of 60 billion tokens spanning astrophysics, planetary science, earth science, heliophysics, and the biological and physical sciences. Training on this domain-specific data is meant to give the models a firm grasp of scientific terminology.
IBM reports that the IBM-NASA models outperform general-purpose alternatives across a range of benchmarks.
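The models are built on the transformer architecture, which has become the de facto standard for natural language processing (NLP) tasks. Its core mechanism is self-attention, which lets the representation of each token draw on every other token in the input sequence.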
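Self-attention, the core operation inside a transformer layer, can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration of generic single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, not code taken from the IBM-NASA models. For simplicity it assumes the token vectors are used directly as queries, keys and values; a real transformer first passes them through learned projection matrices, uses several heads in parallel, and runs on optimized tensor libraries.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention without masking: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    output = []
    for q_row in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # The output row is the attention-weighted average of the value rows.
        out_row = [
            sum(w * v_row[j] for w, v_row in zip(weights, v))
            for j in range(len(v[0]))
        ]
        output.append(out_row)
    return output


# Self-attention: queries, keys and values all come from the same toy token vectors
# (an assumption for illustration; real models apply learned projections first).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = scaled_dot_product_attention(tokens, tokens, tokens)
for row in out:
    print([round(x, 3) for x in row])
```

Each output row is a weighted mix of all the input rows, which is how context from the whole sequence flows into every token's representation.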

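These specialized language models have practical applications across scientific text processing, including document classification, entity extraction, question answering, and information retrieval over the research literature.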
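For information retrieval, a common pattern with encoder models is to embed the query and each document as vectors and rank documents by cosine similarity. The sketch below assumes the embeddings already exist and shows only the ranking step; the document names and vectors are hypothetical toy values, not output from the IBM-NASA models, and a production system would typically use an approximate nearest-neighbor index rather than a linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat zero vectors as unrelated to everything.
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    # Score every document against the query, then keep the top_k best matches.
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would obtain these from an encoder model.
docs = {
    "solar_wind_paper": [0.9, 0.1, 0.0],
    "exoplanet_survey": [0.1, 0.8, 0.3],
    "ocean_salinity_study": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]

for doc_id, score in rank_documents(query, docs, top_k=2):
    print(f"{doc_id}: {score:.3f}")
```

Cosine similarity ignores vector length, so documents are compared by the direction of their embeddings rather than their magnitude.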
By open-sourcing these models on Hugging Face, IBM and NASA aim to foster collaboration and innovation in the scientific community. Researchers and developers can build on them, for instance by fine-tuning them for NLP applications in specific scientific domains.
The IBM-NASA collaboration marks a significant step in the development of specialized language models for scientific literature. By IBM's account, the models outperform general-purpose alternatives on scientific text, and releasing them as open source means the community can adopt them widely and refine them further.
Source: IBM Research Blog
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
20 March 2024