indica News Bureau-
Coronavirus disease 2019 (COVID-19), caused by a novel coronavirus, has become a pandemic over the last three months, spreading to more than a hundred locations internationally, including Asian, European and Gulf countries and the United States.
To help address the problem, researchers and leaders from leading AI organizations, including the Allen Institute for AI and Microsoft, have teamed up with a federal agency, the National Library of Medicine, to release the COVID-19 Open Research Dataset (CORD-19), a collection of scholarly literature about COVID-19, SARS-CoV-2 and other coronaviruses.
According to a GeekWire report, the COVID-19 Open Research Dataset, or CORD-19, was created in response to a request from the White House’s Office of Science and Technology Policy. It takes advantage of AI tools to organize more than 24,000 articles about the COVID-19 disease and the SARS-CoV-2 coronavirus that causes it.
CORD-19 has attracted 122,000 views and more than 3,000 downloads, and more than 70 researchers have contributed to the challenge. The most recent daily download count exceeded 2,000.
“We think that AI has an important part to play in solving this problem,” said Doug Raymond, general manager for the Semantic Scholar academic search engine at the Allen Institute for Artificial Intelligence, also known as AI2.
AI2’s CEO, Oren Etzioni, said his team leaped at the opportunity to participate in CORD-19. “We hesitated all of negative-two seconds,” he joked.
The dataset is publicly available to AI researchers on Kaggle’s website. Kaggle, an online community of data scientists and machine learning researchers, is well known in the AI world.
As part of the initiative, 10 high-level research questions have been posted on Kaggle, an online community for AI researchers that’s owned by Google’s cloud business. Among them: “What do we know about virus genetics, origin, and evolution?” “What do we know about COVID-19 risk factors?” And “What has been published about ethical and social science considerations?”
The dataset, which covers the coronavirus, the disease it causes, and the family of viruses it belongs to, will be updated as more articles become available.
The CORD-19 database was built on the foundation laid by Semantic Scholar, and is being housed on Semantic Scholar’s website.
“The core problem is information overload in research,” Raymond explained. “There are dozens of institutions that have published research on coronavirus. … Putting all the information together in a common format that is comprehensive is a huge challenge for researchers, and it’s a great application of our AI capabilities.”
For years, AI2’s researchers have been using tools such as machine learning and natural language processing to extract key features from research literature and help researchers find studies that are most relevant to problems they’re trying to solve.
In 2018, AI2 partnered with Microsoft to expand the scope of Semantic Scholar. The AI-enabled database now takes in more than 182 million research papers from all fields of science.
Microsoft is playing a similar role in curating the contents of CORD-19.
“It’s all hands on deck as we face the COVID-19 pandemic,” Eric Horvitz, chief scientific officer at Microsoft, said in a news release. “We need to come together as companies, governments and scientists, and work to bring our best technologies to bear across biomedicine, epidemiology, AI and other sciences.”
The National Library of Medicine at the National Institutes of Health facilitated access to 10,000 scholarly articles related to coronavirus. AI2 transformed all that content and more into machine-readable form, and created an adaptive feed that keeps users up to date on the research areas in which they’re most interested.
The Chan Zuckerberg Initiative and Georgetown University’s Center for Security and Emerging Technology also contributed to the effort.
CORD-19 will continue to be updated as new research about coronavirus is published on preprint servers and in peer-reviewed publications. Raymond pointed out that Semantic Scholar can also link academic research to clinical trial data, GitHub data archives and non-academic reports based on research.
Researchers making use of the database can share the data mining tools and insights they develop in response to the CORD-19 call to action via the Kaggle data science community.
“We’re putting this dataset up in front of our community of 4.3 million data scientists in the hope that the world’s AI community can help find answers to a key set of questions about COVID-19,” Anthony Goldbloom, Kaggle’s co-founder and CEO, said in a news release.
The key questions were formulated in coordination with infectious disease experts at the World Health Organization and a standing committee at the National Academies of Sciences, Engineering and Medicine.
Michael Kratsios, the White House’s chief technology officer, said decisive action in the scientific community will play a critical role in stopping the coronavirus outbreak. He called on the U.S. research community to make full use of CORD-19.
“The White House will continue to be a strong partner in this all-hands-on-deck approach,” Kratsios said. “We thank each institution for voluntarily lending its expertise and innovation to this collaborative effort.”