Researchers at the US Lawrence Berkeley National Laboratory have used machine learning to reveal new scientific knowledge hidden in old research papers. Artificial intelligence makes it possible to link separate scientific papers and draw new discoveries from them.
Most scientific knowledge exists only in the form of articles, and thus as text, which makes any large-scale analysis difficult. As a result, many potential discoveries are missed simply because no human being has been able to link two separate findings. A team of researchers from the Lawrence Berkeley National Laboratory in the United States has published an article in the journal Nature detailing the use of artificial intelligence to address this problem.
The researchers used Word2vec, a machine learning algorithm that relies on neural networks. It analyzed the abstracts of 3.3 million articles dealing with materials science and generated a vocabulary of about 500,000 words. The algorithm mapped out the relationships between the different words, each of which is represented as a vector.
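To give a rough idea of what "words as vectors" means, here is a minimal sketch in Python. It does not use Word2vec itself: it swaps in simple co-occurrence counts as a stand-in, and the three toy abstracts, the function names and the window size are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy "abstracts"; the real corpus held 3.3 million of them.
abstracts = [
    "bi2te3 is a thermoelectric material with low thermal conductivity",
    "pbte is a thermoelectric material with high figure of merit",
    "licoo2 is a cathode material for lithium batteries",
]

def cooccurrence_vectors(texts, window=3):
    """Map each word to a sparse vector counting its neighbours within `window`.

    This is a simple stand-in for Word2vec, not the algorithm itself.
    """
    vectors = defaultdict(Counter)
    for text in texts:
        tokens = text.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = cooccurrence_vectors(abstracts)
sim_same_field = cosine(vectors["bi2te3"], vectors["pbte"])
sim_other_field = cosine(vectors["bi2te3"], vectors["licoo2"])
print(sim_same_field, sim_other_field)
```

In this toy setting, the two compounds described in a thermoelectric context end up closer to each other than to the battery compound, purely because of the words that surround them.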
Materials predicted years before their actual discovery
This vectorization allowed the artificial intelligence to capture the structure of the periodic table of elements, as well as the relationship between the structure and the properties of materials, without any prior knowledge. The researchers were able to draw up a list of materials and select the ten that, according to the AI, had the highest probability of being associated with the term “thermoelectric”, even though no article explicitly makes the link. By comparing them against different materials databases, they were able to conclude that all ten had an estimated potential above the average of known thermoelectric materials.
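The ranking step can be pictured as sorting candidate materials by how close their vectors lie to the vector for “thermoelectric”. The sketch below assumes cosine similarity as the closeness measure. The three-dimensional vectors, the placeholder material names and the `rank_by_similarity` helper are invented for illustration and are not the researchers' data or code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(target, candidates, embeddings, already_linked=(), top_n=10):
    """Return the `top_n` candidates closest to `target`.

    Candidates in `already_linked` (materials the literature already
    associates with the target word) are skipped.
    """
    scored = [
        (name, cosine(embeddings[name], embeddings[target]))
        for name in candidates
        if name not in already_linked
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Made-up vectors and placeholder names, for illustration only.
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "material_A": [0.8, 0.2, 0.1],
    "material_B": [0.1, 0.9, 0.3],
    "material_C": [0.7, 0.0, 0.4],
}

top = rank_by_similarity(
    "thermoelectric",
    ["material_A", "material_B", "material_C"],
    embeddings,
    top_n=2,
)
for name, score in top:
    print(name, round(score, 3))
```

Materials already linked to the target word in the literature can be passed in `already_linked`, so that only unreported associations remain in the ranking.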
To check the validity of their approach, the researchers used the artificial intelligence to predict past discoveries. They removed the recent articles and retrained their model on 18 different text corpora, each time limiting the texts to those published before a cutoff year between 2001 and 2018. Each time, the algorithm returned the five materials considered most promising for a thermoelectric application according to associations made in the literature.
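The historical test amounts to building one corpus per cutoff year and keeping only what was published before it. A minimal sketch, assuming each article is a `(year, text)` pair; the sample articles and the `corpora_by_cutoff` helper are hypothetical:

```python
def corpora_by_cutoff(articles, first_year=2001, last_year=2018):
    """Build one corpus per cutoff year, keeping only earlier publications."""
    return {
        cutoff: [text for year, text in articles if year < cutoff]
        for cutoff in range(first_year, last_year + 1)
    }

# Hypothetical (publication year, abstract) pairs.
articles = [
    (1999, "abstract published in 1999"),
    (2005, "abstract published in 2005"),
    (2012, "abstract published in 2012"),
]

corpora = corpora_by_cutoff(articles)
print(len(corpora))        # one corpus per cutoff year from 2001 to 2018
print(len(corpora[2009]))  # articles available to a model "frozen" before 2009
```

A separate model would then be trained on each corpus, and its top-ranked materials compared with what the literature reported in later years.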
They were able to predict the discovery of CuGaTe2, one of the best modern thermoelectric materials, four years before its first publication in 2012. Of the four other materials highlighted by the AI on the basis of articles published before 2009, two were only suggested in the literature 8 or 9 years after the articles studied, while the other two have never been tested.
A method applicable in all areas of research
The researchers fed the artificial intelligence only the texts, without giving it any prior information about materials science. This means the method could very easily be used in other areas of research, bringing some discoveries forward by several years or even enabling new ones. According to Vahe Tshitoyan, one of the researchers, “We could use it for medical research or drug discovery. The information exists. We just did not make the link because you cannot read all the articles.”