Whether we take it for granted or not, deep learning algorithms have become an inseparable part of our daily lives. Personalized feeds, face and voice recognition, web search, smart speakers, digital assistants, email, and many other applications that we can’t part ways with use deep learning algorithms under the hood.
But how effective is deep learning in scientific research, where problems are often much more complex than classifying an image and requirements are much more sensitive than recommending what to buy next?
To answer this question, former Google CEO Eric Schmidt and Google AI researcher Maithra Raghu have put together a comprehensive guide on the different deep learning techniques and their application to scientific research.
“The amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity,” the authors write, adding that along with advances in machine learning, this rich corpus of data can provide “many exciting opportunities for deep learning applications in scientific settings.”
Titled “A Survey of Deep Learning for Scientific Discovery,” their guide provides a very accessible overview of deep learning and neural networks for scientists who aren’t necessarily versed in the complex language of artificial intelligence algorithms.
I strongly recommend reading the entire 48-page document and visiting many of its references. But here are some key takeaways.
You don’t necessarily need to do deep learning
With deep learning being all the rage, it’s easy to be tempted to apply it to anything and everything. After all, the basic proposition is very attractive: It’s an end-to-end AI model that takes a bunch of data, develops a mathematical representation, and performs complex classification and prediction tasks.
Deep neural networks can tackle problems previously solved by other types of machine learning algorithms, such as content recommendation or fraud detection. They can also handle problems that were traditionally difficult to handle with other machine learning techniques, including complex computer vision and natural language processing (NLP) tasks.
However, Schmidt and Raghu warn, when formulating a problem, it is important to consider whether deep learning provides the right set of tools to solve it. “In many settings, deep learning may not be the best technique to start with or best suited to the problem,” they write.
For many problems, simpler machine learning algorithms provide more efficient solutions. For instance, if you want to identify which of a set of chemical characteristics of different substances are most relevant, you might be better off using “dimensionality reduction,” a technique that can find the features that contribute most to outcomes.
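The survey doesn’t prescribe a particular dimensionality-reduction method; principal component analysis (PCA) is one common choice, and the sketch below assumes it. It uses only NumPy, and the data is synthetic and hypothetical: 200 substances with five measured characteristics, two of which move together. Note that PCA ranks directions by how much of the data’s variance they explain; it does not look at any outcome variable.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns (projected data, components, explained-variance ratio of every component).
    """
    X_centered = X - X.mean(axis=0)            # PCA works on zero-mean features
    # SVD of the centered data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    variance = S**2 / (X.shape[0] - 1)         # variance captured by each direction
    ratio = variance / variance.sum()
    components = Vt[:n_components]
    return X_centered @ components.T, components, ratio

# Hypothetical data: 200 substances x 5 characteristics. Features 0 and 1
# are driven by one hidden factor; features 2-4 are small independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent * 3.0,
    latent * 2.0 + rng.normal(scale=0.1, size=(200, 1)),
    rng.normal(scale=0.1, size=(200, 3)),
])

Z, components, ratio = pca(X, n_components=2)
print(ratio.round(3))                   # the first component dominates
print(np.abs(components[0]).round(2))   # its loadings sit on features 0 and 1
```

Features with large loadings on the first few components are the ones driving most of the variation; in this toy case, those are features 0 and 1.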
Similarly, if you have limited data or if your data is neatly arranged in a tabular format, you might want to try a regression model before turning to neural networks. Neural networks usually (but not always) need lots of data, and they are difficult to interpret. In contrast, linear and logistic regression models can provide more accurate results when data is scarce, especially if the problem is linear in nature. Regression models also provide a clear mathematical equation whose coefficients explain the relevance of each feature in the dataset.
Deep learning for image-related scientific tasks
Scientists are using deep learning algorithms to detect signs of COVID-19 infection in chest x-rays of patients (source: COVID-Net)
One area where deep learning algorithms have been very effective is the processing of visual data. The authors describe convolutional neural networks as “the most well-known family of neural networks” and “very useful in working with any kind of image data.”
Aside from their commercial and industrial applications, CNNs have found their way into many scientific domains. One of the best-known applications of convolutional neural networks is medical image analysis. There are already many deep learning algorithms that examine CT scans and x-rays and help in the diagnosis of diseases such as cancer. Recently, scientists have been using CNNs to find signs of the novel coronavirus in chest x-rays.
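The core operation inside a convolutional layer is sliding a small kernel over the image and taking a weighted sum at every position. Here is a minimal NumPy sketch, assuming a single-channel image, no padding, stride 1, and a hand-picked edge-detecting kernel; a real CNN learns its kernels during training and stacks many such layers.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN layers compute in practice."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # The usual nonlinearity applied after a convolution
    return np.maximum(x, 0.0)

# Toy 6x6 "image": dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge detector (illustrative; not a learned kernel)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = relu(conv2d(image, kernel))
print(feature_map)
```

Every row of the 4x4 feature map reads [0, 3, 3, 0]: the kernel responds only where the dark-to-bright edge falls inside its window.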
Some of the visual applications of deep learning are less well known. For instance, neuroscientists are experimenting with pose-detection neural networks to track the movements of animals and analyze their behavior.
NLP technology can expand to other fields
Another area that has benefitted immensely from advances in deep learning algorithms is natural language processing. Recurrent neural networks, long short-term memory (LSTM) networks, and Transformers have proven to be especially good at performing language-related tasks such as translation and question-answering.
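What lets a recurrent network handle text, or any other sequence, is that it reads one element at a time while carrying a hidden state forward. Below is a minimal sketch of a vanilla (Elman) RNN forward pass in NumPy, with random, untrained weights and made-up dimensions. LSTMs add gating on top of this same loop to help retain information over longer sequences.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence of input vectors.

    The same weights are reused at every step, so any sequence length works.
    Returns the final hidden state, a fixed-size summary of the whole sequence.
    """
    h = np.zeros(W_hh.shape[0])
    for x_t in inputs:
        # The new state mixes the current input with the previous state
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8          # illustrative sizes
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

short_seq = rng.normal(size=(3, input_dim))    # 3 tokens
long_seq = rng.normal(size=(10, input_dim))    # 10 tokens
print(rnn_forward(short_seq, W_xh, W_hh, b_h).shape)  # (8,)
print(rnn_forward(long_seq, W_xh, W_hh, b_h).shape)   # (8,)
```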
To be clear, the current AI algorithms process language in fundamentally different—and inferior—ways than the human brain. Even the largest neural network will fail at some of the simplest tasks that a human child with a very rudimentary understanding of language can perform.
This is because, like all other types of neural networks, RNNs and Transformers are, at their core, pattern-matching machines. They can find recurring patterns in sequences of data, whether text or any other kind of information. According to Schmidt and Raghu, these structures can be used in “Problems where the data has a sequential nature (with different sequences of varying length), and prediction problems such as determining the next sequence token, transforming one sequence to another, or determining sequence similarities are important tasks.”
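Transformers replace the step-by-step loop with attention, which scores how similar every position in a sequence is to every other position. A single-head, scaled dot-product self-attention sketch in NumPy follows; the random inputs stand in for embedded tokens and the sizes are arbitrary. A full Transformer adds multiple heads, positional information, and feed-forward layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) input."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every token to every other token
    weights = softmax(scores, axis=-1)   # each row is a distribution that sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))   # stand-in for embedded tokens
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape, weights.shape)   # (5, 8) (5, 5)
```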
While this pattern-matching approach has limits in dealing with the abstract and implied meanings of language, it has some very interesting applications in scientific research in areas such as genomics and proteomics, where sequential structures play an important role.
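Before a sequence model can read a protein, the sequence has to become numbers, and sequences of different lengths have to fit into one batch. The sketch below assumes a vocabulary of only the 20 standard one-letter amino-acid codes plus a padding token, and uses made-up peptide fragments; real protein models define their own vocabularies, usually with extra tokens for rare residues and special positions.

```python
import numpy as np

# The 20 standard amino acids (assumed vocabulary); ID 0 is reserved for padding
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = 0
TOKEN_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

def encode(seq):
    """Map a protein sequence string to integer token IDs.

    Non-standard letters (e.g. 'X') raise KeyError in this sketch.
    """
    return [TOKEN_ID[aa] for aa in seq.upper()]

def pad_batch(seqs):
    """Stack variable-length sequences into one array plus a mask of real positions."""
    encoded = [encode(s) for s in seqs]
    max_len = max(len(e) for e in encoded)
    batch = np.full((len(encoded), max_len), PAD, dtype=np.int64)
    mask = np.zeros((len(encoded), max_len), dtype=bool)
    for i, e in enumerate(encoded):
        batch[i, :len(e)] = e
        mask[i, :len(e)] = True
    return batch, mask

# Hypothetical short peptide fragments, for illustration only
batch, mask = pad_batch(["MKTAYIAK", "GSW", "MLSRAV"])
print(batch)
print(mask.sum(axis=1))  # true lengths: [8 3 6]
```

The mask marks which positions hold real residues, so a model can ignore the padding when it computes its outputs.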
Transformers have proven to be especially effective in scientific research. In one recent project, AI researchers used unsupervised learning to train a bidirectional Transformer on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. “The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge,” the researchers write. This is an important step toward understanding protein sequences and extracting general and transferable information about proteins from raw sequences.
What if you don’t have a lot of data?
One of the main criticisms against deep learning is its […]