Online Magazine

How AlphaFold helps drug discovery

Artificial intelligence (AI) can play a significant role in drug discovery. While this idea is not new, AlphaFold, the algorithm from Google’s DeepMind, has completely changed the game. Let’s look at how exactly AI helps the pharma industry find new drugs, and what makes AlphaFold so revolutionary.

by Dzmitry Ashkinadze

Computer-aided drug discovery (CADD) has gained a significant foothold in the pharma industry in recent years. According to research, the CADD market is expected to reach $7.5 billion in 2030, growing at a compound annual growth rate (CAGR) of 11.48%. CADD is popular because it uses state-of-the-art artificial intelligence (AI) to speed up drug discovery and shorten a drug’s time to market (TTM).

Today, developing a single drug approved by the Food and Drug Administration (FDA) costs $2.6 billion on average. In 2021 alone, the FDA approved 50 new drugs, with a total development cost of approximately $130 billion. Successful drug development is expensive, but the major cost factor is failed drug development: 9 out of 10 drug candidates currently fail to win approval. Lowering failure rates and better predicting which drugs will be effective is therefore a high priority, and CADD can help with both.
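The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers quoted above; the function names are illustrative, and the "candidates per approval" helper assumes a deliberately simple model in which every candidate has the same chance of approval.

```python
# Figures quoted in the article; everything else is derived from them.
COST_PER_APPROVED_DRUG_BN = 2.6  # average cost per FDA-approved drug, $ billion
APPROVALS_2021 = 50              # new drugs approved by the FDA in 2021
FAILURE_RATE = 0.9               # 9 out of 10 candidates fail to win approval


def total_cost_bn(approvals: int, cost_per_drug_bn: float) -> float:
    """Total development cost for a number of approved drugs, in $ billion."""
    return approvals * cost_per_drug_bn


def candidates_per_approval(failure_rate: float) -> float:
    """Average number of candidates needed for one approval.

    Simplifying assumption: every candidate fails independently with the
    same probability.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


if __name__ == "__main__":
    total = total_cost_bn(APPROVALS_2021, COST_PER_APPROVED_DRUG_BN)
    print(f"Total cost for 2021 approvals: ${total:.0f} billion")
    for rate in (FAILURE_RATE, 0.8):
        needed = candidates_per_approval(rate)
        print(f"Failure rate {rate:.0%}: about {needed:.0f} candidates per approval")
```

Under this simple model, lowering the failure rate from 90% to 80% would halve the number of candidates needed per approved drug, which is why better prediction matters so much.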

How to discover a drug

But how does this highly promising way of discovering new drugs work? Let me first show you the 2 main steps of a drug discovery process, using the discovery of Raltitrexed as an example. Raltitrexed is a chemotherapy drug that works by deactivating its “target”, a vulnerable point through which a drug can attack the disease. Raltitrexed’s target is the protein thymidylate synthase, an enzyme that cells need to produce DNA. This is how researchers found the substance that could deactivate this protein:

  1. Find the target: In the first step, the researchers determined the structure, or shape, of the target (the protein) and identified its biologically active site, i.e., the exact part of the protein structure that the drug should attack (see Figure 1).
  2. Find a drug that can deactivate the target: In the second step, a series of potential drug molecules was designed and tested experimentally. Of all the tested molecules, Raltitrexed was the most potent, so it was passed on to clinical trials. Generally, a drug deactivates its target by binding to it like a matching Lego brick and blocking its activity, so that the target protein can no longer fulfil its purpose (see Figure 1).


Figure 1. The active target, i.e., the protein (red), can be blocked and deactivated by a drug that binds to its biologically active site.

What role does AI play in this?

As part of Step 1, the structure of a novel target protein and its active site must be determined. Structure determination is typically done with experimental methods, a long and expensive manual process that involves no AI. Alternatively, the protein structure can be predicted computationally using AI. For the last 50 years, scientists have attempted to predict a protein’s structure from its genetic code, but the results were too inaccurate to be usable for CADD. This situation changed in 2018, when Google’s DeepMind introduced the protein folding algorithm AlphaFold.

How AlphaFold changed the game

AlphaFold was first presented at the Critical Assessment of Protein Structure Prediction (CASP) competition in 2018, where the accuracy of its protein structure predictions outperformed all other competing protein folding algorithms. Accuracy is measured using the so-called global distance test (GDT), which shows how close a predicted protein structure is to the ground truth obtained with experimental methods (GDT=100 means a perfect match). While none of the other algorithms had been able to surpass a median GDT score of about 40 on the hardest targets for many years, the second version of AlphaFold reached a median GDT of 87 on such targets at CASP 2020. Thus, AlphaFold set a new standard for protein structure modeling using advanced AI (see Figure 2).
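To make the GDT score more tangible, here is a minimal sketch of how such a score can be computed. It assumes the commonly used GDT_TS variant, which averages the share of residues lying within 1, 2, 4 and 8 Å of their reference position. It also assumes that the two structures are already superimposed and given as one 3D point per residue, whereas the real procedure searches over many superpositions. The function name and the toy coordinates are made up for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

# Distance cutoffs in angstroms used by the GDT_TS variant.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted: List[Point], reference: List[Point]) -> float:
    """Simplified GDT_TS for two pre-superimposed structures.

    For each cutoff, compute the fraction of residues whose predicted
    position lies within that distance of the reference position, then
    average the four fractions and scale to 0..100.
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


if __name__ == "__main__":
    # Toy example: four residues, predicted positions off by 0.5, 1.5, 3 and 10 A.
    reference = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
    predicted = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (11.0, 0.0, 0.0), (22.0, 0.0, 0.0)]
    print(gdt_ts(predicted, reference))  # 56.25
    print(gdt_ts(reference, reference))  # 100.0
```

A perfect prediction scores 100, and a single badly misplaced residue lowers the score only in proportion to its share of the protein, which makes the measure robust against local errors.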

Today, AlphaFold predictions are approaching the quality of experimentally determined protein structures (GDT=100) and can be used for CADD. This makes AlphaFold a breakthrough technology for drug discovery and for science in general. The scientific journal Nature Methods even named protein structure prediction its “Method of the Year 2021”.

Figure 2. Left: AlphaFold and its advanced AI technology outperformed other competing protein folding algorithms in the CASP protein folding competition and set a new standard. Right: Examples of predicted (blue) vs. actual (green) protein structures. Source: DeepMind


So, what exactly separates AlphaFold from its competition? Like other approaches, AlphaFold relies heavily on structural and genetic databases that map known protein structures to their genetic codes in order to find the target’s structure (Step 1). Structures of proteins with a similar genetic code, or even with similar portions of it, provide valuable information for protein folding.
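The idea of looking up known structures with a similar code can be illustrated with a toy example. The sketch below ranks entries of a small, entirely made-up database by their similarity to a query sequence. It uses Python's generic `difflib.SequenceMatcher` as a stand-in for the specialized sequence alignment tools used in practice; the sequences and entry names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical toy database: made-up entry names mapped to made-up sequences.
KNOWN_STRUCTURES: Dict[str, str] = {
    "entry_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "entry_B": "MKTAYLAKQRQLSFVKAHFSRQ",
    "entry_C": "GSHMLEDPVWWNNCCPPQQTTA",
}


def rank_by_similarity(query: str, database: Dict[str, str]) -> List[Tuple[float, str]]:
    """Return (similarity, entry name) pairs, most similar entry first.

    Similarity is SequenceMatcher's ratio in [0, 1], a generic string
    similarity and not a biological alignment score.
    """
    scored = [
        (SequenceMatcher(None, query, sequence, autojunk=False).ratio(), name)
        for name, sequence in database.items()
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQ"  # identical to entry_A in this toy setup
    for similarity, name in rank_by_similarity(query, KNOWN_STRUCTURES):
        print(f"{name}: {similarity:.2f}")
```

The structures of the best-ranked entries would then serve as hints for the shape of the query protein.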

In contrast to other protein folding algorithms, AlphaFold uses a novel approach called the Evoformer, which ensures that all collected information is consistent and prioritizes it using a machine learning (ML) attention mechanism (see Figure 3). This key feature differentiates AlphaFold from the competition.
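How an attention mechanism prioritizes information can be shown with a small, generic example. The sketch below implements plain scaled dot-product attention for a single query in pure Python. It is not the Evoformer itself, only the basic building block that attention architectures share; all vectors are made-up toy values.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention for one query.

    Each key is compared with the query; the more similar it is, the more
    weight its value receives in the output.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    output = [
        sum(weight * value[i] for weight, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]      # three pieces of information
    values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
    weights, output = attend(query, keys, values)
    print([round(w, 3) for w in weights])
    print([round(x, 3) for x in output])
```

The key that points in the same direction as the query receives the largest weight, so its value dominates the output. This is the sense in which attention "prioritizes" gathered information.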

Figure 3. Schematic of the AlphaFold pipeline, which predicts the 3D structure of a protein (right) from its genetic code (left). The key differentiator of AlphaFold is its Evoformer block, which is built on an attention architecture.
Source: Jumper, John, et al. "Highly accurate protein structure prediction with AlphaFold." Nature 596.7873 (2021): 583-589, https://www.nature.com/articles/s41586-021-03819-2.


Finally, once the structure and its biologically active site are determined, it is time to come up with a selection of potential drugs, as described in Step 2. Typically, this is done using docking methods that score how well various known molecules from a database of active molecules fit the protein. The most promising drug candidates are selected and tested experimentally.

These methods mostly assume that the drug molecule and the protein keep the same shape before and after the drug binds to the protein’s active site (see Figure 1). Without this simplifying assumption, it would be hard to predict the interactions between a drug and its target. However, the drug design community may be in for a surprise that brings CADD another step forward: in November 2021, Google’s DeepMind announced the creation of Isomorphic Labs, which will build on the success of AlphaFold and try to further improve the CADD pipeline using advanced AI.
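The docking idea described above, including its rigid-shape assumption, can be sketched with a toy scoring function. The example treats the active site and each candidate molecule as fixed sets of 3D points, rewards close contacts and penalizes overlaps. The thresholds, the penalty, the coordinates and the molecule names are all made up for illustration; real docking programs use far more detailed scoring and also search over many positions of the molecule.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

# Hypothetical thresholds in angstroms, chosen only for this illustration.
CLASH_BELOW = 3.0     # closer than this counts as an overlap
CONTACT_BELOW = 5.0   # closer than this (but no clash) counts as a contact
CLASH_PENALTY = 5.0


def score_pose(molecule: List[Point], pocket: List[Point]) -> float:
    """Score a rigid molecule placed in a rigid pocket; higher is better."""
    score = 0.0
    for atom in molecule:
        for pocket_atom in pocket:
            distance = math.dist(atom, pocket_atom)
            if distance < CLASH_BELOW:
                score -= CLASH_PENALTY
            elif distance < CONTACT_BELOW:
                score += 1.0
    return score


def rank_candidates(candidates: Dict[str, List[Point]], pocket: List[Point]) -> List[str]:
    """Return candidate names ordered from best to worst score."""
    return sorted(candidates, key=lambda name: score_pose(candidates[name], pocket), reverse=True)


if __name__ == "__main__":
    pocket = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    candidates = {
        "mol_fits": [(2.0, 2.0, 3.0)],      # close to all pocket atoms, no overlap
        "mol_clashes": [(1.0, 0.0, 0.0)],   # overlaps with one pocket atom
        "mol_far": [(20.0, 20.0, 20.0)],    # does not touch the pocket at all
    }
    print(rank_candidates(candidates, pocket))
```

The candidates at the top of such a ranking correspond to the "most promising" molecules that would be passed on to experimental testing.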


Computer-aided drug discovery (CADD) has the potential to lower the drug failure rate because it improves the prediction of effective drugs. Researchers have tried for some time to predict protein structures and use them for CADD, but Google’s AlphaFold has taken AI-driven protein structure prediction to a new level: while other structure prediction algorithms had long failed to surpass a GDT of about 40, AlphaFold, first presented at the CASP competition in 2018, went on to reach a GDT of 87 at CASP 2020.

With the creation of Isomorphic Labs, which was announced in November 2021, Google plans on further improving the CADD pipeline. I am excited to see what AI technology they will use to model the highly complex protein-drug system.
