Paving the way for personalized medicine — Deep Learning and Pharmacogenomics

You hear about breakthroughs in artificial intelligence (AI) all the time, whether that’s beating five professional poker players in a game of Texas Hold’em, Waymo’s self-driving taxi service that’s quickly coming to production, or even finding faster ways to detect breast cancer.

The field of AI is growing exponentially, with discoveries like these happening frequently. Progress in healthcare, by comparison, is less frequent. Recently, though, a new field called pharmacogenomics has emerged, which studies “how genes affect a person’s response to drugs”.

Just like how AI intersects with many fields, pharmacogenomics is the intersection between “pharmacology (the science of drugs) and genomics (the study of genes and functions)”. This new field aims to solve some of the challenges in developing effective and safe personalized medications.

So many intersections!!

But wait … we can also combine AI with pharmacogenomics to help speed the process up!

Recently, researchers at the University of Michigan published a paper exploring the intersection of deep learning (a discipline in AI) with pharmacogenomics. The three main areas that stood out where AI can be used are the following:

1. For patient stratification (subgrouping patient data before sampling) from medical records

2. Prediction of drug response and interactions with the human body

3. Toxicity prediction of certain chemicals/drugs

Before we start…

Let’s get a basic overview of what AI (specifically deep learning) is and understand the current limitations that are holding us back.

The architecture for a simple neural network

Deep learning attempts to replicate how the brain learns and recognizes things, through neural networks. Just like how the brain has neurons that fire off signals to other neurons, a neural network takes on a similar architecture! Chaining together thousands or millions of neurons allows us to replicate this ‘learning’ process, and this can be done through supervised (meaning we give it labeled examples), unsupervised (it learns to find patterns in the data on its own, without labels), or semi-supervised techniques.
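To make the "chained neurons" idea a bit more concrete, here’s a minimal sketch of a forward pass through a tiny network in plain Python. The layer sizes, the random weights, and the helper names (dense, forward, random_layer) are all assumptions made up for illustration; a real model would learn its weights from data and would normally be built with a deep learning library.

```python
import math
import random

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: each neuron computes a weighted sum
    of its inputs plus a bias, then applies the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through each layer in turn (the 'chain' of neurons)."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

def random_layer(n_in, n_out, rng):
    """Hypothetical helper: an untrained layer with random weights."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# 3 inputs -> 4 hidden neurons -> 1 output neuron
network = [random_layer(3, 4, rng), random_layer(4, 1, rng)]
output = forward([0.5, -1.2, 3.0], network)
print(output)  # a single value between 0 and 1
```

Training would then mean adjusting those weights so the output matches known examples (the supervised case), which this sketch leaves out.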

Specifically, deep learning has been used in the healthcare sector to improve data modeling/analytics, make predictions for rare diseases, and support drug discovery.

The limitations of AI in healthcare:

Public healthcare data is very limited, and the open-source datasets that do exist often require a lot of cleaning and preprocessing, which makes it very difficult for most people to take advantage of machine learning. A typical machine learning algorithm requires a lot of training data in order to make accurate predictions. With limited data, however, techniques like feature selection can help.
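As a rough illustration of what feature selection can look like, here’s a small sketch that drops columns whose values barely change across patients (a variance threshold). This is just one simple form of feature selection, not the method from the paper, and the column names and patient rows are made up for the example.

```python
from statistics import pvariance

def select_by_variance(rows, feature_names, threshold):
    """Keep only the features whose variance across all rows is above
    the threshold. A column that never changes carries no signal."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in kept] for row in rows]
    return [feature_names[i] for i in kept], reduced

# Hypothetical toy data: every patient has has_record == 1,
# so that column cannot help tell patients apart.
names = ["age", "has_record", "dose_mg"]
rows = [
    [34, 1, 10.0],
    [51, 1, 20.0],
    [47, 1, 15.0],
    [29, 1, 10.0],
]

kept_names, reduced = select_by_variance(rows, names, threshold=0.0)
print(kept_names)
print(reduced[0])
```

With fewer, more informative columns, a model has less to learn from the same small number of patients.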

In our particular case, drug discovery, the process isn’t well optimized. Because our goal is to create personalized medicine, it requires “careful experimental design”.

However, there are state-of-the-art machine learning algorithms such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), stacked auto-encoders (SAEs), and reinforcement learning approaches that can help tackle the complexity behind modeling biological systems.

To summarize, the limitations of machine learning at the moment are:

1. Still need human supervision and guidance

2. Need to carefully choose the right features

3. Need to preprocess/clean the data

4. Need to perform dimensionality reduction to achieve top performance

The impact that pharmacogenomics can have:

Because pharmacogenomics can be specialized for populations, specific communities, or individuals, there is the potential to impact other disciplines like genomics, pharmacology, oncology, psychiatry, neurology, and cardiology, and how we use this in practice.

Pharmacogenomics aims to optimize medication on an individual level, given a specific genotype. It also aims to reduce the time and cost of drug discovery and development (this process takes 12 years and over $2.5 billion!)

What’s currently being done:

Patient stratification:

The idea of patient stratification is to be able to cluster subgroups of data within a larger dataset, to determine which patient data to use. The goal is to help reduce the time spent on choosing the correct data.

Diagram of clustering patient data

This process is complicated because it involves fusing biomedical, demographic, and socio-metric data to categorize the patients. The problem is the number of variables that we have to deal with, which will require lots of feature analysis and extraction (to make sure we pick the right data that we need).
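To give a feel for the clustering idea, here’s a minimal k-means sketch in plain Python. K-means is a stand-in chosen for simplicity, not the technique used in the paper, and the two features per patient (imagine a scaled age and a scaled biomarker level) are made-up values. Seeding the centroids with the first k points is also an assumption to keep the example repeatable.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly assigning each point
    to its nearest centroid and then moving each centroid to the mean
    of the points assigned to it."""
    # Assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical patients described by two scaled features each.
patients = [
    (0.10, 0.20), (0.90, 0.80), (0.15, 0.25),
    (0.85, 0.90), (0.20, 0.10), (0.95, 0.85),
]
labels, centroids = kmeans(patients, k=2)
print(labels)  # one subgroup index per patient
```

Real patient data has far more than two variables, which is exactly why the feature analysis mentioned above matters before clustering.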

Deep learning has the potential to learn useful data representations that can help with treatments or predictions. We want to be able to design models that are capable of finding patterns that are sparse and complex.

One current solution is Deep Patient, which uses SAEs, a semi-supervised technique, to predict “final diagnosis, patient risk level, and outcome (e.g. mortality, re-admission)”.

(Researchers are also looking into other semi-supervised techniques like Generative Adversarial Networks (GANs), which could possibly help with understanding this complex data.)

Drug discovery and development:

The approach that has been used for a while focuses on what happens after proteins are synthesized by ribosomes (called post-translational modifications). Hundreds of these proteins have been identified, and they later become larger and more complex proteins. Some of these larger proteins have demonstrated the potential to be druggable targets.

However, we can use deep learning approaches to help speed up the search for the right protein candidate. The goal would be to test a compound/drug by simulating it in a virtual human system.

The biggest challenge holding this field back is that bioinformatics is under-funded.

Toxicity prediction of certain chemicals/drugs:

The push for new toxicity prediction methods led to a challenge called the Tox21 Data Challenge. Given an input of 12,000 different chemicals and drugs, a deep learning model has to predict the outcome for 12 different toxic effects.

The deep learning model DeepTox achieved the highest performance in this challenge and demonstrated the benefits of using a multi-task network over a single-task one.
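To show what "multi-task" means in practice, here’s a small sketch of a network with one shared hidden layer and 12 output heads, one per toxic effect. This is not DeepTox’s actual architecture; the layer sizes, the random (untrained) weights, the 8-bit "fingerprint" input, and the helper names are all assumptions for illustration.

```python
import math
import random

N_TASKS = 12  # one output per toxic effect in the challenge

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def linear(inputs, weights, biases):
    """Weighted sum plus bias for every neuron in a layer."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def multitask_forward(features, shared, heads):
    """The shared layer is computed once and reused by every task head,
    so all 12 predictions draw on the same learned representation."""
    shared_w, shared_b = shared
    hidden = [relu(h) for h in linear(features, shared_w, shared_b)]
    head_w, head_b = heads
    return [sigmoid(z) for z in linear(hidden, head_w, head_b)]

def random_params(n_in, n_out, rng):
    """Hypothetical helper: untrained weights, just to run the sketch."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(42)
shared = random_params(8, 16, rng)        # 8 input features -> 16 hidden units
heads = random_params(16, N_TASKS, rng)   # 16 hidden units -> 12 task outputs
fingerprint = [rng.choice([0.0, 1.0]) for _ in range(8)]  # made-up chemical features
probabilities = multitask_forward(fingerprint, shared, heads)
print(len(probabilities))  # one toxicity score per task
```

A single-task setup would instead train 12 separate networks, each with its own hidden layer, so nothing learned for one toxic effect could help with another.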

Models like DeepTox used a similar architecture to the DeepAOT family, which comprises methods like regression, multi-classification, and multi-task networks.

What’s really interesting about DeepAOT and DeepTox is that they’re not just limited to detecting oral toxicity, but can also detect toxicity induced in more complex systems.

The future with AI:

In the next 5–10 years, it’s projected that more patient-specific data will be released, which will allow developers to train more accurate and powerful models.

Using machine learning, pharmaceutical companies could possibly save over $300 billion annually. It’s also predicted that the market for deep learning and drug discovery will be around $1.25 billion by 2024.

However, we need to consider how to make high-quality, labeled datasets shareable. By doing so, we can create a community where developers can quickly fine-tune the hyperparameters for models and apply transfer learning to adapt to new data.


  • Limitations of machine learning applications in pharmacogenomics include easy overfitting, the need for lots of feature analysis/selection, and high computational cost
  • The use of AI in pharmacogenomics has the potential to help uncover patterns in patient data (specifically EHR data)
  • AI can also help simulate and test drug candidates in a virtual environment
  • It can also be used in toxicology to predict the toxicity of certain chemicals/drugs


If you enjoyed my article or learned something new, please make sure to:

  • Check out my website!
  • Connect with me through LinkedIn
  • Send some feedback (I am always open to suggestions!)

18 y/o | Currently reading on infra/devops | Twitter @wlaw_
