Advancing Data-Driven Drug Discovery

Letian Kuai, head of WuXi Biology, WuXi AppTec.

Kuai introduced his organization’s biology department, which has 3,000 scientists at nine sites on three continents. The department offers end-to-end service solutions from early discovery to preclinical candidate (PCC) and R&D, with a platform comprising early discovery, lead optimization, and in vivo pharmacology. A center of excellence supports oncology, immunology, and new modalities, including oligonucleotide therapeutics.

Kuai focused his presentation on WuXi AppTec’s early discovery platform, to which the team brings a combined 15 years of experience in research-scale protein production, including crystallography. Its elements include a comprehensive screening platform, a world-leading DNA-encoded library (DEL) screening platform, and an affinity selection mass spectrometry (ASMS) platform for rapid and flexible screening of soluble protein targets and compound libraries. The team has expertise in early-phase biophysical, biochemical, and cellular assays. He detailed how the group leverages the big data generated from those screens and how it is working to improve the results.

He highlighted uses and applications of the AlphaFold protein structure database (developed by DeepMind and EMBL-EBI), which includes 200 million protein structures. He and his team are using the database to improve stability of protein expression in ways that improve on homologous results achieved through traditional crystallography models. In addition to predicting the structures of difficult proteins, the database can also help elucidate small-molecule protein interactions.

In a short case study, he noted how his group predicted a structure and deposited the manuscript in bioRxiv. Not only was the resulting cryo-EM structure almost an exact match, but two salt bridges were identified that had not previously been seen.

But Kuai noted that AlphaFold is known to struggle with large multidomain and flexible proteins and does not address posttranslational modifications, protein dynamics, or conformational changes. AlphaFold models are also built without the context of ligand binding, such as binding to a small molecule, DNA, or other cofactors.

To address some of these shortcomings, Kuai introduced WuXi’s DEL, a collection of billions of compounds made using combinatorial chemistry. Each compound is tethered to a unique DNA bar code to facilitate deep sequencing. Because the data are collected by the same scientists in the same laboratory on the same day (and in the same tube), they lend themselves to a big-data mindset. The combinatorial nature of DEL also reveals hidden structure-activity relationship (SAR) information, providing a holistic view beyond individual chemical “hits.” For example, by applying machine learning to all potential binders in a selection data set, a team can build a generative model that can be applied in other chemical spaces. Ideally, such a generative model can identify compounds in a commercial compound collection without requiring synthesis. If that is not yet feasible, the model can indicate how to modify those compounds as an initial step.
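
The way combinatorial DEL data can expose SAR beyond individual hits can be illustrated with a small sketch. It is not WuXi’s method: the enrichment formula, the pseudocount, the function names, and the toy read counts are all assumptions made for illustration. Each compound is represented as a tuple of building blocks, its enrichment is the ratio of its normalized read frequency in the target selection to that in a no-target control, and averaging enrichment over every compound that shares a building block gives a per-building-block profile.

```python
from collections import defaultdict


def enrichment(selected, control, pseudocount=1.0):
    """Ratio of smoothed read frequency in the selection vs. the control.

    `selected` and `control` map a compound (tuple of building blocks)
    to its sequencing read count. The pseudocount is an assumed
    smoothing choice that avoids division by zero for unseen compounds.
    """
    compounds = set(selected) | set(control)
    n = len(compounds)
    sel_total = sum(selected.values()) + pseudocount * n
    ctl_total = sum(control.values()) + pseudocount * n
    scores = {}
    for c in compounds:
        sel_freq = (selected.get(c, 0) + pseudocount) / sel_total
        ctl_freq = (control.get(c, 0) + pseudocount) / ctl_total
        scores[c] = sel_freq / ctl_freq
    return scores


def building_block_profile(scores, position):
    """Mean enrichment of all compounds sharing a building block at `position`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for compound, score in scores.items():
        block = compound[position]
        sums[block] += score
        counts[block] += 1
    return {block: sums[block] / counts[block] for block in sums}


# Hypothetical toy library: two synthesis cycles, two building blocks each.
selected = {("A1", "B1"): 90, ("A1", "B2"): 80, ("A2", "B1"): 5, ("A2", "B2"): 5}
control = {("A1", "B1"): 25, ("A1", "B2"): 25, ("A2", "B1"): 25, ("A2", "B2"): 25}

scores = enrichment(selected, control)
cycle_one = building_block_profile(scores, position=0)
cycle_two = building_block_profile(scores, position=1)
```

In this toy data set, building block A1 is enriched regardless of its cycle-two partner, while neither B1 nor B2 stands out. That is the kind of pattern a single top hit would not reveal, and such profiles could serve as features for a downstream machine learning model.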

Kuai offered an example of combining DEL data with a machine learning model to identify compounds that lie outside the known intellectual property (IP) space, and of docking those potential hits into the structure predicted by AlphaFold. Because the large data set provided by DEL can serve as an alternative descriptor of a protein, the company is working to build an even larger data set. Currently, it has collected data on 2,000 protein targets. The vision is to apply all the available tools to a target and then combine the resulting data with machine learning to improve the success of target identification.

The final message from the WuXi Biology team is that these screening methods not only help identify individual chemical matter but also feed back into the science, providing much more information for selecting a target than was previously possible. The chemical space related to a target then constitutes that target’s fingerprint, enabling identification of off-target effects and target class issues.
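
One way to picture a target fingerprint built from chemical space is as the set of compounds enriched against that target, with overlap between two sets hinting at possible off-target liability. The sketch below is an assumption-laden illustration, not the presenter’s implementation: the Tanimoto (Jaccard) overlap measure, the 0.3 threshold, and the target and compound names are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) overlap between two sets of enriched compound IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_off_targets(query, fingerprints, threshold=0.3):
    """Return (target, similarity) pairs whose fingerprint overlaps the query's.

    `fingerprints` maps a target name to the set of compounds enriched
    against it. The threshold is an arbitrary illustrative cutoff.
    """
    hits = []
    for target, compounds in fingerprints.items():
        if target == query:
            continue
        similarity = tanimoto(fingerprints[query], compounds)
        if similarity >= threshold:
            hits.append((target, similarity))
    # Most similar targets first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints for three targets.
fingerprints = {
    "target_A": {"c1", "c2", "c3", "c4"},
    "target_B": {"c3", "c4", "c5"},
    "target_C": {"c9"},
}

flagged = likely_off_targets("target_A", fingerprints)
```

Here target_B shares two of five distinct compounds with target_A and is flagged, while target_C shares none. A real analysis would more plausibly compare structural fingerprints of the enriched compounds rather than exact compound identities.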
