Data sharing in the age of AI-driven drug discovery

04 July 2019


by Krishna Vallurupalli Consulting, Pharmaceutical and life sciences

Email +44 (0)7809 777483

The recently announced MELLODDY (Machine Learning Ledger Orchestration for Drug Discovery) consortium marks the first time major pharmaceutical companies across Europe have joined hands to share their chemical libraries in the interest of accelerating drug discovery. The initiative aims to use ‘federated learning’ and a blockchain architecture to train AI models across multi-partner datasets with full traceability. This collaborative model helps boost the AI engine’s predictive power while protecting each partner’s intellectual property from competitors.

AI-driven drug discovery is rapidly becoming mainstream, and the pharmaceutical industry is betting on emerging technologies to revolutionise science (see our last health perspectives blog). Data and data collection techniques are the foundation for AI. Typically, more data of better quality leads to better AI results. However, a pharma company’s in-house datasets are limited by nature: they only reflect what is happening within that company’s own discovery labs. It therefore makes sense for companies to combine their own data with their competitors’ and run a shared, external AI engine to generate better insights. In reality, this is easier said than done given the competitiveness of the drug market, where companies do everything they can to protect their intellectual property. This is why projects such as MELLODDY are essential: they allow competitors to share data for the collective benefit.

Operating alone is no longer an option for pharma companies. Although most big pharma companies collaborate with AI startups and tech companies, data remains the bottleneck in training AI engines that deliver quality outputs for better drug design.

Data sharing in the following ways may be the best solution for this AI bottleneck:

Data format standardisation: A standard data format is beneficial because converting between different formats takes additional resources and adds to the cost of data sharing. One example is the UDM (Unified Data Model). The UDM project aims to deliver an open and freely available data format for the storage and exchange of experimental data on compound synthesis and biological testing. Making the data machine-readable by default will greatly benefit AI implementation in drug research.

New ways of building privacy into AI to embrace data sharing: One major barrier to pharma companies sharing data with competitors is trust: they are reluctant to let an external AI use their proprietary datasets. However, as new AI techniques emerge, building privacy into AI learning is becoming a reality. One approach, federated learning, achieves this by keeping each partner’s data on its own systems and sending only limited, aggregated model updates to a shared global model. Another approach is virtual research environments that rely on privacy-preserving techniques (such as homomorphic encryption) to enable privacy-aware collaborative analytics.

Embracing a data sharing mindset: Pharma is a heavily regulated and competitive sector, so its aversion to data sharing is understandable. However, with increasing pricing pressures and cost-cutting measures, it makes more sense for companies to collaborate and actively share data for the collective benefit: shortening the product development lifecycle and reducing the time it takes to bring a new drug to market. More data sharing platforms should therefore be encouraged, and mutually beneficial data sharing agreements can be established using trusted frameworks among industry, research and technology partners.
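To make the federated learning idea mentioned above more concrete, here is a minimal sketch of federated averaging in Python. It is an illustration only and not MELLODDY's actual method: the three "partners", their synthetic data, the tiny linear model and every function name (local_update, federated_average, make_partner_data) are assumptions made for this example. The point it shows is that each partner trains on its own private data and only the resulting model parameters are sent to be averaged into the global model.

```python
import random


def local_update(weights, data, lr=0.1, epochs=20):
    """One partner refines the model y = w*x + b on its own private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error over this partner's data only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Combine partner models, weighting each by its dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_partner_data(n, seed):
    """Hypothetical private dataset: noisy samples of y = 3x + 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)))
    return data


# Three partners with datasets of different sizes; raw data never leaves them.
partners = [make_partner_data(n, seed) for n, seed in [(40, 1), (60, 2), (100, 3)]]
sizes = [len(data) for data in partners]

global_weights = (0.0, 0.0)
for _ in range(30):  # communication rounds
    # Each partner starts from the current global model and trains locally.
    updates = [local_update(global_weights, data) for data in partners]
    # Only the model parameters are shared and averaged.
    global_weights = federated_average(updates, sizes)

print(global_weights)  # should end up close to (3.0, 1.0)
```

In this simplified form the parameters themselves are still visible to whoever does the averaging. A production system would typically add further protections on top, such as secure aggregation or encryption of the updates.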

AI is the future of drug discovery and pharma groups should collaborate to harness the power of large datasets to help solve global health challenges. 
