Use cases show benefits of unlocking data to improve signal and adverse event detection.
Advances in technology and analytics hold great promise for all aspects of drug discovery and development, and drug safety is no exception. Applying new artificial intelligence (AI) and machine learning (ML) technologies to drug safety at every stage promises to improve operational efficiency and free up resources for value-adding safety activities, such as benefit-risk evaluation and real-world evidence analysis.
At last year’s World Drug Safety Congress in Amsterdam, many industry leaders discussed these technologies, and there was clearly widespread acceptance of the need for AI and ML to help automate safety processes. Applications ranged from early discovery safety to tasks much further downstream, such as the semi-automated creation of individual or aggregate safety reports. The general view at the meeting was that AI/ML is needed and that, although there is still plenty of hype, real and significant advances are being made.
Natural language processing (NLP) is an AI/ML technology that is well established in many areas of pharma discovery and development and is increasingly used in safety. The core value of NLP is its ability to handle unstructured text, which matters because about 80% of scientific, clinical, and medical data is widely estimated to be unstructured. In particular, NLP can bring efficiencies to three key areas of the safety lifecycle: preclinical target liability and toxicology; safety assessment, signal evaluation, and medical review; and safety case processing for clinical and post-market pharmacovigilance.
For the first two areas, preclinical drug safety and safety assessment, access to the landscape of public and internal safety intelligence is hugely valuable. Many readily available sources of information could help pharmaceutical companies improve drug safety, but key safety-related data is often buried in them as unstructured text, which makes important insights difficult to unlock.
Sources such as scientific papers, drug labels, regulatory reviews, and internal toxicology or safety study reports hold numerous insights. These enable pharmaceutical companies to develop a deeper understanding of the safety landscape around any drug, adverse event (AE), or drug target, and with effective access to the right information, companies can enhance safety assessment and risk management. Even so, most pharmaceutical companies still rely on traditional evaluation methods, manually mining literature and other text-based databases for content and context around targets, drugs, adverse reactions, and potential safety signals.
Unfortunately, manual search and review of safety literature is often slow and repetitive: an abstract can take minutes to review, and a full article sometimes takes hours. Much of this work involves manually collating data and transferring it from one format to another, which can drive up costs and introduce data quality issues through human error. None of it advances pharmaceutical organizations’ business objective of bringing safe products to market quickly and keeping them there.
Safety case processing is likewise often an intensely manual task. Potential adverse events are reported to pharma organizations during clinical trials and throughout a drug’s post-market lifecycle. Reports can arrive via adverse event forms, website submissions, emails, and call center feeds, and they can contain both structured data and unstructured text that must be reviewed manually to find the key information (drug, patient, adverse event, reporter), along with additional context to determine causality, severity, and more. The key information must also be coded for database entry, using standards such as MedDRA for the adverse event. For post-market surveillance, a large pharma organization may need to process thousands of cases per day or per week, covering portfolios of hundreds of products across many countries.
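The four key elements named above lend themselves to a simple completeness check before a case moves on to coding. The following is a minimal Python sketch, assuming each incoming report has already been reduced to the hypothetical `CaseReport` record below. The field names and the rule that all four elements must be present are illustrative assumptions, not a description of any particular safety database.

```python
from dataclasses import dataclass
from typing import List, Optional

# The four key elements a case needs before it can be coded (per the text above).
KEY_FIELDS = ("drug", "patient", "adverse_event", "reporter")


@dataclass
class CaseReport:
    """Hypothetical structured view of one incoming report after text extraction."""
    drug: Optional[str] = None
    patient: Optional[str] = None
    adverse_event: Optional[str] = None
    reporter: Optional[str] = None


def missing_elements(case: CaseReport) -> List[str]:
    """Return the key fields that are absent or blank, in a fixed order."""
    return [
        name for name in KEY_FIELDS
        if not (getattr(case, name) or "").strip()
    ]


def ready_for_coding(case: CaseReport) -> bool:
    """Treat a case as complete only when all four key elements were found."""
    return not missing_elements(case)


# Usage: a report from an email that never identifies who sent it.
email_case = CaseReport(drug="Drug X", patient="female, 54", adverse_event="rash")
print(missing_elements(email_case))  # → ['reporter']
print(ready_for_coding(email_case))  # → False
```

Incomplete cases can then be routed for follow-up rather than silently entered with gaps.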
Time and cost savings through automation
To address these problems, many pharma companies have turned to NLP within their safety processes. NLP can extract and standardize safety data for downstream use, such as curation within a safety database; analysis and visualization for a deeper understanding of safety mechanisms; ingestion into data warehouses and data pipelines; or use within statistical or predictive models. For drug safety, NLP is particularly valuable because it can automatically recognize and code AEs and other relevant data (drug, severity, mechanism of action) in free text even when the wording does not exactly match a standard term, significantly reducing effort and improving quality.
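A minimal sketch of how free-text AE mentions might be mapped to a standard term even when the wording is inexact: longest-match dictionary lookup, with fuzzy string matching from Python's standard `difflib` module as a fallback. The `SYNONYMS` dictionary, the 0.85 similarity cutoff, and the function name are hypothetical. Real systems rely on licensed terminologies such as MedDRA and on much richer linguistic processing than this.

```python
import difflib
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy synonym dictionary (surface form -> preferred term). Illustrative only:
# a production system would draw on a licensed terminology such as MedDRA.
SYNONYMS: Dict[str, str] = {
    "heart attack": "Myocardial infarction",
    "myocardial infarction": "Myocardial infarction",
    "headache": "Headache",
    "head ache": "Headache",
    "feeling sick": "Nausea",
    "nausea": "Nausea",
    "skin rash": "Rash",
    "rash": "Rash",
}

# Group dictionary entries by word count so fuzzy matching compares like with like.
KEYS_BY_LEN: Dict[int, List[str]] = defaultdict(list)
for key in SYNONYMS:
    KEYS_BY_LEN[len(key.split())].append(key)


def code_adverse_events(text: str, max_words: int = 3,
                        cutoff: float = 0.85) -> List[Tuple[str, str]]:
    """Return (matched phrase, preferred term) pairs, preferring the longest match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    results: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            term = SYNONYMS.get(phrase)
            if term is None and len(phrase) >= 5:
                # Tolerate misspellings such as "atack" via string similarity.
                close = difflib.get_close_matches(
                    phrase, KEYS_BY_LEN.get(n, []), n=1, cutoff=cutoff)
                if close:
                    term = SYNONYMS[close[0]]
            if term is not None:
                results.append((phrase, term))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token


    return results


print(code_adverse_events("Patient reported a heart atack and a bad head ache."))
# → [('heart atack', 'Myocardial infarction'), ('head ache', 'Headache')]
```

Comparing only dictionary entries with the same word count is a deliberate choice: it stops a three-word span such as "a heart atack" from being accepted as a fuzzy match for the two-word entry "heart attack". The cutoff trades recall for precision and would need tuning on real data.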
Use case: NLP text mining to identify safety events during a clinical trial
A top 50 biopharmaceutical company that already used NLP for many applications, from bench to bedside, decided to apply it to safety case processing during a clinical trial of a novel, first-in-class cancer therapeutic. The company wanted to better support clinical safety teams in identifying and understanding serious adverse events (SAEs) among the trial participants.
During clinical trials, investigators use SAE forms to report any potential AEs. Each form contains a large amount of valuable data, including adverse event information, date of onset, MedDRA terms describing the event, lab tests, concomitant medications, and medical history. In traditional processes, these forms are either scanned image files or PDF documents, with much of the critical data held as unstructured text. As a result, significant manual effort is needed to extract the key data elements that clinical safety teams must assess.
The clinical team wanted to better understand the AE landscape across the patient population by efficiently mining the available safety information. In particular, they wanted to explore whether they could characterize groups of patients who might need closer follow-up or special attention.
To do this, the team needed to extract and analyze information effectively from the clinical SAE documents. Rapidly identifying AEs in individual patients or patient groups is important, and determining whether each AE was anticipated and whether it was serious is critical. For example, the team needed to understand whether an AE was caused by the study drug, by the disease, or by other prescribed medications. To make that decision, they needed to see all available information, such as the incidence of an AE across patients. They also wanted to see which AEs were ongoing at a given time point and how they changed over time. These insights are extremely useful and can even help prevent life-threatening events when timely, close follow-up is needed.
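Once AEs have been extracted into structured records, two of the questions above reduce to simple computations: how often each AE occurs across patients, and which AEs are ongoing at a given time point. The following is a minimal sketch, assuming a hypothetical `AdverseEvent` record with an onset date and an optional resolution date. The patient IDs, terms, and dates in the usage lines are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class AdverseEvent:
    """Hypothetical record for one extracted AE occurrence."""
    patient_id: str
    term: str
    onset: date
    resolved: Optional[date] = None  # None means the AE is still ongoing


def incidence(events: List[AdverseEvent], n_patients: int) -> Dict[str, float]:
    """Fraction of enrolled patients with at least one occurrence of each AE term."""
    patients_by_term: Dict[str, Set[str]] = defaultdict(set)
    for ev in events:
        patients_by_term[ev.term].add(ev.patient_id)
    return {term: len(ids) / n_patients for term, ids in patients_by_term.items()}


def active_on(events: List[AdverseEvent], day: date) -> List[Tuple[str, str]]:
    """(patient, term) pairs for AEs that had started and not yet resolved on `day`."""
    return sorted(
        (ev.patient_id, ev.term)
        for ev in events
        if ev.onset <= day and (ev.resolved is None or ev.resolved >= day)
    )


# Usage with made-up data for a four-patient cohort.
events = [
    AdverseEvent("P01", "Neutropenia", date(2021, 3, 1), date(2021, 3, 10)),
    AdverseEvent("P02", "Neutropenia", date(2021, 3, 5)),
    AdverseEvent("P02", "Nausea", date(2021, 3, 2), date(2021, 3, 3)),
    AdverseEvent("P03", "Rash", date(2021, 3, 8)),
]
print(incidence(events, n_patients=4))
# → {'Neutropenia': 0.5, 'Nausea': 0.25, 'Rash': 0.25}
print(active_on(events, date(2021, 3, 6)))
# → [('P01', 'Neutropenia'), ('P02', 'Neutropenia')]
```

Calling `active_on` for successive days gives a simple view of how the set of ongoing AEs changes over time.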
To address these challenges, the team developed a workflow to process the SAE forms and used NLP to extract all relevant patient data for internal clinical use. The technology recognizes negations, synonyms for diseases and medications that appear in the medical history section, dates in different formats, and measurement units such as drug doses. Steps were taken throughout to protect patient privacy: the NLP technology was deployed to the customer, and the work was completed by their own organization, with no patient data exposed to outside teams.
After implementation, the technology shortened turnaround times and gave the clinical team rapid access to critical patient alerts. Extracting information from a five-page SAE form had previously required manual effort that could take several hours, whereas NLP delivered results to the clinical team within an hour. Using NLP, the team was able to find and structure the critical AE data and visualize it in network graphs that highlighted clusters of at-risk patients, supporting better patient safety.
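Three of the normalization tasks mentioned above can be sketched in a few lines of Python:

- a NegEx-style check for negation cues in a short window before a term,
- date parsing against a list of candidate layouts, and
- dose extraction with conversion to milligrams.

This is a simplified illustration, not the technology the team used. The cue list, the date layouts (including the assumption that slash-separated dates are day-first), and the unit table are all assumptions.

```python
import re
from datetime import date, datetime
from typing import List, Optional

# Assumed negation cues in the spirit of NegEx; real systems use far larger lists.
NEGATION_CUES = ("no", "not", "denies", "denied", "without", "negative for")
# Assumed layouts found on forms; slash dates are assumed to be day-first.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d %b %Y", "%d-%b-%Y")
# Conversion factors to milligrams for a few dose units.
TO_MG = {"g": 1000.0, "mg": 1.0, "mcg": 0.001}
DOSE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mcg|mg|g)\b", re.IGNORECASE)


def is_negated(sentence: str, term: str, window: int = 5) -> bool:
    """True if a negation cue occurs within `window` words before `term`."""
    text = sentence.lower()
    idx = text.find(term.lower())
    if idx < 0:
        return False
    preceding = re.findall(r"[a-z]+", text[:idx])[-window:]
    padded = " " + " ".join(preceding) + " "
    return any(f" {cue} " in padded for cue in NEGATION_CUES)


def parse_date(raw: str) -> Optional[date]:
    """Try each assumed layout in turn; return None if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def extract_doses_mg(text: str) -> List[float]:
    """Find mentions such as '75 mg' or '0.5 g' and normalize them to milligrams."""
    return [float(value) * TO_MG[unit.lower()] for value, unit in DOSE_RE.findall(text)]


# Usage on fragments of the kind found in a medical history section.
print(is_negated("History of hypertension, no diabetes.", "diabetes"))      # → True
print(is_negated("History of hypertension, no diabetes.", "hypertension"))  # → False
print(parse_date("05/03/2021"), parse_date("2021-03-05"))  # → 2021-03-05 2021-03-05
print(extract_doses_mg("Cisplatin 75 mg on day 1, ondansetron 8mg, then 0.5 g"))
# → [75.0, 8.0, 500.0]
```

The negation window here does not stop at punctuation or scope-terminating words, so a phrase like "no diabetes, hypertension" would wrongly negate the second term. Fuller NegEx-style approaches add such termination rules.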
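One simple way to turn extracted AE data into a patient network is to link two patients when they share at least a minimum number of AE terms, then read off the connected components as clusters. The sketch below does this with a small union-find in plain Python. The `min_shared` threshold, the function name, and the example data are hypothetical, and the source does not say how the team built its graphs.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def patient_clusters(ae_by_patient: Dict[str, Set[str]], min_shared: int = 2) -> List[List[str]]:
    """Group patients linked by sharing at least `min_shared` AE terms (connected components)."""
    parent = {p: p for p in ae_by_patient}

    def find(p: str) -> str:
        # Follow parent links to the component root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Add an edge (merge components) for every sufficiently similar pair of patients.
    for a, b in combinations(sorted(ae_by_patient), 2):
        if len(ae_by_patient[a] & ae_by_patient[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = defaultdict(list)
    for p in sorted(ae_by_patient):
        groups[find(p)].append(p)
    # Singletons are not clusters; report larger groups first.
    return sorted((g for g in groups.values() if len(g) > 1), key=lambda g: (-len(g), g))


# Usage with made-up data.
ae_by_patient = {
    "P01": {"Neutropenia", "Fever", "Fatigue"},
    "P02": {"Neutropenia", "Fever"},
    "P03": {"Rash"},
    "P04": {"Fever", "Fatigue"},
    "P05": {"Rash", "Pruritus"},
}
print(patient_clusters(ae_by_patient))  # → [['P01', 'P02', 'P04']]
```

The resulting groups, or the pairwise edges behind them, could then be passed to any graph visualization tool. Raising `min_shared` yields tighter, smaller clusters.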
Use case: NLP to enable predictive safety
At the other end of the drug lifecycle, NLP is used in early discovery for predictive safety. In one example, a top 10 pharma company wanted to improve access to safety intelligence for its discovery scientists. Specifically, it wanted to mine scientific and clinical literature for data on target-safety links across a broad range of diseases. The challenge is that assessing target safety requires many data sources and data types, including tissue expression, assay data, and phenotype data, as well as associations with proteins, drugs, and diseases. Target safety leads may spend weeks researching potential safety issues around a new target, and compound safety profile teams search for adverse event mentions that relate to particular preclinical species, ideally in known tissues or organs. Manual search and review are laborious and hard to keep up to date.
The company used NLP to build a workflow that ran a set of queries automatically and fed the output to its internal knowledge base every week. A final workflow transformed the data into a suite of easy-to-use visualizations, including a “toxico-matrix” of the target-safety literature landscape. This let users access the data by target, by preclinical or clinical setting, and by organ class, with links to the underlying evidence. The NLP-driven safety analysis was of significantly higher quality than standard keyword searches, with good precision and recall. This more systematic approach to risk prediction gave safety leads and researchers a single, comprehensive overview of potential liabilities for targets and chemicals within their therapeutic area of interest.
These use cases come from just two of the many pharmaceutical companies that have implemented NLP to mine unstructured text, better understand safety signals, and improve adverse event detection. By helping unlock key data in unstructured text from safety reports and documents, NLP enables pharma companies to accelerate and improve drug safety processes while reducing their costs.
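A toxico-matrix of the kind described above can be represented as a sparse table keyed by (target, organ class) and filtered by preclinical or clinical setting. Each cell holds links to the supporting documents. The sketch below assumes the NLP queries have already produced hypothetical `Finding` records. It also shows how precision and recall could be computed against a manually curated gold set. Names, fields, and data are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Finding(NamedTuple):
    """Hypothetical NLP-extracted target-safety assertion."""
    target: str
    organ_class: str
    setting: str  # "preclinical" or "clinical"
    doc_id: str   # identifier linking back to the supporting evidence


def toxico_matrix(findings: List[Finding], setting: str) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (target, organ class) cell to its supporting document IDs for one setting."""
    cells: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for f in findings:
        if f.setting == setting:
            cells[(f.target, f.organ_class)].add(f.doc_id)
    return dict(cells)


def precision_recall(predicted: Set[Tuple[str, str]],
                     gold: Set[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision = TP / predicted, recall = TP / gold, against a curated gold set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Usage with made-up findings.
findings = [
    Finding("TargetA", "Liver", "preclinical", "doc1"),
    Finding("TargetA", "Liver", "preclinical", "doc2"),
    Finding("TargetA", "Heart", "clinical", "doc3"),
    Finding("TargetB", "Kidney", "preclinical", "doc4"),
]
matrix = toxico_matrix(findings, "preclinical")
print({cell: len(docs) for cell, docs in matrix.items()})
# → {('TargetA', 'Liver'): 2, ('TargetB', 'Kidney'): 1}

predicted = {("TargetA", "Liver"), ("TargetA", "Heart"), ("TargetB", "Kidney")}
gold = {("TargetA", "Liver"), ("TargetB", "Kidney"), ("TargetB", "Heart")}
print(tuple(round(x, 2) for x in precision_recall(predicted, gold)))  # → (0.67, 0.67)
```

Keeping document IDs in each cell, rather than just counts, is what makes it possible to link every cell of the matrix back to its evidence.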