François Torche, Chief Product & Technology Officer & Co-Founder at CluePoints, dispels the myth that there is such a thing as too much data, at least in RBQM, and discusses how advanced data analytics platforms are the key to ensuring every data point has a chance to tell its story.
Data-driven risk-based quality management (RBQM) is a powerful, FDA-backed way to increase efficiency while protecting patient safety and data integrity during clinical trials. It enables drug developers to identify, investigate and, if necessary, rectify issues before they have an opportunity to threaten study success. It is an approach that thrives on data, meaning it only stands to become more effective as the power of artificial intelligence (AI) and the volume of information being collected both continue to increase.
Protecting Participant Safety and Data Integrity
RBQM uses data analytics to spot trends and outliers in clinical trial data, enabling teams to identify quality issues, investigate risk signals and rapidly take the action necessary to protect participant safety and data integrity. This process can be further enhanced using new technologies. For example, risk signals, which monitor and track resulting investigations and actions once a potential issue has been identified, often contain reams of free text, some of which may be incomplete or unreliable. This can make it difficult to identify those which are likely to represent a real issue for participant safety or data integrity.
By using a natural language processing algorithm, supported by a deep learning (DL) system, we can interrogate signal comments, mitigation rationales, and root cause rationales. The algorithm then flags those which lack required documentation or which have an unreliable root cause selected by the user. This helps sponsors determine if risk signals flagged by central monitoring are sufficiently documented ahead of regulatory submission. It also helps to build a database of documented issues, enabling prediction of which risk signals are likely to represent a real issue.
Deployment of a Root Cause Decision Support feature, which used natural language processing and DL, led to an overall reduction in unclear risk signals of around 25%. This increases the chances of meeting regulatory requirements and enables further optimization of issue detection through machine learning (ML).
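To make the idea of spotting outliers in this section more concrete, the following is a minimal sketch of how one site-level metric might be screened for atypical sites. It is an illustration under stated assumptions, not a description of any particular RBQM platform: the metric (adverse events reported per patient visit), the site names, the rates, and the threshold of 3.0 are all hypothetical, and a robust z-score based on the median and the median absolute deviation (MAD) is just one simple choice among many statistical tests.

```python
from statistics import median


def robust_site_scores(site_rates):
    """Return a robust z-score per site, using the median and the MAD.

    The median and MAD are used instead of the mean and standard deviation
    so that a single extreme site does not mask itself.
    """
    values = list(site_rates.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread at all: nothing can be called an outlier.
        return {site: 0.0 for site in site_rates}
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return {site: (rate - center) / scale for site, rate in site_rates.items()}


def flag_sites(site_rates, threshold=3.0):
    """Return the sites whose absolute robust z-score meets the threshold."""
    scores = robust_site_scores(site_rates)
    return sorted(site for site, z in scores.items() if abs(z) >= threshold)


# Hypothetical adverse-event rates per patient visit (illustrative only).
rates = {
    "site_A": 0.21,
    "site_B": 0.19,
    "site_C": 0.22,
    "site_D": 0.20,
    "site_E": 0.02,  # far lower than its peers: possible under-reporting
}
flagged = flag_sites(rates)
print(flagged)
```

In this toy data only site_E is flagged. A flag of this kind is a prompt for investigation, such as a risk signal raised for review, rather than proof of a problem.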
The Regulatory Landscape
There is now extensive regulatory grounding for RBQM. In 2023, the FDA said sponsors should implement a system to manage risks to participants and data integrity throughout all stages of the clinical investigation. In June 2024, the FDA extended a Cooperative Research and Development Agreement (CRADA) to assess data quality in multicenter clinical trials using statistical modeling and ML. The CRADA will also explore how monitoring platforms can be adapted to better support FDA processes related to anomaly detection, review, and follow-up, as well as site selection for inspections.
ICH E6(R3) ties together the concepts of quality by design (QbD) and RBQM. It highlights the need to identify the factors which are critical to ensuring trial quality and the risks which threaten the integrity of those factors.
Using AI to Power the Data Analytics Capabilities of RBQM
Artificial intelligence (AI) techniques like ML and DL are increasingly powering the data analytics capabilities of RBQM. We have already discussed how DL techniques can improve risk signal documentation and enable prediction of serious issues. However, this is just one way in which AI is being integrated with quality management.
In data anomaly detection, ML can significantly reduce the amount of noise being reviewed. There will always be a place for manual edit checks, particularly on critical data. However, edit checks, even ones which appear to have been written correctly and have been tested, often fire erroneously. This leaves data managers reviewing hundreds of pages of noise. Intelligent query detection using unsupervised ML allows us to take noise out of the equation and examine which edit checks need to be written. This allows clinical knowledge to accumulate in the algorithm, frees data managers to focus on critical issues, and makes safety issues easier to identify.
In medical coding, a DL model can remove the need for first-line medical coders and labor-intensive synonym maintenance. Instead, the model automatically suggests MedDRA and WHODrug dictionary codes for medical and drug terms reported either as part of a patient’s adverse events or concomitant medications. This model can provide researchers with the correct corresponding dictionary term with more than 90% accuracy in seconds.
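The shape of such a suggestion step can be sketched in a few lines. The example below is deliberately not a DL model: it swaps in plain fuzzy string matching from the Python standard library (`difflib.get_close_matches`) as a stand-in, simply to show a reported term going in and ranked dictionary candidates coming out. The dictionary entries and the codes `T001` to `T004` are hypothetical placeholders, not real MedDRA or WHODrug content, and `suggest_codes` is a name invented for this illustration.

```python
from difflib import get_close_matches

# Hypothetical toy dictionary: preferred term -> placeholder code.
# These are NOT real MedDRA or WHODrug entries.
TOY_DICTIONARY = {
    "headache": "T001",
    "nausea": "T002",
    "dizziness": "T003",
    "abdominal pain": "T004",
}


def suggest_codes(verbatim, dictionary=TOY_DICTIONARY, n=3, cutoff=0.6):
    """Suggest up to n (term, code) candidates for a reported verbatim term.

    Fuzzy string similarity is used here as a simple stand-in for the
    trained model described in the text.
    """
    term = verbatim.strip().lower()
    matches = get_close_matches(term, list(dictionary), n=n, cutoff=cutoff)
    return [(match, dictionary[match]) for match in matches]


# A misspelled adverse-event term as a site might report it.
suggestions = suggest_codes("Headach ")
print(suggestions)
```

A trained model would be expected to cope with synonyms and abbreviations that character-level matching cannot, which is where the saving on synonym maintenance would come from. In either case, the output is a suggestion for a coder to confirm.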
A DL module can also be used to detect duplicate patients who might have enrolled at multiple sites. By using a statistical comparison to estimate the relative likelihood that any given pair of patients is the same person, it can present a prioritized listing of patient pairs for review and confirmation. Detecting duplicates automatically in this way helps to protect patient safety and data integrity.
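As a rough illustration of what a pairwise likelihood comparison can look like, the sketch below scores every cross-site pair of patients and sorts the pairs so that the most likely duplicates come first. It uses a classical probabilistic record-linkage score (a sum of log-likelihood ratios per field), which is an assumption made for the example and not the module described above. The fields, the agreement probabilities, and the patient records are all hypothetical.

```python
from itertools import combinations
from math import log

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_WEIGHTS = {
    "birth_date": (0.98, 0.001),
    "sex": (0.99, 0.5),
    "initials": (0.95, 0.01),
}


def pair_score(a, b):
    """Sum of log-likelihood ratios; higher means more likely the same person."""
    score = 0.0
    for field, (m, u) in FIELD_WEIGHTS.items():
        if a[field] == b[field]:
            score += log(m / u)              # agreement is evidence for a match
        else:
            score += log((1 - m) / (1 - u))  # disagreement is evidence against
    return score


def ranked_pairs(patients):
    """Return (score, id_a, id_b) for cross-site pairs, highest score first."""
    pairs = []
    for a, b in combinations(patients, 2):
        if a["site"] == b["site"]:
            continue  # only pairs enrolled at different sites are compared
        pairs.append((pair_score(a, b), a["id"], b["id"]))
    return sorted(pairs, reverse=True)


# Hypothetical patient records (illustrative only).
patients = [
    {"id": "P1", "site": "S1", "birth_date": "1980-03-14", "sex": "F", "initials": "AB"},
    {"id": "P2", "site": "S2", "birth_date": "1980-03-14", "sex": "F", "initials": "AB"},
    {"id": "P3", "site": "S2", "birth_date": "1975-07-02", "sex": "M", "initials": "CD"},
]
ranked = ranked_pairs(patients)
for score, id_a, id_b in ranked:
    print(f"{id_a} vs {id_b}: {score:.2f}")
```

Here the P1/P2 pair receives a positive score and heads the list, while P1/P3 scores negative. The ranking only prioritizes pairs for human review and confirmation; it does not decide that two records belong to the same person.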
No Such Thing as Too Much Data
As clinical trial data volume continues to increase, the industry needs to adapt to ensure we continue to protect patient safety and data integrity while also optimizing our chances of gaining invaluable new insights. Because RBQM thrives on information, there is no such thing as “too much data.” It can draw on vast datasets to protect patient safety and data integrity. Advanced RBQM analytics software and platforms can take the increasing volume of clinical trial data from disparate sources and turn it into insights that guide informed decision making.