#10 What If the Data Is Not Accurate?

Artificial intelligence has the potential to revolutionize clinical trials, from recruitment and design to diagnostics and analysis. Unfortunately, many clinical trials close prematurely due to failure to enroll or retain participants.
In oncology, only 8 percent of adults diagnosed with cancer participate in clinical trials, and 24 percent of trials fail to reach recruitment goals. Large language model (LLM) tools such as TrialGPT, developed in collaboration with the NIH, have been used to match patients to clinical trials with high accuracy while making recruitment more efficient. This model and others have built-in human oversight, aligning with the human-centered application principle of the American Society of Clinical Oncology (ASCO) FAIITH ethical framework for AI use. These tools are increasingly being incorporated into Phase II and Phase III clinical trials.
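TrialGPT's internals are not public in this post, but as a rough illustration of how LLM-assisted matching with human oversight can work, here is a hypothetical Python sketch. The client library, model name, prompt format, and routing step are all assumptions for illustration, not TrialGPT's actual implementation.

```python
# Hypothetical sketch of LLM-assisted patient-trial matching (not TrialGPT's
# actual code). An LLM rates eligibility, and every match is queued for human
# review, in line with ASCO's human-centered application principle.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def screen_patient(patient_summary: str, trial_criteria: str) -> str:
    """Ask the model for an eligibility judgment with a rationale."""
    prompt = (
        "Patient summary:\n" + patient_summary +
        "\n\nTrial eligibility criteria:\n" + trial_criteria +
        "\n\nAnswer ELIGIBLE, INELIGIBLE, or UNCERTAIN, then explain which "
        "criteria drove the decision."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Human oversight: model output is a recommendation, never a final decision.
recommendation = screen_patient(
    "62-year-old with stage III NSCLC, ECOG 1, prior platinum chemotherapy",
    "Adults 18+ with stage III-IV NSCLC; ECOG 0-1; no prior immunotherapy",
)
print(recommendation, "\n-> routed to study coordinator for review")
```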
Reasons for trial recruitment failure are complex and intertwined. Eligibility criteria may include unnecessary restrictions that limit participation. In National Cancer Institute (NCI)-affiliated trials, the number of eligibility criteria has increased dramatically, and this growth is strongly associated with accrual failure. Data in clinical trial training datasets may be biased or incomplete, further hampering recruitment. Additionally, a gap persists in the recruitment of oncology patients from lower-income households, those with less formal education, and those in lower-wage occupations.
When trial participants differ from the broader patient population, it limits how confidently conclusions about treatment safety and efficacy can be extended to real-world patients and can lead to inconsistencies in care or outcomes. One driver of this gap is the rigid eligibility criteria historically used in clinical trials; restrictive criteria are often borrowed directly from prior studies without strong justification. Trial Pathfinder is a notable example of a machine learning algorithm that uses electronic health record (EHR) data to tackle overly restrictive trial criteria head-on.
Research using Trial Pathfinder found that several common trial criteria could be relaxed with minimal risk to participants, thereby roughly doubling the number of qualified participants. Machine learning algorithms like Trial Pathfinder should be used alongside LLMs for matching patients to trials, since many LLMs have not incorporated bias testing of eligibility criteria into their development. Even if unbiased criteria are used, though, how can we expect models trained on biased datasets not to reinforce skewed patterns?
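To make the criteria-relaxation idea concrete, here is a toy sketch under stated assumptions: a synthetic EHR table with invented column names and thresholds, counting eligible patients under strict versus relaxed criteria. Trial Pathfinder itself does more than count; the comment at the end notes the key extra step.

```python
# Toy illustration (not Trial Pathfinder's actual code) of testing whether a
# criterion can be relaxed: count eligible patients in a synthetic EHR table
# under the original vs. a relaxed threshold. Column names are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ehr = pd.DataFrame({
    "age": rng.integers(30, 90, 5000),
    "creatinine": rng.normal(1.1, 0.4, 5000).clip(0.4),
    "ecog": rng.integers(0, 4, 5000),
})

def eligible(df, max_creatinine, max_ecog):
    return df[(df.age >= 18) &
              (df.creatinine <= max_creatinine) &
              (df.ecog <= max_ecog)]

strict = eligible(ehr, max_creatinine=1.2, max_ecog=1)   # legacy criteria
relaxed = eligible(ehr, max_creatinine=1.8, max_ecog=2)  # candidate relaxation

print(f"strict: {len(strict)} eligible, relaxed: {len(relaxed)} eligible")
# Trial Pathfinder additionally emulates the trial on EHR outcomes to confirm
# the relaxed cohort shows no meaningful change in the safety/efficacy signal.
```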
Medical LLMs are typically built by taking a general-purpose language model trained on publicly available data and then fine-tuning it on specialized datasets. This means there are multiple entry points for skewed training data, and developers often do not disclose the documents used to train base models. The FDA’s draft guidance, “Considerations for the Use of Artificial Intelligence to Support Regulatory Decision Making for Drug and Biological Products,” addresses concerns about training datasets through its credibility assessment framework. That framework, however, applies only to tools supporting regulatory decision-making; tools for patient-trial matching and for investigating inclusion and exclusion criteria remain in a regulatory gap. The same credibility assessment framework should be extended to tools addressing clinical trial recruitment.
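A minimal sketch of that two-stage pipeline helps show where skew can enter. The base model name and dataset path below are stand-ins, and the recipe is a generic Hugging Face fine-tune, not any particular medical LLM's training procedure.

```python
# Minimal sketch of the two-stage pipeline described above: a general-purpose
# base model (pretrained on a corpus that is opaque to us) is fine-tuned on a
# specialized clinical dataset. Model name and dataset path are hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "gpt2"  # stand-in for any base LLM; entry point #1 for skewed data
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Entry point #2 for skew: the fine-tuning corpus (hypothetical file).
ds = load_dataset("json", data_files="trial_notes.jsonl")["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft_model", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # any skew in trial_notes.jsonl is now baked into the model
```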
While I believe that federal guidance should address this, the onus of oversight in the meantime falls on institutional review boards (IRBs), institutions, and professional organizations. AI use in clinical trials is actively improving the efficiency of patient recruitment, with tangible benefits. At the same time, as new tools are developed and applied, the potential harm from bias embedded in trial criteria and training datasets, as well as from widening regulatory gaps, cannot be overlooked. This calls for federal regulation. How do we address the current regulatory gap without overburdening those involved? Is it important, and is it possible, to develop a national standard for fairness evaluation in patient recruitment?
Author
— by Catherine Diop Chalmers, Research Specialist, Emory University
Editor's Response
— by Karen Lindsley, DNP, RN, CDE, CCRC, Manager, Georgia CTSA Coordinating Center and Regulatory Knowledge & Support program
Thank you to Catherine for her timely and perceptive blog highlighting three critical areas: overly restrictive eligibility criteria, practical strategies to recruit participants who reflect intended users, and skewed training data in AI systems. The following expands on each area with current regulatory actions, evidence‑based strategies, and emerging tools.
Regulatory agencies have begun addressing the gap between trial participants and intended users. In parallel, ICH E6(R3) and E8(R1) marked a strategic shift in Good Clinical Practice. E6(R3) emphasizes participant safety, data integrity, and modeling expected enrollment proportional to intended user groups, while E8(R1) focuses on quality by trial design. These guidelines are intended to be implemented together and represent a turning point in trial design and conduct.
Several studies demonstrate that targeted, structured approaches can increase participation. The RECRUIT trial (2013–2017) showed that trust‑building and staff training can improve enrollment. The OSPREY study (NCT07290335) is currently evaluating MyChart message framing, comparing value‑based language, payment information, combined messaging, and standard messaging. The UK WRAPPED trial is testing structured planning to support recruitment and retention in behavioral studies. The TIDieR checklist aims to standardize recruitment activity reporting while building a catalog of tested strategies. These efforts highlight that recruitment can be strengthened through intentional communication, planning, and dissemination strategies.
Multiple analytic platforms help evaluate and refine inclusion and exclusion rules. Trial Pathfinder uses real‑world datasets to model the impact of relaxing restrictive criteria. Eligibility Criteria Analytics from Columbia DBMI and OHDSI tests how eligibility rules affect participation across populations. TriNetX Analytics uses OMOP‑based cohort modeling to assess feasibility and reduce unnecessary restrictions. ASCO and Friends of Cancer Research continue to advance broader, disease‑agnostic protocol criteria through the Modernization of Eligibility Criteria initiative. Additionally, sponsors are using real‑world AI benchmarking early in trial launch to compare actual enrollment with expected user groups to adjust strategies as needed.
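The benchmarking idea at the end of that list can be made concrete with a very small sketch: compare the subgroup mix of actual enrollees against the expected intended-user mix and flag gaps early enough to adjust outreach. The subgroup labels, target shares, and 5-percentage-point threshold below are illustrative assumptions, not drawn from any real trial.

```python
# Hedged sketch of early enrollment benchmarking: compare actual subgroup
# proportions with the expected intended-user mix and flag under-enrollment.
expected = {"under_50": 0.30, "50_to_70": 0.45, "over_70": 0.25}
enrolled = {"under_50": 64, "50_to_70": 118, "over_70": 18}

total = sum(enrolled.values())
for group, target in expected.items():
    actual = enrolled[group] / total
    gap = actual - target
    flag = "  <-- under-enrolled, adjust outreach" if gap < -0.05 else ""
    print(f"{group}: expected {target:.0%}, actual {actual:.0%}{flag}")
```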
Digital twin platforms further support feasibility assessments and risk‑based criteria development. Tools such as UnLearn Digital Twins and MDClone allow exploration of original or generated datasets, reducing reliance on historical or arbitrary rules and potentially opening trials to a broader range of participants.
Newer analytic methods are emerging to detect and mitigate skewed data in AI systems. TRAK (Tracing with the Randomly-projected After Kernel) attributes a model's outputs to the training examples that contributed to them. Dual‑Directional Prompting constrains model reasoning to reduce unwanted patterns. Conditional variational autoencoder frameworks address misrepresentation in EHR datasets. These methods aim to improve the reliability of AI‑driven tools used in trial design and recruitment.
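To give a flavor of gradient-based training-data attribution, the sketch below scores each training example by how well its loss gradient aligns with a test example's gradient, on a tiny synthetic logistic-regression problem. This is a simplification of the idea behind TRAK, not TRAK itself, which adds random projections and kernel normalization to scale to real models.

```python
# Toy sketch in the spirit of gradient-based training-data attribution:
# score each training example by the alignment of its loss gradient with a
# test example's gradient at the trained weights. All data is synthetic.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 200, 10
X = torch.randn(n, d)
y = (X[:, 0] > 0).float()               # synthetic binary labels
w = torch.zeros(d, requires_grad=True)  # tiny logistic-regression model

opt = torch.optim.SGD([w], lr=0.5)
for _ in range(300):                    # fit the model
    opt.zero_grad()
    F.binary_cross_entropy_with_logits(X @ w, y).backward()
    opt.step()

def loss_grad(x, t):
    """Per-example loss gradient at the trained weights."""
    return torch.autograd.grad(
        F.binary_cross_entropy_with_logits(x @ w, t), w)[0]

x_test, y_test = torch.randn(d), torch.tensor(1.0)
g_test = loss_grad(x_test, y_test)

# Higher score = training example contributed more to this test prediction.
scores = torch.stack([g_test @ loss_grad(X[i], y[i]) for i in range(n)])
print("most influential training rows:", scores.topk(5).indices.tolist())
```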
Some researchers argue that not all skewed data should be removed without deeper evaluation. John Thomas Foxworthy of Carnegie Mellon University notes that certain patterns may reflect meaningful real‑world signals rather than errors. Distinguishing between systematic mistakes and legitimate trends is essential for maintaining predictive accuracy and avoiding overcorrection.
The NIH has identified several real‑world obstacles that affect recruitment and retention across many groups, including transportation limitations, rigid work schedules, childcare needs, language barriers, and fear or mistrust (Promoting Factors and Barriers to Participation in Early Phase Clinical Trials: Patients Perspectives). These factors require actionable plans for both large and small sites.
Several design approaches may help address these barriers. Adaptive and real-world study designs, decentralized and hybrid trials, and personalized trial models may reduce participation burdens. Sponsors are encouraged to conduct studies in usual‑care settings, reduce logistical demands, and select outcomes meaningful to all stakeholders. Operational innovations such as home monitoring devices, off‑site lab draws, multimedia or remote visits, off‑hours scheduling, personal navigators, and drone‑based delivery of supplies are being tested and implemented.
Catherine highlighted actionable areas for strengthening recruitment quality. AI‑driven analytics may reduce overly restrictive criteria and improve enrollment of participants who reflect intended users. Newer platforms show promise in producing cleaner, more reliable datasets. Regulatory agencies have recently refocused guidance on trial conduct and participant selection, and meaningful progress is underway. AI development moves so fast, however, that governance must also continually evolve.
Continue the conversation! Please email us your comments to post on this blog. Enter the blog post # in your email Subject.