Text Mining to Recommend Protocol Designs for a Multinational CRO
Objective:
Leverage information from previously executed clinical trials to optimally design a new trial.
Key Challenges:
- Lack of format standardization in protocol documents
- Identifying amendments that belong to the same study
- Presence of non-standard tables
- Presence of scanned objects/images in the protocol document
Primary Fields of Interest:
- Trial Disease Area
- Therapeutic Area (Oncology, Cardio, etc.)
- Clinical Indication (NHL, NSCLC, etc.)
- Trial Phase
- Therapy (Monotherapy, Adjuvant, etc.)
- Disease Phase (Metastatic, Locally Advanced, etc.)
- Study Design
- Treatment Duration
- Key Screening Procedures
- Target Population (Inclusion/Exclusion Criteria)
- Dosage/Active Agent
- Route of Administration
- Complex Design Features (Adaptive, Crossover, etc.)
- Protocol Definitions
- Patient Randomization Target
- Site Activation Target
- Study Rationale
- Study Type (Interventional, Observational)
- Definition of Adverse Events
- Assessment of Compliance/Adherence
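The fields above could be held in one structured record per protocol. The following is a minimal sketch only: the class name `ProtocolRecord`, the attribute names, and the choice of which fields are single-valued versus lists are assumptions made for illustration, and only a subset of the fields is shown.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ProtocolRecord:
    """One record per protocol document (hypothetical schema, not from the source)."""

    protocol_id: str
    therapeutic_area: Optional[str] = None       # e.g. "Oncology"
    clinical_indication: Optional[str] = None    # e.g. "NSCLC"
    trial_phase: Optional[str] = None
    therapy: Optional[str] = None                # e.g. "Monotherapy"
    disease_phase: Optional[str] = None          # e.g. "Metastatic"
    study_type: Optional[str] = None             # "Interventional" or "Observational"
    route_of_administration: Optional[str] = None
    complex_design_features: List[str] = field(default_factory=list)
    inclusion_criteria: List[str] = field(default_factory=list)
    exclusion_criteria: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty (None or an empty list)."""
        return [name for name, value in asdict(self).items() if value in (None, [])]


record = ProtocolRecord(
    protocol_id="P-001",  # made-up identifier
    trial_phase="Phase II",
    complex_design_features=["Adaptive"],
)
print(record.missing_fields())
```

Tracking which fields remain empty gives a simple way to see how much of each document the extraction step has covered.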
Approach:
- Extract key features and tags from a repository of protocol documents
- Parse each trial document and classify its content by topic
- Apply NLP techniques to build a structured library of feature sets from the repository of past trials
- Overlay operational and performance data on these feature sets to identify performance drivers and recommend study designs
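As a rough illustration of the extraction step, the sketch below tags a protocol synopsis with a trial phase, a study type, and complex design features. It uses plain regular expressions and keyword lookups as a stand-in, since the specific NLP techniques are not named here. The keyword lists, function names, and sample sentence are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword lists; a real system would curate or learn these.
STUDY_TYPE_KEYWORDS: Dict[str, List[str]] = {
    "Interventional": ["interventional"],
    "Observational": ["observational"],
}
DESIGN_KEYWORDS: Dict[str, List[str]] = {
    "Adaptive": ["adaptive"],
    "Crossover": ["crossover", "cross-over", "cross over"],
}

# Matches "Phase I" to "Phase IV" or "Phase 1" to "Phase 4", case-insensitively.
PHASE_PATTERN = re.compile(r"\bphase\s+(IV|III|II|I|[1-4])\b", re.IGNORECASE)
ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def extract_phase(text: str) -> Optional[str]:
    """Return the first trial phase mentioned, normalized to Roman numerals."""
    match = PHASE_PATTERN.search(text)
    if match is None:
        return None
    token = match.group(1).upper()
    return "Phase " + ARABIC_TO_ROMAN.get(token, token)


def match_labels(text: str, keyword_map: Dict[str, List[str]]) -> List[str]:
    """Return every label with at least one keyword present in the text."""
    lowered = text.lower()
    return [
        label
        for label, keywords in keyword_map.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def extract_features(text: str) -> dict:
    """Collect the rule-based tags for one passage of protocol text."""
    return {
        "trial_phase": extract_phase(text),
        "study_type": match_labels(text, STUDY_TYPE_KEYWORDS),
        "complex_design_features": match_labels(text, DESIGN_KEYWORDS),
    }


# Made-up sample sentence for illustration only.
sample = "A Phase 2, interventional, adaptive, cross-over study in patients with NSCLC."
print(extract_features(sample))
```

Keyword rules like these are brittle against the non-standard formats listed under Key Challenges, so they are best read as a baseline that a trained topic classifier could later replace.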