Text mining to recommend protocol designs for a multinational CRO
Objective: Leverage information from previously executed clinical trials to design a new trial optimally.

Key Challenges:

  • Lack of format standardization in protocol documents
  • Identifying amendment entries that belong to the same study
  • Presence of non-standard tables
  • Presence of scanned objects/images in the protocol document

Primary Fields of Interest:

  • Trial Disease Area
    • Therapeutic Area (Oncology, Cardio, etc.)
    • Clinical Indication (NHL, NSCLC, etc.)
    • Trial Phase
    • Therapy (Monotherapy, Adjuvant, etc.)
    • Disease Phase (Metastatic, Locally Advanced, etc.)
  • Study Design
    • Treatment Duration
    • Key Screening Procedures
    • Target Population (Inclusion/Exclusion Criteria)
    • Dosage/Active Agent
    • Route of administration
    • Complex Design Features (Adaptive, Cross Over, etc.)
  • Protocol Definitions
    • Patient Randomization Target
    • Site Activation Target
    • Study Rationale
    • Study Type (Interventional, Observational)
    • Definition of Adverse Events
    • Assessment of Compliance/Adherence
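
The fields above could be held in a structured record so that each parsed protocol yields the same shape of output. The following is a minimal sketch, assuming Python dataclasses; the class names, attribute names, and the `missing_fields` helper are hypothetical and only mirror the three groups listed above.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class TrialDiseaseArea:
    therapeutic_area: Optional[str] = None      # e.g. "Oncology"
    clinical_indication: Optional[str] = None   # e.g. "NSCLC"
    trial_phase: Optional[str] = None
    therapy: Optional[str] = None               # e.g. "Monotherapy"
    disease_phase: Optional[str] = None         # e.g. "Metastatic"


@dataclass
class StudyDesign:
    treatment_duration: Optional[str] = None
    key_screening_procedures: List[str] = field(default_factory=list)
    inclusion_criteria: List[str] = field(default_factory=list)
    exclusion_criteria: List[str] = field(default_factory=list)
    dosage_active_agent: Optional[str] = None
    route_of_administration: Optional[str] = None
    complex_design_features: List[str] = field(default_factory=list)


@dataclass
class ProtocolDefinitions:
    patient_randomization_target: Optional[int] = None
    site_activation_target: Optional[int] = None
    study_rationale: Optional[str] = None
    study_type: Optional[str] = None            # "Interventional" or "Observational"
    adverse_event_definition: Optional[str] = None
    compliance_assessment: Optional[str] = None


@dataclass
class ProtocolRecord:
    protocol_id: str  # hypothetical identifier; amendments would share it
    disease_area: TrialDiseaseArea = field(default_factory=TrialDiseaseArea)
    study_design: StudyDesign = field(default_factory=StudyDesign)
    definitions: ProtocolDefinitions = field(default_factory=ProtocolDefinitions)

    def missing_fields(self) -> List[str]:
        """Return dotted names of fields that are still None or empty."""
        missing = []
        for group_name in ("disease_area", "study_design", "definitions"):
            for name, value in asdict(getattr(self, group_name)).items():
                if value is None or value == []:
                    missing.append(f"{group_name}.{name}")
        return missing
```

A helper like `missing_fields` makes it easy to see which fields a given document failed to yield, which matters when source protocols lack a standard format.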

Approach:

  • Extract features from a repository of protocol documents to identify key attributes and tags
  • Parse each trial document for topic classification
  • Apply NLP techniques to build an intelligent library of feature sets from the past trial repository
  • Superimpose operational and performance data to identify drivers and provide study recommendations
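
The tagging step can be illustrated with a deliberately simple, rule-based stand-in. This is a sketch only: it uses keyword and regular-expression matching rather than a trained NLP model, and the `LEXICON` contents, `PHASE_PATTERN`, and `tag_protocol` are hypothetical names and values chosen for illustration, not the method used in the project.

```python
import re
from typing import Dict, List

# Hypothetical keyword lexicon: field name -> label -> trigger keywords.
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "therapeutic_area": {
        "Oncology": ["oncology", "tumor", "tumour", "carcinoma", "lymphoma"],
        "Cardiovascular": ["cardiovascular", "cardiac", "heart failure"],
    },
    "therapy": {
        "Monotherapy": ["monotherapy"],
        "Adjuvant": ["adjuvant"],
    },
    "disease_phase": {
        "Metastatic": ["metastatic"],
        "Locally Advanced": ["locally advanced"],
    },
    "study_type": {
        "Interventional": ["interventional"],
        "Observational": ["observational"],
    },
}

# Matches "Phase III", "phase 2", etc.; longer numerals are tried first.
PHASE_PATTERN = re.compile(r"\bphase\s+(IV|III|II|I|[1-4])\b", re.IGNORECASE)


def tag_protocol(text: str) -> Dict[str, List[str]]:
    """Return the labels whose keywords appear as whole words in the text."""
    tags: Dict[str, List[str]] = {}
    for field_name, labels in LEXICON.items():
        hits = [
            label
            for label, keywords in labels.items()
            if any(
                re.search(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
                for kw in keywords
            )
        ]
        if hits:
            tags[field_name] = hits
    phase = PHASE_PATTERN.search(text)
    if phase:
        tags["trial_phase"] = [phase.group(1).upper()]
    return tags


sample = (
    "A Phase III, randomized, interventional study of adjuvant therapy "
    "in patients with metastatic non-small cell lung carcinoma."
)
print(tag_protocol(sample))
```

Whole-word matching keeps a term such as "neoadjuvant" from being tagged as "Adjuvant". In practice, a statistical classifier or entity recognizer would likely replace the lexicon once labeled protocols are available.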