Leverage NLP Techniques for Risk Classification of Legal Documents


Leverage NLP-based techniques to classify a large number of contracts by risk with a quick turnaround


Key Challenges:

  • 100,000+ legal documents to be evaluated
  • Lack of a standard format
  • Foreign-language text in some of the contracts


  • Build a pre-processing engine for white-space removal, punctuation removal, stop-word removal, etc.
  • Create a term-document matrix
  • Apply text classification and NLP algorithms to build the foundational ontology through feature extraction
  • Use Expectation-Maximization algorithms to transfer the classification knowledge across languages by translating the model features
  • Use the extracted feature set, in conjunction with business rules, to flag contracts into three risk categories: high, medium, and low
  • Validate results against a test set and incorporate a feedback loop to continuously improve model accuracy
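
As a rough illustration of the first two steps and the final flagging step, the sketch below uses only the Python standard library. It is a minimal sketch, not the engine described here: the stop-word list, the risk terms and their weights, the thresholds, and the function names are all hypothetical placeholders, and the classifier and cross-language transfer steps are not shown.

```python
import string
from collections import Counter

# Hypothetical stop words and risk-term weights, chosen for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "shall", "be"}
RISK_WEIGHTS = {"indemnify": 3, "penalty": 2, "liability": 2, "termination": 1}

# Map every punctuation character to a space so adjacent words stay separated.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, remove punctuation, collapse white space, drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TO_SPACE)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] counts term vocab[i] in docs[j]."""
    counts = [Counter(preprocess(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]


def risk_category(text, high=4, medium=2):
    """Business-rule stand-in: sum risk-term weights and bucket the score."""
    score = sum(RISK_WEIGHTS.get(tok, 0) for tok in preprocess(text))
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Made-up sentences, not taken from any real contract.
    contracts = [
        "The Supplier shall indemnify the Buyer; a penalty applies.",
        "Liability is limited.",
        "Termination of the agreement.",
    ]
    vocab, matrix = term_document_matrix(contracts)
    for term, row in zip(vocab, matrix):
        print(f"{term:12s} {row}")
    for contract in contracts:
        print(risk_category(contract), "<-", contract)
```

In practice the hand-written weights would likely be replaced by features learned by the classifier, with the business rules applied on top of its output.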


  • Identified fraudulent contracts with a precision of 98%
  • Built an automated engine that parses 10,000+ contracts per day
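
For reference, precision is the share of contracts flagged as fraudulent that really were fraudulent: true positives divided by the sum of true positives and false positives. The sketch below shows the calculation on made-up labels; the label names and counts are hypothetical and are not the project's data.

```python
def precision(y_true, y_pred, positive="fraudulent"):
    """Precision = TP / (TP + FP) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        raise ValueError("no contracts were predicted as positive")
    return tp / (tp + fp)


if __name__ == "__main__":
    # Illustrative counts only: 50 contracts flagged, 49 of them correctly.
    y_true = ["fraudulent"] * 49 + ["legitimate"] * 51
    y_pred = ["fraudulent"] * 50 + ["legitimate"] * 50
    print(f"precision = {precision(y_true, y_pred):.2f}")
```

Precision says nothing about fraudulent contracts the model missed; that is measured by recall, which the results above do not report.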