Leverage NLP techniques for risk classification of legal documents
Objective: Use NLP-based techniques to classify a large number of contracts by risk with a quick turnaround

Key Challenges:

  • 100,000+ legal documents to be evaluated
  • Lack of a standard format
  • Foreign language usage in some of the contracts

Approach:

  • Pre-processing engine for whitespace, punctuation, and stop-word removal, etc.
  • Term document matrix creation
  • Use text classification and NLP algorithms to build the foundational ontology through feature extraction
  • Use expectation-maximization algorithms to transfer the classification knowledge across languages by translating the model features
  • Use the extracted feature set, in conjunction with business rules, to flag contracts into three risk categories: high, medium, and low
  • Validate results against a test set and incorporate a feedback loop to continuously improve model accuracy
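
The first steps of the approach (pre-processing, term-document matrix creation, and rule-based risk flagging) can be sketched roughly as below. This is a minimal illustration and not the engine described here: the stop-word list, the `RISK_WEIGHTS` terms and weights, the score thresholds, and the sample contracts are all hypothetical, and a simple keyword score stands in for the trained classifier and the real business rules.

```python
import string
from collections import Counter

# Illustrative stop-word list (assumption); a real pipeline would use a
# fuller, language-specific list.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is",
              "shall", "be", "for", "by"}


def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    # str.split() with no argument also collapses runs of whitespace.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocab, rows), where rows[i][j] counts vocab[j] in docs[i].

    Rows are documents here; transpose for a terms-as-rows layout.
    """
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


# Hypothetical business rules: terms and weights are made up for illustration.
RISK_WEIGHTS = {"penalty": 2, "indemnify": 2, "terminate": 1, "liability": 1}


def flag_risk(vocab, row, high=3, medium=1):
    """Map a weighted keyword score to one of three risk categories."""
    score = sum(RISK_WEIGHTS.get(term, 0) * count
                for term, count in zip(vocab, row))
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


# Made-up sample contracts, not taken from the project data.
contracts = [
    "The supplier shall indemnify the buyer; a penalty applies for late delivery.",
    "Either party may terminate   the agreement.",
    "Payment is due in thirty days.",
]
vocab, matrix = term_document_matrix(contracts)
labels = [flag_risk(vocab, row) for row in matrix]
print(labels)  # ['high', 'medium', 'low']
```

In this sketch the matrix feeds the flagging step directly. In a setup like the one described above, the same matrix would more likely be the input to a trained classifier, with the business rules applied to its output.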

Benefits:

  • Identified fraudulent contracts with a precision of 98%
  • Automated engine that parses 10,000+ contracts per day
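
For reference, precision is the share of flagged contracts that are truly positive: TP / (TP + FP). The sketch below shows that calculation on toy labels. The label names and the sample data are hypothetical and are not meant to reproduce the 98% figure.

```python
def precision(y_true, y_pred, positive="fraudulent"):
    """Precision = true positives / all predicted positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # Return 0.0 when nothing was flagged, to avoid dividing by zero.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy test-set labels (assumption), for illustration only.
y_true = ["fraudulent", "legitimate", "fraudulent", "legitimate", "fraudulent"]
y_pred = ["fraudulent", "fraudulent", "fraudulent", "legitimate", "legitimate"]

# Three contracts are flagged; two of them are truly fraudulent.
print(round(precision(y_true, y_pred), 3))  # 0.667
```

Precision alone does not show how many fraudulent contracts were missed, so a validation step would normally track recall alongside it.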