Leverage NLP Techniques for Risk Classification of Legal Documents – TCG DIGITAL

Objective:

Leverage NLP-based techniques to classify a large number of contracts by risk, with a quick turnaround

Key Challenges:

100,000+ legal documents to be evaluated
Lack of a standard format
Foreign language usage in some of the contracts

Approach:

Pre-processing engine for whitespace removal, punctuation removal, stop-word removal, etc.
Term document matrix creation
Leverage text classification and NLP algorithms to build the foundational ontology using feature extraction
Use Expectation-Maximization algorithms to transfer the classification knowledge across languages by translating the model features
Use the extracted feature set, in conjunction with business rules, to flag contracts into three risk categories: high, medium, and low
Validate results against a test set, and incorporate a feedback loop to continuously improve model accuracy
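
The pre-processing, term-document matrix, and rule-based flagging steps above can be illustrated with a minimal Python sketch. It is not the engine described here: the stop-word list, the risk terms and weights, the score thresholds, and the function names are all hypothetical and chosen only for illustration. The learned classifier and the Expectation-Maximization transfer across languages are not shown.

```python
import string
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller, per-language list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is",
              "for", "by", "shall", "be", "this", "any"}

# Hypothetical risk terms and weights standing in for business rules.
RISK_WEIGHTS = {"penalty": 2, "indemnify": 2, "termination": 1, "liability": 1}

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, collapse whitespace, drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    # str.split() with no argument also discards extra whitespace.
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] counts vocab[i] in docs[j]."""
    counts = [Counter(preprocess(doc)) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def risk_category(text, high=3, medium=1):
    """Flag a contract as high, medium, or low using weighted term counts.

    The thresholds `high` and `medium` are illustrative assumptions.
    """
    counts = Counter(preprocess(text))
    score = sum(weight * counts[term] for term, weight in RISK_WEIGHTS.items())
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    contracts = [
        "The supplier shall indemnify the buyer; penalty applies.",
        "Termination of this agreement.",
        "Delivery is due in March.",
    ]
    vocab, matrix = term_document_matrix(contracts)
    for term, row in zip(vocab, matrix):
        print(term, row)
    for contract in contracts:
        print(risk_category(contract), "<-", contract)
```

In practice the term counts would feed a trained classifier rather than fixed weights, with the business rules applied on top of its output.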

Benefits:

Identified fraudulent contracts with a precision of 98%
Automated engine to efficiently parse 10,000+ contracts per day