Automated document review


The client, a leading pharmaceutical manufacturer, reviews RoHS compliance reports on the materials and substances supplied by its vendors. These reports are PDF files prepared by various laboratories. The client previously checked the authenticity of these reports and drew insights from them manually, which made the process laborious, time-consuming, and prone to human bias. The client therefore wanted an AI-based solution that automatically extracts the relevant information from these reports, verifies their authenticity, and derives insights by validating the extracted data against predefined sets of expectations.

NLP Lab Report BOM RoHS


Data Understanding

Because the lab reports were highly domain-specific, we worked with the client’s domain experts to understand three things: the sections of the PDF reports from various labs and their relevance; the client’s standard BOM details, which capture the different types of materials and the expectations for them; and industry standards such as RoHS-2 and IEC. Business rules, and the exceptions to those rules, were codified in this phase in alignment with the client’s domain experts.

Ontology Curation

Cactus NIH resources were used to curate ontologies and synonyms for the various materials and chemical substances. The ontologies were then standardized for use in the subsequent model flow.
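One way to picture the standardization step is a lookup that maps every known synonym to a single canonical substance name. The snippet below is a minimal sketch, not the project's actual implementation: the `ONTOLOGY` contents and the function names are hypothetical, and a real ontology would be far larger.

```python
import re

# Hypothetical ontology: canonical substance name -> known synonyms.
ONTOLOGY = {
    "lead": ["pb", "plumbum"],
    "cadmium": ["cd"],
    "mercury": ["hg", "quicksilver"],
    "hexavalent chromium": ["cr(vi)", "cr6+", "chromium vi", "chromium (vi)"],
}


def _key(text):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return re.sub(r"\s+", " ", text.strip().lower())


# Inverted index: every synonym (and the canonical name itself) -> canonical name.
SYNONYM_INDEX = {}
for canonical, synonyms in ONTOLOGY.items():
    for name in [canonical, *synonyms]:
        SYNONYM_INDEX[_key(name)] = canonical


def normalize_substance(raw):
    """Return the canonical name for a raw label, or None if it is unknown."""
    return SYNONYM_INDEX.get(_key(raw))
```

Returning `None` for unknown labels, rather than guessing, keeps unmapped substances visible so they can be reviewed and added to the ontology.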

Information Extraction

Advanced NLP methods were used to extract both the relevant text and the data in the relevant tables. Contextual rules and the ontologies were then used to clean, organize, and structure this information for downstream processing.
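As an illustration of the structuring step, the sketch below turns the text of a results table into records. It assumes the PDF text has already been extracted upstream and that columns are separated by two or more spaces. The column layout (test item, unit, MDL, result), the sample rows, and all names are hypothetical, and a simple pattern stands in here for the more advanced methods used in the project.

```python
import re

# Assumed row layout: <test item>  <unit>  <MDL>  <result>, where the result is
# either a number or "N.D." (not detected). Columns are split by 2+ spaces.
ROW_PATTERN = re.compile(
    r"^(?P<item>.+?)\s{2,}"
    r"(?P<unit>\S+)\s{2,}"
    r"(?P<mdl>\d+(?:\.\d+)?)\s{2,}"
    r"(?P<result>N\.?D\.?|\d+(?:\.\d+)?)\s*$",
    re.IGNORECASE,
)


def parse_result_rows(table_text):
    """Parse table text into a list of dicts, skipping lines that do not fit."""
    rows = []
    for line in table_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if not match:
            continue  # header lines, blank lines, footnotes
        raw = match.group("result")
        not_detected = raw.upper().replace(".", "") == "ND"
        rows.append({
            "item": match.group("item").strip(),
            "unit": match.group("unit"),
            "mdl": float(match.group("mdl")),
            "result": None if not_detected else float(raw),
            "not_detected": not_detected,
        })
    return rows


# Hypothetical sample text, for illustration only.
SAMPLE_TABLE = """Test Item        Unit     MDL    Result
Lead (Pb)        mg/kg    2      N.D.
Cadmium (Cd)     mg/kg    2      15
"""

rows = parse_result_rows(SAMPLE_TABLE)
```

Lines that do not match the expected layout are skipped rather than forced into a record, so headers and footnotes do not pollute the structured output.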

Final Output Generation

The business rules codified in the first phase were applied to the extracted information to compare it against the set expectations and derive insights. Finally, a web-based interface was provided so that users could interact with the tool easily; the output generated for any lab report uploaded by a user was automatically emailed for reference and consumption.
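A rule of this kind can be sketched as a comparison of each extracted concentration against a limit. The example below uses the maximum concentration values commonly cited for RoHS restricted substances (0.1% by weight, i.e. 1000 mg/kg, except cadmium at 0.01%, i.e. 100 mg/kg). The client's actual expectations and exceptions are not described here, so the rule table, record shape, and function name are assumptions for illustration.

```python
# Commonly cited RoHS maximum concentration values, in mg/kg (ppm).
# The client's real rule set and exemptions may differ (assumption).
ROHS_LIMITS_MG_PER_KG = {
    "lead": 1000.0,
    "mercury": 1000.0,
    "cadmium": 100.0,
    "hexavalent chromium": 1000.0,
    "pbb": 1000.0,
    "pbde": 1000.0,
}


def evaluate(records, limits=ROHS_LIMITS_MG_PER_KG):
    """Label each record PASS, FAIL, or NO_RULE against its substance limit.

    Each record is a dict with a canonical "substance" name and a "result"
    in mg/kg, where None means the substance was not detected.
    """
    findings = []
    for record in records:
        limit = limits.get(record["substance"])
        if limit is None:
            status = "NO_RULE"  # no expectation codified for this substance
        elif record["result"] is None or record["result"] <= limit:
            status = "PASS"
        else:
            status = "FAIL"
        findings.append({**record, "limit": limit, "status": status})
    return findings


# Hypothetical extracted records, for illustration only.
findings = evaluate([
    {"substance": "lead", "result": None},
    {"substance": "cadmium", "result": 150.0},
    {"substance": "antimony", "result": 20.0},
])
```

Keeping a separate `NO_RULE` status, instead of passing such records silently, makes it clear when a report mentions a substance the codified rules do not cover.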


The model achieved 100% coverage of the information to be extracted on unseen validation data, and the client considered it highly successful and suitable for adoption in its business process.
