Automating Regulatory Compliance: Secure AI for Toxicological Document Profiling | Case Study
Executive Summary

A leading pharmaceutical manufacturer faced significant inefficiencies in manually reviewing and classifying research documents for toxicological compliance. InXiteOut developed a secure, on-premise AI solution that uses fine-tuned language models and domain ontologies to automatically identify relevant documents and categorize them into 40+ toxicological profiles. The platform reduced manual effort by over 80%, achieved F1-scores of ~90% for relevance prediction and ~85% for profile categorization, and reduced reviewer bias through standardized classification.

 


Client Context

The client, a leading pharmaceutical manufacturer, regularly reviews hundreds of internal research documents to prepare toxicological reports on 25+ substances for regulatory compliance. Scientific experts had to manually identify the relevant documents and categorize them into more than 40 toxicological profiles, each corresponding to a potential anatomical or ecological impact.

The Challenge

Because the review was entirely manual, it consumed over 500 man-hours every month. Manual classification was also subjective: experts categorized documents based on their individual experience, which left the process susceptible to human bias.

The client needed to automate and standardize this review process. However, because of strict data confidentiality requirements, proprietary research could not be routed through commercial LLM APIs.


The InXiteOut Approach

We developed a secure, self-contained AI pipeline for toxicological profiling that runs entirely within the client's infrastructure. The solution rests on three pillars:

Data Ingestion and Pre-Processing with Bio-Ontologies

To prepare the highly technical content for machine learning workflows, we integrated chemical and biological ontologies from established public APIs and databases (such as PubChem, BioPortal, and AberOWL). This allowed the system to automatically clean and disambiguate complex scientific terminology across the document corpus before it entered the ML pipeline.
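The disambiguation step can be pictured as mapping every synonym of a term to one canonical form before the text reaches a model. The following is a minimal sketch under stated assumptions, not the pipeline described above: the SYNONYMS table and the normalize_terms function are hypothetical, and the hand-written entries stand in for mappings that would in practice be drawn from ontology sources such as PubChem or BioPortal.

```python
import re

# Hypothetical synonym table (assumption): a real pipeline would populate
# this from ontology sources rather than hard-coding a handful of entries.
SYNONYMS = {
    "acetaminophen": "paracetamol",
    "apap": "paracetamol",
    "hepatotoxic": "liver toxicity",
    "hepatotoxicity": "liver toxicity",
    "nephrotoxicity": "kidney toxicity",
}


def normalize_terms(text: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Lowercase the text and replace known synonyms with a canonical term."""
    # Try longer synonyms first so they are preferred over shorter ones.
    keys = sorted(synonyms, key=len, reverse=True)
    # Word boundaries keep "hepatotoxic" from matching inside another word.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: synonyms[m.group(1)], text.lower())


if __name__ == "__main__":
    print(normalize_terms("APAP showed dose-dependent hepatotoxicity in rats."))
```

After normalization, two documents that refer to the same compound or effect under different names share the same tokens, which is the property a downstream classifier benefits from.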

Secure Custom AI and SLM Deployment

To meet strict data privacy rules, we avoided commercial APIs entirely and fine-tuned an open-source LLM with the PEFT-LoRA technique on the client's data, inside the client's infrastructure. To reduce inference-time GPU costs, we used model distillation to build task-specific Small Language Models (SLMs) for classification and metadata extraction. We then ensembled the SLM classification predictions with a weighted zero-shot prediction computed from pooled embeddings of titles, keywords, and meta-tags. Several competing models were integrated so that the best-performing one could be selected automatically based on evaluation metrics.
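One way to read the ensembling step is as a weighted average of two probability distributions over the profile labels: one from the SLM classifier and one from a zero-shot scorer that compares a pooled document embedding with an embedding of each label. The sketch below illustrates that reading only. The function names, the toy three-dimensional embeddings, the softmax temperature, and the 0.3 zero-shot weight are all assumptions made for illustration; the case study does not state how the weights or embeddings were actually computed.

```python
import math


def mean_pool(vectors):
    """Average several equal-length vectors element-wise."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(doc_embedding, label_embeddings, temperature=0.1):
    """Score each label by similarity between document and label embeddings."""
    labels = list(label_embeddings)
    sims = [cosine(doc_embedding, label_embeddings[name]) for name in labels]
    return dict(zip(labels, softmax(sims, temperature)))


def ensemble(slm_probs, zs_probs, zs_weight=0.3):
    """Weighted average of SLM and zero-shot probabilities per label."""
    return {
        label: (1 - zs_weight) * p + zs_weight * zs_probs.get(label, 0.0)
        for label, p in slm_probs.items()
    }


# Toy data (assumption): real embeddings would come from an embedding model.
label_embeddings = {"hepatotoxicity": [1.0, 0.0, 0.0], "neurotoxicity": [0.0, 1.0, 0.0]}
title, keywords, meta_tags = [0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [1.0, 0.0, 0.0]
doc_embedding = mean_pool([title, keywords, meta_tags])

slm_probs = {"hepatotoxicity": 0.6, "neurotoxicity": 0.4}
combined = ensemble(slm_probs, zero_shot_probs(doc_embedding, label_embeddings))

if __name__ == "__main__":
    print(max(combined, key=combined.get), combined)
```

Because both inputs are probability distributions over the same labels, the weighted average is itself a distribution, so the combined scores can be thresholded or ranked like any single model's output.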

Interactive Dashboard and Active Learning

All model predictions are stored in the client's database and routed to a custom web application, which gives stakeholders a dashboard for monitoring model performance and usage. We also built an in-app feedback loop that lets scientific experts review the AI predictions for each document and provide corrections. These corrections feed a continuous active-learning cycle, so the system becomes more accurate and robust over time.
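A feedback loop of this kind is often paired with uncertainty sampling, where the documents the model is least sure about are shown to experts first. The case study does not say how documents were prioritized for review, so the sketch below is only one plausible arrangement: the margin-based ranking, the function names, and the toy predictions are all assumptions.

```python
def margin(probs: dict[str, float]) -> float:
    """Gap between the two most likely labels; a small gap means uncertain."""
    ranked = sorted(probs.values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]


def review_queue(predictions: dict[str, dict[str, float]], k: int) -> list[str]:
    """Return the k document ids with the smallest margin, most uncertain first."""
    return sorted(predictions, key=lambda doc_id: margin(predictions[doc_id]))[:k]


def apply_corrections(training_set, corrections):
    """Append expert-corrected (doc_id, label) pairs to a copy of the training set."""
    return list(training_set) + sorted(corrections.items())


# Toy predictions (assumption): label names and scores are illustrative only.
predictions = {
    "doc-1": {"hepatotoxicity": 0.55, "neurotoxicity": 0.45},
    "doc-2": {"hepatotoxicity": 0.95, "neurotoxicity": 0.05},
    "doc-3": {"hepatotoxicity": 0.50, "neurotoxicity": 0.50},
}

queue = review_queue(predictions, k=2)
updated = apply_corrections([("doc-0", "neurotoxicity")], {"doc-3": "neurotoxicity"})

if __name__ == "__main__":
    print(queue)
    print(updated)
```

The corrected examples returned by apply_corrections would then be added to the next fine-tuning run, which is what closes the loop.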

Technology Stack

  • Models: Task-specific Small Language Models (SLMs), fine-tuned open-source LLMs (Llama 2 13B, Llama 2 34B, and Mixtral 8x7B), and zero-shot prediction models from Hugging Face.
  • Infrastructure and tooling: NVIDIA A100 clusters, PEFT (LoRA fine-tuning), Microsoft MiniLLM (model distillation), NVIDIA Triton Inference Server.
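To give a sense of why LoRA fine-tuning is lighter than full fine-tuning: LoRA freezes each adapted weight matrix and trains two low-rank factors in its place, so a matrix of shape d_out × d_in contributes rank × (d_out + d_in) trainable parameters instead of d_out × d_in. The arithmetic below is a rough illustration only. The hidden size of 5120 (that of Llama 2 13B) and the rank of 8 are assumptions, since the case study does not state the LoRA configuration that was used.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Assumed values for illustration: one square projection matrix, LoRA rank 8.
HIDDEN = 5120
RANK = 8

full = full_params(HIDDEN, HIDDEN)
lora = lora_params(HIDDEN, HIDDEN, RANK)
ratio = lora / full

if __name__ == "__main__":
    print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.4%}")
```

Under these assumptions a single adapted matrix trains 81,920 parameters instead of 26,214,400, roughly 0.3% of the original count.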


Benefits Delivered

The secure AI solution transformed the regulatory review workflow, delivering large efficiency gains and high predictive accuracy across 25+ substances:

  • 80%+ Reduction in Manual Effort: Automated the initial document review, saving the client over 400 expert man-hours every month.
  • High Predictive Accuracy: Achieved an F1-score of ~90% (with a ~95% average recall) for document relevance prediction, and an F1-score of ~85% for categorizing documents into the 40+ toxicological profiles.
  • Optimized Infrastructure Budget: The custom SLM approach delivered ~10% better accuracy than baseline open-source models at ~40% lower infrastructure cost.
  • Standardized Compliance: Replaced subjective human reviews with standardized AI-powered identification, significantly reducing bias and making regulatory reporting more consistent.
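The headline numbers can be cross-checked with simple arithmetic. F1 is the harmonic mean of precision and recall, so a stated F1 and recall imply a precision, and an 80% reduction on a 500-hour baseline gives the hours saved. The sketch below treats the rounded figures above as exact, which is an assumption: the reported recall is an average and may not have been computed on the same basis as the F1-score, so the implied precision is an estimate, not a reported result.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall R."""
    return f1 * recall / (2 * recall - f1)


def hours_saved(baseline_hours: float, reduction: float) -> float:
    """Monthly hours saved for a given fractional reduction in manual effort."""
    return baseline_hours * reduction


# Rounded figures taken from the text above, treated as exact for illustration.
precision = implied_precision(f1=0.90, recall=0.95)
saved = hours_saved(baseline_hours=500, reduction=0.80)

if __name__ == "__main__":
    print(f"implied precision: {precision:.3f}")
    print(f"hours saved per month: {saved:.0f}")
```

With these inputs the implied precision for relevance prediction is about 85.5%, and the saving is 400 hours per month, consistent with the "over 400 man-hours" figure given that both the baseline and the reduction are stated as lower bounds.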
