Information extraction from invoices for expense management tool


The client, a fintech software provider, was developing an AI-enabled Employee Expense Management Tool to automate expense processing for its customers. To that end, the client needed a generic AI module that extracts a pre-defined set of relevant information from invoices, bills, and receipts issued by different vendors in image or PDF format.



Data Annotation & Knowledge Repository Creation

Documents (invoices, bills, receipts) in different formats (PDF and image) from various organizations were analyzed to build a continually evolving Knowledge Repository. In addition, training data was annotated for the computer vision models.

Text Extraction from Images using Computer Vision

A custom object detection model was applied in conjunction with open-source image-to-text libraries to extract distinct clusters of textual information from image files. The text extracted from these clusters was then aggregated for information extraction using NLP models.
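
The aggregation step can be illustrated with a minimal sketch. It assumes each detected region has already been passed through an image-to-text library and arrives with its text and the top-left corner of its bounding box. The `TextCluster` type, the `aggregate_clusters` function, and the `row_tolerance` parameter are hypothetical names chosen for illustration, not the client's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextCluster:
    """One detected region plus its recognized text (hypothetical structure)."""
    label: str  # region class from the detector, e.g. "header" (assumed)
    x: int      # left edge of the bounding box, in pixels
    y: int      # top edge of the bounding box, in pixels
    text: str   # text returned by the image-to-text step


def aggregate_clusters(clusters: List[TextCluster], row_tolerance: int = 10) -> str:
    """Join cluster texts in approximate reading order.

    Clusters are bucketed into rows of `row_tolerance` pixels by their top
    edge, then ordered left to right within each row. Empty texts are dropped.
    """
    ordered = sorted(clusters, key=lambda c: (c.y // row_tolerance, c.x))
    return "\n".join(c.text.strip() for c in ordered if c.text.strip())


clusters = [
    TextCluster("totals", 300, 400, "Total: 100.00"),
    TextCluster("header", 20, 12, "ACME Supplies"),
    TextCluster("header", 300, 18, "Invoice No: INV-1042 "),
    TextCluster("noise", 5, 600, "   "),
]
document_text = aggregate_clusters(clusters)
```

Ordering by position keeps labels and their values close together in the aggregated text, which matters for the rule-based extraction that follows.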

NLP Based Information Extraction from Text

Context-aware regex rules, developed from the Knowledge Repository, were applied to extract the relevant information from text-based files and from the text clusters recovered from image files.
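
As an illustration of what a context-aware rule can look like, the sketch below pairs each target field with a label pattern (the context) and a value pattern, so a number is only captured when it follows a matching label. The field names, label variants, and currency markers are assumptions made for this example; the actual rules derived from the Knowledge Repository are not described in this case study.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: each regex requires a context label before the value.
FIELD_RULES = {
    "invoice_number": re.compile(
        r"(?i)\binvoice\s*(?:no\b\.?|number|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)"
    ),
    "invoice_date": re.compile(
        r"(?i)\b(?:invoice\s+)?date\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})"
    ),
    # The leading \b keeps "Subtotal" from being read as "Total".
    "total_amount": re.compile(
        r"(?i)\b(?:grand\s+total|total\s+amount|amount\s+due|total)"
        r"\s*[:\-]?\s*(?:[$€£]|USD|INR|Rs\.?)?\s*(\d[\d,]*\.?\d*)"
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when no rule matches."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


sample = (
    "ACME Supplies\n"
    "Invoice No: INV-1042\n"
    "Date: 12/03/2021\n"
    "Subtotal: 1,150.50\n"
    "Tax: 100.00\n"
    "Total Amount: USD 1,250.50\n"
)
fields = extract_fields(sample)
```

Values are returned as raw strings; normalizing dates and amounts would be a separate step in a real pipeline.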


The solution was adopted by the client and integrated into their product.
