THE PROBLEM
The client, a fintech software provider, was developing an AI-enabled Employee Expense Management Tool to automate expense processing for its customers. To achieve this, the client required a generic AI module that extracts a pre-defined set of relevant information from invoices, bills, and receipts issued by different vendors in image or PDF format.
INXITE OUT APPROACH
Data Annotation & Knowledge Repository Creation
Documents (invoices, bills, receipts) in different formats (PDF / image) from various organizations were analyzed to create a continually evolving Knowledge Repository. In addition, training data was annotated for the computer vision models to be trained.
Text Extraction from Images using Computer Vision
A custom object detection model was applied in conjunction with open-source image-to-text libraries to extract different clusters of textual information from image files. The text extracted from the different clusters was then aggregated for information extraction using NLP models.
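The detect, read, and aggregate flow described above can be sketched roughly as follows. This is a minimal illustration, not the client implementation: the names `TextCluster`, `extract_clusters`, and `aggregate` are hypothetical, the bounding boxes are assumed to come from whatever object detection model is in use, and the `ocr` argument stands in for any image-to-text library call. The row bucketing used for reading order is a deliberately crude assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels, as an object detector might return them.
Box = Tuple[int, int, int, int]


@dataclass
class TextCluster:
    box: Box
    text: str


def extract_clusters(
    image: object,
    boxes: List[Box],
    ocr: Callable[[object, Box], str],
) -> List[TextCluster]:
    """Run the supplied OCR callable on each detected region; drop empty results."""
    clusters = []
    for box in boxes:
        text = ocr(image, box).strip()
        if text:
            clusters.append(TextCluster(box, text))
    return clusters


def aggregate(clusters: List[TextCluster], row_tolerance: int = 10) -> str:
    """Join cluster text in approximate reading order.

    Clusters are bucketed into rows by their top coordinate (hypothetical
    heuristic), then ordered left to right within each row.
    """
    ordered = sorted(
        clusters,
        key=lambda c: (c.box[1] // row_tolerance, c.box[0]),
    )
    return "\n".join(c.text for c in ordered)
```

Passing the OCR step in as a callable keeps the sketch independent of any particular library; in practice it would wrap a crop of the image region followed by the library's text recognition call.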
NLP-Based Information Extraction from Text
Context-aware regex rules (developed from the Knowledge Repository) were applied to extract relevant information from text-based files and from the text clusters of image files.
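As an illustration of what a context-aware regex rule can look like, the sketch below pairs each target field with the label text that typically precedes it, so a value is captured only when it appears in the expected context. The field names, label variants, and patterns in `FIELD_RULES` are assumptions made for this example; the actual rules derived from the Knowledge Repository are not described in this document.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: each pattern anchors the value to a nearby label.
FIELD_RULES = {
    "invoice_number": re.compile(
        r"(?:invoice|bill|receipt)\s*(?:no\.?|number|#)\s*[:\-]?\s*"
        r"([A-Z0-9][A-Z0-9\-/]*)",
        re.IGNORECASE,
    ),
    "date": re.compile(
        r"(?:invoice\s+)?date\s*[:\-]?\s*"
        r"(\d{1,2}[/\-.]\d{1,2}[/\-.]\d{2,4})",
        re.IGNORECASE,
    ),
    # The lookbehinds skip "Subtotal", "Sub total", and "Sub-total".
    "total": re.compile(
        r"(?<!sub)(?<!sub[\s\-])total(?:\s+(?:amount|due))?\s*[:\-]?\s*"
        r"[^\d\s]?\s*(\d[\d,]*\.\d{2})",
        re.IGNORECASE,
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first contextual match per field, or None if not found."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

Anchoring on the label rather than on the value alone is what keeps a rule like `total` from picking up the subtotal or an unrelated amount elsewhere on the page.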
RESULT
The solution was adopted by the client and integrated into their product.