The need to digitalize documents is overwhelming. Invoices and other documents arrive in many different formats, which makes digitalizing them extremely laborious. Digitalization is no longer optional for companies today; it is the only way to keep control over vast amounts of data. Invoices are often PDFs or scanned paper, and entering them into a digital format for future use, for authorities, or for audits can take enormous time and effort. We developed InvoiceReader to address this by transforming unstructured data into a structured digital format. InvoiceReader extracts specific fields from different types of documents and maps them to the right fields in the digital format, freeing up precious time for data handling teams and significantly increasing their productivity.

How it works

InvoiceReader scans through the entire document and “understands” it as a whole

The second engine of the AI model converts the scanned documents to digital form

The digital data is then passed through an embedding layer and on to graph neural networks

The graph neural networks interpret the data and extract specific fields from the documents. The data is now digitalized for current and future use
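
As an illustration of the graph step above, the sketch below shows one common way to turn OCR output into a graph that a graph neural network could consume: each word becomes a node, and each node is linked to its nearest neighbours on the page. This is only a minimal sketch under stated assumptions, not InvoiceReader's actual implementation. The Token structure, the build_neighbor_graph function, the choice of k, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    """One OCR word with the centre of its bounding box (hypothetical format)."""
    text: str
    x: float
    y: float


def build_neighbor_graph(tokens, k=2):
    """Link each token to its k nearest tokens on the page.

    Returns a dict mapping a token index to a list of neighbour
    indices, nearest first. Assumed design, for illustration only.
    """
    graph = {}
    for i, a in enumerate(tokens):
        # Euclidean distance from token i to every other token.
        distances = [
            (hypot(a.x - b.x, a.y - b.y), j)
            for j, b in enumerate(tokens)
            if j != i
        ]
        distances.sort()
        graph[i] = [j for _, j in distances[:k]]
    return graph


# Made-up sample page: a header line and a totals line.
tokens = [
    Token("Invoice", 10, 10),
    Token("No:", 60, 10),
    Token("12345", 100, 10),
    Token("Total", 10, 200),
    Token("99.00", 100, 200),
]
graph = build_neighbor_graph(tokens, k=2)

for index, neighbours in graph.items():
    print(tokens[index].text, "->", [tokens[j].text for j in neighbours])
```

In this toy layout, "Total" ends up linked to "99.00", which is the kind of spatial relationship a graph model can use to tell a label from its value.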


The bill or invoice is uploaded, preprocessed, and then fed into the AI model

The AI model then uses OCR (Optical Character Recognition) to extract the details

A combination of graph neural networks and BERT (Bidirectional Encoder Representations from Transformers) is used to extract specific details from the input data

Where fields look similar (for example, the from and to addresses), a probability score is added to the outputs so that a human can check them

The model can be easily trained to digitalize other documents as well
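
The probability-score step above can be sketched as a simple routing rule: fields whose score falls below a cut-off are set aside for a human check. This is a hedged illustration only. The ExtractedField structure, the split_for_review function, the 0.85 threshold, and the sample values are assumptions made for the example and do not come from InvoiceReader.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """A field produced by the model (hypothetical structure)."""
    name: str
    value: str
    probability: float


def split_for_review(fields, threshold=0.85):
    """Separate confident extractions from those needing a human check.

    The threshold is an assumed value; a real deployment would tune it.
    """
    accepted, needs_review = [], []
    for field in fields:
        if field.probability >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


# Made-up model output: the two addresses look alike, so their scores are low.
fields = [
    ExtractedField("invoice_number", "12345", 0.99),
    ExtractedField("from_address", "1 Example Street", 0.62),
    ExtractedField("to_address", "2 Sample Road", 0.58),
]
accepted, needs_review = split_for_review(fields)

print("Accepted:", [f.name for f in accepted])
print("Needs human check:", [f.name for f in needs_review])
```

Keeping the threshold as a parameter lets a team trade review effort against the risk of accepting a wrong value.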


The AI model can be used to accelerate or augment the digitalization of many routine tasks, such as manual data capture from different documents: PDF, Word, Excel, scanned documents, or even printed documents. Examples include the digitalization of:

Inter- and intra-company invoices within an organization, with all relevant fields populated automatically

Invoices in different formats from different channel partners (suppliers, distributors) for claims and payments

Mandatory licenses required by law for the company and for its distributors

Historical documents that may currently exist only in non-digital formats

Regulatory licenses, which are often in printed or PDF form

Client quotations, to create an accurate database of all submitted quotations and their revisions

Investment portfolios held in PDF files

Patient reports, including demographics, which are generally in PDF or Word format

The applications of InvoiceReader are immense, and it helps you improve accuracy by minimizing the human errors that occur during manual digitalization. The model needs to be trained on each type of document before it is deployed for digitalization. This is easily achieved and is undertaken by MCG.