Gal Petkovšek, Iztok Bajcar, Bojan Miličić
2 April 2024

Exploring a new era of data processing with tailored artificial intelligence solutions

Large Language Models (LLMs) are evolving at an unprecedented pace and understand our world better than ever before. Every day we generate a huge amount of new data and use it for machine learning, yet there is still some knowledge that is only accessible to the human eye. A lot of the key information that LLMs don't have access to is hidden in internal company documentation, contracts, memos, etc. Today we are introducing the technologies that are changing this and opening the way to creating personalised LLMs that are able to understand and process your company's specific data.

Introduction to Optical Character Recognition

Optical Character Recognition is a technology that enables the conversion of scanned paper documents, PDF files or images into organised and searchable digital text. OCR has traditionally been used to digitise documents, making it easier to access information and automating manual data entry.

The current capabilities of OCR technology range from handwriting recognition to processing visually complex documents. As OCR has progressed over the years, its applicability has expanded into a variety of industrial and business environments.

Introduction to Large Language Models (LLM)

Large language models are advanced artificial intelligence systems that have been trained on huge amounts of textual data. They are revolutionising Natural Language Processing (NLP) by being able to understand, interpret and even generate text that is very similar to what a human would write. LLMs such as GPT (Generative Pre-trained Transformer) can perform complex language tasks such as writing articles, assisting with learning or answering questions.

What happens when we combine OCR and LLM?

When combined, OCR and LLMs complement each other, allowing AI assistants not only to "see" and "understand" text in documents and images, but also to analyse and process it. In practice, this means that by converting text from images into digital form with OCR and then understanding and processing it with an LLM, we get much deeper insights into our data that we would probably not have discovered on our own.

Deeper understanding with RAG technology

Large language models are limited to the information on which they have been trained, which means that company-specific documentation is out of their reach. Retrieval-Augmented Generation (RAG) addresses this by combining the power of large language models with specific information, usually in the form of documents. This allows companies to create their own AI agent that draws on a specific database without retraining the underlying model. RAG proves particularly useful in question-answering applications, as it can provide relevant and accurate answers by quickly searching through documents.

First, documents are broken down into chunks of text and stored in the database. When an AI agent is asked a question, it looks for the chunks in the database that are most relevant to the question and passes them to the LLM as context for the answer. One important advantage of this method is that the system knows where the information was taken from, which means it can link us to the relevant part of the document, giving the answer extra credibility.
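The chunk-and-retrieve step can be illustrated with a minimal sketch. It uses simple word-count similarity in place of the learned embeddings a real RAG system would typically use, and the file names, sample sentences and function names are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

def chunk_text(text, source, size=40):
    """Split a document into chunks of about `size` words, remembering the source."""
    words = text.split()
    return [
        {"source": source, "position": i // size, "text": " ".join(words[i:i + size])}
        for i in range(0, len(words), size)
    ]

def vectorize(text):
    """Bag-of-words vector: counts of lower-cased words (a stand-in for embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, top_k=2):
    """Return the `top_k` chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c["text"])), reverse=True)
    return ranked[:top_k]

# Hypothetical company documents, invented for this example.
database = []
database += chunk_text("Employees are entitled to 25 days of paid annual leave per year.", "hr_policy.txt")
database += chunk_text("Invoices must be paid within 30 days of the invoice date.", "supplier_contract.txt")

best = retrieve("How many days of annual leave do employees get?", database, top_k=1)[0]
# The stored source is what lets the assistant point back to the original document.
print(best["source"], "->", best["text"])
```

In a full system, the retrieved chunks would then be placed in the prompt sent to the LLM, and the stored source would be shown alongside the answer.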

The system described above works exclusively on text, which reveals a major drawback: many documents contain images, graphics and tables that hold important information or are even crucial to understanding the document. This is where Optical Character Recognition (OCR) comes in, ensuring that our LLM can also search for information hidden in images. The synergy between OCR and RAG technologies offers new dimensions in understanding and processing complex data, allowing AI to better adapt and respond to user needs.
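As a rough sketch of how OCR output could enter a text-only pipeline, the example below converts a document made of text and image elements into plain text records. The element layout, the `document_to_text` function and the `fake_ocr` stand-in are assumptions made for illustration; in practice `fake_ocr` would be replaced by a call to a real OCR engine.

```python
from typing import Callable, Dict, List

def document_to_text(elements: List[Dict], ocr: Callable[[bytes], str]) -> List[Dict]:
    """Turn mixed text/image elements into text records, noting how each was obtained."""
    records = []
    for index, element in enumerate(elements):
        if element["kind"] == "text":
            text, origin = element["content"], "text"
        elif element["kind"] == "image":
            # Images are passed through OCR so their content becomes searchable text.
            text, origin = ocr(element["content"]), "ocr"
        else:
            continue  # unknown element kinds are skipped in this sketch
        if text.strip():
            records.append({"element": index, "origin": origin, "text": text.strip()})
    return records

def fake_ocr(image_bytes: bytes) -> str:
    """Hypothetical stand-in for an OCR engine: just decodes placeholder bytes."""
    return image_bytes.decode("utf-8")

# Hypothetical document: one text paragraph and one "image" of a table.
elements = [
    {"kind": "text", "content": "Quarterly report"},
    {"kind": "image", "content": b"Revenue table: Q1 120, Q2 135"},
]

records = document_to_text(elements, fake_ocr)
for record in records:
    print(record)
```

The resulting records are ordinary text, so they can be chunked and stored in the same database as the rest of the document.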

A sea of opportunities for different sectors

Understanding and integrating these technologies naturally opens up a whole new world of possibilities. As our AI assistants become smarter and more capable of understanding and processing complex text data, they are also bringing exciting innovations in a wide range of sectors, from education, to legal services, to all forms of consultancy and much more. Will your company be one of the early innovators?
