
2 March 2025 3 min read

The growing need for AI-ready corporate ESG reports

The rapid recent development of Large Language Models (LLMs) has generated excitement over the ways they can be applied to speed up existing workflows. In the world of corporate reporting, the question is whether they could be used to streamline both the creation and the analysis of reports. Either way, the advent of these models is likely to encourage a fundamental change in how companies report their information.

The advantage of using AI in evaluating company ESG performance is undeniable. Manual evaluation is costly in terms of time and resources. Annual Reports are long, averaging 248 pages. Information is spread across multiple documents and is sometimes quite technical, and thus hard to read. Comparative analysis (between companies or year on year) further increases the time needed to sift through it all.

Investors have been using ESG ratings agencies to bypass time-consuming work, often relying on just an output number and trusting somewhat opaque methodologies. These agencies themselves have been using LLMs and other automated systems to speed up the analysis.

Who’s reading ESG reports?

LLMs are a game changer, allowing investors more direct control over the way they analyse company documents. One issue remains – reports are still largely made to be read by humans, not machines. The difference is non-trivial.

The current best practice in terms of transparency is to publish reports and policies as PDFs on company websites. This allows the report to be downloaded and sent as an attachment via email. Crucially, providing a PDF version of the Annual Report allows for version control. These documents are clearly dated, and amendments have to be published as a new PDF with a new date.

However, PDFs present their own challenges to automated analysis. Before a PDF can be run through an AI model of any kind, its text has to be extracted, sometimes using Optical Character Recognition (OCR) for difficult documents. Text extraction from PDFs is rarely perfect, and some of the content can be lost, mostly because of the formatting of “glossy” reports.

Human vs. machine

Examples of elements that are designed to be read by humans and are difficult for machines to read:

  • Graphs and images
  • Tables
  • Columns or uneven text layouts
  • Special characters
  • Fancy fonts
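
To see why columns in particular cause trouble, here is a minimal sketch in Python. It assumes a hypothetical two-column page modelled as fixed-width text; the sample sentences, the column width and the function names are all invented for illustration and are not taken from any real report or extraction library. A layout-unaware reader goes straight across each line and interleaves the two columns, while a reader that knows where the column boundary sits recovers the intended order.

```python
# Hypothetical two-column page content (invented for illustration).
LEFT = ["Revenue rose 12% in", "2024, driven by new", "retail contracts."]
RIGHT = ["Scope 1 emissions fell", "by 8% after the plant", "upgrade programme."]
COLUMN_WIDTH = 24  # assumed position of the column boundary


def render_page(left, right, width=COLUMN_WIDTH):
    """Lay the two columns side by side, as they would appear on the page."""
    return [l.ljust(width) + r for l, r in zip(left, right)]


def naive_extract(page):
    """Read straight across each line, ignoring the column layout."""
    return " ".join(" ".join(line.split()) for line in page)


def column_aware_extract(page, width=COLUMN_WIDTH):
    """Read the whole left column first, then the whole right column."""
    left = [line[:width].strip() for line in page]
    right = [line[width:].strip() for line in page]
    return " ".join(left + right)


if __name__ == "__main__":
    page = render_page(LEFT, RIGHT)
    print(naive_extract(page))
    print(column_aware_extract(page))
```

In this toy case the column boundary is known in advance. In a real glossy report it has to be inferred from the layout, which is where extraction tends to go wrong.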

Below are some examples of “glossy” sections and the text extracted from them, which illustrate some common issues. Here is a piece of financial information:

[Image: financial information as presented in a glossy report]

Here is the resulting extraction:

[Image: text extracted from the financial information above]

Here is a table:

[Image: table as presented in a glossy report]

Here is the resulting extraction: 

[Image: text extracted from the table above]

As a result, there are growing calls for plainer formats that can be parsed automatically with less effort. For example, SEC filings such as Form 10-K are required in the US, and XBRL (eXtensible Business Reporting Language) documents are on the rise and in the pipeline for the EU’s CSRD requirements. These formats are unappealing to the human eye but ideal for finding and extracting data from reports.
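
To illustrate why tagged formats are easier to process, here is a minimal sketch in Python using only the standard library. The XML fragment is a simplified, hypothetical stand-in for a tagged report: the element names, the namespace URL and the figures are invented, and real XBRL instances rely on taxonomies, contexts and units that are considerably more involved. The point is only that each value arrives labelled with what it is, so no layout has to be guessed.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical tagged fragment (not a real XBRL taxonomy).
SAMPLE = """<report xmlns:esg="http://example.com/esg">
  <esg:Scope1Emissions unitRef="tCO2e" contextRef="FY2024">12500</esg:Scope1Emissions>
  <esg:Scope2Emissions unitRef="tCO2e" contextRef="FY2024">8300</esg:Scope2Emissions>
</report>"""


def extract_facts(xml_text):
    """Return {(name, period): (value, unit)} for each tagged fact."""
    root = ET.fromstring(xml_text)
    facts = {}
    for element in root:
        # ElementTree reports namespaced tags as "{namespace}LocalName".
        name = element.tag.split("}")[-1]
        period = element.get("contextRef")
        facts[(name, period)] = (float(element.text), element.get("unitRef"))
    return facts


if __name__ == "__main__":
    for (name, period), (value, unit) in extract_facts(SAMPLE).items():
        print(f"{name} [{period}]: {value} {unit}")
```

Compared with extracting the same figures from a PDF, nothing here depends on fonts, columns or table borders.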

Future-proofing formats

As LLMs are increasingly likely to perform a “first pass” analysis of corporate reports, we can assume that the need for a “machine-ready” format will increase. Companies may choose simply to make their reports less stylized, or to publish two versions of the same document, as already happens with 10-Ks alongside glossy reports and with PDF alongside XBRL. This is worth considering for companies that want to ensure their information is evaluated in full. Companies that continue to rely on highly stylized glossy PDF reports will have to depend on text extraction and accept the risk of information being lost.
