European machine-readable company filings: a data scientist’s dream
If you ask any data scientist what they spend most of their time doing, they won’t tell you that it is solving the world’s hardest problems, or fine-tuning an incredible new life-changing algorithm, or even predicting the next bitcoin bounce. No. What they will say is: collecting and cleaning data. Period. Once they have done this, they still have to transform it into a format that is machine-readable.
The reality is that these seemingly mundane tasks are 80% of their day job. In the asset management industry, there is a constant drive to find a trading edge or to find a new signal in a dataset that informs an investment decision.
You only need to look at firms like Renaissance, Two Sigma and Citadel to find evidence of advanced, data-science-driven strategies. They have invested millions of dollars into curating their own datasets and building teams to find value in their data. Anyone new to the game, however, has to start from scratch.
This is why getting access to transformed, structured, labelled/tagged and machine-readable data is so valuable.
Initial success
In late 2020, we were developing a new machine learning model for an asset management client who was looking to identify the strategic direction of a company based on earnings call transcripts and annual reports. We started with the S&P 500 and were very quickly able to get machine-readable transcripts from Capital IQ, along with companies’ annual reports from the SEC EDGAR database. The data was well formatted, tagged and easy to download, with a lot of history. It was a data scientist’s dream! We used a combination of NLP to analyse what was being said in the text, and predictive machine learning algorithms to analyse the company financials. This approach resulted in an ecstatic client and an expanded mandate.
The challenge
Filled with confidence (perhaps even a hint of arrogance) following our recent success, we set out to repeat the exercise on European companies. The earnings call transcripts were, again, readily available, and these were ingested and processed quickly. Next, we set out to add the European annual reports and filings, expecting to find a dataset as clean and beautifully structured as the one held in the EDGAR database. Instead, we found ourselves in a data science nightmare. European companies do not have to report in a uniform, consumable standard, nor is there a central database from which to access files. They are free to do whatever they want and can publish the content anywhere, from the investor page on their website to LinkedIn.
Put simply, there is no consistency in format or in where filings are published. Given we had promised our client a quick turnaround (based on our first experience), we had a problem.
The solution
In a scramble to find a reliable source of data that would save us from finding and downloading thousands of PDF documents, and then manually cleaning and tagging them, we stumbled across an article by a small company called FDB Filings. They had apparently come across the same problem and built a business around it! Their offering included:
- Comprehensive coverage of European annual, interim & press release reports
- PDF documents converted into a machine-readable JSON format
- Link back to every original source document
- Point-in-time entity mapping
- Clear and well-defined tags on revenue guidance, executive speaker, business risks & more
- 10 years of filing history
- Data accessible via an AWS S3 bucket
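To give a feel for why tagged JSON is so much easier to work with than raw PDFs, here is a minimal Python sketch that pulls every passage carrying a given tag out of one filing. It is an illustration only: the field names (`company`, `source_url`, `sections`, `tags`, `text`), the tag `business_risk`, the company and the URL are all made up for this example and are not FDB's actual schema. In practice each JSON document would be read from the S3 bucket rather than from an in-memory string.

```python
import json
from dataclasses import dataclass


@dataclass
class TaggedPassage:
    company: str
    tag: str
    text: str
    source_url: str  # link back to the original source document


def extract_passages(raw_json: str, wanted_tag: str) -> list:
    """Return every passage in one filing that carries the wanted tag."""
    filing = json.loads(raw_json)
    return [
        TaggedPassage(
            company=filing["company"],
            tag=wanted_tag,
            text=section["text"],
            source_url=filing["source_url"],
        )
        for section in filing.get("sections", [])
        if wanted_tag in section.get("tags", [])
    ]


# Invented sample record standing in for one JSON object read from storage.
# The schema is hypothetical and exists only to make the sketch runnable.
SAMPLE_FILING = json.dumps(
    {
        "company": "Example Industries SE",
        "source_url": "https://example.com/annual-report-2020.pdf",
        "sections": [
            {"tags": ["revenue_guidance"], "text": "We expect revenue to grow."},
            {"tags": ["business_risk"], "text": "Supply chain disruption remains a risk."},
        ],
    }
)

passages = extract_passages(SAMPLE_FILING, "business_risk")
for passage in passages:
    print(passage.company, "|", passage.tag, "|", passage.text)
```

With the data in this shape, filtering for risk statements or revenue guidance across thousands of filings becomes a loop over documents, with no PDF parsing or manual tagging in between.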
We set up a trial to test the data and, to our collective delight, the quality of their technology, data structuring and team was outstanding! We were able to ingest and score 10+ years of annual reports for 1,000 European companies almost immediately, and deliver the results our client was expecting. The team at FDB had set out to solve a very hard data structuring problem and succeeded, so much so that in November 2021 Insig AI acquired FDB.
What we learned
Starting from scratch doesn’t necessarily require billions. This process demonstrates that a combination of thoughtful data structuring, good timing and robust infrastructure is fertile ground for creating an AI-powered analytical platform that delivers unbiased, uniform, value-add insights to an entire sector, across different markets, in an accessible and easily digestible format.
Financial institutions are entering an entirely new paradigm, one where stakeholders and shareholders alike are squarely focused on sustainability and accountability. This brings a significant layer of complexity and cost. Technology and data companies are emerging with cost-effective and transparent data science propositions that enhance an institution’s due diligence and risk management capabilities while informing and improving investment decisions. For the heads of risk and sustainability at financial institutions, the choice of data provider is yours, but the time to act is now!
