Dec 16, 2021

AIMCo Infrastructure Team Experiments with Open-Source Solutions

Staying on top of emerging trends and risks has never been easy for investors, and the ever-growing volume of available information has made the task larger and more difficult for investment managers. Although managing this explosion of data is relevant to anyone who deals with a wide variety of information from various sources, efficiently cutting through the avalanche of data has become an imperative for many private asset class investors.

The ability to identify risks and opportunities relatively early, and to react quickly, represents a key competitive advantage in adding value for investors. Moreover, because externally visible pricing is infrequent and direct public disclosure is lacking for the types of assets that many private managers invest in, finding innovative ways to extract relevant information has become a significant area of focus. Many investors rely heavily on qualitative sources such as media coverage, subscription-based services, annual reports, industry reports and other subjective resources. However, given the diversity of sources across multiple sectors and geographies, the challenge has been to efficiently filter and source quality information for making multimillion-dollar decisions.

Advancements in artificial intelligence (AI), specifically in Natural Language Processing (NLP) and sentiment detection, are making it possible to manage this information overload. These advancements allow investors to cast a wider net across a broader range of text-based information sources to stay better informed of developing trends. These NLP-based tools not only allow more efficient coverage of various sources but also make it easy to drill down into details to better assess the underlying context of each situation. Such NLP capabilities help flag risks and opportunities that might otherwise be overlooked because of cognitive biases.

In recent years, NLP models have been developed that can effectively identify the context and meaning of each word in a particular document. These models can achieve state-of-the-art results through extensive pretraining in which words are removed at random from a large volume of documents drawn from public datasets such as Wikipedia. Training involves using deep learning to predict the words that were removed from the training examples and then using the results to improve the predictions with each attempt. Through many iterations and across large volumes of documents, these models are gradually trained to capture the semantics of word combinations and ultimately to apply that training to more general cases. When these trained models are subsequently applied to a document or other written text not seen previously, they are able to extract high-quality language features across a wide range of NLP tasks.
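The word-removal step described above can be illustrated with a minimal Python sketch. It only prepares a training example by hiding words at random and recording what was hidden; the deep learning model that learns to predict the hidden words is not shown. The `[MASK]` placeholder, the 15% masking rate, the function name `mask_words` and the sample sentence are all assumptions chosen for illustration.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact token is an assumption


def mask_words(sentence, mask_rate=0.15, seed=0):
    """Hide a random subset of words; return (masked_words, hidden_targets)."""
    rng = random.Random(seed)
    words = sentence.split()
    # Hide at least one word so every sentence yields a training example.
    n_to_mask = max(1, round(len(words) * mask_rate))
    positions = sorted(rng.sample(range(len(words)), n_to_mask))
    # The targets are what a model would be trained to predict.
    targets = {i: words[i] for i in positions}
    masked = [MASK if i in targets else w for i, w in enumerate(words)]
    return masked, targets


# Hypothetical sentence, not taken from any real article.
sentence = "The pipeline operator reported higher volumes and stable tariffs this quarter"
masked, targets = mask_words(sentence)
print(" ".join(masked))
print(targets)
```

During pretraining, a model would see only the masked sentence and be scored on how well it recovers the hidden words, repeated across a very large number of documents.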


Several powerful NLP models developed by Google and others active in AI research are available on an open-source basis. In other words, these models are accessible for free to anyone with the requisite knowledge and interest. Although these NLP models are readily available, it is currently a significant challenge to properly understand the possible use cases across the many disciplines to which they could be applied. This is especially true in disciplines such as investing and finance, which involve domain-specific aspects that differ considerably from the original applications of the researchers who developed these tools. Much of the information available to the investment community is in the form of unstructured language, and as such investors may currently see only the tip of the iceberg in terms of NLP use cases. Presumably those who can combine an understanding of the technical aspects of these tools with the domain-specific issues within investing will be best positioned to create a differentiated NLP approach that adds value.

The practical applications of NLP for AIMCo include the identification of emergent trends in environmental, social and governance (ESG) matters, energy transition and digital infrastructure, among other areas. AIMCo’s infrastructure team determined that it should take some initial steps to explore how NLP tools could extend the ability to extract information relevant to our investment decision making. Using nothing more than some online research and self-taught Python skills, the team completed a working example. It used a couple of open-source NLP models to assess the practicality of summarizing content from web-based news articles, providing sentiment detection and analysis, and flagging content that might warrant further investigation by the team as an opportunity or risk.
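The flagging step can be pictured with a small Python sketch. This is not the team's implementation, which relied on open-source NLP models. It substitutes a much simpler word-list (lexicon) scoring approach so that the example runs with the standard library alone, and it leaves out summarization entirely. The word lists, the threshold, the function names and the sample headlines are all hypothetical.

```python
import re

# Hypothetical word lists for illustration only; a real system would
# use a trained sentiment model rather than fixed vocabularies.
POSITIVE = {"growth", "approval", "expansion", "record", "upgrade"}
NEGATIVE = {"outage", "lawsuit", "delay", "default", "downgrade"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / word count, in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def flag_article(text, threshold=0.05):
    """Label an article for follow-up; the threshold is an assumed value."""
    score = sentiment_score(text)
    if score >= threshold:
        return "opportunity"
    if score <= -threshold:
        return "risk"
    return "neutral"


# Made-up headlines, not real news items.
articles = {
    "a": "Regulator grants approval for port expansion after record volumes",
    "b": "Utility faces lawsuit after outage and construction delay",
    "c": "Company publishes its annual report on schedule",
}
for key, text in articles.items():
    print(key, flag_article(text))
```

Swapping `sentiment_score` for a call to a pretrained model would leave the flagging logic unchanged, which is one reason to keep scoring and flagging as separate functions.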

To date, the infrastructure team’s limited proof of concept appears to confirm that useful results can be achieved with NLP tools even without fine-tuning or enhancing current open-source models. The team is currently exploring both internal solutions and external providers for operational versions of such tools, which would make it easier to identify trends and find the further insights needed to make better investment decisions. Momentum in the AI and NLP space is building, and experimenting with current open-source NLP models is creating options that AIMCo is excited about for the future. Such options include combining these qualitative parameters with quantitative analysis, as well as using the results in cluster analysis to identify deep correlations and to better define specific value factors.