Automatic Factor Discovery with AI

2013 Nobel Prize in Economics

In 2013, the Royal Swedish Academy of Sciences awarded the Nobel Prize in Economics to Eugene Fama and Lars Peter Hansen, both professors at the University of Chicago, and Robert Shiller, a professor at Yale University, for their empirical analysis of asset prices.

In economics, factors (or market anomalies) refer to the common characteristics shared by companies that outperform the market. Numerous studies are therefore under way in academia and in the asset management industry to discover more factors that generate excess returns. Under the efficient market hypothesis, however, these factors are regarded as exotic beta rather than alpha: the apparent excess return is not truly excess, because it is compensation for hidden risk factors.

William Sharpe, an American economist who later won the 1990 Nobel Prize in Economic Sciences, proposed the capital asset pricing model (CAPM) in 1964 to quantify investment risk and the return an investor should expect for bearing it. Since the CAPM was introduced, scholars in finance have discovered the size, value, momentum, and quality factors. Whether these factors capture market inefficiencies or risks that have yet to be identified within the efficient market hypothesis remains controversial. There is no doubt, however, that asset pricing models are among the most actively studied subjects in finance today.

Factor research has to accurately reflect stock price data. Even a plausible hypothesis about the factors driving stock prices is meaningless if its results are inconsistent with actual stock price data. That’s why the 2013 Nobel Prize in Economics was awarded for the “empirical” analysis of asset prices.

So how do scholars in finance actually study factors? They first backtest all the relevant data at their disposal. The data that make up the factors include price-related information such as 12-month, 6-month, and 3-month returns from the S&P Compustat database. Other information includes market cap, corporate value, ROE, capital assets, goodwill, working capital, and more than 2,000 other fundamental data items. Unfortunately, scholars in finance have already tested whether there is a valid factor among these data items and have found almost all suitable candidates.

Since finding a new factor from a single data item is no longer possible, a better approach is to combine multiple data items and backtest the combinations. A good example is the value factor, which divides one data item by another: book value by market value. The only issue is that the number of cases to be backtested increases dramatically. Consider a function of the form [data1] [operator] [data2], like the value factor. Even if the number of operators is limited to 10 and the data candidates to just 2,000, the total number of cases is 2,000 × 10 × 2,000 = 40 million. If the function becomes more complicated, the number quickly reaches an astronomical scale, and backtesting that many cases is infeasible even on a state-of-the-art supercomputer. In the end, scholars in finance have no choice but to develop theories from their own intuition and experience, and then backtest repeatedly in the hope that those theories conform to past data.
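The growth in the number of cases can be checked with a quick count. The sketch below is only an illustration: the function names are made up for this example, operands are treated as ordered (so a / b and b / a are different cases), and longer formulas are counted as simple left-to-right chains, which understates the true number because it ignores alternative groupings.

```python
# Sizes taken from the text: 2,000 data items and 10 operators.
N_DATA = 2_000
N_OPERATORS = 10


def count_binary_formulas(n_data: int, n_ops: int) -> int:
    """Count ordered formulas of the form [data1] [operator] [data2]."""
    return n_data * n_ops * n_data


def count_chained_formulas(n_data: int, n_ops: int, k: int) -> int:
    """Count left-to-right chains with k operators and k + 1 data items.

    Alternative parenthesizations are ignored, so this is a lower bound.
    """
    return n_data ** (k + 1) * n_ops ** k


if __name__ == "__main__":
    print(f"{count_binary_formulas(N_DATA, N_OPERATORS):,}")
    for k in (1, 2, 3):
        print(k, f"{count_chained_formulas(N_DATA, N_OPERATORS, k):,}")
```

Each additional operator multiplies the count by 20,000 under these assumptions, which is why exhaustive backtesting stops being practical after only a few levels.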

We have seen the same situation in the game of Go. The number of possible board positions in Go is far beyond trillions (on the order of 10^170 legal positions). For this reason, no supercomputer was able to defeat human Go masters by brute force.

Qraft’s Factor Factory

At Qraft Technologies, Inc., we’ve been able to develop a deep learning-based reinforcement learning model expressed in a factor tree (see reference below), to find candidates with a high probability of becoming a valid factor. This model is known as Factor Factory.

The figure above represents the function (a + b) * c + 7, where a, b, and c stand for any of thousands of financial data variables, including ROE, market cap, goodwill, liabilities, stock price return, etc. The operator nodes can hold dozens of operations, such as addition, subtraction, power, z-normalize, etc. Even if the number of data candidates is limited to 2,000 and operator candidates to just 10, a combination of 3 operators and 4 variables can produce over 6,000 trillion cases, making it impossible to search through them all.
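To make the idea of a factor tree concrete, here is a minimal sketch of an expression tree in Python. It is not Qraft's implementation: the class names (Var, Const, Op), the operator table, and the sample values are assumptions made for illustration, and only binary operators are modeled (a unary operation such as z-normalize would need an extra node type).

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Var:
    """Leaf node that refers to a data item by name (e.g. ROE)."""
    name: str


@dataclass(frozen=True)
class Const:
    """Leaf node that holds a numeric constant."""
    value: float


@dataclass(frozen=True)
class Op:
    """Internal node that applies a binary operator to two subtrees."""
    symbol: str
    left: "Node"
    right: "Node"


Node = Union[Var, Const, Op]

# Illustrative operator set; a real search would use many more.
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node: Node, data: Dict[str, float]) -> float:
    """Recursively evaluate the tree for one company's data."""
    if isinstance(node, Var):
        return data[node.name]
    if isinstance(node, Const):
        return node.value
    return OPERATORS[node.symbol](evaluate(node.left, data),
                                  evaluate(node.right, data))


def to_string(node: Node) -> str:
    """Render the tree as a fully parenthesized formula."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return f"{node.value:g}"
    return f"({to_string(node.left)} {node.symbol} {to_string(node.right)})"


# The tree for (a + b) * c + 7.
tree = Op("+", Op("*", Op("+", Var("a"), Var("b")), Var("c")), Const(7.0))

if __name__ == "__main__":
    print(to_string(tree))
    print(evaluate(tree, {"a": 1.0, "b": 2.0, "c": 3.0}))
```

Representing a formula as a tree means a search procedure can build or modify candidates one node at a time, rather than enumerating whole formulas.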

In order to leverage the factor tree more effectively, we first need a data platform that:

Can produce backtest results accurately and quickly for any input (factor formula).
Can determine the validity of the factor while minimizing overfitting.
Can use a deep reinforcement learning (DRL) model to effectively narrow down valid factor candidates from a large number of cases.

At Qraft Technologies, we are taking important steps to make the process of finding excess return factors easy and smooth. Kirin API, developed by Qraft’s data scientists, integrates multiple vendors to provide both macroeconomic data and company fundamentals as correct point-in-time data. This ensures that every backtest uses only the data that was actually available at each point in time. As for the second and third criteria mentioned above, Qraft’s Factor Factory can automatically find factors that could bring excess returns and come up with new investment strategies.
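The point-in-time requirement can be illustrated with a small lookup. This is a generic sketch and does not show how Kirin API works: the function name value_as_of, the record layout, and the ROE figures are all hypothetical. The idea is that a backtest dated March 2019 must see the value released in February 2019, not a later release or revision.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# (release_date, value) pairs, sorted by release date.
Record = Tuple[date, float]


def value_as_of(records: List[Record], as_of: date) -> Optional[float]:
    """Return the latest value released on or before `as_of`.

    Returns None when nothing had been released yet, so the backtest
    cannot accidentally use information from the future.
    """
    release_dates = [released for released, _ in records]
    idx = bisect_right(release_dates, as_of)
    return records[idx - 1][1] if idx > 0 else None


# Hypothetical ROE releases for one company.
roe_history: List[Record] = [
    (date(2019, 2, 15), 0.12),
    (date(2019, 5, 10), 0.14),
    (date(2019, 8, 9), 0.11),
]

if __name__ == "__main__":
    print(value_as_of(roe_history, date(2019, 3, 31)))
    print(value_as_of(roe_history, date(2019, 1, 1)))
```

Keying the lookup on the release date rather than the fiscal period end is what prevents look-ahead bias in this sketch.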

Reference
https://bit.ly/3a3Zugj (More details of Qraft Factor Factory)
https://bit.ly/33URaeR (Experiment Report of Qraft Factor Factory)
https://bit.ly/39VskPS (AI Asset Management Process of Qraft)

Factor Factory shows interesting results.

Decreasing Alpha

The above graph shows the distribution of the 1,851 alpha factors automatically found by Factor Factory. Comparing the three time periods, period 1 (1990–1999), period 2 (2000–2009), and period 3 (2010–2019), you’ll notice that the size of the alpha decreases in the more recent periods. In addition, as dispersion shrinks and sophistication grows, finding super alphas is becoming more difficult. This is one of the reasons why large quant hedge funds spend so much time and so many resources on research.

Robustness of Alpha

The X-axis is the size of the alpha generated from the training data and the Y-axis is the size of the alpha from the test data. The plot shows whether strategies that performed well in the past can do just as well in the future, and how severe overfitting in the training phase can be. Since overfitting cannot be completely avoided, blue dots dominate the graph (the red dots mark factors whose test alpha was actually better than their alpha in the training phase). Even so, the factors that performed well during the training phase generally also showed good results in the test phase. In other words, there is a high probability that the factors that perform well in Factor Factory will perform well in the future.
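One simple way to put a number on the relationship between training and test alpha is a rank correlation, together with the share of factors whose test alpha beat their training alpha. The sketch below is an assumption about how such a check could be done, not the analysis behind the graph; the function names and the alpha values are invented for illustration.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(train: Sequence[float], test: Sequence[float]) -> float:
    """Spearman rank correlation between training and test alpha."""
    return pearson(ranks(train), ranks(test))


def share_improved(train: Sequence[float], test: Sequence[float]) -> float:
    """Fraction of factors whose test alpha exceeded their training alpha."""
    return sum(t > s for s, t in zip(train, test)) / len(train)


# Made-up alphas for four factors.
train_alpha = [0.10, 0.08, 0.05, 0.02]
test_alpha = [0.07, 0.05, 0.06, 0.01]

if __name__ == "__main__":
    print(spearman(train_alpha, test_alpha))
    print(share_improved(train_alpha, test_alpha))
```

A rank correlation close to 1 would indicate that the ordering of factors carries over to unseen data, while a small share of improved factors reflects the usual shrinkage caused by overfitting.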

A long-short portfolio constructed from these factors consistently shows good performance (out-of-sample test after 2015).
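For readers unfamiliar with the term, a long-short portfolio built from a factor can be sketched as follows. This is a simplified, equal-weighted illustration with hypothetical tickers and returns; the function name and the 20% quantile are assumptions, and real portfolio construction would also handle transaction costs, weighting, and rebalancing.

```python
from typing import Dict


def long_short_return(factor_scores: Dict[str, float],
                      next_returns: Dict[str, float],
                      quantile: float = 0.2) -> float:
    """Return of buying the top-scoring stocks and shorting the bottom ones.

    Stocks are ranked by factor score; the top and bottom `quantile`
    fractions are equal-weighted on the long and short sides.
    """
    tickers = sorted(factor_scores, key=lambda t: factor_scores[t], reverse=True)
    n = max(1, int(len(tickers) * quantile))
    longs, shorts = tickers[:n], tickers[-n:]
    long_ret = sum(next_returns[t] for t in longs) / n
    short_ret = sum(next_returns[t] for t in shorts) / n
    return long_ret - short_ret


# Hypothetical factor scores and next-period returns for five stocks.
scores = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.0, "E": -1.0}
returns = {"A": 0.05, "B": 0.02, "C": 0.01, "D": 0.00, "E": -0.03}

if __name__ == "__main__":
    print(round(long_short_return(scores, returns), 4))
```

Because the long and short sides offset each other, the result mostly reflects the factor's ability to separate winners from losers rather than the direction of the overall market.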

Reproducing Papers in Finance

Interestingly, many of the factors previously published in famous research papers have also been rediscovered through Factor Factory. This is a natural outcome, since scholars in finance and Factor Factory both use the same data. Compare this to the game of Go: AlphaGo Zero, a later version of the famous AI Go player AlphaGo, rediscovered well-known Go opening patterns through self-play, without any human game records or guidance.

After allowing Factor Factory to learn through the vast space of the search universe (based on Qraft’s GPU server), the following factors, which are known to be historically famous, have been discovered:

Given only financial data, AI was able to rediscover famous factors that scholars in finance have found in the past. (The factors from Factor Factory don’t come out exactly the same as the factors from academia, but their meanings are similar.) Interestingly, the order of discovery was roughly the same as the order in which the papers were published. Of course, the factors discovered earlier by scholars in finance are relatively simple, so they were easier to find. Over time, however, AI was able to locate more complex factors as well.

In the above image, Factor A is a momentum-based factor: stocks with a large value of (12-month return minus the most recent 6-month return) bring excess profits. Interestingly, this is in line with a famous research paper titled “Is Momentum Really Momentum?”, published in the Journal of Financial Economics in 2012 by the renowned scholar Robert Novy-Marx.

That paper won the Fama-DFA prize in 2012, a prize jointly created by Professor Eugene Fama and an investment firm called Dimensional Fund Advisors. Dimensional Fund Advisors is famous for its excellent research on asset pricing models. While the value of an asset pricing model relies not just on the discovery of factors but also on their implications and interpretation, the idea that AI can contribute to winning an economics prize doesn’t seem so far-fetched anymore.
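Factor A as described above (12-month return minus the most recent 6-month return) can be sketched directly from month-end prices. This follows the wording of this article rather than the exact construction in the cited paper, which measures returns over the period from 12 to 7 months before portfolio formation; the function names and the price series are made up for illustration.

```python
from typing import Sequence


def trailing_return(prices: Sequence[float], months: int) -> float:
    """Simple return over the last `months` months of month-end prices."""
    if len(prices) < months + 1:
        raise ValueError("not enough price history")
    return prices[-1] / prices[-1 - months] - 1.0


def factor_a(prices: Sequence[float]) -> float:
    """12-month return minus the most recent 6-month return."""
    return trailing_return(prices, 12) - trailing_return(prices, 6)


# Hypothetical month-end prices: 13 points cover a 12-month window.
prices = [100.0, 104.0, 108.0, 112.0, 115.0, 118.0, 120.0,
          121.0, 122.0, 123.0, 124.0, 125.0, 126.0]

if __name__ == "__main__":
    print(round(trailing_return(prices, 12), 4))
    print(round(trailing_return(prices, 6), 4))
    print(round(factor_a(prices), 4))
```

In this made-up series most of the gain came in the earlier six months, so the factor value is high even though recent performance is modest.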

Economic Thesis by AI

Just as people cannot beat AI in the game of Go, it can be challenging for people to compete against AI in the field of factor search. Looking at the complex structure of the factors automatically found by Factor Factory, you will see that it is almost impossible for a human to find similar factors using a top-down approach (setting up a hypothesis and then testing it). That’s because AI can search through the vast search space and find probable factor candidates much faster than any human can. That is not to undermine human ability, but to state a fact. Humans, on the other hand, are much better than AI at reasoning about and interpreting factors.

Therefore, it is safe to assume that the future of financial research on market anomalies will move towards a working collaboration between human scholars in finance and AI technology. While AI can explore and find complex factors, humans can interpret the meaning of the factors and explain their reasoning in research papers. The game of Go has already moved in this direction since the introduction of AlphaGo.

At Qraft Technologies, Inc., beyond applying the factors found by Factor Factory at asset management firms, our team is also working to publish a series of papers in finance to highlight the collaboration between AI and humans. If this project succeeds, Qraft’s Factor Factory, or the engineers in charge of it, will be credited as co-authors of papers in well-known finance journals. Just as AlphaGo was given several awards, perhaps Factor Factory will also leave a legacy of its own.

More on Factor Factory and the Ultimate AI Hedge Fund Model

To maintain consistency in bringing excess returns, the following formula must be satisfied:

Velocity(Finding New Alpha) * Size(New Alpha) >
Velocity(Decay of Existing Alpha) * Size(Existing Alpha)

Due to the zero-sum nature of the asset management industry, maintaining a steady excess return is possible only if new alpha is discovered faster than existing alpha disappears. In other words, excess return strategies become useless once people start to figure them out. When markets change at unprecedented speed, even famous hedge funds like Renaissance Technologies and Bridgewater Associates have experienced drawdowns of 20%.

The quant crash phenomenon, in which large quant funds suffer huge losses, occurred in August 2007 and again during the recent COVID-19 outbreak. This type of loss is becoming increasingly frequent because quant funds find and use similar alpha strategies. The issue can only be resolved by more quickly finding complex factors that others may not be familiar with, and by learning from new datasets to adapt to the market quickly. Fortunately, AI has clear strengths in this area. In fact, Qraft outperformed the markets during the COVID-19 outbreak.

Qraft’s Factor Factory has the potential to substantially expand its alpha search method. The current version of Factor Factory offers a certain degree of freedom in settings such as investment universe, period, and portfolio construction method, which makes it better suited to asset management companies or to papers in finance. However, by extending the time and types of data covered, as well as expanding the learning model, automatic search for mid-frequency and high-frequency investment strategies can soon become possible. While the search space will grow exponentially, along with the demand for better learning models and computing power, the asset management industry is one field that, unlike the game of Go, can remain economically feasible by bringing profits far beyond the break-even point (BEP) for search costs.

Disclaimer

Past performance may not be indicative of future results.

This material was prepared for informational purposes and cannot be used for the purpose of soliciting the sale of financial investment products such as funds.

This document contains content that is patent-pending or covered by patents registered by Qraft Technologies, Inc.

Can AI Win the Nobel Prize in Economics?