Factor Factory Series 1: A framework to find market anomalies and potential factors

What is Factor Investing?

Even readers unfamiliar with the term factor investing have probably heard of Warren Buffett. A disciple of Benjamin Graham, who is known as the founder of value investing, Buffett is a legendary investor with countless epithets, including the genius of investment and the Oracle of Omaha. He analyzed companies' financial statements and invested according to the following rule:

“Buy firms that are deemed undervalued in the market among blue-chip companies whose financial structure is sound and generating steady cash flows.”

Factor investing shares this trait: it invests according to principles. It is a methodology that invests in companies with characteristics known to generate excess returns over the market. For example, the value factor is measured by the ratio of book value to market value. Apart from the relatively recent period, high-value companies (those with a high book-to-market ratio) have been reported to outperform low-value companies on most exchanges around the world. Beyond well-known factors such as value, there are countless other factors in the market.
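As a minimal illustration of the book-to-market measure mentioned above, the sketch below ranks three firms from high value to low value. The tickers and figures are made up for the example and do not come from any data set.

```python
# Hypothetical firms and figures (millions of USD), for illustration only.
firms = {
    "AAA": {"book_equity": 500.0, "market_cap": 400.0},
    "BBB": {"book_equity": 200.0, "market_cap": 1000.0},
    "CCC": {"book_equity": 300.0, "market_cap": 600.0},
}


def book_to_market(firm):
    """Value measure: book value of equity divided by market value."""
    return firm["book_equity"] / firm["market_cap"]


# Sort from the highest book-to-market ratio (high value) to the lowest.
ranked = sorted(firms, key=lambda name: book_to_market(firms[name]), reverse=True)
print(ranked)
```

A value strategy in this spirit would favor the names at the front of the list over those at the back.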

Background

Market anomalies are usually uncovered through inductive research because, as the name implies, their causes are difficult to explain clearly. Anomalies are numerous, and there is no uniform process for verifying them. As a result, finding and evaluating market anomalies has been limited to researchers with prior knowledge of factor investing. Because traditional methods rely heavily on researcher expertise, discovering new anomalies requires many researchers and a great deal of time.

The Factor Factory, developed by Qraft Technologies, is a framework that automates the exploration of market anomalies to solve these problems. By modeling and quantifying how anomalies are constructed and evaluated, the Factor Factory turns the search for new anomalies into a problem that computers can solve without human intervention. In addition, a search algorithm based on machine learning optimizes the search process so that a wide range of anomalies can be explored efficiently. Factor Factory processes all of its data, including S&P Compustat data, through Kirin API, a data ETL API developed by Qraft Technologies. It explores anomalies with robust alpha by combining the data retrieved through Kirin API in the way proposed by the search algorithm.

Main Components

The way Factor Factory explores new anomalies is similar to the way a restaurant develops new menu items. The Factor Factory consists of five main components, which together determine which ingredients go into a dish and how it is cooked. They are Universe, which holds the ingredients to be cooked; SearchAlgorithm and PortfolioMethodology, which correspond to the chefs; FactorGym, which corresponds to the kitchen where the dish is made; and MetricFn, which evaluates the finished dish.

1. Universe

Universe determines the pool of stocks from which a portfolio is formed, which corresponds to the ingredients. Universe covers a wide range of options, including exchange (NYSE, NASDAQ, AMEX, etc.), share class (e.g., Class A only), type of security (common, preferred, ADR), time period, and market capitalization. It is a simple but important step, because the same factor can produce very different results depending on the universe setting.
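The filtering role described above can be sketched as follows. This is a minimal illustration, not Factor Factory's actual interface: the `Stock` and `Universe` classes, their fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stock:
    # Hypothetical record layout, for illustration only.
    ticker: str
    exchange: str
    security_type: str
    market_cap: float  # millions of USD


@dataclass(frozen=True)
class Universe:
    # Hypothetical subset of the options listed in the text.
    exchanges: frozenset
    security_types: frozenset
    min_market_cap: float

    def contains(self, stock):
        """True when the stock passes every universe filter."""
        return (
            stock.exchange in self.exchanges
            and stock.security_type in self.security_types
            and stock.market_cap >= self.min_market_cap
        )

    def select(self, stocks):
        """Keep only the stocks that belong to this universe."""
        return [s for s in stocks if self.contains(s)]


stocks = [
    Stock("AAA", "NYSE", "Common", 5000.0),
    Stock("BBB", "NASDAQ", "ADR", 8000.0),
    Stock("CCC", "AMEX", "Common", 50.0),
    Stock("DDD", "NASDAQ", "Common", 1200.0),
]
universe = Universe(
    exchanges=frozenset({"NYSE", "NASDAQ"}),
    security_types=frozenset({"Common"}),
    min_market_cap=100.0,
)
selected = [s.ticker for s in universe.select(stocks)]
print(selected)
```

Changing any one filter, for example admitting ADRs, changes the pool and can therefore change the measured performance of the same factor.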

2. FactorGym

FactorGym corresponds to the kitchen where the dish is made. Once the ingredients have been selected in Universe, a new dish (portfolio) is created in the way suggested by the chefs (SearchAlgorithm and PortfolioMethodology). The dish is represented by an expression tree called the Factor Tree. The chef completes one Factor Tree by repeatedly inspecting the ingredients (nodes) chosen so far and selecting the next one. The nodes that can make up a Factor Tree fall into three types: Operator, Data, and Normalizer. The tree is built by stacking Operator nodes and Normalizer nodes alternately, with Data nodes placed at the leaves.
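To make the node types concrete, here is a minimal sketch of such an expression tree evaluated on one cross-section of stocks. The class names mirror the three node types in the text, but the fields, the rank normalizer, and the sample data are assumptions made for illustration; the actual Factor Factory implementation is not shown in this article.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataNode:
    field: str  # hypothetical data item name

    def evaluate(self, cross_section):
        return {t: row[self.field] for t, row in cross_section.items()}


@dataclass(frozen=True)
class NormalizerNode:
    child: object

    def evaluate(self, cross_section):
        # Assumed normalizer: cross-sectional rank scaled to [0, 1].
        values = self.child.evaluate(cross_section)
        order = sorted(values, key=values.get)
        scale = max(len(order) - 1, 1)
        return {t: i / scale for i, t in enumerate(order)}


@dataclass(frozen=True)
class OperatorNode:
    op: Callable[[float, float], float]
    left: object
    right: object

    def evaluate(self, cross_section):
        lhs = self.left.evaluate(cross_section)
        rhs = self.right.evaluate(cross_section)
        return {t: self.op(lhs[t], rhs[t]) for t in lhs}


# Operator on top, Normalizers below it, Data nodes at the leaves.
tree = OperatorNode(
    operator.sub,
    NormalizerNode(DataNode("ret_12m")),
    NormalizerNode(DataNode("ret_6m")),
)

# Made-up cross-section of three stocks.
cross_section = {
    "AAA": {"ret_12m": 0.30, "ret_6m": 0.05},
    "BBB": {"ret_12m": 0.10, "ret_6m": 0.20},
    "CCC": {"ret_12m": 0.20, "ret_6m": 0.10},
}
scores = tree.evaluate(cross_section)
print(scores)
```

Each stock ends up with one score, which a portfolio rule can then turn into holdings.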

3. SearchAlgorithm

To make a good dish, the characteristics of each ingredient and the harmony among them must be fully considered. SearchAlgorithm determines which ingredients (nodes) are used to make a new dish. There are two main ways to choose ingredients. The first is to vary combinations of ingredients that have been successful in the past. This approach is more likely to succeed, but it tends to produce dishes similar to earlier ones. The second is to try an entirely new combination of ingredients, which involves many attempts and failures but can also produce an entirely new successful dish. Applying these ideas, the Factor Factory implements two kinds of search algorithms: machine learning methods and random search.

Random search, as the name suggests, completes a factor tree by recommending nodes at random. Along with grid search, it is among the simplest search algorithms. Grid search is virtually impossible to apply here because the number of possible factor tree configurations is effectively unbounded. Random search is less efficient, but it carries less risk of bias than other search algorithms, and in comparisons with other search algorithms it often performs better. Also, unlike the machine learning-based search described below, the search itself has little overhead, so the search process is quite fast.
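A random search of this kind can be sketched in a few lines. The sketch below is an illustration under stated assumptions: trees are plain nested tuples, the node vocabularies are invented, and `score_fn` stands in for whatever evaluation is applied to a finished tree. The toy score used at the end simply counts data nodes.

```python
import random

# Hypothetical node vocabularies, for illustration only.
OPERATORS = ["add", "sub", "mul", "div"]
NORMALIZERS = ["rank", "zscore"]
DATA_FIELDS = ["book_equity", "market_cap", "ret_12m", "ret_6m"]


def random_factor_tree(rng, max_operators):
    """Build a tree with Operator and Normalizer nodes alternating, ending in Data nodes."""
    if max_operators == 0 or rng.random() < 0.3:
        return ("data", rng.choice(DATA_FIELDS))
    children = tuple(
        ("normalizer", rng.choice(NORMALIZERS), random_factor_tree(rng, max_operators - 1))
        for _ in range(2)
    )
    return ("operator", rng.choice(OPERATORS), *children)


def count_data_nodes(tree):
    """Number of Data leaves in a tree."""
    if tree[0] == "data":
        return 1
    return sum(count_data_nodes(child) for child in tree[2:])


def random_search(score_fn, n_trials, seed=0, max_operators=3):
    """Sample trees at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_tree, best_score = None, float("-inf")
    for _ in range(n_trials):
        tree = random_factor_tree(rng, max_operators)
        score = score_fn(tree)
        if score > best_score:
            best_tree, best_score = tree, score
    return best_tree, best_score


# Toy objective: prefer trees with more data nodes (not a real factor metric).
best_tree, best_score = random_search(count_data_nodes, n_trials=50, seed=42)
print(best_score)
```

Nothing is learned between trials, which is why the per-trial overhead is small and also why large spaces are covered inefficiently.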

The limitation of random search is that it explores the space inefficiently. Combining more than 100 financial data items with operators yields an enormous number of possible factor trees. To explore such a large space effectively, a method other than random search is needed. Methods such as Bayesian optimization can be good options. Factor Factory applies deep learning, an area that has recently become prominent, and in particular methodologies based on reinforcement learning. The neural network structure was modified and improved to suit factor search, building on Prajit et al. (2017).

4. PortfolioMethodology

PortfolioMethodology determines how the selected ingredients are cooked. Even with the same ingredients, the result can differ depending on how they are cooked. PortfolioMethodology sets the various elements required to construct a portfolio: the rebalancing cycle, the portfolio inclusion criterion (e.g., the evaluated Expression Tree score is in the top 25% of NYSE stocks), the minimum number of stocks in a portfolio, time series length limits, sector neutralization, and so on.
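The inclusion rule given as an example above (score in the top 25% of NYSE stocks) can be sketched as follows. The function names, the equal weighting, and the sample scores are assumptions made for illustration; the text does not specify how Factor Factory weights the selected stocks.

```python
def nyse_breakpoint(scores, exchanges, top_fraction=0.25):
    """Lowest score that still falls in the top fraction of NYSE stocks."""
    nyse_scores = sorted(
        (score for ticker, score in scores.items() if exchanges[ticker] == "NYSE"),
        reverse=True,
    )
    k = max(1, int(len(nyse_scores) * top_fraction))
    return nyse_scores[k - 1]


def build_portfolio(scores, exchanges, top_fraction=0.25, min_stocks=2):
    """Equal-weight every stock at or above the NYSE breakpoint (weighting is an assumption)."""
    cutoff = nyse_breakpoint(scores, exchanges, top_fraction)
    picks = [ticker for ticker, score in scores.items() if score >= cutoff]
    if len(picks) < min_stocks:
        return {}  # too few names: skip this rebalancing date
    weight = 1.0 / len(picks)
    return {ticker: weight for ticker in picks}


# Made-up factor scores and listings.
scores = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.1, "E": 0.95, "F": 0.3}
exchanges = {"A": "NYSE", "B": "NYSE", "C": "NYSE", "D": "NYSE", "E": "NASDAQ", "F": "NASDAQ"}
portfolio = build_portfolio(scores, exchanges)
print(portfolio)
```

The breakpoint is computed from NYSE stocks only but applied to every stock in the universe, so a NASDAQ stock with a high enough score is also included.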

5. MetricFn

A newly made dish can be evaluated from various perspectives, including taste, plating, harmony of ingredients, marketability, and how well it fits with the items already on the restaurant's menu. MetricFn corresponds to this evaluation process. Factor Factory evaluates the factors it explores from various perspectives (regression alpha, RankIC, Sharpe, Sortino, MDD, etc.).
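Three of the metrics named above can be computed from a series of periodic returns as in the sketch below. It assumes monthly returns, a zero risk-free rate, and a zero target return for the Sortino ratio; the function names and sample returns are illustrative, not Factor Factory's own.

```python
import math
import statistics


def sharpe_ratio(returns, periods_per_year=12):
    """Annualized mean return per unit of volatility (risk-free rate assumed zero)."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def sortino_ratio(returns, periods_per_year=12):
    """Like Sharpe, but penalizes only returns below zero.

    Requires at least one negative return, otherwise the downside deviation is zero.
    """
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    return statistics.mean(returns) / downside * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of cumulative wealth, as a negative fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


# Made-up monthly returns.
monthly_returns = [0.02, -0.01, 0.03, -0.02, 0.04]
print(sharpe_ratio(monthly_returns), sortino_ratio(monthly_returns), max_drawdown(monthly_returns))
```

A metric function in this spirit would return one or more such numbers per candidate factor so that the search can compare candidates.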

Examples

The complexity of a portfolio explored by the Factor Factory can be gauged by its depth. Depth is the depth of the factor tree, determined by the number of NormalizerNodes, OperatorNodes, and DataNodes that make up the tree and by how they are combined. In general, the greater the depth, the greater the complexity. The following figure visualizes tree structures ranging from a relatively simple Factor Tree (A) to a complex Factor Tree (C) of depth 7.

The Expression Trees produced by the Factor Factory are typically very difficult to interpret. Interestingly, however, some of the anomalies that the Factor Factory found inductively are consistent with results from prior research. Factor A is a market anomaly that performed well in the past by holding stocks selected on a momentum signal computed as the past 12-month return minus the most recent six-month return. In other words, the return over months 7–12 contributes more to the momentum effect than the most recent six-month return. This is similar to the result of R. Novy-Marx (2012).

[ Figure 04. Tree structure of Factor A~C ]
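The signal behind Factor A, as described in the text, can be sketched as below. The compounding convention, the function names, and the two sample stocks are assumptions made for illustration; the exact node-level definition is given only in the figure.

```python
def cumulative_return(monthly_returns):
    """Compound a list of monthly returns into one holding-period return."""
    total = 1.0
    for r in monthly_returns:
        total *= 1.0 + r
    return total - 1.0


def factor_a_signal(last_12_monthly_returns):
    """Past 12-month return minus the most recent 6-month return (oldest month first)."""
    if len(last_12_monthly_returns) != 12:
        raise ValueError("expected exactly 12 monthly returns")
    ret_12m = cumulative_return(last_12_monthly_returns)
    ret_6m = cumulative_return(last_12_monthly_returns[-6:])
    return ret_12m - ret_6m


# Two made-up stocks with the same 12-month return but different timing.
early_winner = [0.05] * 6 + [0.0] * 6  # gains came in months 7-12 ago
late_winner = [0.0] * 6 + [0.05] * 6   # gains came in the latest 6 months
signal_early = factor_a_signal(early_winner)
signal_late = factor_a_signal(late_winner)
print(signal_early, signal_late)
```

Both stocks have the same 12-month return, yet the signal favors the one whose gains came in the earlier half of the window.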

The following figure visualizes the tree structure of some of the market anomalies explored in the Factor Factory (F Factor, G Factor).

[ Figure 05. Tree structure of Factor F and G ]

The table below shows basic performance statistics for Factor F and Factor G. The analysis splits the entire period into a training period (1992.01–2015.04) and a test period (2015.05–2020.04). Both Factor F and Factor G performed well against the benchmark in both sub-periods. The Sharpe ratio, which measures return per unit of risk, increased significantly, and the MDD (maximum drawdown), a risk indicator, also improved.

[ Table 01. Summary statistics of Factor F and G returns ]

[ Figure 06. Cumulative return of portfolio. Factor F (top), Factor G (bottom) ]

Conclusions

The inductive factor search of the Factor Factory can find a variety of market anomalies that are difficult to find with conventional deductive thinking. The speed of exploration is also noteworthy: hundreds of significant market anomalies can be explored in a day on a personal computer. In this way, the Factor Factory demonstrates the potential of automated anomaly discovery by producing robust alpha that is usable in practice.

Recent academic studies of new factors are conducted deductively. For example, Hou (2014) proposed a new factor model, the Q-factor model, with economic implications based on a general equilibrium model. The most widely known factor model at present, the Fama-French (1993) 3-factor model, is based on market anomalies (size, book-to-market). So far, the Factor Factory, despite its name, has focused on exploring market anomalies. The methodology of deriving factors from market anomalies is still considered valid. The numerous anomalies found by the Factor Factory suggest the possibility of discovering new factors, which would let the Factor Factory live up to its name.

On the other hand, the current Factor Factory has limitations. The first is low explanatory power. What a derived factor tree means economically is left to the user to work out, and it is not easy to give an economic interpretation to an arbitrary factor formula. Even though the training and test processes are strictly separated and the significance of each factor is rigorously verified statistically, the absence of an economic rationale is a clear limitation of the Factor Factory. Second, the relationship between factors is not considered. Because reinforcement learning searches for factor trees inductively, it is highly likely to fit toward factor formulas that have already been found, or to keep recycling sub-trees of earlier factor trees. This increases the correlation between factors, which consequently share the same unsystematic risk.
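One simple way to detect the overlap described above would be to compare the return series of discovered factors pairwise. The sketch below is not part of the Factor Factory as described here; the function names, the 0.9 threshold, and the sample return series are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(factor_returns, threshold=0.9):
    """Pairs of factors whose returns are correlated above the (assumed) threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(factor_returns), 2)
        if pearson(factor_returns[a], factor_returns[b]) > threshold
    ]


# Made-up return series: F_variant reuses most of F's structure, G does not.
factor_returns = {
    "F": [0.01, -0.02, 0.03, 0.00],
    "F_variant": [0.012, -0.018, 0.031, 0.001],
    "G": [-0.01, 0.02, 0.00, 0.01],
}
pairs = redundant_pairs(factor_returns)
print(pairs)
```

Flagged pairs could then be treated as one factor, or the search could be penalized for rediscovering them.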

Factor Factory is currently an early-stage project, and many areas remain open for development. If we overcome the limitations above and continue to develop it, we will be able to bring the numerous factors hidden in the financial world to the surface.
