A deep learning architecture to apply Dow theory in stock price prediction

Which features are useful for predicting a stock price? Obvious choices include:

Previous stock prices
Technical indicators (OBV, RSI, MACD, …)
Financial statements
News

When predicting the stock price of, say, AAPL, it’s critical to consider the prices of other stocks, e.g., MSFT, GOOG, META, and NVDA. If every ticker on the NASDAQ is going down, we can say with confidence that we’re in a bear market.

Dow theory is where this idea comes from. It states that the Dow Jones Industrial Average and the Dow Jones Transportation Average should confirm each other, e.g., in a strong market, both should go up. As a result, we add the following signal to the list:

Other stocks’ prices
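
The confirmation idea can be illustrated with a small check. This is only a rough sketch: it reduces "confirmation" to both series ending a window in the same direction, which is a simplification of Dow theory, and the function names and price values are made up for illustration.

```python
def trend(prices):
    """Return +1 if the series ended higher than it started, -1 if lower, 0 if flat."""
    change = prices[-1] - prices[0]
    return (change > 0) - (change < 0)


def confirms(industrial, transport):
    """True when both averages moved in the same non-flat direction."""
    direction = trend(industrial)
    return direction != 0 and direction == trend(transport)


# Illustrative numbers only, not real index values.
industrial_up = [100.0, 102.0, 105.0]
transport_up = [50.0, 51.0, 53.0]
transport_down = [50.0, 49.0, 48.0]

both_rising = confirms(industrial_up, transport_up)    # the two averages agree
diverging = confirms(industrial_up, transport_down)    # the two averages disagree
```

The same kind of check could be applied to any pair of signals, which is the intuition behind feeding many stocks’ prices to the model at once.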

Extending that observation, we could conclude that not only stock prices but also all the other market indicators, such as interest rates, oil prices, the VIX, currencies, and market indices, should confirm each other.

As a result, the following multitask deep learning architecture emerges. In this architecture, the combination of all the inputs described above is fed to a set of shared layers. Those layers sit in the middle and determine the general state of the market. Their outputs are fed to separate sets of layers that predict the stock price, USD/KRW, USD/JPY, the NASDAQ-100, and so on.
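
To make the shape of this architecture concrete, here is a minimal forward-pass sketch in plain Python. It is not the actual model: the layer sizes, the tanh activation, the random weights, and names such as MultiTaskNet are assumptions chosen for illustration, and a real project would likely use a deep learning framework and a training loop instead of these hand-written helpers.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Create a randomly initialised dense layer as (weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, x, activation=None):
    """Apply a dense layer to vector x, optionally followed by an activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


class MultiTaskNet:
    """Shared layers followed by one small output head per prediction target."""

    def __init__(self, n_inputs, n_hidden, targets, seed=0):
        rng = random.Random(seed)
        # Shared layers: meant to summarise the general state of the market.
        self.shared = [dense(n_inputs, n_hidden, rng), dense(n_hidden, n_hidden, rng)]
        # One head per target, all reading the same shared representation.
        self.heads = {name: dense(n_hidden, 1, rng) for name in targets}

    def predict(self, x):
        h = x
        for layer in self.shared:
            h = forward(layer, h, math.tanh)
        return {name: forward(head, h)[0] for name, head in self.heads.items()}


# Hypothetical target names and input size.
TARGETS = ["stock_price", "usdkrw", "usdjpy", "nasdaq100"]
net = MultiTaskNet(n_inputs=8, n_hidden=16, targets=TARGETS)
predictions = net.predict([0.1] * 8)
```

Every head reads the same hidden vector, so whatever the shared layers discard is unavailable to all of the heads.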

However, the architecture above fails to predict the stock price well. Although the input contains a list of (ticker, stock price, financial statements) tuples, that information is forgotten in the shared layers, since an individual stock’s price isn’t very useful for predicting currencies, market indices, and so on. The non-stock-price targets outnumber the single stock price target, so it’s natural for the model to forget.

One way to address this is to use skip connections.

While keeping everything else the same, adding a skip connection that carries the company features (ticker, stock price, financial statements) past the shared layers drastically improves stock price prediction performance.
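
A skip connection of this kind might look like the sketch below, again in plain Python with random weights. The assumptions are mine, not details from the model above: the company features are treated as an already-numeric vector (a ticker would need some encoding first), the skip is implemented by concatenating them onto the shared output, and all names and sizes are hypothetical.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Create a randomly initialised dense layer as (weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x, activation=None):
    """Apply a dense layer to vector x, optionally followed by an activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


rng = random.Random(0)
N_MARKET, N_COMPANY, N_HIDDEN = 6, 3, 16  # hypothetical feature and layer sizes

# The shared layer sees everything, as in the architecture without the skip.
shared = dense(N_MARKET + N_COMPANY, N_HIDDEN, rng)
# Market-wide heads read only the shared representation.
market_heads = {name: dense(N_HIDDEN, 1, rng) for name in ["usdkrw", "usdjpy", "nasdaq100"]}
# The stock head reads the shared representation plus the raw company features.
stock_head = dense(N_HIDDEN + N_COMPANY, 1, rng)


def predict(market_features, company_features):
    h = forward(shared, market_features + company_features, math.tanh)
    outputs = {name: forward(head, h)[0] for name, head in market_heads.items()}
    # Skip connection: company features bypass the shared layer via concatenation.
    outputs["stock_price"] = forward(stock_head, h + company_features)[0]
    return outputs


outputs = predict([0.1] * N_MARKET, [0.5, -0.2, 0.3])
```

Because the stock head receives the company features directly, it no longer depends on the shared layers to preserve them.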

With this architecture, prediction accuracy on the training data increases about 7x faster than with a network whose layers predict only the stock price.