Statistical arbitrage on the KOSPI 200: An exploratory analysis of classification and prediction machine learning algorithms for day trading

Sutherland, I.; Jung, Y.; Lee, G.

Article abstract

Journal of Economics and International Business Management

Research Article | Published June 2018 | Volume 6, Issue 1, pp. 10-19

Ian Sutherland

Yesuk Jung*

Gunhee Lee

Department of Business Analytics, Sogang Business School, Sogang University, Seoul, South Korea.

Citation: Sutherland I, Jung Y, Lee G (2018). Statistical arbitrage on the KOSPI 200: An exploratory analysis of classification and prediction machine learning algorithms for day trading. J. Econ. Int. Bus. Manage. 6(1): 10-19.

Abstract

Machine learning has become ubiquitous in the financial field for asset selection. In this study, several machine learning methods were tested on the representative stock market index of South Korea, the Korean Composite Stock Price Index (KOSPI) 200. Compared to other major global stock markets, the KOSPI has remained relatively flat over time. Despite this extremely low overall market growth, all of the tested models achieved annualized returns between 2.4 and 7.5 times that of the KOSPI 200 index over the same period. Even after applying an overestimated transaction fee of 0.5% per daily trade, the models beat the market by a notable margin. While all tested models outperformed the market significantly, some models outperformed others. A highlight of the present research is determining whether predicting class labels or predicting values is preferable in machine learning-driven daily trading algorithms. Four classification models (logistic regression, random forest, deep neural network, and gradient-boosted trees) are compared to four prediction models (multiple regression, random forest, deep neural network, and gradient-boosted regression trees). Additionally, an equally weighted ensemble of the classification models is compared to an equally weighted ensemble of the prediction models. Across the ten models, classification techniques tended to outcompete prediction techniques by a slight margin, and all models outperformed the market.
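As a rough illustration of the quantities named in the abstract, the following Python sketch shows one way an equally weighted ensemble, a 0.5% fee per daily trade, and an annualized return could be computed. It is not the authors' implementation: the function names, the 252-trading-day year, the choice to subtract the fee directly from each day's gross return, and all sample numbers are assumptions made for illustration only.

```python
from statistics import mean

FEE_PER_TRADE = 0.005  # 0.5% per daily trade, the fee level stated in the abstract
TRADING_DAYS = 252     # assumption: trading days per year used for annualization


def equal_weight_ensemble(model_outputs):
    """Average the outputs of several models with equal weights.

    For classifiers the outputs could be probabilities of an "up" day;
    for prediction models they could be forecast returns.
    """
    return mean(model_outputs)


def net_daily_return(gross_return, fee=FEE_PER_TRADE):
    """Deduct the per-trade fee from one day's gross return (assumed fee model)."""
    return gross_return - fee


def annualized_return(daily_returns, periods=TRADING_DAYS):
    """Compound the daily returns and rescale the growth to one year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (periods / len(daily_returns)) - 1.0


if __name__ == "__main__":
    # Hypothetical "up" probabilities from four classifiers for one stock.
    score = equal_weight_ensemble([0.60, 0.50, 0.70, 0.60])
    # Hypothetical gross daily returns of a strategy over five days.
    gross = [0.012, -0.004, 0.009, 0.015, 0.003]
    net = [net_daily_return(r) for r in gross]
    print("ensemble score:", score)
    print("annualized gross:", annualized_return(gross))
    print("annualized net:", annualized_return(net))
```

Comparing the gross and net figures makes the effect of the fee assumption visible; a different cost model (for example, charging the fee only on days when the position changes) would give different net results.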

Keywords: Statistical arbitrage, machine learning, random forests, gradient boosted trees, deep neural networks

This article is published under the terms of the Creative Commons Attribution License 4.0
