Article title

Training subset selection for support vector regression

Authors
Conference
Federated Conference on Computer Science and Information Systems (14th; 1–4 September 2019; Leipzig, Germany)
Publication languages
EN
Abstracts
EN
As more and more data become available, training a machine learning model can become intractable, especially for complex models such as Support Vector Regression (SVR), whose training requires solving a large quadratic programming optimization problem. Selecting a small data subset that effectively represents the characteristic features of the training data and preserves their distribution is an efficient way to address this problem. This paper proposes a systematic approach to selecting the most representative data for SVR training. The distributions of both the predictor and response variables are preserved in the selected subset via a 2-layer data clustering strategy, and a 2-layer step-wise greedy algorithm is introduced to select the best data points for constructing a reduced training set. The proposed method was applied to predicting deck win rates in the Clash Royale Challenge, in which 10 subsets containing hundreds of data examples each were selected from 100k records to train 10 SVR models, maximizing their prediction performance as evaluated by the R-squared metric. Our final submission, with an R2 score of 0.225682, won 3rd place among over 1200 solutions submitted by 115 teams.
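To make the two steps above concrete, the following is a minimal sketch of the idea, not the authors' implementation: a 2-layer clustering pass that partitions the data first by predictor values and then by response values within each predictor cluster, followed by a step-wise greedy search that grows the training subset with whichever candidate point most improves validation R-squared. The paper's experiments used MATLAB's fitrsvm (see reference 21 below); this sketch substitutes scikit-learn's SVR and KMeans, and every function name, cluster count, and SVR setting here is an illustrative assumption.

```python
# A minimal sketch of 2-layer clustering plus greedy subset selection for SVR.
# Not the authors' code: the paper used MATLAB's fitrsvm; scikit-learn's SVR
# and KMeans are substituted here, and all parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import r2_score
from sklearn.svm import SVR

def two_layer_cluster(X, y, n_x_clusters=20, n_y_clusters=5, seed=0):
    """Layer 1 clusters on the predictors; layer 2 clusters on the response
    within each predictor cluster, so the selected subset can preserve the
    distribution of both."""
    labels = np.empty(len(X), dtype=int)
    top = KMeans(n_clusters=n_x_clusters, n_init=10, random_state=seed).fit(X)
    for c in range(n_x_clusters):
        idx = np.where(top.labels_ == c)[0]
        if len(idx) == 0:
            continue
        k = min(n_y_clusters, len(idx))
        sub = KMeans(n_clusters=k, n_init=10, random_state=seed)
        sub_labels = sub.fit_predict(y[idx].reshape(-1, 1))
        labels[idx] = c * n_y_clusters + sub_labels  # unique 2-layer cluster id
    return labels

def greedy_select(X, y, X_val, y_val, labels, budget=200, pool=50, seed=0):
    """Step-wise greedy selection: seed the subset with one point per 2-layer
    cluster, then repeatedly add the sampled candidate that most improves
    validation R^2, stopping when no candidate helps or the budget is hit."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.choice(np.where(labels == c)[0])) for c in np.unique(labels)]
    remaining = sorted(set(range(len(X))) - set(selected))
    best = -np.inf
    while len(selected) < budget and remaining:
        candidates = rng.choice(remaining, size=min(pool, len(remaining)), replace=False)
        scores = []
        for i in candidates:
            trial = selected + [int(i)]
            model = SVR(kernel="rbf", C=1.0).fit(X[trial], y[trial])
            scores.append(r2_score(y_val, model.predict(X_val)))
        j = int(np.argmax(scores))
        if scores[j] <= best:
            break  # no sampled candidate improves the validation score
        best = scores[j]
        selected.append(int(candidates[j]))
        remaining.remove(int(candidates[j]))
    return np.array(selected), best

# Hypothetical usage with random data (shapes only, not the contest data):
X, y = np.random.rand(2000, 8), np.random.rand(2000)
labels = two_layer_cluster(X[:1600], y[:1600])
subset, val_r2 = greedy_select(X[:1600], y[:1600], X[1600:], y[1600:], labels)
```

Sampling a limited pool of candidates per greedy step keeps the number of SVR refits manageable, and the stopping rule ends the search as soon as no sampled candidate improves the validation score.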
Pages
11–14
Physical description
Bibliography: 21 items, formulas, figures.
Creators
author
  • Ngee Ann Polytechnic, Singapore
author
  • Nanyang Polytechnic, Singapore
Bibliography
  • 1. B. Marr, "How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read," https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/ae77e2560ba9, 2018.
  • 2. B.E. Boser, I.M. Guyon, and V. Vapnik, "A training algorithm for optimal margin classifiers," Proceedings of the Annual Conference on Computational Learning Theory, ACM, pp. 144–152, Pittsburgh, PA, 1992.
  • 3. I. Guyon, B. Boser, and V. Vapnik, "Automatic capacity tuning of very large VC-dimension classifiers," Advances in Neural Information Processing Systems 5, pp. 147–155, Morgan Kaufmann Publishers, 1993.
  • 4. C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273–297, 1995.
  • 5. B. Schölkopf, C. Burges, and V. Vapnik, "Extracting support data for a given task," Proceedings of First International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1995.
  • 6. B. Schölkopf, C. Burges, and V. Vapnik, "Incorporating invariances in support vector learning machines," Artificial Neural Networks, Springer Lecture Notes in Computer Science, Vol. 1112, pp. 47–52, Berlin, 1996.
  • 7. V. Vapnik, S. Golowich, and A. Smola, "Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing," in M. Mozer, M. Jordan, and T. Petsche (eds.), Neural Information Processing Systems, vol. 9, MIT Press, Cambridge, MA, 1997.
  • 8. V. Vapnik and A. Chervonenkis, "Theory of Pattern Recognition" (in Russian), Nauka, Moscow, 1974.
  • 9. V. Vapnik, "Estimation of Dependences Based on Empirical Data," Springer-Verlag, New York, 1982.
  • 10. V. Vapnik, "The Nature of Statistical Learning Theory," Springer, New York, 1995.
  • 11. B. Schölkopf, P. Simard, A. Smola, and V. Vapnik, "Prior knowledge in support vector kernels," In: M.I. Jordan, M.J. Kearns, and S.A. Solla (Eds.), Advances in Neural Information Processing Systems 10, MIT Press, Cambridge, MA, pp. 640–646, 1998.
  • 12. V. Blanz, B. Schölkopf, H. Bulthoff, C. Burges, V. Vapnik, and T. Vetter, "Comparison of view-based object recognition algorithms using realistic 3D models," Artificial Neural Networks, Springer Lecture Notes in Computer Science, vol. 1112, pp. 251–256, Berlin, 1996.
  • 13. B. Schölkopf, K. Sung, C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Transactions on Signal Processing, vol. 45, pp. 2758–2765, 1997.
  • 14. K.R. Muller, A. Smola, G. Ratsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik, "Predicting time series with support vector machines," Artificial Neural Networks, Springer Lecture Notes in Computer Science, vol. 1327, pp. 999–1004, Berlin, 1997.
  • 15. H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," Advances in Neural Information Processing Systems 9, pp. 155–161, MIT Press, Cambridge, MA, 1997.
  • 16. M. Stitson, A. Gammerman, V. Vapnik, V. Vovk, C. Watkins, and J. Weston, "Support vector regression with ANOVA decomposition kernels," Advances in Kernel Methods—Support Vector Learning, MIT Press, Cambridge, MA, pp. 285–292, 1999.
  • 17. A. Smola and B. Schölkopf, "A Tutorial on Support Vector Regression," Statistics and Computing, vol. 14, pp. 199–222, 2004.
  • 18. D. Basak, S. Pal, and D. Patranabis, "Support Vector Regression," Neural Information Processing – Letters and Reviews, vol. 11, no. 10, pp. 203–224, October 2007.
  • 19. X. Xia, M. Lyu, T. Lok, and G. Huang, "Methods of Decreasing the Number of Support Vectors via k-Mean Clustering," Proc. International Conference on Intelligent Computing, Lecture Notes in Computer Science (LNCS), vol. 3644, pp. 717–726, 2005.
  • 20. J. Hartigan, and M. Wong, "Algorithm AS 136: A k-Means Clustering Algorithm," Journal of the Royal Statistical Society, Series C, vol. 28, no. 1, pp. 100–108, 1979.
  • 21. fitrsvm: Fit a support vector machine regression model, MATLAB documentation, https://www.mathworks.com/help/stats/fitrsvm.html.
Notes
1. Track 1: Artificial Intelligence and Applications
2. Technical Session: 14th International Symposium Advances in Artificial Intelligence and Applications
3. Record prepared with MNiSW funds, agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularization of science and promotion of sport (2020).
Document type
Bibliografia
YADDA identifier
bwmeta1.element.baztech-25205e05-469e-4dec-b1a4-8332178405c4