Article title

Iteratively reweighted least squares classifier and its l2- and l1-regularized kernel versions

Publication language
EN
Abstract
EN
This paper introduces a new classifier design method based on a regularized iteratively reweighted least squares criterion function. The proposed method uses various approximations of the misclassification error, including linear, sigmoidal, Huber and logarithmic. Using the representer theorem, a kernel version of the classifier design method is introduced. The conjugate gradient algorithm is used to minimize the proposed criterion function. Furthermore, an l1-regularized kernel version of the classifier is introduced; in this case, gradient projection is used to optimize the criterion function. Finally, an extensive experimental analysis on 14 benchmark datasets is presented to demonstrate the validity of the introduced methods.
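
To make the abstract's central idea concrete, the short Python sketch below shows a generic l2-regularized iteratively reweighted least squares (IRLS) linear classifier with a one-sided squared-error surrogate for the misclassification error. It is an illustrative sketch only, under assumed notation: it does not reproduce the paper's criterion function, its kernel versions, or the conjugate-gradient and gradient-projection solvers, and the function name, reweighting rule and parameter values are hypothetical.

import numpy as np

def irls_linear_classifier(X, y, lam=1e-2, n_iter=50, tol=1e-8):
    # Illustrative sketch of a generic l2-regularized IRLS linear classifier.
    # X: (n, d) feature matrix; y: (n,) labels in {-1, +1}.
    # Assumed, simplified scheme -- not the criterion proposed in the paper.
    n, d = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])      # augment data with a bias column
    w = np.zeros(d + 1)
    weights = np.ones(n)                      # first pass: ordinary ridge regression
    for _ in range(n_iter):
        # Weighted, l2-regularized least-squares step:
        # solve (Xa^T D Xa + lam*I) w = Xa^T D y, with D = diag(weights).
        WX = Xa * weights[:, None]
        A = Xa.T @ WX + lam * np.eye(d + 1)
        w_new = np.linalg.solve(A, WX.T @ y)
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
        # Reweighting: keep only margin violators, i.e. samples with
        # y_i * f(x_i) < 1 (a one-sided squared-error approximation).
        weights = (y * (Xa @ w) < 1.0).astype(float)
        if weights.sum() == 0:                # every sample satisfies the margin
            break
    return w

# Prediction for a new sample x: np.sign(w[:-1] @ x + w[-1])
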
Pages
171–182
Physical description
Bibliography: 35 items, figures, tables
Contributors
author
  • Institute of Electronics, Silesian University of Technology, 16 Akademicka St., 44-100 Gliwice, Poland, jleski@polsl.pl
Bibliography
  • [1] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
  • [2] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, John Wiley & Sons, New York, 2001.
  • [3] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, 1996.
  • [4] J.T. Tou and R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, London, 1974.
  • [5] A. Webb, Statistical Pattern Recognition, Arnold, London, 1999.
  • [6] B. Schölkopf and A.J. Smola, Learning with Kernels. Support Vector Machines, Regularization, Optimization, and Beyond, The MIT Press, London, 2002.
  • [7] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
  • [8] V. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
  • [9] O.L. Mangasarian and D.R. Musicant, “Lagrangian support vector machines”, J. Mach. Learn. Res. 1 (1), 161–177 (2001).
  • [10] J.A.K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers”, Neur. Proc. Lett. 9 (3), 293–300 (1999).
  • [11] I.W. Tsang, J.T. Kwok, and P.-M. Cheung, “Core vector machines: Fast SVM training on very large data sets”, J. Mach. Learn. Res. 6 (1), 363–392 (2005).
  • [12] I.W. Tsang, J.T. Kwok, and J.M. Zurada, “Generalized core vector machines”, IEEE Trans. Neur. Net. 17 (5), 1126–1140 (2006).
  • [13] I.W. Tsang, A. Kocsor, and J.T. Kwok, “Large-scale maximum margin discriminant analysis using core vector machines”, IEEE Trans. Neur. Net. 19 (4), 610–623 (2008).
  • [14] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, “Fisher discriminant analysis with kernels”, in: Neur. Net. Sig. Proc. IX, pp. 41–48, eds. Y.-H. Hu, J. Larsen, E. Wilson, S. Douglas, IEEE Press, New York, 1999.
  • [15] W. Zheng and L. Zou, “Foley-Sammon optimal discriminant vectors using kernel approach”, IEEE Trans. Neur. Net. 16 (1), 1–9 (2005).
  • [16] Y. Freund and R.E. Schapire, “Large margin classification using the perceptron algorithm”, Mach. Learn. 37 (1), 277–296 (1999).
  • [17] J.-H. Chen and C.-S. Chen, “Fuzzy kernel perceptron”, IEEE Trans. Neur. Net. 13 (6), 1364–1373 (2002).
  • [18] E. Pękalska, P. Paclik, and R.P.W. Duin, “A generalized kernel approach to dissimilarity-based classification”, J. Mach. Learn. Res. 2 (1), 175–211 (2001).
  • [19] Y.-C. Ho and R.L. Kashyap, “An algorithm for linear inequalities and its applications”, IEEE Trans. Elec. Comp. 14 (5), 683–688 (1965).
  • [20] Y.-C. Ho and R.L. Kashyap, “A class of iterative procedures for linear inequalities”, J. SIAM Control 4 (2), 112–115 (1966).
  • [21] M.H. Hassoun and J. Song, “Adaptive Ho-Kashyap rules for perceptron”, IEEE Trans. Neur. Net. 3 (1), 51–61 (1992).
  • [22] J.M. Łęski, “Ho-Kashyap classifier with generalization control”, Pattern Recognition Letters 24 (2), 2281–2290 (2003).
  • [23] J.M. Łęski, “Kernel Ho-Kashyap classifier with generalization control”, Int. J. App. Math. Comp. Sci. 14 (1), 53–62 (2004).
  • [24] Z. Wang, S. Chen, J. Liu, and D. Zhang, “Pattern representation in feature extraction and classifier design: matrix versus vector”, IEEE Trans. Neur. Net. 19 (5), 758–769 (2008).
  • [25] Z. Wang, S. Chen, and T. Sun, “MultiK-MHKS: a novel multiple kernel learning algorithm”, IEEE Trans. Patt. Ana. Mach. Intel. 30 (2), 348–353 (2008).
  • [26] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines”, IEEE Trans. Neur. Net. 13 (2), 415–425 (2002).
  • [27] P.J. Huber, Robust Statistics, Wiley, New York, 1981.
  • [28] S. Haykin, Neural Networks: a Comprehensive Foundation, Prentice Hall, Upper Saddle River, 1999.
  • [29] T. Blumensath and M.E. Davies, “Gradient pursuit”, IEEE Trans. Sig. Proc. 56 (6), 2370–2382 (2008).
  • [30] M.A.T. Figueiredo, R.D. Nowak, and S.J. Wright, “Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems”, IEEE J. Select. Top. Sig. Proc. 1 (4), 586–597 (2007).
  • [31] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression”, The Annals of Statistics 32 (2), 407–451 (2004).
  • [32] R. Tibshirani, “Regression shrinkage and selection via the lasso”, J.R. Statist. Soc. B 58 (1), 267–288 (1996).
  • [33] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares”, IEEE J. Select. Top. Sig. Proc. 1 (4), 606–617 (2007).
  • [34] V. Ruggiero and L. Zanni, “A modified projection algorithm for large strictly convex quadratic programs”, J. Optim. Theory Appl. 104 (2), 281–299 (2000).
  • [35] T. Serafini, G. Zanghirati, and L. Zanni, “Gradient projection methods for quadratic programs and applications in training support vector machines”, Optim. Meth. Soft. 20 (2), 353–378 (2004).
YADDA identifier
bwmeta1.element.baztech-article-BPG8-0020-0018