A comparative study of classifier ensembles on non-stationary data streams

Authors

  • Alberto Verdecia Cabrera Departamento de informática, Universidad de Granma, Cuba
  • Isvani Frías Blanco Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Brazil
  • Agustín Ortíz Díaz Departamento de informática, Universidad de Granma, Cuba
  • Yanet Rodríguez Sarabia Centro de Estudios de Informática, Universidad Central "Marta Abreu" de Las Villas, Villa Clara, Cuba
  • Antonio Mustelier Hechavarría Departamento de informática, Universidad de Granma, Cuba

Keywords:

Data streams, classifier ensembles, concept drift

Abstract

Many sources generate data continuously (so-called data streams), making it impossible to store these large volumes of data, which must therefore be processed in real time. Because such data are acquired over time, and because many real-world situations are dynamic, the probability distribution governing the data (the target concept) may change over time, a problem commonly known as concept drift. Several families of algorithms handle concept drift, among them incremental classifier ensembles and block-based classifier ensembles. The literature reviewed contains no articles comparing these two approaches, so this work presents a comparative study of them.
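The contrast between the two families can be illustrated with a toy sketch. The following is not any of the algorithms evaluated in the paper: the `FadingStump` base learner, the synthetic drift stream, the decay factors, and the block size are all hypothetical choices made only to show how an incremental ensemble (every member updated per instance, with forgetting) differs from a block-based ensemble (a new member trained per block, the oldest discarded).

```python
import random

def stream(n=1000, drift_at=500, seed=7):
    """Synthetic binary stream whose target concept flips at `drift_at`."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        y = int(x > 0.5) if t < drift_at else int(x <= 0.5)
        yield x, y

class FadingStump:
    """Decision stump at x = 0.5 whose evidence counts fade over time,
    so an outdated concept is gradually forgotten."""
    def __init__(self, decay=0.99):
        self.agree = 1.0     # evidence for the orientation y = (x > 0.5)
        self.disagree = 1.0  # evidence for the opposite orientation
        self.decay = decay

    def learn(self, x, y):
        self.agree *= self.decay
        self.disagree *= self.decay
        if (x > 0.5) == (y == 1):
            self.agree += 1.0
        else:
            self.disagree += 1.0

    def predict(self, x):
        return int(x > 0.5) if self.agree >= self.disagree else int(x <= 0.5)

def incremental_ensemble(data, decays=(0.95, 0.99, 0.999)):
    """Incremental ensemble: every member learns from every instance;
    different decays give the members different reaction speeds."""
    members = [FadingStump(d) for d in decays]
    correct = total = 0
    for x, y in data:
        votes = sum(m.predict(x) for m in members)
        pred = int(votes * 2 >= len(members))
        correct += int(pred == y)  # prequential: test first, then train
        total += 1
        for m in members:
            m.learn(x, y)
    return correct / total

def block_ensemble(data, block_size=100, max_members=3):
    """Block-based ensemble: each completed block trains a fresh member
    and the oldest member is discarded."""
    members, block = [], []
    correct = total = 0
    for x, y in data:
        if members:
            votes = sum(m.predict(x) for m in members)
            pred = int(votes * 2 >= len(members))
        else:
            pred = 0  # cold start before the first block completes
        correct += int(pred == y)
        total += 1
        block.append((x, y))
        if len(block) == block_size:
            stump = FadingStump(decay=1.0)  # no fading: trained once per block
            for bx, by in block:
                stump.learn(bx, by)
            members = (members + [stump])[-max_members:]
            block = []
    return correct / total

acc_inc = incremental_ensemble(stream())
acc_blk = block_ensemble(stream())
```

On this particular stream the incremental ensemble recovers within roughly a hundred instances after the drift, while the block-based one must wait until a majority of its members have been retrained on post-drift blocks, which is the kind of trade-off the comparative study examines.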

Downloads

Download data is not yet available.

References

Stephen Bach and Mark Maloof. A Bayesian approach to concept drift. In Advances in Neural Information Processing Systems, pages 127–135, 2010.

Stephen H. Bach and Marcus Maloof. Paired learners for concept drift. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, pages 23–32. IEEE, 2008.

Manuel Baena-García, José del Campo-Ávila, Raúl Fidalgo, Albert Bifet, Ricard Gavaldà, and Rafael Morales-Bueno. Early drift detection method. 2006.

Michèle Basseville and Igor V. Nikiforov. Detection of Abrupt Changes: Theory and Application. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.

Albert Bifet, Eibe Frank, Geoffrey Holmes, and Bernhard Pfahringer. Accurate ensembles for data streams: Combining restricted Hoeffding trees using stacking. In ACML, pages 225–240, 2010.

Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. In SIAM International Conference on Data Mining, 2007.

Albert Bifet, Geoff Holmes, Richard Kirkby, and Bernhard Pfahringer. MOA: Massive online analysis. The Journal of Machine Learning Research, 11:1601–1604, 2010.

Albert Bifet, Geoff Holmes, and Bernhard Pfahringer. Leveraging bagging for evolving data streams. In Machine Learning and Knowledge Discovery in Databases, pages 135–150. Springer, 2010.

Albert Bifet, Geoff Holmes, Bernhard Pfahringer, Richard Kirkby, and Ricard Gavaldà. New ensemble methods for evolving data streams. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 139–148. ACM, 2009.

Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.

D. Brzezinski and J. Stefanowski. Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems, 25(1):81–94, January 2014.

Dariusz Brzezinski and Jerzy Stefanowski. Combining block-based and online methods in learning ensembles from concept drifting data streams. Information Sciences, 265:50–67, 2014.

Bojan Cestnik. Estimating probabilities: A crucial task in machine learning. In ECAI, volume 90, pages 147–149, 1990.

Peter Clark and Tim Niblett. The CN2 induction algorithm. Machine Learning, 3(4):261–283, 1989.

Padraig Cunningham, Niamh Nowlan, Sarah Jane Delany, and Mads Haahr. A case-based approach to spam filtering that can track concept drift. In The ICCBR, volume 3, pages 03–2003, 2003.

A. P. Dawid. Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society. Series A (General), 147(2):278–292, 1984.

Magdalena Deckert. Batch weighted ensemble for mining data streams with concept drift. In Foundations of Intelligent Systems, pages 290–299. Springer, 2011.

José del Campo-Ávila. Nuevos enfoques en aprendizaje incremental. 2007.

Janez Demšar. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7:1–30, 2006.

Pedro Domingos and Michael Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29(2-3):103–130, 1997.

Yoav Freund and Robert E. Schapire. A short introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pages 1401–1406. Morgan Kaufmann, 1999.

Isvani Frías-Blanco, José del Campo-Ávila, Gonzalo Ramos-Jiménez, Rafael Morales-Bueno, Agustín Ortiz-Díaz, and Yailé Caballero-Mota. Online and non-parametric drift detection methods based on Hoeffding bounds. IEEE Transactions on Knowledge and Data Engineering, 27(3):810–823, March 2015.

Isvani Frías-Blanco, José del Campo-Ávila, Gonzalo Ramos-Jiménez, Andre C. P. L. F. Carvalho, Agustín Ortiz-Díaz, and Rafael Morales-Bueno. Online adaptive decision trees based on concentration inequalities. Knowledge-Based Systems, 104:179–194, 2016.

Isvani Frías Blanco, José del Campo Ávila, Gonzalo Ramos Jiménez, Rafael Morales Bueno, Agustín Ortiz Díaz, and Yailé Caballero Mota. Aprendiendo con detección de cambio online. Computación y Sistemas, 18(1):169–183, 2014.

Isvani Frías-Blanco, Alberto Verdecia-Cabrera, Agustín Ortiz-Díaz, and Andre Carvalho. Fast adaptive stacking of ensembles. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, pages 929–934. ACM, 2016.

Keinosuke Fukunaga and Raymond R. Hayes. Estimation of classifier performance. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 11(10):1087–1101, 1989.

João Gama. Knowledge Discovery from Data Streams. CRC Press, 2010.

João Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. Learning with drift detection. In Advances in Artificial Intelligence, pages 286–295. Springer, 2004.

Salvador García and Francisco Herrera. An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research, 9:2677–2694, 2008.

Michael Bonnell Harries, Claude Sammut, and Kim Horn. Extracting hidden context. Machine Learning, 32(2):101–126, 1998.

Geoff Hulten, Laurie Spencer, and Pedro Domingos. Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 97–106. ACM, 2001.

Ralf Klinkenberg. Learning drifting concepts: Example selection vs. example weighting. Intelligent Data Analysis, 8(3):281–300, 2004.

Ralf Klinkenberg and Thorsten Joachims. Detecting concept drift with support vector machines. In ICML, pages 487–494, 2000.

Miroslav Kubat and Gerhard Widmer. Adapting to drift in continuous domains. In European Conference on Machine Learning, pages 307–310. Springer, 1995.

Pat Langley, Wayne Iba, and Kevin Thompson. An analysis of Bayesian classifiers. In AAAI, volume 90, pages 223–228, 1992.

Leandro L. Minku, Allan P. White, and Xin Yao. The impact of diversity on online ensemble learning in the presence of concept drift. Knowledge and Data Engineering, IEEE Transactions on, 22(5):730–742, 2010.

Leandro L. Minku and Xin Yao. DDD: A new ensemble approach for dealing with concept drift. Knowledge and Data Engineering, IEEE Transactions on, 24(4):619–633, 2012.

Agustín Ortiz Díaz, José del Campo-Ávila, Gonzalo Ramos-Jiménez, Isvani Frías Blanco, Yailé Caballero Mota, Antonio Mustelier Hechavarría, and Rafael Morales-Bueno. Fast adapting ensemble: A new algorithm for mining data streams with concept drift. The Scientific World Journal, 2014.

Nikunj C. Oza. Online bagging and boosting. In Systems, Man and Cybernetics, 2005 IEEE International Conference on, volume 3, pages 2340–2345. IEEE, 2005.

Nikunj C. Oza and Stuart Russell. Online bagging and boosting. In Tommi Jaakkola and Thomas Richardson, editors, Eighth International Workshop on Artificial Intelligence and Statistics, pages 105–112, Key West, Florida, USA, January 2001. Morgan Kaufmann.

Michael J. Pazzani. Searching for dependencies in Bayesian classifiers. In Learning from Data, pages 239–248. Springer, 1996.

Jeffrey C. Schlimmer and Douglas Fisher. A case study of incremental concept induction. In AAAI, pages 496–501, 1986.

Kenneth O. Stanley. Learning concept drift with a committee of decision trees. Technical Report UT-AI-TR-03-302, Department of Computer Sciences, University of Texas at Austin, USA, 2003.

W. Nick Street and YongSeog Kim. A streaming ensemble algorithm (SEA) for large-scale classification. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pages 377–382, New York, NY, USA, 2001. ACM.

Haixun Wang, Wei Fan, Philip S. Yu, and Jiawei Han. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 226–235. ACM, 2003.

Gerhard Widmer and Miroslav Kubat. Effective learning in dynamic environments by explicit context tracking. In Machine Learning: ECML-93, pages 227–243. Springer, 1993.

Gerhard Widmer and Miroslav Kubat. Learning in the presence of concept drift and hidden contexts. Machine Learning, 23(1):69–101, 1996.

Published

2017-06-01

How to Cite

[1]
Verdecia Cabrera, A. et al. 2017. Estudio comparativo sobre ensambles de clasificadores en flujos de datos no estacionarios. Ciencias matemáticas. 31, 1 (jun. 2017), 49–60.

Issue

Section

Original Article