Identifiers
Title variants
Publication languages
Abstracts
The persistent growth of information in recent decades, along with the development of new information technologies for managing it, has made it essential to develop systems that can synthesize this massive volume of information, better known as big data. This article presents a feedback-based system for the massive processing of digital newspapers. The system synthesizes the most relevant information from news stories obtained from several sources. It is fed with information from the Internet using web scraping techniques, and all of this information is stored in a data lake implemented with NoSQL databases. Data processing follows, focusing on words, their relevance, and their correlation with other words from related content groups or headlines. To perform this grouping, a machine learning Large Language Model (LLM), K-Nearest Neighbors (KNN), and text mining techniques are used. New text mining algorithms are also developed to adjust thresholds during content aggregation and synthesis. Finally, a results visualization mechanism is presented that allows users to score the news stories. This user score serves as feedback to the system and is factored into the global score, which is the basis for displaying the results. The system can be useful for summarizing the information contained in news stories published on the Internet, giving users a fast way to stay informed.
Keywords
Journal
Year
Volume
Pages
24--42
Physical description
Bibliography: 32 items, figures, tables.
Authors
- Escuela Politécnica Nacional, Departamento de Informática y Ciencias de la Computación, Ecuador
Bibliography
- [1] Abramowicz, W., & Tolksdorf, R. (2010). Business information systems. 13th International Conference. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-12814-1
- [2] Aggarwal, C. C., & Zhai, C. (Eds.). (2012). Mining text data. Springer New York.
- [3] Almeida, I. (2023). Introduction to Large Language Models for business leaders: Responsible AI strategy beyond fear and hype. Now Next Later AI.
- [4] Amerland, D. (2013). Google Semantic Search: Search Engine Optimization (SEO) Techniques that get your company more traffic, increase brand impact, and amplify your online presence. Pearson Education.
- [5] Balusamy, B., Abirami, R. N., Kadry, S., & Gandomi, A. H. (2021). Big Data: Concepts, Technology, and Architecture. John Wiley & Sons.
- [6] Bao, Z., Borovica-Gajic, R., Qiu, R., Choudhury, F., & Yang, Z. (Eds.). (2023). Databases theory and applications. 34th Australasian Database Conference (ADC 2023). Springer Nature Switzerland.
- [7] Berry, M. W., & Kogan, J. (Eds.). (2010). Text Mining: Applications and theory. John Wiley & Sons.
- [8] Bobadilla, J. (2021). Machine Learning y Deep Learning: Usando Python, Scikit y Keras. Ediciones de la U.
- [9] Bustamante, N., & Guillén, S. (2020). Big Data y Mass Media. Aula Magna Proyecto clave McGraw Hill.
- [10] Campesato, O. (2023). Transformer, BERT, and GPT3: Including ChatGPT and Prompt Engineering. Mercury Learning and Information.
- [11] Cevallos, F. (2024, April 9). GitHub dataset for digital news classification and punctuation using Machine Learning and Text Mining techniques. GitHub, Inc. Retrieved from https://github.com/fcevallosepn/news
- [12] Chen, J., Huynh, V.-N., Tang, X., & Wu, J. (Eds.). (2023). Knowledge and systems science. 22nd International Symposium. Springer Nature Singapore.
- [13] De Ville, B. (2001). Microsoft data mining: Integrated business intelligence for e-commerce and knowledge management. Digital Press.
- [14] Gils, B. (2023). Data in context: Models as enablers for managing and using data. Springer Nature Switzerland.
- [15] Gorelik, A. (2019). The Enterprise Big Data lake: Delivering the promise of Big Data and data science. O'Reilly Media.
- [16] Hildebrandt, M., & Gutwirth, S. (2008). Profiling the European citizen: Cross-disciplinary. Springer Netherlands.
- [17] Johri, P., Verma, J. K., & Paul, S. (Eds.). (2020). Applications of Machine Learning (Algorithms for Intelligent Systems). Springer Nature Singapore.
- [18] Kannan, R., Rasool, R. U., Jin, H., & Balasundaram, S. R. (Eds.). (2016). Managing and processing Big Data in cloud computing. IGI Global. https://doi.org/10.4018/978-1-4666-9767-6
- [19] Koul, N. (2023). Prompt engineering for Large Language Models. Nimrita Koul.
- [20] Kumar, S. (2020). Can webometrics predict the academic rankings of institutes? The Journal of Prediction Markets, 14(2), 61-76. https://doi.org/10.5750/jpm.v14i2.1816
- [21] Nisbet, R., Miner, G., & Yale, K. (2017). Handbook of statistical analysis and data mining applications. Elsevier Science.
- [22] Ortega, J. M. (2022). Big data, machine learning y data science en python. RA-MA S.A. Editorial y Publicaciones.
- [23] Pasupuleti, P., & Purra, B. S. (2015). Data Lake Development with Big Data. Packt Publishing.
- [24] Rahman El Sheikh, A. A., & Alnoukari, M. (Eds.). (2012). Business Intelligence and Agile Methodologies for Knowledge-Based Organizations: Cross-Disciplinary Applications. IGI Global. https://doi.org/10.4018/978-1-61350-050-7
- [25] Rajaguru, H., & Prabhakar, S. K. (2017). KNN classifier and K-Means clustering for robust classification of epilepsy from EEG signals. A detailed analysis. Anchor Academic Publishing.
- [26] Ribeiro, J. A. (2019). Big Data for executives and market professionals - Second edition. Amazon Digital.
- [27] Rúa Pérez, J. (2009). Tecnología, innovación y empresa. Lulu Press, Incorporated.
- [28] Sánchez Trujillo, M., & Pérez Hernández, J. A. (2021). Metodología CRISP-DM en la gestión de proyecto de Data Mining. Caso enfermedades dermatológicas. International Conference on Project Management. EAN Universidad.
- [29] Sarkis, A. (2023). Training Data for Machine Learning. O'Reilly Media.
- [30] Suganthi, K., Karthik, R., Rajesh, G., & Ching, P. H. C. (Eds.). (2021). Machine Learning and Deep Learning techniques in wireless and Mobile Networking Systems. CRC Press.
- [31] Wang, L., Licheng, J., Shi, G., Li, X., & Liu, J. (Eds.). (2006). Fuzzy systems and knowledge discovery. Third International Conference. Springer Berlin Heidelberg.
- [32] Zong, C., Xia, R., & Zhang, J. (2021). Text Data Mining. Springer Nature Singapore.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-38ac3448-ff10-4f29-8f54-93d4ab6b2744