Search results - BazTech

1

Impact of n-stage latent Dirichlet allocation on analysis of headline classification

Guven Zekeriya Anil, Diri Banu, Cakaloglu Tolgahan

Computer Science

|

2022

|

Vol. 23 (3)

375--394

EN

Data analysis becomes difficult when the amount of the data increases. More specifically, extracting meaningful insights from this vast amount of data and grouping it based on its shared features without human intervention requires advanced methodologies. There are topic-modeling methods that help overcome this problem in text analyses for downstream tasks (such as sentiment analysis, spam detection, and news classification). In this research, we benchmark several classifiers (namely, random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on three and five classes of publicly available Turkish and English datasets. We have demonstrated that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. It should also be noted that random forest was the most successful algorithm for both datasets.

2

Text-mining-based approach for conducting literature review of selected meshfree methods

Sindhusuta S., Chi Sheng-Wei, Derrible Sybil

Computer Assisted Methods in Engineering and Science

|

2021

|

Vol. 28, no. 4

265--290

EN

The goal of this study is to review the literature in the field of meshfree methods using text mining. For this study, the abstracts of around 17330 relevant articles published from 1990 to 2020 were collected from Scopus. Text mining techniques such as the latent Dirichlet allocation (LDA), along with the calculation of term frequencies and co-occurrence coefficients, were used to analyze the text. The study identified a few key topics in the field of meshfree methods and helped to see the evolution of the field over the past three decades. Furthermore, the trend in the number of publications and frequency map highlighted research trends and lack of focus in certain areas. The co-author network visualization provided interesting insights about collaboration between different researchers around the world. Overall, this study facilitates a systematic literature review in the field of meshfree methods and provides a broader perspective of the field to the research community.

3

Text mining in the identification of duties and responsibilities of the project manager

Wyskwarski Marcin

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska

|

2020

|

no. 144

649--659

EN

Purpose: An attempt to identify the duties and responsibilities of the project manager by analysing job offers from a job website, and to determine whether there were any changes between 2018 and 2019. Design/methodology/approach: Text mining was performed on fragments of job offers describing the duties and responsibilities. The text mining analysis consisted of initial processing of the text, creation of a corpus of analysed documents, construction of a word frequency matrix and use of classical methods from the data mining area. Findings: The most common words in job offers are presented, as well as their correlation with other words. With the use of a topic modeling algorithm, hidden topics describing the analysed job offers have been generated. These topics can also be used to identify the duties and responsibilities of a project manager. Research limitations/implications: Only the job offers meeting the following conditions were analysed: (1) they concerned the job of “project manager”; (2) the content was in Polish; (3) they were provided by the www.pracuj.pl website; (4) they were collected from 09 to 11 April in 2018 and 2019. Practical implications: This method can be used by organizations training project managers, in order to modify and better adjust the curriculum to the needs of the labour market. Originality/value: Research has shown that text mining can be used to determine the responsibilities of a project manager by analysing job offers.

4

Identification of desired project manager competence using text mining analysis

Wyskwarski Marcin

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska

|

2020

|

no. 149

735--749

EN

Purpose: An attempt to identify the competencies of the project manager desired by employers and to determine whether changes have occurred over time. Design/methodology/approach: Job offers were automatically downloaded from a job-offer website. A text mining analysis of the fragments of offers describing competence was carried out. The analysis included initial text processing, creation of corpora of analyzed documents, creation of a document-term matrix, a topic modeling algorithm and the use of classic methods derived from data mining. Findings: The most frequently used words/n-grams and the correlation of selected words/n-grams with other words/n-grams were presented in the form of figures. Based on the frequency of words/n-grams and the correlation value, efforts were made to identify the project manager competencies. The topic modeling algorithm was used to generate topics that can also be used to identify expected project manager competencies. Research limitations/implications: Only offers written in Polish, downloaded from one job-offer website, which had the phrase “kierownik projektu” (“project manager”) in their job title, were analyzed. Data was collected from 09 to 11 April 2018 and from 09 to 11 April 2019. Practical implications: The method applied can be used by organizations preparing people for the profession of project manager, to modify and better adapt curricula to the needs of the labor market. Originality/value: Studies have shown that text mining of job offers can, to some extent, help determine the desired project manager competence.

5

Unsupervised dynamic topic model for extracting adverse drug reaction from health forums

Eslami Behnaz, Motlagh Mehdi Habibzadeh, Rezaei Zahra, Eslami Mohammad, Amini Mohammad Amin

Applied Computer Science

|

2020

|

Vol. 16, no 1

41--59

EN

The relationship between drugs and their side effects has been outlined on two websites: Sider and WebMD. The aim of this study was to find the association between drugs and their side effects. We compared the reports of typical users of the website “Ask a patient” with the drug side effects reported on reference sites such as Sider and WebMD. In addition, the typical users’ comments on highly-commented drugs (Neurotic drugs, Anti-Pregnancy drugs and Gastrointestinal drugs) were analyzed using a deep learning method. To this end, typical users’ comments on drugs’ side effects over the last decades were collected from the website “Ask a patient”. Then, the data on drugs were classified based on a deep learning model (HAN) and the drugs’ side effects. The main topics of side effects for each group of drugs were then identified and reported through the Sider and WebMD websites. Our model demonstrates its ability to accurately describe and label side effects in a temporal text corpus with a deep learning classifier, which is shown to be an effective method to precisely discover the association between drugs and their side effects. Moreover, this model has the capability to immediately locate information in reference sites to recognize the side effects of new drugs, which is applicable for drug companies. This study suggests that the sensitivity of internet users and the diverse scientific findings benefit the distinct detection of adverse effects of drugs, and that deep learning would facilitate it.

6

Building semantic user profile for Polish web news portal

Misztal-Radecka J.

Computer Science

|

2018

|

Vol. 19 (3)

307--332

EN

The aim of this research is to construct meaningful user profiles that are the most descriptive of user interests in the context of the media content that they browse. We use two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We train our models on a collection of news articles in Polish and compare them with a model built on a general language corpus. We compare the performance of these algorithms on two practical tasks. First, we perform a qualitative analysis of the semantic relationships for similar article retrieval, and then we evaluate the predictive performance of distinct feature combinations for user gender classification. We apply the algorithms to the real-world dataset of the Polish news service Onet. Our results show that the choice of text representation depends on the task: Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the gender classification task, the best performance is obtained with a combination of features: topics from the article text and word embeddings from the title.