Search results - BazTech

1

Text mining in the identification of duties and responsibilities of the project manager

Wyskwarski Marcin

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska | 2020 | z. 144 | 649--659

EN

Purpose: An attempt to identify the duties and responsibilities of the project manager by analysing job offers from a job website, and to determine whether they changed between 2018 and 2019. Design/methodology/approach: Text mining was performed on the fragments of job offers describing duties and responsibilities. The analysis consisted of initial text processing, creation of a corpus of the analysed documents, construction of a word frequency matrix and the use of classical methods from the data mining area. Findings: The most common words in the job offers are presented, together with their correlations with other words. A topic modeling algorithm was used to generate hidden topics describing the analysed job offers. These topics can also be used to identify the duties and responsibilities of a project manager. Research limitations/implications: Only job offers meeting the following conditions were analysed: (1) they concerned the job of “project manager”; (2) the content was in Polish; (3) they were published on the www.pracuj.pl website; (4) they were collected from 09 to 11 April in 2018 and in 2019. Practical implications: This method can be used by organizations that train project managers to modify the curriculum and better adjust it to the needs of the labour market. Originality/value: The research has shown that text mining can be used to determine the responsibilities of a project manager by analysing job offers.
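The word frequency matrix and word correlations mentioned in the abstract above can be illustrated with a minimal sketch. It is not the author's pipeline: the toy English fragments, the function names (`tokenize`, `build_frequency_matrix`, `term_correlation`) and the choice of Pearson correlation over per-document counts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy fragments; the study itself analysed Polish job offers.
documents = [
    "manage project budget and schedule",
    "manage project team and report project status",
    "prepare budget report for the client",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_frequency_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

def pearson(xs, ys):
    """Pearson correlation of two count vectors (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def term_correlation(vocabulary, matrix, term_a, term_b):
    """Correlate the per-document counts of two vocabulary terms."""
    col_a = [row[vocabulary.index(term_a)] for row in matrix]
    col_b = [row[vocabulary.index(term_b)] for row in matrix]
    return pearson(col_a, col_b)

vocabulary, matrix = build_frequency_matrix(documents)

# Overall word frequencies are the column sums of the matrix.
total_counts = {
    term: sum(row[j] for row in matrix) for j, term in enumerate(vocabulary)
}
```

Calling `term_correlation(vocabulary, matrix, "manage", "project")` shows how strongly two words co-occur across documents. A real analysis would also need stop-word removal and Polish lemmatisation, both omitted here.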

2

Identification of desired project manager competence using text mining analysis

Wyskwarski Marcin

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska | 2020 | z. 149 | 735--749

EN

Purpose: An attempt to identify the project manager competencies desired by employers and to determine whether they have changed over time. Design/methodology/approach: Job offers were downloaded automatically from a job-offer website. A text mining analysis was carried out on the fragments of offers describing competencies. The analysis included initial text processing, creation of corpora of the analyzed documents, creation of a document-term matrix, a topic modeling algorithm and the use of classic methods derived from data mining. Findings: The most frequently used words/n-grams and the correlations of selected words/n-grams with other words/n-grams are presented as figures. The project manager competencies were identified on the basis of word/n-gram frequencies and correlation values. The topic modeling algorithm was used to generate topics that can also be used to identify the expected project manager competencies. Research limitations/implications: Only offers written in Polish, downloaded from a single job-offer website, and containing the phrase “kierownik projektu” (“project manager”) in the job title were analyzed. Data were collected from 09 to 11 April 2018 and from 09 to 11 April 2019. Practical implications: The method can be used by organizations that prepare people for the profession of project manager to modify curricula and better adapt them to the needs of the labor market. Originality/value: The studies have shown that text mining of job offers can, to some extent, help determine the desired project manager competencies.
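The word/n-gram frequency step described above can be sketched in a few lines. This is an illustrative assumption, not the author's code: the English fragments are made up, and `ngrams` and `ngram_frequencies` are hypothetical helper names.

```python
import re
from collections import Counter

# Hypothetical competence fragments standing in for the Polish offers.
fragments = [
    "good communication skills and project management experience",
    "project management experience and team leadership",
    "communication skills and knowledge of project management",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, n):
    """Return all n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_frequencies(docs, n):
    """Count every n-gram over all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), n))
    return counts

bigram_counts = ngram_frequencies(fragments, 2)
top_bigram, top_count = bigram_counts.most_common(1)[0]
```

With `n=1` the same function yields plain word frequencies, so a single counting routine covers both the words and the n-grams mentioned in the abstract.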

3

Unsupervised dynamic topic model for extracting adverse drug reaction from health forums

Eslami Behnaz, Motlagh Mehdi Habibzadeh, Rezaei Zahra, Eslami Mohammad, Amini Mohammad Amin

Applied Computer Science | 2020 | Vol. 16, no 1 | 41--59

EN

The relationship between drugs and their side effects is outlined on two websites: Sider and WebMD. The aim of this study was to find the association between drugs and their side effects. We compared reports by typical users of the website “Ask a patient” with the drug side effects reported on reference sites such as Sider and WebMD. In addition, typical users’ comments on highly commented drugs (Neurotic drugs, Anti-Pregnancy drugs and Gastrointestinal drugs) were analyzed using a deep learning method. To this end, typical users’ comments on drug side effects from the last decades were collected from the “Ask a patient” website. The drug data were then classified based on a deep learning model (HAN) and the drugs’ side effects, and the main side-effect topics for each group of drugs were identified and reported using the Sider and WebMD websites. Our model demonstrates its ability to accurately describe and label side effects in a temporal text corpus with a deep learning classifier, which is shown to be an effective method for precisely discovering the association between drugs and their side effects. Moreover, the model can immediately locate information on reference sites to recognize the side effects of new drugs, which makes it applicable for drug companies. This study suggests that the sensitivity of internet users and the diverse scientific findings benefit the distinct detection of adverse drug effects, and that deep learning would facilitate it.

4

Ensemble Methods for Improving Classification of Data Produced by Latent Dirichlet Allocation

Jankowski M.

Computer Science and Mathematical Modelling | 2018 | No. 8 | 17--28

EN

Topic models are very popular methods of text analysis. The most popular algorithm for topic modelling is LDA (Latent Dirichlet Allocation). Recently, many new methods have been proposed that enable the use of this model in large-scale processing. One problem is that a data scientist has to choose the number of topics manually, a step that requires some prior analysis. A few methods have been proposed to automate this step, but none of them works very well when LDA is used as a preprocessing step for further classification. In this paper, we propose an ensemble approach that allows us to use more than one model at the prediction phase while reducing the need to find a single best number of topics. We have also analyzed a few methods of estimating the number of topics.

PL

Topic modelling is a popular method of text analysis. One of the most popular topic modelling algorithms is LDA (Latent Dirichlet Allocation) [14]. Recently, many new extensions of this model have been proposed that allow large amounts of data to be processed. One problem when using the LDA algorithm is that the number of topics must be chosen before the algorithm is run. This step requires prior analysis and the involvement of a data analyst. Several methods have been developed to automate this step, but none of them works well when LDA is used for dimensionality reduction before data classification. In this paper, we propose an approach based on an ensemble of many models. Such a model avoids the problem of selecting a single best LDA model. We will show that this approach achieves a lower classification error. We will also propose two new methods for choosing the number of topics when only a single model is to be used.
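The abstract does not say how the models in the ensemble are combined. One common option, assumed here purely for illustration, is soft voting: each classifier is trained on features from an LDA model with a different number of topics, and their class probabilities are averaged at prediction time. The probability values and the names `soft_vote` and `predict_labels` below are hypothetical; no LDA model is actually trained in this sketch.

```python
# Hypothetical class-probability outputs for 3 documents and 2 classes.
# Each key is the (assumed) number of topics of the LDA model whose features
# fed the corresponding classifier; the numbers are invented for illustration.
predictions_by_topic_count = {
    10: [[0.60, 0.40], [0.30, 0.70], [0.55, 0.45]],
    20: [[0.70, 0.30], [0.40, 0.60], [0.40, 0.60]],
    50: [[0.80, 0.20], [0.60, 0.40], [0.45, 0.55]],
}

def soft_vote(prob_sets):
    """Average per-class probabilities across models, document by document."""
    models = list(prob_sets)
    n_docs = len(models[0])
    n_classes = len(models[0][0])
    averaged = []
    for d in range(n_docs):
        averaged.append([
            sum(model[d][c] for model in models) / len(models)
            for c in range(n_classes)
        ])
    return averaged

def predict_labels(averaged):
    """Pick the class index with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]

averaged = soft_vote(predictions_by_topic_count.values())
labels = predict_labels(averaged)
```

In this toy data the three classifiers disagree on the last two documents, and the averaged probabilities settle each case without committing to a single topic count in advance.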