The work explores how the sixth president of Ukraine, Volodymyr Zelensky, was portrayed in Russian and Ukrainian media during the 2019 pre-election campaign. The study used network analysis, n-gram generation, and LDA-based topic modeling. It shows that Russian media focused on Zelensky as a media personality, while Ukrainian sources portrayed him as a new, popular politician. The target audience of the candidate's campaign was the Russian-speaking population of Ukraine. Ukrainian-language media were more inclined to mention the elections, the role of the other candidate, Petro Poroshenko, and the nationalist mood, while presenting Zelensky as just an ordinary candidate in the electoral race. The article draws on academic sources about the historical development of Ukraine's political and media contexts, paying particular attention to agenda-setting, framing, and priming techniques, and to the personality of Volodymyr Zelensky.
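The n-gram generation step can be illustrated with a minimal Python sketch that counts contiguous word n-grams across a set of texts. It assumes a simple lowercase, letters-only tokenizer; the study's actual preprocessing, the value of n, the helper names `tokenize`, `ngrams`, and `top_ngrams`, and the sample headlines are all illustrative assumptions, not taken from the article.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters only (a deliberately simple tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(texts, n=2, k=10):
    """Count n-grams across all texts and return the k most frequent ones."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts.most_common(k)


if __name__ == "__main__":
    # Invented example headlines, not data from the study.
    headlines = [
        "Zelensky leads the polls",
        "Poroshenko trails Zelensky in the polls",
    ]
    print(top_ngrams(headlines, n=2, k=1))  # [(('the', 'polls'), 2)]
```

Bigrams (n = 2) are a common starting point; frequent n-grams can then serve as nodes in a co-occurrence network of the kind used in network analysis.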
The aim of this research is to construct meaningful user profiles that best describe user interests in the context of the media content they browse. We use two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We train our models on a collection of Polish news articles and compare them with a model built on a general-language corpus. We compare the performance of these algorithms on two practical tasks. First, we perform a qualitative analysis of semantic relationships for similar-article retrieval; then we evaluate the predictive performance of different feature combinations for user gender classification. We apply the algorithms to a real-world dataset from the Polish news service Onet. Our results show that the choice of text representation depends on the task: Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the gender classification task, the best performance is obtained with a combination of features: topics from the article text and word embeddings from the title.
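To illustrate why word embeddings suit short-text comparison, here is a minimal sketch of similar-title retrieval: each title is represented by the average of its word vectors, and candidates are ranked by cosine similarity. Averaging is only one common aggregation choice and is an assumption here, as are the tiny hand-written 3-dimensional vectors (a trained Word2Vec model would supply real ones) and the function names.

```python
import math

# Hypothetical 3-dimensional embeddings; a Word2Vec model trained on news
# text would supply vectors with hundreds of dimensions.
EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "vote": [0.8, 0.2, 0.1],
    "football": [0.0, 0.9, 0.3],
    "match": [0.1, 0.8, 0.4],
}


def title_vector(title, embeddings):
    """Average the vectors of in-vocabulary words; None if no word is known."""
    vectors = [embeddings[w] for w in title.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, candidates, embeddings):
    """Return the candidate title whose averaged vector is closest to the query."""
    q = title_vector(query, embeddings)
    scored = []
    for c in candidates:
        v = title_vector(c, embeddings)
        if q is not None and v is not None:
            scored.append((cosine(q, v), c))
    return max(scored)[1] if scored else None


if __name__ == "__main__":
    print(most_similar("election vote", ["football match", "vote"], EMBEDDINGS))  # vote
```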
Data analysis becomes more difficult as the amount of data grows. More specifically, extracting meaningful insights from large volumes of data and grouping them by shared features without human intervention requires advanced methods. Topic-modeling methods help overcome this problem in text analysis for downstream tasks such as sentiment analysis, spam detection, and news classification. In this research, we benchmark several classifiers (random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on publicly available Turkish and English datasets with three and five classes. We show that, as a feature extractor, n-stage LDA achieves state-of-the-art performance with each of the downstream classifiers. Random forest was also the most successful classifier on both datasets.
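Using topic proportions as classifier features can be sketched with plain LDA fitted by collapsed Gibbs sampling. This is a standard inference method for single-stage LDA, not the n-stage variant evaluated in the study, and the hyperparameters (`alpha`, `beta`, number of iterations), the function name, and the toy word-id documents are illustrative assumptions. Each row of the returned `theta` is a document's topic distribution, which a downstream classifier such as random forest could take as its feature vector.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return per-document topic proportions.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens assigned to each topic
    z = []                               # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # theta[d][k]: smoothed share of document d's tokens assigned to topic k.
    return [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


if __name__ == "__main__":
    # Toy corpus of word ids (vocabulary size 4), invented for illustration.
    docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0, 1]]
    theta = lda_gibbs(docs, num_topics=2, vocab_size=4, iterations=100)
    for row in theta:
        print([round(p, 2) for p in row])
```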
The relationships between drugs and their side effects are documented on two websites: Sider and WebMD. The aim of this study was to identify associations between drugs and their side effects. We compared reports by ordinary users of the “Ask a patient” website with the drug side effects listed on reference sites such as Sider and WebMD. In addition, users' comments on the most-commented drug groups (neurotic drugs, contraceptives, and gastrointestinal drugs) were analyzed using a deep learning method. To this end, users' comments on drug side effects posted over the last decades were collected from the “Ask a patient” website. The comments were then classified by drug side effect using a deep learning model, the Hierarchical Attention Network (HAN), and the main side-effect topics for each drug group were identified and reported with reference to the Sider and WebMD websites. Our model accurately describes and labels side effects in a temporal text corpus using a deep learning classifier, which proves to be an effective way to precisely discover associations between drugs and their side effects. Moreover, the model can immediately locate information on reference sites to recognize the side effects of new drugs, which makes it useful for drug companies. This study suggests that the sensitivity of internet users and diverse scientific findings together benefit the detection of distinct adverse drug effects, and that deep learning can facilitate this.
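A HAN classifies a document by encoding its words, pooling them into sentence vectors with attention, and then pooling the sentence vectors into a document vector in the same way. The minimal sketch below shows only that attention-pooling step, u_t = tanh(W h_t + b), α_t = softmax(u_t · u), s = Σ α_t h_t, in plain Python. It omits the bidirectional GRU encoders and all training, and the weights, vectors, and function name are hypothetical; it is not the study's model.

```python
import math


def attention_pool(hidden, W, b, context):
    """Attention pooling of the kind used at each level of a Hierarchical Attention Network.

    hidden:  T vectors h_t of length H (e.g. encoded words of one sentence).
    W, b:    projection matrix (A rows of length H) and bias of length A.
    context: context vector u of length A (learned in a real model).
    Returns the pooled vector (length H) and the attention weights (length T).
    """
    # u_t = tanh(W h_t + b)
    projected = [
        [math.tanh(sum(w_aj * h_j for w_aj, h_j in zip(W[a], h)) + b[a]) for a in range(len(b))]
        for h in hidden
    ]
    # Scores u_t . u, normalized with a numerically stable softmax.
    scores = [sum(u_a * c_a for u_a, c_a in zip(u, context)) for u in projected]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # s = sum_t alpha_t h_t
    pooled = [sum(alpha * h[j] for alpha, h in zip(alphas, hidden)) for j in range(len(hidden[0]))]
    return pooled, alphas


if __name__ == "__main__":
    # Hypothetical 2-d word encodings for a two-sentence comment.
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    u = [1.0, 0.0]
    sentences = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
    # Word level: one vector per sentence. A real HAN uses separate parameters
    # and GRU encoders at each level; they are shared here for brevity.
    sentence_vectors = [attention_pool(s, W, b, u)[0] for s in sentences]
    # Sentence level: one vector for the whole document.
    doc_vector, weights = attention_pool(sentence_vectors, W, b, u)
    print(doc_vector, weights)
```

In a full classifier the document vector would feed a softmax layer over the side-effect labels.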
The goal of this study is to review the literature on meshfree methods using text mining. The abstracts of around 17,330 relevant articles published from 1990 to 2020 were collected from Scopus. Text mining techniques such as latent Dirichlet allocation (LDA), along with the calculation of term frequencies and co-occurrence coefficients, were used to analyze the text. The study identified a few key topics in the field of meshfree methods and traced the evolution of the field over the past three decades. Furthermore, the trend in the number of publications and the frequency map highlighted research trends and a lack of focus in certain areas. The co-author network visualization provided interesting insights into collaboration between researchers around the world. Overall, this study facilitates a systematic literature review of meshfree methods and gives the research community a broader perspective on the field.
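Term frequencies and co-occurrence coefficients can be computed with a few lines of standard-library Python. The abstract does not say which co-occurrence coefficient was used; this sketch assumes document-level co-occurrence and the Jaccard coefficient (abstracts containing both terms divided by abstracts containing either), and the function name and toy tokenized abstracts are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def term_statistics(abstracts, terms):
    """Term frequencies plus document-level Jaccard co-occurrence for selected terms.

    abstracts: list of tokenized abstracts (lists of lowercase tokens).
    terms:     iterable of terms of interest.
    Returns (tf, df, jaccard): tf counts every occurrence, df counts the
    abstracts containing a term, and jaccard maps sorted term pairs to
    |abstracts with both| / |abstracts with either|.
    """
    terms = set(terms)
    tf = Counter(tok for tokens in abstracts for tok in tokens if tok in terms)
    doc_sets = [set(tokens) & terms for tokens in abstracts]
    df = Counter(t for s in doc_sets for t in s)
    both = Counter(pair for s in doc_sets for pair in combinations(sorted(s), 2))
    jaccard = {(a, b): n / (df[a] + df[b] - n) for (a, b), n in both.items()}
    return tf, df, jaccard


if __name__ == "__main__":
    # Invented toy abstracts, already tokenized.
    abstracts = [["sph", "meshfree", "sph"], ["meshfree", "rkpm"], ["sph", "meshfree"]]
    tf, df, jac = term_statistics(abstracts, {"sph", "meshfree", "rkpm"})
    print(tf["sph"], df["sph"], round(jac[("meshfree", "sph")], 3))  # 3 2 0.667
```

Pairs with a high coefficient can then be drawn as weighted edges of a co-occurrence map.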
Purpose: An attempt to identify the duties and responsibilities of a project manager by analysing job offers from a job website, and to determine whether they changed between 2018 and 2019. Design/methodology/approach: Text mining was performed on the fragments of job offers that describe duties and responsibilities. The analysis consisted of initial text processing, creation of a corpus of the analysed documents, construction of a word frequency matrix, and the use of classical methods from the data mining area. Findings: The most common words in the job offers are presented, together with their correlations with other words. Hidden topics describing the analysed job offers were generated with a topic modeling algorithm. These topics can also be used to identify the duties and responsibilities of a project manager. Research limitations/implications: Only job offers meeting the following conditions were analysed: (1) they concerned the job of "project manager"; (2) the content was in Polish; (3) they were posted on the www.pracuj.pl website; (4) they were collected from 9 to 11 April in 2018 and 2019. Practical implications: Organizations that train project managers can use this method to modify their curricula and better adjust them to the needs of the labour market. Originality/value: The research has shown that text mining can be used to determine the responsibilities of a project manager by analysing job offers.
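A word frequency (document-term) matrix and word-to-word correlations of the kind reported in the Findings can be sketched as follows. The sketch assumes raw counts per job offer and Pearson correlation between the count columns of two words, which is one common definition of word association; the tokens, the `min_corr` threshold, and the function names are hypothetical.

```python
import math
from collections import Counter


def doc_term_matrix(docs):
    """Rows = documents, columns = sorted vocabulary, cells = raw word counts."""
    vocab = sorted({w for doc in docs for w in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def associations(docs, target, min_corr=0.5):
    """Words whose per-document counts correlate with the target word's counts."""
    vocab, rows = doc_term_matrix(docs)
    if target not in vocab:
        return {}
    columns = {w: [row[i] for row in rows] for i, w in enumerate(vocab)}
    found = {}
    for w in vocab:
        if w == target:
            continue
        r = pearson(columns[target], columns[w])
        if r >= min_corr:
            found[w] = round(r, 2)
    # Strongest associations first.
    return dict(sorted(found.items(), key=lambda item: -item[1]))


if __name__ == "__main__":
    # Invented, already tokenized job-offer fragments.
    offers = [
        ["budget", "schedule", "team"],
        ["budget", "schedule"],
        ["team", "client"],
        ["client"],
    ]
    print(associations(offers, "budget"))  # {'schedule': 1.0}
```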
The essence of qualitative research practice is its multiparadigmatic character, which gives rise to the coexistence of different methodological approaches to analysing and studying human experience in the everyday lifeworld. This diversity is especially visible in the field of narrative research and data analysis. The aim of the article is a methodological reflection on building typologies of narrative analyses and, at the same time, a proposal for a new way of typologizing analytical approaches, based on combining corpus linguistics and natural language processing with CAQDAS procedures, content analysis, and Text Mining. The typology is based on an analysis of narrative research practices as reflected in the language of English-language articles published in five internationally recognized qualitative methodology journals in 2002–2016. In the article, I use a dictionary method to code the articles, together with hierarchical clustering and topic modeling, to discover different types of narrative analysis in these publications and to examine the semantic relationships between them. At the same time, I confront Riessman's heuristic typology with an approach based on linguistics and data mining in order to develop a coherent picture of narrative analysis methodology in the contemporary field of qualitative research. Finally, I present a new model for thinking about narrative analysis.
Qualitative research practice is multiparadigmatic by nature, which leads to the coexistence of different research and analytical approaches to the study of human experience in everyday life. This diversity is particularly visible in the contemporary field of narrative research and data analysis. The purpose of this article is to offer a methodological reflection on the process of developing a typology and to propose a new data-driven, practice-based typology of the narrative analyses used by qualitative researchers in lived-experience research. I merge CAQDAS, corpus linguistics, and text mining procedures to examine the analytical strategies embedded in the vivid language of English-language research articles published in five influential qualitative methodology journals between 2002 and 2016. Using dictionary-based content analysis in the coding process, hierarchical clustering, and topic modeling (a text-mining tool for discovering hidden semantic structures in a body of text), I confront Catherine Kohler Riessman's heuristic typology with the data-driven approach in order to contribute a more coherent picture of narrative analysis in the contemporary field of qualitative research. Finally, I propose a new model of thinking about the typology of narrative analyses based on research practices.
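Dictionary-based coding assigns text to categories by counting matches against predefined indicator words. The minimal sketch below assumes prefix (stem-like) matching and a small hypothetical dictionary whose category labels (thematic, structural, dialogic) are chosen for illustration; the author's actual dictionary, matching rules, and function names are not given in the abstract.

```python
import re
from collections import Counter

# Hypothetical coding dictionary: category -> indicator stems.
DICTIONARY = {
    "thematic": ["theme", "thematic"],
    "structural": ["structur", "plot", "sequence"],
    "dialogic": ["dialog", "perform"],
}


def code_document(text, dictionary):
    """Count how many tokens of the text match each category's stems (prefix match)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for category, stems in dictionary.items():
        # A token counts at most once per category, even if several stems match it.
        counts[category] = sum(1 for tok in tokens if any(tok.startswith(s) for s in stems))
    return counts


def dominant_category(text, dictionary):
    """Assign the category with the most hits (first one on ties), or None if nothing matched."""
    counts = code_document(text, dictionary)
    category, hits = max(counts.items(), key=lambda item: item[1])
    return category if hits > 0 else None


if __name__ == "__main__":
    print(code_document("Thematic analysis of themes and plot structure", DICTIONARY))
    print(dominant_category("Dialogic performance of the interview dialogue", DICTIONARY))
```

The per-category counts form a document-by-category matrix, which is the kind of input hierarchical clustering could then group.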