Search results - BazTech

1

The role of word and n-gram frequency analysis in inference of the content of scientific publication

Zdonek Iwona

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska | 2020 | No. 142 | 21-31 | EN

Purpose: The paper presents an analysis of a scientific publication with regard to the frequency of words and n-grams. The research problem addressed was the extent to which text mining analysis of a scientific publication makes it possible to infer its content.

Design/methodology/approach: The main research method is the analysis of tokenized text using word counts, bigrams, and trigrams in selected sections of a scientific publication. The results of the text mining analysis were compared with a classic, non-automated analysis of the publication. The study is a pilot project in the form of a case study.

Findings: Analyzing a scientific text by the frequency of its words and n-grams makes it possible to infer the content of the paper with regard to the names of the variables involved in the study, the statistical apparatus used, and the key literature cited. However, the method does not make it possible to establish which variables are moderators and which are mediators.

Originality/value: The text mining technique was applied differently than in previous works. Rather than examining the publication in its entirety, as previous researchers did, the study applied text mining analysis to individual parts of the paper, i.e. the part discussing the theoretical foundations of the research and the part presenting the research method, the results, and their discussion. This yielded more precise results regarding the content of the publication.
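The per-section word, bigram and trigram counting described above can be sketched in Python. This is a minimal illustration, not the author's pipeline: the abstract does not name the tools used, and the stop-word list, the section texts and the function names (`tokenize`, `ngram_counts`, `section_profile`) are hypothetical. N-grams are formed over the full token stream and any n-gram containing a stop word is discarded, so no n-gram silently spans a removed word.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"a", "an", "and", "also", "for", "in", "is", "of", "on", "that", "the", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic (ASCII) word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-token sequences, skipping any that contain a stop word."""
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(g for g in grams if not any(t in STOP_WORDS for t in g))


def section_profile(text: str, top: int = 3) -> dict[int, list[tuple[tuple[str, ...], int]]]:
    """Most frequent words (n=1), bigrams (n=2) and trigrams (n=3) in one section."""
    tokens = tokenize(text)
    return {n: ngram_counts(tokens, n).most_common(top) for n in (1, 2, 3)}


# Hypothetical section texts standing in for the parts of a real paper.
sections = {
    "theory": "Job satisfaction mediates the effect of job autonomy on job performance.",
    "method_results": "Structural equation modeling showed that job autonomy predicts job satisfaction; "
                      "structural equation modeling also confirmed the mediation.",
}

for name, text in sections.items():
    print(name, section_profile(text))
```

On these toy sections, the recurring variable names surface as frequent words and bigrams in the theory section, while the analysis method surfaces as the top trigram in the methods-and-results section. The counts alone do not say which variable mediates which relationship.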

2

n-gram-based approach to composer recognition

Wołkowicz J., Kulka Z., Kešelj V.

Archives of Acoustics | 2008 | Vol. 33, No. 1 | 43-55 | EN

This paper describes how tools provided by Natural Language Processing and Information Retrieval can be applied to music. A method is introduced for converting complex musical structure into features (n-grams) that correspond to the words of a text. The correspondence between the two representations is shown by demonstrating that certain important regularities known from text processing can also be found in music. As a case study, statistical analysis of n-gram profiles, a technique from statistical NLP, was applied to the problem of automatic composer attribution. A corpus of MIDI files of piano pieces served as the data source.
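The comparison of n-gram profiles can be illustrated with a short sketch. The abstract does not specify the musical features or the distance measure, so both are assumptions here: melodies are reduced to sequences of pitch intervals between MIDI note numbers, and profiles are compared with a Common N-Gram (CNG) style dissimilarity of the kind used for authorship attribution in statistical NLP. The pitch sequences, composer names and function names are made up for illustration.

```python
from collections import Counter

# A profile maps an n-gram of pitch intervals to its relative frequency.
Profile = dict[tuple[int, ...], float]


def interval_ngrams(pitches: list[int], n: int) -> list[tuple[int, ...]]:
    """Turn MIDI pitch numbers into pitch intervals, then into n-grams of intervals."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


def profile(pitches: list[int], n: int = 3, size: int = 50) -> Profile:
    """Keep the `size` most frequent n-grams with their relative frequencies."""
    grams = interval_ngrams(pitches, n)
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).most_common(size)}


def cng_dissimilarity(p1: Profile, p2: Profile) -> float:
    """CNG-style dissimilarity: sum of squared relative frequency differences."""
    d = 0.0
    for g in p1.keys() | p2.keys():
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2  # f1 + f2 > 0: g is in at least one profile
    return d


def attribute(piece: list[int], composers: dict[str, list[int]]) -> str:
    """Assign the piece to the composer whose profile is least dissimilar."""
    target = profile(piece)
    return min(composers, key=lambda name: cng_dissimilarity(target, profile(composers[name])))


# Toy, made-up pitch sequences (MIDI note numbers), one per hypothetical composer.
training = {
    "composer_a": [60, 62, 64, 65, 67, 65, 64, 62, 60, 62, 64, 65, 67],  # stepwise lines
    "composer_b": [60, 64, 67, 72, 67, 64, 60, 64, 67, 72, 67, 64, 60],  # arpeggios
}
unknown = [62, 64, 66, 67, 69, 67, 66, 64, 62]  # a stepwise line, transposed up

print(attribute(unknown, training))
```

Using intervals rather than absolute pitches makes the profile insensitive to transposition, which is why the transposed stepwise line is still matched to `composer_a`. An n-gram present in only one of the two profiles contributes the maximum term of 4 to the sum.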

3

Detecting approximately duplicate bibliographic records with text algorithms: experience of creating a union catalogue of libraries at the Warsaw University of Technology

Płoszajski G.

TASK Quarterly: scientific bulletin of Academic Computer Centre in Gdansk | 2003 | Vol. 7, No. 2 | 294-297 | EN

The paper describes a fault-tolerant method for detecting duplicate bibliographic records in catalogues. The method is based on text algorithms: it suggests decisions to librarians, who make the final decision. It was applied to four library catalogues at the Warsaw University of Technology, each of which was compared with the catalogue of the main library. Non-duplicate and duplicate records were handled differently when the catalogues were merged. Using this method, a significant portion of the records in the joining libraries' catalogues was identified as duplicates before the catalogues were added to the union catalogue. The algorithms proved helpful in ensuring high-quality information.
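The abstract does not say which text algorithms were used, so the following is only one plausible way to flag approximately duplicate records for review, not the method from the paper. It assumes records are dictionaries with `title`, `author` and `year` fields, normalizes case, diacritics and punctuation, and scores pairs with the `ratio()` method of Python's standard-library `difflib.SequenceMatcher`. The 0.9 threshold, the helper names and all record data are hypothetical. In keeping with the workflow described, the function only suggests candidate pairs and leaves the decision to a librarian.

```python
import re
import unicodedata
from collections.abc import Iterator
from difflib import SequenceMatcher
from itertools import product

Record = dict[str, str]


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Letters without a decomposition (e.g. Polish 'ł') are kept as they are.
    return " ".join(re.sub(r"[^\w\s]", " ", no_marks).split())


def record_key(record: Record) -> str:
    """Hypothetical matching key built from title, author and year."""
    return normalize(" ".join(record.get(f, "") for f in ("title", "author", "year")))


def candidate_duplicates(branch: list[Record], main: list[Record],
                         threshold: float = 0.9) -> Iterator[tuple[Record, Record, float]]:
    """Yield (branch_record, main_record, score) pairs that look like duplicates.

    Only suggestions: a librarian confirms or rejects each pair."""
    for b, m in product(branch, main):
        score = SequenceMatcher(None, record_key(b), record_key(m)).ratio()
        if score >= threshold:
            yield b, m, score


# Made-up records for illustration only.
main = [
    {"title": "Łączność bezprzewodowa", "author": "Nowak, A.", "year": "2005"},
    {"title": "Theory of Computation", "author": "Smith, B.", "year": "2001"},
]
branch = [
    {"title": "Lacznosc bezprzewodowa.", "author": "Nowak A", "year": "2005"},
    {"title": "Signals and Systems", "author": "Brown, C.", "year": "1995"},
]

for b, m, score in candidate_duplicates(branch, main):
    print(f"{b['title']!r} ~ {m['title']!r} (similarity {score:.2f})")
```

Comparing every branch record with every main-catalogue record is quadratic, so a catalogue of realistic size would need some form of blocking (for example, comparing only records that share a normalized year or title prefix) before scoring pairs.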