
Results found: 28

Search results
Searched:
in keywords: stylometry
1
Struktury sekwencyjne w Kronice Dalimila: analiza stylometryczna [Sequential structures in the Chronicle of Dalimil: a stylometric analysis]
100%
EN
This paper is a quantitative study of sequential structures in the medieval Czech chronicle Dalimilova Kronika. The author analyses style changes in the chronicle and tries to answer some questions concerning its authorship. Another topic discussed is the relationship between orality and literacy at the threshold of the Middle Ages in Europe. A philological approach is combined with quantitative tools, including trend analysis and time series modeling.
2
100%
EN
In audiovisual translation, stylometry can be used to measure formal-aesthetic fidelity. We present a corpus-based measure of syntactic complexity as a feature of language style. The methodology considers hierarchical dimensions of syntactic complexity, using syllable counting and dependency parsing. The test material consists of dialogues of several characters from the TV show “Two and a Half Men”. The results show that the characters do not differ syntactically from one another as much as might be expected, and that, despite a general tendency to level differences even further in translation, the changes in syntactic complexity between the original and the translation depend mostly on the respective character-feature combination.
3
100%
EN
An open stylometric system based on multilevel text analysis
Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to several tasks beyond this canonical set, if only stylometric tools were more accessible to users from different areas of the humanities and social sciences. This paper presents a general idea, followed by a fully functional prototype of an open stylometric system that facilitates wide use through two aspects: technical and research flexibility. The system relies on a server installation combined with a web-based user interface. This frees the user from the necessity of installing any additional software. At the same time, the system offers a variety of ways in which the input texts can be analysed: they include not only the usual lexical level, but also deep-level linguistic features. This enables a range of possible applications, from typical stylometric tasks to the semantic analysis of text documents. The internal architecture of the system relies on several well-known software packages: a collection of language tools (for text pre-processing), Stylo (for stylometric analysis) and Cluto (for text clustering). The paper presents: (1) the idea behind the system from the user’s perspective; (2) the architecture of the system, with a focus on data processing; (3) features for text description; (4) the use of analytical systems such as Stylo and Cluto. The presentation is illustrated with example applications.
Otwarty system stylometryczny wykorzystujący wielopoziomową analizę języka [An open stylometric system using multilevel language analysis]
Applications of stylometric methods are generally limited to a few typical research problems, such as authorship attribution, the style of literary genres, or studies of stylistic differences between women and men. They could certainly be applied with success to many other text classification problems as well, if only these methods and the corresponding tools were more accessible to scholars from the various disciplines of the humanities and social sciences. This article discusses the theoretical assumptions and a fully functional prototype of an open stylometric system whose wide use is made possible by two of its features: technical flexibility and adaptability to different research questions. The system relies on a server installation coupled with a web-based user interface. This frees the user from having to install any additional software. At the same time, the system offers many ways of analysing texts, not only at the lexical level but also through low-level linguistic features. This allows the system to be used in many different ways, from typical stylometric tests to the semantic analysis of documents. The internal architecture of the system consists of many well-established components, including the Stylo package for stylometric analysis and the Cluto package for advanced cluster analysis. The article discusses: (1) the concept of the whole system from the user’s point of view; (2) the architecture of the system and its text-processing components; (3) the linguistic features used to describe documents; (4) the use of data analysis modules such as Stylo and Cluto. Example applications of the system are also presented.
EN
The paper analyses a sample of military language from a stylometric perspective. The corpus is the chronicle of the 8th Czech Armed Forces Guard Company, which operated at the Bagram Air Field base (BAF). We work on the assumptions that the corpus will show (A) a prominent presence of military slang; (B) a high proportion of abbreviations; (C) frequent linguistic devices expressing the mutuality and collectiveness of the soldiers’ enterprise. The texts were subjected to keyword and collocation analyses; these identified several of their stylistic features (such as the use of English-based expressions, protocol-like language, or idiosyncratic collocations), which testify to the multifaceted character of the military chronicle genre.
EN
The purpose of the article is to compare selected features of the style of utterances of professors and students in an oral exam as a communication situation. The research material consists of 25 recordings of oral exams (9 examiners with 32 students). They come from a corpus collected as part of GeWiss, a study project on spoken scientific language. The texts were divided into two subcorpora: E (examiners) and S (students). Corpus linguistics methods were used in the analysis. Several characteristic features of scientific and official styles were compared: numerous structures of the type proszę + infinitive; nominal structures (nominal style); extensive hypotaxis. The analysis showed numerous stylistic similarities between the examined subcorpora. None of the texts in the subcorpora is strongly nominal in style. A clear difference between the subcorpora is the presence of structures with the word proszę: it appears in the utterances of examiners, while in the utterances of students it is almost non-existent. The distribution of cohesive devices differs between the two subcorpora (parataxis is more common than hypotaxis in both, but is implemented differently); there are also differences in the lists of the one hundred most frequently used lexemes in the subcorpora, and these differences make it possible to distinguish the texts with tools for automatic style similarity analysis.
EN
The article deals with the question of authorship of the twelfth-century Chronica Polonorum (or Gesta principum Polonorum [The Deeds of the Princes of the Poles]), also known as The Polish Chronicle. It seeks to verify the hypothesis, recently reproposed by Tomasz Jasiński, that the author was of Venetian origin. The hypothesis is based on the textual similarities observed between the Translatio Sancti Nicolai, by an author referred to as the ‘Monk of Lido’ (Monachus Littorensis), and the Chronica. The attribution attempt put forth by M. Eder is based on stylometric methods that measure the frequencies of the most frequent words in the texts under study (mainly conjunctions, prepositions, pronouns, and particles), which are subsequently subjected to cluster analysis, multidimensional scaling, or principal components analysis. The experiment demonstrated a strong resemblance between the Translatio Sancti Nicolai and the Polish Chronicle, which may be regarded as a substantial argument in support of the Venetian background hypothesis.
8
86%
EN
The paper presents an open, web-based system for stylometric analysis named WebSty, which is a part of the CLARIN-PL research infrastructure. WebSty does not require local installation by users, can be used via any web browser, offers rich set-up, and runs on a computing cluster. We discuss the underlying ideas of the system, its architecture, a pipeline of language tools for processing Polish, and its integration with systems for clustering, visualizing the results of clustering, and identifying the features with the strongest discriminating power. The techniques used for feature weighting and text similarity measurement are also briefly reviewed. In conclusion, we present a preliminary evaluation of WebSty on a corpus of 1000 literary works, and we report on the results of the first research applications of WebSty. Although the system was initially focused on processing Polish texts, we also briefly discuss its development towards a multilingual system, which already supports English, German and Hungarian.
PL
This article discusses the automatic extraction of relevant words from sets of texts. The author briefly presents three methods that aim to extract from a corpus words selected by their frequency, or words whose co-occurrence is not random. First, he focuses on keyword analysis; then he discusses the Zeta method developed by John Burrows and Hugh Craig; the third method covered is topic modelling, which has recently become very popular and consists in finding clusters of words co-occurring in similar contexts. Topic modelling was originally intended for quick content search in large collections of documents. Using 100 Polish novels, the article shows how this method can be applied in linguistic studies.
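As a rough illustration of the Zeta method mentioned in this abstract, the sketch below follows one commonly described formulation: each text is cut into equal-sized segments, and a word scores highly when it occurs in many segments of group A and is absent from many segments of group B. The function names `segment` and `zeta_scores`, the segment size, and the toy data are assumptions made for illustration; they are not taken from the article.

```python
def segment(tokens, size):
    """Split tokens into consecutive full-size segments (as sets of words).

    A trailing remainder shorter than `size` is dropped.
    """
    return [set(tokens[i:i + size])
            for i in range(0, len(tokens) - size + 1, size)]


def zeta_scores(segments_a, segments_b):
    """Score each word by (share of A segments containing it)
    + (share of B segments lacking it); scores range from 0 to 2.

    Assumes both segment lists are non-empty.
    """
    vocabulary = set().union(*segments_a, *segments_b)
    scores = {}
    for word in vocabulary:
        share_a = sum(word in s for s in segments_a) / len(segments_a)
        share_b = sum(word in s for s in segments_b) / len(segments_b)
        scores[word] = share_a + (1 - share_b)
    return scores


# Toy usage with made-up tokens (hypothetical data).
group_a = segment("x y x z x y x w".split(), 2)
group_b = segment("q y q y q z q q".split(), 2)
scores = zeta_scores(group_a, group_b)
preferred_by_a = sorted(scores, key=scores.get, reverse=True)[:3]
```

Words scoring near 2 are characteristic of group A, and words scoring near 0 are characteristic of group B; scores near 1 indicate no preference.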
11
Mikrokorpus Gronowy Polszczyzny 1830–1918 [Cluster Microcorpus of Polish 1830–1918]
86%
EN
This paper is dedicated to the construction of a small cluster corpus of Polish texts from the period 1830–1918. The assumptions behind the corpus, its micro- and macro-structure, its stylistic, regional and authorial diversity, and the method of making it available are presented. Its possible applications are illustrated with orthographic, inflectional, and syntactic studies.
12
Ještě k Seifertovu ranému volnému verši [More on Seifert’s early free verse]
86%
EN
Jeremiah Curtin translated most works by Poland’s first literary Nobel Prize winner, Henryk Sienkiewicz. He was helped in this life-long task by his wife, Alma Cardell Curtin. It was also Alma who, after her husband’s death, produced the lengthy Memoirs she steadfastly ascribed to her husband for his, rather than her, greater glory. This article investigates the possible textual influence Alma might have had on other works by her husband, including his travelogues, ethnographic and mythological studies, and the translations themselves. Lacking traditional authorial evidence, this study relies on stylometric methods that compare most-frequent-word usage by means of cluster analysis of z-scores. There is much in this statistics-based authorial attribution to show that Alma Cardell Curtin significantly affected at least two other original works of her husband and, possibly, at least two of his translations.
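The comparison of most-frequent-word usage through z-scores can be sketched roughly as follows. The abstract does not say which distance was fed into the cluster analysis; Burrows's Delta (the mean absolute difference of z-scores) is used here as one common choice, and it is an assumption. All function names and the toy texts are hypothetical, and population standard deviation is used for simplicity.

```python
from collections import Counter
from statistics import mean, pstdev


def most_frequent_words(texts, n):
    """Return the n most frequent words across all texts pooled together."""
    pooled = Counter()
    for tokens in texts.values():
        pooled.update(tokens)
    return [word for word, _ in pooled.most_common(n)]


def z_score_profiles(texts, n_words=100):
    """Map each text name to the z-scores of its relative word frequencies.

    Assumes every text contains at least one token.
    """
    vocabulary = most_frequent_words(texts, n_words)
    freqs = {}
    for name, tokens in texts.items():
        counts = Counter(tokens)
        freqs[name] = [counts[w] / len(tokens) for w in vocabulary]
    profiles = {name: [] for name in texts}
    for j in range(len(vocabulary)):
        column = [freqs[name][j] for name in texts]
        mu, sigma = mean(column), pstdev(column)
        for name in texts:
            # A word used identically in every text carries no signal.
            z = (freqs[name][j] - mu) / sigma if sigma > 0 else 0.0
            profiles[name].append(z)
    return profiles


def delta(profile_a, profile_b):
    """Burrows's Delta: mean absolute difference between two z-score profiles."""
    return mean(abs(a - b) for a, b in zip(profile_a, profile_b))


# Toy usage with made-up texts (hypothetical data).
texts = {
    "memoirs": "the cat and the dog".split(),
    "travelogue": "the cat and the dog".split(),
    "translation": "a dog or a cat or a bird".split(),
}
profiles = z_score_profiles(texts, n_words=3)
distance = delta(profiles["memoirs"], profiles["translation"])
```

The pairwise distances produced this way can then be passed to any hierarchical clustering routine; texts with small mutual Delta end up on neighbouring branches.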
EN
Background: To recognize the authors of texts using statistical tools, one first needs to decide which features to use as author characteristics, and then extract these features from the texts. The features extracted are mostly counts of so-called function words. Objectives: The extracted data are processed further and compressed into data with fewer features, in such a way that the compressed data retain their power as effective discriminators. In this case the feature space has lower dimensionality than the text itself. Methods/Approach: In this paper, the data collected by counting words and characters in around a thousand paragraphs of each sample book underwent a principal component analysis performed using neural networks. Once the analysis was complete, the first principal component was used to distinguish the books written by a given author. Results: The results show that every author leaves a unique signature in written text that can be discovered by analyzing counts of short words per paragraph. Conclusions: In this article we have demonstrated that authorship can be traced by applying principal component analysis to counts of short words per paragraph. The methodology could be used for other purposes, such as fraud detection in auditing.
EN
Henryk Sienkiewicz’s novel "Quo Vadis" made its way into Italy at the end of the 19th century through the efforts of the Neapolitan translator Federigo Verdinois. The first part of this paper outlines the history of the popularity of "Quo Vadis" by focusing on the operations of Milanese publishers that made the Polish novel part of their offer in a variety of ways (as translations, adaptations, reworkings, plagiarisms, etc.). Bibliometric methods are used to establish why so many publishing houses decided to publish Henryk Sienkiewicz’s Roman romance. The analysis of the bibliometric data of the published translations helped assess and describe the extent and the character of the popularity that the novel garnered among Milanese publishers. The second part of the paper relates the findings of a multi-method quantitative study of the same material. The number of word tokens was compared between the original and the translations. The lexical richness of the texts under study was compared by means of the moving average type-token ratio (MATTR). Sentence lengths were also compared, as was the distribution of sentence lengths treated as a time series. Two different programmes (WCopyFind and Tracer) yielded very similar results on the degree of similarity of five-word phrases in pairs of translations, which was examined by means of network analysis.
IT
Henryk Sienkiewicz’s "Quo vadis" arrived in Italy at the end of the 19th century thanks to the Neapolitan translator Federigo Verdinois. The aim of the first part of the paper is to present the history of the popularity of the novel "Quo vadis" through the activities of Milanese publishing houses, which introduced the Polish author’s work into their catalogues in various forms (as translations, adaptations, paraphrases, plagiarisms). The history of the translations of Henryk Sienkiewicz’s Roman novel was reconstructed using the bibliometric method applied in the first part of the article. The analysis of the collected bibliographic data made it possible to assess and describe the extent and character of the popularity of "Quo vadis" among Milanese publishers in the first part of the 20th century. In the second part of the paper, to shed more light on the complicated Milanese fortunes of Sienkiewicz’s work, we used several methods of quantitative analysis. We compared the number of words in the original and in the translations described in the first part. The lexical richness of all the texts examined was measured and compared using the moving average type-token ratio (MATTR). We also compared sentence lengths as time series. Two different programmes (WCopyFind and Tracer), useful for network analysis, gave similar results for the number of similar five-word phrases shared between the translations of "Quo vadis".
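The moving average type-token ratio (MATTR) named in the abstracts above can be sketched as follows: the type-token ratio is computed for every window of a fixed number of consecutive tokens, and the window values are averaged. This is a minimal sketch under stated assumptions; the function name `mattr`, the default window of 50 tokens, and the fallback to a plain type-token ratio for texts shorter than one window are illustrative choices, not details reported by the study.

```python
from collections import Counter


def mattr(tokens, window=50):
    """Average type-token ratio over all windows of `window` consecutive tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(tokens)
    if n == 0:
        return 0.0
    if n <= window:
        # Assumption: fall back to the plain type-token ratio for short texts.
        return len(set(tokens)) / n

    counts = Counter(tokens[:window])  # word counts inside the current window
    type_total = len(counts)           # running sum of distinct-word counts
    for i in range(window, n):
        leaving = tokens[i - window]
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]
        counts[tokens[i]] += 1
        type_total += len(counts)

    n_windows = n - window + 1
    return type_total / (window * n_windows)


# Toy usage: five windows of two tokens with 2, 2, 2, 2 and 1 distinct words.
value = mattr("a b a b c c".split(), window=2)
```

Because every window has the same length, MATTR is less sensitive to overall text length than the plain type-token ratio, which makes it a reasonable basis for comparing an original with translations of different sizes.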
PL
This text is a review of the book "Reading beyond the female: The relationship between perception of author gender and literary quality" by the Dutch researcher Cornelia Koolen. The book addresses the relationship between an author’s gender, the assessment of the literary quality of his or her work, and the actual features of the texts, thereby joining the body of research on gender stereotypes in language and literature. Through its innovative use of quantitative methods of text analysis, it is also an important contribution to stylometric methodology, which gives the whole work an interdisciplinary character.
EN
This text is a review of the book "Reading beyond the female: The relationship between perception of author gender and literary quality" by the Dutch researcher Cornelia Koolen. The book addresses the relations between the gender of the author, the evaluation of the literary quality of their work, and the actual features of the texts, thus fitting into the larger trend of research on gender stereotypes in language and literature. The innovative use of quantitative methods also grants it an important place within the literature on stylometry, making it an interdisciplinary work.
PL
The dynamic growth of user-generated content on the web poses a serious challenge for protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also limiting the spread of unethical behaviour. However, designing automated models for detecting offensive content remains a complex task, particularly in languages with limited publicly available data. In our research we collaborate with the web service Wykop.pl to train a model on real content that was removed in the moderation process. In this article we focus on the Polish language and discuss the notion of datasets and annotation methods, and then present our stylometric analysis of content from Wykop.pl, aimed at identifying morpho-syntactic structures that are commonly used in the language of cyberbullying and hate speech. Through our research we hope to contribute to the ongoing discussion of offensive language and hate speech in sociolinguistic studies, emphasising the need to analyse user-generated content on the web.
EN
The dynamic increase in user-generated content on the web presents significant challenges in protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also minimizing the spread of wrongful conduct. However, designing automated detection models for such offensive content remains complex, particularly in languages with limited publicly available data. To address this issue, our research collaborates with the Wykop.pl web service to fine-tune a model using genuine content that has been banned by professional moderators. In this paper, we focus on the Polish language and discuss the notion of datasets and annotation frameworks, presenting our stylometric analysis of Wykop.pl content to identify morpho-syntactic structures that are commonly applied in cyberbullying and hate speech. By doing so, we contribute to the ongoing discussion on offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content.
EN
In this paper the author discusses the use of natural language processing (NLP) methods in Polish academic literature. The analysis covers three fields: sociology, political science and literary studies. Three groups of texts, by Marek Troszyński, Paweł Matuszewski and Maciej Eder, are presented. As a result of the analysis, the author highlights the most important methodological aspects of NLP usage: contexts, opportunities and risks. Finally, the author indicates areas for further research where NLP would be a beneficial method.
PL
In this article the author presents the use of natural language processing (NLP) methods in Polish research. The analysis covers three research fields: sociology, political science and literary studies. The work of researchers such as Marek Troszyński, Paweł Matuszewski and Maciej Eder is discussed. The analysis outlines the most important methodological aspects of using NLP: contexts, opportunities and risks. Finally, further research perspectives are indicated in which the methods discussed may bring potentially positive results.
19
61%
EN
This article presents the results of a quantitative analysis of frequently appearing words in a data set of over 2,500 Polish texts: Polish literature from the fourteenth to the twenty-first century, and Polish translations from English, French, Russian and (to a lesser degree) other languages. The data set reveals a visible signal by type and by original language. The results also point to a definite stylometric specificity of Polish translations of Shakespeare, and their stylometric resemblance to Polish romantic and neoromantic dramas.
PL
The article presents the results of a quantitative analysis of the most frequent words in a corpus of over 2,500 Polish texts: Polish literature from the 14th to the 21st century and Polish translations from English, French, Russian and (to a lesser extent) other languages. The corpus is shown to carry a strong genre signal and a strong source-language signal. The results also point to a clear stylometric distinctiveness of the language of Polish Shakespeare translations and to their very strong stylometric similarity to Polish Romantic and Neo-Romantic drama.
PL
The study of texts belonging to the rhetorical variety of early political communication requires reference to disciplines other than linguistics. Work in the research area discussed in the article makes it possible to combine tools developed within linguistics and political science, which is why the area is also called political linguistics (politolinguistics). A sociolinguistic approach is also helpful; it draws inspiration from the sociological sciences and treats texts produced in the course of political activity as evidence of the sociolect of the nobility. Clear connections also emerge between the specific character of the political system of the First Republic (the Polish-Lithuanian Commonwealth), together with its system of values, and the idea of public speaking as a manifestation of the nobility’s freedom. The communicative phenomena in this case find their explanation in the historical sciences. In addition, the development of corpus research and the growing stock of digitised texts written before 1795 create an opportunity to use the tools of statistical linguistics and stylometry.
EN
Linguistics alone proves insufficient for researching texts that represent the rhetorical variety of early political communication. The study of the area discussed in the article requires a combination of tools developed by linguistics and political science; the area in question is therefore referred to as political linguistics. A sociolinguistic approach is also helpful here, as it draws on the sociological sciences and treats the texts that result from political activity as evidence of the sociolect of the nobility. Additionally, one may observe various links between the specific character of the political system of the Polish-Lithuanian Commonwealth, including its binding value system, and the idea of public speeches – a manifestation of the nobility’s freedom. The development of corpus research and the constantly growing repository of digitised pre-1795 texts offer an opportunity to apply the tools of statistical linguistics and stylometry.