Analysis of the shape of the Laplacian spectrogram is a new line of research in graph spectral clustering. More precisely, we observed that (properly normalized) plots of the eigenvalues of the sub-Laplacians characterizing different groups of documents differ in shape. By computing the distance between these plots, we can therefore cluster documents and classify new observations. We have developed this idea in a number of papers, and it can be considered a pioneering approach to cluster analysis. To explain why the approach is so useful, in this paper we consider the hypothesis that the shape of a spectrogram can be attributed to the writing style of the authors of the documents in a cluster. We explore this hypothesis for several models of word distribution, assuming that writing style is reflected in the word distribution of the texts of an author or a group of authors. We check whether changing the parameters of the widely accepted log-normal word distribution model changes the Laplacian eigenvalue spectrogram in a way that distinguishes between document groups. We found that varying each of the distribution parameters does lead to distinct groups of documents. These findings justify the use of Laplacian spectrograms to distinguish (cluster or classify) groups of documents.
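As a rough illustration of the procedure described above, the following Python sketch builds a Laplacian eigenvalue spectrogram for a group of documents and measures the distance between two such spectrograms. It is a minimal sketch under several assumptions that the abstract does not specify: cosine similarity between documents, the combinatorial Laplacian L = D - S, normalization by the largest eigenvalue with resampling of the index axis onto [0, 1], and a root-mean-square distance between curves. The helper names (`sample_documents`, `cosine_similarity_matrix`, `normalized_spectrum`, `spectrum_distance`) and the way word probabilities are drawn from a log-normal distribution are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_documents(n_docs, vocab_size, doc_len, sigma, rng):
    """Hypothetical generator: word probabilities drawn from a log-normal
    distribution with shape parameter sigma, documents drawn as multinomials."""
    weights = rng.lognormal(mean=0.0, sigma=sigma, size=vocab_size)
    probs = weights / weights.sum()
    return rng.multinomial(doc_len, probs, size=n_docs).astype(float)


def cosine_similarity_matrix(X):
    """Cosine similarity between the rows of a document-term matrix X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0           # avoid division by zero for empty rows
    Xn = X / norms
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)          # no self-loops in the similarity graph
    return S


def normalized_spectrum(S, n_points=50):
    """Eigenvalues of L = D - S, scaled to [0, 1] and resampled to n_points
    so that groups of different sizes can be compared (assumed normalization)."""
    L = np.diag(S.sum(axis=1)) - S
    eig = np.linalg.eigvalsh(L)       # ascending order, symmetric input
    eig = np.clip(eig, 0.0, None)     # remove tiny negative round-off values
    if eig[-1] > 0:
        eig = eig / eig[-1]
    x_old = np.linspace(0.0, 1.0, len(eig))
    x_new = np.linspace(0.0, 1.0, n_points)
    return np.interp(x_new, x_old, eig)


def spectrum_distance(a, b):
    """Root-mean-square difference between two resampled spectrograms."""
    return float(np.linalg.norm(a - b) / np.sqrt(len(a)))


rng = np.random.default_rng(0)
group_a = sample_documents(40, 300, 200, sigma=0.5, rng=rng)
group_b = sample_documents(60, 300, 200, sigma=2.0, rng=rng)

spec_a = normalized_spectrum(cosine_similarity_matrix(group_a))
spec_b = normalized_spectrum(cosine_similarity_matrix(group_b))
print("distance between spectrograms:", spectrum_distance(spec_a, spec_b))
```

Under these assumptions, a new group of documents could be assigned to the known group whose spectrogram is nearest in this distance. Whether a given change in `sigma` yields a clearly separated curve depends on the similarity measure and the normalization chosen.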