Search results - BazTech

1

A method to integrate word sense disambiguation and translation memory for English to Hindi machine translation system

Rawat Sunita

Computer Assisted Methods in Engineering and Science | 2022 | Vol. 29, no. 1-2 spec. | 125-144

EN

Word sense disambiguation deals with deciding the word’s precise meaning in a certain specific context. One of the major problems in natural language processing is lexical-semantic ambiguity, where a word has more than one meaning. Disambiguating the sense of polysemous words is the most important task in machine translation. This research work aims to design and implement English to Hindi machine translation. The design methodology addresses improving the speed and accuracy of the machine translation process. The algorithm and modules designed in this research work have been deployed on the Hadoop infrastructure, and test cases are designed to check the feasibility and reliability of this process. The research work presented describes the methodologies to reduce data transmission by adding a translation memory component to the framework. The speed of execution is increased by replacing the modules in the machine translation process with lightweight modules, which reduces infrastructure and execution time.
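The abstract does not say how the translation memory component works. As a rough illustration of the general idea only, the sketch below caches translations of previously seen segments so that a repeated segment is not sent to the translator again. The class name `TranslationMemory`, the key normalization, and `placeholder_translate` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict


def placeholder_translate(segment: str) -> str:
    """Stand-in for a real English-to-Hindi translator (hypothetical)."""
    return f"<hi:{segment}>"


class TranslationMemory:
    """Exact-match translation memory in front of a translator function."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        self._translate = translate
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(segment: str) -> str:
        # Assumption: segments that differ only in case or spacing match.
        return " ".join(segment.lower().split())

    def lookup(self, segment: str) -> str:
        key = self._key(segment)
        if key in self._store:
            # Seen before: reuse the stored translation, no translator call.
            self.hits += 1
            return self._store[key]
        # New segment: translate once and remember the result.
        self.misses += 1
        result = self._translate(segment)
        self._store[key] = result
        return result
```

Every hit is one translator call avoided, which is the kind of saving the abstract attributes to the translation memory component.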

2

Towards semantic-rich word embeddings

Beringer Grzegorz, Jabłoński Mateusz, Januszewski Piotr, Sobecki Andrzej, Szymański Julian

Annals of Computer Science and Information Systems | 2019 | Vol. 18 | 273-276

EN

In recent years, word embeddings have been shown to improve the performance in NLP tasks such as syntactic parsing or sentiment analysis. While useful, they are problematic in representing ambiguous words with multiple meanings, since they keep a single representation for each word in the vocabulary. Constructing separate embeddings for meanings of ambiguous words could be useful for solving the Word Sense Disambiguation (WSD) task. In this work, we present how a word embeddings average-based method can be used to produce semantic-rich meaning embeddings, and how they can be improved with distance optimization techniques. We also open-source a WSD dataset that was created for the purpose of evaluating methods presented in this research.
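One plausible reading of an average-based method is sketched below. A meaning embedding is the mean of the vectors of words associated with that meaning, and a context is assigned to the meaning whose embedding is closest by cosine similarity. The toy two-dimensional vectors, the sense labels, and all function names are assumptions made for illustration. The distance optimization step mentioned in the abstract is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed(words: Sequence[str], vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of the words that are in the vocabulary."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words to average")
    return average(known)


def disambiguate(context: Sequence[str],
                 sense_words: Dict[str, List[str]],
                 vectors: Dict[str, Vector]) -> str:
    """Pick the sense whose averaged embedding is closest to the context."""
    context_vec = embed(context, vectors)
    sense_vecs = {s: embed(ws, vectors) for s, ws in sense_words.items()}
    return max(sense_vecs, key=lambda s: cosine(context_vec, sense_vecs[s]))


# Toy data (hypothetical): axis 0 is "finance-like", axis 1 is "water-like".
TOY_VECTORS: Dict[str, Vector] = {
    "money": [1.0, 0.0], "loan": [0.9, 0.1], "deposit": [0.8, 0.2],
    "river": [0.0, 1.0], "water": [0.1, 0.9], "shore": [0.2, 0.8],
}
TOY_SENSES = {
    "bank/finance": ["money", "loan"],
    "bank/river": ["river", "shore"],
}
```

With this toy data, `disambiguate(["deposit", "money"], TOY_SENSES, TOY_VECTORS)` selects `"bank/finance"`, and a context of `["water"]` selects `"bank/river"`.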

3

Feature Words Selection for Knowledge-based Word Sense Disambiguation with Syntactic Parsing

Lu W., Huang H., Zhu C.

Przegląd Elektrotechniczny | 2012 | Vol. 88, no. 1b | 82-87

EN

Feature words are crucial clues for word sense disambiguation. There are two methods to select feature words: window-based and dependency-based methods. Both of them have some shortcomings, such as irrelevant noise words or paucity of feature words. In order to solve the problems of the existing methods, this paper proposes two methods to select feature words with syntactic parsing, which are based on phrase structure parsing tree (PTree) and dependency parsing tree (DTree). With the help of syntactic parsing, the proposed methods can select feature words more accurately, which can alleviate the effect of noise words of window-based method and can avoid the paucity of feature words of dependency-based method. Evaluation is performed on a knowledge-based WSD system with a publicly available lexical sample dataset. The results show that both of the proposed methods are superior to window-based and dependency-based methods, and the method based on PTree is better than the method based on DTree. Both of them are preferred strategies to select feature words to disambiguate ambiguous words.

PL (English translation)

The paper proposes two methods of feature word selection, based on phrase structure parsing and on dependency parsing. The study was carried out using various databases. The proposed method achieves higher accuracy than the methods used so far: the window-based and dependency-based methods.
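For orientation, the window-based baseline that the paper compares against can be sketched as follows. It takes every token within a fixed distance of the target word as a feature word. The function name and the example sentence are illustrative assumptions. The PTree and DTree methods proposed in the paper need a syntactic parser and are not shown here.

```python
from typing import List


def window_features(tokens: List[str], target_index: int, size: int) -> List[str]:
    """Return the tokens within `size` positions of the target, target excluded."""
    start = max(0, target_index - size)
    end = min(len(tokens), target_index + size + 1)
    return [
        token
        for index, token in enumerate(tokens[start:end], start)
        if index != target_index
    ]
```

For the tokens of "he sat on the bank of the river" with target "bank" (index 4) and a window of 2, the function returns `["on", "the", "of", "the"]`. The informative word "river" falls outside the window while function words get in, which illustrates the noise problem the abstract attributes to window-based selection.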

4

Evaluating lexicographer controlled semi-automatic word sense disambiguation method in a large scale experiment

Broda B., Piasecki M.

Control and Cybernetics | 2011 | Vol. 40, no. 2 | 419-436

EN

Word Sense Disambiguation in text remains a difficult problem, as the best supervised methods require laborious and costly manual preparation of training data. On the other hand, the unsupervised methods yield significantly lower precision and produce results that are not satisfying for many applications. Recently, an algorithm based on weakly-supervised learning for WSD called Lexicographer-Controlled Semi-automatic Sense Disambiguation (LexCSD) was proposed. The method is based on clustering of text snippets including words in focus. For each cluster we find a core, which is labelled with a word sense by a human, and is used to produce a classifier. Classifiers, constructed for each word separately, are applied to text. The goal of this work is to evaluate LexCSD trained on a large volume of untagged text. A comparison showed that the approach is better than the most frequent sense baseline in most cases.
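The most frequent sense baseline mentioned at the end of the abstract is simple enough to sketch. For each word, it always predicts the sense seen most often in tagged data. The function names and the toy data in the note below are assumptions for illustration. This is not the LexCSD algorithm itself.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def train_mfs(tagged: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each word to the sense it carries most often in (word, sense) pairs."""
    counts: Dict[str, Counter] = {}
    for word, sense in tagged:
        counts.setdefault(word, Counter())[sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def accuracy(model: Dict[str, str], tagged: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (word, sense) pairs for which the model predicts the sense."""
    pairs = list(tagged)
    correct = sum(1 for word, sense in pairs if model.get(word) == sense)
    return correct / len(pairs)
```

For example, given two occurrences of "bank" tagged as a finance sense and one tagged as a river sense, the baseline always predicts the finance sense and is correct on two of those three occurrences.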

5

Adjective Sense Disambiguation at the Border Between Unsupervised and Knowledge-Based Techniques

Hristea F., Popescu M.

Fundamenta Informaticae | 2009 | Vol. 91, no. 3-4 | 547-562

EN

The present paper extends a new word sense disambiguation method [9] to the case of adjectives. The method lies at the border between unsupervised and knowledge-based techniques. It performs unsupervised word sense disambiguation based on an underlying Naïve Bayes model, while using WordNet as knowledge source for feature selection. The proposed extension of the disambiguation method makes ample use of the WordNet semantic relations that are typical of adjectives. Its performance is compared to that of previous approaches that rely on completely different feature sets. Test results show that feature selection using a knowledge source of type WordNet is more effective in the disambiguation of adjective senses than local type features (like part-of-speech tags) are.

6

Word Sense Disambiguation by Machine Learning Approach: A Short Survey

Tatar D.

Fundamenta Informaticae | 2005 | Vol. 64, no. 1-4 | 433-442

EN

There is a renewed interest in word sense disambiguation (WSD) as it contributes to various applications in natural language processing. In this paper we survey vector-based methods for WSD in machine learning. All the methods are corpus-based and use definition of context in the sense introduced by S. Marcus [11].