Results found: 4

Search results
Searched for:
in keywords: statistical machine translation
EN
Word sense disambiguation deals with deciding a word’s precise meaning in a specific context. One of the major problems in natural language processing is lexical-semantic ambiguity, where a word has more than one meaning. Disambiguating the sense of polysemous words is the most important task in machine translation. This research work aims to design and implement an English-to-Hindi machine translation system. The design methodology addresses improving the speed and accuracy of the machine translation process. The algorithm and modules designed in this research work have been deployed on the Hadoop infrastructure, and test cases are designed to check the feasibility and reliability of this process. The research work presented describes methodologies to reduce data transmission by adding a translation memory component to the framework. The speed of execution is increased by replacing the modules in the machine translation process with lightweight modules, which reduces infrastructure requirements and execution time.
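As a rough illustration of how a translation memory component can cut data transmission, the sketch below caches translated segments and calls the translation engine only on a cache miss. It is not the framework described in the abstract: the class `TranslationMemory`, the stand-in function `fake_remote_translate`, and the exact-match lookup with whitespace and case normalization are all assumptions made for this example.

```python
from typing import Callable, Dict


class TranslationMemory:
    """Exact-match cache of previously translated segments (illustrative only)."""

    def __init__(self, translate_fn: Callable[[str], str]) -> None:
        # translate_fn is a hypothetical stand-in for a costly remote MT call.
        self._translate_fn = translate_fn
        self._store: Dict[str, str] = {}
        self.remote_calls = 0  # counts how often the engine was actually invoked

    @staticmethod
    def _normalize(segment: str) -> str:
        # Collapse whitespace and lowercase so trivial variants share one entry.
        return " ".join(segment.lower().split())

    def translate(self, segment: str) -> str:
        key = self._normalize(segment)
        if key not in self._store:
            # Cache miss: only now is data sent to the translation engine.
            self.remote_calls += 1
            self._store[key] = self._translate_fn(segment)
        return self._store[key]


def fake_remote_translate(segment: str) -> str:
    # Placeholder engine for the example; it does not translate anything.
    return f"<hi>{segment}</hi>"


if __name__ == "__main__":
    tm = TranslationMemory(fake_remote_translate)
    for text in ["Good morning", "good   morning", "Good night"]:
        print(text, "->", tm.translate(text))
    print("remote calls:", tm.remote_calls)
```

Lowercasing the lookup key is a simplification: a production memory would more likely keep case and use fuzzy matching, neither of which the abstract specifies.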
2
A Translation Evaluation Function based on Neural Network
EN
In this paper, we study the feasibility of using a neural network to learn a fitness function for a machine translation system based on a genetic algorithm termed GAMaT. The neural network is trained on features extracted from pairs of source sentences and their translations. The fitness function is trained to estimate the BLEU score of a translation as precisely as possible. The estimator has been trained on a corpus of more than 1.3 million data points. The performance is very promising: the Mean Absolute Error between the real BLEU score and the one given by the estimator is 0.12.
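The Mean Absolute Error reported above is the average absolute gap between the real BLEU score and the estimated one. A minimal sketch of that computation follows; the function name and the sample scores are invented for illustration and are not taken from the paper, and both lists are assumed to use the same BLEU scale.

```python
from typing import Sequence


def mean_absolute_error(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |reference_i - predicted_i| over all paired scores."""
    if len(reference) != len(predicted):
        raise ValueError("both score lists must have the same length")
    if not reference:
        raise ValueError("at least one score pair is required")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


if __name__ == "__main__":
    # Made-up scores, used only to exercise the function.
    real_bleu = [0.30, 0.50, 0.20]
    estimated_bleu = [0.40, 0.35, 0.25]
    print(f"MAE = {mean_absolute_error(real_bleu, estimated_bleu):.3f}")
```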
EN
Text alignment and text quality are critical to the accuracy of Machine Translation (MT) systems, some NLP tools, and any other text processing tasks requiring bilingual data. This research proposes a language-independent bisentence filtering approach based on Polish (not a position-sensitive language) to English experiments. This cleaning approach was developed on the TED Talks corpus and also initially tested on the Wikipedia comparable corpus, but it can be used for any text domain or language pair. The proposed approach implements various heuristics for sentence comparison. Some of the heuristics leverage synonyms as well as semantic and structural analysis of text as additional information. Minimization of data loss has been ensured. An improvement in MT system scores with text processed using this tool is discussed.
4
An Efficient Framework for Extracting Parallel Sentences from Non-Parallel Corpora
EN
Automatically building a large bilingual corpus that contains millions of words is always a challenging task. In particular, for low-resource languages it is difficult to find an existing parallel corpus large enough to build a real statistical machine translation system. However, comparable non-parallel corpora are richly available on the Internet, for example in Wikipedia, and valuable parallel texts can be extracted from them. This work presents a framework for effectively extracting parallel sentences from such resources, which significantly improves the performance of statistical machine translation systems. Our framework is a bootstrapping-based method strengthened by a new measure for estimating the similarity between two bilingual sentences. We conduct experiments on the English-Vietnamese language pair and obtain promising results both in constructing parallel corpora and in improving the accuracy of machine translation from English to Vietnamese.