Results found: 12

Search results
Searched for:
in keywords: optical character recognition
PL
Enterprises and organizations process huge volumes of paper documentation, which ties employees up in tedious and error-prone work. The article describes techniques that can be applied to automate this process in order to extract relevant information from documents, such as the parties to and the subject matter of a contract, deadlines and dates, location, technical data of objects, and other items specific to a given document type. The iDoc system uses elements of artificial intelligence to recognize document content, achieves a 10-fold speed-up in processing while maintaining high accuracy, and also allows manual verification of the data.
EN
In today's business landscape, companies and organizations grapple with processing extensive volumes of paper documents, burdening their employees with tedious and error-prone tasks. This article presents techniques for automating this process by extracting critical information from various documents, including contract subjects and objects, dates, deadlines, locations, technical data about devices, and other contents specific to particular document types. Using Artificial Intelligence, the iDoc system identifies document contents, enabling users to process data ten times faster while maintaining a high level of accuracy. With iDoc, manual data processing becomes obsolete, yet users can still validate the extracted information.
EN
Handwritten character recognition (HCR) has become an interesting and immensely useful technology; it has been explored with impressive performance in many languages. However, few HCR systems have been proposed for the Amazigh (Berber) language. Furthermore, the validation of any Amazigh handwritten character recognition system remains a major challenge because no robust Amazigh database is available. To address this problem, we first created two new data sets for Tifinagh and Amazigh Latin characters by extending the well-known EMNIST database with the Amazigh alphabet. Then, we proposed a handwritten character recognition system based on a deep convolutional neural network to validate the created data sets. The proposed convolutional neural network (CNN) has been trained and tested on our data sets. The experimental tests showed that it achieves satisfactory results in terms of accuracy and recognition efficiency.
EN
In this paper we present an approach to text area detection using binary images, the Constrained Run Length Algorithm, and other noise reduction methods for removing artefacts. Text processing includes various activities, most of which are related to preparing input data for further operations in the best possible way, so that it will not hinder the OCR algorithms. This is especially the case when handwritten manuscripts are considered, and even more so with very old documents. We present our methodology for the text area detection problem, which is capable of removing most irrelevant objects, including elements such as page edges, stains, and folds. At the same time, the presented method can handle multi-column texts and varying line thickness. The generated mask accurately marks the actual text area, so that the output image can be easily used in further text processing steps.
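The abstract above names the Constrained Run Length Algorithm but gives no parameters. As a rough illustration only, the sketch below shows the textbook form of run-length smoothing on a binary image, assuming 1 marks ink and 0 marks background. The function names `smooth_runs` and `crla_mask` and the gap thresholds are hypothetical and are not taken from the paper.

```python
def smooth_runs(row, max_gap):
    """Fill background runs (0s) of length <= max_gap that lie between two ink pixels (1s)."""
    out = list(row)
    last_fg = None  # index of the most recent ink pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            # A short gap between two ink pixels is merged into the foreground.
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for j in range(last_fg + 1, i):
                    out[j] = 1
            last_fg = i
    return out


def crla_mask(image, h_gap, v_gap):
    """Smooth rows and columns separately, then keep pixels set in both results."""
    horiz = [smooth_runs(r, h_gap) for r in image]
    cols = [smooth_runs(list(c), v_gap) for c in zip(*image)]
    vert = [list(r) for r in zip(*cols)]  # transpose back to row-major order
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horiz, vert)]


if __name__ == "__main__":
    # Hypothetical scan line: the 2-pixel gap is bridged, the 4-pixel gap is not.
    print(smooth_runs([1, 0, 0, 1, 0, 0, 0, 0, 1, 0], max_gap=2))
```

Combining the two passes with a logical AND means a pixel joins the mask only when it is bridged both horizontally and vertically, which is what keeps isolated specks from growing into blocks.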
4
Robotic process automation of unstructured data with machine learning
EN
In this paper we present our work in progress on building an artificial intelligence system dedicated to tasks regarding the processing of formal documents used in various kinds of business procedures. The main challenge is to build machine learning (ML) models to improve the quality and efficiency of business processes involving image processing, optical character recognition (OCR), text mining and information extraction. In the paper we introduce the research and application field, some common techniques used in this area and our preliminary results and conclusions.
5
Preprocessing Photos of Receipts for Recognition
EN
The subject of this work is image pre-processing methods applied to photos of receipts. The purpose is to improve their quality and thereby increase the effectiveness of conventional text recognition (OCR) software. The authors had mainly difficult cases in mind: photos taken freehand in unfavorable lighting conditions. The work describes the analyzed methods of filtering, binarization, edge detection, image straightening, marking the area of interest, and thinning. Preliminary results with OCR software on a small data set are also presented. Thanks to pre-processing, character recognition efficiency was improved by 25%. The final part presents conclusions and plans for future work.
PL
The topic of this work is image pre-processing methods applied to photos of receipts. The aim is to improve their quality so as to increase the effectiveness of text recognition software. The authors had mainly difficult cases in mind: photos taken handheld in poor lighting. The work describes the analyzed methods of filtering, binarization, edge detection, image straightening, marking the area of interest, and thinning. Preliminary results of tests with OCR software on a small image set are also presented. Pre-processing improved character identification by 25%. The final part presents conclusions and plans for future work.
EN
We propose a method that enables effective code reuse between evolutionary runs that solve a set of related visual learning tasks. We start by introducing a visual learning approach that uses genetic programming individuals to recognize objects. The process of recognition is generative, i.e., it requires the learner to restore the shape of the processed object. This method is extended with a code reuse mechanism by introducing a crossbreeding operator that allows importing genetic material from other evolutionary runs. In the experimental part, we compare the performance of the extended approach to the basic method on a real-world task of handwritten character recognition, and conclude that code reuse leads to better results in terms of fitness and recognition accuracy. Detailed analysis of the crossbred genetic material also shows that code reuse is most profitable when the recognized objects exhibit visual similarity.
EN
Text segmentation is a key element of the optical character recognition process. Hence, the testing procedure for text segmentation algorithms is of significant importance. Previous works rely mainly on text databases as templates, which are used both for testing and for evaluating text segmentation algorithms. However, because of inconsistencies in this process, a methodology for the experiments is required. This manuscript proposes a methodology for evaluating text segmentation algorithms based on error types. It is built on various multiline text samples linked with text segmentation. The final result is obtained by comparative analysis of cross-linked data. Its main advantage is its suitability for different types of scripts.
PL
Text segmentation is a key element of the optical character recognition process. All previous works deal mainly with a text database as a template. These are used for testing as well as for evaluating the text segmentation algorithm. However, such an algorithm exhibits inconsistencies. The paper presents a methodology for evaluating a text segmentation algorithm based on error types. The study was carried out on various multiline text samples. The final result is obtained through comparative analysis of the data.
8
Basic experiments set for the evaluation of the text line segmentation
EN
Text line segmentation is a key step in the optical character recognition process. Previous works rely primarily on various text databases as a reference for evaluating text line segmentation. Because of inconsistencies in measuring and evaluating the quality of text line segmentation algorithms, a basic set of test experiments is required. This paper proposes a basic set of experiments for evaluating an algorithm's text line segmentation. The test set consists of a few experiments primarily linked to text line segmentation. Although they are mutually independent, the obtained results are strongly cross-linked. Its main advantages are its suitability for different types of letters and languages and its adaptability.
PL
Text line segmentation is an important element of optical character recognition. Most previous methods relied on text databases. Because of inconsistencies in the detection and evaluation of text lines, certain basic tests need to be carried out. The article proposes several experiments that make it possible to evaluate text line segmentation algorithms. Although the experiments are mutually independent, the results obtained are interrelated. They can be applied to different types of letters and different languages.
EN
With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The Unicode 4.0 standard supports 50 scripts that are used across the world, and many, such as Amharic (Ethiopic), have attracted virtually no attention from researchers. An extensive literature review reveals no papers that report on an OCR system for Amharic. This paper describes a normalised technique which can be used for recognition of isolated Arabic, Amharic and Latin characters. Two approaches to identifying the characters are considered: comparing them to a series of templates, and using a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in a confusion matrix, and the performance of the two approaches is compared for each of these three character sets.
10
Optical recognition of car license numbers
EN
In this paper, we describe a system able to recognize the V.L.P. (Vehicle License Plate) of a car from an image of it. Our system uses various image processing techniques, such as mathematical morphology and O.C.R. theory. Results are good. This system has many practical applications, such as: parking accounting, traffic monitoring, stolen car detection and security systems of many kinds.
EN
This paper presents the results of applying decision trees to printed and handwritten character recognition. An automatic feature generation method was employed during the construction of the tree, which improved the recognition rate for the testing set. This learning technique significantly reduces a drawback of tree classifiers, namely their rapid error accumulation with depth, while it does not influence the size of the trees. It was shown that the proposed approach gives better results than increasing the size of the training sets used for construction of the trees. A recognition rate above 97% was obtained by means of a parallel classifier built of multiple decision trees, even though no advanced preprocessing of input characters (such as skeletonization or slant reduction) was performed.
12
A new approach to OCR
EN
The paper presents an attempt to apply Rough Sets Theory to Optical Character Recognition with the purpose of accelerating the recognition process and reducing the database of characters, which is usually very large. At the same time, it lowers the performance and price requirements for automatic identification systems. In this approach, specific character features are treated as an information system. Rough Sets Theory allows extracting the most important information from the system and neglecting the rest as irrelevant. This process is fully automatic and does not require any human decision about the usefulness of particular character features. A discernibility matrix built in this way constitutes a reduced database for classification algorithms. A brief description of classical Optical Character Recognition theory and Rough Sets Theory, as well as selected research and experimental results, is also presented. As it turns out, as much as 95% of the information on the recognized characters may be neglected if certain criteria are met.
PL
The article presents an attempt to apply rough set theory to character recognition. The aim is to speed up the recognition process and to reduce the (usually very large) database of individual characters. This in turn lowers the performance requirements, and hence the price, of automatic character identification systems. In this approach, the characteristic features of characters (determined by traditional methods) are treated as an information system. Rough set theory makes it possible to extract from it the most important information (from the point of view of recognition) and to discard the rest as irrelevant. The process is fully automatic and does not require a human to make any decisions about the usefulness of particular character features. The discernibility matrix created in this way constitutes a reduced database for classification algorithms. The article also includes a short description of character recognition theory and rough set theory, as well as selected results of the research and experiments carried out. As it turns out, when certain conditions are met, as much as 95% of the information about the recognized characters can be omitted without detriment to classification quality.
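To make the notion of a discernibility matrix concrete, here is a minimal sketch of its standard rough-set definition: for each pair of objects with different class labels, record the set of attributes on which they differ. The character features (`loops`, `strokes`, `symmetric`) and their values are invented for illustration and do not come from the paper, and the helper names are hypothetical.

```python
from itertools import combinations


def discernibility_matrix(objects, decisions):
    """Map each pair (i, j) of differently labelled objects to the attributes that tell them apart."""
    matrix = {}
    for i, j in combinations(range(len(objects)), 2):
        if decisions[i] != decisions[j]:
            matrix[(i, j)] = {a for a in objects[i] if objects[i][a] != objects[j][a]}
    return matrix


def discerns_all(attributes, matrix):
    """True if the attribute subset separates every pair recorded in the matrix."""
    return all(entry & attributes for entry in matrix.values())


# Hypothetical feature vectors for three characters (assumed values).
features = [
    {"loops": 0, "strokes": 1, "symmetric": 1},  # "I"
    {"loops": 1, "strokes": 1, "symmetric": 1},  # "O"
    {"loops": 2, "strokes": 2, "symmetric": 0},  # "B"
]
labels = ["I", "O", "B"]

if __name__ == "__main__":
    m = discernibility_matrix(features, labels)
    print(m)
    print(discerns_all({"loops"}, m), discerns_all({"strokes"}, m))
```

In this toy table a single attribute is enough to separate all three characters, so the other two could be dropped; that is the kind of feature reduction the abstract refers to, here on made-up data.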