Arabic script is used by more than a quarter of the world's population to write languages such as Arabic, Persian, Urdu, Sindhi, and Pashto, but each language has its own vocabulary and its own alphabet. The Urdu alphabet is a superset of the alphabets of all other Arabic-script-based languages. Character recognition for Arabic-script-based languages is among the most difficult recognition tasks because the script has complexities not found in any other script. This paper presents a novel technique, Ghost Character Recognition Theory, that helps in developing a multilanguage character recognition system for Arabic-script-based languages. The main benefit of the proposed approach is that it works for all Arabic-script-based languages: recognition is done once, with little effort, for the ghost characters (basic skeletons), and a dictionary is developed for each language. Handling all Arabic-script-based languages together has drawbacks, chiefly a lower recognition rate than that of systems built for a specific language and a specific writing style, i.e. Nastaliq or Naskh. In general, however, this small difference in recognition rate is not a major issue for a multilingual system, and the result is a single multilingual character recognition system.
PL
Arabic-script languages are very difficult to adapt to an automatic character recognition system. The article describes the Ghost Character algorithm, which enables OCR for most Arabic-script languages.