Segmenting a document image into text lines is one of the stages of unconstrained handwritten document recognition. This paper presents a new algorithm for separating text lines in handwriting. The algorithm is based on the projection profile method and uses thresholding with a variable, rather than fixed, threshold value. This allows low or overlapping peaks in the profile to be detected. The proposed technique is shown to improve the recognition rate relative to traditional methods, and it detects text lines robustly regardless of their length.
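The abstract does not say how the threshold varies. The sketch below assumes one simple rule, purely for illustration: each row's threshold is a fraction `alpha` of the profile's maximum within ±`window` rows, so a short line with a low peak is compared with its own neighbourhood rather than with the tallest line on the page. The functions `projection_profile`, `variable_threshold`, and `find_text_lines` and the parameters `window` and `alpha` are hypothetical and are not taken from the paper.

```python
import numpy as np


def projection_profile(binary):
    """Horizontal projection profile: number of ink pixels in each row."""
    return binary.sum(axis=1).astype(float)


def variable_threshold(profile, window=7, alpha=0.5):
    """Hypothetical variable threshold: alpha times the local maximum of the
    profile within +/- `window` rows of each row (not the paper's exact rule)."""
    n = len(profile)
    thr = np.empty(n)
    for y in range(n):
        lo, hi = max(0, y - window), min(n, y + window + 1)
        thr[y] = alpha * profile[lo:hi].max()
    return thr


def find_text_lines(binary, window=7, alpha=0.5):
    """Return (start_row, end_row) pairs, end exclusive, for each contiguous
    run of rows whose profile value exceeds its local threshold."""
    p = projection_profile(binary)
    above = p > variable_threshold(p, window, alpha)  # empty rows never pass
    lines, start = [], None
    for y, flag in enumerate(above):
        if flag and start is None:
            start = y                      # a run of text rows begins
        elif not flag and start is not None:
            lines.append((start, y))       # the run ends at row y (exclusive)
            start = None
    if start is not None:
        lines.append((start, len(above)))  # run reaches the bottom edge
    return lines
```

With a single global threshold of half the profile maximum, a short line whose peak stays below that level would be missed; the local rule keeps it, and a shallow valley between two touching lines still falls below the local threshold, so the lines are split.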
Countless historical documents are held in archives and libraries around the world, where they are often difficult to access. Digitization has made these records easier to view, but digitization alone is not enough: the real challenge is making them searchable and machine-readable. Many of these documents are handwritten and therefore require handwriting recognition, the first step of which is segmenting the document into text lines. This article presents a solution to this problem based on tensor voting. The algorithm first performs voting directly on the binary image. The text lines are then tracked and labeled using the local maxima of the resulting tensor field. The algorithm's performance was evaluated on the dataset provided by the organizers of the ICDAR 2009 competition, using that contest's evaluation criteria.
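The abstract gives only the outline of the method. The following sketch illustrates the general idea under several labeled assumptions: each ink pixel casts a simplified rank-one vote (`vote_kernel`, a stand-in for a full ball voting field, with Gaussian decay of scale `sigma` within `radius` pixels); the stick saliency λ1 − λ2 is kept only where the field's normal is closer to vertical than horizontal; and per-column saliency maxima are linked greedily from left to right (`track_lines`, with `rel_thresh`, `max_jump`, and `max_gap` as hypothetical parameters). None of these names or values come from the paper.

```python
import numpy as np


def vote_kernel(radius=10, sigma=6.0):
    """Simplified rank-one stand-in for a 2-D ball voting field (an assumption).

    A voter at offset v casts exp(-|v|^2 / sigma^2) * (I - v v^T / |v|^2): a stick
    tensor whose normal is perpendicular to v, favouring a curve through both
    pixels. The kernel is unchanged under v -> -v, and the centre casts no vote.
    """
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    d2 = xs ** 2 + ys ** 2
    safe = np.where(d2 > 0, d2, 1.0)                          # avoid 0/0 at centre
    decay = np.where(d2 > 0, np.exp(-d2 / sigma ** 2), 0.0)   # no self-vote
    return (decay * (1.0 - xs * xs / safe),
            decay * (-xs * ys / safe),
            decay * (1.0 - ys * ys / safe))


def tensor_field(binary, radius=10, sigma=6.0):
    """Accumulate the votes of all ink pixels into a dense 2x2 tensor field."""
    kxx, kxy, kyy = vote_kernel(radius, sigma)
    h, w = binary.shape
    padded = np.pad(binary.astype(float), radius)  # zero padding
    txx, txy, tyy = np.zeros((h, w)), np.zeros((h, w)), np.zeros((h, w))
    for i in range(2 * radius + 1):
        for j in range(2 * radius + 1):
            # ink pixels at offset (i - radius, j - radius) from each receiver;
            # the kernel's symmetry makes the sign of the offset irrelevant
            voters = padded[i:i + h, j:j + w]
            txx += kxx[i, j] * voters
            txy += kxy[i, j] * voters
            tyy += kyy[i, j] * voters
    return txx, txy, tyy


def line_saliency(txx, txy, tyy):
    """Stick saliency lambda1 - lambda2, zeroed where the dominant normal is
    closer to horizontal (assumes near-horizontal text lines)."""
    saliency = np.sqrt((txx - tyy) ** 2 + 4.0 * txy ** 2)
    return np.where(tyy > txx, saliency, 0.0)


def track_lines(saliency, rel_thresh=0.5, max_jump=2, max_gap=5):
    """Greedily link per-column saliency maxima from left to right.

    Returns a list of tracks, each a list of (row, col) points."""
    h, w = saliency.shape
    thresh = rel_thresh * saliency.max()
    tracks = []
    for x in range(w):
        col = saliency[:, x]
        peaks = [y for y in range(1, h - 1)
                 if col[y] > 0 and col[y] >= thresh
                 and col[y] >= col[y - 1] and col[y] > col[y + 1]]
        for y in peaks:
            # tracks not yet extended in this column, close in row and column
            near = [t for t in tracks
                    if t[-1][1] < x and x - t[-1][1] <= max_gap
                    and abs(y - t[-1][0]) <= max_jump]
            if near:
                min(near, key=lambda t: abs(y - t[-1][0])).append((y, x))
            else:
                tracks.append([(y, x)])
    return tracks
```

Short tracks produced near line ends or by noise can be discarded by length; assigning each ink pixel to its nearest remaining track would then yield the line labels (not shown).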