Segmentation For Handwritten Gujarati Text Documents: A Review
Kunal Shah.
Babu Madhav Institute of Information Technology,
Uka Tarsadia University Maliba Campus, Gopal Vidhyanagar,
Bardoli, Gujarat, India.
ABSTRACT
Optical Character Recognition (OCR) is one of the most useful technologies on the market. It is a very difficult task in information technology, although researchers and experts have achieved solutions in some areas. This paper explains what OCR is, its applications, and its various techniques with examples. The main focus is segmentation, particularly of handwritten Gujarati text. I show the difficulties of segmentation in the Gujarati language and compare the various approaches taken by researchers …
Consonants can be connected with vowel extensions.
Figure 2: Diagram of Gujarati script
1.4 Problems in Gujarati text document segmentation
Problems by segmentation type (each entry gives the problem, its description, and the example figure):
1. Line segmentation
   1) Modifier overlapping. The lower modifiers of one line overlap with the upper modifiers of the line below. Example: Figure 3a.
   2) Zigzag line/word/character. The text does not follow a straight line, which creates curvature in the lines. Example: Figure 3b.
   3) Unusual line spacing. The spacing between two or more lines is irregular. Example: Figure 3c.
2. Word segmentation
   1) Unusual inter-word and intra-word spacing. The spacing between words, or within a word, is irregular, which causes segmentation problems. Example: Figure 4.
3. Character segmentation
   1) Upper region problems
      i. Unusual size of upper modifier. Example: Figure 5a.
      ii. Merging of lower modifier with consonant. Example: Figure 5b.
      iii. Touching of upper modifier with another upper modifier. Example: Figure no…
From the above literature review, it can be concluded that many problems remain in segmentation for OCR of handwritten Gujarati characters. Most of these problems arise in the character segmentation phase. For better results, the characters must be legible and properly formed. The work that has been done on Gujarati segmentation addresses printed text, not handwritten text. Solving the above problems would increase the accuracy of the recognition phase and give better OCR results.
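The modifier-overlapping problem listed under line segmentation in Section 1.4 can be illustrated with a small sketch. The text here does not name a specific line segmentation algorithm, so the horizontal projection profile used below is an assumption, chosen because it is a common baseline. The function names `horizontal_profile` and `segment_lines`, the `min_ink` parameter, and the toy binary images are all hypothetical and serve only as illustration.

```python
def horizontal_profile(image):
    """Count ink pixels (1s) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive.

    A run of consecutive rows with at least `min_ink` ink pixels is
    treated as one text line; rows below that count are treated as
    the gap between lines. (Hypothetical helper, not from the paper.)
    """
    lines = []
    start = None
    for y, ink in enumerate(horizontal_profile(image)):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a text line ends
            start = None
    if start is not None:
        lines.append((start, len(image)))  # line runs to the page bottom
    return lines


# Two toy "text lines" separated by one blank row.
clean = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
]

# Same page, but a lower modifier of the first line reaches into the gap.
overlapping = [row[:] for row in clean]
overlapping[2][3] = 1

clean_lines = segment_lines(clean)          # two separate lines
merged_lines = segment_lines(overlapping)   # one merged line
```

On the clean page the blank row separates the two lines. On the overlapping page a single ink pixel in the gap is enough to merge them into one range. Raising `min_ink` to 2 separates them again in this toy case, but on real handwriting a higher threshold may also cut off genuine modifiers.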