OFFLINE OCR ALGORITHM TO DETECT KURDISH/ ARABIC CHARACTERS IN SCANNED DOCUMENT

Authors

  • Sardar Omar Salih, Department of Information Technology, Polytechnic University, Kurdistan Region, Iraq

DOI:

https://doi.org/10.25007/ajnu.v11n3a673

Keywords:

Script, Character, OCR, Kurdish/ Arabic characters.

Abstract

This paper proposes an algorithm named Max Rightmost White Line (MRWL) for segmenting Kurdish/Arabic characters in scanned printed documents. It operates in the preprocessing and segmentation stages of the OCR process, two stages that significantly affect the accuracy of the algorithm. MRWL first removes the margins around the text to reduce processing time, then scans to find the Top Line (TL) and Bottom Line (BL) of each sentence in a paragraph, which can be used to measure the height of the characters. Based on TL and BL, the Base Line (BSL) can be detected using the horizontal Most Frequency Black Pixel (MFBP), which is useful for finding the characters' segmentation (Atallah and Omar, 2008). Finding the TL, BL and BSL of each sentence helps to locate the characters in the document. The algorithm involves six phases, each with its own function. It was tested with different input documents, and the average accuracy of the detected segmentations was 96.93%.
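
The first steps described in the abstract (margin removal, TL/BL detection and baseline detection) can be illustrated with a minimal Python sketch. This is not the author's MRWL implementation, and it does not cover the character segmentation step, which the abstract does not detail. It assumes a binarized page stored as a list of rows in which 1 is a black pixel and 0 is white; the function names crop_margins, find_text_lines and find_baseline are hypothetical, and the baseline is taken to be the row with the most black pixels between TL and BL.

```python
def crop_margins(img):
    """Drop the all-white rows and columns surrounding the text (1 = black)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        return []
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def find_text_lines(img):
    """Return (TL, BL) row indices for each run of rows containing black pixels."""
    lines, top = [], None
    for r, row in enumerate(img):
        if any(row) and top is None:
            top = r                      # first inked row: Top Line
        elif not any(row) and top is not None:
            lines.append((top, r - 1))   # last inked row: Bottom Line
            top = None
    if top is not None:
        lines.append((top, len(img) - 1))
    return lines


def find_baseline(img, top, bottom):
    """Return the row between TL and BL holding the most black pixels."""
    return max(range(top, bottom + 1), key=lambda r: sum(img[r]))


# Toy 8x6 "page" with two text lines separated by a white row.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

cropped = crop_margins(page)
lines = find_text_lines(cropped)
baselines = [find_baseline(cropped, tl, bl) for tl, bl in lines]
print(lines)      # [(0, 2), (4, 5)]
print(baselines)  # [1, 5]
```

On a real scan, the page would first have to be converted to grayscale and thresholded, and noise would make a strict "any black pixel" test too sensitive, so a small tolerance on the row sums would likely be needed.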

References

AL-Shatnawi Atallah and Khairuddin Omar. Methods of Arabic language baseline detection: the state of art. IJCSNS, 8(10):137, 2008.
Malayappan Shridhar and A. Badreldin. High accuracy character recognition algorithm using Fourier and topological descriptors. Pattern Recognition, 17(5):515–524, 1984.
Hassan Althobaiti and Chao Lu. A survey on Arabic optical character recognition and an isolated handwritten Arabic character recognition algorithm using encoded Freeman chain code. In 2017 51st Annual Conference on Information Sciences and Systems (CISS), pages 1–6. IEEE, 2017.
Al-Badr, B., & Haralick, R. M. (1998). A segmentation-free approach to text recognition with application to Arabic text. International Journal on Document Analysis and Recognition, 1(3), 147–166.
Haraty, R., & Catherine G., (2004). Arabic text recognition.
Tomeh, N., Habash, N., Roth, R., Farra, N., Dasigi, P., & Diab, M. (2013). Reranking with linguistic and semantic features for Arabic optical character recognition. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 549–555.
Aljarrah, I., Al-Khaleel, O., Mhaidat, K., Alrefai, M., Alzu’bi, A., & Rabab’ah, M. (2012). Automated system for Arabic optical character recognition. Proceedings of the 3rd International Conference on Information and Communication Systems, 1–6.

Maad Shatnawi. Off-line handwritten Arabic character recognition: a survey. In Proceedings of the international conference on image processing, computer vision, and pattern recognition (IPCV), page 52. The Steering Committee of The World Congress in Computer Science, Computer …, 2015.

Optimizing the color-to-gray-scale conversion for image classification.

Mehmet Sezgin and Bülent Sankur. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–166, 2004.

Jawad H. AlKhateeb, Jinchang Ren, Stan S. Ipson, and Jianmin Jiang. Knowledge-based baseline detection and optimal thresholding for words segmentation in efficient pre-processing of handwritten Arabic text. In Fifth International Conference on Information Technology: New Generations (itng 2008), pages 1158–1159. IEEE, 2008.

Rasty Yaseen and Hossein Hassani. Kurdish optical character recognition. UKH Journal of Science and Engineering, 2(1):18–27, 2018.

Omar Al-Jarrah, Samer Al-Kiswany, Mohammad Fraiwan, and Hani Khasawneh. A new algorithm for Arabic optical character recognition. WSEAS Transactions on Information Science and Applications, 3, 2006.

Published

2022-06-30

How to Cite

Omar Salih, S. (2022). OFFLINE OCR ALGORITHM TO DETECT KURDISH/ ARABIC CHARACTERS IN SCANNED DOCUMENT. Academic Journal of Nawroz University, 11(3), 162–169. https://doi.org/10.25007/ajnu.v11n3a673

Issue

Section

Articles