OFFLINE OCR ALGORITHM TO DETECT KURDISH/ ARABIC CHARACTERS IN SCANNED DOCUMENT
Keywords: Script, Character, OCR, Kurdish/Arabic characters.
This paper proposes an algorithm named Max Rightmost White Line (MRWL) for segmenting Kurdish/Arabic characters in scanned printed documents. It operates in the preprocessing and segmentation stages of OCR, two stages that strongly affect the accuracy of the overall process. MRWL first removes the margins around the text to reduce processing time, then scans to find the Top Line (TL) and Bottom Line (BL) of each sentence in a paragraph, which can be used to measure character height. Based on TL and BL, the Base Line (BSL) is detected using the horizontal Most Frequency Black Pixel (MFBP), which is useful for finding character segmentation points (Atallah and Omar, 2008). Finding the TL, BL and BSL of each sentence helps to locate the characters in the document. The algorithm consists of six phases, each with its own function. It was tested on different input documents, and the average accuracy of the detected segmentations was 96.93%.
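The margin removal and TL/BL/BSL steps described above can be illustrated with a minimal sketch. This is not the authors' MRWL implementation: it assumes a binarized image stored as a list of rows with 1 for a black pixel and 0 for white, treats each run of rows containing ink as one text line, and takes the BSL as the row with the highest black-pixel count in that run. The names `crop_margins` and `find_text_lines` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Assumed encoding: 1 = black (ink) pixel, 0 = white pixel.
Image = List[List[int]]


def crop_margins(img: Image) -> Image:
    """Drop the all-white rows and columns surrounding the text block."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        return []  # blank page: nothing to keep
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def find_text_lines(img: Image) -> List[Tuple[int, int, int]]:
    """Return (TL, BL, BSL) row indices for each run of rows containing ink."""
    profile = [sum(row) for row in img]  # black pixels per row
    lines: List[Tuple[int, int, int]] = []
    start = None
    # A trailing 0 acts as a sentinel that closes the last run.
    for r, count in enumerate(profile + [0]):
        if count > 0 and start is None:
            start = r  # first inked row of a text line: TL
        elif count == 0 and start is not None:
            tl, bl = start, r - 1
            # BSL: the row with the most black pixels between TL and BL
            # (the first such row wins on ties).
            bsl = max(range(tl, bl + 1), key=lambda i: profile[i])
            lines.append((tl, bl, bsl))
            start = None
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(find_text_lines(crop_margins(page)))
```

Row indices in the result are relative to the cropped image. A real scanned page would first need binarization and probably noise filtering, which this sketch leaves out.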
Copyright (c) 2022 Academic Journal of Nawroz University
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.