A NOVEL DATASET FOR QURANIC WORDS IDENTIFICATION AND AUTHENTICATION

Authors

Thabit Sabbah Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia
Ali Selamat Faculty of Computing, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor, Malaysia

DOI:

https://doi.org/10.11113/jt.v75.4993

Keywords:

Quranic words, identification, authentication, dataset, Arabic, diacritic words

Abstract

The Quran is the holy book of Muslims around the world. In the fourteen centuries since its revelation, it has been preserved from distortion by every possible means. The huge growth in Internet usage and the spread of digital media have led to the development of many websites, services, and applications related to the Quran. These efforts include converting Quranic verses, translations, explanations, tafseer, and other Quranic sciences into digital formats. Some of these efforts have been found to be less than authentic. Authentication depends on the correct identification of Quranic words in the text. In this paper, we introduce a novel dataset for Quranic word identification and authentication. The proposed dataset contains more than 93,000 samples, each described by 64 features extracted in numerical form. Validation tests on the proposed dataset yielded a high average accuracy.
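
The abstract states only that each sample is a vector of 64 numerical features and that validation produced a high average accuracy; it does not say here which classifier, labels, or validation protocol were used. The sketch below is purely illustrative and rests on stated assumptions: each sample carries a class label (for example Quranic vs. non-Quranic word), "average accuracy" means the mean accuracy over k cross-validation folds, and a simple nearest-centroid classifier stands in for whatever model the authors actually used. All names (`check_sample`, `nearest_centroid_fit`, `k_fold_average_accuracy`) are hypothetical and are not part of the dataset's release.

```python
import random
from statistics import mean

# From the abstract: every sample has 64 numerical features.
NUM_FEATURES = 64


def check_sample(features):
    """Hypothetical helper: reject rows that do not have exactly 64 numeric features."""
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
    return [float(x) for x in features]


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_fit(X, y):
    """Assumed stand-in classifier (not the authors' model): one centroid per label."""
    by_label = {}
    for row, label in zip(X, y):
        by_label.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, row):
    """Return the label whose centroid is closest to the row."""
    return min(model, key=lambda label: sq_dist(model[label], row))


def k_fold_average_accuracy(X, y, k=10, seed=0):
    """Shuffle once, split into k folds, and return the mean per-fold accuracy.

    Assumes len(X) >= k so that no fold is empty.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [j for j in idx if j not in test_set]
        model = nearest_centroid_fit([X[j] for j in train_idx],
                                     [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)


if __name__ == "__main__":
    # Synthetic, clearly separable data for demonstration only; not the real dataset.
    rng = random.Random(1)
    labels = [0] * 100 + [1] * 100
    X = [check_sample([rng.gauss(5.0 * lab, 1.0) for _ in range(NUM_FEATURES)])
         for lab in labels]
    print(k_fold_average_accuracy(X, labels, k=10))
```

Averaging per-fold accuracy, rather than pooling all test predictions, is one common way to report an "accuracy average"; with equal-sized folds the two give nearly the same number. To apply the sketch to the real dataset, the rows would need to be loaded from whatever file format the authors distribute, which is not described here.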

Published

2015-07-13

Issue

Vol. 75 No. 2: Computer Graphic and Visions

Section

Science and Engineering

License

Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.