TEXT CLASSIFICATION USING MODIFIED MULTI CLASS ASSOCIATION RULE

Authors

  • Siti Sakira Kamaruddin, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
  • Yuhanis Yusof, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
  • Husniza Husni, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
  • Mohammad Hayel Al Refai, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia

DOI:

https://doi.org/10.11113/jt.v78.9553

Keywords:

Text mining, Frequent Pattern Mining, Associative Classification, Multi Class Association Rule.

Abstract

This paper presents text classification using a modified Multi-Class Association Rule method. The method is based on Associative Classification, which combines classification with association rule discovery. Although previous work has shown that Associative Classification produces better classification accuracy than typical classifiers, studies applying it to text classification are limited, because the high dimensionality of text data results in an exponential number of generated classification rules. To overcome this problem, the Multi-Class Association Rule method was modified in two stages. In stage one, the frequent patterns are represented using a proposed vertical data format to reduce the text dimensionality problem. In stage two, the generated rules are pruned using a proposed Partial Rule Match to reduce the number of generated rules. The proposed method was tested on a text classification problem, and the results show that it performed better than the existing method in terms of classification accuracy and number of generated rules.
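
The abstract does not spell out the proposed vertical data format, so the following is only a minimal sketch of the general vertical (term to document-id set) idea used in associative classification, not the authors' implementation. The function names `to_vertical` and `rule_stats` and the toy documents are hypothetical. It shows how the support and confidence of a candidate rule "terms → class" can be computed by intersecting id sets instead of rescanning the documents.

```python
from collections import defaultdict


def to_vertical(docs):
    """Convert (terms, label) documents into a vertical layout.

    Returns two dicts: term -> set of document ids, and
    class label -> set of document ids.
    """
    term_tids = defaultdict(set)
    class_tids = defaultdict(set)
    for tid, (terms, label) in enumerate(docs):
        for term in terms:
            term_tids[term].add(tid)
        class_tids[label].add(tid)
    return dict(term_tids), dict(class_tids)


def rule_stats(itemset, label, term_tids, class_tids):
    """Return (support count, confidence) of the rule itemset -> label."""
    if not itemset:
        raise ValueError("itemset must contain at least one term")
    # Documents containing every term of the itemset.
    tids = set.intersection(*(term_tids.get(t, set()) for t in itemset))
    # Of those, the documents that also carry the class label.
    hits = tids & class_tids.get(label, set())
    confidence = len(hits) / len(tids) if tids else 0.0
    return len(hits), confidence


# Toy corpus (assumed data, for illustration only).
docs = [
    ({"goal", "match", "team"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"market", "stock"}, "finance"),
    ({"market", "team"}, "finance"),
]
term_tids, class_tids = to_vertical(docs)
print(rule_stats(("goal", "team"), "sport", term_tids, class_tids))
print(rule_stats(("team",), "sport", term_tids, class_tids))
```

A rule would then be kept only if its support and confidence pass user-chosen thresholds; the Partial Rule Match pruning step is not sketched here because the abstract does not define it.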

Published

2016-08-04

How to Cite

TEXT CLASSIFICATION USING MODIFIED MULTI CLASS ASSOCIATION RULE. (2016). Jurnal Teknologi, 78(8-2). https://doi.org/10.11113/jt.v78.9553