A COMPARATIVE STUDY OF STATISTICAL AND NATURAL LANGUAGE PROCESSING TECHNIQUES FOR SENTIMENT ANALYSIS

Authors

  • Wai-Howe Khong, Faculty of Computer Informatics, Multimedia University, Malaysia
  • Lay-Ki Soon, Faculty of Computer Informatics, Multimedia University, Malaysia
  • Hui-Ngo Goh, Faculty of Computer Informatics, Multimedia University, Malaysia

DOI:

https://doi.org/10.11113/jt.v77.6502

Keywords:

Natural language processing, sentiment analysis, word sense disambiguation

Abstract

Sentiment analysis has emerged as one of the most powerful tools in business intelligence. With the aim of proposing an effective sentiment analysis technique, we performed experiments on the sentiments of 3,424 tweets using both statistical and natural language processing (NLP) techniques as part of our background study. For the statistical approach, we explored machine learning algorithms such as Support Vector Machines (SVMs), decision trees and Naïve Bayes. The results show that SVM consistently outperformed the other algorithms in both classifications. For the NLP approach, we applied two different part-of-speech (POS) tagging methods. The tagged output was then used for word sense disambiguation (WSD) with WordNet, followed by sentiment identification with SentiWordNet. Our experimental results indicate that adjectives and adverbs alone are sufficient to infer the sentiment of tweets, compared with other combinations of parts of speech. Overall, the statistical approach records higher accuracy than the NLP approach by approximately 17%.
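The lexicon-based step of the NLP approach can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the tweet has already been POS-tagged and sense-disambiguated, and it replaces SentiWordNet with a tiny hypothetical lexicon (`TOY_LEXICON`) whose entries and scores are invented for illustration. The function names, the tag labels and the fallback to "neutral" when the score is zero are likewise assumptions.

```python
# Toy stand-in for SentiWordNet: (lemma, POS tag) -> (positive score, negative score).
# These entries and values are invented for illustration; they are not real SentiWordNet data.
TOY_LEXICON = {
    ("good", "ADJ"): (0.75, 0.0),
    ("terrible", "ADJ"): (0.0, 0.875),
    ("slow", "ADJ"): (0.0, 0.5),
    ("really", "ADV"): (0.25, 0.0),
    ("love", "VERB"): (0.625, 0.0),
}


def score_tweet(tagged_tokens, keep_pos=("ADJ", "ADV")):
    """Sum (positive - negative) scores over tokens whose POS tag is in keep_pos."""
    total = 0.0
    for word, pos in tagged_tokens:
        if pos not in keep_pos:
            continue  # ignore parts of speech outside the chosen combination
        pos_score, neg_score = TOY_LEXICON.get((word.lower(), pos), (0.0, 0.0))
        total += pos_score - neg_score
    return total


def classify(tagged_tokens, keep_pos=("ADJ", "ADV")):
    """Map the summed score to a label; a zero score is treated as neutral (an assumption)."""
    score = score_tweet(tagged_tokens, keep_pos)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# A hypothetical, already-tagged tweet.
tweet = [("The", "DET"), ("battery", "NOUN"), ("is", "VERB"),
         ("really", "ADV"), ("good", "ADJ")]
print(classify(tweet))
```

Changing `keep_pos` (for example to `("ADJ", "ADV", "VERB")`) gives a simple way to compare different part-of-speech combinations on the same tagged input.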

Published

2015-11-26

How to Cite

A COMPARATIVE STUDY OF STATISTICAL AND NATURAL LANGUAGE PROCESSING TECHNIQUES FOR SENTIMENT ANALYSIS. (2015). Jurnal Teknologi (Sciences & Engineering), 77(18). https://doi.org/10.11113/jt.v77.6502