SENTIMENT CLASSIFICATION WITH CONCEPT DRIFT AND IMBALANCED CLASS DISTRIBUTIONS
DOI:
https://doi.org/10.11113/jt.v78.10120
Keywords:
Sentiment classification, Concept drift, Imbalanced data, Ensemble learning, Instance selection
Abstract
Document-level sentiment classification aims to automate the task of classifying a textual review, written on a single topic, as expressing a positive or negative sentiment. In general, people express opinions about an entity based on its characteristics, which may change over time, so users' opinions change as the target entity evolves. However, existing sentiment classification approaches do not consider this evolution of users' opinions. They assume that instances are independent, identically distributed, and generated from a stationary distribution, whereas reviews actually arrive as a stream whose distribution may change. They use a static classification model that builds a classifier from a training set without considering the time at which reviews are posted. Yet time may be an important feature for the classification task. In this paper, a stream sentiment classification framework is proposed to deal with concept drift and imbalanced data distribution using ensemble learning and instance selection methods. The experimental results show the effectiveness of the proposed method compared with static sentiment classification.
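The abstract does not give the details of the framework, so the following is only a minimal sketch of the general idea it names: reviews are processed in time-ordered chunks, each chunk is balanced before training, and an ensemble of per-chunk classifiers is re-weighted on the newest data. Everything in it is an assumption made for illustration and not the authors' method: the names `TinyNaiveBayes`, `balance_chunk` and `ChunkEnsemble`, the use of random undersampling as a stand-in for instance selection, the accuracy-based weighting, and the toy reviews.

```python
import math
import random
from collections import Counter


class TinyNaiveBayes:
    """Multinomial naive Bayes over whitespace tokens (hypothetical base learner)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (0, 1):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab) + 1
            # Laplace-smoothed log prior plus log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for token in text.lower().split():
                score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def balance_chunk(texts, labels, rng):
    """Randomly undersample the majority class (assumed stand-in for instance selection)."""
    by_label = {0: [], 1: []}
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    n = min(len(by_label[0]), len(by_label[1]))
    if n == 0:
        return list(texts), list(labels)  # single-class chunk: nothing to balance
    out_texts, out_labels = [], []
    for label in (0, 1):
        for text in rng.sample(by_label[label], n):
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


class ChunkEnsemble:
    """Keeps the most recent per-chunk models and votes with accuracy weights."""

    def __init__(self, max_members=3, seed=0):
        self.max_members = max_members
        self.members = []  # each entry is [model, weight]
        self.rng = random.Random(seed)

    def update(self, texts, labels):
        # Re-weight existing members by their accuracy on the newest chunk,
        # so models trained on an outdated concept lose influence.
        for member in self.members:
            hits = sum(member[0].predict(t) == y for t, y in zip(texts, labels))
            member[1] = hits / len(texts)
        bal_texts, bal_labels = balance_chunk(texts, labels, self.rng)
        self.members.append([TinyNaiveBayes().fit(bal_texts, bal_labels), 1.0])
        self.members = self.members[-self.max_members:]

    def predict(self, text):
        vote = sum(w if m.predict(text) == 1 else -w for m, w in self.members)
        return 1 if vote > 0 else 0


# Toy stream (invented data): "small" is praised early and criticised later.
old_texts = ["small phone love", "small light great", "small handy love", "big heavy bad"]
old_labels = [1, 1, 1, 0]
new_texts = ["small screen awful", "small keys awful", "small text awful", "big screen nice"]
new_labels = [0, 0, 0, 1]

ensemble = ChunkEnsemble(max_members=3, seed=0)
ensemble.update(old_texts, old_labels)
before = ensemble.predict("small")
ensemble.update(new_texts, new_labels)
after = ensemble.predict("small")
print("prediction for 'small' before and after drift:", before, after)
```

A static classifier trained once on the first chunk would keep its original view of the word "small". In this sketch the older member is down-weighted once it misclassifies the newer chunk, which is the kind of adaptation a stream framework is meant to provide.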
License
Copyright of articles that appear in Jurnal Teknologi belongs exclusively to Penerbit Universiti Teknologi Malaysia (Penerbit UTM Press). This copyright covers the rights to reproduce the article, including reprints, electronic reproductions, or any other reproductions of similar nature.