IMPUTATION OF MISSING DATA WITH DIFFERENT MISSINGNESS MECHANISM

Authors

  • HO MING KANG, Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor Darul Ta'azim, Malaysia
  • FADHILAH YUSOF, Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor Darul Ta'azim, Malaysia
  • ISMAIL MOHAMAD, Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81310 UTM Johor Bahru, Johor Darul Ta'azim, Malaysia

DOI:

https://doi.org/10.11113/jt.v57.1523

Keywords:

Missing data, expectation maximization, mean imputation

Abstract

This paper studies the estimation of missing data. Data samples are simulated under three missingness mechanisms: Missing Completely At Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR). The expectation maximization (EM) algorithm and mean imputation (MI) are applied to these data sets, and their performance is compared using the mean absolute error (MAE) and root mean square error (RMSE). The results show that EM estimates the missing data with smaller errors than MI under all three missingness mechanisms. However, the graphical results show that EM fails to estimate the missing values in the missing quadrants when the mechanism is MNAR.
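
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the bivariate normal data, the correlation of 0.7, the 30% missing rate, the deterministic "largest values go missing" rule used for MAR and MNAR, and all function names (simulate_pairs, missing_mask, mean_impute, em_impute, mae_rmse) are assumptions made here for illustration. Only the second variable y is given missing values; EM is the standard bivariate normal EM with x fully observed.

```python
import math
import random

def simulate_pairs(n, rho=0.7, seed=0):
    """Draw n (x, y) pairs from a standard bivariate normal (assumed design)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs

def missing_mask(pairs, mechanism, rate=0.3, seed=1):
    """Return booleans, True where y is treated as missing."""
    n = len(pairs)
    k = round(rate * n)
    if mechanism == "MCAR":    # missingness ignores both x and y
        idx = set(random.Random(seed).sample(range(n), k))
    elif mechanism == "MAR":   # depends only on the always-observed x
        idx = set(sorted(range(n), key=lambda i: pairs[i][0])[-k:])
    elif mechanism == "MNAR":  # depends on the value of y that goes missing
        idx = set(sorted(range(n), key=lambda i: pairs[i][1])[-k:])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [i in idx for i in range(n)]

def mean_impute(pairs, mask):
    """Replace each missing y with the mean of the observed y values."""
    obs = [y for (_, y), m in zip(pairs, mask) if not m]
    mu = sum(obs) / len(obs)
    return [mu if m else y for (_, y), m in zip(pairs, mask)]

def em_impute(pairs, mask, iters=50):
    """EM for a bivariate normal with x complete and y partly missing."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    obs = [y for (_, y), m in zip(pairs, mask) if not m]
    mu_y = sum(obs) / len(obs)
    s_yy = sum((y - mu_y) ** 2 for y in obs) / len(obs)
    s_xy = 0.0
    for _ in range(iters):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        # E-step: expected y and y^2 for the missing cases, given x
        ey = [mu_y + beta * (x - mu_x) if m else y
              for (x, y), m in zip(pairs, mask)]
        ey2 = [e * e + (resid_var if m else 0.0) for e, m in zip(ey, mask)]
        # M-step: re-estimate the parameters that involve y
        mu_y = sum(ey) / n
        s_xy = sum(x * e for x, e in zip(xs, ey)) / n - mu_x * mu_y
        s_yy = sum(ey2) / n - mu_y ** 2
    beta = s_xy / s_xx
    # Impute with the conditional mean of y given x at the final estimates
    return [mu_y + beta * (x - mu_x) if m else y
            for (x, y), m in zip(pairs, mask)]

def mae_rmse(pairs, imputed, mask):
    """MAE and RMSE between true and imputed y, over missing cases only."""
    errs = [v - y for (_, y), v, m in zip(pairs, imputed, mask) if m]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse

data = simulate_pairs(500)
for mechanism in ("MCAR", "MAR", "MNAR"):
    mask = missing_mask(data, mechanism)
    for name, method in (("MI", mean_impute), ("EM", em_impute)):
        mae, rmse = mae_rmse(data, method(data, mask), mask)
        print(f"{mechanism} {name}: MAE={mae:.3f} RMSE={rmse:.3f}")
```

The errors are computed only over the cells that were deleted, since the observed cells are left unchanged by both methods. Under the MNAR rule assumed here, the observed y values are a truncated sample, so any method that learns only from observed cases is fitted to a distorted distribution; this is one plausible reading of why the abstract reports EM failing in the missing quadrants under MNAR.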

Published

2012-02-15

How to Cite

IMPUTATION OF MISSING DATA WITH DIFFERENT MISSINGNESS MECHANISM. (2012). Jurnal Teknologi, 57(1). https://doi.org/10.11113/jt.v57.1523