MISSING DATA IN MULTIVARIATE ANALYSIS: IMPLICATIONS FOR TEST VALIDATION AND PERSONNEL SELECTION RESEARCH
- Author:
- Raymond, Mark R.
- Physical Description:
- 71 pages
- Additional Creators:
- Pennsylvania State University
- Summary:
- The purpose of this research was to evaluate procedures for treating missing data in multivariate analysis. A review of several Monte Carlo experiments indicates that estimating missing values via multiple regression is superior to other methods for handling missing values in the context of correlation and multiple regression analysis. An experiment was designed to extend previous findings to data encountered in test validation and personnel selection research. Data were simulated to conform to covariance patterns taken from the relevant literature. Two, six, and ten percent of the values were deleted at random from one of three predictor variables in sample sizes of 50, 100, and 200. These incomplete data matrices were treated by four methods: (1) discard cases with incomplete records; (2) replace missing values with the mean of the respective variable; (3) replace missing values with an estimate derived by regressing the incomplete variable on its most common covariate; (4) replace missing values with an estimate derived from an iterated multiple regression routine. The treated data matrices were subjected to multiple regression, and the resulting beta weights and squared multiple correlation coefficients (R²) were compared to those obtained from the original data before any values were deleted, i.e., the complete-sample regression statistics. While the results clearly support the use of methods 3 or 4, the advantages are often minimal. Overall, regression-based estimates produced R² values about .005 to .010 closer to the complete-sample value than R² values obtained from data treated by method 1. Method 1 did provide unbiased R² values, while methods 3 and 4 resulted in a slight degree of bias (-.007). Method 2 performed less well than methods 3 and 4 under most conditions. Estimation by regression is most advantageous when sample size is less than 100 and the percentage of missing values exceeds five percent.
Although these findings are generally consistent with the results of previous research, the benefits of estimating missing values appear to be less impressive under the conditions and criteria presently investigated.
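The four treatments and the R² comparison described in the summary can be sketched as follows. This is a minimal pure-Python illustration, not the dissertation's own procedure: the simulated population (predictor loadings of 0.7 on a common factor, criterion weights of 0.3/0.2/0.3), the reading of "most common covariate" as the predictor most highly correlated with the incomplete variable, and the refit-until-stable loop used for method 4 are all assumptions, and every function name is hypothetical.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Regress y on the columns of rows plus an intercept; return (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(v[i] * v[j] for v in X) for j in range(k)] for i in range(k)]
    xty = [sum(v[i] * yi for v, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    ybar = statistics.fmean(y)
    sse = sum((yi - sum(bi * vi for bi, vi in zip(b, v))) ** 2 for v, yi in zip(X, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return b, 1.0 - sse / sst


def simulate(n, rng):
    """Hypothetical population: three correlated predictors and one criterion."""
    data = []
    for _ in range(n):
        f = rng.gauss(0, 1)
        x = [0.7 * f + 0.7 * rng.gauss(0, 1) for _ in range(3)]
        y = 0.3 * x[0] + 0.2 * x[1] + 0.3 * x[2] + rng.gauss(0, 1)
        data.append((x, y))
    return data


def delete_x3(data, pct, rng):
    """Set the third predictor to None in a random pct of cases."""
    drop = set(rng.sample(range(len(data)), round(len(data) * pct / 100)))
    return [([x[0], x[1], None if i in drop else x[2]], y) for i, (x, y) in enumerate(data)]


def treat(data, method, max_iter=20, tol=1e-10):
    """Apply treatment 1-4 to the incomplete third predictor."""
    obs = [(x, y) for x, y in data if x[2] is not None]
    if method == 1:  # discard incomplete cases (listwise deletion)
        return obs
    if method == 2:  # mean substitution
        m = statistics.fmean([x[2] for x, _ in obs])
        return [([x[0], x[1], m if x[2] is None else x[2]], y) for x, y in data]
    x3 = [x[2] for x, _ in obs]
    if method == 3:  # simple regression on the single best covariate (assumed reading)
        best = max((0, 1), key=lambda j: ols([[x[j]] for x, _ in obs], x3)[1])
        b, _ = ols([[x[best]] for x, _ in obs], x3)
        return [([x[0], x[1], b[0] + b[1] * x[best] if x[2] is None else x[2]], y)
                for x, y in data]
    # Method 4 (assumed form): regress x3 on x1 and x2, impute, refit on the filled
    # data, and repeat until the coefficients stop changing. With a single incomplete
    # variable the refit reproduces b, so the loop stops after one pass.
    missing = [x[2] is None for x, _ in data]
    filled = [list(x) for x, _ in data]
    b, _ = ols([x[:2] for x, _ in obs], x3)
    for _ in range(max_iter):
        for r, miss in zip(filled, missing):
            if miss:
                r[2] = b[0] + b[1] * r[0] + b[2] * r[1]
        new_b, _ = ols([r[:2] for r in filled], [r[2] for r in filled])
        if max(abs(p - q) for p, q in zip(new_b, b)) < tol:
            break
        b = new_b
    return [(r, y) for r, (_, y) in zip(filled, data)]


def r2(data):
    """R^2 from regressing the criterion on all three predictors."""
    return ols([x for x, _ in data], [y for _, y in data])[1]


rng = random.Random(1985)
complete = simulate(50, rng)
incomplete = delete_x3(complete, 10, rng)
full_r2 = r2(complete)
for method in (1, 2, 3, 4):
    print(method, round(r2(treat(incomplete, method)) - full_r2, 4))
```

Each printed line gives a method number and the difference between its treated-data R² and the complete-sample R², the comparison criterion named in the summary. A single run shows the mechanics only; judging bias would require repeating over many seeds and crossing the sample sizes (50, 100, 200) with the deletion rates (2, 6, 10 percent), as the study design does.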
- Dissertation Note:
- Ph.D. The Pennsylvania State University 1985.
- Note:
- Source: Dissertation Abstracts International, Volume: 46-06, Section: A, page: 1608.
- Part Of:
- Dissertation Abstracts International, 46-06A
- Catkey: 13612554