Thank you very much, Dr. Harrell. I thought it probably had to do
with that particular variable (missing values have been artificially
added to my data). I did discover aregImpute() and it seems to work
very well. It also has some very helpful attributes. In particular,
it will maintain the appropriate range of the imputed variables,
something that I couldn't figure out how to do with the imputation
techniques in the S+MissingData package. They are probably in the
MissingData library, but I didn't find them.
I have MICE as well and have played with it, too, but your package
has more easily accessible desirable characteristics for my purposes.
Again, thanks very much!
Kim Elmore
At 05:07 AM 12/12/2007, you wrote:
Kim Elmore wrote:
I have a data set with missing data for which I wish to perform
imputations. I'm very new at imputing data, but I've been looking
into transcan() and it seems to have many agreeable attributes, so
I tried it. However, I get the following message:
transcan is good for single conditional mean imputation; generally
multiple imputation is preferred; see the Hmisc aregImpute function
or the Mice package for this.
Fewer than 3 unique knots. Frequency table of variable:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1189 13 4 14 9 7 7 6 2 3 4 7 4 3 3 2 2 1 1 1 1 3 2 3 3 2 3
27 28 29 30 31 34 35 36 37 41 42 43 46 48 49 50 51 52 53 57 58 61 62 63 64
1 2 4 1 2 1 2 1 1 2 1 1 2 3 1 1 1 1 1 1 1 2 1 1 1
65 66 70 72 73 74 75 78 83 88 94 97 99 114 119 180 182 267
1 1 1 3 1 1 1 1 1 1 1 1 2 1 1 1 1 1
All of the data are continuous numeric. I have 21 variables
(columns) and 1356 observations (rows). I believe the NAs to be
randomly distributed, but some variables have many more missing
values than others.
How do I interpret what transcan() is telling me?
You have a variable with a huge number (1189) of zeros. It is
difficult to fit a nonlinear spline function with that. You might
force it to be linear using I(variable name) in the
formula. Someday we should add other options such as linear splines
or quadratic effects for such variables.
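
The I() suggestion above might look roughly like the following. This is
only a minimal sketch: the data frame d and the columns x1, x2 and x3 are
hypothetical stand-ins simulated for illustration (not the poster's 21
variables), with x2 made mostly zero to mimic the frequency table in the
question.

```r
library(Hmisc)

set.seed(1)
n <- 1356

# Hypothetical data: x1 and x3 are continuous and correlated; x2 is
# mostly zero with a scattering of positive counts.
x1 <- rnorm(n)
x3 <- x1 + rnorm(n)
x2 <- ifelse(runif(n) < 0.88, 0, rpois(n, 30))
d  <- data.frame(x1 = x1, x2 = x2, x3 = x3)

# Artificially remove some values at random
d$x1[sample(n, 100)] <- NA
d$x2[sample(n, 200)] <- NA

# I(x2) asks transcan to treat x2 as linear, so no spline knots
# have to be placed for the zero-heavy variable.
xt <- transcan(~ x1 + I(x2) + x3, data = d, imputed = TRUE, pl = FALSE)

# The same formula convention with aregImpute for multiple imputation
mi <- aregImpute(~ x1 + I(x2) + x3, data = d, n.impute = 5)
```

Whether a purely linear term is adequate for a variable like this depends
on the data; the sketch only shows where I() goes in the formula.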
Frank Harrell
Kim Elmore
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University