Kim Elmore wrote:
I have a data set with missing data for which I wish to perform
imputations. I'm very new at imputing data, but I've been looking into
transcan() and it seems to have many agreeable attributes, so I tried
it. However, I get the following message:
transcan is good for single conditional mean imputation; generally
multiple imputation is preferred; see the Hmisc aregImpute function or
the Mice package for this.
Fewer than 3 unique knots. Frequency table of variable:
   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1189 13  4 14  9  7  7  6  2  3  4  7  4  3  3  2  2  1  1  1  1  3  2  3  3  2  3
  27 28 29 30 31 34 35 36 37 41 42 43 46 48 49 50 51 52 53 57 58 61 62 63 64
   1  2  4  1  2  1  2  1  1  2  1  1  2  3  1  1  1  1  1  1  1  2  1  1  1
  65  66  70  72  73  74  75  78  83  88  94  97  99 114 119 180 182 267
   1   1   1   3   1   1   1   1   1   1   1   1   2   1   1   1   1   1
All of the data are continuous numeric. I have 21 variables (columns)
and 1356 observations (rows). I believe the NAs to be randomly
distributed, but some variables have many more missing values than others.
How do I interpret what transcan() is telling me?
You have a variable with a huge number (1189) of zeros. It is difficult
to fit a nonlinear spline function with that. You might force it to be
linear using I(variable name) in the formula. Someday we should add
other options such as linear splines or quadratic effects for such
variables.
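
To make the point concrete, here is a minimal sketch on simulated data, not the poster's data set. It assumes that a three-knot spline takes its knots from the 0.10, 0.50 and 0.90 quantiles (the usual placement as far as I know, but check the Hmisc documentation for the exact rule). With 1189 zeros out of 1356 values the two lower quantiles both fall on 0, so fewer than three distinct knots remain. The data frame d and the variables x1, x2 and zeros are hypothetical names made up for the illustration; the transcan() call only runs if Hmisc is installed.

```r
# Simulated stand-in for the problem variable: 1189 zeros out of 1356 values.
set.seed(1)
zeros <- c(rep(0, 1189), sample(1:100, 167, replace = TRUE))

# Candidate knots at the 10th, 50th and 90th percentiles (assumed placement).
knots <- quantile(zeros, probs = c(0.10, 0.50, 0.90), names = FALSE)
n_unique_knots <- length(unique(knots))
print(knots)
print(n_unique_knots)  # 2: the two lower knots both fall on 0

# Hypothetical data frame: two other continuous variables with some NAs.
x1 <- rnorm(1356)
x2 <- x1 + rnorm(1356)
d <- data.frame(x1 = x1, x2 = x2, zeros = zeros)
d$x1[sample(1356, 100)] <- NA
d$x2[sample(1356, 300)] <- NA

# I(zeros) keeps the zero-heavy variable linear instead of fitting a spline.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  library(Hmisc)
  fit <- transcan(~ x1 + x2 + I(zeros), data = d, imputed = TRUE, pl = FALSE)
}
```

Running the same quantile() check on each of your 21 columns should show which ones need to be wrapped in I().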
Frank Harrell
Kim Elmore
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University