
Re: Question about transcan() status message

To: Frank E Harrell Jr <f.harrell@vanderbilt.edu>, S-News Mail List <s-news@lists.biostat.wustl.edu>
Subject: Re: Question about transcan() status message
From: Kim Elmore <Kim.Elmore@noaa.gov>
Date: Wed, 12 Dec 2007 08:47:17 -0600
In-reply-to: <475FC0EC.2080102@vanderbilt.edu>
References: <475F14B5.5040704@noaa.gov> <475FC0EC.2080102@vanderbilt.edu>
Thank you very much, Dr. Harrell. I suspected it had to do with that particular variable (missing values have been artificially added to my data). I did discover aregImpute() and it seems to work very well. It also has some very helpful properties. In particular, it keeps the imputed values within the appropriate range of each variable, something I couldn't figure out how to do with the imputation techniques in the S+MissingData package. Such options are probably in the MissingData library, but I didn't find them.
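A likely reason for the range-preserving behavior: aregImpute()'s default imputation type is predictive mean matching (`type = "pmm"`). Each missing value is filled with an *observed* value borrowed from a donor case whose predicted value is close, so an imputation can never leave the observed range, whereas raw regression predictions can (for example, fall below zero). The Python sketch below is a deliberately simplified, single-predictor version under stated assumptions (ordinary least squares, nearest donor, no bootstrap, simulated data). It is not aregImpute's actual algorithm, and `pmm_impute` is a hypothetical name.

```python
import numpy as np


def pmm_impute(x, y):
    """Fill NaNs in y by predictive mean matching on one predictor x.

    Simplified sketch (hypothetical helper): fit y ~ a + b*x by least squares
    on the complete cases, then give each missing case the observed y of the
    complete case whose fitted value is closest (first one on ties).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    miss = np.isnan(y)
    obs = ~miss
    b, a = np.polyfit(x[obs], y[obs], deg=1)  # returns slope, then intercept
    fitted_obs = a + b * x[obs]
    donors = y[obs]
    for i in np.flatnonzero(miss):
        fitted_i = a + b * x[i]
        nearest = np.argmin(np.abs(fitted_obs - fitted_i))
        y[i] = donors[nearest]  # an observed value, never a raw prediction
    return y


# Simulated, zero-heavy, nonnegative variable (NOT the poster's data).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = np.where(x > 0.8, rng.poisson(20, size=n), 0).astype(float)
y_missing = y.copy()
y_missing[rng.choice(n, size=40, replace=False)] = np.nan

y_imp = pmm_impute(x, y_missing)

# For contrast: plain linear predictions are unbounded and may go negative.
keep = ~np.isnan(y_missing)
b, a = np.polyfit(x[keep], y_missing[keep], deg=1)
print("smallest raw regression prediction:", (a + b * x).min())
print("smallest PMM-imputed value:", y_imp[~keep].min())
```

The design choice that matters is the last line of the loop: copying a donor's observed value instead of the fitted value is what keeps every imputation inside the observed range.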

I also have MICE and have experimented with it, but for my purposes your package makes the characteristics I want more easily accessible.

Again, thanks very much!

Kim Elmore



At 05:07 AM 12/12/2007, you wrote:
Kim Elmore wrote:
I have a data set with missing values that I wish to impute. I'm very new to imputation, but I've been looking into transcan() and it seems to have many agreeable attributes, so I tried it. However, I get the following message:

transcan is good for single conditional mean imputation; generally multiple imputation is preferred; see the Hmisc aregImpute function or the Mice package for this.


Fewer than 3 unique knots.  Frequency table of variable:
   0  1 2  3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1189 13 4 14 9 7 7 6 2 3  4  7  4  3  3  2  2  1  1  1  1  3  2  3  3  2  3
27 28 29 30 31 34 35 36 37 41 42 43 46 48 49 50 51 52 53 57 58 61 62 63 64
 1  2  4  1  2  1  2  1  1  2  1  1  2  3  1  1  1  1  1  1  1  2  1  1  1
65 66 70 72 73 74 75 78 83 88 94 97 99 114 119 180 182 267
 1  1  1  3  1  1  1  1  1  1  1  1  2   1   1   1   1   1
All of the data are continuous numeric. I have 21 variables (columns) and 1356 observations (rows). I believe the NAs to be randomly distributed, but some variables have many more missing values than others.
How do I interpret what transcan() is telling me?

You have a variable with a huge number (1189) of zeros. It is difficult to fit a nonlinear spline function with that. You might force it to be linear using I(variable name) in the formula. Someday we should add other options such as linear splines or quadratic effects for such variables.
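To see why the spline fit breaks down, the frequency table from the transcan() message can be pushed through quantile-based knot placement. The Python sketch below assumes five knots at the 0.05, 0.275, 0.5, 0.725 and 0.95 quantiles (the placement Harrell recommends for five-knot restricted cubic splines; whether transcan() used exactly this setting here is an assumption) and uses the standard fact that a restricted cubic spline with k distinct knots contributes k - 2 nonlinear terms. `expand`, `spline_knots` and `n_nonlinear_terms` are hypothetical helper names.

```python
import numpy as np

# Frequency table copied from the transcan() message: value -> count.
FREQ = {
    0: 1189, 1: 13, 2: 4, 3: 14, 4: 9, 5: 7, 6: 7, 7: 6, 8: 2, 9: 3,
    10: 4, 11: 7, 12: 4, 13: 3, 14: 3, 15: 2, 16: 2, 17: 1, 18: 1, 19: 1,
    20: 1, 21: 3, 22: 2, 23: 3, 24: 3, 25: 2, 26: 3,
    27: 1, 28: 2, 29: 4, 30: 1, 31: 2, 34: 1, 35: 2, 36: 1, 37: 1, 41: 2,
    42: 1, 43: 1, 46: 2, 48: 3, 49: 1, 50: 1, 51: 1, 52: 1, 53: 1, 57: 1,
    58: 1, 61: 2, 62: 1, 63: 1, 64: 1,
    65: 1, 66: 1, 70: 1, 72: 3, 73: 1, 74: 1, 75: 1, 78: 1, 83: 1, 88: 1,
    94: 1, 97: 1, 99: 2, 114: 1, 119: 1, 180: 1, 182: 1, 267: 1,
}


def expand(freq):
    """Rebuild the raw sample from a value -> count table."""
    return np.repeat(list(freq.keys()), list(freq.values())).astype(float)


def spline_knots(x, probs=(0.05, 0.275, 0.5, 0.725, 0.95)):
    """Place knots at sample quantiles (assumed five-knot default).

    np.quantile's default linear interpolation matches R's type-7 quantile.
    """
    return np.quantile(x, probs)


def n_nonlinear_terms(knots):
    """A restricted cubic spline with k distinct knots has k - 2 nonlinear terms."""
    return max(len(np.unique(knots)) - 2, 0)


x = expand(FREQ)
knots = spline_knots(x)
unique_knots = np.unique(knots)
print("n =", x.size)
print("fraction of zeros =", round(float(np.mean(x == 0)), 3))
print("knots =", knots)
print("unique knots =", unique_knots)
print("nonlinear spline terms available =", n_nonlinear_terms(knots))
```

With 1189 of the 1356 tabulated values equal to zero, the four lower knots all land on 0 and only the 0.95 quantile (22.25 under linear interpolation) differs, leaving two distinct knots and no room for any nonlinear term. Wrapping the variable in I() sidesteps the problem by fitting it linearly, as suggested above.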

Frank Harrell

Kim Elmore


--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

