Hello Frank,
Frank E Harrell Jr <f.harrell@vanderbilt.edu> wrote on 2009/05/26 18:05:29:
> gerald.jean@dgag.ca wrote:
> > Hello there,
> >
> > TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
> >
> > I am trying to impute missing values from roughly 100 variables in a
> > data frame. Running "transcan" is no problem: it converges and gives
> > sensible results. The problem arises when trying to impute the missing
> > values in the original data set. The goal is getting a new data frame
> > from the old one, with all missing values imputed. Due to the large
> > number of variables to be imputed it is not practical to use, as in the
> > help file,
> >
> >> var.name <- impute(x.trans, var.name)
> >
> > I know that I can use, to impute all variables at once:
> >
> >> impute(x.trans)
> >
> > But with a large number of variables and a large number of objects in
> > the working directory it clutters this directory with too many objects.
> > The method suggested in the help file:
> >
> > ## Not run:
> > xt <- transcan(~. , data=mine,
> >                imputed=TRUE, shrink=TRUE, n.impute=10, trantab=TRUE)
> > attach(mine, pos=1, use.names=FALSE)
> > impute(xt, imputation=1) # use first imputation
> > detach(1, 'mine2')
> > ## End(Not run)
>
> Gerald,
>
> # Here is how to create a completed dataset
> d <- data.frame(x1, x2)
> z <- transcan(~x1 + I(x2), n.impute=5, data=d)
> imputed <- impute(z, imputation=1, data=d,
>                   list.out=TRUE, pr=FALSE, check=FALSE)
> sapply(imputed, function(x)sum(is.imputed(x)))
> sapply(imputed, function(x)sum(is.na(x)))
>
> In my test NAs are imputed. I don't know if there is a difference
> between S+ and R here. Before doing anything else give a data= argument
> to transcan and see what happens.
>
> Frank
>
First, thanks for your reply. I tested your example and it works like a
charm in S+. This puzzled me, since I was already using the "data" argument
of the impute function, so I looked for what differed between your use of
transcan and impute and mine. You were using the "n.impute" argument of
transcan and the "imputation" argument of impute; I had not used those
since my application didn't require multiple imputation. When I re-ran my
problem with those arguments, it worked without a problem. I then re-ran
your example without multiple imputation and it did not work. Here is
exactly what I did:
ttt.x1 <- rnorm(20, 0, 1)
ttt.x2 <- rnorm(20, 0.25, 1)
ttt.x2[c(2, 8, 11, 20)] <- NA
ttt.x2
library(section = "Hmisc", pos = 3)
ttt.d <- data.frame(ttt.x1, ttt.x2)
ttt.z <- transcan(~ttt.x1 + I(ttt.x2), imputed = TRUE, data = ttt.d)
ttt.imputed <- impute(ttt.z, data = ttt.d,
                      list.out=TRUE, pr=FALSE, check=FALSE)
sapply(ttt.imputed, function(x)sum(is.imputed(x)))
ttt.x1 ttt.x2
     0      0
sapply(ttt.imputed, function(x)sum(is.na(x)))
ttt.x1 ttt.x2
     0      4
Hence, if I get this right, the "n.impute" argument to transcan MUST be
used for this type of application?
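For the record, here is a minimal sketch of the variant that does work for me, on the same kind of toy data. The "n.impute" and "imputation" arguments come from your example; the set.seed call, the larger sample size, the R-style library call and the last step that copies the list back into a data frame are my own additions for illustration, not a recommended recipe:

```r
library(Hmisc)  # R syntax; in S+ I load it with library(section = "Hmisc", pos = 3)

set.seed(1)                       # assumption: only to make the sketch repeatable
ttt.x1 <- rnorm(100, 0, 1)        # assumption: 100 rows instead of 20
ttt.x2 <- rnorm(100, 0.25, 1)
ttt.x2[c(2, 8, 11, 20)] <- NA     # same four missing positions as in my test
ttt.d <- data.frame(ttt.x1, ttt.x2)

# Multiple-imputation form of transcan, as in Frank's example
ttt.z <- transcan(~ttt.x1 + I(ttt.x2), n.impute = 5, data = ttt.d,
                  pr = FALSE, pl = FALSE)

# Pull out the first imputation as a list of completed variables
ttt.imputed <- impute(ttt.z, imputation = 1, data = ttt.d,
                      list.out = TRUE, pr = FALSE, check = FALSE)

# Assumption: overwrite the columns of a copy of the original data frame
ttt.completed <- ttt.d
ttt.completed[names(ttt.imputed)] <- ttt.imputed

# Count the remaining missing values per variable
sapply(ttt.completed, function(x) sum(is.na(x)))
```

This keeps everything in one object (ttt.completed) instead of cluttering the working directory with one object per variable.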
Thanks again, and thanks for such great libraries; I use them,
particularly Hmisc, all the time!
Gérald
> >
> > may work in R but not in S+, the "save" argument to "detach" is not
> > currently supported.
> >
> > I then tried setting the "list.out" argument to TRUE.
> >
> > ttt.impute <- impute(RTA.Socio.transcan, data = TelePV.RTA.Socio,
> >                      list.out = TRUE)
> >
> > According to the help file:
> >
> > list.out
> >      If var is not specified, you can set list.out=TRUE to have
> >      impute.transcan return a list containing variables with needed
> >      values imputed. This list will contain a single imputation.
> >
> > It runs OK but the values are not imputed:
> >
> >> lapply(ttt.impute, FUN = function(x) sum(is.na(x)))
> > $adhesion:
> > [1] 161
> > $X1.c06poparea:
> > [1] 8
> > $X2.c06dwlgarea:
> > [1] 8
> > $X3.pctpop0.14:
> > [1] 9
> > $X40.avg.persfam:
> > [1] 8
> > $X41.pct.move1an:
> > [1] 11
> > ...
> >
> > exactly the original missing values. Am I missing something? I have
> > been banging my head since yesterday trying to do this, to no avail. I
> > can do it manually through a loop but I thought this was the purpose of
> > "impute".
> >
> > Thanks for any insights,
> >
> > Gérald Jean
> > Senior Statistical Advisor,
> > VP Planning and Market Development,
> > Desjardins Groupe d'Assurances Générales
> > telephone: (418) 835-4900 ext. 7639
> > fax: (418) 835-6657
> > e-mail: gerald.jean@dgag.ca
> >
> > "In God we trust, all others must bring data" W. Edwards Deming
> >
> >
>
> --
> Frank E Harrell Jr Professor and Chair School of Medicine
> Department of Biostatistics Vanderbilt University
>