gerald.jean@dgag.ca wrote:
Hello there,
TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
I am trying to impute missing values for roughly 100 variables in a data
frame. Running "transcan" is no problem; it converges and gives sensible
results. The problem arises when trying to impute the missing values in
the original data set. The goal is to get a new data frame from the old
one, with all missing values imputed. Due to the large number of variables
to be imputed, it is not practical to use, as in the help file,
var.name <- impute(x.trans, var.name)
I know that I can use, to impute all variables at once:
impute(x.trans)
But with a large number of variables and a large number of objects in the
working directory it clutters this directory with too many objects. The
method suggested in the help file:
## Not run:
xt <- transcan(~. , data=mine,
               imputed=TRUE, shrink=TRUE, n.impute=10, trantab=TRUE)
attach(mine, pos=1, use.names=FALSE)
impute(xt, imputation=1) # use first imputation
detach(1, 'mine2')
## End(Not run)
Gerald,
# Here is how to create a completed dataset
d <- data.frame(x1, x2)
z <- transcan(~x1 + I(x2), n.impute=5, data=d)
imputed <- impute(z, imputation=1, data=d,
                  list.out=TRUE, pr=FALSE, check=FALSE)
sapply(imputed, function(x)sum(is.imputed(x)))
sapply(imputed, function(x)sum(is.na(x)))
In my test the NAs are imputed. I don't know if there is a difference
between S+ and R here. Before doing anything else, give a data= argument
to transcan and see what happens.
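
Once the list returned with list.out=TRUE does contain imputed values, one
possible way to get a completed data frame is to copy the original and
overwrite only the columns present in the list. This is only a sketch in
base R/S syntax, not tested on S+. The data frame "d", the list "imputed"
and all values in them are made-up stand-ins for the real data and for the
output of impute(); the name "completed" is likewise just illustrative.

```r
# Stand-in data (hypothetical): 'd' plays the role of the original data
# frame, 'imputed' the role of the list from impute(..., list.out=TRUE).
d <- data.frame(x1 = c(1, NA, 3, 4),
                x2 = c(NA, 2, 2, 5),
                id = 1:4)
imputed <- list(x1 = c(1, 2.5, 3, 4),
                x2 = c(3, 2, 2, 5))

# Copy the original, then replace only the variables found in the list.
# Columns not in the list (here 'id') are left untouched.
completed <- d
for (v in names(imputed)) {
  completed[[v]] <- imputed[[v]]
}

# Missing-value counts per column, before and after
sapply(d, function(x) sum(is.na(x)))
sapply(completed, function(x) sum(is.na(x)))
```

This keeps everything in a single object instead of creating one object per
variable in the working directory.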
Frank
may work in R but not in S+: the "save" argument to "detach" is not
currently supported.
I then tried setting the "list.out" argument to TRUE.
ttt.impute <- impute(RTA.Socio.transcan, data = TelePV.RTA.Socio,
                     list.out = TRUE)
According to the help file:
list.out
If var is not specified, you can set list.out=TRUE to have
impute.transcan return a list containing variables with needed
values imputed. This list will contain a single imputation.
It runs OK but the values are not imputed:
lapply(ttt.impute, FUN = function(x) sum(is.na(x)))
$adhesion:
[1] 161
$X1.c06poparea:
[1] 8
$X2.c06dwlgarea:
[1] 8
$X3.pctpop0.14:
[1] 9
$X40.avg.persfam:
[1] 8
$X41.pct.move1an:
[1] 11
...
That is, exactly the original numbers of missing values. Am I missing
something? I have been banging my head since yesterday trying to do this,
to no avail. I can do it manually through a loop, but I thought this was
the purpose of "impute".
Thanks for any insights,
Gérald Jean
Conseiller senior en statistiques,
VP Planification et Développement des Marchés,
Desjardins Groupe d'Assurances Générales
télephone : (418) 835-4900 poste (7639)
télecopieur : (418) 835-6657
courrier électronique: gerald.jean@dgag.ca
"In God we trust, all others must bring data" W. Edwards Deming
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University