I am starting a project to develop a system for analyzing many small data
sets (on the order of 100 observations and 10 variables) that are related
to each other by two sets of factors. The number of factors could be as
large as 1000 by 50, but more typically 50 to 100 by 50. This system will
be developed on a Windows PC for S-plus 2000 or S-plus 6. Each data set
must be analyzed independently because the specific analysis depends on the
total amount of data, missing values, and censored values.
It is impractical to manage that many data sets and their status and
analyses with individual data objects. I would like some feedback on an
idea for managing these data. I would like to create something like a
by-object to store the data frame, status, and results for each combination
of factors. That way, I could reference the data frame with a statement
like object["factor1","factor2"]$df.
This works in a test file that I have, but it is rather slow. Is there a
better or faster way? What might be the limiting factors of this
approach?
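
For concreteness, here is a minimal sketch of the kind of structure described above: a list-mode matrix with one cell per factor combination, each cell holding a data frame, a status, and a results slot. It is written in S syntax following R semantics, so behavior in S-PLUS 2000 or 6 may differ. The factor levels, the name `store`, the toy data, and the "pending" status value are all hypothetical.

```r
# Hypothetical factor levels; real ones would come from the data.
f1 <- c("a1", "a2", "a3")
f2 <- c("b1", "b2")

# A list-mode matrix: one (initially empty) cell per factor combination.
store <- matrix(vector("list", length(f1) * length(f2)),
                nrow = length(f1), ncol = length(f2),
                dimnames = list(f1, f2))

# Fill one cell with a small data set plus bookkeeping fields.
store[["a1", "b2"]] <- list(
  df      = data.frame(x = c(1.2, NA, 3.4), y = c(10, 20, 30)),
  status  = "pending",   # hypothetical status label
  results = NULL         # to be filled in after the analysis
)

# Double brackets return the cell itself, so $df reaches the data frame.
cell <- store[["a1", "b2"]]
n.missing <- sum(is.na(cell$df$x))   # count missing values in one variable
```

In R, at least, single brackets on such a matrix (store["a1", "b2"]) return a length-one list rather than the cell, so the double-bracket form is the one that gives direct access to $df.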
Thanks much.
Dave