Partly to provide information, partly to be contrary perhaps, let me
comment a bit on Tim's comments.
The "problem" -- Gary Sabot would like the result of x[, aSingleColumn],
where x is a data frame, to retain the row names of x as labels.
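
To make the request concrete, here is a minimal sketch in S syntax of the
behavior being asked for. It assumes the selected column is a plain numeric
vector; the helper name keepRowNames and the example data are hypothetical,
made up for illustration only.

```r
# Hypothetical helper: extract one column of a data frame and carry the
# row names along as the names of the result. Assumes the column is a
# plain vector (numeric, character, logical).
keepRowNames <- function(x, column) {
    col <- x[[column]]
    names(col) <- row.names(x)
    col
}

# Made-up example data
x <- data.frame(age = c(31, 45, 27), row.names = c("ann", "bob", "cy"))

x[, "age"]               # default: a bare vector, row names dropped
keepRowNames(x, "age")   # same values, labeled with the row names of x
```
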
The problem behind the problem: the newest releases of Splus, those based on
the "version 4" engine from John Chambers at Lucent, have built into them
a dependence on a new class structure. That new class structure has a
severe shortcoming (at least one) that makes it impossible to implement ANY
solution to Gary's problem. Tim stated that "unfortunately it (the new
[.data.frame) makes assumptions about the type of data that is included in
the data frame that may be unjustified." Precisely -- the assumption is that
you won't run into one of the restrictions due to new-style classes.
Tim's other two reasons -- that names slow things down, and that the names
that get "glued on" might not be the ones you really want -- are good
arguments for leaving the "as shipped" default behavior as is. However, in the
spirit of the S goal "To turn ideas into software, quickly and faithfully"
(Chambers), he should be able to implement his default for his machine.
I will distinguish between being able to retain the names, and having those
names automatically printed in all cases -- the second is much harder because
of the many specialized print functions.
The new style classes have the significant restriction that absolutely
no "extra" information may be attached to such an object, and have it remain
of the original class. This may be good computer science, but the notion that
every necessary attribute of a class will be foreseen at the class's
conception is naive in practice. After 10+ years of working with the
survival code, I still make additions to the basic objects. (Perhaps I'm
just slow?)
Thus, an integer vector with names is no longer an integer vector, it is
an object of another type. A special class "named" was created to allow for
named integer, double, character and logical vectors, but no such work has
been done for timeDates, factors, Surv objects, etc. If any of
these is the contents of the selected column of x, there is no way to keep
both the object and a list of associated names. We have encountered a
similar problem with our local version of sas.get, which retained the SAS
label attribute of each element of the data frame. Luckily, the number of
kinds of variable that can be created is small, so we have built a set of
local classes for labeled integers, doubles, characters, factors, and dates.
(I have heard that the 'named' class itself caused many a headache for
Seattle.)
I personally think that although the new class structure is useful for
certain very simple objects (such as a timeDate), conversion of the results
of a model fit (lm, glm, coxph) to this form will be a gross and near
crippling mistake.
Terry Therneau