I would like to give the strongest possible second
to Terry's note. The current design of SV4 is
seriously broken. I have turned to some of the
greatest S programmers in the world to try to
fix the attribute and multiple inheritance problems
in S language version 4 but there is simply no fix
that is useful and practical.
One thing I've always bragged about to SAS users
is the ability of S programmers to add attributes
to objects "on the fly". This advantage has
vanished in SV4.
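
For readers who have not used this feature, here is a minimal sketch of attaching an attribute "on the fly", written in R syntax, which keeps the old-style S attribute mechanism. The variable name and label text are made up for illustration.

```r
# Hypothetical example: attach a "label" attribute to an ordinary numeric vector.
age <- c(34, 51, 47)
attr(age, "label") <- "Age in years"   # added on the fly; no class definition needed

is.numeric(age)       # the object is still an ordinary numeric vector
attr(age, "label")    # the extra information travels with the object
mean(age)             # existing functions keep working on it
```

No class had to anticipate the "label" attribute in advance, which is the flexibility described above.
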
In my Hmisc and Design libraries I tried to convert
one of the functions (latex) to use the new class
system and the task turned out to be impossible
due to SV4's need for all specific methods (in my
case latex conversion methods) to have the same arguments.
This is not a reasonable requirement, as conversion
of different objects to LaTeX code requires different
options (e.g., converting a regression model fit
to LaTeX algebraic form requires vastly different
options than converting a matrix to a table).
So I gave up and Hmisc and Design make no use
of the new SV4 class mechanism. I have had to
write my own [.data.frame to preserve "label"
attributes of variables.
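
As a rough sketch of what the old-style class system allows, here is a generic whose methods take different options, written in R syntax. The names to_latex, digits and response are hypothetical and this is not the actual Hmisc or Design code; it only illustrates methods with differing argument lists.

```r
# Hypothetical generic; "..." lets each method accept its own options.
to_latex <- function(object, ...) UseMethod("to_latex")

# Converting a matrix to table rows: the option concerns number formatting.
to_latex.matrix <- function(object, digits = 2, ...) {
  rows <- apply(round(object, digits), 1, paste, collapse = " & ")
  paste(rows, collapse = " \\\\\n")
}

# Converting a model fit to algebraic form: an unrelated option
# (the symbol used for the response variable).
to_latex.lm <- function(object, response = "Y", ...) {
  b <- round(coef(object), 2)
  terms <- paste(b, names(b), sep = " \\cdot ", collapse = " + ")
  paste0(response, " = ", terms)
}

m <- matrix(c(1, 2, 3, 4), nrow = 2)
cat(to_latex(m, digits = 1), "\n")

fit <- lm(dist ~ speed, data = cars)   # 'cars' ships with R's datasets package
cat(to_latex(fit, response = "dist"), "\n")
```

Each method declares only the options that make sense for its kind of object, while the generic stays a single entry point.
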
I found that porting my libraries to R is taking
less time than making them compatible with SV4.
This simply should not be. I hope that some
major rethinking is taking place.
I am puzzled that the R developers seek so much user input
before making major changes while the commercial product
does not.
Please forgive me for taking such a negative
tone today. I know that I am on edge because
of the tragedies that have taken place. But
I wanted others to know that Terry Therneau is
not alone in his concerns about SV4.
Frank Harrell
Terry Therneau wrote:
>
> Partly to provide information, partly to be contrary perhaps, let me
> comment a bit on Tim's comments.
>
> The "problem" -- Gary Sabot would like the results of x[, aSingleColumn],
> where x is a data frame, to retain the row names of x as labels.
>
> The problem behind the problem: the newest releases of Splus, those based on
> the "version 4" engine from John Chambers at Lucent, have built into them
> a dependence on a new class structure. That new class structure has a
> severe shortcoming (at least one) that makes it impossible to implement ANY
> solution to Gary's problem. Tim stated that "unfortunately it (the new
> [.data.frame) makes assumptions about the type of data that is included in
> the data frame that may be unjustified." Precisely -- the assumption is that
> you won't run into one of the restrictions due to new-style classes.
>
> Tim's other two reasons-- that names slow things down, and that the names
> that get "glued on" might not be the ones you really want -- are good
> arguments for leaving the "as shipped" default behavior as is. However, in
> the spirit of the S goal "To turn ideas into software, quickly and
> faithfully" (Chambers), he should be able to implement his default for his
> machine.
> I will distinguish between being able to retain the names, and having those
> names automatically printed in all cases -- the second is much harder because
> of the many specialized print functions.
>
> The new style classes have the significant restriction that absolutely
> no "extra" information may be attached to such an object, and have it remain
> of the original class. This may be good computer science, but the notion that
> every necessary attribute of a class will be envisioned at the class's
> conception is naive in practice. After 10+ years of working with the
> survival code, I still make additions to the basic objects. (Perhaps I'm
> just slow?)
> Thus, an integer vector with names is no longer an integer vector, it is
> an object of another type. A special class "named" was created to allow for
> named integer, double, character and logical vectors, but no such work has
> been done for timeDates, factors, Surv objects, etc, etc, etc. If any of
> these is the contents of the selected column of X, there is no way to keep
> both the object and a list of associated names. We have encountered a
> similar problem with our local version of sas.get, which retained the SAS
> label attribute of each element of the data frame. Luckily, the number of
> kinds of variable that can be created is small, so we have built a set of
> local classes for labeled integers, doubles, characters, factors, and dates.
> (I have heard that the 'named' class itself caused many a headache for
> Seattle.)
>
> I personally think that although the new class structure is useful for
> certain very simple objects (such as a timeDate), conversion of the results
> of a model fit (lm, glm, coxph) to this form will be a gross and near
> crippling mistake.
>
> Terry Therneau
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat