s-news

Re: dimnames in Sparc Splus 6.0

To: Terry Therneau <therneau@mayo.edu>
Subject: Re: dimnames in Sparc Splus 6.0
From: Tim Hesterberg x319 <timh@insightful.com>
Date: Wed, 12 Sep 2001 15:53:41 -0700
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <200109052235.RAA28156@rocky.mayo.edu> (message from Terry Therneau on Wed, 5 Sep 2001 17:35:22 -0500 (CDT))
References: <200109052235.RAA28156@rocky.mayo.edu>
Reply-to: timh@insightful.com (Tim Hesterberg)
This message contains:
* comments on implementation details of the versions of [.data.frame
  posted Sept 6
* a recommendation that you not mask [.data.frame, and a safer alternative
* comments on old-style vs new-style classes.

I wrote:
>>Gary Sabot just posted a version of [.data.frame for which
>>        X[,aSingleColumn]          # where X is a data frame
>>adds the row names of the data frame as names to the vector
>>that is returned.
>>
>>Unfortunately, that version of [.data.frame makes assumptions about
>>the type of data included in data frames that are often unjustified.
>>This causes it to mess up with some kinds of data, including: ...

Terry Therneau wrote:
>  Partly to provide information, partly to be contrary perhaps, let me
>comment a bit on Tim's comments.
>
>  The "problem" -- Gary Sabot would like the results of x[, aSingleColumn],
>where x is a data frame, to retain the row names of x as labels.
>
>  The problem behind the problem: the newest releases of Splus, those based on
>the "version 4" engine from John Chambers at Lucent, have built into them
>a dependence on a new class structure.  That new class structure has a
>severe shortcoming (at least one) that makes it impossible to implement ANY
>solution to Gary's problem.  Tim stated that "unfortunately it (the new
>[.data.frame) makes assumptions about the type of data that is included in
>the data frame that may be unjustified."  Precisely -- the assumption is that
>you won't run into one of the restrictions due to new-style classes.

I'm thinking of something different, an implementation detail.  This
is not an issue of old classes vs new classes (though I will touch on
that later).

The version of [.data.frame first posted by Sabot uses some dangerous
and unnecessary calls to unlist() and as.vector() to handle a data
frame containing a single variable.  These calls destroy the structure
of various objects, whether new-style or old-style classed objects
(including factors and matrices).

The second posted version is safer: it explicitly extracts the
variable from the data frame using subscripting rather than unlist(),
and then does:
    names(result) <- namesResult

This would work as long as "names<-" is well-behaved for the object.


Terry is correct about the inflexibility of new-style classes.
For such an object, one cannot just do
        names(object) <- namesResult
and expect it to work, unless the designer of the class has done
things right, by defining a "names<-" method for the class.
Such a method could do one of the following:
* add the names to part of the object, contained in one slot;
  for example see:
        library("missing"); selectMethod("names<-", "miVariable")
  where the key line is:
        names(x@Data) <- value
* return an object with a different class, with the same slots
  and one additional slot for the names.
* discard the names, give a warning, and return the original object
  (consider this a last resort).
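A minimal sketch of the first option, assuming a hypothetical new-style class "labeledVariable" (not part of any library) that keeps its numeric data in a slot named Data plus one extra slot.  The "names<-" method puts the names on the Data slot, so the object keeps its class and its other slots.  It is written in S as R also accepts it; details of S-PLUS 6.0 method dispatch for primitives may differ.

```r
# Hypothetical new-style class: numeric data in one slot, plus extra info.
setClass("labeledVariable",
         representation(Data = "numeric", units = "character"))

# "names<-" method: store the names on the Data slot, so the result
# keeps its class and the units slot is untouched.
setMethod("names<-", "labeledVariable", function(x, value) {
  names(x@Data) <- value
  x
})

# Matching accessor so that names(x) reports what was stored.
setMethod("names", "labeledVariable", function(x) names(x@Data))

v <- new("labeledVariable", Data = c(1.5, 2.5, 3.5), units = "cm")
names(v) <- c("a", "b", "c")
```

With such a method in place, names(result) <- namesResult inside a subscripting function behaves for this class as it would for a plain vector.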


When I sent that second version of [.data.frame to Sabot, I warned that it
might not work for all objects.  It would fail for matrices (yes, one
can include a matrix in a data frame).  That particular problem could
be fixed by replacing
        names(result) <- namesResult
with
        rowIds(result) <- namesResult
but there are still likely to be other problems.


I recommend that you, gentle user, NOT mask a fundamental function
like [.data.frame.  This could lead to bugs that are hard to track
down.  You'll be happier, and Insightful tech support will thank you :-)
Instead, use a separate function written specifically for that
purpose, such as extractVariableAddNames.
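For illustration, here is one hypothetical way such a function could be written, following the safer approach described above: pull the variable out with [[ rather than unlist(), then assign the row names.  This is a sketch under stated assumptions, not a definitive implementation; it only adds names when the variable has no dim attribute, and leaves matrices untouched rather than guessing at rowIds().  It is written in S as R also accepts it.

```r
# Hypothetical helper: extract one variable from a data frame and label
# it with the data frame's row names, without masking [.data.frame.
extractVariableAddNames <- function(x, j) {
  # [[ keeps the variable's own structure (factor levels, class, ...),
  # unlike unlist() or as.vector().
  result <- x[[j]]
  # Only vector-like variables get names; a matrix would need its row
  # labels set differently, so it is returned as-is.
  if (is.null(dim(result)))
    names(result) <- row.names(x)
  result
}

df <- data.frame(a = c(1, 2, 3), f = factor(c("u", "v", "u")),
                 row.names = c("r1", "r2", "r3"))
a <- extractVariableAddNames(df, "a")   # numeric, named r1 r2 r3
f <- extractVariableAddNames(df, "f")   # still a factor, now named
```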


Terry continued to comment on the difference between old-style
and new-style classes:
>  The new style classes have the significant restriction that absolutely
>no "extra" information may be attached to such an object, and have it remain
>of the original class.  This may be good computer science, but the notion that
>every necessary attribute of a class will be visualized at the class's
>conception is naive in practicality.  After 10+ years of working with the
>survival code, I still make additions to the basic objects.  (Perhaps I'm
>just slow?)
>...
>   I personally think that although the new class structure is useful for
>certain very simple objects (such as a timeDate), conversion of the results
>of a model fit (lm, glm, coxph) to this form will be a gross and near
>crippling mistake.   

I concur -- old-style objects have a strong advantage when components
may be added to a class in the future.  I'm in the middle of a project
now where we're adding optional components to the bootstrap and related
objects.  One could add an optional component to a new-style object by
defining an inheriting class that adds the component, but this can get
unwieldy: you'd need 2^k classes, one per combination, to handle k
optional components.
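A minimal sketch of the contrast, using hypothetical classes "oldFit", "newFit", and "newFitCI" (not the bootstrap objects themselves).  The old-style list accepts a new component at any time and keeps its class; the new-style version needs an inheriting class that declares the extra slot.  Written in S as R also accepts it.

```r
# Old-style: a classed list; an optional component can be added later.
oldFit <- list(coef = c(1, 2))
class(oldFit) <- "oldFit"
oldFit$confInt <- c(0.5, 3.5)      # still of class "oldFit"

# Code using the optional component just checks whether it is there.
describeInterval <- function(fit)
  if (is.null(fit$confInt)) "no interval" else paste(fit$confInt, collapse = " to ")

# New-style: slots are fixed when the class is defined, so the optional
# component requires an inheriting class that declares the extra slot.
setClass("newFit", representation(coef = "numeric"))
setClass("newFitCI", representation("newFit", confInt = "numeric"))
nf <- new("newFitCI", coef = c(1, 2), confInt = c(0.5, 3.5))
# Each further optional component doubles the number of classes needed
# to cover every combination: 2^k classes for k components.
```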

A second advantage of old-style objects is that many functions
work on them without additional work.
For example, consider an old-style object consisting of
a numeric object (vector or array) with some attributes.
Various functions such as length(), dim(), names(), names<-,
and numerical functions such as mean(), stdev(), and sin(),
all handle the object without additional work, operating on the
numeric data.  You only need to write methods for functions that
need to do something special with the attributes.
Converting that class to a new-style class, with the numeric
data in one slot and the attributes in other slots, requires
writing methods for the functions mentioned and many others.
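A minimal sketch of such an old-style object, assuming a hypothetical class "measured": a numeric vector carrying a "units" attribute.  No methods are written for length(), mean(), sin(), or names<-; the default versions operate on the numeric data.  Only print gets a method, because it has to do something special with the attribute.  Written in S as R also accepts it; S-PLUS 6.0 may differ in which attributes sin() carries along.

```r
# Old-style class: a numeric vector with an extra attribute.
x <- structure(c(0, 1, 2), units = "cm", class = "measured")

length(x)                  # 3
mean(x)                    # 1
s <- sin(x)                # attributes, including the class, carried along
names(x) <- c("a", "b", "c")

# Only functions that must treat the attributes specially need methods.
print.measured <- function(x, ...) {
  cat("Units:", attr(x, "units"), "\n")
  y <- unclass(x)
  attr(y, "units") <- NULL   # print the bare numbers, names included
  print(y, ...)
  invisible(x)
}
```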

On the other hand, new-style classes have some advantages, e.g. their
unambiguous definition means they can be manipulated directly by C++
code.

========================================================
| Tim Hesterberg       Research Scientist              |
| timh@insightful.com  Insightful Corp.                |
| (206)283-8802x319    1700 Westlake Ave. N, Suite 500 |
| (206)283-6310 (fax)  Seattle, WA 98109-3044, U.S.A.  |
========================================================
Formerly known as MathSoft, Insightful Corporation provides analytical
solutions leveraging S-PLUS, StatServer, and Consulting services
