s-news

Re: dimnames in Sparc Splus 6.0

To: Tim Hesterberg x319 <timh@insightful.com>
Subject: Re: dimnames in Sparc Splus 6.0
From: Prof Brian D Ripley <ripley@stats.ox.ac.uk>
Date: Thu, 6 Sep 2001 07:53:45 +0100 (BST)
Cc: <s-news@lists.biostat.wustl.edu>
In-reply-to: <200109052101.OAA26368@tomato.statsci.com>
On Wed, 5 Sep 2001, Tim Hesterberg x319 wrote:

> Gary Sabot just posted a version of [.data.frame for which
>       X[,aSingleColumn]          # where X is a data frame
> adds the row names of the data frame as names to the vector
> that is returned.
>
> Unfortunately, that version of [.data.frame makes assumptions about
> the type of data included in data frames that are often unjustified.
> This causes it to mess up with some kinds of data, including:
> * factors  (e.g. try fuel.frame[,"Type"] with and without that version)
> * objects with new-style classes.  This makes it incompatible with
>   library("missing") (a library in S-PLUS 6.0 for handling
>   missing data using multiple imputations).
>
> More generally, adding names to vectors is often undesirable; it may
> * substantially increase the size of the resulting object,
> * slow down some S-PLUS computations, and
> * cause bugs for code that expects data without names; also

(right, so users should be able to remove names or extract without names.
They should also be able to extract with names.)

> * there is no way to determine which variables in a data frame had
>   names originally.  So names may be added to variables that should
>   not have them, or which are incorrect for the variable.

That's not the point.  In a data frame, the row.names apply to the rows, so
they *do* apply to each variable, regardless of whether the variable had
names originally, and even if it had different names.  If not, the
object should not be of class "data.frame".

And of course, row names in data frames have long been documented to be
unique, so dup.names.ok in data.frame() must be a design error.

I believe the root problem is that instead of creating a new class for the
purpose, the semantics of the existing class "data.frame" have been
changed, apparently for Insightful internal convenience.

John Chambers has said that it is time for some S standardization effort,
and I back that.  Concepts like data frames should be frozen for all time.
It will help the users, who may be using two or three S implementations
simultaneously.  We are about to move our teaching from S-PLUS 2000 to 6.0,
and my colleagues expect all their material to continue to work
unchanged ....

[...]

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

