summary:convert to weights

To: S-news <s-news@lists.biostat.wustl.edu>
Subject: summary:convert to weights
From: Andrew Beckerman <a.p.beckerman@stir.ac.uk>
Date: Fri, 29 Jun 2001 11:30:32 +0100
Thanks to Jean Adams, Bill Dunlap, Gerald Jean, Carlisle Thacker, Andrzej Galecki, Nick Ellis and Alec Stephenson for their quick replies.

The question was how to convert a data frame of raw, repeated rows into one with a single row per unique combination plus a column of counts (or weights).

Many people suggested tapply() and table(), but these require you to reconstruct the data frame yourself.
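To illustrate what that reconstruction involves, here is a minimal sketch with tapply() on a made-up data frame (the data and the names d, key, cnt and out are mine, not from any of the replies). tapply() returns only a named vector of counts, so the original columns have to be matched back by hand:

```r
# Toy data (hypothetical): six rows, three distinct combinations
d <- data.frame(a = c("x", "y", "x", "x", "y", "z"),
                b = c(1, 2, 1, 1, 2, 1))

# One string key per row; "\001" is an unlikely separator
key <- paste(d$a, d$b, sep = "\001")

# Counts per key: a named vector, the original columns are gone
cnt <- tapply(rep(1, nrow(d)), key, sum)

# Reconstruction: find the first row carrying each key and re-attach it
rows <- match(names(cnt), key)
out <- data.frame(d[rows, , drop = FALSE], count = as.vector(cnt))
```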

aggregate() was also suggested, but seemed to fail on a large (>95000) row dataset.
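For reference, counting with aggregate() looks roughly like this in current R. This is a sketch on made-up data; the S-PLUS call at the time may have differed, and it says nothing about why the function failed on the large data set.

```r
# Toy data (hypothetical): six rows, three distinct combinations
d <- data.frame(a = c("x", "y", "x", "x", "y", "z"),
                b = c(1, 2, 1, 1, 2, 1))

# Sum a column of ones within each combination of the grouping columns
agg <- aggregate(list(count = rep(1, nrow(d))), by = d, FUN = sum)
```

The result has the grouping columns a and b plus a count column, one row per combination that occurs in the data.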

Bill Dunlap provided code for a (his?) function, agg.length(), that did the trick:

agg.length <- function(by, sort.it = T) {
        if(!is.data.frame(by))
                by <- data.frame(by)
        if(sort.it) {
                ord <- do.call("order", by)
                by <- by[ord,  ]
        }
        # TRUE wherever an element differs from the one before it
        logical.diff <- function(group)
                group[-1] != group[ - length(group)]
        change <- logical.diff(by[[1]])
        for(i in seq(along = by)[-1])
                change <- change | logical.diff(by[[i]])
        by <- by[c(T, change),  , drop = F]
        by$length <- diff(c(0, seq(len = length(change) + 1)[c(change, T)]))
        by
}
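As I read it, the function sorts the rows so that duplicates are adjacent, marks the positions where any column changes, and takes the run lengths as counts. A small walk-through of those steps on made-up data (the names d, change, starts and d.unique are mine, and this is a sketch of the idea rather than a replacement for the function):

```r
# Toy data (hypothetical): six rows, three distinct combinations
d <- data.frame(a = c("x", "y", "x", "x", "y", "z"),
                b = c(1, 2, 1, 1, 2, 1))

# Sort on all columns so identical rows end up next to each other
d <- d[do.call("order", d), ]
n <- nrow(d)

# TRUE where a row differs from the previous row in any column
change <- rep(FALSE, n - 1)
for (col in d) change <- change | (col[-1] != col[-n])

# The first row of each run, and the run lengths as counts
starts <- which(c(TRUE, change))
d.unique <- d[starts, , drop = FALSE]
d.unique$length <- diff(c(starts, n + 1))
```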

as did a function passed on by Gerald Jean, written by Bill Dunlap and Phil Corbett(?):

"TableRows" <- function(dum)
  {
### Author : Bill Dunlap and Phil Corbett(I think?)
### Date   : 29 Nov 2000, 09:02
### Purpose: will return the unique rows of the data.frame "dum" with an added
###          column giving the number of times the unique rows were repeated in
###          the original data.frame.
### ----------------------------------------------------------------------
### Arguments:
###  dum : input data.frame containing duplicated rows.
### ----------------------------------------------------------------------

        dum.list        <- as.list(data.frame(dum))
        names(dum.list) <- NULL
        dum.strings <- do.call("paste", c(list(sep = "\001"), dum.list))
        tab <- table(factor(dum.strings, levels = unique(dum.strings)))
        cbind(dum[match(unique(dum.strings), dum.strings),  ], count = tab)
}
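The idea here is to paste each row into a single string and tabulate the strings. A minimal sketch of the same technique on made-up data, written for current R (the names d, key, first and out are mine). I use as.vector() on the table because in R a table handed to data.frame() is expanded into its own columns rather than treated as a plain vector of counts:

```r
# Toy data (hypothetical): six rows, three distinct combinations
d <- data.frame(a = c("x", "y", "x", "x", "y", "z"),
                b = c(1, 2, 1, 1, 2, 1))

# One string per row, joined with an unlikely separator
key <- do.call("paste", c(unname(as.list(d)), list(sep = "\001")))

# Tabulate the keys in order of first appearance
first <- !duplicated(key)
counts <- table(factor(key, levels = key[first]))

# First occurrence of each row plus its count
out <- data.frame(d[first, , drop = FALSE], count = as.vector(counts))
```

Unlike the sort-based approach, this keeps the rows in the order in which they first appear.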

The function summarize() provided by Jean Adams [if you use Hmisc, you may want to change the name of this] may work, but it was taking an inordinate amount of time on a large data set.

summarize <- function(sumry, index, fcn = "mean", ...,
                      newnames = dimnames(sumry)[[2]]) {
# sumry is a matrix of variables that will be summarized
# index is a matrix or data frame of discrete variables for which data is summarized
# fcn is the summary function
     ind <- apply(index, 1, function(x)
     paste(format(x), collapse = " "))
     res <- apply(sumry, 2, tapply, ind, fcn, ...)
     matchup <- tapply(sumry[, 1], ind)
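     # note: num.uniq() is not defined in this post; presumably it returns
     # the number of unique values, i.e. length(unique(ind))
     rows <- match(1:num.uniq(ind), matchup)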
     df <- data.frame(index[rows,  ], res)
     names(df)[ - (1:dim(index)[2])] <- newnames
     df
}

and Alec Stephenson pointed out an example from S-Programming:

data.frame(expand.grid(lapply(tmp,levels)),
weight=as.vector(do.call("table",tmp)))
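A quick check of this one-liner on made-up data (the contents of tmp and the name w.pos are mine). It assumes every column of tmp is a factor, and it returns every combination of levels, including those with zero weight, so you may want to drop the empty ones afterwards:

```r
# Toy data (hypothetical): all columns must be factors
tmp <- data.frame(a = factor(c("x", "y", "x", "x", "y", "z")),
                  b = factor(c(1, 2, 1, 1, 2, 1)))

# All combinations of levels, with the cross-tabulated counts as weights;
# expand.grid() and as.vector(table()) both vary the first factor fastest
w <- data.frame(expand.grid(lapply(tmp, levels)),
                weight = as.vector(do.call("table", tmp)))

# Keep only the combinations that actually occur
w.pos <- w[w$weight > 0, ]
```

With many factors or many levels the full grid can be much larger than the number of distinct rows in the data.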

Dr. Andrew Beckerman
Institute of Biological Science
University of Stirling, Stirling FK9 4LA, Scotland, UK
phone:+44 (0)1786 467808 fax: +44 (0)1786 464994