Summary: Working with descriptive statistics

To: s-news@lists.biostat.wustl.edu
Subject: Summary: Working with descriptive statistics
From: matthew.austin@quintiles.com
Date: Sat, 18 Nov 2000 10:45:56 -0600
Thanks to all who took the time to respond.  I am always impressed by the
variety of solutions that are offered to the questions that are asked on
this list.  I have pasted the answers that I received to my post at the end
of this message.  The original question was how to turn output from tapply()

response <- rnorm(200)
treatment <- rep(c("A","B"),100)
visit <- rep(1:4,each=50)
t1.data <- data.frame(response, treatment, visit)
rm(response, treatment, visit)
t2.data <- tapply(t1.data$response, list(t1.data$visit, t1.data$treatment),
   mean)
> t2.data
            A          B
1 -0.09757641 -0.2794609
2  0.28397480 -0.1517181
3 -0.32732973  0.1674076
4  0.06026633 -0.4264220

into a data frame such as the following, which is easier for me to use in
later plots and analyses:

 temp.data
    response treatment visit
 -0.09757641         A     1
  0.28397480         A     2
 -0.32732973         A     3
  0.06026633         A     4
 -0.27946094         B     1
 -0.15171808         B     2
  0.16740755         B     3
 -0.42642198         B     4

Thanks again,

Matthew D. Austin
Biostatistician

Quintiles, Inc.
P. O. Box 9708
Kansas City, MO 64134
Phone:  (816) 767 3771  Fax:  (816) 767 7372
email:  matthew.austin@quintiles.com

Jean V. Adams:

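attach(df)
response <- tapply(x, paste(treatment, visit), mean)
# With no function supplied, tapply() returns each row's group index.
indx <- tapply(x, paste(treatment, visit))
# Use the first row of each group to pick up its treatment and visit labels.
newdf <- cbind(response,
   df[match(seq(along = response), indx), c("treatment", "visit")])
newdf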

Don MacQueen:

The Hmisc library at
   http://hesweb1.med.virginia.edu/biostat/s/splus.html
has a function named summarize() that should do this.

Ed Kademan:

 > dfr <- data.frame(as.vector(t2.data), expand.grid(dimnames(t2.data)))
 > names(dfr) <- c("response", "visit", "treatment")

Note that the "treatment" and "visit" columns are interchanged.  If
it's really important you can easily swap them.
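
A minimal sketch of that swap, assuming R or an S version where data frame columns can be reordered by name. The set.seed() call and the named grouping are additions for repeatability and are not part of Ed's reply:

```r
# Rebuild the example from the original question (seed added for repeatability).
set.seed(1)
t1.data <- data.frame(response = rnorm(200),
                      treatment = rep(c("A", "B"), 100),
                      visit = rep(1:4, each = 50))
t2.data <- tapply(t1.data$response,
                  list(t1.data$visit, t1.data$treatment), mean)

# Ed's two lines: the first dimension (visit) varies fastest in expand.grid().
dfr <- data.frame(as.vector(t2.data), expand.grid(dimnames(t2.data)))
names(dfr) <- c("response", "visit", "treatment")

# Reorder the columns by name to get response, treatment, visit.
dfr <- dfr[, c("response", "treatment", "visit")]
dfr
```

Selecting columns by name rather than by position keeps the swap correct even if the column order of dfr changes later.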

Frank Harrell:

The summarize function in the Hmisc library does this.


Z. Todd Taylor:

dfify <- function(arr, value.name = "value",
                  dn.names = names(dimnames(arr)))
{
        Version <- "$Id: dfify.sfun,v 1.1 1995/10/09 16:06:12 d3a061 Exp $"
        dn <- dimnames(arr <- as.array(arr))
        if(is.null(dn))
                stop("Can't data-frame-ify an array without dimnames")
        names(dn) <- dn.names
        ans <- cbind(expand.grid(dn), as.vector(arr))
        names(ans)[ncol(ans)] <- value.name
        ans
}

Use it like:

   t2.data <- tapply(...)
   t2.dataframe <- dfify(t2.data)

Or, to make things more readable/useful:

   t2.dataframe <- dfify(t2.data,
                         value.name="response",
                         dn.names=c("visit", "treatment")
                        )

Thomas Jagger PhD:

> makedataframe
function(x, Names = NULL)
{
        dx <- dimnames(x)
        if(is.null(dx))
                dx <- list(paste("row", 1:nrow(x), sep = ""), paste("col",
                        1:ncol(x), sep = ""))
        else {
                if(is.null(dx[[1]]))
                        dx[[1]] <- paste("row", 1:nrow(x), sep = "")
                if(is.null(dx[[2]]))
                        dx[[2]] <- paste("col", 1:ncol(x), sep = "")
        }
        xx <- data.frame(data = as.vector(unlist(x)),
                rows = dx[[1]][as.vector(row(x))],
                cols = dx[[2]][as.vector(col(x))])
        if(!is.null(Names)) {
                if(length(Names) == 3)
                        names(xx) <- Names
                else warning("Invalid Names: length must be 3")
        }
        xx
}

Charles C. Berry:

Something like

result <- cbind( c(t2.data), do.call("expand.grid",dimnames(t2.data)) )

followed by

names(result) <- ...

should do.

Bill Venables:

There is a standard function called as.data.frame.array in the Trellis
library (I think, but it should be automatically visible) that does
precisely this.  You should look at the help information.  In any case here
is a simple alternative that should do what you want:

cvt.array <- function(A) {
     dat <- do.call("expand.grid", dimnames(A))
     dat$Y <- as.vector(A)
     dat
}
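
As a rough illustration of how cvt.array behaves on the data from the original question, here is a sketch that assumes R, or an S version in which tapply() carries the names of the grouping list through to the dimnames. The set.seed() call, the named grouping list, and the renaming of Y to response are assumptions added for the example, not part of Bill's reply:

```r
# Rebuild the example data (seed added for repeatability).
set.seed(1)
t1.data <- data.frame(response = rnorm(200),
                      treatment = rep(c("A", "B"), 100),
                      visit = rep(1:4, each = 50))

# Naming the grouping list gives the result named dimnames,
# which expand.grid() then uses as column names.
t2.data <- tapply(t1.data$response,
                  list(visit = t1.data$visit, treatment = t1.data$treatment),
                  mean)

# Bill's function, unchanged.
cvt.array <- function(A) {
     dat <- do.call("expand.grid", dimnames(A))
     dat$Y <- as.vector(A)
     dat
}

long <- cvt.array(t2.data)
# Rename the value column to match the layout asked for in the question.
names(long)[names(long) == "Y"] <- "response"
long
```

With unnamed dimnames the factor columns would come back as Var1 and Var2 and would need renaming by hand.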

