How about this?
> theData <- data.frame(Center=paste("CTR",as.character(round(runif(1000,min=0,max=99),0)),sep=""),
+                       testDone=rbinom(1000,1,0.3))
> by(theData,theData$Center,function(x) {mean(x$testDone)})
theData$Center:CTR0
[1] 0.5714286
---------------------------------------------------------------------------------------------------------
theData$Center:CTR1
[1] 0.1
---------------------------------------------------------------------------------------------------------
theData$Center:CTR10
[1] 0.1111111
---------------------------------------------------------------------------------------------------------
theData$Center:CTR11
[1] 0.2
---------------------------------------------------------------------------------------------------------
theData$Center:CTR12
[1] 0.3076923
---------------------------------------------------------------------------------------------------------
theData$Center:CTR13
[1] 0.4
---------------------------------------------------------------------------------------------------------
theData$Center:CTR14
[1] 0.25
. . .
Note that Center must be a factor in order for this to work (according to the documentation for "by").
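
The by() output above prints the per-center means but is not itself a data frame. For the two-column data frame asked about in the original question, a sketch along the following lines may work. It assumes the same simulated theData as above; the names frac, fracByCenter, fracDone and theData2 are made up for illustration, and tapply() plus merge() is just one possible approach, swapped in here in place of by().

```r
# Simulated data with the same shape as the example above
set.seed(1)
theData <- data.frame(
  Center   = paste("CTR", round(runif(1000, min = 0, max = 99), 0), sep = ""),
  testDone = rbinom(1000, 1, 0.3)
)

# The mean of a 0/1 indicator within a center is the fraction tested there
frac <- tapply(theData$testDone, theData$Center, mean)

# Two-column summary: center ID and fraction with the test done
# (fracByCenter and fracDone are hypothetical names)
fracByCenter <- data.frame(Center = names(frac), fracDone = as.vector(frac))

# Attach the center-level fraction to every patient row by merging on Center
theData2 <- merge(theData, fracByCenter, by = "Center")
```

Keeping the center IDs from names(frac) is what makes the merge on center ID possible. Note that merge() may return the rows in a different order from the original data frame.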
Hope this helps and Happy Holidays,
Alan
Alan Hochberg
VP, Research
ProSanos Corporation
225 Market St. Ste. 502,
Harrisburg, PA 17101
Tel 717-635-2124 * Fax 717-635-2575
From: s-news-owner@lists.biostat.wustl.edu [mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Hunsicker, Lawrence
Sent: Monday, December 22, 2008 5:06 PM
To: s-news@lists.biostat.wustl.edu
Subject: [S] How to create a "summary" data frame
Hi, folks, and Happy Holidays to all:
I have a data frame with about 11,000 patients from about 600 different centers.
Roughly half of these patients have had a certain test done, and the other half
have not had the test. But the fraction with the test varies from center to
center. I’d like to add a column to the data frame indicating the fraction of
patients at each center that had the test done. I tried doing this using the GUI
to calculate the average of 0 (no) and 1 (yes) values, doing the average by
center, and saving the result. I get a list with a single value (the average)
for each center, but the center IDs are not included, so that I can’t do a merge
on center ID. How can I create a data frame with two columns, the first column
being the center number, and the second being the fraction of patients with the
test done?
Larry Hunsicker