Hi there,
As Matt Austin has mentioned, the way you have created the data frame means
that all columns are character. To illustrate this we can check the class
of each column in the data frame you created:-
> sapply(tempdata, class) # Data frame created using original code
    yearmon          h0          h1        name
"character" "character" "character" "character"
If we create the data frame as per Matt Austin's instructions we get the
correct structure:-
> sapply(tempdata, class) # Data frame created using Matt Austin's code
  yearmon        h0        h1      name
"integer" "integer" "integer"  "factor"
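The coercion behind this difference can be reproduced in a few lines. This is only a sketch written for R; I am assuming S-PLUS coerces in the same way, and the object name row1 is purely illustrative:

```r
# Mixing numbers and a string in c() coerces every element to character
row1 <- c(200602, 1, 4, "abvol")
class(row1)

# Building each column separately lets every column keep its own type.
# factor() is called explicitly because recent versions of R no longer
# convert strings to factors by default.
tempdata <- data.frame(yearmon = c(200602, 200603, 199106, 199107),
                       h0 = 1:4,
                       h1 = 4:1,
                       name = factor(rep(c("abvol", "accr"), each = 2)))
sapply(tempdata, class)
```

Note that R stores the yearmon values above as doubles, so it reports "numeric" for that column where the S-PLUS output shows "integer".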
The "by" function takes a data frame and a list of factors, splits the data
frame into one sub data frame per group, and calls the supplied function on
each sub data frame as a whole. To illustrate this, here is the output from
the "class" function applied to the data set using "by":-
> by(tempdata[, 1, drop = F], tempdata$name, class) # Apply class function
tempdata$name:abvol
[1] "data.frame"
-----------------------------------------
tempdata$name:accr
[1] "data.frame"
That means that any function that won't work on a data frame will complain
about the type of data passed in. To use "by", you need to make sure the
function you supply to "by" can execute on a given data frame.
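One way to satisfy that requirement is to wrap the summary in a small function that pulls the column out of the sub data frame first. The following is a sketch for R, and I am assuming the same idiom carries over to S-PLUS; the argument name d is just illustrative:

```r
tempdata <- data.frame(yearmon = c(200602, 200603, 199106, 199107),
                       h0 = 1:4,
                       h1 = 4:1,
                       name = factor(rep(c("abvol", "accr"), each = 2)))

# Each group arrives as a one-column data frame, so extract the
# column before handing it to min()
res <- by(tempdata[, "yearmon", drop = FALSE], tempdata$name,
          function(d) min(d$yearmon))
res
```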
For this type of analysis, I would recommend the use of the "aggregate"
function. The "aggregate" function works in a similar way to "by", but
applies the function provided to the columns of the sub data frame:-
> aggregate(tempdata[, 1:3], tempdata$name, min)
      tempdata.name yearmon h0 h1
abvol         abvol  200602  1  3
accr           accr  199106  3  1
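For anyone running this in R rather than S-PLUS: there the "by" argument of aggregate has to be a list, so the call needs a small change. A sketch, where the list element name "name" is my own choice:

```r
tempdata <- data.frame(yearmon = c(200602, 200603, 199106, 199107),
                       h0 = 1:4,
                       h1 = 4:1,
                       name = factor(rep(c("abvol", "accr"), each = 2)))

# Wrap the grouping factor in a (named) list; min is then applied to
# each column of each group
agg <- aggregate(tempdata[, c("yearmon", "h0", "h1")],
                 by = list(name = tempdata$name), FUN = min)
agg
```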
Of course, if we are applying a function to a single column of a dataset (as
in your example) we can more easily use tapply:-
> tapply(tempdata$yearmon, tempdata$name, min)
 abvol   accr
200602 199106
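Since the original question mentions several statistics (min, max, mean and so on), the same single-column idea extends to computing them in one pass. This is a sketch for R using split and sapply; I have not checked it against S-PLUS 7.0, and the name stats is illustrative:

```r
tempdata <- data.frame(yearmon = c(200602, 200603, 199106, 199107),
                       h0 = 1:4,
                       h1 = 4:1,
                       name = factor(rep(c("abvol", "accr"), each = 2)))

# Split the column by group, then compute several summaries per group;
# the result is a matrix with one row per statistic and one column per group
stats <- sapply(split(tempdata$yearmon, tempdata$name),
                function(x) c(min = min(x), max = max(x), mean = mean(x)))
stats
```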
Hope this helps,
Rich.
S-PLUS and R Consulting and Training
mangosolutions
Tel +44 1249 467 467
Fax +44 1249 467 468
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Austin, Matt
Sent: 14 April 2006 03:45
To: 'Kurbat, Matt BGI SF'; s-news@lists.biostat.wustl.edu
Subject: Re: [S] "by" question
I'm not sure why the min does not work in the by() function. I have
provided a method that does work.
Also, please be careful about the way you create the data. When you mix
character and numeric values in a vector, c() coerces them all to character,
so every column of your dataframe comes out as character. See my example
below for how to create the dataframe I think you wanted.
tempdata<-as.data.frame(matrix(NA,4,4))
names(tempdata)<-c("yearmon","h0","h1","name")
tempdata[1,]<-c(200602,1,4,"abvol")
tempdata[2,]<-c(200603,2,3,"abvol")
tempdata[3,]<-c(199106,3,2,"accr")
tempdata[4,]<-c(199107,4,1,"accr")
is.numeric(tempdata$yearmon)
tempdata <- data.frame(yearmon = c(200602, 200603, 199106, 199107),
                       h0 = 1:4,
                       h1 = 4:1,
                       name = rep(c('abvol', 'accr'), each = 2))
is.numeric(tempdata$yearmon)
by(tempdata[, 1], tempdata$name, mean)
by(tempdata[, 1], tempdata$name, function(x) min(x))
--Matt
Matt Austin
Statistician
Amgen, Inc
800 9AMGEN9 x77431
805-447-7431
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Kurbat, Matt BGI SF
Sent: Thursday, April 13, 2006 7:15 PM
To: s-news@lists.biostat.wustl.edu
Subject: [S] "by" question
Hello,
I've recently returned to Splus after a long layoff (forced SAS treatments).
I'm trying to apply "by" for various input functions and datasets and having
trouble getting it right.
For example, I can create a dataframe as follows:
tempdata<-as.data.frame(matrix(NA,4,4))
names(tempdata)<-c("yearmon","h0","h1","name")
tempdata[1,]<-c(200602,1,4,"abvol")
tempdata[2,]<-c(200603,2,3,"abvol")
tempdata[3,]<-c(199106,3,2,"accr")
tempdata[4,]<-c(199107,4,1,"accr")
With results that look like this:
> tempdata
  yearmon h0 h1  name
1  200602  1  4 abvol
2  200603  2  3 abvol
3  199106  3  2  accr
4  199107  4  1  accr
I want to compute various functions (min, max, mean, stdev, kurtosis, etc.)
over the groups defined in the "name" variable using the "by" function.
On some data sets I can get "mean" to work fine but not the others.
On the example above, I get the following results using "mean":
> by(as.data.frame(tempdata[, 1]), tempdata$name, mean)
tempdata$name:abvol
[1] NA
-----------------------------------------------------------
tempdata$name:accr
[1] NA
On the example above, I get the following results using "min":
> by(as.data.frame(tempdata[, 1]), as.factor(tempdata$name), min)
Problem in NextMethod(.Generic): Can't find the generic function "FUN"
Use traceback() to see the call stack
Would someone please explain to me how I can make these work?
I'm running Splus 7.0 on Windows 2000.
Thanks!
Matt
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news
--------------------------------------------------------------------