Dear S-Plus Users,
Thank you all for your prompt response to my question on data manipulation.
I received 14 responses; some included code as well as instructions and
constructive suggestions. Below are the two "simplest" one-line S-Plus
commands:
tapply(DBH, Plot, function(x) mean(rev(sort(x))[1:2]))
sapply(split(DBH, Plot), function(x)mean(rev(sort(x))[1:2]))
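As a quick check, the two one-liners can be run side by side on the small example data set that appears in several of the replies (two plots of four trees each). This is a sketch for illustration; the variable names follow the original question and the values are the ones posted in the replies.

```r
# Example data from the replies: plot 1 has DBH 2, 5, 9, 3; plot 2 has 15, 9, 7, 10
Plot <- rep(1:2, c(4, 4))
DBH <- c(2, 5, 9, 3, 15, 9, 7, 10)

# Both commands: sort each plot's DBH, reverse so the largest comes first,
# and average the first two values
res.tapply <- tapply(DBH, Plot, function(x) mean(rev(sort(x))[1:2]))
res.sapply <- sapply(split(DBH, Plot), function(x) mean(rev(sort(x))[1:2]))
print(res.tapply)
print(res.sapply)
```

tapply returns a one-dimensional array indexed by plot, while sapply over split() returns a named vector; the numbers are the same.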
The responses from each of the experts are shown below.
Thanks again for your kind and helpful assistance.
Abd Rahman Kassim
Hill Forest Silviculture
Forest Research Institute Malaysia (FRIM)
Kepong 52109
Kuala Lumpur
=========================
Use tapply() with the following small function:
tapply(DBH, Plot, function(x) mean(rev(sort(x))[1:2]))
=========================
Split your data by plot:
listofdata <- split(DBH, Plot)
That creates a list in which each element holds the DBH values of one plot,
so you can use lapply on that list. lapply applies the same function to
each element of a list.
* Apply a function that calculates your mean to each element of the list:
lapply(listofdata, FUN=function(x)return(mean(rev(sort(x))[1:2])))
___
So you can do it all in a single call:
> cbind(Plot,DBH)
Plot DBH
[1,] 1 2
[2,] 1 5
[3,] 1 9
[4,] 1 3
[5,] 2 15
[6,] 2 9
[7,] 2 7
[8,] 2 10
> lapply(split(DBH,Plot),FUN=function(x)return(mean(rev(sort(x))[1:2])))
$"1":
[1] 7
$"2":
[1] 12.5
==========================
foo <- function(x, n=2){
  x.sorted <- rev(sort(x))
  mean(x.sorted[1:n])
}
## Example:
dat <- data.frame(Plot=rep(1:2, c(4,4)),
                  DBH=c(2,5,9,3,15,9,7,10))
tapply(dat$DBH, dat$Plot, FUN=foo)
1 2
7 12.5
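Because foo has an n argument, the number of trees averaged can be changed without writing a new function: tapply(X, INDEX, FUN, ...) forwards extra arguments to FUN. A sketch, repeating foo and dat from the reply, for the mean of the three largest trees per plot (n = 3 is an illustrative choice, not part of the original question):

```r
foo <- function(x, n = 2) {
  x.sorted <- rev(sort(x))   # largest value first
  mean(x.sorted[1:n])        # mean of the n largest values
}
dat <- data.frame(Plot = rep(1:2, c(4, 4)),
                  DBH = c(2, 5, 9, 3, 15, 9, 7, 10))

# n = 3 is passed through tapply to foo
top3 <- tapply(dat$DBH, dat$Plot, foo, n = 3)
print(top3)
```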
=============================
Try this (I have not tested it):
by(df$DBH, as.factor(df$Plot), FUN=function(x){mean(rev(sort(x))[1:2])})
(This does not handle the case where a plot has only one DBH value.)
==============================
Here is one way of doing it with 'tapply'. The 'min' function takes care
of the case where there is only one value of dbh in a plot.
> a.1 <- list(plot=c(1,1,1,1,2,2,2,2),dbh=c(2,5,9,3,15,9,7,10))
> a.1
$plot
[1] 1 1 1 1 2 2 2 2
$dbh
[1] 2 5 9 3 15 9 7 10
> tapply(a.1$dbh,a.1$plot,function(x){mean(rev(sort(x))[1:min(2,length(x))])})
1 2
7.0 12.5
============================
tapply(DBH,Plot,function(x){mean(rev(sort(x))[1:2])})
1 2
7 12.5
============================
tapply(DBH, Plot,
function(x){mean(rev(sort(x))[1:2])}
)
============================
> foo
Plot DBH
1 1 6
2 1 1
3 1 9
4 2 4
5 2 12
6 2 1
> tapply(foo$DBH,list(foo$Plot),mean.2)
1 2
7.5 8
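The reply above calls mean.2 without showing its definition. The printed results are consistent with mean.2 returning the mean of the two largest values (9 and 6 give 7.5; 12 and 4 give 8), so here is a minimal sketch under that assumption. This definition is a guess for illustration, not the responder's code.

```r
# Hypothetical definition: the original reply did not include mean.2.
# Assumed behaviour: mean of the two largest values in x.
mean.2 <- function(x) mean(rev(sort(x))[1:2])

foo <- data.frame(Plot = c(1, 1, 1, 2, 2, 2),
                  DBH = c(6, 1, 9, 4, 12, 1))
res <- tapply(foo$DBH, list(foo$Plot), mean.2)
print(res)
```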
======================
Hi. Here's a good opportunity to use the tapply() function. First, write the
function to compute the mean of the two largest entries in a vector:
twomean <- function (x) {
  x <- rev(sort (x))
  mean (x[1:2])
}
Now apply that function to each group. If your data is in a matrix named
"death", this will work:
> tapply (death[,"DBH"], death[,"Plot"], twomean)
1 2
7 12.5
=================================
This may not be the fastest or the cleverest, but it seems to work.
The brute-force way is to sort the DBHs for each plot, reverse the resulting
vector, and take the mean of the first 2 elements. Alternatively, you could
sort in ascending order, get the length of each vector, and take the mean of
the last two elements. For example (2nd approach):
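plotnames <- unique(my.data$plot)
len1 <- length(plotnames)
my.means <- rep(0,len1)
for (i in 1:len1)
{
  # DBH values of plot i, sorted in ascending order
  temp.vec <- sort(my.data$DBH[my.data$plot==plotnames[i]])
  len2 <- length(temp.vec)
  # parentheses are required: len2-1:len2 would mean len2-(1:len2)
  my.means[i] <- mean(temp.vec[(len2-1):len2])
}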
I think this should work if your dataset isn't too big. No doubt other
subscribers will have more elegant solutions, however.
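Operator precedence matters for the index of the last two elements: in S, ':' binds more tightly than binary minus, so len2-1:len2 means len2-(1:len2) rather than (len2-1):len2. A sketch of the difference on plot 1 of the example data, followed by the loop with the parenthesized index. Here my.data is assumed to be a data frame with columns plot and DBH, built from the example values used in other replies.

```r
x <- sort(c(2, 5, 9, 3))            # plot 1, ascending: 2 3 5 9
len <- length(x)

print(len - 1:len)                  # 3 2 1 0: every position except the last
print((len - 1):len)                # 3 4: the last two positions

wrong <- mean(x[len - 1:len])       # index 0 is dropped, so this averages 5, 3, 2
right <- mean(x[(len - 1):len])     # averages 5 and 9

# Assumed data layout: columns plot and DBH
my.data <- data.frame(plot = rep(1:2, c(4, 4)),
                      DBH = c(2, 5, 9, 3, 15, 9, 7, 10))
plotnames <- unique(my.data$plot)
my.means <- rep(0, length(plotnames))
for (i in 1:length(plotnames)) {
  temp.vec <- sort(my.data$DBH[my.data$plot == plotnames[i]])
  len2 <- length(temp.vec)
  my.means[i] <- mean(temp.vec[(len2 - 1):len2])
}
print(my.means)
```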
=====================
(1) Use tapply(DBH,Plot,function(x) mean(rev(sort(x))[1:2]))
(2) Read the manual or one of the many available books and learn the
S language (particularly since someone already provided you with an
example of using tapply for one of your previous questions).
=====================
>Plot <- rep(1:2,c(4,4))
>DBH <- c(2,5,9,3,15,9,7,10)
>a <- tapply(DBH,Plot,sort)
>n <- tapply(DBH,Plot,length)
>N <- unique(Plot)
> for(i in 1:length(N)){
+ print(mean(a[[i]][n[i]:(n[i]-1)]))
+ }
[1] 7
[1] 12.5
You could easily wrap this in a function.
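One way to do that wrapping, sketched with a hypothetical function name mean.top2 and the same logic as the reply: sort each plot ascending, then average the last two elements.

```r
# Hypothetical wrapper (name chosen for illustration): mean of the two
# largest DBH values in each plot
mean.top2 <- function(DBH, Plot) {
  a <- tapply(DBH, Plot, sort)      # sorted (ascending) DBH for each plot
  n <- tapply(DBH, Plot, length)    # number of trees in each plot
  out <- numeric(length(n))
  for (i in 1:length(n))
    out[i] <- mean(a[[i]][n[i]:(n[i] - 1)])   # last two = two largest
  names(out) <- names(n)
  out
}

Plot <- rep(1:2, c(4, 4))
DBH <- c(2, 5, 9, 3, 15, 9, 7, 10)
print(mean.top2(DBH, Plot))
```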
=======================
sapply(split(DBH, Plot), function(x)mean(rev(sort(x))[1:2]))
1 2
7 12.5
If you have fewer than 2 samples in some plot, this will return NA for
that plot. Change 1:2 to seq(min(2,length(x))) to take the mean of the
whole sample when a plot has fewer than 2 samples.
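To see the difference, here is a sketch on a small made-up data set in which plot 2 contains a single tree (these values are invented for illustration):

```r
Plot <- c(1, 1, 1, 1, 2)
DBH <- c(2, 5, 9, 3, 15)

# Original version: 1:2 asks for a second value that plot 2 does not have,
# so the subscript yields NA and the mean is NA
plain <- sapply(split(DBH, Plot), function(x) mean(rev(sort(x))[1:2]))

# Suggested version: take at most two values
safe <- sapply(split(DBH, Plot),
               function(x) mean(rev(sort(x))[seq(min(2, length(x)))]))
print(plain)
print(safe)
```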
=======================
The following code is probably not the most elegant solution to your
problem but should do what you want, provided there are at least two
elements within each level of Plot.
t1 <- data.frame(Plot=rep(1:2,c(4,4)),DBH=c(2,5,9,3,15,9,7,10))
t1 <- t1[order(t1$Plot,t1$DBH),]
t2 <- table(t1$Plot)
t3 <- t1[sort(c(outer(cumsum(t2),0:1,"-"))),]
tapply(t3$DBH,t3$Plot,mean)
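The outer() step is the least obvious part of this approach. After sorting by plot and then DBH, cumsum(t2) gives the row number of the last (largest) tree in each plot, and subtracting 0 and 1 gives the rows of the two largest trees. A sketch printing the intermediate values for the example data:

```r
t1 <- data.frame(Plot = rep(1:2, c(4, 4)), DBH = c(2, 5, 9, 3, 15, 9, 7, 10))
t1 <- t1[order(t1$Plot, t1$DBH), ]    # sort by plot, then DBH ascending
t2 <- table(t1$Plot)                  # trees per plot: 4 4

last <- cumsum(t2)                    # row of the largest tree in each plot: 4 8
rows <- sort(c(outer(last, 0:1, "-")))   # last and second-to-last rows: 3 4 7 8
print(rows)

t3 <- t1[rows, ]                      # positional row selection
print(t3$DBH)                         # 5 9 10 15
res <- tapply(t3$DBH, t3$Plot, mean)
print(res)
```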
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.