s-news

Re: count occurrences

To: Ita Cirovic <zag_cirovic@yahoo.com>, S-NEWS <s-news@lists.biostat.wustl.edu>
Subject: Re: count occurrences
From: sbackwards@comcast.net (SD Chasalow)
Date: Mon, 06 Aug 2007 17:59:51 +0000
I believe "adding up the rows", as you say, is exactly what you requested in 
your original email.  I usually find it easier to solve a problem as described 
than to solve the problem someone meant to describe but didn't.  Of course, had 
this been a live consulting session I would have tried to make sure that the 
real problem and the problem as described were the same thing.

By the way, in the output below, row + 1 is NOT "the summation of the previous 
rows".  Rather, it looks like the value in row i is the value in row i - 1 plus 
either 1106 or 1107.
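Scott's reading of the output can be checked from the posted numbers alone. The R sketch below (R implements the S language; it has not been tried in S-PLUS) types in the out matrix from Ita's message and differences successive rows. That turns "number of values below each boundary" into "number of values between consecutive boundaries". The variable name bins is illustrative only.

```r
## Cumulative counts as posted: out[i, j] = number of x[, j] below upb[i, j].
out <- matrix(c(1106, 2213, 3320, 4426, 5532, 6639, 7745, 8852, 9958, 11064,
                1107, 2213, 3320, 4426, 5532, 6639, 7745, 8852, 9958, 11064),
              ncol = 2)

## Row 1 already counts values below the first boundary; each later row,
## minus the row before it, counts values in [upb[i - 1, j], upb[i, j]).
bins <- rbind(out[1, ], diff(out))
bins  # every entry is 1106 or 1107
```

So the per-bin counts are the roughly equal (+/- 1) numbers Ita expected; they were already implicit in the cumulative output.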

With my code, out[i, j] is the number of elements in x[, j] that are less than 
upb[i, j].  I can only guess what it is you really would like.  If, for 
example, you really want the counts of values in x that fall in bins with 
boundaries given by the elements of upb, that would need only a small change to 
the code, provided the values in each column of upb are sorted in increasing 
order.  Then you could do something like this:

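d <- dim(x)
n <- d[1]                       # number of observations
du <- dim(upb)                  # number of boundaries, number of variables
# Lower bin limits: the first bin is open below, each later bin starts at
# the previous boundary in the same column.
upb.low <- rbind(-Inf, upb[-du[1], , drop = FALSE])
# Expand so that row (i - 1) * du[1] + k pairs observation i with row k.
x <- x[rep(seq(length = n), rep(du[1], n)), , drop = FALSE]
upb <- upb[rep(seq(length = du[1]), n), , drop = FALSE]
upb.low <- upb.low[rep(seq(length = du[1]), n), , drop = FALSE]
# TRUE where an observation falls in [upb.low, upb).
cmat <- t(x < upb & x >= upb.low)
# Reshape to variables x boundaries x observations; sum over observations.
dim(cmat) <- c(du[2], du[1], n)
out <- t(rowSums(cmat, na.rm = TRUE, dims = 2))
out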
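In the spirit of testing with data whose answers are easy to see, here is a minimal R sketch (not tried in S-PLUS) that wraps the bin computation in a hypothetical function, bin_counts, and runs it on made-up data. It assumes the first bin is open below (lower limit -Inf), so every value under upb[1, j] lands in the first bin.

```r
## Hypothetical toy data whose bin counts are easy to check by hand.
x <- cbind(a = c(0.5, 1.5, 2.5, 3.5, 1.2), b = c(10, 20, 30, 5, 25))
upb <- cbind(a = c(1, 2, 3), b = c(15, 25, 35))

## Expand / compare / reshape, with bins [upb[k - 1, j], upb[k, j]).
bin_counts <- function(x, upb) {
  n <- nrow(x)
  du <- dim(upb)
  upb.low <- rbind(-Inf, upb[-du[1], , drop = FALSE])
  xe <- x[rep(seq_len(n), rep(du[1], n)), , drop = FALSE]
  ue <- upb[rep(seq_len(du[1]), n), , drop = FALSE]
  le <- upb.low[rep(seq_len(du[1]), n), , drop = FALSE]
  cmat <- t(xe < ue & xe >= le)
  dim(cmat) <- c(du[2], du[1], n)
  t(rowSums(cmat, na.rm = TRUE, dims = 2))
}

bin_counts(x, upb)
##      [,1] [,2]
## [1,]    1    2
## [2,]    2    1
## [3,]    1    2
```

By hand: in column a, [-Inf, 1) holds 0.5, [1, 2) holds 1.5 and 1.2, [2, 3) holds 2.5, and 3.5 lies above the last boundary. In column b the bins hold 10 and 5; 20; 30 and 25.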

Of course, if I have not chosen the inequality signs used to create cmat in 
quite the way you would like, that's easy enough to change.  This just affects 
what you want to happen when an element of x is on a bin boundary.
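To see the boundary effect concretely, this sketch swaps in a different technique from the one in the thread: R's findInterval(), whose left.open argument switches between bins closed on the left and bins closed on the right (left.open needs a fairly recent R; it is absent from older releases). The helper name bin_counts_fi and the toy data are made up for illustration; boundaries in each column must be sorted.

```r
## Hypothetical toy data: in column b, 25 lies exactly on a boundary.
x <- cbind(a = c(0.5, 1.5, 2.5, 3.5, 1.2), b = c(10, 20, 30, 5, 25))
upb <- cbind(a = c(1, 2, 3), b = c(15, 25, 35))

## left.open = FALSE: bins [upb[k - 1], upb[k]), like x < upb & x >= upb.low
## left.open = TRUE:  bins (upb[k - 1], upb[k]], like x <= upb & x > upb.low
## The first bin has no lower limit; values beyond the last boundary are dropped.
bin_counts_fi <- function(x, upb, left.open = FALSE) {
  m <- nrow(upb)
  sapply(seq_len(ncol(x)), function(j) {
    idx <- findInterval(x[, j], upb[, j], left.open = left.open) + 1
    tabulate(idx, nbins = m + 1)[seq_len(m)]
  })
}

bin_counts_fi(x, upb)                    # column b: 2 1 2
bin_counts_fi(x, upb, left.open = TRUE)  # column b: 2 2 1
```

Only the value sitting on a boundary (25) changes bins; column a, which has no value on a boundary, gives 1 2 1 either way.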

Cheers,
Scott 
 
 -------------- Original message ----------------------
From: Ita Cirovic <zag_cirovic@yahoo.com>
> Generally this works halfway, as it adds up the rows, i.e. row+1 is the 
> summation of the previous rows. I will try to remedy this and see if it will 
> work then. Thanks for the input.
> 
> > out
>        [,1]  [,2] 
>  [1,]  1106  1107
>  [2,]  2213  2213
>  [3,]  3320  3320
>  [4,]  4426  4426
>  [5,]  5532  5532
>  [6,]  6639  6639
>  [7,]  7745  7745
>  [8,]  8852  8852
>  [9,]  9958  9958
> [10,] 11064 11064
> 
> and each row should hold an equal (+/- 1) number of observations, so the 
> first row is ok. That is how the upb matrix is structured. Changing the upb 
> matrix will of course change the out matrix, but right now I would like to 
> test with this one.
> 
> Ita
> 
> ----- Original Message ----
> From: SD Chasalow <sbackwards@comcast.net>
> To: S-NEWS <s-news@lists.biostat.wustl.edu>
> Sent: Friday, August 3, 2007 7:35:10 PM
> Subject: Re: [S] count occurrences
> 
> Something like this should work:
> 
> Suppose dim(x) is c(11000, 2).
> 
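> d <- dim(x)
> n <- d[1]                # number of observations
> p <- d[2]                # number of variables (not used below)
> du <- dim(upb)           # number of boundaries, number of variables
> # Expand so that row (i - 1) * du[1] + k pairs observation i with row k of upb.
> x <- x[rep(seq(length = n), rep(du[1], n)), , drop = FALSE]
> upb <- upb[rep(seq(length = du[1]), n), , drop = FALSE]
> # TRUE where an observation is below the matching boundary.
> cmat <- t(x < upb)
> # Reshape to variables x boundaries x observations; sum over observations.
> dim(cmat) <- c(du[2], du[1], n)
> out <- t(rowSums(cmat, na.rm = TRUE, dims = 2))
> out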
> 
> The concept: expand x and upb so that an element-by-element comparison of the 
> two expanded matrices gives you every comparison you wish to make.  Do the 
> comparison.  Then modify the dimensions of the resulting comparison matrix, 
> "cmat", so that you easily can sum up the comparisons over the desired 
> dimensions.  In this case, I transpose, and then transform into a 3D array.  
> This allows me to use a single call to rowSums to sum up the comparisons for 
> every element of upb.
> 
> To follow this kind of thing, I find it really helps to (a) draw lots of 
> pictures of the matrices and arrays; and (b) carefully inspect all the 
> intermediate objects, with test data for which you easily can see what the 
> answers should be.
> 
> Cheers,
> Scott
> 
> ==================================
> Scott D. Chasalow
> Associate Director
> Statistical Genetics and Biomarkers
> Bristol-Myers Squibb Company
> 
> Email: scott.chasalow <AT> bms.com
> ================================== 
> 
> Ita Cirovic wrote:
> > Given two data sets I would like to count the occurrences of one 
> > dependent on the other. For example,
> > 
> > upb is defined as follows
> > 
> >                 P4         P5
> >  [1,] -0.0406703026 0.02952575
> >  [2,]  0.0008282428 0.06102947
> >  [3,]  0.0098109756 0.08774035
> >  [4,]  0.0183787962 0.11517816
> >  [5,]  0.0275845430 0.14899661
> >  [6,]  0.0390078215 0.19359835
> >  [7,]  0.0541145248 0.26253533
> >  [8,]  0.0772375923 0.37398115
> >  [9,]  0.1228048769 0.58875809
> > [10,]  0.9980051666 7.41044290
> > 
> > Then I have a data set which consists of the original values of the 
> > variables P4 and P5, with around 11000 observations. 
> > What I would like to do is to count how many observations of P4 are 
> > smaller than upb[1,1], then upb[2,1], and so on. I would also like to 
> > do this for the other variable, P5. The results would be stored in 
> > another matrix, say obs, with dim(obs) = 10x2.
> > 
> > I have been trying to do this using for loops and if statements, but was 
> > wondering whether there was an easier way, for example some S-PLUS 
> > function that would count given the condition. Thanks.
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news
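For the question as originally posed (for each column j, how many values of x[, j] are below each upb[k, j]), a column-at-a-time R sketch with outer() and colSums() avoids an explicit loop over observations. It is an alternative to the expand-and-reshape approach quoted above, which is re-implemented here only as a cross-check. The function names and the six-row data set are hypothetical stand-ins for the roughly 11000-row data in the thread, and the sketch is written for R (seq_len, for one, may need replacing in S-PLUS).

```r
## Hypothetical small data set; columns named as in the thread.
x <- cbind(P4 = c(-0.05, 0.00, 0.01, 0.20, 0.03, 0.06),
           P5 = c( 0.01, 0.07, 0.10, 0.50, 0.30, 0.60))
upb <- cbind(P4 = c(0.001, 0.05, 1.0),
             P5 = c(0.05, 0.20, 8.0))

## For each column, outer() compares every x[, j] with every upb[, j];
## colSums() then counts the TRUEs for each boundary.
count_below <- function(x, upb) {
  sapply(seq_len(ncol(x)), function(j)
    colSums(outer(x[, j], upb[, j], "<"), na.rm = TRUE))
}

## The expand / compare / reshape version from the thread, for comparison.
count_below_3d <- function(x, upb) {
  n <- nrow(x)
  du <- dim(upb)
  xe <- x[rep(seq_len(n), rep(du[1], n)), , drop = FALSE]
  ue <- upb[rep(seq_len(du[1]), n), , drop = FALSE]
  cmat <- t(xe < ue)
  dim(cmat) <- c(du[2], du[1], n)
  t(rowSums(cmat, na.rm = TRUE, dims = 2))
}

count_below(x, upb)  # column P4: 2 4 6; column P5: 1 3 6
```

Both functions return a boundaries-by-variables matrix, the 10x2 "obs" layout asked for when upb has 10 rows.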

--- Begin Message ---
To: SD Chasalow <sbackwards@comcast.net>, S-NEWS <s-news@lists.biostat.wustl.edu>
Subject: Re: [S] count occurrences
From: Ita Cirovic <zag_cirovic@yahoo.com>
Date: Fri, 3 Aug 2007 18:05:00 +0000
--- End Message ---