Re: count occurrences

To: s-news@lists.biostat.wustl.edu (S-NEWS)
Subject: Re: count occurrences
From: sbackwards@comcast.net (SD Chasalow)
Date: Fri, 03 Aug 2007 17:35:10 +0000

Something like this should work:

Suppose dim(x) is c(11000, 2).

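d <- dim(x)
n <- d[1]    # number of observations (rows of x)
p <- d[2]    # number of variables; must equal du[2]
du <- dim(upb)
# Repeat each row of x du[1] times in succession: 1,1,...,2,2,...
x <- x[rep(seq(length = n), rep(du[1], n)), , drop = FALSE]
# Cycle through all rows of upb n times: 1,2,...,du[1],1,2,...
upb <- upb[rep(seq(length = du[1]), n), , drop = FALSE]
# Compare element by element; transpose so the variable index varies fastest
cmat <- t(x < upb)
# Reshape to (variables) x (rows of upb) x (observations)
dim(cmat) <- c(du[2], du[1], n)
# Sum over observations, then transpose to a du[1] x du[2] matrix of counts
out <- t(rowSums(cmat, na.rm = TRUE, dims = 2))
out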

The concept: expand x and upb so that an element-by-element comparison of the
two expanded matrices gives you every comparison you wish to make.  Do the
comparison.  Then modify the dimensions of the resulting comparison matrix,
"cmat", so that you can easily sum the comparisons over the desired
dimension.  In this case, I transpose cmat and then reshape it into a 3D
array with dimensions (variables, rows of upb, rows of x).  This allows a
single call to rowSums, with dims = 2, to count the TRUE comparisons for
every element of upb.
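
To make the reshaping concrete, here is a small sketch on made-up data (3
observations, 2 variables, 2 bounds).  The names x_toy, upb_toy and so on
are hypothetical, chosen only for this illustration, and the code is written
in R syntax; I am assuming S-PLUS behaves the same way for these calls.

```r
# Toy data, invented for illustration only.
x_toy   <- matrix(c(1, 5, 9,
                    2, 4, 6), ncol = 2)    # 3 observations x 2 variables
upb_toy <- matrix(c(4, 10,
                    3, 5), ncol = 2)       # 2 bounds x 2 variables

n  <- nrow(x_toy)
nb <- nrow(upb_toy)

# Each observation repeated nb times: rows 1,1,2,2,3,3
xx <- x_toy[rep(seq(length = n), rep(nb, n)), , drop = FALSE]
# All bounds cycled n times: rows 1,2,1,2,1,2
bb <- upb_toy[rep(seq(length = nb), n), , drop = FALSE]

# Transpose so that, in storage order, the variable index varies fastest,
# then the bound index, then the observation index.
cmat_toy <- t(xx < bb)
dim(cmat_toy) <- c(ncol(upb_toy), nb, n)   # variables x bounds x observations

# cmat_toy[v, j, i] is TRUE when x_toy[i, v] < upb_toy[j, v].
# Summing over the third dimension counts observations below each bound.
out_toy <- t(rowSums(cmat_toy, na.rm = TRUE, dims = 2))
out_toy
#      [,1] [,2]
# [1,]    1    1
# [2,]    3    2
```

For example, out_toy[2, 1] is 3 because all three values in the first
column (1, 5, 9) are below upb_toy[2, 1], which is 10.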

To follow this kind of thing, I find it really helps to (a) draw lots of 
pictures of the matrices and arrays; and (b) carefully inspect all the 
intermediate objects, with test data for which you easily can see what the 
answers should be.
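
One way to do that kind of check, sketched under assumptions: wrap the steps
in a function and compare its result with a plain double loop on small
random data.  The names count_below, count_loop, x_test and upb_test are
hypothetical and not part of S-PLUS or R; the code is written in R syntax.

```r
# Hypothetical helper wrapping the expand-and-compare steps.
count_below <- function(x, upb) {
  n  <- nrow(x)
  nb <- nrow(upb)
  xx <- x[rep(seq(length = n), rep(nb, n)), , drop = FALSE]
  bb <- upb[rep(seq(length = nb), n), , drop = FALSE]
  cmat <- t(xx < bb)
  dim(cmat) <- c(ncol(upb), nb, n)
  t(rowSums(cmat, na.rm = TRUE, dims = 2))
}

# Brute-force reference: one explicit count per bound and variable.
count_loop <- function(x, upb) {
  out <- matrix(0, nrow(upb), ncol(upb))
  for (j in seq(length = ncol(upb))) {
    for (i in seq(length = nrow(upb))) {
      out[i, j] <- sum(x[, j] < upb[i, j], na.rm = TRUE)
    }
  }
  out
}

# Small random test data (20 observations, 3 bounds, 2 variables).
set.seed(1)
x_test   <- matrix(rnorm(40), ncol = 2)
upb_test <- matrix(rnorm(6), ncol = 2)

res_fast <- count_below(x_test, upb_test)
res_loop <- count_loop(x_test, upb_test)
agree <- all(res_fast == res_loop)
agree
```

If agree is FALSE, inspecting xx, bb and cmat inside count_below for a few
rows usually shows quickly which step has the indices in the wrong order.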

Cheers,
Scott

==================================
Scott D. Chasalow
Associate Director
Statistical Genetics and Biomarkers
Bristol-Myers Squibb Company

Email: scott.chasalow <AT> bms.com
================================== 

Ita Cirovic wrote:
> Given two data sets I would like to count the occurrences of one 
> dependent on the other. For example,
> 
> upb is defined as follows
> 
>                 P4         P5
>  [1,] -0.0406703026 0.02952575
>  [2,]  0.0008282428 0.06102947
>  [3,]  0.0098109756 0.08774035
>  [4,]  0.0183787962 0.11517816
>  [5,]  0.0275845430 0.14899661
>  [6,]  0.0390078215 0.19359835
>  [7,]  0.0541145248 0.26253533
>  [8,]  0.0772375923 0.37398115
>  [9,]  0.1228048769 0.58875809
> [10,]  0.9980051666 7.41044290
> 
> then I have a data set which consists of the original values of the 
> variables P4 and P5, with around 11000 observations. 
> What I would like to do is to count how many observations of P4 are 
> smaller than upb[1,1], then upb[2,1], and so on. I would also like to 
> do this for the other variable, P5. The results would be stored in 
> another matrix, say obs, with dim(obs) = 10x2.
> 
> I have been trying to do this using for loops and if statements, but was 
> wondering whether there was an easier way, for example some S-PLUS 
> function that would do the count given the condition. Thanks.
