I believe "adding up the rows", as you say, is exactly what you requested in
your original email. I usually find it easier to solve a problem as described
than to solve the problem someone meant to describe but didn't. Of course, had
this been a live consulting session I would have tried to make sure that the
real problem and the problem as described were the same thing.

By the way, in the output below, row + 1 is NOT "the summation of the previous
rows". Rather, it looks like the value in row i is the value in row i - 1 plus
either 1106 or 1107.
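
A quick way to check that reading is to difference the output down its rows. The sketch below is in R syntax, which I would expect to carry over to S-PLUS; the matrix is typed in from the printed output, and the names steps and binned are made up for illustration.

```r
# Cumulative counts, typed in from the printed output in the quoted message.
out <- cbind(
  c(1106, 2213, 3320, 4426, 5532, 6639, 7745, 8852, 9958, 11064),
  c(1107, 2213, 3320, 4426, 5532, 6639, 7745, 8852, 9958, 11064)
)

# diff() on a matrix works column by column, giving the increment from
# each row to the next.
steps <- diff(out)
all(steps == 1106 | steps == 1107)

# Prepending a row of zeros keeps the first row as its own count.
# (Hypothetical name 'binned'; this equals a per-bin count only if each
# column of upb is sorted in increasing order.)
binned <- diff(rbind(0, out))
binned
```

If each column of upb is sorted, differencing the cumulative counts like this is one way to get from "how many are below each bound" to "how many fall between consecutive bounds" without recomputing anything.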

With my code, out[i, j] is the number of elements in x[, j] that are less than
upb[i, j]. I can only guess what it is you really would like. If, for
example, you really want the counts of values in x that fall in bins with
boundaries given by the elements of upb, that would not require a big change to
the code. It does require that the values in each column of upb be sorted in
increasing order. Then, you could do something like this:

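# Start from the original x and upb, not the expanded copies made earlier.
d <- dim(x)
n <- d[1]
du <- dim(upb)
# Lower bin boundaries: each column of upb shifted down one row. The first
# bin gets min(upb) - 1 as its lower edge, so values of x below that are
# not counted; use a smaller number if x can go lower.
upb.low <- rbind(min(upb) - 1, upb[-du[1], , drop = FALSE])
# Expand all three matrices so that every row of x meets every row of upb.
x <- x[rep(seq(length = n), rep(du[1], n)), , drop = FALSE]
upb <- upb[rep(seq(length = du[1]), n), , drop = FALSE]
upb.low <- upb.low[rep(seq(length = du[1]), n), , drop = FALSE]
# TRUE where x lies in the half-open bin [upb.low, upb).
cmat <- t(x < upb & x >= upb.low)
# Reshape to a 3D array and sum over the observations.
dim(cmat) <- c(du[2], du[1], n)
out <- t(rowSums(cmat, na.rm = TRUE, dims = 2))
out
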
Of course, if I have not chosen the inequality signs used to create cmat in
quite the way you would like, that's easy enough to change. This just affects
what you want to happen when an element of x is on a bin boundary.
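
To make the boundary question concrete, here is a small sketch in R syntax with made-up test data for which the answers are easy to see. The function bin.counts and its closed.right argument are hypothetical names wrapping the same steps as above, and I have swapped in -Inf for the lowest boundary (instead of min(upb) - 1) so that no value of x can fall below the first bin.

```r
# Toy data (made up): one column, with two values sitting on a boundary.
x   <- matrix(c(0, 1, 1, 2, 3), ncol = 1)
upb <- matrix(c(1, 2, 4), ncol = 1)   # sorted upper bounds

# Hypothetical helper: same expand-compare-reshape steps as in the email.
# closed.right = FALSE counts x in [low, up); TRUE counts x in (low, up].
bin.counts <- function(x, upb, closed.right = FALSE) {
  n  <- dim(x)[1]
  du <- dim(upb)
  # Assumption: -Inf as the lowest edge, so every small x lands in bin 1.
  upb.low <- rbind(-Inf, upb[-du[1], , drop = FALSE])
  x       <- x[rep(seq(length = n), rep(du[1], n)), , drop = FALSE]
  upb     <- upb[rep(seq(length = du[1]), n), , drop = FALSE]
  upb.low <- upb.low[rep(seq(length = du[1]), n), , drop = FALSE]
  inbin <- if (closed.right) {
    x <= upb & x > upb.low
  } else {
    x < upb & x >= upb.low
  }
  cmat <- t(inbin)
  dim(cmat) <- c(du[2], du[1], n)
  t(rowSums(cmat, na.rm = TRUE, dims = 2))
}

left  <- bin.counts(x, upb)                       # counts 1, 2, 2
right <- bin.counts(x, upb, closed.right = TRUE)  # counts 3, 1, 1
```

The two values equal to 1 and the value equal to 2 move to the lower bin when the upper edge is closed, which is the only difference between the two results.
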
Cheers,
Scott
-------------- Original message ----------------------
From: Ita Cirovic <zag_cirovic@yahoo.com>
> Generally this works half way as it adds up the rows, i.e. row+1 is the
> summation of the previous rows. I will try to remedy this and see if it will
> work then. Thanks for the input.
>
> > out
>        [,1]  [,2]
>  [1,]  1106  1107
>  [2,]  2213  2213
>  [3,]  3320  3320
>  [4,]  4426  4426
>  [5,]  5532  5532
>  [6,]  6639  6639
>  [7,]  7745  7745
>  [8,]  8852  8852
>  [9,]  9958  9958
> [10,] 11064 11064
>
> and the row summations should be of equal (+/- 1) number of observations, so
> the first row is ok. The reason for this is the upb matrix is structured in
> that way. Changing the upb matrix will of course change the out matrix but
> right now I would like to test with this.
>
> Ita
>
> ----- Original Message ----
> From: SD Chasalow <sbackwards@comcast.net>
> To: S-NEWS <s-news@lists.biostat.wustl.edu>
> Sent: Friday, August 3, 2007 7:35:10 PM
> Subject: Re: [S] count occurrences
>
> Something like this should work:
>
> Suppose dim(x) is c(11000, 2).
>
> d <- dim(x)
> n <- d[1]
> p <- d[2]
> du <- dim(upb)
> x <- x[rep(seq(length = n), rep(du[1], n)), , drop = FALSE]
> upb <- upb[rep(seq(length = du[1]), n), , drop = FALSE]
> cmat <- t(x < upb)
> dim(cmat) <- c(du[2], du[1], n)
> out <- t(rowSums(cmat, na.rm = TRUE, dims = 2))
> out
>
> The concept: expand x and upb so that an element-by-element comparison of the
> two expanded matrices gives you every comparison you wish to make. Do the
> comparison. Then modify the dimensions of the resulting comparison matrix,
> "cmat", so that you easily can sum up the comparisons over the desired
> dimensions. In this case, I transpose, and then transform into a 3D array.
> This allows me to use a single call to rowSums to sum up the comparisons for
> every element of upb.
>
> To follow this kind of thing, I find it really helps to (a) draw lots of
> pictures of the matrices and arrays; and (b) carefully inspect all the
> intermediate objects, with test data for which you easily can see what the
> answers should be.
>
> Cheers,
> Scott
>
> ==================================
> Scott D. Chasalow
> Associate Director
> Statistical Genetics and Biomarkers
> Bristol-Myers Squibb Company
>
> Email: scott.chasalow <AT> bms.com
> ==================================
>
> Ita Cirovic wrote:
> > Given two data sets I would like to count the occurrences of one
> > dependent on the other. For example,
> >
> > upb is defined as follows
> >
> >                  P4         P5
> >  [1,] -0.0406703026 0.02952575
> >  [2,]  0.0008282428 0.06102947
> >  [3,]  0.0098109756 0.08774035
> >  [4,]  0.0183787962 0.11517816
> >  [5,]  0.0275845430 0.14899661
> >  [6,]  0.0390078215 0.19359835
> >  [7,]  0.0541145248 0.26253533
> >  [8,]  0.0772375923 0.37398115
> >  [9,]  0.1228048769 0.58875809
> > [10,]  0.9980051666 7.41044290
> >
> > then I have a data set which consists of the original values of the
> > variables P4 and P5 with the number of observations of around 11000.
> > What I would like to do is to count how many observations of P4 are
> > smaller than upb[1,1] and then for upb[2,1] and so on. Also this I would
> > like to do for the other variable P5. The results would be stored in
> > another matrix say obs with dim(obs) = 10x2.
> >
> > I have been trying to do this using for loops and if statements, but was
> > wondering whether there was an easier way, for example some S-PLUS
> > function that would do count given the condition. Thanks.
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu. To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message: unsubscribe s-news