
Re: YADMQ - Yet Another Data Manipulation Question

To: "Thompson, David (MNR)" <David.John.Thompson@ontario.ca>
Subject: Re: YADMQ - Yet Another Data Manipulation Question
From: David L Lorenz <lorenz@usgs.gov>
Date: Fri, 29 Jun 2007 08:50:46 -0500
Cc: "s-news" <s-news@lists.biostat.wustl.edu>, s-news-owner@lists.biostat.wustl.edu
In-reply-to: <ECF21B71808ECF4F8918C57EDBEE121DD1585A@CTSPITDCEMMVX11.cihs.ad.gov.on.ca>

Dave,
  When I read the first sentence of your query, I immediately started thinking of graphs. Of course, it quickly became obvious that plot and sub-plot did not refer to a graph!
  The restructuring is not easy, but I do have a suggestion. I don't have time to write the code, but here is the approach I would take.
1. Use the by() function to split the data set by oplt and rplt. The by() function will pass the data set to a function, outlined in the following steps.
2. From the subsetted data, extract the data in the spN and hcN columns and create another column of the height classes to match the data extracted. Then, remove missing values. If the data were packed into a data.frame it would look like this (for oplt=1 and rplt=1).
     spp Kount  hc
sp11  Or     2 hc1
sp12  Iw     2 hc1
sp13  Mh     3 hc1
sp21  Iw     4 hc2
sp22  Be     2 hc2
sp31  Iw     2 hc3
sp41  Iw     2 hc4
sp51  He     2 hc5
sp52  Mh     2 hc5
  Note that the data that I called spp and hc should be of class factor and Kount should be integer.
3. Create a matrix of 5 columns (for the height classes) and the number of species rows. Do you want missing values or 0s? If I were doing the analysis I would want 0 to indicate that no members of that species were observed in a particular height class rather than NA. For me, NA implies something different. For the current subset, the matrix would have 5 rows and 5 columns.
4. Extract the integer codes of spp and hc (my names); this is why they need to be factors. Then cbind the two sets of codes into a two-column matrix that indexes the rows and columns of the matrix created in step 3. The data for the referenced cells come from Kount (my name). Something like this:
mat[cbind(as.integer(spp), as.integer(hc))] <- Kount
5. Create vectors for oplt and rplt by copying those values to the number of rows in the matrix.
6. Create a data.frame from the oplt and rplt vectors, the levels of spp (my name), the matrix, and do something with com. Return this data.frame.
7. Use do.call("rbind", output of by()) to construct a data set like newdat.
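
The seven steps above can be sketched in R roughly as follows. This is only a sketch under stated assumptions, not tested in S-PLUS: the names restructure, sp.names, hc.names and newdat2 are made up for illustration, olddat is rebuilt here with explicit column types so the counts are numeric, empty cells are filled with 0 (the preference stated in step 3) rather than NA, species are kept in order of first appearance, and the com column is simply left out because its handling was not specified.

```r
# Sample data from the question, rebuilt with explicit column types
# (assumption: counts numeric, species codes character).
olddat <- data.frame(
  oplt = c(1, 1, 1, 1, 1, 1, 1),
  rplt = c(1, 1, 1, 2, 2, 2, 2),
  sp1 = c("Or", "Iw", "Mh", "Be", "Mh", "Iw", "Or"),
  hc1 = c(2, 2, 3, 2, 3, 4, 1),
  sp2 = c("Iw", "Be", NA, "Iw", "Mh", NA, NA),
  hc2 = c(4, 2, NA, 4, 2, NA, NA),
  sp3 = c("Iw", NA, NA, "Iw", NA, NA, NA),
  hc3 = c(2, NA, NA, 4, NA, NA, NA),
  sp4 = c("Iw", NA, NA, "Iw", NA, NA, NA),
  hc4 = c(2, NA, NA, 1, NA, NA, NA),
  sp5 = c("He", "Mh", NA, "He", "Mh", NA, NA),
  hc5 = c(2, 2, NA, 2, 1, NA, NA),
  stringsAsFactors = FALSE
)

sp.names <- paste("sp", 1:5, sep = "")
hc.names <- paste("hc", 1:5, sep = "")

# Hypothetical helper: restructures the rows of one oplt/rplt combination.
restructure <- function(sub) {
  # Step 2: stack the spN and hcN columns (column by column), label each
  # value with its height class, and drop the missing entries.
  spp <- unlist(lapply(sub[sp.names], as.character), use.names = FALSE)
  Kount <- unlist(lapply(sub[hc.names],
                         function(x) as.numeric(as.character(x))),
                  use.names = FALSE)
  hc <- rep(hc.names, each = nrow(sub))
  keep <- !is.na(spp) & !is.na(Kount)
  spp <- spp[keep]
  spp <- factor(spp, levels = unique(spp))   # order of first appearance
  hc <- factor(hc[keep], levels = hc.names)
  Kount <- Kount[keep]

  # Step 3: one row per species, one column per height class, filled with 0.
  mat <- matrix(0, nrow = nlevels(spp), ncol = length(hc.names),
                dimnames = list(NULL, hc.names))

  # Step 4: a two-column index matrix picks the (row, column) cells to fill.
  mat[cbind(as.integer(spp), as.integer(hc))] <- Kount

  # Steps 5 and 6: repeat the plot identifiers and assemble the result.
  data.frame(oplt = rep(sub$oplt[1], nlevels(spp)),
             rplt = rep(sub$rplt[1], nlevels(spp)),
             spp = levels(spp), mat, stringsAsFactors = FALSE)
}

# Steps 1 and 7: split by oplt and rplt, then stack the pieces.
pieces <- by(olddat, list(olddat$oplt, olddat$rplt), restructure)
newdat2 <- do.call("rbind", pieces)
newdat2 <- newdat2[order(newdat2$oplt, newdat2$rplt), ]
rownames(newdat2) <- NULL
print(newdat2)
```

If the same species could be entered twice for the same height class within one sub-plot, the assignment in step 4 would keep only the last count; in that case the counts would need to be summed first, for example with tapply().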

  Good luck.
Dave




"Thompson, David (MNR)" <David.John.Thompson@ontario.ca>
Sent by: s-news-owner@lists.biostat.wustl.edu
06/28/2007 01:01 PM

To: "s-news" <s-news@lists.biostat.wustl.edu>
cc:
Subject: [S] YADMQ - Yet Another Data Manipulation Question

Hello,

I have 54 plots (oplt) each containing 10 sub-plots (rplt) within which
stem counts were made for each species found in five height classes. The
data are organized (and I use the term very loosely ;-}) with a species
column (sp1-sp5) for each of the five height class columns (hc1-hc5). I
would like to pool the species codes into a single column (spp) with the
associated stem counts remaining in the hc1-hc5 columns but aligned in
the proper rows. This will expand the existing number of rows. Also,
there is ABSOLUTELY NO form of order in how each element was entered.
And there are many blank cells as only species found in each height
class at each location were recorded.

A sample of the two data arrangements is included below. Suggestions?
Also, would it be possible to generate a solution that would work in
both S and R?

Many thanks in advance, DaveT.

olddat <- data.frame(
rbind( c(1, 1, 'Or', 2, 'Iw', 4, 'Iw', 2, 'Iw', 2, 'He', 2, NA),
                c(1, 1, 'Iw', 2, 'Be', 2, NA, NA, NA, NA, 'Mh', 2, NA),
                c(1, 1, 'Mh', 3, NA, NA, NA, NA, NA, NA, NA, NA, NA),
                c(1, 2, 'Be', 2, 'Iw', 4, 'Iw', 4, 'Iw', 1, 'He', 2, NA),
                c(1, 2, 'Mh', 3, 'Mh', 2, NA, NA, NA, NA, 'Mh', 1, NA),
                c(1, 2, 'Iw', 4, NA, NA, NA, NA, NA, NA, NA, NA, NA),
                c(1, 2, 'Or', 1, NA, NA, NA, NA, NA, NA, NA, NA, NA)))
names(olddat) <- c('oplt', 'rplt', 'sp1', 'hc1', 'sp2', 'hc2', 'sp3',
'hc3', 'sp4', 'hc4', 'sp5', 'hc5', 'com')

newdat <- data.frame(rbind( c(1, 1, 'Or', 2, NA, NA, NA, NA, NA),
                c(1, 1, 'Iw', 2, 4, 2, 2, NA, NA),
                c(1, 1, 'Mh', 3, NA, NA, NA, 2, NA),
                c(1, 1, 'Be', NA, 2, NA, NA, NA, NA),
                c(1, 1, 'He', NA, NA, NA, NA, 2, NA),
                c(1, 2, 'Be', 2, NA, NA, NA, NA, NA),
                c(1, 2, 'Mh', 3, 2, NA, NA, 1, NA),
                c(1, 2, 'Iw', 4, 4, 4, 1, NA, NA),
                c(1, 2, 'Or', 1, NA, NA, NA, NA, NA),
                c(1, 2, 'He', NA, NA, NA, NA, 2, NA)))
names(newdat) <- c('oplt', 'rplt', 'spp', 'hc1', 'hc2', 'hc3', 'hc4',
'hc5', 'com')
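
One thing to keep in mind with data built this way: c() coerces mixed numbers and strings to character, so every column that comes out of rbind() is character (or factor, depending on the version and settings), including the counts. The following is a small sketch, using a made-up two-row data frame (olddat.demo) and a hypothetical helper (to.num), of how the count columns might be converted back to numbers before any arithmetic.

```r
# c() with mixed types yields a character vector.
row1 <- c(1, 1, 'Or', 2)
is.character(row1)

# A two-row example built the same way as olddat (illustrative data only).
olddat.demo <- data.frame(rbind(c(1, 1, 'Or', 2),
                                c(1, 2, 'Be', 3)))
names(olddat.demo) <- c('oplt', 'rplt', 'sp1', 'hc1')

# Hypothetical helper: going through as.character() first makes the
# conversion safe whether the column is stored as factor or character.
to.num <- function(x) as.numeric(as.character(x))

for (nm in c('oplt', 'rplt', 'hc1')) {
  olddat.demo[[nm]] <- to.num(olddat.demo[[nm]])
}
str(olddat.demo)
```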

> olddat
 oplt rplt sp1 hc1  sp2  hc2  sp3  hc3  sp4  hc4  sp5  hc5  com
1    1    1  Or   2   Iw    4   Iw    2   Iw    2   He    2 <NA>
2    1    1  Iw   2   Be    2 <NA> <NA> <NA> <NA>   Mh    2 <NA>
3    1    1  Mh   3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4    1    2  Be   2   Iw    4   Iw    4   Iw    1   He    2 <NA>
5    1    2  Mh   3   Mh    2 <NA> <NA> <NA> <NA>   Mh    1 <NA>
6    1    2  Iw   4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7    1    2  Or   1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> newdat
  oplt rplt spp  hc1  hc2  hc3  hc4  hc5  com
1     1    1  Or    2 <NA> <NA> <NA> <NA> <NA>
2     1    1  Iw    2    4    2    2 <NA> <NA>
3     1    1  Mh    3 <NA> <NA> <NA>    2 <NA>
4     1    1  Be <NA>    2 <NA> <NA> <NA> <NA>
5     1    1  He <NA> <NA> <NA> <NA>    2 <NA>
6     1    2  Be    2 <NA> <NA> <NA> <NA> <NA>
7     1    2  Mh    3    2 <NA> <NA>    1 <NA>
8     1    2  Iw    4    4    4    1 <NA> <NA>
9     1    2  Or    1 <NA> <NA> <NA> <NA> <NA>
10    1    2  He <NA> <NA> <NA> <NA>    2 <NA>

*************************************
Silviculture Data Analyst
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
david.john.thompson@ontario.ca
http://ofri.mnr.gov.on.ca
*************************************