| To: | "Thompson, David (MNR)" <David.John.Thompson@ontario.ca> |
|---|---|
| Subject: | Re: YADMQ - Yet Another Data Manipulation Question |
| From: | David L Lorenz <lorenz@usgs.gov> |
| Date: | Fri, 29 Jun 2007 08:50:46 -0500 |
| Cc: | "s-news" <s-news@lists.biostat.wustl.edu>, s-news-owner@lists.biostat.wustl.edu |
| In-reply-to: | <ECF21B71808ECF4F8918C57EDBEE121DD1585A@CTSPITDCEMMVX11.cihs.ad.gov.on.ca> |
|
Dave,

When I read the first sentence of your query, I immediately started thinking of graphs. Of course, it quickly became obvious that plot and sub-plot did not refer to a graph!

The restructuring is not easy, but I do have a suggestion for an approach. I don't have the time to develop any code, but here is basically the approach I'd take.

1. Use the by() function to split the data set by oplt and rplt. The by() function will pass each subset to a function, outlined in the following steps.

2. From the subsetted data, extract the data in the spN and hcN columns and create another column of the height classes to match the data extracted. Then, remove missing values. If the data were packed into a data.frame, it would look like this (for oplt=1 and rplt=1):

         spp Kount  hc
    sp11  Or     2 hc1
    sp12  Iw     2 hc1
    sp13  Mh     3 hc1
    sp21  Iw     4 hc2
    sp22  Be     2 hc2
    sp31  Iw     2 hc3
    sp41  Iw     2 hc4
    sp51  He     2 hc5
    sp52  Mh     2 hc5

   Note that the data that I called spp and hc should be of class factor and Kount should be integer.

3. Create a matrix with 5 columns (one per height class) and one row per species. Do you want missing values or 0s? If I were doing the analysis, I would want 0 rather than NA to indicate that no members of that species were observed in a particular height class. For me, NA implies something different. For the current subset, the matrix would have 5 rows and 5 columns.

4. Extract the integer values (that's why they need to be factors) of spp and hc (my names) and cbind them into a matrix to be used as a reference to the row and column of the matrix created in step 3. The data for those referenced cells comes from Kount (my name). Something like this:

    mat[cbind(as.integer(spp), as.integer(hc))] <- Kount

5. Create vectors for oplt and rplt by repeating those values once for each row of the matrix.

6. Create a data.frame from the oplt and rplt vectors, the levels of spp (my name), and the matrix, and do something with com. Return this data.frame.

7. Use do.call("rbind", output of by()) to construct a data set like newdat.

Good luck.

Dave
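The seven steps above can be sketched roughly as follows. This is a minimal illustration written against R, not code from the reply, and it has not been checked in S-PLUS. The names `sp.cols`, `hc.cols`, and `reshape.subplot` are hypothetical; the sketch fills empty cells with 0 as suggested in step 3, drops the com column rather than guessing what to do with it, and assumes a species appears at most once per height class within a sub-plot (a repeated entry would overwrite the earlier count).

```r
# Sample data from the original question. rbind() of mixed vectors builds a
# character matrix, so every column starts out as character.
olddat <- data.frame(rbind(
  c(1, 1, 'Or', 2, 'Iw', 4, 'Iw', 2, 'Iw', 2, 'He', 2, NA),
  c(1, 1, 'Iw', 2, 'Be', 2, NA, NA, NA, NA, 'Mh', 2, NA),
  c(1, 1, 'Mh', 3, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  c(1, 2, 'Be', 2, 'Iw', 4, 'Iw', 4, 'Iw', 1, 'He', 2, NA),
  c(1, 2, 'Mh', 3, 'Mh', 2, NA, NA, NA, NA, 'Mh', 1, NA),
  c(1, 2, 'Iw', 4, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  c(1, 2, 'Or', 1, NA, NA, NA, NA, NA, NA, NA, NA, NA)),
  stringsAsFactors = FALSE)
names(olddat) <- c('oplt', 'rplt', 'sp1', 'hc1', 'sp2', 'hc2', 'sp3', 'hc3',
                   'sp4', 'hc4', 'sp5', 'hc5', 'com')

sp.cols <- paste("sp", 1:5, sep = "")   # species columns (hypothetical name)
hc.cols <- paste("hc", 1:5, sep = "")   # count columns (hypothetical name)

# Hypothetical helper covering steps 2-6 for one oplt/rplt subset.
reshape.subplot <- function(d) {
  # Step 2: stack species and counts column by column and tag each entry
  # with its height class, then drop the empty cells.
  spp   <- unlist(d[sp.cols], use.names = FALSE)
  Kount <- as.integer(unlist(d[hc.cols], use.names = FALSE))
  hc    <- rep(hc.cols, each = nrow(d))
  keep  <- !is.na(spp)
  spp   <- factor(spp[keep])
  hc    <- factor(hc[keep], levels = hc.cols)
  Kount <- Kount[keep]

  # Step 3: one row per species, one column per height class, 0 = none seen.
  mat <- matrix(0L, nrow = nlevels(spp), ncol = length(hc.cols),
                dimnames = list(NULL, hc.cols))

  # Step 4: the factor codes address the (row, column) cells directly.
  mat[cbind(as.integer(spp), as.integer(hc))] <- Kount

  # Steps 5-6: repeat the plot identifiers and assemble the result.
  data.frame(oplt = rep(d$oplt[1], nlevels(spp)),
             rplt = rep(d$rplt[1], nlevels(spp)),
             spp  = levels(spp),
             mat,
             stringsAsFactors = FALSE)
}

# Step 1: split by plot and sub-plot; step 7: stack the pieces.
pieces <- by(olddat, list(olddat$oplt, olddat$rplt), reshape.subplot)
newdat <- do.call("rbind", pieces)
```

Within each sub-plot the species rows come out in factor-level (alphabetical) order rather than in the order of entry, and oplt and rplt stay character because that is how the sample data are built.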
Hello,

I have 54 plots (oplt) each containing 10 sub-plots (rplt) within which stem counts were made for each species found in five height classes. The data are organized (and I use the term very loosely ;-}) with a species column (sp1-sp5) for each of the five height class columns (hc1-hc5). I would like to pool the species codes into a single column (spp) with the associated stem counts remaining in the hc1-hc5 columns but aligned in the proper rows. This will expand the existing number of rows.

Also, there is ABSOLUTELY NO form of order in how each element was entered. And there are many blank cells as only species found in each height class at each location were recorded.

A sample of the two data arrangements are included below. Suggestions? Also, would it be possible to generate a solution that would work in both S and R?

Many thanks in advance,
DaveT.

    olddat <- data.frame(
      rbind(
        c(1, 1, 'Or', 2, 'Iw', 4, 'Iw', 2, 'Iw', 2, 'He', 2, NA),
        c(1, 1, 'Iw', 2, 'Be', 2, NA, NA, NA, NA, 'Mh', 2, NA),
        c(1, 1, 'Mh', 3, NA, NA, NA, NA, NA, NA, NA, NA, NA),
        c(1, 2, 'Be', 2, 'Iw', 4, 'Iw', 4, 'Iw', 1, 'He', 2, NA),
        c(1, 2, 'Mh', 3, 'Mh', 2, NA, NA, NA, NA, 'Mh', 1, NA),
        c(1, 2, 'Iw', 4, NA, NA, NA, NA, NA, NA, NA, NA, NA),
        c(1, 2, 'Or', 1, NA, NA, NA, NA, NA, NA, NA, NA, NA)))
    names(olddat) <- c('oplt', 'rplt', 'sp1', 'hc1', 'sp2', 'hc2', 'sp3', 'hc3',
                       'sp4', 'hc4', 'sp5', 'hc5', 'com')

    newdat <- data.frame(rbind(
      c(1, 1, 'Or', 2, NA, NA, NA, NA, NA),
      c(1, 1, 'Iw', 2, 4, 2, 2, NA, NA),
      c(1, 1, 'Mh', 3, NA, NA, NA, 2, NA),
      c(1, 1, 'Be', NA, 2, NA, NA, NA, NA),
      c(1, 1, 'He', NA, NA, NA, NA, 2, NA),
      c(1, 2, 'Be', 2, NA, NA, NA, NA, NA),
      c(1, 2, 'Mh', 3, 2, NA, NA, 1, NA),
      c(1, 2, 'Iw', 4, 4, 4, 1, NA, NA),
      c(1, 2, 'Or', 1, NA, NA, NA, NA, NA),
      c(1, 2, 'He', NA, NA, NA, NA, 2, NA)))
    names(newdat) <- c('oplt', 'rplt', 'spp', 'hc1', 'hc2', 'hc3', 'hc4', 'hc5', 'com')

    > olddat
      oplt rplt sp1 hc1  sp2  hc2  sp3  hc3  sp4  hc4  sp5  hc5  com
    1    1    1  Or   2   Iw    4   Iw    2   Iw    2   He    2 <NA>
    2    1    1  Iw   2   Be    2 <NA> <NA> <NA> <NA>   Mh    2 <NA>
    3    1    1  Mh   3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    4    1    2  Be   2   Iw    4   Iw    4   Iw    1   He    2 <NA>
    5    1    2  Mh   3   Mh    2 <NA> <NA> <NA> <NA>   Mh    1 <NA>
    6    1    2  Iw   4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    7    1    2  Or   1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>

    > newdat
       oplt rplt spp  hc1  hc2  hc3  hc4  hc5  com
    1     1    1  Or    2 <NA> <NA> <NA> <NA> <NA>
    2     1    1  Iw    2    4    2    2 <NA> <NA>
    3     1    1  Mh    3 <NA> <NA> <NA>    2 <NA>
    4     1    1  Be <NA>    2 <NA> <NA> <NA> <NA>
    5     1    1  He <NA> <NA> <NA> <NA>    2 <NA>
    6     1    2  Be    2 <NA> <NA> <NA> <NA> <NA>
    7     1    2  Mh    3    2 <NA> <NA>    1 <NA>
    8     1    2  Iw    4    4    4    1 <NA> <NA>
    9     1    2  Or    1 <NA> <NA> <NA> <NA> <NA>
    10    1    2  He <NA> <NA> <NA> <NA>    2 <NA>

*************************************
Silviculture Data Analyst
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
david.john.thompson@ontario.ca
http://ofri.mnr.gov.on.ca
*************************************