I have not had time yet to check if this works, but I have already received
a request for me to post any answers that I would have received to my
question.
I will look at this shortly and in the meantime, all my thanks to Nick
Ellis.
Richard
-----Original Message-----
From: Nick Ellis [mailto:nick.ellis@marine.csiro.au]
Sent: Friday, February 23, 2001 1:35 AM
To: 'Boylan, Richard'
Subject: RE: [S] merging two data sets
Reshaping data often crops up, and I have two functions that sometimes come
in handy: reshape and table.trans. See bottom for definitions and
attachments for help files.
Your example still had a typo so I have created an example dat2 with unique
column names and exactly two observations per year.
First reshape dat1
> dat1
id year.start year.end obs1 obs2 obs3
1 AL 1950 1952 1 2 3
2 AL 1953 1954 4 5 6
3 AK 1950 1954 21 22 23
> n <- rep(1:nrow(dat1),dat1$year.end-dat1$year.start+1)
> y <- unlist(apply(dat1[,2:3],1,function(x) x[1]:x[2]))
> dat1r <- data.frame(dat1[n,],year=y)
> dat1r
id year.start year.end obs1 obs2 obs3 year
X1 AL 1950 1952 1 2 3 1950
X11 AL 1950 1952 1 2 3 1951
X12 AL 1950 1952 1 2 3 1952
X2 AL 1953 1954 4 5 6 1953
X23 AL 1953 1954 4 5 6 1954
X3 AK 1950 1954 21 22 23 1950
X34 AK 1950 1954 21 22 23 1951
X35 AK 1950 1954 21 22 23 1952
X36 AK 1950 1954 21 22 23 1953
X37 AK 1950 1954 21 22 23 1954
Now reshape what I think you mean by dat2
> dat2
id Y1950.obs1 Y1951.obs1 Y1952.obs1 Y1953.obs1 Y1954.obs1 Y1950.obs2
Y1951.obs2 Y1952.obs2 Y1953.obs2 Y1954.obs2
AL AL 7 8 9 10 11 12
13 14 15 16
AK AK 17 18 19 20 21 22
23 24 25 26
> dat2r <- reshape(dat2,1,2:11)
> dat2r <- cbind(dat2r,unpaste(dat2r$label,"."))
> dat2r
id label value X2.1 X2.2
1 AL Y1950.obs1 7 Y1950 obs1
2 AK Y1950.obs1 17 Y1950 obs1
3 AL Y1951.obs1 8 Y1951 obs1
4 AK Y1951.obs1 18 Y1951 obs1
5 AL Y1952.obs1 9 Y1952 obs1
6 AK Y1952.obs1 19 Y1952 obs1
7 AL Y1953.obs1 10 Y1953 obs1
8 AK Y1953.obs1 20 Y1953 obs1
9 AL Y1954.obs1 11 Y1954 obs1
10 AK Y1954.obs1 21 Y1954 obs1
11 AL Y1950.obs2 12 Y1950 obs2
12 AK Y1950.obs2 22 Y1950 obs2
13 AL Y1951.obs2 13 Y1951 obs2
14 AK Y1951.obs2 23 Y1951 obs2
15 AL Y1952.obs2 14 Y1952 obs2
16 AK Y1952.obs2 24 Y1952 obs2
17 AL Y1953.obs2 15 Y1953 obs2
18 AK Y1953.obs2 25 Y1953 obs2
19 AL Y1954.obs2 16 Y1954 obs2
20 AK Y1954.obs2 26 Y1954 obs2
Want to reshape this so obs1 and obs2 are in separate columns
> dat2rr <- table.trans(dat2r$value,dat2r[,c(1,4,5)])
> dat2rr$year <- as.numeric(substring(dat2rr[,2],2))
> dat2rr
id X2.1 obs1 obs2 year
1 AL Y1950 7 12 1950
2 AK Y1950 17 22 1950
3 AL Y1951 8 13 1951
4 AK Y1951 18 23 1951
5 AL Y1952 9 14 1952
6 AK Y1952 19 24 1952
7 AL Y1953 10 15 1953
8 AK Y1953 20 25 1953
9 AL Y1954 11 16 1954
10 AK Y1954 21 26 1954
Now merge with dat1r
> merge(dat1r,dat2rr,by.x=c(1,7),by.y=c(1,5),all=T)
id year year.start year.end obs1.x obs2.x obs3 X2.1 obs1.y obs2.y
1 AK 1950 1950 1954 21 22 23 Y1950 17 22
2 AK 1951 1950 1954 21 22 23 Y1951 18 23
3 AK 1952 1950 1954 21 22 23 Y1952 19 24
4 AK 1953 1950 1954 21 22 23 Y1953 20 25
5 AK 1954 1950 1954 21 22 23 Y1954 21 26
6 AL 1950 1950 1952 1 2 3 Y1950 7 12
7 AL 1951 1950 1952 1 2 3 Y1951 8 13
8 AL 1952 1950 1952 1 2 3 Y1952 9 14
9 AL 1953 1953 1954 4 5 6 Y1953 10 15
10 AL 1954 1953 1954 4 5 6 Y1954 11 16
table.trans relies on all combinations of id, year and obs1/2 occurring.
> reshape
function(obj, col.copy = NULL, col.unfold = 1:dim(obj)[2], label = "label",
value = "value")
{
n <- dim(obj)[1]
p <- length(col.unfold)
res <- vector("list", length(col.copy) + 2)
names(res) <- c(if(is.null(col.copy)) NULL else dimnames(obj[,
col.copy,
drop = F])[[2]], label, value)
for(i in seq(along = col.copy)) {
res[[i]] <- rep(obj[, col.copy[i]], p)
}
res[[label]] <- rep(dimnames(obj[, col.unfold, drop = F])[[2]],
rep(n, p))
res[[value]] <- as.vector(data.matrix(obj[, col.unfold, drop = F]))
as.data.frame(res)
}
> table.trans
function(X, INDICES, ...)
{
len.ind <- length(INDICES)
IN1 <- interaction(INDICES[ - len.ind], drop = T)
IN2 <- INDICES[[len.ind]]
a <- tapply(X, data.frame(IN1, IN2), c, ...) # browser()
res <- cbind(INDICES[IN2 == levels(IN2)[1], - len.ind], a)
res
}
Nick Ellis
CSIRO Marine Research mailto:Nick.Ellis@marine.csiro.au
PO Box 120 ph +61 (07) 3826 7260
Cleveland QLD 4163 fax +61 (07) 3826 7222
Australia http://www.marine.csiro.au
> -----Original Message-----
> From: Boylan, Richard [mailto:rboylan@bus.olemiss.edu]
> Sent: Friday, 23 February 2001 15:00
> To: 'Nick Ellis '
> Subject: RE: [S] merging two data sets
>
>
>
> Sorry, the column names of the second data set are unique, just a typo
> in one row that I am corrcting below.
>
> I am completing the example to make it clearer.
> First data set has
>
> AL 1950 1952 1 2 3
> AL 1953 1954 4 5 6
> AK 1950 1954 21 22 23
>
> Second data set has
>
> 1950_obs1 1951_obs1 1952_obs1 1953_obs1 1952_obs1 1950_obs2
> 1951_obs2
> AL 7 8 9 10 11 12 13
> AK 14 15 16 17 18 19 20
>
> Let me do the mergin in two steps.
>
> First one, re-write the first data set as
>
> AL 1950 1 2 3
> AL 1951 1 2 3
> AL 1952 1 2 3
> AL 1953 4 5 6
> AL 1954 4 5 6
> AK 1950 21 22 23
> AK 1951 21 22 23
> AK 1951 21 22 23
> ...
>
> Rewrite the second dat set as
>
> AL 1950 7 12
> AL 1951 8 13
> AL 1952 9 14
> AL 1953 10 ...
> AK 1950 14 19
> AK 1951 15 19
>
> Then, combine the two data sets as follows:
>
> AL 1950 1 2 3 7 12
> AL 1951 1 2 3 8 13
> AL 1952 1 2 3 9 14
> AL 1953 4 5 6 ...
> AL 1954 4 5 6 ..
> AK 1950 21 22 23 14 19
> AK 1951 21 22 23 15 19
> AK 1951 21 22 23
>
>
> -----Original Message-----
> From: Nick Ellis
> To: 'Boylan, Richard'
> Sent: 2/22/01 6:11 PM
> Subject: RE: [S] merging two data sets
>
> I couldn't work out from your example what you wanted. The
> column names
> of
> the second data set are not unique. What happens to data values 10 and
> 11?
>
> Nick Ellis
> CSIRO Marine Research mailto:Nick.Ellis@marine.csiro.au
> PO Box 120 ph +61 (07) 3826 7260
> Cleveland QLD 4163 fax +61 (07) 3826 7222
> Australia http://www.marine.csiro.au
>
>
>
> > -----Original Message-----
> > From: s-news-owner@lists.biostat.wustl.edu
> > [mailto:s-news-owner@lists.biostat.wustl.edu]On Behalf Of Boylan,
> > Richard
> > Sent: Friday, 23 February 2001 5:49
> > To: s-news@lists.biostat.wustl.edu
> > Subject: [S] merging two data sets
> >
> >
> >
> > I need to combine two data sets and would like some
> > suggestions on how to go
> > about doing it.
> > I understand that you may not want to take the time to read
> > through the
> > problem, but if you don't
> > mind just skimming through it to point me towards the general
> > direction
> > (e.g, some people have told
> > me to use SAS instead of Splus for this type of problem).
> >
> > Thanks. Richard
> >
> >
> > What I want to get is a data set in which a row looks as follows:
> >
> > location_i year_j measurement1_{ij} measurement2_{ij}
> >
> > where measurement1_{ij} and measurement2_{ij} are two rows of
> > measurements.
> >
> > In the first data set, a row looks like this:
> >
> > location_i beginning_year end_year measurement1_i
> >
> > In the second data set, a row looks like this
> >
> > location_1 measurement2_{i1} measurement2_{i2} measurement2_{i3} ..
> >
> > Here is an example to make things more understandable.
> >
> > First data set has
> >
> > AL 1950 1952 1 2 3
> > AL 1953 1954 4 5 6
> > AK 1950 1954
> >
> > Second data set has
> >
> > 1950_obs1 1951_obs1 1952_obs1 1952_obs1 1952_obs1
> 1950_obs2
> > 1951_obs2 ..
> > AL 7 8 9 10
> > 11
> > 12 13
> > AK 14 15 16 17
> > 18
> > 19 20
> >
> > The data I want would be
> >
> > AL 1950 1 2 3 7 12
> > AL 1951 1 2 3 8 13
> > AL 1952 1 2 3 9 ...
> > AL 1953 4 5 6 ...
> > AK 1950 ....
> > AK 1951 ...
> > ...
> >
> >
> >
> >
> >
> >
> >
> >
> ---------------------------------------------------------------------
> > This message was distributed by s-news@lists.biostat.wustl.edu. To
> > unsubscribe send e-mail to
> s-news-request@lists.biostat.wustl.edu with
> > the BODY of the message: unsubscribe s-news
> >
>
table_trans.d
Description: Binary data
reshape.d
Description: Binary data
|