
Manipulating data problem

To: s-news@lists.biostat.wustl.edu
Subject: Manipulating data problem
From: Eric yang <yang_eric9@yahoo.com>
Date: Sun, 24 Sep 2006 09:40:16 -0700 (PDT)
 
Hi all,
 
Here is an example of the data manipulation problem that I'm experiencing. Suppose I have a data set:
 
my.data <- structure(c(sample(500, 20, T), sample(5, 20, T), sample(1000:5000, 20, T), sample(1000:5000, 20, T), sample(800:3500, 20, T), sample(2000:10000, 20, T)), dim=c(20,6), dimnames=list(NULL, c("TYPE", "Year", paste("Loss", LETTERS[1:4]))))
 
> my.data
      TYPE Year Loss A Loss B Loss C Loss D
 [1,]  162    5   2180   4631   1533   9596
 [2,]  360    4   4689   3040   2706   9045
 [3,]  362    3   3767   2739   2325   4639
 [4,]  137    2   4697   2054   2018   3938
 [5,]  220    3   4390   4391   3169   3388
 [6,]  298    2   2793   4300   2010   8702
 [7,]  467    1   1328   4764   3320   3903
 [8,]   24    2   2761   4735   2604   4251
 [9,]  479    4   4772   3333   1883   4448
[10,]   24    2   2021   2151   1637   8537
[11,]    9    5   1554   4660   2899   4891
[12,]  333    2   4429   3068   1836   2320
[13,]  327    1   1489   3004   2729   9077
[14,]    9    5   3570   3751    959   6296
[15,]    2    2   2183   2405   2568   5580
[16,]  243    4   4559   3693   3374   8831
[17,]  231    4   1603   1769   1704   9543
[18,]   88    3   1152   3538   1854   5154
[19,]  161    3   4426   2494   2125   2995
[20,]  326    5   4251   3785   2285   5445
I would like to put all of this information into an array indexed by Year (running from 1 to 5, length 5), by Loss A to Loss D (length 4), and by the unique TYPE values (length 18 here). In essence, for a given TYPE I want a slice of the array containing information like:
, , TYPE 2
       Loss A Loss B Loss C Loss D
Year 1     NA     NA     NA     NA
Year 2   2183   2405   2568   5580
Year 3     NA     NA     NA     NA
Year 4     NA     NA     NA     NA
Year 5     NA     NA     NA     NA

, , TYPE 9
       Loss A Loss B Loss C Loss D
Year 1     NA     NA     NA     NA
Year 2     NA     NA     NA     NA
Year 3     NA     NA     NA     NA
Year 4     NA     NA     NA     NA
Year 5   5124   8411   3858  11187
 
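Note: the Year 5 row for TYPE 9 is the sum of the two matching rows, (1554 4660 2899 4891) + (3570 3751 959 6296), and likewise for any other TYPE/Year combination that occurs more than once.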
 
Finding all the unique TYPEs:
 
no <- sort(unique(my.data[,1]))
 
Creating an array filled with NA:
 
my.array <- structure(rep.int(NA, 5*4*length(no)), dim=c(5, 4, length(no)), dimnames=list(paste("Year", 1:5), paste("Loss", LETTERS[1:4]), paste("TYPE", no)))
 
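Aggregating the table information (losses are summed within each TYPE/Year combination):
y <- do.call("rbind", lapply(split(my.data, paste(my.data[,1], my.data[,2],sep="")), function(x){
# split() hands back a plain vector, so restore the 6-column shape first
y <- structure(x, dim=c(length(x)/6,6))
# keep TYPE and Year from the first row and sum the four loss columns
c(y[1,1:2], colSums(y[,3:6,drop=FALSE]))
}))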
 
Matching up the unique sorted Type
x <- y[match(no, y[,1]),]
 
I now want to put the information into the array, but I'm not sure how to do this. Probably something like:
 
my.array[x????] <- y[,3:6]
 
Does anyone know how to do this?
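
One way this might be done, sketched in R syntax and not checked in S-PLUS: an array can be indexed by a matrix with one column per dimension, so that each row of the index matrix names a single cell. The small y below is a hypothetical stand-in for the aggregated table, built from the TYPE 2 and TYPE 9 values shown above; n and idx are names introduced only for this illustration.

```r
# Hypothetical aggregated table: one row per TYPE/Year combination, laid out
# like `y` in the post (TYPE, Year, then the four summed loss columns).
y <- matrix(c(2, 2, 2183, 2405, 2568,  5580,
              9, 5, 5124, 8411, 3858, 11187),
            ncol = 6, byrow = TRUE)

no <- sort(unique(y[, 1]))
my.array <- array(NA, dim = c(5, 4, length(no)),
                  dimnames = list(paste("Year", 1:5),
                                  paste("Loss", LETTERS[1:4]),
                                  paste("TYPE", no)))

n <- nrow(y)
# One index row (year, loss column, position of TYPE in `no`) per cell to
# fill, ordered to match y[, 3:6] when it is read column by column.
idx <- cbind(rep(y[, 2], times = 4),
             rep(1:4, each = n),
             rep(match(y[, 1], no), times = 4))
my.array[idx] <- y[, 3:6]
```

Cells with no matching TYPE/Year row are never touched, so they keep the NA they were created with.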
 
Also, the actual data set that I'm working with is very large, so any tips on speeding up the code would be greatly appreciated. Thanks in advance for any help.
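
On speed, a possible sketch (again R syntax; rowsum() is assumed to be available and to behave the same way in S-PLUS): rowsum() sums the rows of a matrix within groups in a single call, which avoids the split/lapply loop. The three-row my.data below is a subset of the printed data (rows 11, 14 and 15), and key and first are illustrative names. Whether this is actually faster on the real data set has not been measured.

```r
# Rows 11, 14 and 15 of the printed data: TYPE 9 appears twice in Year 5.
my.data <- matrix(c(9, 5, 1554, 4660, 2899, 4891,
                    9, 5, 3570, 3751,  959, 6296,
                    2, 2, 2183, 2405, 2568, 5580),
                  ncol = 6, byrow = TRUE,
                  dimnames = list(NULL, c("TYPE", "Year",
                                          paste("Loss", LETTERS[1:4]))))

# One grouping key per row; the separator keeps TYPE and Year distinguishable.
key <- paste(my.data[, "TYPE"], my.data[, "Year"], sep = ".")

sums  <- rowsum(my.data[, 3:6], key)   # summed losses, one row per key
first <- match(rownames(sums), key)    # first data row belonging to each key

# Aggregated table in the same layout as `y` in the post.
y <- cbind(my.data[first, 1:2, drop = FALSE], sums)
```

The resulting y has one row per TYPE/Year combination, with TYPE and Year in the first two columns and the summed losses in the remaining four.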
 
Best,
Eric
 
 
 

