
large data set analysis?

To: <s-news@lists.biostat.wustl.edu>
Subject: large data set analysis?
From: "Park, Richard" <Richard.Park@joslin.harvard.edu>
Date: Thu, 22 Jan 2004 12:49:59 -0500
Thread-topic: [S] testing equality of pairwise correlation across subsamples
Hi everyone, 
I'm not a formally trained statistician, but I am a heavy user of S-PLUS for
biological applications. This is a question about microarrays, but it is also a
general question about analyzing large data sets. If you have a matrix with 25
columns and 12,500 rows, how would you determine which rows are most similar to
each other?
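
One common starting point is to treat each row (gene) as a 25-point profile and rank the other rows by their Pearson correlation with it. Below is a minimal, pure-Python sketch of that idea. `pearson` and `most_similar_rows` are hypothetical helper names, not S-PLUS functions. For the full 12,500 x 25 matrix, a vectorized library routine would be far faster than these loops.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant row has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def most_similar_rows(matrix, query, k=5):
    """Return up to k (row_index, r) pairs, most correlated with row `query` first."""
    scores = []
    for i, row in enumerate(matrix):
        if i == query:
            continue  # skip the row's trivial correlation with itself
        r = pearson(matrix[query], row)
        if not math.isnan(r):
            scores.append((i, r))
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:k]

# Toy 4 x 4 matrix standing in for the 12,500 x 25 expression matrix.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same shape as row 0, r = 1
    [4.0, 3.0, 2.0, 1.0],  # mirror image of row 0, r = -1
    [1.0, 3.0, 2.0, 4.0],
]
print(most_similar_rows(data, 0, k=2))  # → [(1, 1.0), (3, 0.8)]
```

Correlation rather than Euclidean distance is used here because it compares the shape of two expression profiles and ignores their overall level.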

I guess in this instance we have 12,500 variables that represent genes. What
types of algorithms or ideas would be best suited to this analysis? I have
tried some basic things, such as computing correlation matrices between
rows.
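
A quick back-of-the-envelope calculation shows why a full row-by-row correlation matrix becomes heavy at this size. It assumes each entry is stored as an 8-byte double and that the whole square matrix is kept in memory:

```latex
\binom{12500}{2} = \frac{12500 \times 12499}{2} = 78{,}118{,}750 \ \text{distinct row pairs}

12500^2 \times 8\ \text{bytes} = 156{,}250{,}000 \times 8\ \text{bytes}
  = 1{,}250{,}000{,}000\ \text{bytes} \approx 1.25\ \text{GB}
```

At this scale it is often more practical to compute correlations in blocks of rows, or to keep only each row's top matches, than to build the full matrix at once.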

These microarray values are usually log-transformed and then assumed to be
normally distributed, but I have found that the raw values behave more like
power-law distributions.
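
If the distributional assumption is the concern, one option (a swapped-in technique, not something proposed in this post) is a rank-based measure such as Spearman's correlation. It uses only the order of the values, so any monotone transform such as log2 leaves it unchanged, and it does not rely on normality. The sketch below is stdlib-only. The helper names and the sample values are hypothetical, and log2 assumes strictly positive raw intensities.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant input has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))

# Made-up, strictly positive values for two genes across five arrays.
raw = [12.0, 150.0, 3.5, 900.0, 47.0]
other = [10.0, 80.0, 5.0, 300.0, 200.0]
logged = [math.log2(v) for v in raw]
print(spearman(raw, other), spearman(logged, other))  # → 0.9 0.9
```

Because the ranks of `raw` and `logged` are identical, the two results match exactly. Pearson correlation on the same inputs would generally differ between the raw and logged scales.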

Does anyone favor one method over another, such as hierarchical
cluster analysis, PCA, or SOMs?
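
Of the methods listed, hierarchical clustering is the easiest to sketch. The block below is a naive, pure-Python illustration that uses average linkage with 1 - r as the distance, one common pairing for expression profiles. The function names are hypothetical, and the brute-force merge loop is only suitable for a toy input, not 12,500 rows.

```python
import math

def pearson(x, y):
    """Pearson correlation; raises on constant rows, which should be filtered out first."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("constant row: correlation undefined")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)

def correlation_distance(x, y):
    """1 - r: 0 for identically shaped profiles, 2 for perfectly opposite ones."""
    return 1.0 - pearson(x, y)

def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Every row starts in its own cluster; the two clusters with the smallest
    mean pairwise distance are merged repeatedly until n_clusters remain.
    """
    n = len(rows)
    # precompute all pairwise row distances
    dist = [[correlation_distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy profiles: rows 0, 1, 4 rise across arrays, rows 2 and 3 fall.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.1, 2.0],
    [1.0, 2.1, 2.9, 4.0],
]
print(average_linkage(profiles, 2))  # → [[0, 1, 4], [2, 3]]
```

Stopping at a fixed number of clusters is only one way to cut the tree. A full implementation would record every merge so the resulting dendrogram could be cut at any height.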

Any ideas would be appreciated.

richard park 
