Hi everyone,
I'm not a formally trained statistician, but I am a heavy user of S-PLUS for
biological applications. This is a question about microarrays, but it applies
generally to analyzing large data sets. If you have a matrix with 25
columns and 12,500 rows, how would you determine which rows are most similar to
each other?
In this instance the 12,500 rows are variables that represent genes. What
types of algorithms or ideas would be best suited to this analysis? I have
tried some basic things, such as computing a correlation matrix between
rows.
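For the row-by-row correlation approach, one memory-friendly sketch (in Python with NumPy rather than S-PLUS; the function names are hypothetical) standardizes each row so that a dot product of two rows equals their Pearson r, then scans the correlation matrix in blocks and keeps only each row's top k partners. A full 12,500 × 12,500 matrix of doubles takes about 1.25 GB, which is why the sketch never builds it all at once.

```python
import numpy as np

def standardize_rows(x):
    """Center each row and scale it to unit length, so a dot product of two rows equals their Pearson r."""
    centered = x - x.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # constant rows: avoid dividing by zero (their r is treated as 0)
    return centered / norms

def top_correlated(x, k=5, block=1000):
    """Return (indices, r) of each row's k most positively correlated other rows.

    Processes `block` rows at a time so the full n x n correlation matrix is never stored.
    """
    z = standardize_rows(np.asarray(x, dtype=float))
    n = z.shape[0]
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    idx = np.empty((n, k), dtype=int)
    val = np.empty((n, k))
    for start in range(0, n, block):
        stop = min(start + block, n)
        r = z[start:stop] @ z.T  # rows start..stop-1 of the correlation matrix
        r[np.arange(stop - start), np.arange(start, stop)] = -np.inf  # exclude self-matches
        part = np.argpartition(-r, k - 1, axis=1)[:, :k]  # k largest r, in no particular order
        part_r = np.take_along_axis(r, part, axis=1)
        order = np.argsort(-part_r, axis=1)  # sort those k by decreasing r
        idx[start:stop] = np.take_along_axis(part, order, axis=1)
        val[start:stop] = np.take_along_axis(part_r, order, axis=1)
    return idx, val
```

For example, `idx, r = top_correlated(data, k=10)` would give, for each gene, the ten most correlated other genes and their r values, assuming `data` is a 12,500 × 25 array with genes in rows.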
Microarray values are usually log-transformed and then assumed to be normally
distributed, but I have found that the raw values behave more like power-law
distributions.
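On the distribution question, one option (a swapped-in technique, not something from this post) is Spearman rank correlation, which is Pearson's r computed on within-row ranks. Because ranks are unchanged by any strictly increasing transform, it gives the same answer on raw and log-scale values and is less sensitive to a few very large intensities. The sketch assumes a pseudo-count of 1 before taking log2 so zeros stay finite; the offset and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def log_transform(x, offset=1.0):
    """log2(x + offset); the offset is an assumed pseudo-count that keeps zeros finite."""
    return np.log2(np.asarray(x, dtype=float) + offset)

def spearman_rows(x):
    """Row-by-row Spearman matrix: Pearson r on within-row ranks (ties get average ranks)."""
    ranks = rankdata(np.asarray(x, dtype=float), axis=1)
    return np.corrcoef(ranks)

def pearson_rows(x):
    """Row-by-row Pearson correlation matrix, for comparison."""
    return np.corrcoef(np.asarray(x, dtype=float))
```

With raw intensities in `raw`, `spearman_rows(raw)` and `spearman_rows(log_transform(raw))` should match up to rounding, whereas `pearson_rows` on the two scales generally will not.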
Does anyone prefer one method over another, such as hierarchical cluster
analysis, PCA, or SOMs (self-organizing maps)?
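For hierarchical clustering and PCA, a minimal sketch with SciPy and NumPy (SOMs are not shown) might look like the following. It assumes average linkage with 1 − Pearson r as the distance between genes, and treats genes as observations and the 25 columns as variables for PCA; those choices and the helper names are illustrative, not a recommendation. The condensed distance vector for 12,500 rows has about 78 million entries (roughly 625 MB as doubles).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_rows(x, n_clusters=10, method="average"):
    """Hierarchical clustering of rows; metric="correlation" means distance = 1 - Pearson r."""
    tree = linkage(np.asarray(x, dtype=float), method=method, metric="correlation")
    # Cut the tree into at most n_clusters groups; labels start at 1.
    return fcluster(tree, t=n_clusters, criterion="maxclust")

def pca_rows(x, n_components=2):
    """PCA via SVD, treating rows (genes) as observations and columns as variables."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)  # center each column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # gene coordinates on the components
    explained = s**2 / np.sum(s**2)  # fraction of total variance per component
    return scores, vt[:n_components], explained[:n_components]
```

`labels = cluster_rows(data, n_clusters=20)` would assign each gene to one of at most 20 clusters, numbered from 1; the `scores` from `pca_rows` can be plotted to see whether the genes fall into visible groups.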
Any ideas would be appreciated.
richard park