s-news

Re: Applying vector functions to dataframes

To: "'Eric Turkheimer'" <ent3c@virginia.edu>
Subject: Re: Applying vector functions to dataframes
From: "Thomas Jagger" <tjagger@blarg.net>
Date: Tue, 26 Jul 2005 10:47:35 -0600
Cc: <s-news@lists.biostat.wustl.edu>
In-reply-to: <00d801c59157$52864280$6fae8f80@EricDesk>
Thread-index: AcWRV2yXDxdxikE+T3ednn+JRNOaXwAm4Vdw
Good morning. 

I have written many functions that do something similar. If you can apply a
single function to the whole matrix using matrix arithmetic, that is the
fastest approach. For example, the unbiased covariance matrix of the columns
of a matrix w is just


 ((t(w)-colMeans(w))%*%w)/(nrow(w)-1)
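
As a quick sanity check of the one-line formula above, here is a minimal sketch in R syntax that compares it with the built-in var(). The matrix w is an assumption for illustration only; it reuses the test data quoted at the end of this message.

```r
# Illustrative data only: a 10 x 6 matrix with no dimnames
w <- outer(1:10, seq(1, 2, 0.2), "^")

# t(w) is p x n; colMeans(w) has length p and is recycled down each
# column of t(w), so row k of t(w) has the mean of column k subtracted
by.formula <- ((t(w) - colMeans(w)) %*% w) / (nrow(w) - 1)

# var() on a matrix returns the covariance matrix of its columns
built.in <- var(w)

# Agreement up to floating-point tolerance
print(all.equal(by.formula, built.in))
```

Only one factor needs centering because the centered columns sum to zero, so the product with the uncentered w gives the same cross-products.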


If that is not possible, note that lapply itself uses for loops, so you might
as well write the for loops explicitly (or try R).

Since you are computing a symmetric matrix, I would use a for loop in the
following manner:

N<-ncol(x)
out<-array(0,c(N,N))

##Here is my covariance function ... as an example for testing
my.fun<-function(x,y) sum((x - mean(x)) * y)/(length(x) - 1)


for(i in 1:N)
{
  for(j in i:N)
  {
   out[i,j]<-my.fun(x[,i],x[,j]) #symmetric function
   out[j,i]<-out[i,j]
  }
}

###You can save a small amount of time by dropping the second assignment,
###out[j,i]<-out[i,j], from the loop and instead running one of the following
###once after the loop finishes:

out<-out+t(out)-diag(diag(out)) #or
out[col(out) < row(out)]<-t(out)[col(out)<row(out)] 
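
To see that the two mirroring statements give the same result, here is a small sketch in R syntax. The 3 x 3 values are arbitrary and chosen only for illustration. Note that the first form relies on the lower triangle still being zero, while the second overwrites whatever is there.

```r
# Arbitrary upper-triangular matrix, lower triangle left at zero
out <- matrix(0, 3, 3)
out[row(out) <= col(out)] <- c(1, 2, 3, 4, 5, 6)

# Form 1: add the transpose, then remove the doubled diagonal
a <- out + t(out) - diag(diag(out))

# Form 2: copy the upper triangle into the lower triangle by index
b <- out
b[col(b) < row(b)] <- t(b)[col(b) < row(b)]

print(all.equal(a, b))      # both forms agree
print(all.equal(a, t(a)))   # and the result is symmetric
```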

#You might also try the following, which should call the internal S-PLUS
#function "S_matrix_apply" within apply, since you have a 2-d matrix.

N <- ncol(x)
out<-array(0,c(N,N))
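for(i in 1:(N-1))
{
 #fill row i of the upper triangle with a single apply call
 out[i,i:N]<-apply(x[,i:N],2,my.fun,y=x[,i])
}
#x[,N:N] would drop to a vector, so compute the last diagonal element directly
out[N,N]<-my.fun(x[,N],x[,N])

#copy the upper triangle into the lower triangle
out<-out+t(out)-diag(diag(out))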

###These have been tested on x<- outer(1:10,seq(1,2,.2),"^")
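
For reuse, the double loop above could be wrapped in a function along the following lines. This is only a sketch in R syntax: pairwise.matrix is a hypothetical name, not an existing S-PLUS or R function, and it assumes FUN is symmetric in its two arguments and returns a scalar.

```r
# pairwise.matrix is a hypothetical helper, not a library function.
# It applies FUN(x[,i], x[,j]) for i <= j and mirrors the result.
pairwise.matrix <- function(x, FUN, ...) {
  x <- as.matrix(x)          # also accepts a data frame of numeric columns
  N <- ncol(x)
  out <- array(0, c(N, N))
  for (i in 1:N) {
    for (j in i:N) {
      out[i, j] <- FUN(x[, i], x[, j], ...)
    }
  }
  # copy the upper triangle into the lower triangle
  out[col(out) < row(out)] <- t(out)[col(out) < row(out)]
  out
}

# The covariance function and test data from this message
my.fun <- function(x, y) sum((x - mean(x)) * y) / (length(x) - 1)
x <- outer(1:10, seq(1, 2, 0.2), "^")

res <- pairwise.matrix(x, my.fun)
print(all.equal(res, var(x)))   # compare with the built-in covariance
```

Preallocating out and filling only the upper triangle keeps memory use at one N x N matrix and roughly halves the number of calls to FUN.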
________________________________________
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Eric Turkheimer
Sent: Monday, July 25, 2005 2:28 PM
To: s-news@lists.biostat.wustl.edu
Subject: [S] Applying vector functions to dataframes

I have a fairly simple function that takes two vectors as arguments with a
scalar result.  I would like to apply it pairwise to all the columns in a
large dataset, resulting in a symmetrical matrix.  (The situation is
analogous to using a function that computes the covariance between two
vectors to produce the covariance matrix of a dataset).  I have written a
function to do this using lapply, but on large (80 vars, 3000 subjects)
datasets it is very very slow and sometimes runs out of memory entirely. 
This must come up all the time.  Is there a standard best practice?
 
Thanks,
Eric
 
Eric Turkheimer, PhD
Department of Psychology
University of Virginia
PO Box 400400
Charlottesville, VA  22904-4400

434-982-4732
434-982-4766 (FAX) 
 

