s-news

Re: Approx() function in for loop, S vs R

To: Patrick Burns <pburns@pburns.seanet.com>
Subject: Re: Approx() function in for loop, S vs R
From: "Fowler, Mark" <FowlerM@mar.dfo-mpo.gc.ca>
Date: Fri, 13 Jun 2008 16:05:08 -0300
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <48529A34.2000305@pburns.seanet.com>
Thread-index: AcjNbvdh5DG3UYdiSxKrNj0gmoJw6wADn7Gg
Thread-topic: [S] Approx() function in for loop, S vs R

Patrick,

        Thank you, problem solved. It had nothing to do with approx() or
any other explicitly named function; the cost was in subsetting data frames
to build the z matrix. Anything in the loop can be done with matrices.
I used to know better.
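
A minimal sketch in R of the matrix-based loop described above. The data are synthetic and the names obs, counts and n_pts are hypothetical stand-ins (obs plays the role of PATdata, counts the role of bbb$x); this is not the code that was actually used, only an illustration of preallocating a numeric matrix and filling it by row range.

```r
# Hypothetical stand-in for PATdata, held as a numeric matrix:
# 53 days, 20 depth readings per day (synthetic values).
set.seed(1)
n_days <- 53
depths <- rep(seq(5, 100, by = 5), times = n_days)
obs <- cbind(
  day   = rep(seq_len(n_days), each = 20),
  depth = depths,
  temp  = 20 - 0.1 * depths + rnorm(length(depths), sd = 0.2)
)

n_pts  <- 100                              # interpolated points per day
counts <- as.vector(table(obs[, "day"]))   # observations per day

# Preallocate the result as a matrix instead of a data frame.
zmat <- matrix(NA_real_, nrow = n_pts * n_days, ncol = 3,
               dimnames = list(NULL, c("day", "Depth", "Temperature")))

recstart <- 1
matstart <- 1
for (i in seq_len(n_days)) {
  recs <- recstart:(recstart + counts[i] - 1)   # rows of obs for this day
  rows <- matstart:(matstart + n_pts - 1)       # rows of zmat to fill
  zmat[rows, "day"] <- obs[recs[1], "day"]
  if (length(recs) > 1) {
    # Vertical interpolation of temperature against depth for one day
    oneday <- approx(obs[recs, "depth"], obs[recs, "temp"], n = n_pts)
    zmat[rows, "Depth"]       <- oneday$x
    zmat[rows, "Temperature"] <- oneday$y
  }
  recstart <- recstart + counts[i]
  matstart <- matstart + n_pts
}
```

Dates would need to be carried as numbers (or kept in a separate vector) for this to work, since a matrix holds a single type.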

        Rprof was how I discovered the data frame indexing problem. As
the code wasn't invoking data.frame() explicitly, the only use I could find
for interlude in S was to confirm that identifiable functions were not
the issue. Is there a way I might have tackled this from S alone? Interlude
is more convenient than Rprof for monitoring known suspects, but the
totality of Rprof well serves the clueless.
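
For readers who have not used it, a rough sketch of the Rprof workflow in R follows. The data and the function fill_df are hypothetical, built only to mimic a loop that subsets a data frame and assigns into data frame columns; the loop is repeated for about a second so the sampling profiler has something to record.

```r
# Hypothetical stand-in for PATdata: 53 days, 20 depth readings per day.
set.seed(1)
n_days <- 53
obs <- data.frame(
  day   = rep(seq_len(n_days), each = 20),
  depth = rep(seq(5, 100, by = 5), times = n_days)
)
obs$temp <- 20 - 0.1 * obs$depth + rnorm(nrow(obs), sd = 0.2)

# Data frame version of the loop: subsets a data frame and assigns
# into data frame columns on every iteration.
fill_df <- function(obs, n_pts = 100) {
  days <- unique(obs$day)
  zmat <- data.frame(Depth = rep(NA_real_, n_pts * length(days)),
                     Temperature = NA_real_)
  matstart <- 1
  for (d in days) {
    chunk  <- obs[obs$day == d, ]
    oneday <- approx(chunk$depth, chunk$temp, n = n_pts)
    rows   <- matstart:(matstart + n_pts - 1)
    zmat$Depth[rows]       <- oneday$x
    zmat$Temperature[rows] <- oneday$y
    matstart <- matstart + n_pts
  }
  zmat
}

# Profile roughly one second of repeated calls.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
t0 <- proc.time()[["elapsed"]]
while (proc.time()[["elapsed"]] - t0 < 1) res <- fill_df(obs)
Rprof(NULL)

# Summarise: by.total lists every function seen on the call stack,
# including ones never named in the source, such as data frame methods.
prof <- summaryRprof(prof_file)
head(prof$by.total, 10)
```

Because the summary covers everything on the call stack, data frame indexing methods can show up even though the code never calls them by name.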



        Mark Fowler
        Population Ecology Division
        Bedford Inst of Oceanography
        Dept Fisheries & Oceans
        Dartmouth NS Canada
        B2Y 4A2
        Tel. (902) 426-3529
        Fax (902) 426-9710
        Email fowlerm@mar.dfo-mpo.gc.ca
        Home Tel. (902) 461-0708
        Home Email mark.fowler@ns.sympatico.ca


-----Original Message-----
From: Patrick Burns [mailto:pburns@pburns.seanet.com] 
Sent: June 13, 2008 1:03 PM
To: Fowler, Mark
Subject: Re: [S] Approx() function in for loop, S vs R

R is, in general, faster -- you might just be seeing the general
difference in speed.

But you can profile and see.  In R you can use Rprof (spelling?).  In
S-PLUS you can use 'interlude' from S Poetry.  I'm not sure if
'interlude' also works in R or not.


Patrick Burns
patrick@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Fowler, Mark wrote:
>
> Hello,
>         I'm working on a function that does image plots of 
> temperature-at-depth over time, a version for each of S (7.0) and R 
> (2.7.0). It uses vertical interpolation of temperature-depth per day 
> (100 points), hence the approx() function. It only uses the interp() 
> method to fill in blanks horizontally according to custom (and
> user-defined) rules concerning the maximum timespan. The for loop 
> extracts variable numbers of records per day (observed temperature and
> depth) for each vertical daily interpolation. To resolve a 53-day time
> series the R loop takes about .5 minutes whereas the S loop takes 
> about 2.5 minutes. I suspect the difference is due to the approx() 
> function. Same name and similar syntax in S and R, but different code.
> The R approx() calls a C subroutine while the S approx() calls a 
> Fortran subroutine. I'm hoping for some ideas to speed things up, 
> especially with respect to the S version. Possibly the for loop itself
> is the issue, but that part is identical in R and S. Any insights 
> would be greatly appreciated. Below I excerpt the relevant pieces of 
> code in S and R.
>
> #R version
> dailyvec<-trunc(timeDate(as.character(PATdata$DateTime),format="%d-%b-%Y %H:%M"),units=c("days"))
> aaa<-data.frame(dailyvec=as.character(dailyvec),sc=rep(1,length(dailyvec)))
>
> #Number of obs per day
> bbb<-aggregate(aaa$sc,list(aaa$dailyvec),FUN=sum)
> zmat<-data.frame(DateTime=rep(as.character(trunc(timeDate(),units=c("days"))),100*dim(bbb)[1]),Temperature=rep(NA,100*dim(bbb)[1]),Depth=rep(NA,100*dim(bbb)[1]))
>
> recstart<-1
> matstart<-1
> firstday<-1
> for (i in 1:dim(bbb)[1]) {
> chunk<-PATdata[recstart:(recstart+bbb$x[i]-1),]
> zmat$DateTime[matstart:(matstart+99)]<-rep(as.character(timeDate(as.character(chunk$DateTime[1]))),100)
>
> if(dim(chunk)[1]>1) {
> options(warn=(-1))
> oneday<-approx(chunk$NDepth, chunk$MeanTemperature,n=100)
> options(warn=0)
> zmat$Depth[matstart:(matstart+99)]<-oneday$x[1:100]
> zmat$Temperature[matstart:(matstart+99)]<-oneday$y[1:100]
> }
> recstart<-recstart+bbb$x[i]
> matstart<-matstart+100
> }
>
> #S version
> dailyvec<-timeDate(as.character(PATdata$DateTime),format="%Y %b %d")
> aaa<-data.frame(dailyvec,rep(1,length(dailyvec)))
> bbb<-aggregate(aaa,list(dailyvec),FUN=sum)
> options(warn=(-1))
> zmat<-data.frame(DateTime=rep(timeDate(),100*dim(bbb)[1]),Temperature=rep(NA,100*dim(bbb)[1]),Depth=rep(NA,100*dim(bbb)[1]))
>
> options(warn=0)
> recstart<-1
> matstart<-1
> firstday<-1
> for (i in 1:dim(bbb)[1]) {
> chunk<-PATdata[recstart:(recstart+bbb$X2[i]-1),]
> zmat$DateTime[matstart:(matstart+99)]<-rep(timeDate(as.character(chunk$DateTime[1]),format="%Y %b %d"),100)
> if(dim(chunk)[1]>1) {
> oneday<-approx(chunk$NDepth, chunk$MeanTemperature,n=100) 
> zmat$Depth[matstart:(matstart+99)]<-oneday$x[1:100]
> zmat$Temperature[matstart:(matstart+99)]<-oneday$y[1:100]
> }
> recstart<-recstart+bbb$X2[i]
> matstart<-matstart+100
> }
>
>             Mark Fowler
>             Population Ecology Division
>             Bedford Inst of Oceanography
>             Dept Fisheries & Oceans
>             Dartmouth NS Canada
>             B2Y 4A2
>             Tel. (902) 426-3529
>             Fax (902) 426-9710
>             Email fowlerm@mar.dfo-mpo.gc.ca
>             Home Tel. (902) 461-0708
>             Home Email mark.fowler@ns.sympatico.ca
>
>
