Patrick,
Thank you, problem solved. It had nothing to do with approx() or
any other explicitly named function; the culprit was subsetting data frames
to build the z matrix. Anything in the loop can be done with matrices.
I used to know better.
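
For anyone following the thread, here is a minimal sketch in R of what "doing the loop with matrices" might look like. It is not the actual revised function: the toy PATdata, the nobs vector (standing in for bbb$x) and the two-column layout of zmat are assumptions made for illustration. The idea is to pull the columns out of the data frame once, as plain vectors, and to fill a preallocated numeric matrix rather than a data frame.

```r
# Hypothetical stand-in for PATdata: 3 days with varying numbers of records
set.seed(1)
nobs <- c(5, 1, 8)                      # plays the role of bbb$x
PATdata <- data.frame(
  Day = rep(seq_along(nobs), nobs),
  NDepth = unlist(lapply(nobs, function(n) sort(runif(n, 0, 200)))),
  MeanTemperature = runif(sum(nobs), 2, 15)
)

# Extract the columns once, outside the loop, as plain numeric vectors
depth <- PATdata$NDepth
temp <- PATdata$MeanTemperature

# Preallocate a numeric matrix instead of a data frame
ndays <- length(nobs)
zmat <- matrix(as.numeric(NA), nrow = 100 * ndays, ncol = 2,
               dimnames = list(NULL, c("Depth", "Temperature")))

recstart <- 1
matstart <- 1
for (i in seq_len(ndays)) {
  idx <- recstart:(recstart + nobs[i] - 1)
  if (length(idx) > 1) {                # approx() needs at least two points
    oneday <- approx(depth[idx], temp[idx], n = 100)
    rows <- matstart:(matstart + 99)
    zmat[rows, "Depth"] <- oneday$x
    zmat[rows, "Temperature"] <- oneday$y
  }
  recstart <- recstart + nobs[i]
  matstart <- matstart + 100
}
```

Days with a single record are left as NA, as in the original loop. The dates can be kept in a separate character vector of length 100 * ndays, since a numeric matrix cannot hold them alongside the numbers.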
Rprof was how I discovered the data frame indexing problem. Because
the code never calls data.frame() explicitly, the only use I could see
for interlude in S was to confirm that the identifiable functions were
not the issue. Is there a way I might have tackled this from S alone?
Interlude is more convenient than Rprof for monitoring known suspects,
but the completeness of Rprof serves the clueless well.
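
As a rough illustration of the Rprof route in R (a sketch on a made-up data frame, not the profile of the real function), the pattern is to start the profiler, run the suspect code, stop the profiler, and summarise. The data frame df and its size are assumptions chosen only so the loop runs long enough to be sampled.

```r
# Hypothetical toy data frame; the point is the profiling calls, not the data
df <- data.frame(x = numeric(30000), y = numeric(30000))

proffile <- tempfile()
Rprof(proffile)                  # start sampling the call stack
for (i in seq_len(nrow(df))) {
  df$x[i] <- i                   # data frame assignment inside a loop
}
Rprof(NULL)                      # stop profiling

prof <- summaryRprof(proffile)
head(prof$by.total)              # functions ranked by total time, callees included
```

The by.total table lists every function that appeared on the sampled call stack, including data frame methods that are never called by name in the code being profiled, which is how an implicit cost like data frame indexing can show up.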
Mark Fowler
Population Ecology Division
Bedford Inst of Oceanography
Dept Fisheries & Oceans
Dartmouth NS Canada
B2Y 4A2
Tel. (902) 426-3529
Fax (902) 426-9710
Email fowlerm@mar.dfo-mpo.gc.ca
Home Tel. (902) 461-0708
Home Email mark.fowler@ns.sympatico.ca
-----Original Message-----
From: Patrick Burns [mailto:pburns@pburns.seanet.com]
Sent: June 13, 2008 1:03 PM
To: Fowler, Mark
Subject: Re: [S] Approx() function in for loop, S vs R
R is, in general, faster -- you might just be seeing the general
difference in speed.
But you can profile and see. In R you can use Rprof (spelling?). In
S-PLUS you can use 'interlude' from S Poetry. I'm not sure if
'interlude' also works in R or not.
Patrick Burns
patrick@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
Fowler, Mark wrote:
>
> Hello,
> I'm working on a function that does image plots of
> temperature-at-depth over time, a version for each of S (7.0) and R
> (2.7.0). It uses vertical interpolation of temperature-depth per day
> (100 points), hence the approx() function. It only uses the interp()
> method to fill in blanks horizontally according to custom (and
> user-defined) rules concerning the maximum timespan. The for loop
> extracts variable numbers of records per day (observed temperature and
> depth) for each vertical daily interpolation. To resolve a 53-day time
> series the R loop takes about .5 minutes whereas the S loop takes
> about 2.5 minutes. I suspect the difference is due to the approx()
> function. Same name and similar syntax in S and R, but different code.
> The R approx() calls a C subroutine while the S approx() calls a
> Fortran subroutine. I'm hoping for some ideas to speed things up,
> especially with respect to the S version. Possibly the for loop itself
> is the issue, but that part is identical in R and S. Any insights
> would be greatly appreciated. Below I excerpt the relevant pieces of
> code in S and R.
>
> #R version
> dailyvec<-trunc(timeDate(as.character(PATdata$DateTime),format="%d-%b-%Y %H:%M"),units=c("days"))
> aaa<-data.frame(dailyvec=as.character(dailyvec),sc=rep(1,length(dailyvec)))
>
> #Number of obs per day
> bbb<-aggregate(aaa$sc,list(aaa$dailyvec),FUN=sum)
> zmat<-data.frame(DateTime=rep(as.character(trunc(timeDate(),units=c("days"))),100*dim(bbb)[1]),Temperature=rep(NA,100*dim(bbb)[1]),Depth=rep(NA,100*dim(bbb)[1]))
>
> recstart<-1
> matstart<-1
> firstday<-1
> for (i in 1:dim(bbb)[1]) {
>   chunk<-PATdata[recstart:(recstart+bbb$x[i]-1),]
>   zmat$DateTime[matstart:(matstart+99)]<-rep(as.character(timeDate(as.character(chunk$DateTime[1]))),100)
>
>   if(dim(chunk)[1]>1) {
>     options(warn=(-1))
>     oneday<-approx(chunk$NDepth, chunk$MeanTemperature,n=100)
>     options(warn=0)
>     zmat$Depth[matstart:(matstart+99)]<-oneday$x[1:100]
>     zmat$Temperature[matstart:(matstart+99)]<-oneday$y[1:100]
>   }
>   recstart<-recstart+bbb$x[i]
>   matstart<-matstart+100
> }
>
> #S version
> dailyvec<-timeDate(as.character(PATdata$DateTime),format="%Y %b %d")
> aaa<-data.frame(dailyvec,rep(1,length(dailyvec)))
> bbb<-aggregate(aaa,list(dailyvec),FUN=sum)
> options(warn=(-1))
> zmat<-data.frame(DateTime=rep(timeDate(),100*dim(bbb)[1]),Temperature=rep(NA,100*dim(bbb)[1]),Depth=rep(NA,100*dim(bbb)[1]))
>
> options(warn=0)
> recstart<-1
> matstart<-1
> firstday<-1
> for (i in 1:dim(bbb)[1]) {
>   chunk<-PATdata[recstart:(recstart+bbb$X2[i]-1),]
>   zmat$DateTime[matstart:(matstart+99)]<-rep(timeDate(as.character(chunk$DateTime[1]),format="%Y %b %d"),100)
>   if(dim(chunk)[1]>1) {
>     oneday<-approx(chunk$NDepth, chunk$MeanTemperature,n=100)
>     zmat$Depth[matstart:(matstart+99)]<-oneday$x[1:100]
>     zmat$Temperature[matstart:(matstart+99)]<-oneday$y[1:100]
>   }
>   recstart<-recstart+bbb$X2[i]
>   matstart<-matstart+100
> }
>
>
>