On Wed, 19 Dec 2001, Stephen Tallon wrote:
> Hi,
>
> S+6 rel 2, win2k, 256MB
>
> I have large text files ~2GB (and in principle of indefinite size)
> consisting of sequential blocks of data of around 6000 points each. My
> system won't load a file this size (and no system could load an arbitrarily
> large file), but the calculations I wish to do only relate to each block of
> data and are not calculated across blocks of data, so in principle the
> calculations could be done in sequence. I was hopeful when reading that s+6
> could use a map to the file instead of making a copy in memory of the data,
> but it appears there is still an initial surge in memory use to set it up,
> so it does not help me. The scan function appears to have improved under s+6
> and the skip will actually skip data without consuming memory, but it takes
> an increasingly long time to scan through from the beginning to later parts
> of the file using skip, and in the long run is not a feasible solution for
> me either.
>
> If there is a simple way to load these blocks of data in please let me know.
> If there isn't, it would be nice if future s+ could make use of a file
> position pointer that could be used across multiple calls to scan, so that
> it can jump straight back to where it finished the previous time.
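That's what connections are for (or at least, part of what they are for).
Use scan() with an explicitly opened file connection to read one chunk at
a time: the connection stays open between calls, so each scan() picks up
where the previous one stopped. Something like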
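    zz <- file("foo.dat", "r")    # open once; position persists across reads
    repeat {
        z <- scan(zz, what=something, n=6000)    # next block, at most 6000 items
        if(!length(z)) break                     # nothing read: end of file
        process(z)
    }
    close(zz)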
S-PLUS 6.0 also has openData and readNextDataRows to read a dataset in
chunks.
Connections are an unappreciated resource of S4, and not very well
documented, so it took me a while to cotton on. The best I can suggest is
to read the chapter in the Green Book (JMC, 1998) several times.
I find I use them all the time.
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595