Re: Tips for Faster Import

To: liao <liao@idminer.com.tw>
Subject: Re: Tips for Faster Import
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Date: Thu, 30 Oct 2003 06:10:24 +0000 (GMT)
Cc: "'Dustin Hux'" <dustin@datamininglab.com>, <s-news@lists.biostat.wustl.edu>
In-reply-to: <002301c39e8f$beb1c860$a801a8c0@idminer.com.tw>

On Thu, 30 Oct 2003, liao wrote:

> I use S-PLUS 6.1 for Windows. To read large csv/txt data sets, I have tried
> several functions, including importData(), openData(), read.table(), and
> scan(). In my experience, the fastest way to read data is scan().
> 
> I have read a csv file of about 1.1M rows and 4 columns with S-PLUS
> commands like the following:
> -----------------------------------
> date()
> Your.function.name<-scan("your_data.csv",sep=",")
> Your.function.name<-matrix(Your.function.name,ncol=4,byrow=T)
> date()
> -----------------------------------
> Some comments
> 
> 1. On a notebook (P4 1.7 GHz, 512 MB memory), these data are read into
> S-PLUS in about 15 seconds.
> 
> 2. But the data are all read as character type, so I then use functions like
> as.integer(), as.factor(), ... to convert each column to the data type I
> want.

Try the what= arg to scan(): it should be a bit faster and give you a 
data frame directly.  And do specify n, to get storage preallocated.
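
A minimal sketch of that advice in R, assuming a hypothetical 4-column csv file (the column names, types, and three-row sample are invented for illustration). In R at least, scan() with a list for what= returns a list with one component per column, so the sketch converts it with as.data.frame(); when what= is a list, n is the total number of items, i.e. rows times columns.

```r
# Hypothetical sample file, written to a temp path so the sketch runs as-is.
path <- tempfile(fileext = ".csv")
writeLines(c("1,a,2.5,x", "2,b,3.5,y", "3,c,4.5,z"), path)

# One template per column: each field is converted to its type while reading,
# so no as.integer()/as.numeric() pass is needed afterwards.
cols <- list(id = integer(0), grp = character(0),
             val = numeric(0), tag = character(0))

# n = total number of items (rows * columns); supplying it lets scan()
# size its storage up front instead of growing it.
n_rows <- 3
dat <- scan(path, what = cols, sep = ",", n = n_rows * length(cols),
            quiet = TRUE)

# scan() returned a list of column vectors; wrap it as a data frame.
df <- as.data.frame(dat, stringsAsFactors = FALSE)
```

If the row count is not known exactly, a cheap line count of the file beforehand is one way to obtain a value for n.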

The question was about R, though, which has hints on the help page for 
read.delim and in its Data Import/Export manual.
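
As a rough illustration of the kind of hint found there: current R documentation for read.table notes that supplying colClasses and nrows reduces the work and memory needed. Whether the 2003 help page used the same wording is an assumption. The file contents and column types below are hypothetical.

```r
# Hypothetical tab-delimited sample with a header row.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore",
             "1\talpha\t0.5",
             "2\tbeta\t1.5"), path)

# colClasses skips the type-guessing pass over every column;
# nrows (even a mild over-estimate) helps with storage allocation.
df <- read.delim(path,
                 colClasses = c("integer", "character", "numeric"),
                 nrows = 2)
```

For a file like the 13-column one described below, colClasses would be a length-13 vector naming the type of each column.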

> 
> I hope this helps.
> 
> Best Regards,
>  
> Liao
> 
> -----Original Message-----
> From: s-news-owner@lists.biostat.wustl.edu
> [mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Dustin Hux
> Sent: Thursday, October 30, 2003 2:47 AM
> To: s-news@lists.biostat.wustl.edu
> Subject: [S] Tips for Faster Import
> 
> 
> I have a tab-delimited file that is ~160MB (~1.5 million rows x 13 columns).
> Any tips on reading files of this size faster would be appreciated.  It
> takes about 5 minutes to read it using read.delim in R.
> 
> Thank you,
> 
> Dustin

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

