
Re: Tips for Faster Import

To: "'Dustin Hux'" <dustin@datamininglab.com>, <s-news@lists.biostat.wustl.edu>
Subject: Re: Tips for Faster Import
From: "liao" <liao@idminer.com.tw>
Date: Thu, 30 Oct 2003 10:44:24 +0800
Importance: Normal
In-reply-to: <000f01c39e4d$0946a140$6601a8c0@dustinomnibook>
Organization: Intelligent Data Miner Inc.
Reply-to: <liao@idminer.com.tw>
Hi,

I use S-PLUS 6.1 for Windows. To read large csv/txt data sets, I have tried
several functions, including importData(), openData(), read.table(), and scan().
In my experience, scan() is the fastest of these.

I have read a csv file of about 1.1M rows and 4 columns with S-PLUS commands
like the following:
-----------------------------------
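date()   # print the start time
# scan() returns one flat vector, filled in file order (row by row)
Your.function.name<-scan("your_data.csv",sep=",")
# reshape the flat vector into 4 columns, filling row by row
Your.function.name<-matrix(Your.function.name,ncol=4,byrow=TRUE)
date()   # print the end time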
-----------------------------------
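
To illustrate why byrow is needed in the commands above, here is a minimal sketch written for R (it has not been checked in S-PLUS 6.1). The temporary file and its values are made up for the example.

```r
# Write a tiny 3-row, 4-column CSV to a temporary file (made-up values).
path <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,4", "5,6,7,8", "9,10,11,12"), path)

# scan() returns one flat vector in file order: 1, 2, 3, 4, 5, ...
flat <- scan(path, sep = ",", quiet = TRUE)

# byrow = TRUE refills the vector row by row, restoring the file layout.
m <- matrix(flat, ncol = 4, byrow = TRUE)
print(m)

# Remove the temporary file.
unlink(path)
```

Without byrow = TRUE, matrix() would fill column by column and the values would end up in the wrong cells.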
Some comments

1. On a notebook (P4 1.7GHz, 512MB memory), reading this data set into S-PLUS
takes about 15 seconds.

2. However, the data are all read in as character type, so I then use functions
like as.integer(), as.factor(), ... to convert each column to the data type
I want.
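
One possible way to skip the conversion step in comment 2 is to give scan() a list in its what argument, so each column is read with its own type. The sketch below is written for R and has not been checked in S-PLUS 6.1; the file contents and the column names (id, group, value, count) are hypothetical.

```r
# Write a small mixed-type CSV to a temporary file (made-up values).
path <- tempfile(fileext = ".csv")
writeLines(c("1,a,2.5,10", "2,b,3.5,20", "3,a,4.5,30"), path)

# One template per column: scan() cycles through the list for each record
# and returns a list with one typed vector per column.
cols <- scan(path, sep = ",", quiet = TRUE,
             what = list(id = integer(), group = character(),
                         value = numeric(), count = integer()))

# Combine the typed columns; character columns become factors here.
dat <- as.data.frame(cols, stringsAsFactors = TRUE)
str(dat)

# Remove the temporary file.
unlink(path)
```

Whether this is faster than reading everything as one vector and converting afterwards would need to be timed on the real file.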

I hope this helps.

Best Regards,
 
Liao

-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Dustin Hux
Sent: Thursday, October 30, 2003 2:47 AM
To: s-news@lists.biostat.wustl.edu
Subject: [S] Tips for Faster Import


I have a tab-delimited file that is ~160MB (~1.5 million rows x 13 columns).
Any tips on reading files of this size faster would be appreciated.  It
takes about 5 minutes to read it using read.delim in R.

Thank you,

Dustin




