Thanks to Steve Friedman (who suggests using read.table), Stephen Kaluzny and
Henrik Aalborg Nielsen for their quick replies. See their responses below.
-------------------------------------------------------------------------
You need to set colNameRow=0 to prevent importData from trying to search
for a row in the file to use as column names. E.g.
importData("xyz.txt", colNameRow=0)
You can use the colNames argument to set the column names in the
resulting data frame if you don't like the default Col1, Col2, ...
> While I'm on the subject, does anyone have views on what is the most
> efficient method of reading in large text files? Typically I need to read
> in several large text files which I then rbind in S+ but I have found
> problems with memory doing this (admittedly this was with S+2000). Are
> there advantages to read.table or openData?
You may want to look at openData and readNextDataRows. This lets you
read in a large data file in chunks.
Cheers,
Stephen Kaluzny
--------------------------------------------------------------------------
This is mainly a comment on efficiency. When the data in the text file
are all of the same type and you know the number of columns, a
combination of scan() and matrix() seems quite efficient.
Something like (please check the details before use):
data <- matrix(scan(file="data.txt", sep=",", what=0), ncol=10, byrow=T)
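The same idea may extend to the several-files case from the original question. The following is only a sketch, not a tested S+ recipe: it reads each file into a flat numeric vector with scan() and builds the matrix once at the end, instead of calling rbind file by file. The two temporary files, their contents, and the names files, ncols, vals and combined are made up for illustration. It assumes all values are numeric and that every file has the same, known number of columns.

```r
# Hypothetical sample files standing in for the large headerless text files.
f1 <- tempfile()
f2 <- tempfile()
cat("1,2,3\n4,5,6\n", file = f1)
cat("7,8,9\n", file = f2)
files <- c(f1, f2)
ncols <- 3  # number of columns, assumed known in advance

# Read each file as one flat numeric vector (what = 0 means numeric).
vals <- lapply(files, function(f) scan(file = f, sep = ",", what = 0))

# Combine once and reshape, rather than growing the result with rbind.
combined <- matrix(unlist(vals), ncol = ncols, byrow = TRUE)
```

Whether this actually eases the memory problem will depend on the file sizes and the S+ version, so it is worth timing on real data.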
BTW importData generally tries to do something sensible depending on
the precise details of your data. When it works it's great, but
sometimes it results in incorrect imports. I've found that the
function is very reluctant to issue warnings. It has tons of
options, so it may be possible to solve your problem by a close
reading of the help page.
Regards,
Henrik
---------------------------------------------------------------------------
Regards,
Glenn
-----Original Message-----
From: Glenn.Treacy@ILIM.COM [mailto:Glenn.Treacy@ILIM.COM]
Sent: 14 April 2004 17:53
To: s-news@lists.biostat.wustl.edu
Subject: Re: [S] problems with importData
Hi,
I'm using S+6.1 on NT4 and am using importData to read in large
tab-delimited text files. I have found that if one of my columns starts with the
value NA then that row is dropped. It appears to try to interpret that row
as column names. If the text file has column names then it reads in
perfectly. Does anyone have an idea what I need to do to read in headerless
text files without the first row being dropped?
While I'm on the subject, does anyone have views on what is the most
efficient method of reading in large text files? Typically I need to read in
several large text files which I then rbind in S+ but I have found problems
with memory doing this (admittedly this was with S+2000). Are there
advantages to read.table or openData?
Regards,
Glenn Treacy