Re: really large files - import vs. use

To: 'Eva Goldwater' <goldwater@schoolph.umass.edu>, Chushu Gu <chushugu@hotmail.com>
Subject: Re: really large files - import vs. use
From: "Bos, Roger" <BosR@ny.rothinc.com>
Date: Thu, 16 Dec 2004 09:47:40 -0500
Cc: s-news@lists.biostat.wustl.edu
A lot of people have given me suggestions on splitting up the data and
recombining it in SPLUS.  That is what I have done before in both S+ and R.
I am also somewhat puzzled as to why I can't import a large file into S+, but
if I import many small files and combine them, I can run a regression using
lm or apply any number of functions to the data with no problem, and it's just
as fast as with small files.

I am going to try to import the file into SQL Server and then use the SQL
ability to read in the columns I need and work it that way.  Thanks everyone
for your suggestions.

Roger
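Roger's plan relies on SQL Server to pull out only the needed columns. The same idea can be applied to the text file directly: stream it one row at a time and keep only the required columns, so the file handed to read.table is much narrower. A minimal Python sketch, assuming the file is comma-separated with a header row (as in the read.table call quoted further down); the column names in the usage lines are hypothetical.

```python
import csv
import io


def select_columns(src, dst, wanted):
    """Copy only the columns named in `wanted` from CSV stream `src` to `dst`.

    Rows are streamed one at a time, so memory use stays roughly constant
    no matter how large the input file is.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    # Position of each requested column in the header row (ValueError if absent).
    idx = [header.index(name) for name in wanted]
    writer.writerow(wanted)
    for row in reader:
        writer.writerow([row[i] for i in idx])


# Usage on a small in-memory sample; the column names are hypothetical.
src = io.StringIO("id,price,ret,sector\n1,10.5,0.01,A\n2,20.0,NA,B\n")
dst = io.StringIO()
select_columns(src, dst, ["id", "ret"])
print(dst.getvalue(), end="")
```

For a real file, pass handles opened with `open(path, newline="")` for reading and `open(out_path, "w", newline="")` for writing, as the csv module documentation recommends.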


-----Original Message-----
From: Eva Goldwater [mailto:goldwater@schoolph.umass.edu]
Sent: Thursday, December 16, 2004 9:33 AM
To: Chushu Gu
Cc: s-news@lists.biostat.wustl.edu
Subject: [S] really large files - import vs. use


Hello

As a long-time SAS user but S-Plus newbie, I am quite puzzled by this
solution, as it implies that S-Plus has no problem WORKING with very large
files, but only with the import process.  Is that correct, or am I missing
something???

Eva Goldwater                           email: goldwater@schoolph.umass.edu
Biostatistics Consulting                Phone: (413) 545-2949
418 Arnold House                        Fax:   (413) 545-1645
715 North Pleasant Street
University of Massachusetts
Amherst, MA 01003-9304

On Wed, 15 Dec 2004, Chushu Gu wrote:

> My recommendation:
>
> Use SAS to separate the file into pieces. But you need to know how large a
> file Splus can handle (how many records of the source file).
> Import all the pieces as SAS data sets. When all the files are imported,
> rbind will finish the work.
>
> I used to process a large file in this way.
>
> Code in SAS:
> Assume only 10000 records can be imported into Splus at a time.
>
> data temp1;
> infile 'c:\largefile' firstobs=1 obs=10000;
> input a b $ c;
> run;
> data temp2;
> infile 'c:\largefile' firstobs=10001 obs=20000;
> input a b $ c;
> run;
> ..
>
> If you are an SAS expert, a simple macro would do the trick.
>
> Then you have all the data sets temp1, temp2, ...
> Import them directly into Splus; the file names for these data sets may be
> temp1.sas7bdat, temp2.sas7bdat, ...
>
>
> Hope this helps,
>
> Chushu Gu
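Roger also asks what utility could split the file. The SAS approach above reads fixed record ranges (firstobs=/obs=); the same split can be done with a short Python script that copies the header line into every piece, so each piece can be read with read.table(..., header=TRUE) and the results combined with rbind, as the thread describes. This is a minimal sketch, assuming the first line of the file is a header; the function names and the temp1.txt, temp2.txt, ... output names are hypothetical.

```python
from itertools import islice


def split_with_header(lines, chunk_size):
    """Yield pieces of `lines`, each led by the header line.

    Piece 1 holds data records 1..chunk_size, piece 2 holds
    chunk_size+1..2*chunk_size, and so on, like the firstobs=/obs= ranges
    in the SAS example.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:  # empty input: nothing to split
        return
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield [header] + chunk


def split_file(path, chunk_size, prefix="temp"):
    """Write pieces to <prefix>1.txt, <prefix>2.txt, ... and return the names."""
    names = []
    with open(path) as src:
        for n, piece in enumerate(split_with_header(src, chunk_size), start=1):
            name = f"{prefix}{n}.txt"
            with open(name, "w") as out:
                out.writelines(piece)
            names.append(name)
    return names


# Usage on a tiny in-memory sample: two data records per piece.
sample = ["a,b,c\n", "1,x,2\n", "3,y,4\n", "5,z,6\n"]
pieces = list(split_with_header(sample, 2))
```

Only one piece is held in memory at a time, so `split_file(path, 10000)` would handle a file far larger than any single piece.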
>
>
> ----- Original Message -----
> From: "Bos, Roger" <BosR@ny.rothinc.com>
> To: <s-news@lists.biostat.wustl.edu>
> Sent: Tuesday, December 14, 2004 10:45 AM
> Subject: [S] help importing really large files
>
>
> > Has anyone found a trick to importing really large txt files into S+ 6.2
> > under XP?  I sent the question to Insightful and their only recommendation
> > was to break it up into smaller files.  The file is 350 megs, which is large
> > I grant, but my machine has 4 gigs of memory.  If I did want to break it up,
> > what utility could I use to do so?  Excel is not going to read it either.
> >
> >
> > I get the "unable to obtain requested dynamic memory" error when I try to
> > read a large file into S+ 6.2 using the following command:
> >
> > data <- read.table("M:\\tina\\R2000V10SPLS29m.TXT", header=TRUE, sep=",",
> >                    as.is=TRUE, na.strings="NA")
> > dim(data)
> >
> > The text file is 347,456 KB.  My Windows XP machine has 4 Gigs of
> > memory, which I believe is the max it can handle.  I also believe that my
> > virtual memory is maxed out.  I read the FAQ on this topic, but it mostly
> > said to optimize the code, and I am just trying to read it in.  I understand
> > that the operating system steals half of this.  Do I need to change any
> > setting to make sure S+ is fully utilizing my memory capabilities?  Anything
> > else I can try?
> >
> >
> > ------------------------------------------------------------------------
> > Solution:
> >
> > The file you are trying to import is a very large file. The formula we
> > use to estimate the size of the data you are trying to import is:
> >
> > (rows)*(columns)*8*4.5
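The rule of thumb above can be turned into a quick check before importing, and inverted to estimate how many rows fit in a given budget, which is the number needed when choosing a piece size for splitting. A minimal Python sketch: the formula is taken verbatim from the support reply, but its units are not stated there, so reading the result as bytes (8 bytes per double-precision value, 4.5 as an overhead factor) is an assumption. The row and column counts are hypothetical; the thread does not give the file's dimensions.

```python
def estimated_import_size(rows, columns):
    """Support's rule of thumb: (rows) * (columns) * 8 * 4.5."""
    return rows * columns * 8 * 4.5


def max_rows(budget, columns):
    """Largest row count whose estimate stays within `budget` (same units)."""
    return int(budget // (columns * 8 * 4.5))


# Hypothetical dimensions; the thread does not give the file's row/column counts.
print(estimated_import_size(1_000_000, 30))  # 1080000000.0
print(max_rows(1_080_000_000, 30))           # 1000000
```

If the units are indeed bytes, a million-row, 30-column file would need roughly 1 GB under this estimate.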
> >
> > You should import the file by breaking it into smaller files. Then import
> > these smaller files into S-Plus and finally, recombine them inside S-Plus.
> >
> >
> >
> > Please let me know if you have any questions.
> >
> > Sincerely,
> >
> > Jacob Geballe
> >
> >
> > ===========================================================================
> >  Jacob Geballe                       email: support@insightful.com
> >  Technical Support Engineer            FAX: (206) 283-8691
> >  Insightful Corporation                Phone: (206) 283-8802 ext.235
> >  www.insightful.com                          1-800-569-0123 ext.235
> >
> > ===========================================================================
> >
> > Roger J. Bos, CFA
> > Rothschild Asset Management
> > 1251 Avenue of the Americas
> > New York, NY  10020
> > 212-403-5471
> >
> >
> > **********************************************************************
> > This message is for the named person's use only. It may
> > contain confidential, proprietary or legally privileged
> > information. No right to confidential or privileged treatment
> > of this message is waived or lost by any error in
> > transmission. If you have received this message in error,
> > please immediately notify the sender by e-mail,
> > delete the message and all copies from your system and destroy
> > any hard copies. You must not, directly or indirectly, use,
> > disclose, distribute, print or copy any part of this message
> > if you are not the intended recipient.
> > **********************************************************************
> > --------------------------------------------------------------------
> > This message was distributed by s-news@lists.biostat.wustl.edu.  To
> > unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> > the BODY of the message:  unsubscribe s-news
> >
>
