s-news

Re: really large files - import vs. use

To: "Eva Goldwater" <goldwater@schoolph.umass.edu>
Subject: Re: really large files - import vs. use
From: "Chushu Gu" <chushugu@hotmail.com>
Date: Thu, 16 Dec 2004 12:03:22 -0500
Cc: <s-news@lists.biostat.wustl.edu>
References: <E07964B84690CC47B01421B2A71D4F0F095F4ACA@rinnycs0000> <BAY102-DAV68C69D43FD9FE74F1F8A8CDAE0@phx.gbl> <Pine.GSO.4.55.0412160922490.15981@shell1.oit.umass.edu>
Yes, only the import process has that limit.
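Insightful support's rule of thumb, quoted further down this thread, is (rows)*(columns)*8*4.5. The minimal Python sketch below turns it into a quick calculator. It assumes the 8 means bytes per stored double and that the result is in bytes; support did not state units. It also inverts the formula to suggest how many records each piece could hold, which is the number Chushu says you need before splitting. The function names and the 500 MB budget are hypothetical.

```python
import math

BYTES_PER_VALUE = 8      # assumed: one double-precision number per cell
OVERHEAD_FACTOR = 4.5    # the multiplier quoted by Insightful support


def estimated_import_bytes(rows: int, columns: int) -> float:
    """Rule-of-thumb memory (assumed bytes) to import a rows x columns table."""
    return rows * columns * BYTES_PER_VALUE * OVERHEAD_FACTOR


def max_rows_per_piece(columns: int, budget_bytes: float) -> int:
    """Largest record count whose estimate stays within budget_bytes."""
    per_row = columns * BYTES_PER_VALUE * OVERHEAD_FACTOR
    return math.floor(budget_bytes / per_row)


if __name__ == "__main__":
    # Hypothetical sizes: the thread does not give the row or column counts.
    print(estimated_import_bytes(1_000_000, 40))   # 1440000000.0
    print(max_rows_per_piece(40, 500_000_000))     # 347222
```

The overhead factor dominates. A table whose raw numbers would fit comfortably in memory can still exceed the import estimate by a factor of 4.5.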

----- Original Message ----- 
From: "Eva Goldwater" <goldwater@schoolph.umass.edu>
To: "Chushu Gu" <chushugu@hotmail.com>
Cc: <s-news@lists.biostat.wustl.edu>
Sent: Thursday, December 16, 2004 9:33 AM
Subject: [S] really large files - import vs. use


> Hello
>
> As a long-time SAS user but S-Plus newbie, I am quite puzzled by this
> solution, as it implies that S-Plus has no problem WORKING with very large
> files, but only with the import process.  Is that correct, or am I missing
> something???
>
> Eva Goldwater email: goldwater@schoolph.umass.edu
> Biostatistics Consulting Phone: (413) 545-2949
> 418 Arnold House Fax:   (413) 545-1645
> 715 North Pleasant Street
> University of Massachusetts
> Amherst, MA 01003-9304
>
> On Wed, 15 Dec 2004, Chushu Gu wrote:
>
> > My recommendation:
> >
> > Use SAS to split the file into pieces, but first you need to know how large
> > a file S-Plus can handle (that is, how many records of the source file it
> > can import at once). Import all the pieces as SAS data sets. When all of
> > them are imported, rbind will finish the work.
> >
> > I have processed a large file this way myself.
> >
> > Code in SAS:
> > (Assume only 10000 records can be imported into S-Plus at a time.)
> >
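> > /* Hypothetical folder: a permanent library keeps the .sas7bdat files
> >    after the SAS session ends (WORK data sets are deleted). */
> > libname out 'c:\split';
> > data out.temp1;
> > infile 'c:\largefile' firstobs=1 obs=10000;
> > input a b $ c;
> > run;
> > data out.temp2;
> > infile 'c:\largefile' firstobs=10001 obs=20000;
> > input a b $ c;
> > run;
> > ...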
> >
> > If you are an SAS expert, a simple macro would do the trick.
> >
> > Then you have all the data sets temp1, temp2, ... Import them directly
> > into S-Plus; the file names for these data sets will be something like
> > temp1.sas7bdat, temp2.sas7bdat, ...
> >
> >
> > Hope this helps,
> >
> > Chushu Gu
> >
> >
> > ----- Original Message -----
> > From: "Bos, Roger" <BosR@ny.rothinc.com>
> > To: <s-news@lists.biostat.wustl.edu>
> > Sent: Tuesday, December 14, 2004 10:45 AM
> > Subject: [S] help importing really large files
> >
> >
> > > Has anyone found a trick for importing really large txt files into S+ 6.2
> > > under XP?  I sent the question to Insightful and their only recommendation
> > > was to break it up into smaller files.  The file is 350 megs, which is
> > > large, I grant, but my machine has 4 gigs of memory.  If I did want to
> > > break it up, what utility could I use to do so?  Excel is not going to
> > > read it either.  See below for my full question and support's answer.
> > > Thanks in advance.
> > >
> > >
> > > I get the "unable to obtain requested dynamic memory" error when I try to
> > > read a large file into S+ 6.2 using the following command:
> > >
> > > data <- read.table("M:\\tina\\R2000V10SPLS29m.TXT", header=TRUE, sep=",",
> > >                    as.is=TRUE, na.strings="NA")
> > > dim(data)
> > >
> > > The text file is 347,456 KB.  My Windows XP machine has 4 Gigs of
> > > memory, which I believe is the max it can handle.  I also believe that my
> > > virtual memory is maxed out.  I read the FAQ on this topic, but it mostly
> > > said to optimize the code, and I am just trying to read the file in.  I
> > > understand that the operating system steals half of this.  Do I need to
> > > change any setting to make sure S+ is fully utilizing my memory?  Anything
> > > else I can try?
> > >
> > > ----------------------------------------------------------------------------
> > > Solution:
> > >
> > > The file you are trying to import is a very large file. The formula we
> > > use to estimate the size of the data you are trying to import is:
> > >
> > > (rows)*(columns)*8*4.5
> > >
> > > You should break the file into smaller files, import these smaller files
> > > into S-Plus, and finally recombine them inside S-Plus.
> > >
> > >
> > >
> > > Please let me know if you have any questions.
> > >
> > > Sincerely,
> > >
> > > Jacob Geballe
> > >
> > >
> > > ===========================================================================
> > >  Jacob Geballe                       email: support@insightful.com
> > >  Technical Support Engineer            FAX: (206) 283-8691
> > >  Insightful Corporation                Phone: (206) 283-8802 ext.235
> > >  www.insightful.com                          1-800-569-0123 ext.235
> > >
> > > ===========================================================================
> > >
> > > Roger J. Bos, CFA
> > > Rothschild Asset Management
> > > 1251 Avenue of the Americas
> > > New York, NY  10020
> > > 212-403-5471
> > >
> > >
> > > --------------------------------------------------------------------
> > > This message was distributed by s-news@lists.biostat.wustl.edu.  To
> > > unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> > > the BODY of the message:  unsubscribe s-news
> > >
> >
>
>
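Roger asks what utility could break the file up, since Excel cannot read it either. Any tool that copies a fixed number of lines into each output file will do. Below is a minimal Python sketch of one (not an Insightful or SAS tool). It assumes one record per physical line and a single header line, as the header=TRUE call above implies. The function name split_text_file and the partNNN.txt naming are hypothetical.

```python
from pathlib import Path


def split_text_file(src, out_dir, lines_per_piece, has_header=True):
    """Copy src into pieces of at most lines_per_piece data lines each.

    The input is streamed one line at a time, so memory use stays small
    however large src is. Files are handled as raw bytes, so line endings
    and any non-ASCII text are copied unchanged. When has_header is True,
    the first line is repeated at the top of every piece. Returns the
    paths of the pieces in order.
    """
    if lines_per_piece < 1:
        raise ValueError("lines_per_piece must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pieces = []
    out = None   # currently open piece, if any
    count = 0    # data lines written to the current piece
    with open(src, "rb") as infile:
        header = infile.readline() if has_header else b""
        try:
            for line in infile:
                if out is None or count == lines_per_piece:
                    if out is not None:
                        out.close()
                    # Hypothetical naming scheme: part001.txt, part002.txt, ...
                    path = out_dir / f"part{len(pieces) + 1:03d}.txt"
                    out = open(path, "wb")
                    out.write(header)
                    pieces.append(path)
                    count = 0
                out.write(line)
                count += 1
        finally:
            if out is not None:
                out.close()
    return pieces
```

For Roger's file one might call split_text_file(r"M:\tina\R2000V10SPLS29m.TXT", r"M:\tina\pieces", 300_000). The output folder and the 300,000-line piece size are placeholders to tune against the memory estimate. Each piece can then be read in S-Plus with the same read.table(..., header=TRUE, sep=",") call, and the results combined with rbind as the thread suggests. Because the split is on physical lines, a quoted field containing an embedded newline would be cut in two; the sketch assumes the file has none.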
