s-news

Re: Large data sets

To: "jose Bartolomei" <surfprjab@hotmail.com>, s-news@lists.biostat.wustl.edu
Subject: Re: Large data sets
From: Tony Plate <tplate@blackmesacapital.com>
Date: Tue, 27 Apr 2004 16:19:47 -0600
In-reply-to: <BAY1-F82uOlvX6D96do00006536@hotmail.com>
References: <BAY1-F82uOlvX6D96do00006536@hotmail.com>
That means you're starting with around 1 GB of data if you store it as single precision, or 2 GB if double. That's definitely too large for S-plus 6.2 under Windows even if you maxed out the physical memory on your machine. So, looking into a database connection sounds like a good idea. I'd consider a larger hard disk -- trying to work with 2 GB of raw data in a database on a machine with only 30 GB of hard disk might be painful.
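The 1 GB and 2 GB figures can be checked with a little arithmetic from the numbers in the quoted message (15 columns, about 16 million observations). The following is a minimal sketch in Python rather than S-PLUS. It assumes every column is numeric, counts 1 GB as 10^9 bytes, and ignores per-object overhead; the function name `table_size_gb` is made up for illustration.

```python
def table_size_gb(rows, cols, bytes_per_value):
    """Raw size in GB (10**9 bytes) of a purely numeric table, no overhead."""
    return rows * cols * bytes_per_value / 1e9

ROWS = 16_000_000  # "around 16 million observations"
COLS = 15          # 15 columns

single_gb = table_size_gb(ROWS, COLS, 4)  # 4 bytes per single-precision value
double_gb = table_size_gb(ROWS, COLS, 8)  # 8 bytes per double-precision value

print(f"single precision: {single_gb:.2f} GB")
print(f"double precision: {double_gb:.2f} GB")
```

Under these assumptions the totals come to 0.96 GB and 1.92 GB, which matches "around 1 GB" and "2 GB" above. Each later delivery of about 6 million observations would add roughly 0.36 GB (single) or 0.72 GB (double) on the same basis.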

-- Tony Plate

At Tuesday 02:58 PM 4/27/2004, jose Bartolomei wrote:
Dear all,
Thanks for your answers!!

My apologies for using an expression that is really very subjective and relative.

The two databases that I will be receiving will contain 15 columns and around 16 million observations. After this first batch of data, I will receive around 6 million observations every 6 months.


I have a machine available with a 2 GHz CPU, 224 MB of RAM, a 30 GB hard disk, and Windows XP.

Thanks again,
jose



From: Tony Plate <tplate@blackmesacapital.com>
To: "jose Bartolomei" <surfprjab@hotmail.com>,s-news@lists.biostat.wustl.edu
Subject: Re: [S] Large data sets
Date: Tue, 27 Apr 2004 11:26:38 -0600

"Large" data set is all relative. Some people might think 1 MB is large, but that's actually tiny by the standards of current machines. I regularly use data objects in the region of 100 MB in S-plus under Windows 2000 on a machine with a couple of GB of RAM. I have no idea if that is "gigantic" or small for you. (I'd think that a current reasonable interpretation of "gigantic" would be terabytes.) On a Unix machine with more physical memory (and 64-bit addressing) you could work with proportionally larger data sets. I think Insightful might be planning some more features in S-plus to perform analysis on data sets larger than will fit in S-plus -- you might want to check with them regarding features and timelines.

If you're planning to use ODBC you might want to verify that it will not be a bottleneck for your analysis.
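One way to check for such a bottleneck is to time how fast rows come back when fetched in fixed-size chunks, before committing to the full analysis. The sketch below is in Python, not S-PLUS, and uses an in-memory SQLite database purely as a stand-in for the SQL server and ODBC link. The table `obs`, its columns, and the helper `fetch_in_chunks` are hypothetical names, and the measured rate says nothing about a real ODBC connection; only the pattern carries over.

```python
import sqlite3
import time

def fetch_in_chunks(conn, query, chunk_rows=10_000):
    """Yield lists of rows so only one chunk is held in memory at a time."""
    cur = conn.cursor()
    cur.execute(query)
    while True:
        rows = cur.fetchmany(chunk_rows)
        if not rows:
            break
        yield rows

# Stand-in data: a small in-memory table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER, x REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    ((i, i * 0.5) for i in range(100_000)),
)
conn.commit()

# Time a full pass over the table while keeping only running totals.
start = time.perf_counter()
n_rows = 0
total_x = 0.0
for chunk in fetch_in_chunks(conn, "SELECT id, x FROM obs"):
    n_rows += len(chunk)
    total_x += sum(x for _, x in chunk)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

print(f"{n_rows} rows in {elapsed:.3f} s ({n_rows / elapsed:,.0f} rows/s)")
print(f"mean of x: {total_x / n_rows}")
```

Multiplying the measured seconds per row by the expected 16 million observations gives a rough idea of how long one full pass over the real data would take through the same kind of connection.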

hope this helps,

Tony Plate

At Tuesday 08:19 AM 4/27/2004, jose Bartolomei wrote:
Hi S-users

I am a student and I am quite new to R/S programming. From my short experience with the package and from various comments (on s-news and elsewhere), I have the impression that S-Plus has problems dealing with very large data sets.

Am I correct in having this notion?

Soon I will receive a gigantic database that I was planning to analyze in S-Plus. To overcome this limitation I am planning to import the database into an SQL server and access the data via the S-Plus ODBC to conduct the analysis.

Is this a good idea?

Thanks to all,
jose


