s-news

Re: value of enterprise edition / big data?

To: "Michael Camilleri" <MichaelCamilleri@branz.co.nz>, "Dave Cacela" <DCacela@stratusconsulting.com>, <s-news@wubios.wustl.edu>
Subject: Re: value of enterprise edition / big data?
From: "Michael Camilleri" <MichaelCamilleri@branz.co.nz>
Date: Tue, 20 Mar 2007 08:49:55 +1200
In-reply-to: <A23B682083FD8248ADFE1317C82CBD6702536F27@exchange.branznt.org.nz>
References: <88D649D1E09B684E8CBF77A5ADB3D306010847BB@tron.stratus.local> <A23B682083FD8248ADFE1317C82CBD6702536F27@exchange.branznt.org.nz>
BTW, when you are importing files and are near the limits of your memory,
some import functions might work and some might not. There is the GUI
import menu function, which you can also call directly as
guiImportData(). There is also the importData function, which is very
similar. On a large data set one might work where the other fails.

If your data are stored in a database then that frees S+ from having to
figure out what the column data types should be. This might be a more
efficient way of importing large tables.

If you are reading delimited files then you can use read.table().

And finally, the openData function lets you open large files a block at
a time. This allows you to open a large file and process it on the fly.
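
To illustrate the block-at-a-time idea in general terms, here is a
minimal sketch in Python rather than S+. It does not use openData or
reproduce its arguments. The names iter_blocks and running_mean, the
column name "temp" and the sample data are all made up for the example.
The point is only that a summary can be built up one block at a time
without holding the whole file in memory.

```python
import csv
import io
from itertools import islice


def iter_blocks(handle, block_size):
    """Yield lists of at most block_size rows (as dicts) from a delimited file."""
    reader = csv.DictReader(handle)
    while True:
        block = list(islice(reader, block_size))
        if not block:
            return
        yield block


def running_mean(handle, column, block_size=1000):
    """Mean of one numeric column, accumulated block by block."""
    total = 0.0
    count = 0
    for block in iter_blocks(handle, block_size):
        # Only the current block is held in memory here.
        for row in block:
            total += float(row[column])
            count += 1
    return total / count if count else float("nan")


# Hypothetical in-memory sample standing in for a large file on disk.
sample = io.StringIO("id,temp\n1,10.0\n2,20.0\n3,30.0\n4,40.0\n5,50.0\n")
print(running_mean(sample, "temp", block_size=2))
```

Any statistic that can be updated incrementally (counts, sums, minima,
maxima) fits this pattern. Things like medians need a different
approach.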

Just saying that if at first you don't succeed, try, try again. There
are usually several ways to accomplish the same thing in S+, and
sometimes their performance is dramatically different.

As far as analysis goes, do you really need to look at a few dozen
columns at the same time? Doing preliminary analysis on subsets of the
data might be a viable option, or enable you to find which columns you
really need to look at, or how to summarise or compact the data.
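
As a rough sketch of that kind of preliminary look, again in Python and
not in S+, the following reads only a chosen few columns from a
delimited file and prints basic summaries. The function name
summarise_columns, the column names and the sample data are
hypothetical, and it assumes the chosen columns are numeric and the
file has at least one data row.

```python
import csv
import io
import statistics


def summarise_columns(handle, columns):
    """Return n, mean, min and max for each named numeric column."""
    reader = csv.DictReader(handle)
    values = {name: [] for name in columns}
    for row in reader:
        # Columns that were not asked for are never stored.
        for name in columns:
            values[name].append(float(row[name]))
    return {
        name: {
            "n": len(v),
            "mean": statistics.mean(v),
            "min": min(v),
            "max": max(v),
        }
        for name, v in values.items()
    }


# Hypothetical sample with three columns, of which only two are summarised.
sample = io.StringIO("a,b,c\n1,10,x\n2,20,y\n3,30,z\n")
for name, stats in summarise_columns(sample, ["a", "b"]).items():
    print(name, stats)
```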

Michael
