Re: Data manipulation and big data problem

To: "'Chen,Sichong'" <sichong.chen@postgrad.manchester.ac.uk>, s-news@lists.biostat.wustl.edu
Subject: Re: Data manipulation and big data problem
From: "Barker, Chris [SCIUS]" <cbarker1@scius.jnj.com>
Date: Thu, 29 Jun 2006 14:20:54 -0400

I used the bigdata library with about a gigabyte (1 GB) of data. Some iterative procedures (logistic regression) may take a few minutes, though I think that's perfectly reasonable. It's an excellent tool.


        Chris Barker


 -----Original Message-----
From:   s-news-owner@lists.biostat.wustl.edu [mailto:s-news-owner@lists.biostat.wustl.edu]  On Behalf Of Chen,Sichong
Sent:   Thursday, June 29, 2006 10:20 AM
To:     s-news@lists.biostat.wustl.edu
Subject:        [S] Data manipulation and big data problem

Thanks for Thomas's brilliant function and David's recommendation of the
library. My problem has been solved.

But I have a new question about processing a big data set.

My data set is 2106745 x 3 (one year of data), and the original csv file is
100 MB. When I use group2row to transform the data, it succeeded once but
failed afterwards with a Microsoft Visual C++ runtime error, and the Splus
program was terminated. With a small data set it works without any
problem.

There is another problem when I handle this big data set. Sometimes Splus
reports that there is not sufficient dynamic memory and the computation
stops, although Splus itself is not terminated.

This is the first time I have dealt with a big data set, and later on I will
extend the time period to ten years. Could someone tell me the largest
data set that Splus can process?

My version is 7.0, which includes a big data library. Is this library really
good? It sounds as if it is meant for big data problems. But older Splus
functions such as group2row do not work on bd objects, since big data
objects have their own functions. To work around this, I have to use
bd.coerce to convert the bd object to an ordinary data frame. I have no
experience with bd objects, so I am not sure whether they are superior to
ordinary data frames. If they are not much better, I would rather not use
them because of the inconvenience.

Many thanks,
Sichong
SPlus 7.0 & FinMetrics 2.0 User
Windows XP


