| To: | Stephen Ban <ban@zoology.ubc.ca> |
|---|---|
| Subject: | Re: Improting large datasets into S-Plus and R |
| From: | Sven.Knudsen@adeptscience.dk |
| Date: | Fri, 26 Sep 2003 13:45:36 +0200 |
| Cc: | "John E. Cornell, Ph.D." <cornell@uthscsa.edu>, S-News <s-news@lists.biostat.wustl.edu>, s-news-owner@lists.biostat.wustl.edu |
| In-reply-to: | <5.1.1.6.0.20030925130659.00af05f0@pop.zoology.ubc.ca> |
|
The import GUI sometimes acts strangely - this might have something to do with Windows. A more direct approach is to use the command line - but the block read and write methods seem to work much faster. You can then append block by block (this slows down the process) - or read the whole data set as one large block. In fact, the latter works faster than traditional import methods.

Look for the functions

    openData
    readNextDataRows
    writeNextDataRows
    closeData

Here is an example of a function that reads the data block by block and calculates the min and max of each column:

    minmaxBlock <- function(file, type, nr)
    {
        # Construct data handle; each read returns up to nr rows
        dh <- openData(file, type=type, openType="read", rowsToRead=nr)

        # The first block initialises the running min and max
        tempblock <- readNextDataRows(dh)
        tempmin <- sapply(tempblock, min)
        tempmax <- sapply(tempblock, max)

        while(T) {
            tempblock <- readNextDataRows(dh)
            # An empty block means the end of the file has been reached
            if( length(tempblock) == 0 ) break
            tempmin <- pmin(tempmin, sapply(tempblock, min))
            tempmax <- pmax(tempmax, sapply(tempblock, max))
        }

        # Release the data handle before returning
        closeData(dh)
        list(min=tempmin, max=tempmax)
    }

Note that the function does not import the data, but aggregates it (a bit inspired by Insightful Miner, without breaking any patent :-)

I have tested it using datasets with 1 million records x 5 columns - in which case it takes less than 1 min. Using traditional import, the time used is 3.5 min (Windows 2000, S+6.1, 512 MB RAM, PIII 800 MHz).

Hope this is of some inspiration.

Adept Scientific ApS
Sven Jesper Knudsen
Senior Consultant

I also have large datasets (1.5-2.0 million records), and I found a weird solution. SPSS had no problems opening my datasets. If you then save them as SPSS *.sav files, S-PLUS was able to import these no problem, even though it would choke on the original raw data. Hope you have access to SPSS :)

Stephen

At 01:10 PM 25/09/2003 -0500, John E. Cornell, Ph.D. wrote:
>I have a dataset with 1.6 million records and 45 binary variables. I want
>to apply monothetic (mona) and fuzzy (fanny) clustering methods to the
>datamatrix. The original dataset was created in SAS, but I created a comma
>delimited text version to import into S-Plus. When I try to import the
>dataset via the Import GUI interface, the program freezes and stops
>responding. Is there a more efficient way to import a large dataset into
>S-Plus or R?
>
>John Cornell
>
>****************************************************************************
>********
>
>Expectation, hope, intention toward possibility that has still not become:
>this is not only a basic feature of human consciousness, but, ..., a basic
>determination within objective reality as a whole.
>
> --Ernst Bloch
> --The Principle of Hope (Vol. 1)
>
>--------------------------------------------------------------------
>This message was distributed by s-news@lists.biostat.wustl.edu. To
>unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
>the BODY of the message: unsubscribe s-news
|
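The original question also asks about R. As far as I know, openData and readNextDataRows are S-PLUS functions with no direct counterpart in base R, so the following is only a rough sketch of the same block-wise min/max idea using an open file connection and read.csv. The function name minmax_chunked and the default block size are made up for illustration, and the sketch assumes a comma-delimited file with no header row and only numeric columns.

```r
# Sketch: running column-wise min/max over a headerless, all-numeric CSV,
# read nr rows at a time. minmax_chunked is a hypothetical name.
minmax_chunked <- function(path, nr = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))  # release the connection however the function exits

  read_block <- function() {
    # read.csv() continues from the current position of an open connection
    # and raises an error once no lines are left. Assumption: any error is
    # treated as end-of-data, which would also hide a malformed line.
    tryCatch(read.csv(con, header = FALSE, nrows = nr),
             error = function(e) NULL)
  }

  # The first block initialises the running min and max
  block <- read_block()
  if (is.null(block)) stop("no data rows in ", path)
  cur_min <- sapply(block, min)
  cur_max <- sapply(block, max)

  repeat {
    block <- read_block()
    if (is.null(block) || nrow(block) == 0) break
    cur_min <- pmin(cur_min, sapply(block, min))
    cur_max <- pmax(cur_max, sapply(block, max))
  }
  list(min = cur_min, max = cur_max)
}
```

A call would look like minmax_chunked("mydata.csv", nr = 50000), where the file name is a placeholder. Only one block is held in memory at a time, so the block size trades memory use against the number of read calls.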