s-news
[Top] [All Lists]

Summary: svd for large array

To: "bob pruzek" <rmpruzek@yahoo.com>
Subject: Summary: svd for large array
From: jsv@stat.ohio-state.edu
Date: Sun, 24 Jul 2005 11:54:02 -0400 (EDT)
Cc: jsv@stat.ohio-state.edu, s-news@lists.biostat.wustl.edu
Importance: Normal
In-reply-to: <20050724031102.15579.qmail@web50402.mail.yahoo.com>
References: <1442.24.210.95.83.1122158816.squirrel@www.stat.ohio-state.edu> <20050724031102.15579.qmail@web50402.mail.yahoo.com>
User-agent: SquirrelMail/1.4.4
Special thanks to Bob Pruzek, whose success on a small Apple G4
prompted me to try this in R.

> A <- matrix(rnorm(3000000),100,30000)
> date()
[1] "Sun Jul 24 11:28:21 2005"
> S <- svd(A,nu=0)
> date()
[1] "Sun Jul 24 11:28:27 2005"
> dim(S$v)
[1] 30000   100

Works like a charm in 6 seconds.
  Thanks also to Brian Ripley and Bill Dunlop for their replies.
  The question now is "Shouldn't this work with Splus for Windows?"
  Judging from the R code, these all use the same calls to LAPACK or
  LINPACK.

Joe Verducci

______________________________________________________



> Joe,
>      For the heck of it I decided to try your problem
> on my little Apple
> Powerbook G4 (512 Mb ram).   The computation not only
> worked, but did so (correctly?) in less than 40
> seconds):
>
>  A<-matrix(rnorm(3000000),30000,100)
>> dim(A)
> [1] 30000   100
>> date()
> [1] "Sat Jul 23 23:01:39 2005"
>> svd(A,nu=0)$d
>   [1] 183 182 182 182 181 181 181 180 180 180 180 179
> 179 179
> middle snipped...
>  [91] 166 166 166 166 166 165 165 165 164 164
>  date()
> [1] "Sat Jul 23 23:02:17 2005"
>     ....For what it's worth...Bob Pruzek
>          (SUNY Albany)
>
> --- jsv@stat.ohio-state.edu wrote:
>
>> Dear Prof. Ripley:
>>   Here is what I get
>> > A <- matrix(rnorm(3000000),100,30000)
>> > S <- svd(A,nu=0)
>> Problem in 1:ll: Cannot create - data would have
>> length greater than 536
>> 870911 (.Machine$integer.max/sizeof(integer))
>> Use traceback() to see the call stack
>> > traceback()
>> 10: eval(action, sys.parent())
>> 9: doErrorAction("Problem in 1:ll: Cannot create -
>> data would have lengt
>> h greater than 536870911
>> (.Machine$integer.max/sizeof(integer))",
>> 8: 1:ll
>> 7: array(0., c(p, p))
>> 6: .Fortran(if(!cmplx) "dsvdcs" else "zsvdcs",
>> 5: .Fortran(if(!cmplx) "dsvdcs" else "zsvdcs",
>> 4: svd.default(A, nu = 0)
>> 3: svd(A, nu = 0)
>> 2: eval(expression(S <- svd(A, nu = 0)))
>> 1:
>> Message: Problem in 1:ll: Cannot create - data would
>> have length greater
>>  than 536870911
>> (.Machine$integer.max/sizeof(integer))
>> I interpreted line 7 from the traceback as svd's
>> attempt to
>> allocate a 30K x 30K intermediate result matrix.
>> This is corroborated
>> by the fact that the same message appears in the
>> allocation
>> > A <- matrix(0,30000,30000)
>> Problem in 1:ll: Cannot create - data would have
>> length greater than 536
>> 870911 (.Machine$integer.max/sizeof(integer))
>> Use traceback() to see the call stack
>>  I think that construction of a full symmetric
>> matrix may be part of a
>> computationally fast algorithm, but I would be
>> willing to trade some
>> extra time to do the computation with less space.
>> I am using "svd" that is included in the initial
>> loading of Splus.
>> Thanks for any further pointers you might provide.
>> Joe Verducci
>> ______________________________
>> > What do you want from the svd?  In a M x N system
>> the left singular
>> > vectors form a M x M system, taking up much more
>> space that the 2Gb
>> > Windows will allow S-PLUS.  By default svd()
>> avoids this by giving you
>> > only N (for M > N) components of the singular
>> vectors.
>> >
>> > You will probably do better to do things in two
>> steps, for example
>> >
>> > A <- matrix(rnorm(30), 10, 3)
>> > S <- svd(A, nu=0)
>> > S$u <- A %*% S$v / rep(S$d, each=nrow(A))
>> >
>> > Now, if finding S is too slow (and it should not
>> be with LAPACK-based
>> > code, so you might want to try the Matrix library)
>> you can get the
>> > components (less accurately) from
>> eigen(crossprod(A)),
>> >
>> > E <- eigen(crossprod(A))
>> > S <- list(d=sqrt(E$values), v=E$vectors)
>> > S$u <- A %*% S$v / rep(S$d, each=nrow(A))
>> >
>> > My 1GB laptop swapped rather a lot, but was able
>> to do that in less than a
>> > minute CPU.
>> >
>> > On Fri, 22 Jul 2005 jsv@stat.ohio-state.edu wrote:
>> >
>> >>  I'm using Splus 7.0 on a Windows XP with 4Gb
>> memory.  I would like to
>> >> perform an svd on a rectangular array where one
>> dimension is very large
>> >> (around 30K) and the other is relatively small
>> (around 100).
>> >> Unfortunately, svd is not one of the functions
>> listed to take advantage
>> >> of the out-of-memory mode in version 7.0.  Right
>> now my feasible limits
>> >> in-memory are (8K x 100).  It should be possible
>> to use program an SVD
>> >> for rectangular arrays using sequential
>> optimization, but before going
>> >> in that route, I am hoping someone here knows of
>> an available code.
>> >
>> > --
>> > Brian D. Ripley,
>> ripley@stats.ox.ac.uk
>> > Professor of Applied Statistics,
>> http://www.stats.ox.ac.uk/~ripley/
>> > University of Oxford,             Tel:  +44 1865
>> 272861 (self)
>> > 1 South Parks Road,                     +44 1865
>> 272866 (PA)
>> > Oxford OX1 3TG, UK                Fax:  +44 1865
>> 272595
>> >
> --------------------------------------------------------------------
>> > This message was distributed by
>> s-news@lists.biostat.wustl.edu.  To
>> > unsubscribe send e-mail to
>> s-news-request@lists.biostat.wustl.edu with
>> > the BODY of the message:  unsubscribe s-news
>> >
> --------------------------------------------------------------------
>> This message was distributed by
>> s-news@lists.biostat.wustl.edu.  To
>> unsubscribe send e-mail to
>> s-news-request@lists.biostat.wustl.edu with
>> the BODY of the message:  unsubscribe s-news
>





<Prev in Thread] Current Thread [Next in Thread>