[S] Summary: Memory problems.

To: s-news@wubios.wustl.edu
Subject: [S] Summary: Memory problems.
From: "Gérald Jean" <Gerald.Jean@spgdag.ca>
Date: Mon, 31 Jan 2000 11:46:34 -0500
Sender: owner-s-news@wubios.wustl.edu
Hello S-users,

Last Friday I posted a question concerning memory management.  I got a single
answer, from Brian Ripley; thanks to him for his support.  Below you will
find the original posting and Brian Ripley's reply.

Thanks again to Prof. Ripley.

Gérald Jean
Consulting Analyst (Statistics), Actuarial Department
telephone: (418) 835-8839
fax      : (418) 835-5865
e-mail   : gerald.jean@spgdag.ca

"In God we trust; all others must bring data"

---------------



On Fri, 28 Jan 2000, "Gérald Jean" wrote:

>
> Hello S-users,
>
> I am running S-PLUS 2000 under NT 4.0 on a Compaq AP500 with dual processors,
> 1G of RAM and a 4.8G swap file.  Unfortunately I still have memory problems.
> I am fitting glm's on very large tables.  The following script aborted with
> the error shown below; all other applications were closed in an attempt to
> give as much memory as possible to S+.  One of the things that puzzles me is
> that the same script ran the previous day with a slightly larger data set.
>
> > dim(inc3.freq)        # data set in previous run.
> [1] 336458     18
> > dim(inc4.freq)        # data set in this run.
> [1] 318704     18
>
> > inc4.glm.freq <- glm(formula = inc3.et4.freq.formula,
>      family = quasi(link = log, var = mu), data = inc4.all, weights = unsous)
>
> Error in .Fortran("glmfit",: Unable to obtain requested dynamic memory (this
> request is for 264217676 bytes, 0 bytes already in use)
>
> inc3.et4.freq.formula contains 12 factors for a total of 53 levels.  Yes, I
> know that family quasi with a log link and mu variance is a Poisson model; I
> use quasi to estimate the dispersion subsequently.

You can estimate over-dispersion using the poisson family, of course. The
dispersion is unused in fitting, and setting dispersion=0 in a call to summary
gives you the estimated dispersion.

I DID NOT KNOW THE DISPERSION COULD BE ESTIMATED THIS WAY.  ONE MORE THING I
LEARNED!
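
The point that the dispersion plays no part in the fit can be illustrated with
a minimal sketch, written in Python rather than S-PLUS and using made-up
counts.  It assumes the estimate is the usual Pearson chi-square divided by the
residual degrees of freedom; whether summary computes exactly this in S-PLUS
2000 is an assumption here, and the names pearson_dispersion, counts and
phi_hat are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by the residual degrees of freedom.

    For a Poisson-type variance function V(mu) = mu, each observation
    contributes (y - mu)^2 / mu.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical counts, not taken from the original data.
counts = [0, 2, 1, 7, 0, 3, 9, 1, 0, 5]

# Intercept-only Poisson model: the maximum-likelihood fitted mean is the
# sample mean, and it does not depend on the dispersion at all.
mu_hat = sum(counts) / len(counts)
fitted = [mu_hat] * len(counts)

# The dispersion is estimated afterwards, from the fitted means.
phi_hat = pearson_dispersion(counts, fitted, n_params=1)
print(phi_hat)
```

A value of phi_hat well above 1 indicates over-dispersion relative to the
Poisson.  The fitted means are the same whether the family is poisson or the
equivalent quasi family; only the reported standard errors change.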

> Is there anything I could do to improve memory management?

Yes. Set a good starting point. The memory build-up occurs (AFAIK) as
glm iterates, and the difference between your two runs is likely to be the
number of iterations used.  It seems unlikely that you need 300 000
observations, and I would have thought that a 10% or smaller sample (making
sure all the levels occur) would suffice; if not, it will provide an excellent
starting point. To get that starting point, fit the model to the sample,
predict on the whole dataset and use the predictions for start.
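
The starting-point recipe above can be sketched as follows.  This is Python,
not S-PLUS, on a small synthetic one-covariate Poisson regression; the function
irls_poisson, the data, and the every-tenth-row subsample are all illustrative
assumptions.  The start is supplied as a vector of linear-predictor values, one
per observation, mirroring the advice to predict on the whole dataset and use
that for start.

```python
import math


def irls_poisson(x, y, eta, tol=1e-10, max_iter=50):
    """Fit log(mu) = b0 + b1*x by iteratively reweighted least squares.

    eta is the starting linear predictor, one value per observation.
    Returns ((b0, b1), number of iterations used).
    """
    beta = None
    for it in range(1, max_iter + 1):
        mu = [math.exp(e) for e in eta]  # for the log link the weights equal mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        # Solve the 2x2 weighted least-squares normal equations.
        b1 = (sw * swxz - swx * swz) / (sw * swxx - swx ** 2)
        b0 = (swz - b1 * swx) / sw
        new_beta = (b0, b1)
        eta = [b0 + b1 * xi for xi in x]
        if beta is not None and max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta, it
        beta = new_beta
    raise RuntimeError("IRLS did not converge")


# Synthetic data (hypothetical): deterministic counts with a log-linear trend.
n = 200
x = [i / (n - 1) for i in range(n)]
y = [int(math.exp(0.5 + 1.5 * xi)) + (i % 3) for i, xi in enumerate(x)]

# Full fit from a data-based default start.
cold = [math.log(yi + 0.5) for yi in y]
beta_cold, iters_cold = irls_poisson(x, y, cold)

# 1. Fit on a 10% subsample (every tenth row).
idx = range(0, n, 10)
xs = [x[i] for i in idx]
ys = [y[i] for i in idx]
beta_sub, _ = irls_poisson(xs, ys, [math.log(v + 0.5) for v in ys])

# 2. Predict the linear predictor on the whole dataset.
warm = [beta_sub[0] + beta_sub[1] * xi for xi in x]

# 3. Refit on the whole dataset, starting from those predictions.
beta_warm, iters_warm = irls_poisson(x, y, warm)

print("iterations, default start:", iters_cold)
print("iterations, subsample start:", iters_warm)
```

Both full-data fits should arrive at the same coefficients.  The purpose of the
subsample start is only that fewer iterations are usually needed to get there,
which, per the explanation above, is where the memory goes.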


> Maybe running S+ differently?
>
> Maybe setting some of the options differently, e.g. the "memory" option, which
> is at its default of 2147483647 bytes?  Even after reading the help file it is
> not clear to me what this option does.  Could any other options help?

You can only reduce that one, and you don't have 2G of RAM. It limits the
total memory usage.

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

