Hi everybody,
I’m estimating a binomial GLM on a large dataset
(about 2.5M records, 100 variables). The model uses about 20 of those variables,
many of which are categorical, so it has (at the moment) just
over 100 parameters. SPLUS seems to estimate the model just fine, although of
course it takes a while, and it produces the (sensible-looking) in-sample fits
without complaints. However, when I try to generate out-of-sample predictions
using predict (where newdata = OutData has about 500k records), I get the
dreaded “unable to obtain requested dynamic memory” error.
Traceback follows:
---
15: eval(action, sys.parent())
14: doErrorAction("Problem in bd.internal.exec.node(engine.class =
    \"com.insightful.miner.BDLManager$BDLSplusScri..:
    BDLManager$BDLSplusScriptEngineNode (0): Problem in
    model.matrix.default(args$terms.object, IM$in1, args$contrasts.arg,
    args$xlevels): Unable to obtain requested dynamic memory",
13: stop(ret$error)
12: bd.internal.exec.node(engine.class =
    "com.insightful.miner.BDLManager$BDLSplusScriptEngineNode",
    node.props = node.props, inputs = in.bdFrame.lst, num.outputs =
11: list(
10: NULL
9: bd.block.apply(data, FUN = bd.internal.model.matrix.script,
    test = F, one.block = F, sample = F)
8: bd.internal.model.matrix(terms(pform), mf,
    contrasts = object$contrasts, xlevels = object$xlevels)
7: predict.bdGlm(sub.glm, OutData, type = "response")
6: predict(sub.glm, OutData, type = "response")
5: eval(i, local)
4: source(auto.print = auto.print, exprs = substitute(exprs.literal))
3: script.run(exprs.literal = {
2: eval(expression(script.run(exprs.literal = {
1:
Message: Problem in bd.internal.exec.node(engine.class =
    "com.insightful.miner.BDLManager$BDLSplusScri..:
    BDLManager$BDLSplusScriptEngineNode (0): Problem in
    model.matrix.default(args$terms.object, IM$in1,
    args$contrasts.arg, args$xlevels): Unable to obtain requested dynamic memory
---
I’m at a loss to explain this, since it is using
predict.bdGlm, and my understanding is that this is exactly the limitation that
the bigdata library is supposed to address. Clearly it’s able to produce
such results on a larger data set (namely, the sample used to estimate the
model), so why would it choke on a smaller data set?
I’m running SPLUS 8.0 under Windows XP. I have 2 GB of RAM
and a page file of about 3 GB, although since it’s supposed to be using
bigdata routines, I’m not sure how much this matters. I also have about 100 GB
of free disk space.
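
For scale, here is a rough back-of-envelope sketch (plain Python, arithmetic only) of how large a dense model matrix would be if it were built in one piece. The 100 columns (one per parameter) and 8 bytes per value are my assumptions, and I don't know how the bigdata library actually splits the work into blocks, so treat the numbers as an order-of-magnitude guide rather than what SPLUS really allocates.

```python
# Rough size of a dense, double-precision model matrix held in memory at once.
# Assumptions (not taken from SPLUS itself): one column per model parameter
# (about 100) and 8 bytes per stored value.
BYTES_PER_DOUBLE = 8


def dense_matrix_mib(n_rows: int, n_cols: int) -> float:
    """Return the size in MiB of an n_rows x n_cols matrix of doubles."""
    return n_rows * n_cols * BYTES_PER_DOUBLE / 2**20


fit_mib = dense_matrix_mib(2_500_000, 100)   # estimation sample
pred_mib = dense_matrix_mib(500_000, 100)    # out-of-sample data (OutData)

print(f"estimation sample: {fit_mib:,.0f} MiB")
print(f"prediction sample: {pred_mib:,.0f} MiB")
```

Under these assumptions the prediction matrix is one fifth the size of the estimation matrix, which is why the failure on the smaller set surprises me.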
I’m going to try chopping down the number of
variables in the dataset to see if that helps, but I feel like I
shouldn’t have to. Any ideas? I’m hoping somebody has run into
this problem before – it doesn’t seem like an unusual situation. I’ve
searched the archives but couldn’t find any guidance.
Thanks in advance, and hope I can return the favor
someday,
Marc Pelath