
Problems with predict() within rpart package under S-Plus 6.2

To: <s-news@lists.biostat.wustl.edu>
Subject: Problems with predict() within rpart package under S-Plus 6.2
From: <Peter.Caley@csiro.au>
Date: Wed, 17 Nov 2004 10:06:16 +1100
Dear list

I have run into a couple of problems running rpart() under S-Plus 6.2.
Consider the following model:

> temp.model
n= 369

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 369 83 1 (0.2249322 0.7750678)
   2) as.factor(Q7.02)=N 95 42 1 (0.4421053 0.5578947)
     4) as.factor(Q3.06)=N 20 3 0 (0.8500000 0.1500000) *
     5) as.factor(Q3.06)=Y 75 25 1 (0.3333333 0.6666667) *
   3) as.factor(Q7.02)=Y 274 41 1 (0.1496350 0.8503650)
     6) as.factor(Q3.01)=N 98 32 1 (0.3265306 0.6734694)
      12) as.factor(Q1.01)=Y 14 4 0 (0.7142857 0.2857143) *
      13) as.factor(Q1.01)=N 84 22 1 (0.2619048 0.7380952) *
     7) as.factor(Q3.01)=Y 176 9 1 (0.0511363 0.9488636) *

and a single row of a dataframe:

> Training[i,  ]
              Species Q1.01 Q1.02 Q1.03 Q2.01 Q2.02 Q2.03 Q2.04 Q3.01
X2 Abies nordmanniana     Y     Y     N     1     0     N     Y     Y

   Q3.02 Q3.03 Q3.04 Q3.05 Q4.01 Q4.02 Q4.03 Q4.04 Q4.05 Q4.06 Q4.07
X2     Y     N     N     N     N     N     N     N    NA    NA    NA

   Q4.08 Q4.09 Q4.1 Q4.11 Q4.12 Q5.01 Q5.02 Q5.03 Q5.04 Q6.01 Q6.02
X2     N    NA    N    NA     N     N     N    NA    NA    NA    NA

   Q6.03 Q6.04 Q6.05 Q6.06 Q6.07 Q7.01 Q7.02 Q7.03 Q7.04 Q7.05 Q7.06 Q7.07
X2     Y     N     N     Y     0    NA     N     Y     N     Y     N     N

   Q7.08 Q8.01 Q8.02 Q8.03 Q8.04 Q8.05 Q3.06 Q7.09 Q6.07.Factor
X2     N    NA    NA    NA    NA    NA     Y     N        short

   Weed.Class Binary.Outcome
X2          0              0
  
First, predict() doesn't like predicting for a single observation:

> predict(object = temp.model, newdata = Training[i,  ], type = "prob")
Problem in dimnames(pred) <- list(names(where), ylevels): Cannot have
dimnames for nonarray 
Use traceback() to see the call stack
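
The error message is consistent with a matrix being subscripted down to a single row and silently collapsing to a plain vector before dimnames are assigned. That reading is an assumption based only on the message text, not on the S-Plus 6.2 rpart source. The sketch below uses a hypothetical stand-in matrix, yprob, filled with two of the terminal-node probabilities printed above, to show the behaviour and the usual drop = FALSE guard:

```r
# Hypothetical stand-in for the per-node class probability matrix.
# Values are copied from nodes 4 and 5 of the printed tree.
yprob <- rbind(c(0.8500000, 0.1500000),
               c(0.3333333, 0.6666667))

# Selecting one row drops the dim attribute: the result is a vector
one.row.dropped <- yprob[2, ]
is.null(dim(one.row.dropped))

# drop = FALSE keeps a 1 x 2 matrix, so dimnames can still be assigned
one.row.kept <- yprob[2, , drop = FALSE]
dimnames(one.row.kept) <- list("X2", c("0", "1"))
one.row.kept
```

Selecting two or more rows never triggers the collapse, which would fit the observation that only the single-row call fails.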

However, predict() will happily predict for two or more rows:

> predict(object = temp.model, newdata = rbind(Training[i,  ],
Training[i,  ]), type = "prob")
       0    1 
 X2 0.85 0.15
X21 0.85 0.15

The predictions are wrong, though: an observation with Q7.02="N"
and Q3.06="Y" should end up at terminal node 5, with fitted probabilities
of (0.3333333 0.6666667), not the returned (0.85, 0.15), which belong to
node 4 (Q7.02="N" and Q3.06="N").
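
The expected node can be checked by hand without rpart at all. The following is a minimal sketch that hard-codes the splits and terminal-node probabilities from the printed tree; the function expected.node and the matrix node.prob are hypothetical helpers introduced for illustration, and the sketch ignores missing values and surrogate splits.

```r
# Hand-copied version of the splits printed in temp.model.
# Returns the terminal node number for one observation.
expected.node <- function(Q7.02, Q3.06, Q3.01, Q1.01) {
  if (Q7.02 == "N") {
    if (Q3.06 == "N") {
      4
    } else {
      5
    }
  } else if (Q3.01 == "Y") {
    7
  } else if (Q1.01 == "Y") {
    12
  } else {
    13
  }
}

# Terminal-node class probabilities, copied from the printed tree
node.prob <- rbind("4"  = c(0.8500000, 0.1500000),
                   "5"  = c(0.3333333, 0.6666667),
                   "7"  = c(0.0511363, 0.9488636),
                   "12" = c(0.7142857, 0.2857143),
                   "13" = c(0.2619048, 0.7380952))
colnames(node.prob) <- c("0", "1")

# Values taken from row X2 of Training
node <- expected.node(Q7.02 = "N", Q3.06 = "Y", Q3.01 = "Y", Q1.01 = "Y")
node
node.prob[as.character(node), ]
```

For row X2 this lands on node 5, not node 4.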

Running the same code in R 2.0 gives the seemingly correct answer:

> predict(object=temp.model,newdata=Training[i,],type="prob")
          0         1
2 0.3333333 0.6666667
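
One possible explanation for the difference, offered as an assumption rather than anything established in this post, is the as.factor() call inside the model formula. If the levels are recomputed from newdata at prediction time, a column holding only "Y" gets "Y" as its first and only level, and its integer code then matches the code that "N" had when the tree was fitted. That would send an observation with Q3.06="Y" down the Q3.06=N branch, which is where (0.85, 0.15) comes from. Whether S-Plus 6.2 actually behaves this way, and whether the Training columns are stored as character or factor, is not stated here. The sketch below only illustrates the coding mismatch in base R; full.levels is a hypothetical name.

```r
# Levels seen when the tree was fitted (from the printed splits)
full.levels <- c("N", "Y")

# as.factor() on a lone value only knows about that value
single <- as.factor("Y")
levels(single)
as.integer(single)    # same integer code that "N" had at fit time

# Supplying the full level set keeps the coding consistent
fixed <- factor("Y", levels = full.levels)
as.integer(fixed)
```

If this is the cause, converting the predictors to factors in the data frame before fitting, instead of calling as.factor() in the formula, may be worth trying.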

Can anybody help with the S side of things? [I know what the R users will
say!]

cheers

Peter

*********************************************************************
Dr Peter Caley
CSIRO Entomology
GPO Box 1700, Canberra,
ACT 2601
Email: peter.caley@csiro.au
Ph: +61 (0)2 6246 4076   Fax: +61 (0)2 6246 4000
*********************************************************************
