I am trying to apply a technique of Bill Venables
(http://www.biostat.wustl.edu/archives/html/s-news/2000-04/msg00206.html)
for finding break points in a broken line
regression.
I need to do this for each group in a data set with
multiple groups. I have
tried using the by() function with no success.
Would someone please have a look at my example and
see if you can create a
single statement that would produce a data frame
such the following?
group threshold
----- ---------
A 5
B 4
For example:
> # data constructed to have breakpoint = 5
> a <- list( group=as.character(
rep("A",10) ),
+ xcol=as.numeric(
c(1,2,3,4,5,6,7,8,9,10) ),
+ ycol=as.numeric( c(9,9,9,9,9,8,7,6,5,4)
)
+ )
>
> # data constructed to have breakpoint = 4
> b <- list( group=as.character(
rep("B",10) ),
+ xcol=as.numeric(
c(1,2,3,4,5,6,7,8,9,10) ),
+ ycol=as.numeric( c(7,7,7,7,6,5,4,3,2,1)
)
+ )
>
> # Combine a and b in a single data frame
> ab <- rbind(data.frame(a),data.frame(b))
>
> # Show the data in an ASCII graph
> printer(height=35, width=80)
> plot(ab$xcol,ab$ycol, xlab='X', ylab='Y',
pch=as.vector(ab$group) )
> title("Two datasets A and B with
breakpoints")
> show.printer()
Two datasets A and B with breakpoints
......................................................................
. A A A A A
.
.
8.. A
.
.
. B B B B
A
.
.
6..
B A
.
Y .
B A
.
.
4.. B
A
.
.
. B
.
.
2.. B
.
.
. B
......................................................................
2 4
6 8 10
X
> dev.off()
graphsheet
2
>
> # Bill Venables' function to use with
optimize() to find the breakpoint threshold, th
> fn <- function( th, xvar, yvar )
+ {
+ X <- cbind(xvar, pmax(0, xvar - th))
+ sum(lsfit(X, yvar)$resid^2)
+ }
>
> # breakpoint for a should be 5
> round( optimize( f=fn, interval=range(a$xcol),
xvar=a$xcol, yvar=a$ycol )$minimum, 1 )
[1] 5
> # breakpoint for b should be 4
> round( optimize( f=fn, interval=range(b$xcol),
xvar=b$xcol, yvar=b$ycol )$minimum, 1 )
[1] 4
>
> # This is where I try to invoke (without
success) optimize(fn ... for each ab$group
> by( x = ab
+ , INDICES = ab$group
+ , FUN = fn
+ , xvar = ab$xcol
+ , yvar = ab$ycol
+ )
Warning messages:
Number of rows of result is not a multiple of
vector length (arg 1) in: cbind(..1, ..2, deparse.level = deparse.level)
Problem in specials.x | specials.y: length of longer
operand (30) should be a multiple of length of shorter (20)
Use traceback() to see the call stack
>
Thanks you for any help you can offer.
- Doug
Douglas Hersh
Environmental Quality Department
Massachusetts
Water Resource Authority
100
First Ave. Boston
MA 02129
phone: (617) 788-4945
fax: (617) 788-4888
email: douglas.hersh@mwra.state.ma.us
http://www.mwra.state.ma.us/harbor/html/bhrecov.htm