s-news
[Top] [All Lists]

Minimizing a function for grouped data

To: <s-news@lists.biostat.wustl.edu>
Subject: Minimizing a function for grouped data
From: "Hersh, Douglas" <Douglas.Hersh@mwra.state.ma.us>
Date: Fri, 22 May 2009 16:52:03 -0400
Thread-index: AcnbHyiuPZfFfRu1TcyCubHSVxPqXQ==
Thread-topic: Minimizing a function for grouped data

I am trying to apply a technique of Bill Venables

(http://www.biostat.wustl.edu/archives/html/s-news/2000-04/msg00206.html)

for finding break points in a broken line regression.

I need to do this for each group in a data set with multiple groups. I have

tried using the by() function with no success.

 

Would someone please have a look at my example and see if you can create a

single statement that would produce a data frame such the following?

 

group threshold

----- ---------

A     5

B     4

 

For example:

 

> # data constructed to have breakpoint = 5

> a <- list( group=as.character( rep("A",10) ),

+            xcol=as.numeric( c(1,2,3,4,5,6,7,8,9,10) ),

+            ycol=as.numeric( c(9,9,9,9,9,8,7,6,5,4) )

+          )

>

> # data constructed to have breakpoint = 4

> b <- list( group=as.character( rep("B",10) ),

+            xcol=as.numeric( c(1,2,3,4,5,6,7,8,9,10) ),

+            ycol=as.numeric( c(7,7,7,7,6,5,4,3,2,1) )

+          )

>

> # Combine a and b in a single data frame

> ab <- rbind(data.frame(a),data.frame(b))

>

> # Show the data in an ASCII graph

> printer(height=35, width=80)

> plot(ab$xcol,ab$ycol, xlab='X', ylab='Y', pch=as.vector(ab$group) )

> title("Two datasets A and B with breakpoints")

> show.printer()

 

 

                       Two datasets A and B with breakpoints

 

       ......................................................................

       . A      A       A      A      A

       .

       .

     8..                                     A

       .

       .

       . B      B       B      B                    A

       .

       .

     6..                              B                    A

       .

 Y     .                                     B                    A

       .

       .

     4..                                            B                    A

       .

       .

       .                                                   B

       .

       .

     2..                                                          B

       .

       .

       .                                                                 B

       ......................................................................

                2              4             6             8            10

 

                                         X

 

 

> dev.off()

 graphsheet

          2

>

> # Bill Venables' function to use with optimize() to find the breakpoint threshold, th

> fn <- function( th, xvar, yvar )

+       {

+         X <- cbind(xvar, pmax(0, xvar - th))

+         sum(lsfit(X, yvar)$resid^2)

+       }

>

> # breakpoint for a should be 5

> round( optimize( f=fn, interval=range(a$xcol), xvar=a$xcol, yvar=a$ycol )$minimum, 1 )

[1] 5

> # breakpoint for b should be 4

> round( optimize( f=fn, interval=range(b$xcol), xvar=b$xcol, yvar=b$ycol )$minimum, 1 )

[1] 4

>

> # This is where I try to invoke (without success) optimize(fn ... for each ab$group

> by( x = ab

+     , INDICES = ab$group

+     , FUN = fn

+     , xvar = ab$xcol

+     , yvar = ab$ycol

+   )

Warning messages:

  Number of rows of result is not a multiple of vector length (arg 1) in: cbind(..1, ..2, deparse.level = deparse.level)

Problem in specials.x | specials.y: length of longer operand (30) should be a multiple of length of shorter (20)

Use traceback() to see the call stack

> 

 

Thanks you for any help you can offer.

 

- Doug

 

Douglas Hersh

Environmental Quality Department

Massachusetts Water Resource Authority

100 First Ave. Boston MA 02129

phone: (617) 788-4945

fax: (617) 788-4888

email: douglas.hersh@mwra.state.ma.us

http://www.mwra.state.ma.us/harbor/html/bhrecov.htm

 

<Prev in Thread] Current Thread [Next in Thread>
  • Minimizing a function for grouped data, Hersh, Douglas <=