s-news
[Top] [All Lists]

Re: Minimizing a function for grouped data - summary

To: <s-news@lists.biostat.wustl.edu>
Subject: Re: Minimizing a function for grouped data - summary
From: "Hersh, Douglas" <Douglas.Hersh@mwra.state.ma.us>
Date: Thu, 4 Jun 2009 12:23:40 -0400
In-reply-to: <C0CF5802DA312049982061E94F592D390B6AFDB9@MWRAMAIL.mwra.net>
References: <C0CF5802DA312049982061E94F592D390B6AFDB9@MWRAMAIL.mwra.net>
Thread-index: AcnbHyiuPZfFfRu1TcyCubHSVxPqXQKDjHqg
Thread-topic: Minimizing a function for grouped data - summary

A better subject for this would have been "Using by() with a custom function"

 

With help from Kirsten Smith at TIBCO support, this is a revised recap of the problem and solution.

 

# data constructed to have breakpoint in a piecewise linear regression = 5

a <- list( group=as.character( rep("A",10) ),

           xcol=as.numeric( c(1,2,3,4,5,6,7,8,9,10) ),

           ycol=as.numeric( c(9,9,9,9,9,8,7,6,5,4) )

         )

 

# data constructed to have breakpoint = 4

b <- list( group=as.character( rep("B",10) ),

           xcol=as.numeric( c(1,2,3,4,5,6,7,8,9,10) ),

           ycol=as.numeric( c(7,7,7,7,6,5,4,3,2,1) )

         )

 

# Combine a and b in a single data frame

ab <- rbind(data.frame(a),data.frame(b))

 

# Plot the data

printer(height=35, width=80)

plot(ab$xcol,ab$ycol, xlab='X', ylab='Y', pch=as.vector(ab$group) )

title("Two datasets A and B with breakpoints")

show.printer()

dev.off()

 

# Bill Venables' function to use with optimize() to find the breakpoint threshold, th

# The round() function is not required - I just use it to get breakpoints of exactly 5 and 4

optFun <- function( xvar, yvar )

          { round( optimize( f=function( th, xvar, yvar )

                               { X <- cbind(xvar, pmax(0, xvar - th))

                                 sum(lsfit(X, yvar)$resid^2)

                               }

                             , interval=range(xvar)

                             , xvar=xvar

                             , yvar=yvar

                           )$minimum

                   , 1

                 )

          }

 

# This is where I try to invoke (without success) optFun() for each group in ab

by( x = ab

    , INDICES = ab$group

    , FUN = optFun

    , xvar = x$xcol

    , yvar = x$ycol

  )

 

# This is the version from TIBCO support that works.

# When using by() with a custom function, the calling syntax is a little different.

# The variable in the FUN= argument is the dataset specified in the x= argument, and then you can call the actual arguments for optFun().

by( x = ab

    , INDICES = ab$group

    , FUN = function(x) optFun(x$xcol, x$ycol)

  )

 

Douglas Hersh

Environmental Quality Department

Massachusetts Water Resource Authority

100 First Ave. Boston MA 02129

phone: (617) 788-4945

fax: (617) 788-4888

email: douglas.hersh@mwra.state.ma.us

http://www.mwra.state.ma.us/harbor/html/bhrecov.htm

 


From: Hersh, Douglas
Sent: Friday, May 22, 2009 4:52 PM
To: s-news@lists.biostat.wustl.edu
Subject: Minimizing a function for grouped data

 

I am trying to apply a technique of Bill Venables

(http://www.biostat.wustl.edu/archives/html/s-news/2000-04/msg00206.html)

for finding break points in a broken line regression.

I need to do this for each group in a data set with multiple groups. I have

tried using the by() function with no success.

 

Would someone please have a look at my example and see if you can create a

single statement that would produce a data frame such the following?

 

group threshold

----- ---------

A     5

B     4

 

For example:

 

> # data constructed to have breakpoint = 5

> a <- list( group=as.character( rep("A",10) ),

+            xcol=as.numeric( c(1,2,3,4,5,6,7,8,9,10) ),

+            ycol=as.numeric( c(9,9,9,9,9,8,7,6,5,4) )

+          )

>

> # data constructed to have breakpoint = 4

> b <- list( group=as.character( rep("B",10) ),

+            xcol=as.numeric( c(1,2,3,4,5,6,7,8,9,10) ),

+            ycol=as.numeric( c(7,7,7,7,6,5,4,3,2,1) )

+          )

>

> # Combine a and b in a single data frame

> ab <- rbind(data.frame(a),data.frame(b))

>

> # Show the data in an ASCII graph

> printer(height=35, width=80)

> plot(ab$xcol,ab$ycol, xlab='X', ylab='Y', pch=as.vector(ab$group) )

> title("Two datasets A and B with breakpoints")

> show.printer()

 

 

                       Two datasets A and B with breakpoints

 

       ......................................................................

       . A      A       A      A      A

       .

       .

     8..                                     A

       .

       .

       . B      B       B      B                    A

       .

       .

     6..                              B                    A

       .

 Y     .                                     B                    A

       .

       .

     4..                                            B                    A

       .

       .

       .                                                   B

       .

       .

     2..                                                          B

       .

       .

       .                                                                 B

       ......................................................................

                2              4             6             8            10

 

                                         X

 

 

> dev.off()

 graphsheet

          2

>

> # Bill Venables' function to use with optimize() to find the breakpoint threshold, th

> fn <- function( th, xvar, yvar )

+       {

+         X <- cbind(xvar, pmax(0, xvar - th))

+         sum(lsfit(X, yvar)$resid^2)

+       }

>

> # breakpoint for a should be 5

> round( optimize( f=fn, interval=range(a$xcol), xvar=a$xcol, yvar=a$ycol )$minimum, 1 )

[1] 5

> # breakpoint for b should be 4

> round( optimize( f=fn, interval=range(b$xcol), xvar=b$xcol, yvar=b$ycol )$minimum, 1 )

[1] 4

>

> # This is where I try to invoke (without success) optimize(fn ... for each ab$group

> by( x = ab

+     , INDICES = ab$group

+     , FUN = fn             # I forgot to include the optimize() function here. See improved version in follow-up above.

+     , xvar = ab$xcol

+     , yvar = ab$ycol

+   )

Warning messages:

  Number of rows of result is not a multiple of vector length (arg 1) in: cbind(..1, ..2, deparse.level = deparse.level)

Problem in specials.x | specials.y: length of longer operand (30) should be a multiple of length of shorter (20)

Use traceback() to see the call stack

> 

 

Thanks you for any help you can offer.

 

- Doug

 

Douglas Hersh

Environmental Quality Department

Massachusetts Water Resource Authority

100 First Ave. Boston MA 02129

phone: (617) 788-4945

fax: (617) 788-4888

email: douglas.hersh@mwra.state.ma.us

http://www.mwra.state.ma.us/harbor/html/bhrecov.htm

 

<Prev in Thread] Current Thread [Next in Thread>
  • Re: Minimizing a function for grouped data - summary, Hersh, Douglas <=