
Re: sorting a data frame

To: "Data Analytics Corp." <dataanalytics@earthlink.net>
Subject: Re: sorting a data frame
From: David L Lorenz <lorenz@usgs.gov>
Date: Tue, 29 Jan 2008 09:16:33 -0600
Cc: s-news@lists.biostat.wustl.edu, s-news-owner@lists.biostat.wustl.edu
In-reply-to: <479F3EA2.8050206@earthlink.net>

Walt,
  The data appear to be sorted incorrectly because the values in the yr column are not exact integers. Computing (y - floor(y))*10000 in floating point leaves each yr value off from a whole number by a tiny amount. This is what I get when I subtract 2007 from x$yr:
> x$yr - 2007
 [1]  1.000000e+000  3.183231e-012  3.183231e-012  3.183231e-012  3.183231e-012 -5.684342e-012 -5.684342e-012 -5.684342e-012
 [9] -5.684342e-012 -5.684342e-012
  As you can see, the 2007 rows with a month greater than 7 have a yr slightly less than 2007, and the 2007 rows with a month less than 8 have a yr slightly larger than 2007. The sort is therefore correct for the values actually stored. You need to round the yr values when you compute them.
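  A minimal sketch of the rounding fix, written in R syntax on the assumption that the same base functions (round, floor, order) behave the same way in your S-PLUS session. order() stands in for sort.col() here only so the example is self-contained, and x.sorted is just an illustrative name.

```r
# Rebuild the example from the original post.
y <- c(1.2008, 4.2007, 5.2007, 6.2007, 7.2007, 8.2007, 9.2007,
       10.2007, 11.2007, 12.2007)

# round() removes the tiny floating-point residue left by the subtraction,
# so every 2007 row holds exactly 2007.
x <- data.frame(yr = round((y - floor(y)) * 10000), month = floor(y))

# Sort by yr, then by month within yr (assumption: order() used in place
# of sort.col() from the original post).
x.sorted <- x[order(x$yr, x$month), ]
print(x.sorted)
```

  With yr rounded, the 2007 rows tie on yr and are then ordered by month, with the single 2008 row last. Applying sort.col(x, "@ALL", 1:2) to the rounded data frame should give the same ordering, though that call is not run here.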
Dave



"Data Analytics Corp." <dataanalytics@earthlink.net>
Sent by: s-news-owner@lists.biostat.wustl.edu

01/29/2008 08:56 AM

To: s-news@lists.biostat.wustl.edu
cc:
Subject: [S] sorting a data frame

Good morning,

I posted a question yesterday regarding sorting a data frame.  The
problem is that the sort order is not correct when I use sort.col.  I
received many responses, all saying basically the same thing - I must
have a factor.  I actually checked this before posting the query and all
was numeric.  So let me try again but be more explicit this time,
because I'm very puzzled.

I have a client file that has the dates as 1.2008, 4.2007, etc.  The
commands and example I used along with the as.numeric() checks are
below.  As you can see, the sort is not correct.  I still don't
understand why.  Any hints?

> y <- c(1.2008, 4.2007, 5.2007, 6.2007, 7.2007, 8.2007, 9.2007, 10.2007, 11.2007, 12.2007)
> y
[1]  1.2008  4.2007  5.2007  6.2007  7.2007  8.2007  9.2007 10.2007 11.2007 12.2007
> is.numeric(y)
[1] T
> x <- data.frame(yr = (y - floor(y))*10000, month = floor(y))
> x
     yr month
 1 2008     1
 2 2007     4
 3 2007     5
 4 2007     6
 5 2007     7
 6 2007     8
 7 2007     9
 8 2007    10
 9 2007    11
10 2007    12
> is.numeric(x$month)
[1] T
> sort.col(x, "@ALL", 1:2)
     yr month
 6 2007     8
 7 2007     9
 8 2007    10
 9 2007    11
10 2007    12
 2 2007     4
 3 2007     5
 4 2007     6
 5 2007     7
 1 2008     1
>


Any help is appreciated.

Walt Paczkowski