
Re: [Int_snews] Re: Unique for matrices

To: Anne York <anne.york@noaa.gov>
Subject: Re: [Int_snews] Re: Unique for matrices
From: Bill Dunlap <bill@statsci.com>
Date: Tue, 28 Nov 2000 13:28:23 -0800 (PST)
Cc: Phil Corbett <stpbk@warwick.ac.uk>, S News <s-news@wubios.wustl.edu>
In-reply-to: <Pine.GSO.4.05.10011281149140.12915-100000@ofis450a.akctr.noaa.gov>
On Tue, 28 Nov 2000, Anne York wrote:

> Here is one  method:
> Suppose your matrix is called dum:
> 1. convert the matrix into a vector of strings:
> dum.strings <- apply(dum, 1, paste, collapse="")

This is about what I'd recommend, but two changes can make it
more reliable and faster.

(reliability) I like to use "\001" as the separator so that rows such
        as 12 3 and 1 23 are not confused (with an empty separator both
        paste to "123").  This assumes that "\001", control-A, does not
        occur in the data itself.
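
As a small illustration of the collision described above (the two-row
matrix here is made up for the example, and the code is written for R):

```r
# Two distinct rows that collide when pasted with an empty separator.
dum <- rbind(c(12, 3), c(1, 23))

keys.empty <- apply(dum, 1, paste, collapse = "")      # no separator
keys.ctrl  <- apply(dum, 1, paste, collapse = "\001")  # control-A separator

length(unique(keys.empty))  # 1: both rows become "123"
length(unique(keys.ctrl))   # 2: the separator keeps the rows distinct
```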

(speed) Paste the columns together in one call to paste instead
        of calling paste on each row.  E.g.,
                dum.list <- as.list(data.frame(dum))
                names(dum.list)<-NULL
                dum.strings <- do.call("paste", c(list(sep="\001"), dum.list))
        (We take off the names of the data frame so paste() doesn't
        interpret them as argument names).
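
A quick check, on a made-up numeric matrix, that the row-by-row and
column-wise approaches build the same key strings (a sketch for R; the
variable names keys.by.row and keys.by.col are only for illustration):

```r
# Small example matrix: rows are (1,4), (2,5), (1,4), (3,6).
dum <- matrix(c(1, 2, 1, 3,
                4, 5, 4, 6), ncol = 2)

# Row-wise: paste() is called once per row.
keys.by.row <- apply(dum, 1, paste, collapse = "\001")

# Column-wise: one vectorized paste() call over all columns.
dum.list <- as.list(data.frame(dum))
names(dum.list) <- NULL
keys.by.col <- do.call("paste", c(list(sep = "\001"), dum.list))

identical(keys.by.row, keys.by.col)
```

The column-wise form makes one call regardless of the number of rows,
which is where the speed difference comes from on large matrices.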

The complete function would be

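unique.rows1 <- function(dum) {
        # Build one key string per row, pasting the columns in a single call.
        dum.list <- as.list(data.frame(dum))
        names(dum.list) <- NULL
        dum.strings <- do.call("paste", c(list(sep = "\001"), dum.list))
        # Keep the first occurrence of each key; drop = FALSE keeps the
        # result two-dimensional even when only one row is selected.
        dum[match(unique(dum.strings), dum.strings), , drop = FALSE]
}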
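
The row-selection step can be checked on a toy example. The sketch below
(R, with made-up data) also shows that match(unique(x), x) picks the same
rows as which(!duplicated(x)), an alternative idiom that is not part of
the function above:

```r
# Five rows, three of them distinct.
dum <- rbind(c(1, 4), c(2, 5), c(1, 4), c(3, 6), c(2, 5))
keys <- paste(dum[, 1], dum[, 2], sep = "\001")

# Index of the first occurrence of each distinct key.
first.by.match <- match(unique(keys), keys)   # 1 2 4
first.by.dup   <- which(!duplicated(keys))    # 1 2 4

# The distinct rows, in order of first appearance.
dum[first.by.match, , drop = FALSE]
```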

You can add a column of counts with the following function

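table.rows <- function(dum) {
        dum.list <- as.list(data.frame(dum))
        names(dum.list) <- NULL
        dum.strings <- do.call("paste", c(list(sep = "\001"), dum.list))
        # Fix the level order to first occurrence so counts line up with rows.
        tab <- table(factor(dum.strings, levels = unique(dum.strings)))
        # as.vector() reduces the table to a plain vector of counts, so
        # cbind() adds a single column.
        cbind(dum[match(unique(dum.strings), dum.strings), , drop = FALSE],
              count = as.vector(tab))
}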
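
The reason for fixing the factor levels can be seen on a toy vector of
keys (made up for this sketch, R syntax). Without explicit levels,
table() orders the counts by sorted key, which need not match the order
in which the unique rows are returned:

```r
keys <- c("b", "a", "b", "c", "b")

# Default: levels are sorted (a, b, c).
counts.sorted <- as.vector(table(keys))                                 # 1 3 1

# Levels in order of first occurrence (b, a, c), matching unique(keys).
counts.first <- as.vector(table(factor(keys, levels = unique(keys))))   # 3 1 1
```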
                
----------------------------------------------------------------------------
Bill Dunlap                                      22461 Mt Vernon-Big Lake Rd
Data Analysis Products Div. of MathSoft, Inc.    Mount Vernon, WA 98274
bill@statsci.com                                 360-428-8146

"All statements in this message represent the opinions of the author and do
not necessarily reflect MathSoft policy or position."

