
Re: MASS4 questions

To: <nj7w@virginia.edu>
Subject: Re: MASS4 questions
From: ripley@stats.ox.ac.uk
Date: Fri, 28 Jun 2002 07:46:44 +0100 (BST)
Cc: <s-news@lists.biostat.wustl.edu>
In-reply-to: <Pine.A41.4.32.0206271803270.44990-100000@node1.unix.Virginia.EDU>

On Thu, 27 Jun 2002 nj7w@virginia.edu wrote:

> I am using MASS4 package for doing classification.

There is no MASS4 package: I presume you mean the *class* library section.
It contains lots of methods for classification, but you are not using one.
It has two methods for SOM, and you are not using the preferred one.

> Facing two problems:
> 1) Have a test data frame, named data (9*3)
> and wish to classify it into 3 groups using knn1.
>
> The code is:
> > data
>              c1    c2    c3
>       r1    1.0   1.0   1.0
>       r2    2.0   1.5   3.0
>       r3    1.2   0.9   1.0
>       r4   20.0  14.0  15.0
>       r5   21.0  13.0  16.0
>       r6   22.0  14.0  14.5
>       r7 1001.0 105.0 100.0
>       r8 1000.0  88.0  96.0
>       r9  999.0 100.0 101.0
>
> > grid <- somgrid(3,1,topo = "hexagonal")
> > test.som<-SOM(data,grid)
> > bins <- as.numeric(knn1(test.som$code, data, 0:2))
>
> The classification of the first two groups is
> variable, as repeating the knn1 statement gives different
> classification for 1st two groups.

You are using SOM, an unsupervised method that does not know about the
groups, rather than knn1.  And yes, it is random (the book MASS4 does say
so).
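
To make the distinction concrete, here is a minimal sketch, not a
recommended analysis.  It assumes an R-compatible session with the class
library attached.  The group labels, the training/test split and the
object names (dat, grp, train.rows, test.rows) are invented for
illustration and appear nowhere in the original question.  knn1 is given
the known class of each training case; SOM is given only the data.

```r
library(class)  # provides knn1(), SOM() and somgrid()

# The nine observations from the question, as a numeric matrix.
dat <- matrix(c(   1.0,   1.0,   1.0,
                   2.0,   1.5,   3.0,
                   1.2,   0.9,   1.0,
                  20.0,  14.0,  15.0,
                  21.0,  13.0,  16.0,
                  22.0,  14.0,  14.5,
                1001.0, 105.0, 100.0,
                1000.0,  88.0,  96.0,
                 999.0, 100.0, 101.0),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste("r", 1:9, sep = ""),
                              c("c1", "c2", "c3")))

## Supervised: knn1() needs the known class of every training case.
## ASSUMPTION: these labels and this split are invented for the sketch.
grp <- factor(rep(c("low", "medium", "high"), each = 3),
              levels = c("low", "medium", "high"))
train.rows <- c(1, 2, 4, 5, 7, 8)   # cases whose class is treated as known
test.rows  <- c(3, 6, 9)            # cases to be classified

pred <- knn1(dat[train.rows, ], dat[test.rows, ], grp[train.rows])
print(pred)

## Unsupervised: SOM() never sees grp, and it draws random numbers, so
## two runs are only guaranteed to agree if the seed is reset before each.
gr <- somgrid(3, 1, topo = "hexagonal")
set.seed(1)
som.a <- SOM(dat, gr)
set.seed(1)
som.b <- SOM(dat, gr)
print(all(som.a$codes == som.b$codes))
```

With these invented labels, rows r3, r6 and r9 are assigned to low, medium
and high respectively, and that does not change from run to run.  The SOM
codes, by contrast, depend on the random number stream.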

>
> When I look at the test.som$codes, I got:
> > test.som$codes
>                 c1       c2       c3
>       [1,] 352.1387 327.4666 340.0801
>       [2,] 352.1387 327.4666 340.0801
>       [3,] 364.9670 384.6117 270.0031
> implying that the representative vector is very similar in
> first two cases, and hence the misclassification by knn1.
>
> Shouldn't they be different, as clearly the data set has three
> types of vectors, having low, medium and high values, or am I
> missing something?

You clearly have not read up on the theory of SOM, nor have you looked at
the examples in the book which this illustrates.  You are using an
inappropriate method with completely inappropriate parameters.


> 2) This is more a problem of understanding:
>
> If we wish to classify data set of dimensions X*Y
> and specify the grid as m*n, then SOM gives the
> matrix of representatives (m*n * Y).
>
> Clearly, X >= m*n
> but there should not be any dependency of m*n on Y
> (like m*n >=Y etc.)
>
> So why is this code failing?
> > y<-matrix(rnorm(1000),nrow=100,byrow=T)
> > gr1 <- somgrid(4,2,topo = "hexagonal")
> > SOM(y,gr1)
>
> It gives: "Terminating S Session: Bus Error signal"

I have no idea.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

