Does anyone know of an S+ implementation of the
INDSCAL multidimensional scaling algorithm described
by Carroll and Chang in Psychometrika in 1970?
The input is an n * n* m array consisting of the m
matrices of pairwise differences/distances between n
stimuli (each matrix representing the comparisons made
by one of m subjects). The output is a set of n
points in Euclidean r space representing the stimuli,
and a set of m weights representing the weight given
to each dimension by each subject.
At the bottom of this message is my somewhat
amateurish /student attempt at a function (Win 98, S+
2000) to perform the algorithm. Something must be
wrong at least with the scaling (eg is it correct to
set the sum of squares of each of the m scalar
products matrices to equal one) if not something more
fundamental because, although the results look
plausible on a plot, the resulting weights totally
fail to have the properties described by Carroll and
Chang namely representing by their distance from the
origin the proportion of each subject's variation
explained by the multidimensional scaling procedure.
Very grateful for any pointers from those more
mathematically and S+ skilled than myself both on the
fundamentals and also any pointers on vectorising some
fairly inelegant code. Hope I'm not wasting people's
time with an obvious mistake.
See the following contrived fuel.frame example of
usage:
> attach(fuel.frame)
> names(fuel.frame)
[1] "Weight" "Disp." "Mileage" "Fuel" "Type"
> fuel.d_dist(scale(cbind(Weight,Disp.,Mileage,Fuel)))
> mean(fuel.d)
[1] 2.424668
> fuel.d2_rep(fuel.d,15)+rnorm(60*59/2*15,4,2)
> fuel.da_array(0,dim=c(60,60,15))
> for (i in 1:15){
+ fuel.da[,,i]_dist2full(
+ fuel.d2[((i-1)*60*59/2+1):(i*60*59/2)],n=60)}
> #
> # ie array with 15 different impressions of
> # the differences between pairs of vehicles.
> #
> fuel.ind_indscal(fuel.da)
[1] 0.00004494675
[1] 3.898503e-008
[1] 1.055661e-008
[1] 2.630077e-009
[1] 7.865586e-010
[1] 2.505446e-010
[1] 8.252098e-011
> par(mfrow=c(2,2))
> plot(fuel.ind$points)
> plot(fuel.ind$W,
+ xlim=c(-max(fuel.ind$W[,1]),max(fuel.ind$W[,1]))
+ ,ylim=c(-max(fuel.ind$W[,2]),max(fuel.ind$W[,2])))
> abline(h=0,v=0)
based on the following functions:
> dist2full
function(dis, n = attr(dis, "Size"))
{
full <- matrix(0, n, n)
full[lower.tri(full)] <- dis
full + t(full)
}
> indscal
function(d, r = 2, add = T, maxit = 50, print.iter =
T, workings = F)
{
n <- dim(d)[1]
m <- dim(d)[3]
b <- array(0, dim(d))
ac <- rep(0, m)
W <- matrix(0, m, r)
change <- 1 #
#
# Uses cmdscale to estimate starting position:
#
if(add) {
XL <- cmdscale(apply(d, 1:2, median), k = r, add =
T)$points
}
else {
XL <- cmdscale(apply(d, 1:2, median), k = r)
}
XR <- XL #
#
# Borrowed code from cmdscale follows to calculate
additive constant
# and convert distance to scalarproducts matrices:
#
storage.mode(d) <- "single"
storage.mode(n) <- "integer"
if(add) {
ac <- apply(d, 3, function(x)
{
.Fortran("addc",
x,
dim(x)[1],
ac = single(1))$ac
}
)
}
for(i in 1:m) {
b[, , i] <- .Fortran("sclpro",
x = d[, , i],
n,
single(n),
as.single(ac[i]))$x
}
unscaled.b <- b #
#
# the next line of code scales the scalar product
matrices
# so they each have a sum of squares equal to one.
# This stops individual subjects dominating the
derived
# dimensions.
#
b <- array(apply(b, 3, function(x)
{
(x)/sqrt(sum(x^2))
}
), dim = dim(b)) #
#
# now we rearrange scalar products array into the Z*
and Z**
# matrices described by Carroll and Chang.
# (?check Z2star?)
#
Zstar <- t(matrix(b, nrow = n^2))
Z2star <- matrix(b, nrow = n * m, byrow = T) #
#
# Now we commence the iterative procedure:
#
reps <- 0
while(change > 10^(-10)) {
reps <- reps + 1
G <- kronecker(XL[, 1], XR[, 1])
if(r > 1) {
for(i in 2:r) {
G <- cbind(G, kronecker(XL[, i], XR[, i]))
}
}
tmp <- Zstar %*% G %*% solve(t(G) %*% G)
change <- sum((W - tmp)^2)
W <- tmp
L <- kronecker(W[, 1], XR[, 1])
if(r > 1) {
for(i in 2:r) {
L <- cbind(L, kronecker(W[, i], XL[, i]))
}
}
XR <- t(solve(t(L) %*% L) %*% t(L) %*% Z2star)
R <- kronecker(W[, 1], XR[, 1])
if(r > 1) {
for(i in 2:r) {
R <- cbind(R, kronecker(W[, i], XR[, i]))
}
}
XL <- t(Z2star) %*% R %*% solve(t(R) %*% R)
if(print.iter) {
print(change)
}
if(reps > maxit) {
stop("process failed to converge")
}
}
#
#
# Now we recompute W based on XL rather than XR:
#
G <- kronecker(XL[, 1], XL[, 1])
if(r > 1) {
for(i in 2:r) {
G <- cbind(G, kronecker(XL[, i], XL[, i]))
}
}
W <- Zstar %*% G %*% solve(t(G) %*% G) #
finalW <- W
finalXR <- XR #
#
# Finally, we rescale the final W and XR:
# Something must be wrong, with the scaling if not
with
# the whole thing, because the distance from W to the
origin
# should be approximately the %age of variation in the
# scalar product matrix and it comes out way too low.
#
s2 <- diag(var(XR))
XR <- scale(XR)
W <- t(t(W) * s2)
if(workings) {
list(W = W, points = XR, scalar.product = b,
final.XL = XL, unscaled.W = finalW,
unscaled.points = finalXR, unscaled.b = unscaled.b,
Zstar = Zstar, Z2star = Z2star)
}
else list(W = W, points = XR)
}
__________________________________________________
Do You Yahoo!?
Talk to your friends online with Yahoo! Messenger.
http://im.yahoo.com
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news
|