[S] A little statistical question.

To: s-news@wubios.wustl.edu
Subject: [S] A little statistical question.
From: "Gérald Jean" <Gerald.Jean@spgdag.ca>
Date: Fri, 23 Jul 1999 10:36:42 -0400
Sender: owner-s-news@wubios.wustl.edu
Hi S-users,

Although the implementation will be done in S+, my question is not about
S+ itself; but since so many very bright people subscribe to this list, I
thought I would try my luck with a statistical question!

I have a large dataset, 170K observations by 20 factors plus the response
variable and a weighting variable.  The data comes from the insurance
industry.  We would like to model the frequency of claims, weighted by the
counts in each cell, as a function of some or all of the available factors.

Here is my question:

One of the factors, a four-level factor, was available in the past but will
not be available when the model is used.  Due to restrictions on
building age, percentage of the building's value insured, type of heating,
etc., not all policy holders had the opportunity to belong to any one
of the four levels of that factor.  What we would like to do is "correct"
the claim frequencies for that factor before modeling, i.e. bring all
policy holders to the same level with respect to that factor.

Here is how we thought we might proceed:

1) Pick a subset of the whole population such that every policy holder
belonging to that subset had the opportunity to belong to any one of the
levels of that factor.
2) Using that subset, fit a GLM (multiplicative model) to the claim
frequencies using that factor and all other significant factors.
3) Apply the coefficients of that factor to the weight variable in the
whole population.  Since the model is multiplicative (log link), it is easy
to bring the coefficients back to the response scale.
4) Recalculate the frequencies by dividing the number of claims by the
"corrected" number of units (the weights) per cell.
5) Re-estimate the model on the whole population, excluding the factor in
question, on those "corrected" frequencies using the original number of
units per cell as weights.
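
The five steps above could be sketched roughly as follows.  This is only a
minimal illustration in Python/NumPy, not S+ code, and it rests on several
assumptions: the data are synthetic and noise-free, a single factor g stands
in for "all other significant factors", and every name used (fit_log_glm,
dummies, eligible, units, rel_f, rel_g) is hypothetical.

```python
import numpy as np

def fit_log_glm(X, freq, w, n_iter=25):
    """Weighted Poisson-type GLM with log link, fitted by IRLS.
    Column 0 of X is assumed to be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.average(freq, weights=w))  # start at the overall mean
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (freq - mu) / mu                 # working response
        W = w * mu                                 # working weights
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

def dummies(codes, n_levels):
    """Treatment-coded indicator columns; level 0 is the baseline."""
    return (codes[:, None] == np.arange(1, n_levels)).astype(float)

# Synthetic, noise-free cells (an assumption made purely for illustration).
rng = np.random.default_rng(0)
n = 2000
f = rng.integers(0, 4, n)            # the four-level factor lost in the future
g = rng.integers(0, 3, n)            # stand-in for the other factors
eligible = rng.random(n) < 0.6       # could choose any level of f
f[~eligible] = 0                     # restricted policy holders sit in level 0
units = rng.integers(5, 50, n).astype(float)   # weights (units per cell)
rel_f = np.array([1.0, 0.8, 1.3, 1.6])         # made-up true relativities
rel_g = np.array([1.0, 1.5, 0.7])
claims = units * 0.05 * rel_f[f] * rel_g[g]    # expected claims, no noise
freq = claims / units

# Steps 1-2: fit the multiplicative model on the eligible subset only.
X_sub = np.column_stack([np.ones(eligible.sum()),
                         dummies(f[eligible], 4),
                         dummies(g[eligible], 3)])
beta_sub = fit_log_glm(X_sub, freq[eligible], units[eligible])
est_rel_f = np.exp(np.concatenate([[0.0], beta_sub[1:4]]))  # response scale

# Step 3: apply the factor's relativities to the weights, whole population.
units_corr = units * est_rel_f[f]

# Step 4: recompute frequencies from the "corrected" units.
freq_corr = claims / units_corr

# Step 5: refit without the factor, using the original units as weights.
X_all = np.column_stack([np.ones(n), dummies(g, 3)])
beta_all = fit_log_glm(X_all, freq_corr, units)
est_rel_g = np.exp(np.concatenate([[0.0], beta_all[1:]]))
```

In this idealized setting the corrected frequencies depend only on the
remaining factor; with real, noisy data the correction would hold only
approximately.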

Does this make any sense?

Any and all suggestions are welcome,

P.S. Since I am on vacation next week, I'll summarize to the list in the
first week of August, when I am back.

Thanks in advance to all,

Gérald Jean
Consulting Analyst (Statistics), Actuarial Department
telephone            : (418) 835-8839
fax                  : (418) 835-5865
e-mail               : gerald.jean@spgdag.ca

"In God we trust; all others must bring data"


-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news
