Re: statistical terminology controversy

To: "Lambert.Winnie" <lambert.winnie@ensco.com>
Subject: Re: statistical terminology controversy
From: Tony Plate <tplate@blackmesacapital.com>
Date: Wed, 18 May 2005 15:54:00 -0600
Cc: S-PLUS Newsgroup <s-news@lists.biostat.wustl.edu>
In machine learning, datasets are often split into three partitions.

The first is used for fitting models, and is often referred to as the "training" set.

The second is used for testing the performance of fitted models, and it's common practice to repeat the fit-test cycle many times, for example while comparing candidate models. This set is sometimes called the "validation" set.

The third is used for testing the final selected version of the fitted model. It's sometimes called the "test" set. It should be used only once, or at the very least its results should not feed back into the model construction or fitting process.
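
As a rough illustration of the three-way split described above, here is a minimal Python sketch using only the standard library. The function name three_way_split, the 60/20/20 proportions, and the fixed seed are assumptions made for the example, not anything prescribed by the terminology itself; any reasonable random partition would serve.

```python
import random


def three_way_split(n_rows, frac_train=0.6, frac_valid=0.2, seed=0):
    """Return (train, validation, test) lists of row indices.

    The fractions and the seed are illustrative defaults (assumptions),
    not recommended values.
    """
    if frac_train <= 0 or frac_valid <= 0 or frac_train + frac_valid >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")

    # Shuffle the row indices reproducibly so the partitions are random
    # but the same on every run.
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    n_train = round(n_rows * frac_train)
    n_valid = round(n_rows * frac_valid)

    train = indices[:n_train]                    # fit models here
    valid = indices[n_train:n_train + n_valid]   # compare/tune fitted models here, repeatedly
    test = indices[n_train + n_valid:]           # evaluate the final model here, once
    return train, valid, test


train_idx, valid_idx, test_idx = three_way_split(1000)
```

The three index lists are disjoint and together cover every row, so no observation used for the final test can leak into fitting or model selection.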

Lambert.Winnie wrote:
*This is NOT an S-PLUS-specific question,* just letting you know so you don’t have to read any further if not interested in anything non-S-PLUS.

There is a bit of a controversy in my office concerning specific statistical terminology. I developed a set of logistic regression equations that calculate the probability of lightning occurrence for the day using a 15-year data set of several observation types. I stratified the data into two sets: one was used to create the equations, and the other was used to test the equations’ performance. In my field, these are commonly called the ‘dependent’ and ‘independent’ data sets, respectively.

One of us insists that the common terminology be used, the other says the data sets should be called ‘development’ and ‘testing’ since that is what they are used for, and since the terms ‘dependent’ and ‘independent’ refer to other issues in statistics.

Any statistics expert willing to jump into the fray is welcome. There is no money riding on this, only pride.


