Chapter 8: Statistical Models for Prognostication: Development of Regression Models
        

Internal validity can be studied with a variety of techniques, as described below.

Table 7.1: Techniques for Studying Internal Validity
Split-sample
A straightforward and fairly popular approach is to randomly split the training data into two parts: one to develop the model, and another to measure its performance. With the split-sample approach, model performance is determined on similar, but independent, data. Common splits are 50:50 or 2/3:1/3.
Cross-validation
A more sophisticated approach is to use cross-validation, which can be seen as an extension of the split-sample method. With split-half cross-validation, the model is developed on one randomly drawn half and tested on the other, and vice versa. The average of the two results is taken as an estimate of performance. Other fractions of subjects may be left out, e.g. 10% to test a model developed on the remaining 90% of the sample. This procedure is then repeated 10 times, such that every subject has been used exactly once to test the model. To improve the stability of the cross-validation, the whole procedure can be repeated several times, taking new random sub-samples. The most extreme cross-validation procedure is to leave one subject out at a time, which is equivalent to the jackknife technique (Efron and Tibshirani, 1993, 1997).
Bootstrap validation
The most efficient validation is achieved by the bootstrap (Efron and Tibshirani, 1997). Bootstrapping replicates the process of sample generation from an underlying population by drawing samples with replacement from the original data set, each of the same size as the original data set. Models may be developed in bootstrap samples and tested in the original sample to replicate validation in new subjects.
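
The three resampling schemes in Table 7.1 differ mainly in how subjects are assigned to model development and to testing. The following minimal Python sketch illustrates that assignment only; the function names (`split_sample`, `k_fold`, `bootstrap_sample`) and the sample size of 300 are hypothetical and chosen for illustration, not taken from the chapter.

```python
import random

def split_sample(n, test_fraction=1/3, seed=0):
    """Randomly split subject indices 0..n-1 into a development and a test part."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])

def k_fold(n, k=10, seed=0):
    """Yield (development, test) index lists; every subject is tested exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    for fold in range(k):
        test = sorted(idx[fold::k])          # every k-th shuffled subject
        test_set = set(test)
        develop = [i for i in range(n) if i not in test_set]
        yield develop, test

def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement: one bootstrap sample of the original size."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

# Illustrative use with a hypothetical sample of 300 subjects
develop, test = split_sample(300)            # 2/3 : 1/3 split
folds = list(k_fold(300, k=10))              # 10% left out, repeated 10 times
leave_one_out = list(k_fold(300, k=300))     # one subject left out at a time
boot = bootstrap_sample(300)                 # some subjects repeat, some are absent
```

Setting `k` equal to the number of subjects gives the leave-one-out case, and `k=2` gives split-half cross-validation. Repeating `k_fold` with different seeds corresponds to repeating the whole procedure with new random sub-samples.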

Techniques for Validation

In practice, bootstrap validation may be hampered because not all modeling decisions can be performed in an automatic way. For example, a regrouping of categorical variables may be done on subjective grounds, inspired by the findings in the data. When not all modeling decisions can be systematically replayed, the split-sample approach may be considered, provided that a large validation sample can be kept out. Alternatively, one might ignore these decisions and calculate a bootstrap estimate as an upper limit of expected performance.
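
Bootstrap validation presupposes that the whole modeling procedure can be wrapped in a single function and replayed in every bootstrap sample. The sketch below illustrates this under strong simplifying assumptions: `develop_model` is a hypothetical stand-in that fits a one-predictor least-squares line, performance is measured as mean squared error (lower is better), and the data are simulated. None of these choices come from the chapter.

```python
import random

def develop_model(x, y):
    """Stand-in for the full automated modeling procedure: fit y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope

def mean_squared_error(model, x, y):
    """Average squared difference between observed and predicted outcomes."""
    intercept, slope = model
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)

def bootstrap_validate(x, y, n_boot=200, seed=0):
    """Develop the model in each bootstrap sample and test it in the original sample."""
    rng = random.Random(seed)
    n = len(x)
    # Apparent performance: model developed and tested on the same data
    apparent = mean_squared_error(develop_model(x, y), x, y)
    test_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        model = develop_model([x[i] for i in idx], [y[i] for i in idx])
        test_errors.append(mean_squared_error(model, x, y))
    return apparent, sum(test_errors) / n_boot

# Simulated data, for illustration only
data_rng = random.Random(1)
x = [data_rng.uniform(0, 10) for _ in range(40)]
y = [2.0 + 0.5 * xi + data_rng.gauss(0, 1) for xi in x]
apparent, validated = bootstrap_validate(x, y)
```

Any modeling decision that is not coded inside `develop_model`, such as a subjective regrouping of categories, is not replayed in the bootstrap samples, so its contribution to over-optimism goes unmeasured.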

QUESTION 7.8

A characteristic of internal validation is that the parts of the data that are kept out of the model development phase (to test models) are kept out:

A. At random.
B. According to time and place.

 
