Chapter 8: Statistical Models for Prognostication: Problems with Regression Models
        

Bias-variance Trade-off

Models with a low bias describe the data under study well. Examples include flexible models such as neural networks, which naturally accommodate non-linear relationships, and regression models with many interaction terms. These models may, however, have a high variance: their estimates change considerably from one sample to the next, so they may not validate well in new patients (Ennis et al., 1998).
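
The contrast between fit in the data under study and performance in new patients can be illustrated with a small simulation. The sketch below is not taken from the chapter: it assumes a continuous outcome, a hypothetical mildly non-linear true relationship (`true_mean`), a small development sample of 15 patients, and polynomial regression of degree 1 ("simple") versus degree 6 ("flexible") as stand-ins for a main-effects model and a flexible model. All names and settings are illustrative assumptions.

```python
import numpy as np


def true_mean(x):
    # Hypothetical "true" relationship, assumed only for this illustration.
    return x + 0.5 * x**2


def simulate(n_train=15, n_test=1000, n_reps=200, noise_sd=1.0, seed=0):
    """Average mean squared error in the development data ("train")
    and in new patients ("test") for a simple and a flexible model."""
    rng = np.random.default_rng(seed)
    degrees = {"simple": 1, "flexible": 6}  # polynomial degree per model
    errors = {name: {"train": [], "test": []} for name in degrees}

    for _ in range(n_reps):
        # Small development sample and a large sample of "new patients".
        x_tr = rng.uniform(-1, 1, n_train)
        y_tr = true_mean(x_tr) + rng.normal(0, noise_sd, n_train)
        x_te = rng.uniform(-1, 1, n_test)
        y_te = true_mean(x_te) + rng.normal(0, noise_sd, n_test)

        for name, degree in degrees.items():
            coef = np.polyfit(x_tr, y_tr, degree)  # least-squares fit
            fit_tr = np.polyval(coef, x_tr)
            fit_te = np.polyval(coef, x_te)
            errors[name]["train"].append(np.mean((y_tr - fit_tr) ** 2))
            errors[name]["test"].append(np.mean((y_te - fit_te) ** 2))

    return {
        name: {part: float(np.mean(vals)) for part, vals in err.items()}
        for name, err in errors.items()
    }


results = simulate()
for name, res in results.items():
    print(f"{name:9s} train MSE = {res['train']:.2f}, test MSE = {res['test']:.2f}")
```

Under these assumptions the flexible model should show the smaller error in the development data but the larger error in new patients, which is the pattern described above. Different settings (a larger sample, a more strongly non-linear truth) can reverse the ranking in new patients.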

In contrast, simple models with main effects only may have a substantial bias, but may be rather robust when predicting for new patients. An example is the naïve application of Bayes' rule, i.e. without taking correlations between predictors into account (Idiot's Bayes). This method is equivalent to using univariable logistic regression coefficients for prediction. Empirical results, especially with respect to discrimination, were favorable in some case studies (Spiegelhalter, 1986).
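
The equivalence between naïve Bayes and univariable logistic regression coefficients can be checked numerically for binary predictors. The sketch below relies on a standard result: for a binary predictor, the maximum-likelihood slope of a univariable logistic regression equals the log odds ratio of the 2x2 table. The simulated data, the predictor names `x1` and `x2`, and the helper functions are assumptions made for illustration only.

```python
import numpy as np


def log_odds_ratio(x, y):
    """Log odds ratio of the 2x2 table of binary x and y; this equals the
    slope of a univariable logistic regression of y on x."""
    a = np.sum((x == 1) & (y == 1))
    b = np.sum((x == 1) & (y == 0))
    c = np.sum((x == 0) & (y == 1))
    d = np.sum((x == 0) & (y == 0))
    return float(np.log(a * d / (b * c)))


def log_likelihood_ratios(x, y):
    """Naive Bayes weights: log[P(x=v | y=1) / P(x=v | y=0)] for v = 0, 1."""
    p1 = np.mean(x[y == 1])  # P(x=1 | y=1)
    p0 = np.mean(x[y == 0])  # P(x=1 | y=0)
    return float(np.log((1 - p1) / (1 - p0))), float(np.log(p1 / p0))


# Simulated data with two correlated binary predictors (illustrative only).
rng = np.random.default_rng(1)
n = 500
y = rng.binomial(1, 0.3, n)
x1 = rng.binomial(1, np.where(y == 1, 0.6, 0.3))
copy_x1 = rng.binomial(1, 0.7, n) == 1
x2 = np.where(copy_x1, x1, rng.binomial(1, 0.5, n))

# Naive Bayes adds one log likelihood ratio per predictor to the prior log-odds.
# The change in log-odds when a predictor goes from 0 to 1 is llr(1) - llr(0).
comparison = {}
for name, x in {"x1": x1, "x2": x2}.items():
    llr0, llr1 = log_likelihood_ratios(x, y)
    comparison[name] = (llr1 - llr0, log_odds_ratio(x, y))
    print(name, "naive Bayes slope:", round(llr1 - llr0, 4),
          "univariable log odds ratio:", round(comparison[name][1], 4))
```

The two quantities agree for each predictor even though `x1` and `x2` are correlated, because neither calculation looks at the other predictor. This is exactly where the bias of the naïve approach comes from.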

Predictive modeling may be seen as a balancing act of bias versus variance. The sample size of the data set is of paramount importance, since the variance of the estimates increases as the sample gets smaller. Especially in small data sets, information from outside the data under study, such as findings in other studies and clinical knowledge, is important to guide model development.
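
For a continuous outcome evaluated with squared error, this balance can be written as the standard bias-variance decomposition. The notation is an assumption introduced here, not the chapter's own: $f(x)$ is the true mean outcome for a patient with predictor values $x$, $\hat{f}(x)$ is the model's prediction (random because it depends on the development sample), and $\sigma^2$ is the irreducible variance of the outcome $Y$ around $f(x)$.

```latex
\mathbb{E}\left[\bigl(Y - \hat{f}(x)\bigr)^2\right]
  = \underbrace{\sigma^2}_{\text{irreducible error}}
  + \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\bigl(\hat{f}(x)\bigr)}_{\text{variance}}
```

Making a model more flexible typically shrinks the bias term and inflates the variance term. The variance term shrinks as the sample size grows, which is why larger data sets can support more complex models. For binary outcomes the same idea applies, although the decomposition takes a different form.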

QUESTION 8.4

Which model has the lowest bias in describing the data under study?

A. A pre-specified model with main effects for 3 predictors
B. A pre-specified model with main effects for 3 predictors, supplemented with 2 important interaction terms
C. A stepwise selected model, consisting of the 3 pre-specified predictors and the 2 important interaction terms, plus 2 predictors from a set of 10 other candidate covariables
