Chapter 8: Statistical Models for Prognostication: Ingredients of Statistical Models
        

Classes of Models

We now focus on regression models. Two other main classes of statistical models are

  • Classification methods, and
  • Neural networks.

Classification methods include k-nearest-neighbor methods and classification trees.

Figure 4.3: Classification Tree
Example of a classification tree. Patients with an acute myocardial infarction are classified according to age, Killip class, and the number of ECG leads with ST elevation (STE). The probability of 30-day mortality is shown for each classification, e.g. 20% in those older than 73.7 years. Data are from a sample (n=752) of the GUSTO-I data (Lee et al., 1995); http://www.eur.nl/fgg/mgz/software.html. Note that a tree based on the full data set (n=40,830) would become very complex.

 

 

Classification trees are very attractive in their presentation, but are relatively "data hungry." Every split divides the patients into subgroups, so the sample size shrinks with each branch further down the tree. In effect, a tree assumes interactions between predictors throughout, whereas regression models generally assume no interaction unless interaction terms are added explicitly. Another disadvantage is that continuous variables need to be categorized.
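The shrinking sample size can be made concrete with a small calculation. The sketch below is only an illustration: it assumes a perfectly balanced binary tree, which real trees are not, and uses the sample size of n=752 from the figure caption. The function name `average_leaf_size` is hypothetical.

```python
def average_leaf_size(n_patients: int, depth: int) -> float:
    """Average number of patients per terminal node.

    Assumption: every split is binary and divides its subgroup evenly,
    so a tree of the given depth has 2**depth terminal nodes.
    """
    return n_patients / (2 ** depth)


if __name__ == "__main__":
    # n=752 is the sample size quoted in the figure caption.
    for depth in range(6):
        size = average_leaf_size(752, depth)
        print(f"depth {depth}: about {size:.1f} patients per terminal node")
```

Under these assumptions, five levels of splitting already leave fewer than 25 patients per terminal node, so each estimated mortality probability rests on very few events.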

Neural networks vary in structure and implementation, but generally include one or more so-called "hidden layers" between the predictors and the outcome. Neural networks are very flexible, and naturally allow for nonlinear effects of, and interactions between, predictor variables.
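To illustrate what a hidden layer does, here is a minimal sketch of a network with a single hidden layer that turns predictor values into a predicted probability. It is one possible structure, not a description of any particular implementation; the function names, the sigmoid activation, and all weights are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(
    predictors: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],  # one weight vector per hidden unit
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],            # one weight per hidden unit
    output_bias: float,
) -> float:
    """Forward pass through one hidden layer (hypothetical structure)."""
    # Each hidden unit applies a nonlinear function to a weighted sum of
    # all predictors; this is where nonlinearity and interaction arise.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, predictors)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output combines the hidden units into a single probability.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


if __name__ == "__main__":
    # Made-up weights for two predictors and two hidden units.
    p = predict_probability(
        predictors=[1.0, 0.5],
        hidden_weights=[[0.8, -0.4], [-0.3, 0.9]],
        hidden_biases=[0.1, -0.2],
        output_weights=[1.2, -0.7],
        output_bias=0.0,
    )
    print(f"predicted probability: {p:.3f}")
```

In practice the weights are estimated from data rather than set by hand; the sketch shows only how a prediction is computed once they are known.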

The linear and nonlinear regression models that we consider all fall within the class of generalized linear models. These are characterized by a linear predictor: a weighted sum of the predictor values. The relationship between predictors and outcome is nonetheless nonlinear, because a link function connects the linear predictor to the outcome, such as the log odds (logit) in logistic regression analysis.
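For logistic regression, this structure can be written out as follows. The notation is an assumption made here for illustration: p is the probability of the outcome, x_1, ..., x_k are the predictors, and β_0, ..., β_k are the regression coefficients.

```latex
\[
\operatorname{logit}(p) \;=\; \ln\!\left(\frac{p}{1-p}\right)
\;=\; \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k ,
\qquad
p \;=\; \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}} .
\]
```

The left-hand equation is linear in the predictors on the logit scale; the right-hand equation shows the same model on the probability scale, where the relationship is nonlinear.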

Figure 4.1: Logistic Link Function
Illustration of the logistic link function. The relationship between the probability of an outcome and the logit of that probability follows a characteristic S-shaped curve. The logit is calculated as ln(probability/(1 - probability)). When the logit is 0, the probability is 50%.
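The logit formula in the caption, and its inverse, can be checked with a few lines of code. This is a minimal sketch using only the Python standard library; the function names `logit` and `inverse_logit` are chosen here for illustration.

```python
import math


def logit(probability: float) -> float:
    """Log odds: ln(probability / (1 - probability)), for 0 < probability < 1."""
    return math.log(probability / (1.0 - probability))


def inverse_logit(value: float) -> float:
    """Convert a logit back into a probability."""
    return 1.0 / (1.0 + math.exp(-value))


if __name__ == "__main__":
    # A logit of 0 corresponds to a probability of 50%, as stated in the caption.
    print(inverse_logit(0.0))
    # Equal steps on the logit scale give unequal steps on the probability
    # scale, which produces the S-shaped curve.
    for value in (-4, -2, 0, 2, 4):
        print(f"logit {value:+d} -> probability {inverse_logit(value):.3f}")
```

Probabilities of exactly 0 or 1 have no finite logit, so `logit` should only be called with values strictly between 0 and 1.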

Interestingly, neural networks can be viewed as implementations of more complex statistical models: nonlinear models, generalized additive models, or generalized nonlinear models (Hastie and Tibshirani, 1990).
