Preliminary Steps
Suppose that we have individual patient data available in a data set
with information on a number of potential predictors and on
the outcome of interest. Before the actual modeling starts, the
following preliminary data analysis steps need to be taken:
- Construct frequency tables. The distributions of covariables and
of the outcome give an impression of the data under study.
Covariables with a narrow distribution (a small range of observed
values) may be discarded from the analysis. Cross-tables between
covariables and the outcome are not made at this stage, since
what they show bears on the selection of predictors for
the model (for more, see Development of Regression Models:
Selection of Covariables).
- Study missing values. Missing values in one or more predictors
are a common problem. Several methods have been described
to handle missing values. These range from omitting patients
with missing values from the analysis, to simple
imputation methods (e.g., filling in the mean value, or the
predicted value based on correlations with other predictors),
to multiple imputation methods (where multiple copies of the
data set are made, each imputed with different predicted values).
When an important predictor has many missing values, it may
be sensible to discard it from the analysis, but hard criteria
for when a variable has too many missing values are not available.
- Decide on predictive model type. We focus here on regression
models, although certain problems may be better handled with
classification techniques or neural networks. Regression models
have the advantages that the result can be presented attractively
on paper (as opposed to neural networks, where a computer
is necessary), that insight is obtained into the relative weight
of predictors, and that the technique is widely available
in statistical software packages.
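
The first two steps above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the records,
the variable names (age, sex, diabetes, outcome) and the helper
functions are hypothetical and do not come from any particular data
set or package. Flagging a covariable as "narrow" when it has a single
observed value is likewise an assumption made for the example, and
mean imputation stands in for the simple imputation methods mentioned
above, not for multiple imputation.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records; None marks a missing value.
patients = [
    {"age": 63, "sex": "F", "diabetes": 0, "outcome": 1},
    {"age": 71, "sex": "M", "diabetes": 0, "outcome": 0},
    {"age": None, "sex": "M", "diabetes": 0, "outcome": 0},
    {"age": 58, "sex": None, "diabetes": 0, "outcome": 1},
    {"age": 66, "sex": "F", "diabetes": 0, "outcome": 0},
]


def frequency_table(records, variable):
    """Count how often each observed (non-missing) value occurs."""
    return Counter(r[variable] for r in records if r[variable] is not None)


def missing_fraction(records, variable):
    """Fraction of records in which the variable is missing."""
    return sum(r[variable] is None for r in records) / len(records)


def impute_mean(records, variable):
    """Return copies of the records with missing values set to the observed mean."""
    fill = mean(r[variable] for r in records if r[variable] is not None)
    return [
        {**r, variable: fill if r[variable] is None else r[variable]}
        for r in records
    ]


# Step 1 and 2: one frequency table and one missing-value fraction per variable.
for var in ("age", "sex", "diabetes", "outcome"):
    table = dict(frequency_table(patients, var))
    print(var, table, f"missing: {missing_fraction(patients, var):.0%}")

# Assumed criterion: a covariable with a single observed value is "narrow".
narrow = [
    v for v in ("age", "sex", "diabetes")
    if len(frequency_table(patients, v)) <= 1
]

# Simple (single) imputation of age by its observed mean.
imputed = impute_mean(patients, "age")
```

Here the outcome is tabulated on its own and is not cross-tabulated
against the covariables, in line with the first step. In practice the
threshold for a narrow distribution and the choice of imputation
method are judgment calls.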
Development of a prognostic model should start with: