Tuesday, June 17, 2008

An offset in Response-Variable Type

Response-variable type

The following response (y-variable) types are commonly encountered:

- ‘continuous’ (measurements etc.)
- counts (of events etc.)
- categorical
    - binary
    - nominal
    - ordinal
- durations (survival or event-history data)

A mixture of continuous and discrete (e.g., where the response is either zero or a positive amount of something) sometimes occurs, and demands special care.

Response type: continuous

If a model is required for E(y), consider GLM with suitably-chosen link function.
Alternatively, use a linear model, possibly after a non-linear transformation of y.

GLM has advantage of allowing variance to depend on mean in a specified way. For example, with homogeneous multiplicative errors, variance ∝ [E(y)]^2.
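The multiplicative-error case can be checked with a small simulation. This is only a sketch: the lognormal error distribution, the value of SIGMA, and the scale values are assumptions chosen for illustration, not part of the original notes. If the variance is proportional to the squared mean, the ratio Var(y) / [E(y)]^2 should come out roughly the same at every scale.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

SIGMA = 0.2  # assumed log-scale spread of the error (illustrative value)

def simulate(scale, n=20000):
    """Draw y = scale * e, where e is a positive error whose
    distribution is the same at every scale (homogeneous)."""
    return [scale * random.lognormvariate(0.0, SIGMA) for _ in range(n)]

def variance_over_squared_mean(ys):
    """Estimate Var(y) / [E(y)]^2 from a sample."""
    return statistics.variance(ys) / statistics.fmean(ys) ** 2

# The same ratio is estimated at three very different scales.
ratios = {s: variance_over_squared_mean(simulate(s)) for s in (1.0, 10.0, 100.0)}

for s, r in ratios.items():
    print(f"scale {s:6.1f}: Var(y) / E(y)^2 = {r:.4f}")

# For lognormal errors the theoretical ratio is exp(SIGMA^2) - 1.
print(f"theoretical ratio: {math.exp(SIGMA ** 2) - 1:.4f}")
```

The estimated ratios should agree with each other up to sampling noise, even though the variances themselves differ by several orders of magnitude.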

In a GLM (or GAM) the link function is chosen to achieve linearity (additivity) of the right-hand side. Often (but not necessarily) this means linking the mean in such a way that g[E(y)] can take any real value. For example, if E(y) > 0, g(μ) = log μ will often be a candidate.
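A minimal sketch of the log-link idea, using made-up linear-predictor values: any real value of the linear predictor maps back to a positive mean, and an additive shift on the link scale becomes a multiplicative change on the mean scale. The function name inverse_log_link and the numbers are hypothetical.

```python
import math

def inverse_log_link(eta):
    """Map a linear predictor eta (any real number) back to the mean scale."""
    return math.exp(eta)

# Hypothetical linear-predictor values, including negative ones.
etas = [-5.0, -1.0, 0.0, 1.0, 5.0]
means = [inverse_log_link(eta) for eta in etas]

for eta, mu in zip(etas, means):
    print(f"eta = {eta:5.1f}  ->  mu = {mu:.4f}")

# Adding delta to eta multiplies mu by exp(delta), whatever the starting point.
delta = 0.5
multipliers = [inverse_log_link(eta + delta) / inverse_log_link(eta) for eta in etas]
print(multipliers)
```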

Response type: counts

e.g., numbers of arrests made by different police forces in different time periods.
Interest is most often in the rate of occurrence per unit of exposure, where ‘exposure’ might be amount of time, population at risk, person-hours of effort, or a composite.

Most natural starting point is a Poisson model with log link:
y_i ~ Poisson(μ_i), with, say,
log μ_i = log t_i + β_0 + β_1 x_i + β_2 z_i

where t_i is the known exposure quantity for the ith count.

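The term log t_i here, with no unknown coefficient attached to it, is called an offset. Moving it to the left-hand side gives a linear model for log(μ_i / t_i), the log of the rate per unit of exposure, so the other effects are all interpretable as rate-multipliers.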
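The rate-multiplier interpretation can be illustrated numerically. The sketch below is not a model fit: the coefficient values, counts, and exposures are all made up, and the helper names expected_count and rate are hypothetical. It shows that, with log t entering as an offset, a one-unit change in x multiplies the rate by exp(β_1) regardless of the exposure.

```python
import math

# Hypothetical coefficient values, chosen only for illustration.
BETA_0, BETA_1, BETA_2 = -3.0, 0.4, -0.2

def expected_count(t, x, z):
    """mu = exp(log t + beta_0 + beta_1*x + beta_2*z); log t is the offset."""
    return math.exp(math.log(t) + BETA_0 + BETA_1 * x + BETA_2 * z)

def rate(t, x, z):
    """Expected count per unit of exposure."""
    return expected_count(t, x, z) / t

# Raising x by one unit multiplies the rate by exp(beta_1) at every exposure.
rate_ratios = [rate(t, 2.0, 1.0) / rate(t, 1.0, 1.0) for t in (0.5, 10.0, 250.0)]
print(rate_ratios, math.exp(BETA_1))

# With an intercept only (log mu_i = log t_i + beta_0), the maximum-likelihood
# estimate of the common rate is total count divided by total exposure.
counts = [3, 12, 7]             # made-up counts
exposures = [10.0, 50.0, 20.0]  # made-up exposures
mle_rate = sum(counts) / sum(exposures)
print(mle_rate)
```

Without the offset, the coefficients would describe expected counts rather than rates, and units observed for longer would look as if they had larger effects.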

Source: http://springschool.politics.ox.ac.uk/archive/index.asp