
Selection bias

Selection bias is the distortion of a statistical analysis caused by the way the samples are selected, whether before or after the data are collected. Typically it makes measures of statistical significance appear much stronger than they are, but it can also produce completely illusory artifacts. Selection bias can result from scientific fraud that manipulates data directly, but more often it is either unconscious or due to biases in the instruments used for observation. For example, astronomical observations will typically find more blue galaxies than red ones simply because most instruments are more sensitive to blue light than red light.

There are many types of possible selection bias, including:

Time:

  • Selecting end-points of a time series. For example, to maximise a claimed trend, you could start the time series in an unusually low year and end it in a high one.
  • Early termination of a trial at a time when its results seem particularly significant. (The decision is typically couched in ethical terms.)

Space:

  • Selecting spatial regions, including grid size or zero points (see stratified sampling, cluster sampling). For example, to "prove" an association between cancer and a particular locality, you could adjust the size, orientation and alignment of grid cells until most of the local cancers fit in the same grid cell as the locality. Then round off the dimensions slightly (so they don't look quite so contrived), and compare cancer rates in the various grid cells using tests designed for randomly assigned grids.
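A minimal sketch of the grid-alignment trick follows. It assumes cases scattered uniformly at random over a unit square (so there is no real cluster), a fixed cell size, and a search over grid offsets only; the function `count_in_cell` and all the numbers are hypothetical choices made for illustration. Varying cell size and orientation as well would give the analyst even more freedom.

```python
import random

def count_in_cell(points, site, cell, offset_x, offset_y):
    """Count the points that fall in the same grid cell as `site`."""
    def cell_of(p):
        return (int((p[0] - offset_x) // cell), int((p[1] - offset_y) // cell))
    target = cell_of(site)
    return sum(1 for p in points if cell_of(p) == target)

# Assumption: 200 cases placed uniformly at random, with no true cluster.
rng = random.Random(1)
cases = [(rng.random(), rng.random()) for _ in range(200)]
site = (0.5, 0.5)   # the locality under suspicion
cell = 0.2          # grid cell width

# Grid fixed in advance, with its zero point at the origin.
honest = count_in_cell(cases, site, cell, 0.0, 0.0)

# Grid tuned after the fact: slide the zero point and keep the best result.
offsets = [i * 0.02 for i in range(10)]
tuned = max(
    count_in_cell(cases, site, cell, ox, oy)
    for ox in offsets
    for oy in offsets
)

print(f"cases in the site's cell, grid fixed in advance: {honest}")
print(f"cases in the site's cell, best of {len(offsets) ** 2} alignments: {tuned}")
```

The tuned count is at least as large as the honest one, because the pre-specified grid is among the alignments searched. Comparing the tuned cell against its neighbours with a test meant for a randomly placed grid would overstate the evidence.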

Data:

  • Rejection of "bad" data on arbitrary grounds, instead of according to previously stated or generally agreed criteria.

Participants:

  • Pre-screening of trial participants, or advertising for volunteers within particular groups. For example, to "prove" that smoking doesn't affect fitness, advertise for both smokers and non-smokers at the local fitness centre, but advertise for smokers during the advanced aerobics class and for non-smokers during the weight loss sessions.
  • Discounting trial subjects or tests that did not run to completion. For example, in a test of a dieting program, the researcher may simply reject everyone who drops out of the trial. But most of those who drop out are those for whom the program wasn't working.
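The dieting example can be sketched numerically. The simulation below assumes a program with no real effect (weight changes drawn from a seeded normal distribution centred on zero) and a deliberately crude, hypothetical dropout rule: anyone who gains more than 1 kg leaves the trial. Both the rule and the numbers are illustrative assumptions, not data from any study.

```python
import random
from statistics import mean

# Assumption: weight change in kg for 1000 participants, true mean effect of zero.
rng = random.Random(2)
changes = [rng.gauss(0.0, 3.0) for _ in range(1000)]

# Hypothetical dropout rule: participants who gained more than 1 kg quit.
completers = [c for c in changes if c <= 1.0]
dropouts = [c for c in changes if c > 1.0]

print(f"mean change, everyone enrolled: {mean(changes):+.2f} kg")
print(f"mean change, completers only:   {mean(completers):+.2f} kg")
print(f"participants dropped from the analysis: {len(dropouts)}")
```

Analysing only the completers makes an ineffective program look as if it produces weight loss, since the people for whom it was not working have been removed from the sample.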

Studies:

  • Selection of which studies to include in a meta-analysis.
  • Performing repeated experiments and reporting only the most favourable results. (Perhaps relabelling lab records of other experiments as "calibration tests", "instrumentation errors" or "preliminary surveys".)
  • Presenting the most significant result of a data dredge as if it were a single experiment. (This is logically the same as the previous item, but curiously is seen as much less dishonest.)
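The cost of reporting only the best of several results can be quantified. If k independent tests are run on data with no real effect, each at significance level alpha, the chance that at least one comes out "significant" is 1 - (1 - alpha)^k. The sketch below computes that value and checks it by simulation, assuming independent tests whose p-values are uniform on [0, 1] when there is no effect; the function names are hypothetical.

```python
import random

def chance_of_false_positive(alpha, k):
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** k

def simulate_best_of_k(alpha, k, runs, seed=0):
    """Fraction of runs in which the smallest of k null p-values is below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Under the null hypothesis each p-value is uniform on [0, 1].
        smallest_p = min(rng.random() for _ in range(k))
        if smallest_p < alpha:
            hits += 1
    return hits / runs

analytic = chance_of_false_positive(0.05, 20)
simulated = simulate_best_of_k(0.05, 20, runs=10_000)

print(f"analytic chance of at least one 'significant' result: {analytic:.3f}")
print(f"simulated chance over 10,000 runs:                    {simulated:.3f}")
```

With 20 tests at the 5% level, the analytic value is about 0.64, so a "significant" finding picked from such a batch is more likely than not to appear even when nothing is there.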

Selection bias is closely related to:

  • sample bias, a selection bias produced by an accidental bias in the sampling technique, as opposed to deliberate or unconscious manipulation.
  • publication bias or reporting bias, the distortion produced in community perception or meta-analyses by not publishing uninteresting (usually negative) results, or results which go against the experimenter's prejudices, a sponsor's interests, or community expectations.
  • confirmation bias, the distortion produced by experiments that are designed to seek confirmatory evidence instead of trying to disprove the hypothesis.


See also: bias (statistics)


This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.
