Familywise error rate

In statistics, the familywise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple hypothesis tests.

Definitions

Classification of m hypothesis tests

Suppose we have m null hypotheses, denoted by H_1, H_2, \ldots, H_m.
Using a statistical test, we reject the null hypothesis if the test is declared significant. We do not reject the null hypothesis if the test is non-significant.
Summing the test results over H_i gives the following table and related random variables:

                           Null hypothesis is true   Alternative hypothesis is true   Total
Declared significant       V                         S                                R
Declared non-significant   U                         T                                m - R
Total                      m_0                       m - m_0                          m

Here m_0 is the number of true null hypotheses, V is the number of type I errors (false discoveries), S is the number of true discoveries, U is the number of correct non-rejections, T is the number of type II errors (false non-rejections), and R = V + S is the number of rejected hypotheses. R is observable, while V, S, T, and U are unobservable random variables.

The FWER

The FWER is the probability of making at least one type I error in the family,

 \mathrm{FWER} = \Pr(V \ge 1),

or equivalently,

 \mathrm{FWER} = 1 - \Pr(V = 0).

Thus, by assuring  \mathrm{FWER} \le \alpha , the probability of making one or more type I errors in the family is controlled at level \alpha.
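
To see why such control matters, here is a minimal simulation sketch in Python (assuming m = 10 independent tests of true null hypotheses, so that each p-value is uniform on [0, 1]): with no correction, testing each hypothesis at level \alpha = 0.05 gives a familywise error rate of 1 - (1 - \alpha)^m \approx 0.40, not 0.05.

```python
import numpy as np

rng = np.random.default_rng(0)
m, alpha, n_sim = 10, 0.05, 100_000

# Under the global null with independent tests, each p-value is
# Uniform(0, 1) and a type I error occurs whenever p <= alpha.
p = rng.uniform(size=(n_sim, m))
print("estimated FWER:", ((p <= alpha).any(axis=1)).mean())  # ~0.40
print("1 - (1 - alpha)**m =", 1 - (1 - alpha) ** m)          # 0.4012...
```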

A procedure controls the FWER in the weak sense if FWER control at level \alpha is guaranteed only when all null hypotheses are true (i.e. when m_0 = m, so the global null hypothesis is true).

A procedure controls the FWER in the strong sense if FWER control at level \alpha is guaranteed for any configuration of true and false null hypotheses (including the global null hypothesis).

Different approaches

FWER procedures (such as the Bonferroni correction) exert more stringent control over false discovery than false discovery rate (FDR) controlling procedures. FWER-controlling procedures seek to reduce the probability of even one false discovery, as opposed to the expected proportion of false discoveries. Thus, FDR procedures have greater power at the cost of increased rates of type I errors, i.e., rejecting null hypotheses of no effect when they should be accepted.[1]

The concept of a family

Within the statistical framework, there are several definitions for the term "family":

  • First, a distinction must be made between exploratory data analysis and confirmatory data analysis: in exploratory analysis, the family constitutes all inferences made and all those that potentially could be made, whereas in confirmatory analysis, the family includes only inferences of interest specified prior to the study.
  • Hochberg & Tamhane (1987) define "family" as "any collection of inferences for which it is meaningful to take into account some combined measure of error".[2]
  • According to Cox (1982), a set of inferences should be regarded as a family:
  1. To take into account the selection effect due to data dredging
  2. To ensure simultaneous correctness of a set of inferences so as to guarantee a correct overall decision

To summarize, a family is perhaps best defined by the potential selective inference that is being faced: a family is the smallest set of items of inference in an analysis, interchangeable with respect to their meaning for the goal of research, from which selection of results for action, presentation or highlighting could be made (Benjamini).

History

Tukey coined the terms "experimentwise error rate" and "per-experiment error rate" for the error rate that the researcher should use as a control level in a multiple hypothesis experiment.

Since not all tests done in an experiment should constitute a single family (for example: in a multiple-stage experiment, a separate family might be used for each stage), the terminology was changed (by Miller) to "family-wise error-rate" (and was later adopted by Tukey as "batchwise" or "per batch").

Controlling procedures

The following is a concise review of some of the "old and trusted" solutions that ensure strong level \alpha FWER control, followed by some newer solutions. A good review of many of the available methods can be found in the book "Multiple comparison procedures" (Wiley, 1987), by Hochberg and Tamhane.

The Bonferroni procedure

  • Denote by p_{i} the p-value for testing H_{i}.
  • Reject H_{i} if  p_{i} \leq \frac{\alpha}{m} (a code sketch of this rule follows below).
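
A minimal sketch of the rule in Python (the function name and example p-values are illustrative, not from any particular library):

```python
import numpy as np

def bonferroni_reject(p_values, alpha=0.05):
    """Reject H_i whenever p_i <= alpha / m (Bonferroni)."""
    p = np.asarray(p_values)
    return p <= alpha / p.size

# Four tests at alpha = 0.05: each is compared with 0.05 / 4 = 0.0125.
print(bonferroni_reject([0.001, 0.02, 0.04, 0.30]))  # [ True False False False]
```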

The Šidák procedure

  • Testing each hypothesis at level  \alpha_{SID} = 1-(1-\alpha)^\frac{1}{m} is Šidák's multiple testing procedure.
  • This test is more powerful than Bonferroni, but the gain is small (see the numeric comparison below).
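
A quick numeric comparison of the two per-test levels (illustrative values, m = 10 and \alpha = 0.05):

```python
m, alpha = 10, 0.05
print("Bonferroni per-test level:", alpha / m)              # 0.005
print("Sidak per-test level:", 1 - (1 - alpha) ** (1 / m))  # ~0.00512
```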

Tukey's procedure

  • Tukey's procedure is only applicable for pairwise comparisons.
  • It assumes independence of the observations being tested, as well as equal variation across observations (homoscedasticity).
  • The procedure calculates for each pair the studentized range statistic  \frac{Y_{A}-Y_{B}}{SE} , where Y_{A} is the larger of the two means being compared, Y_{B} is the smaller, and SE is the standard error of the data in question.
  • Tukey's test is essentially a Student's t-test, except that it corrects for the family-wise error rate (a usage sketch follows below).
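
A usage sketch with SciPy, which provides an implementation as scipy.stats.tukey_hsd (SciPy 1.8 or later; the three groups below are simulated illustration data):

```python
import numpy as np
from scipy.stats import tukey_hsd  # available in SciPy >= 1.8

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, size=30)
group_b = rng.normal(0.0, 1.0, size=30)
group_c = rng.normal(1.0, 1.0, size=30)

# All pairwise comparisons, with FWER control via the studentized range.
result = tukey_hsd(group_a, group_b, group_c)
print(result.pvalue)  # matrix of FWER-adjusted pairwise p-values
```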

Some newer solutions for strong level \alpha FWER control follow:

Holm's step-down procedure (1979)

  • Start by ordering the p-values (from lowest to highest) P_{(1)} \ldots P_{(m)} and let the associated hypotheses be H_{(1)} \ldots H_{(m)}.
  • Let R be the smallest k such that P_{(k)} > \frac{\alpha}{m+1-k} (if no such k exists, set R = m + 1 and reject all hypotheses).
  • Reject the null hypotheses H_{(1)} \ldots H_{(R-1)}. If R = 1, none of the hypotheses are rejected.
  • This procedure is uniformly more powerful than Bonferroni's.[3]
  • It is worth noticing that the reason this procedure controls the family-wise error rate for all m hypotheses at level \alpha in the strong sense is that it is essentially a closed testing procedure: each intersection hypothesis is tested using the simple Bonferroni test. (A code sketch follows below.)
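
A minimal sketch of the step-down rule in Python (the function name and example p-values are illustrative):

```python
import numpy as np

def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: compare the k-th smallest p-value
    with alpha / (m + 1 - k) and stop at the first p-value that fails."""
    p = np.asarray(p_values)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p), start=1):
        if p[idx] <= alpha / (m + 1 - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also retained
    return reject

# Smallest p-value is tested at 0.05/4, the next at 0.05/3, and so on.
print(holm_reject([0.001, 0.02, 0.04, 0.30]))  # [ True False False False]
```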

Hochberg's step-up procedure (1988)

Hochberg's step-up procedure (1988) is performed using the following steps:[4]

  • Start by ordering the p-values (from lowest to highest) P_{(1)} \ldots P_{(m)} and let the associated hypotheses be H_{(1)} \ldots H_{(m)}.
  • For a given \alpha, let R be the largest k such that P_{(k)} \leq \frac{\alpha}{m+1-k}.
  • Reject the null hypotheses H_{(1)} \ldots H_{(R)} (if no such k exists, reject none).

Hochberg's procedure is more powerful than Holm's. Nevertheless, while Holm's is a closed testing procedure (and thus, like Bonferroni, has no restriction on the joint distribution of the test statistics), Hochberg's is based on the Simes test, so it holds only under non-negative dependence. The sketch below illustrates the power difference.
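
A minimal sketch of the step-up rule in Python, on a pair of p-values chosen to show the power gain over Holm (names and values are illustrative):

```python
import numpy as np

def hochberg_reject(p_values, alpha=0.05):
    """Hochberg's step-up procedure: find the largest k with
    P_(k) <= alpha / (m + 1 - k) and reject H_(1) ... H_(k)."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for rank in range(m, 0, -1):         # step up from the largest p-value
        if p[order[rank - 1]] <= alpha / (m + 1 - rank):
            reject[order[:rank]] = True  # reject this and every smaller p-value
            break
    return reject

# With p = (0.04, 0.045) and alpha = 0.05, Hochberg rejects both hypotheses
# (0.045 <= 0.05/1), while Holm's step-down rejects neither (0.04 > 0.05/2).
print(hochberg_reject([0.04, 0.045]))  # [ True  True]
```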

Dunnett's correction

Charles Dunnett (1955, 1966) described an alternative alpha error adjustment when k groups are compared to the same control group. Now known as Dunnett's test, this method is less conservative than the Bonferroni adjustment.
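
A usage sketch assuming SciPy 1.11 or later, which ships an implementation as scipy.stats.dunnett (the data are simulated illustration values):

```python
import numpy as np
from scipy.stats import dunnett  # available in SciPy >= 1.11

rng = np.random.default_rng(2)
control = rng.normal(0.0, 1.0, size=30)
treat_1 = rng.normal(0.5, 1.0, size=30)
treat_2 = rng.normal(1.0, 1.0, size=30)

# Compare each treatment group with the same control group.
result = dunnett(treat_1, treat_2, control=control)
print(result.pvalue)  # FWER-adjusted p-values, one per treatment group
```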

Scheffé's method

Scheffé's method controls the FWER simultaneously for all possible contrasts (linear combinations) of the group means. Because it protects against this entire family of contrasts, it is more conservative than procedures designed for a small number of pre-specified comparisons.

Closed testing procedure

Closed testing procedures control the familywise type I error rate if, in the closed testing procedure, all intersection hypotheses are tested using valid local level \alpha tests. Closed testing procedures are a flexible general class of testing procedures that includes, for example, the Bonferroni procedure and Holm's step-down procedure (a sketch with local Bonferroni tests follows below).
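
A brute-force sketch in Python with local Bonferroni tests (illustrative only: it enumerates all 2^m - 1 intersection hypotheses, so it is exponential in m). With Bonferroni local tests, closed testing reproduces Holm's step-down procedure, as noted above.

```python
from itertools import combinations

def closed_bonferroni_reject(p, alpha=0.05):
    """Closed testing with local Bonferroni tests: reject H_i iff every
    intersection hypothesis whose index set contains i is rejected by a
    local level-alpha test, i.e. min p in the subset <= alpha / |subset|."""
    m = len(p)
    def locally_rejected(subset):
        return min(p[j] for j in subset) <= alpha / len(subset)
    return [
        all(locally_rejected(s)
            for size in range(1, m + 1)
            for s in combinations(range(m), size) if i in s)
        for i in range(m)
    ]

print(closed_bonferroni_reject([0.001, 0.02, 0.04]))  # [True, True, True]
```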

Resampling procedures

The procedures of Bonferroni and Holm control the FWER under any dependence structure of the p-values (or equivalently the individual test statistics). Essentially, this is achieved by assuming a "worst-case" dependence structure (which is close to an assumption of independence for all practical purposes). But such an approach generally leads to a loss of power. To give an extreme example, when all the p-values are the same (as in a case of perfect dependence), the cutoff value for the Bonferroni procedure can be taken to be simply \alpha instead of \alpha/m .

It is therefore of interest to account for the true dependence structure of the p-values (or the individual test statistics) in order to derive more powerful procedures. This can be achieved by applying resampling methods, such as bootstrap and permutation methods. The procedure of Westfall and Young (1993) requires a certain condition that does not always hold in practice (namely, subset pivotality).[5] The procedures of Romano and Wolf (2005a,b) dispense with this condition and are thus more generally valid.[6][7] A schematic permutation sketch follows below.
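
As a schematic illustration, here is a single-step maxT-style permutation sketch in Python (illustrative data and names, not the full Westfall-Young step-down algorithm): it estimates the null distribution of the maximum |t| statistic by permuting group labels, which adapts the cutoff to the dependence between features and gives FWER control under the global null; the step-down refinements and the conditions for strong control are developed in the cited references.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_t_cutoff(X, y, n_perm=2000, alpha=0.05):
    """Permutation estimate of the (1 - alpha)-quantile of max_i |t_i|
    over m two-sample comparisons; rejecting H_i when |t_i| exceeds the
    cutoff accounts for the dependence between the m test statistics."""
    def abs_t(labels):
        a, b = X[labels == 0], X[labels == 1]
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                     + b.var(axis=0, ddof=1) / len(b))
        return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

    null_max = np.array([abs_t(rng.permutation(y)).max()
                         for _ in range(n_perm)])
    return np.quantile(null_max, 1 - alpha), abs_t(y)

# 40 subjects, 100 features; only feature 0 carries a real group effect.
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 100))
X[y == 1, 0] += 2.0

cutoff, t_obs = max_t_cutoff(X, y)
print("cutoff:", round(float(cutoff), 2))
print("rejected features:", np.where(t_obs > cutoff)[0])
```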

Other procedures

Other advanced procedures that ensure strong level \alpha FWER control include the maximum modulus test.

There are also alternatives to the familywise error rate, such as the false discovery rate, defined by Benjamini and Hochberg in 1995.

References

  1. Shaffer, J. P. (1995). "Multiple hypothesis testing". Annual Review of Psychology 46: 561–584.
  2. Hochberg, Y.; Tamhane, A. C. (1987). Multiple Comparison Procedures. New York: Wiley.
  3. Aickin, M.; Gensler, H. (1996). "Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods". American Journal of Public Health 86 (5): 726–728.
  4. Hochberg, Y. (1988). "A sharper Bonferroni procedure for multiple tests of significance". Biometrika 75 (4): 800–802.
  5. Westfall, P. H.; Young, S. S. (1993). Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. New York: Wiley.
  6. Romano, J. P.; Wolf, M. (2005). "Exact and approximate stepdown methods for multiple hypothesis testing". Journal of the American Statistical Association 100: 94–108.
  7. Romano, J. P.; Wolf, M. (2005). "Stepwise multiple testing as formalized data snooping". Econometrica 73 (4): 1237–1282.
