When we conduct a hypothesis test, we typically end up with a p-value that we compare to some alpha level to decide if we should reject or fail to reject the null hypothesis.

For example, we may conduct a two sample t-test using an alpha level of 0.05 to determine if two population means are equal. Suppose we conduct the test and end up with a p-value of 0.0023. In this case, we would reject the null hypothesis that the two population means are equal since the p-value is less than our chosen alpha level.
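This decision rule can be sketched with `scipy.stats.ttest_ind`; the data values below are hypothetical and chosen only for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical samples from two populations (made-up values)
group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13])

# Two sample t-test (equal variances assumed by default)
t_stat, p_value = stats.ttest_ind(group1, group2)

# Compare the p-value to the chosen alpha level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis of equal means")
else:
    print("Fail to reject the null hypothesis")
```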

P-values are a common metric used to reject or fail to reject some hypothesis, but there is another metric that can also be used: **Bayes Factor**.

The Bayes Factor is defined as the ratio of the likelihood of the observed data under one hypothesis to the likelihood of the data under another hypothesis. Typically it is used to compare an alternative hypothesis against a null hypothesis:

Bayes Factor = (likelihood of data given H_A) / (likelihood of data given H_0)

For example, a Bayes Factor of 5 means the observed data are 5 times as likely under the alternative hypothesis as under the null hypothesis.

Conversely, a Bayes Factor of 1/5 means the observed data are 5 times as likely under the null hypothesis as under the alternative hypothesis.
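The ratio is easiest to compute when both hypotheses are simple (point) hypotheses. A hypothetical sketch: a coin lands heads 60 times in 100 flips, and we compare a fair coin (H_0: p = 0.5) against a specific biased coin (H_A: p = 0.7):

```python
from scipy.stats import binom

heads, flips = 60, 100  # hypothetical data

# Likelihood of the observed data under each point hypothesis
lik_H0 = binom.pmf(heads, flips, 0.5)  # null: fair coin
lik_HA = binom.pmf(heads, flips, 0.7)  # alternative: biased coin

# Bayes Factor: likelihood under H_A divided by likelihood under H_0
bayes_factor = lik_HA / lik_H0
print(bayes_factor)
```

Here the ratio comes out a little below 1, i.e. weak (anecdotal) evidence for the fair-coin hypothesis. In realistic settings the alternative is usually composite, and its likelihood is averaged over a prior rather than evaluated at a single point.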

Similar to p-values, we can use thresholds to decide when we should reject a null hypothesis. For example, we may decide that a Bayes Factor of 10 or higher is strong enough evidence to reject the null hypothesis.

Lee and Wagenmakers proposed the following interpretations of the Bayes Factor:

| Bayes Factor | Interpretation |
|---|---|
| > 100 | Extreme evidence for alternative hypothesis |
| 30 – 100 | Very strong evidence for alternative hypothesis |
| 10 – 30 | Strong evidence for alternative hypothesis |
| 3 – 10 | Moderate evidence for alternative hypothesis |
| 1 – 3 | Anecdotal evidence for alternative hypothesis |
| 1 | No evidence |
| 1/3 – 1 | Anecdotal evidence for null hypothesis |
| 1/10 – 1/3 | Moderate evidence for null hypothesis |
| 1/30 – 1/10 | Strong evidence for null hypothesis |
| 1/100 – 1/30 | Very strong evidence for null hypothesis |
| < 1/100 | Extreme evidence for null hypothesis |
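The interpretation scale translates directly into code. A minimal sketch of a lookup function, with thresholds and labels taken from the table above:

```python
def interpret_bayes_factor(bf):
    """Map a Bayes Factor to its interpretation on the Lee-Wagenmakers scale."""
    if bf > 100:
        return "Extreme evidence for alternative hypothesis"
    elif bf > 30:
        return "Very strong evidence for alternative hypothesis"
    elif bf > 10:
        return "Strong evidence for alternative hypothesis"
    elif bf > 3:
        return "Moderate evidence for alternative hypothesis"
    elif bf > 1:
        return "Anecdotal evidence for alternative hypothesis"
    elif bf == 1:
        return "No evidence"
    elif bf > 1/3:
        return "Anecdotal evidence for null hypothesis"
    elif bf > 1/10:
        return "Moderate evidence for null hypothesis"
    elif bf > 1/30:
        return "Strong evidence for null hypothesis"
    elif bf > 1/100:
        return "Very strong evidence for null hypothesis"
    else:
        return "Extreme evidence for null hypothesis"

print(interpret_bayes_factor(9))    # Moderate evidence for alternative hypothesis
print(interpret_bayes_factor(0.2))  # Moderate evidence for null hypothesis
```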

**Bayes Factor vs. P-Values**

Bayes Factor and p-values have different interpretations.

**P-values:**

A p-value is interpreted as the probability of obtaining results at least as extreme as the observed results of a hypothesis test, assuming that the null hypothesis is correct.

For example, suppose you conduct a two sample t-test to determine if two population means are equal. If the test results in a p-value of 0.0023, this means the probability of obtaining a result at least this extreme is just **0.0023** if the two population means are actually equal. Because this value is so small, we reject the null hypothesis and conclude that we have sufficient evidence to say that the two population means aren't equal.

**Bayes Factor:**

Bayes Factor is interpreted as the ratio of the likelihood of the observed data occurring under the alternative hypothesis to the likelihood of the observed data occurring under the null hypothesis.

For example, suppose you conduct a hypothesis test and end up with a Bayes Factor of 4. This means the data you actually observed are 4 times as likely under the alternative hypothesis as under the null hypothesis.

**Conclusion**

Some statisticians believe that the Bayes Factor offers an advantage over p-values because it allows you to quantify the evidence for and against two competing hypotheses. For example, evidence can be quantified in favor of or against a null hypothesis, which can’t be done using a p-value.

No matter which approach you use – Bayes Factor or p-values – you still have to decide on a cut-off value if you wish to reject or fail to reject some null hypothesis.

For example, in the table above we saw that a Bayes Factor of 9 would be classified as “moderate evidence for the alternative hypothesis” while a Bayes Factor of 10 would be classified as “strong evidence for the alternative hypothesis.”

In this sense, the Bayes Factor suffers from the same problem as a p-value of 0.06 being considered “not significant” while a p-value of 0.05 may be considered significant.

**Further Reading:**

An Explanation of P-Values and Statistical Significance

A Simple Explanation of Statistical vs. Practical Significance