Why does SciPy return negative p-values for extremely small p-values with the Fisher-exact test?

The Fisher-exact test (Fisher's exact test) is a statistical hypothesis test for association between two categorical variables, usually applied to a 2x2 contingency table with small counts. It is often used in fields such as biology and medicine to assess whether a treatment is associated with an outcome.

The main output of the Fisher-exact test is the p-value. It is the probability, assuming there is no association between the two variables (the null hypothesis), of observing a table at least as extreme as the one in the sample. A small p-value (conventionally below 0.05) means the observed data would be unlikely under the null hypothesis, and the association is then called statistically significant. Because it is a probability, a valid p-value always lies between 0 and 1.
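To make the definition concrete, here is a minimal sketch that computes a two-sided p-value for a 2x2 table with exact rational arithmetic from the Python standard library. It uses the common "sum every table no more probable than the observed one" definition of two-sided. The function name and the example table are illustrative assumptions, and this is not SciPy's implementation, although scipy.stats.fisher_exact on the same table should agree up to floating-point rounding.

```python
from fractions import Fraction
from math import comb


def fisher_two_sided_exact(table):
    """Two-sided Fisher exact p-value for a 2x2 table, as an exact Fraction."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell
        # when the row and column totals are held fixed.
        return Fraction(comb(row1, k) * comb(total - row1, col1 - k), denom)

    k_min = max(0, col1 - (total - row1))
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(
        (prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed),
        Fraction(0),
    )


p = fisher_two_sided_exact([[8, 2], [1, 5]])
print(p, float(p))
```

Because every term in the sum is a non-negative fraction, this calculation cannot produce a negative result. That is the property a floating-point implementation can lose.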

However, there is a known issue with the Fisher-exact test in the scientific computing library SciPy (the function scipy.stats.fisher_exact): it can return a negative p-value when the true p-value is extremely small. The issue arises from the limitations of floating-point arithmetic.

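Computers use a finite number of bits to represent real numbers, so every intermediate result is rounded. These round-off errors matter most when a very small number is obtained by combining much larger ones. The Fisher-exact test in SciPy computes its p-value from hypergeometric probabilities, and the calculation involves logarithmic transformations of very large factorial terms. When those terms are converted back and combined, the accumulated error can be larger than the p-value itself if the p-value is extremely small.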
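The effect is easy to reproduce with ordinary Python floats. The snippet below is only an illustration of round-off in double precision, not a reproduction of SciPy's internal calculation, and the variable names are made up for the example.

```python
# Mathematically this difference is exactly zero, but 0.1 + 0.2 is rounded
# up to the nearest double, so the result is a tiny negative number.
residual = 0.3 - (0.1 + 0.2)
print(residual)

# A tail probability far below machine epsilon disappears entirely
# when it is formed as "1 minus something close to 1".
tiny_tail = 1e-20
recovered = 1.0 - (1.0 - tiny_tail)
print(recovered)
```

The first result is slightly below zero and the second is exactly zero, even though neither matches the mathematically correct answer. A p-value formed by a similar subtraction can end up on the wrong side of zero in the same way.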

As a result, a p-value that should be a tiny positive number can come out slightly below zero. This is not a flaw in the Fisher-exact test itself, whose p-value lies between 0 and 1 by definition, but a limitation of the numerical method used to compute it.

Negative p-values have no meaningful interpretation in statistics, and they are a problem for practitioners who rely on p-values to make decisions about their data. A negative value passes any "p < 0.05" check, so it can be mistaken for an exceptionally significant result, and it breaks downstream steps such as taking the logarithm of the p-value.
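One defensive habit is to validate every p-value before it reaches a decision rule. The helper below is a hypothetical sketch: the name checked_p_value and the tolerance of 1e-12 are assumptions chosen for illustration, not values taken from SciPy.

```python
import math


def checked_p_value(p, tol=1e-12):
    """Return p clipped into [0, 1], or raise if it is clearly invalid.

    `tol` is an illustrative threshold, not a value taken from SciPy.
    """
    if math.isnan(p):
        raise ValueError("p-value is NaN")
    if p < -tol or p > 1.0 + tol:
        raise ValueError(f"p-value {p!r} is outside [0, 1] by more than {tol}")
    # Treat tiny excursions as round-off and clip them.
    return min(max(p, 0.0), 1.0)


print(checked_p_value(-1e-17))  # clipped to 0.0
print(checked_p_value(0.03))    # returned unchanged
```

Clipping only small excursions and raising on large ones keeps genuine bugs, such as a p-value of -0.5, from being silently hidden.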

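To address this issue, first check whether a current SciPy release still produces the negative value, since its numerical routines may differ between versions. If it does, treat a slightly negative result as zero, or compute the tail probability with a more stable method, for example by summing the hypergeometric probabilities in log space. The mid-P method is sometimes suggested as a remedy, but it is a statistical adjustment rather than a numerical one: it counts only half the probability of the observed table, which makes the test less conservative and does nothing to reduce round-off error.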
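One way to keep extremely small tail probabilities well behaved is to work with log-probabilities and only add non-negative terms. The sketch below does this for the one-sided ("greater") alternative using math.lgamma from the standard library. The function names are illustrative assumptions, and this is not how SciPy computes the test.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k) via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_greater_log_p(table):
    """Log of the one-sided ('greater') Fisher exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    log_denom = log_comb(total, col1)
    # Log-probabilities of every table at least as extreme as the observed one.
    log_terms = [
        log_comb(row1, k) + log_comb(total - row1, col1 - k) - log_denom
        for k in range(a, min(row1, col1) + 1)
    ]
    # Log-sum-exp: every term added is non-negative, so the sum cannot go below zero.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


log_p = fisher_greater_log_p([[1000, 0], [0, 1000]])
print(log_p)            # finite, even though the p-value itself underflows
print(math.exp(log_p))  # 0.0 in double precision
```

Reporting the logarithm of the p-value sidesteps both problems: the result never changes sign, and values too small for a double are still representable.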

In conclusion, the Fisher-exact test in SciPy can return a negative p-value when the true p-value is extremely small, because of the limitations of floating-point arithmetic. This does not affect the validity of the test itself, but it can mislead practitioners who rely on p-values to make decisions about their data. To avoid the issue, compute very small p-values with a numerically stable method and check that every result lies between 0 and 1 before using it.
