Did you know...
April 26, 2006 4:03...that there are (more than) 21 algorithms to compute the factorial of a number?
I discovered this fact today, while implementing the Fisher's exact test for a 2x2 contingency table.
(For those interested: Fisher's exact test for a 2x2 contingency table, like the more famous Chi-square test, is used for the analysis of categorical frequency data: its aim is to determine whether the two categorical variables are associated.
We use it to extract the most significant genes from a gene set experiment: the two categories are "clue present/clue absent", while the two data sets are the control set and the experiment set)
Fisher exact test requires the computation of 9 factorials for each iteration, and one of them is on the total. Typically, applets and code you can found on the internet can handle data up to a total of about 100-200 before giving up due to overflow problems. Even code that uses libraries for arithmetic with arbitrary precision can handle numbers up to 6000-12000 before becoming unsuable.
Here, we are facing two problems:
I discovered this fact today, while implementing the Fisher's exact test for a 2x2 contingency table.
(For those interested: Fisher's exact test for a 2x2 contingency table, like the more famous Chi-square test, is used for the analysis of categorical frequency data: its aim is to determine whether the two categorical variables are associated.
We use it to extract the most significant genes from a gene set experiment: the two categories are "clue present/clue absent", while the two data sets are the control set and the experiment set)
Fisher exact test requires the computation of 9 factorials for each iteration, and one of them is on the total. Typically, applets and code you can found on the internet can handle data up to a total of about 100-200 before giving up due to overflow problems. Even code that uses libraries for arithmetic with arbitrary precision can handle numbers up to 6000-12000 before becoming unsuable.
Here, we are facing two problems:
- numbers (integers and doubles) provided by the machine (physichal CPU or VM) have a finite precision and extension
- the simple and dumb algorithm (the one that performs all the multiplications) is very SLOW