I have 2 input variables:
- a vector of p-values (p) with N elements (unsorted)
- an N x M matrix of p-values obtained by random permutations (pr) with M iterations. N is quite large, 10K to 100K or more. M is, let's say, 100.
I'm estimating the False Discovery Rate (FDR) for each element of p: the average number of p-values from the random permutations that would pass if the current p-value (from p) were used as the threshold, divided by the number of elements of p that pass the same threshold.
I wrote the function with ARRAYFUN, but it takes a lot of time for large N (2 min for N=20K), comparable to a for-loop.
function pfdr = fdr_from_random_permutations(p, pr)
%# ... skipping arguments checks
pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);
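Written out, the one-liner above computes the following for each threshold $p_i$. The notation ($\mathit{pr}_{kj}$ for the entry of pr in row $k$ and column $j$, and $\#\{\cdot\}$ for a count) is introduced here only for illustration:

```latex
\widehat{\mathrm{FDR}}(p_i)
  = \frac{\dfrac{1}{M}\displaystyle\sum_{j=1}^{M} \#\{\,k : \mathit{pr}_{kj} \le p_i\,\}}
         {\#\{\,k : p_k \le p_i\,\}}
```

The numerator is the mean, over the M permutations, of the number of permuted p-values at or below the threshold; the denominator is the number of observed p-values at or below it.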
Any ideas how to make it faster?
Comments about statistical issues here are also welcome.
The test data can be generated as p = rand(N,1); pr = rand(N,M);.
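For reference, a small timing harness along these lines could be used to compare candidates. It is only a sketch: the value N = 1000 is an assumption chosen so that it runs quickly (the sizes in the question are much larger), and `baseline` is a hypothetical handle that simply wraps the ARRAYFUN expression above.

```matlab
% Assumed sizes for a quick run; the question uses N = 10K to 100K.
N = 1000;
M = 100;

% Test data as described above.
p  = rand(N, 1);
pr = rand(N, M);

% Hypothetical handle wrapping the ARRAYFUN one-liner from the question.
baseline = @(p, pr) arrayfun(@(x) mean(sum(pr <= x)) ./ sum(p <= x), p);

% Time a single call.
tic;
pfdr = baseline(p, pr);
elapsed = toc;
fprintf('ARRAYFUN version: %.3f s for N = %d, M = %d\n', elapsed, N, M);
```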
Well, the trick was indeed sorting the vectors. I give credit to @EgonGeerardyn for that. Also, there is no need to use mean. You can just divide everything by M afterwards. When p is sorted, finding the number of values that are less than the current x is just a running index. pr is a more interesting case: I used a running index called place to discover how many elements are less than x.
Edit(2): Here is the fastest version I came up with:
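A minimal sketch of the sorted, running-index idea described above could look like the following. The function name `fdr_sorted_sketch` and the choice to pool all entries of pr into one sorted vector are assumptions made for illustration, and this is not necessarily the exact version that was benchmarked. Ties in p are handled so that the result matches the ARRAYFUN one-liner from the question.

```matlab
function pfdr = fdr_sorted_sketch(p, pr)
% Sketch only: same quantity as the ARRAYFUN one-liner in the question,
% computed with sorted inputs and a running index.
% p  : vector of N p-values
% pr : N-by-M matrix of permutation p-values

    M = size(pr, 2);
    [ps, order] = sort(p(:));   % thresholds in ascending order
    prs = sort(pr(:));          % all permutation p-values, pooled and sorted
    n = numel(ps);
    K = numel(prs);

    % Running index into prs: number of pooled values <= ps(i).
    countsPr = zeros(n, 1);
    place = 0;
    for i = 1:n
        while place < K && prs(place + 1) <= ps(i)
            place = place + 1;
        end
        countsPr(i) = place;
    end

    % Number of elements of p that are <= ps(i). For distinct values this
    % is just the index i; tied values share the largest index of the tie.
    countsP = (1:n)';
    for i = n-1:-1:1
        if ps(i) == ps(i + 1)
            countsP(i) = countsP(i + 1);
        end
    end

    % Divide by M once instead of calling mean for every threshold.
    pfdrSorted = (countsPr / M) ./ countsP;

    % Put the results back in the original (unsorted) order of p.
    pfdr = zeros(size(p));
    pfdr(order) = pfdrSorted;
end
```

Because the pooled count divided by M equals mean(sum(pr<=x)), the result can be checked against the original expression on small inputs before timing it on large ones.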
And the benchmark results for N = 10000/4; M = 100/4:
and for N = 10000; M = 100: