[SOLVED] R fast quantile calculation for confidence intervals

Issue

This Content is from Stack Overflow. Question asked by gaut

I am trying to calculate quantiles for every “slice” of a dataset, in order to get some kind of “confidence intervals” at a 99% level. I manage this with base R, but it is excruciatingly slow. Any idea to speed up, or for a better approach, is welcome.

a <- (1:20000)/100
b <- 20001:40000

speedseq <- data.frame(a, b)
work_quantile <- rep(NA, nrow(speedseq))
    
myfunc <- function() {
  

  for(i in 1: nrow(speedseq)) {
    work_quantile[i] <- quantile(speedseq$b[speedseq$a>=(speedseq$a[i] - 1) & 
                                                       speedseq$a<=speedseq$a[i]], na.rm = T, probs = 0.99)
    if(i%%10000==0) print(round(i/nrow(speedseq),3))
  }
  mean(is.na(work_quantile))

}

microbenchmark::microbenchmark(myfunc(), times = 1)
   Unit: seconds
     expr      min       lq     mean   median       uq      max neval
 myfunc() 5.185645 5.185645 5.185645 5.185645 5.185645 5.185645     1



Solution

You could parallelize it.

library(parallel)

cl <- makeCluster(detectCores() - 1)
clusterExport(cl, 'speedseq')

r0 <- parSapply(cl, 1:nrow(speedseq), \(i) unname(quantile(speedseq$b[speedseq$a >= (speedseq$a[i] - 1) & 
                                                    speedseq$a <= speedseq$a[i]], na.rm=T, probs=0.99)))
stopifnot(all.equal(work_quantile, r0))

stopCluster(cl)

Other approaches:

If you want slices of 1, 1:2, 1:3, … 1:nrow, you could do

r1 <- vapply(1:nrow(speedseq), \(x) quantile(speedseq$b[seq.int(1, x)], .99), numeric(1))

If you want 1:100, 2:101, 2:102, …, (nrow – 99):nrow, you could do

r2 <- vapply(1:(nrow(speedseq) - 99), \(x) quantile(speedseq$b[seq.int(x, x + 99)], .99), numeric(1))

If you want just slices of 100 each you could do

r3 <- vapply(seq(1, nrow(speedseq), 100), \(x) quantile(speedseq$b[seq.int(x, x + 99)], .99), numeric(1))


This Question was asked in StackOverflow by gaut and Answered by jay.sf It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?