# Issue

This content is from Stack Overflow. Question asked by gaut.

I am trying to calculate quantiles for every "slice" of a dataset, in order to get a kind of "confidence interval" at the 99% level. I manage this with base R, but it is excruciatingly slow. Any idea for speeding it up, or for a better approach, is welcome.

```r
a <- (1:20000)/100
b <- 20001:40000

speedseq <- data.frame(a, b)
work_quantile <- rep(NA, nrow(speedseq))

myfunc <- function() {
  for (i in 1:nrow(speedseq)) {
    # <<- assigns to the global work_quantile so the results survive the call
    work_quantile[i] <<- quantile(speedseq$b[speedseq$a >= (speedseq$a[i] - 1) &
                                               speedseq$a <= speedseq$a[i]],
                                  na.rm = TRUE, probs = 0.99)
    if (i %% 10000 == 0) print(round(i / nrow(speedseq), 3))
  }
  mean(is.na(work_quantile))
}

microbenchmark::microbenchmark(myfunc(), times = 1)
#> Unit: seconds
#>      expr      min       lq     mean   median       uq      max neval
#>  myfunc() 5.185645 5.185645 5.185645 5.185645 5.185645 5.185645     1
```

# Solution

You could parallelize it.

```r
library(parallel)

cl <- makeCluster(detectCores() - 1)
clusterExport(cl, 'speedseq')

r0 <- parSapply(cl, 1:nrow(speedseq), \(i) {
  unname(quantile(speedseq$b[speedseq$a >= (speedseq$a[i] - 1) &
                               speedseq$a <= speedseq$a[i]],
                  na.rm = TRUE, probs = 0.99))
})
stopifnot(all.equal(work_quantile, r0))

stopCluster(cl)
```
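The loop is slow mainly because each iteration rescans every row with a logical comparison, which is O(n) per slice. Since `a` in this example is sorted ascending, each slice `a >= a[i] - 1 & a <= a[i]` is a contiguous run of rows, so the scan can be replaced by a single `findInterval()` lookup per row. A single-threaded sketch under that sorted-`a` assumption (`lo` and `r4` are names introduced here for illustration):

```r
a <- (1:20000)/100
b <- 20001:40000
speedseq <- data.frame(a, b)

# With left.open = TRUE, findInterval() counts the elements of speedseq$a
# strictly below each threshold a[i] - 1, so lo[i] is the first row of slice i.
lo <- findInterval(speedseq$a - 1, speedseq$a, left.open = TRUE) + 1

# Each slice is now the contiguous index range lo[i]:i.
r4 <- vapply(seq_len(nrow(speedseq)),
             \(i) unname(quantile(speedseq$b[lo[i]:i], probs = 0.99)),
             numeric(1))
```

Because `findInterval()` applies the same floating-point comparison against `speedseq$a[i] - 1` that the original loop does, the result should match `work_quantile` exactly in this example; only the per-row scan is eliminated.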

Other approaches:

If you want slices of 1, 1:2, 1:3, … 1:nrow, you could do

```r
r1 <- vapply(1:nrow(speedseq), \(x) quantile(speedseq$b[seq.int(1, x)], .99), numeric(1))
```

If you want 1:100, 2:101, 3:102, …, (nrow - 99):nrow, you could do

```r
r2 <- vapply(1:(nrow(speedseq) - 99), \(x) quantile(speedseq$b[seq.int(x, x + 99)], .99), numeric(1))
```

If you want just non-overlapping slices of 100 each, you could do

```r
r3 <- vapply(seq(1, nrow(speedseq), 100), \(x) quantile(speedseq$b[seq.int(x, x + 99)], .99), numeric(1))
```

This question was asked on Stack Overflow by gaut and answered by jay.sf. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.