Issue
This content is from Stack Overflow; the question was asked by gaut.
I am trying to calculate quantiles for every "slice" of a dataset, in order to get a kind of "confidence interval" at the 99% level. I can do this with base R, but it is excruciatingly slow. Any idea for speeding it up, or a better approach, is welcome.
a <- (1:20000)/100
b <- 20001:40000
speedseq <- data.frame(a, b)
work_quantile <- rep(NA, nrow(speedseq))
myfunc <- function() {
  for (i in 1:nrow(speedseq)) {
    # 99th percentile of b over the rows whose a lies within 1 below a[i];
    # assign with <<- so the result is visible outside myfunc() for the comparison below
    work_quantile[i] <<- quantile(speedseq$b[speedseq$a >= (speedseq$a[i] - 1) &
                                               speedseq$a <= speedseq$a[i]],
                                  na.rm = TRUE, probs = 0.99)
    if (i %% 10000 == 0) print(round(i / nrow(speedseq), 3))
  }
  mean(is.na(work_quantile))
}
microbenchmark::microbenchmark(myfunc(), times = 1)
Unit: seconds
     expr      min       lq     mean   median       uq      max neval
 myfunc() 5.185645 5.185645 5.185645 5.185645 5.185645 5.185645     1
Solution
You could parallelize it.
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, 'speedseq')
# same per-row quantile as the loop, spread across worker processes
# (the \(i) lambda shorthand needs R >= 4.1)
r0 <- parSapply(cl, 1:nrow(speedseq), \(i) unname(quantile(
  speedseq$b[speedseq$a >= (speedseq$a[i] - 1) & speedseq$a <= speedseq$a[i]],
  na.rm = TRUE, probs = 0.99)))
stopifnot(all.equal(work_quantile, r0))
stopCluster(cl)
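To time the parallel version the same way the loop was timed above, you could wrap the parSapply call in a small helper (here called parfunc, just for illustration) and benchmark it; the speedup will depend on the number of cores and on cluster start-up overhead.
parfunc <- function(cl) {
  parSapply(cl, 1:nrow(speedseq), \(i) unname(quantile(
    speedseq$b[speedseq$a >= (speedseq$a[i] - 1) & speedseq$a <= speedseq$a[i]],
    na.rm = TRUE, probs = 0.99)))
}
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, 'speedseq')
microbenchmark::microbenchmark(parfunc(cl), times = 1)
stopCluster(cl)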
Other approaches:
If you want expanding slices 1, 1:2, 1:3, …, 1:nrow(speedseq), you could do
r1 <- vapply(1:nrow(speedseq), \(x) quantile(speedseq$b[seq.int(1, x)], .99), numeric(1))
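As a quick sanity check on this expanding version, its last element should equal the 99th percentile of the whole column:
stopifnot(all.equal(unname(r1[nrow(speedseq)]), unname(quantile(speedseq$b, 0.99))))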
If you want rolling windows 1:100, 2:101, 3:102, …, (nrow - 99):nrow, you could do
r2 <- vapply(1:(nrow(speedseq) - 99), \(x) quantile(speedseq$b[seq.int(x, x + 99)], .99), numeric(1))
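If the zoo package is installed, the same rolling window of 100 can be cross-checked with zoo::rollapply (only a sketch to confirm the window bookkeeping; both use quantile's default type):
r2_zoo <- zoo::rollapply(speedseq$b, width = 100, FUN = quantile, probs = 0.99)
stopifnot(all.equal(unname(r2), unname(r2_zoo)))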
If you want just non-overlapping slices of 100 each, you could do
r3 <- vapply(seq(1, nrow(speedseq), 100), \(x) quantile(speedseq$b[seq.int(x, x + 99)], .99), numeric(1))
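Equivalently, as a cross-check of the non-overlapping slices, you could split b into blocks of 100 rows and take the quantile per block:
grp <- (seq_len(nrow(speedseq)) - 1) %/% 100   # 0, 0, ..., 1, 1, ..., 199
r3_split <- vapply(split(speedseq$b, grp), quantile, numeric(1), probs = 0.99)
stopifnot(all.equal(unname(r3), unname(r3_split)))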
This question was asked on Stack Overflow by gaut and answered by jay.sf. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.