Work is not distributed 'on the fly' #499
Replies: 3 comments
-
EDIT: perhaps this is an issue in

I have a similar use case and came here to report a new issue, when I found it had already been reported.

The problem you describe is well known; the technical term is "load balancing", see for example the manual for

When running under OSes that support forking, you will achieve optimal load balancing like this:

Under MS Windows, which does not support forking,

In my use case, I have many jobs that are relatively fast (relative to the time it takes to start R) and one job that is really slow. On Linux, this is not a problem: I just sort the jobs by estimated computation time, and use

I don't know the inner workings of future, but I think it should be possible to have persistent workers and efficient load balancing. As I understand the situation (I hope I am wrong, though), scheduling = FALSE implies that the workers are discarded when they have finished a single job.

Looking forward to suggestions for working around this.

Kind regards,
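The effect of sorting jobs by estimated run time can be illustrated with a small simulation. This is only a sketch in base R under stated assumptions: the job durations are made up, `makespan_static()` and `makespan_dynamic()` are hypothetical helper names, and nothing is actually run in parallel. `parallel::splitIndices()` is used here only to mimic an up-front split into one contiguous chunk per worker.

```r
# Simulated makespan: the time until the last worker finishes.
# `durations` holds hypothetical job run times; no job is actually executed.

# Static scheduling: jobs are split into one contiguous chunk per worker
# before any work starts, so a worker cannot take over another's backlog.
makespan_static <- function(durations, workers) {
  chunks <- parallel::splitIndices(length(durations), workers)
  max(vapply(chunks, function(idx) sum(durations[idx]), numeric(1)))
}

# Dynamic scheduling: each job goes, in the given order, to whichever
# worker becomes free first (one job per dispatch).
makespan_dynamic <- function(durations, workers) {
  busy_until <- numeric(workers)
  for (d in durations) {
    w <- which.min(busy_until)          # worker that is free earliest
    busy_until[w] <- busy_until[w] + d
  }
  max(busy_until)
}

# Assumed workload: eight fast jobs and one slow job, three workers.
jobs <- c(rep(1, 8), 10)

static_time  <- makespan_static(jobs, workers = 3)
dynamic_time <- makespan_dynamic(jobs, workers = 3)
sorted_time  <- makespan_dynamic(sort(jobs, decreasing = TRUE), workers = 3)

print(c(static = static_time, dynamic = dynamic_time, sorted = sorted_time))
```

With these made-up durations, the static split and the unsorted dynamic dispatch both take 12 time units, because the slow job starts late. Dispatching the longest job first brings the total down to 10, which is the run time of the slow job itself and therefore the best any schedule can do.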
-
Actually, this is not correct. If you use
library(doFuture)
registerDoFuture()
plan(multisession, workers = 2)
y <- foreach(x = 1:6, .options.future = list(scheduling = FALSE)) %dopar% { c(x=x, pid=Sys.getpid()) }
do.call(rbind, y)
#      x   pid
# [1,] 1 28332
# [2,] 2 28363
# [3,] 3 28332
# [4,] 4 28363
# [5,] 5 28332
# [6,] 6 28363

Note how there are only two process IDs: 28332 and 28363. If a new worker had been started each time, there would have been six different ones.
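The question also mentions future.apply, so here is a minimal sketch of the same check there. It assumes the future.apply package is installed and relies on the `future.chunk.size` argument of `future_lapply()`, which, per its documentation, sets the average number of elements per future. The variable name `res` is illustrative. furrr exposes comparable `scheduling` and `chunk_size` settings through `furrr_options()`.

```r
library(future)
library(future.apply)

# Two persistent background R sessions act as workers.
plan(multisession, workers = 2)

# future.chunk.size = 1 requests one future per element, so each element
# is handed to whichever worker is free rather than pre-assigned in bulk.
res <- future_lapply(
  1:6,
  function(x) c(x = x, pid = Sys.getpid()),
  future.chunk.size = 1
)
res <- do.call(rbind, res)
print(res)

# Shut down the background workers again.
plan(sequential)
```

If the workers are reused, the `pid` column should contain at most two distinct values, as in the doFuture output. The actual process IDs will differ from run to run.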
-
Oh, that's great! Then
-
Consider the following example, where all work packages except the first take the same amount of time.

The result is that all workers except the one evaluating f(1) finish almost simultaneously, while worker no. 1 lags behind and its remaining work is not distributed while all other workers are idle:

Is this an issue, or is there a way to specify that unfinished work should be distributed to all available workers? I observed the phenomenon with the future.apply and furrr packages, so I think it's directly related to future.