When there is more work than workers, Dask has to decide which tasks to prioritize over others. Dask can determine these priorities automatically to optimize performance, or a user can specify priorities manually according to their needs.
Dask uses the following priorities, in order:
User priorities: A user defined priority is provided by the
priority=keyword argument to functions like
map(). Tasks with higher priorities run before tasks with lower priorities with the default priority being zero.
future = client.submit(func, *args, priority=10) # high priority task future = client.submit(func, *args, priority=-10) # low priority task df = df.persist(priority=10) # high priority computation
Priorities can also be specified using the dask annotations machinery:
with dask.annotate(priority=10): future = client.submit(func, *args) # high priority task with dask.annotate(priority=-10): future = client.submit(func, *args) # low priority task with dask.annotate(priority=10): df = df.persist() # high priority computation
First in first out chronologically: Dask prefers computations that were submitted early. Because users can submit computations asynchronously it may be that several different computations are running on the workers at the same time. Generally Dask prefers those groups of tasks that were submitted first.
As a nuance, tasks that are submitted within a close window are often considered to be submitted at the same time.
x = x.persist() # submitted first and so has higher priority # wait a while x = x.persist() # submitted second and so has lower priority
In this case “a while” depends on the kind of computation. Operations that are often used in bulk processing, like
persistconsider any two computations submitted in the same sixty seconds to have the same priority. Operations that are often used in real-time processing, like
mapare considered the same priority if they are submitted within the 100 milliseconds of each other. This behavior can be controlled with the
x = x.persist() # wait one minute x = x.persist(fifo_timeout='10 minutes') # has the same priority a = client.submit(func, *args) # wait no time at all b = client.submit(func, *args, fifo_timeout='0ms') # is lower priority
Graph Structure: Within any given computation (a compute or persist call) Dask orders tasks in such a way as to minimize the memory-footprint of the computation. This is discussed in more depth in the task ordering documentation.
If multiple tasks each have exactly the same priorities outlined above, then the order in which tasks arrive at a worker, in a last in first out manner, is used to determine the order in which tasks run.