Prometheus monitoring

Prometheus is a widely used tool for monitoring and alerting on a wide variety of systems. A distributed cluster offers a number of Prometheus metrics if the prometheus_client package is installed. The metrics are exposed in Prometheus’ text-based format at the /metrics endpoint on both schedulers and workers.
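
For example, you can bring up a local cluster and fetch the scheduler’s metrics with plain Python. This is a minimal sketch assuming default settings; the /metrics endpoint is served by the same HTTP server as the dashboard:

    from urllib.request import urlopen

    from distributed import Client, LocalCluster

    if __name__ == "__main__":
        cluster = LocalCluster(n_workers=2)
        client = Client(cluster)

        # The metrics endpoint lives next to the dashboard, so its URL
        # can be derived from the dashboard link (e.g. http://127.0.0.1:8787)
        url = cluster.dashboard_link.replace("/status", "/metrics")
        print(urlopen(url).read().decode()[:1000])

        client.close()
        cluster.close()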

Available metrics

Apart from the metrics that prometheus_client exposes by default, schedulers and workers expose a number of Dask-specific metrics.

Scheduler metrics

The scheduler exposes the following metrics about itself:

dask_scheduler_clients

Number of clients connected

dask_scheduler_desired_workers

Number of workers the scheduler needs for the current task graph

dask_scheduler_workers

Number of workers known by scheduler

dask_scheduler_tasks

Number of tasks known by scheduler

dask_scheduler_tasks_suspicious_total

Total number of times a task has been marked suspicious

dask_scheduler_tasks_forgotten_total

Total number of processed tasks no longer in memory and already removed from the scheduler job queue.

Note

Task groups on the scheduler which have all tasks in the forgotten state are not included.

dask_scheduler_tasks_compute_seconds_total

Total time (per prefix) spent computing tasks

dask_scheduler_tasks_transfer_seconds_total

Total time (per prefix) spent transferring task data

dask_scheduler_tasks_output_bytes

Current size of in-memory tasks, broken down by task prefix, without duplicates. Note that when a task output is transferred between workers, you will typically end up with a duplicate, so this measure will be lower than the actual cluster-wide managed memory. See also dask_worker_memory_bytes, which does count duplicates.

dask_scheduler_prefix_state_totals_total

Accumulated count of tasks in each state, per task prefix

dask_scheduler_tick_count_total

Total number of ticks observed since the server started

dask_scheduler_tick_duration_maximum_seconds

Maximum tick duration observed since Prometheus last scraped metrics. If this is significantly higher than what’s configured in distributed.admin.tick.interval (default: 20ms), it highlights a blocked event loop, which in turn hampers timely task execution and network comms.
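
To relate this metric to its threshold, you can inspect or change the tick interval through Dask’s configuration system. A minimal sketch; note that the value must be in effect before the scheduler or worker starts:

    import dask

    # Default tick interval that tick duration metrics are compared against
    print(dask.config.get("distributed.admin.tick.interval"))  # "20ms"

    # Raise the interval, e.g. for clusters with chronically busy event
    # loops; this must be set before the server process starts.
    dask.config.set({"distributed.admin.tick.interval": "50ms"})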

Semaphore metrics

The following metrics about Semaphore objects are available on the scheduler (a usage sketch follows the list):

dask_semaphore_max_leases

Maximum leases allowed per semaphore.

Note

This will be constant for each semaphore during its lifetime.

dask_semaphore_active_leases

Amount of currently active leases per semaphore

dask_semaphore_pending_leases

Amount of currently pending leases per semaphore

dask_semaphore_acquire_total

Total number of leases acquired per semaphore

dask_semaphore_release_total

Total number of leases released per semaphore

Note

If a semaphore is closed while there are still leases active, this count will not equal dask_semaphore_acquire_total after execution.

dask_semaphore_average_pending_lease_time_s

Exponential moving average of the time it took to acquire a lease per semaphore

Note

This only includes time spent on the scheduler side; it does not include time spent on communication.

Note

This average is calculated based on the order of leases rather than on the time of lease acquisition.
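
For reference, these metrics are populated by ordinary Semaphore usage. A minimal sketch, assuming a running client; the semaphore name becomes the per-semaphore label on the metrics above:

    from distributed import Client, Semaphore

    client = Client()  # connect to a cluster (or start a local one)

    # dask_semaphore_max_leases reports 2 for the semaphore named "database"
    sem = Semaphore(max_leases=2, name="database")

    def access_resource(i, sem):
        # Entering/leaving the context manager drives the acquire/release
        # counters and the active/pending lease gauges
        with sem:
            return i

    futures = client.map(access_resource, range(10), sem=sem)
    client.gather(futures)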

Work-stealing metrics

If Work Stealing is enabled, the scheduler exposes these metrics (a configuration sketch follows the list):

dask_stealing_request_count_total

Total number of stealing requests

dask_stealing_request_cost_total

Total cost of stealing requests
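
Work stealing is on by default and controlled by a configuration key; when it is disabled, these metrics are not exported. A minimal sketch for checking and changing the setting:

    import dask

    # Work stealing is enabled by default
    print(dask.config.get("distributed.scheduler.work-stealing"))  # True

    # Disable it, e.g. to rule stealing out while debugging; this must
    # be set before the scheduler starts.
    dask.config.set({"distributed.scheduler.work-stealing": False})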

Worker metrics

The worker exposes these metrics about itself:

dask_worker_tasks

Number of tasks at worker

dask_worker_threads

Number of worker threads

dask_worker_latency_seconds

Latency of worker connection

dask_worker_memory_bytes

Worker memory usage, broken down by memory kind

dask_worker_transfer_incoming_bytes

Total size of open data transfers from other workers

dask_worker_transfer_incoming_count

Number of open data transfers from other workers

dask_worker_transfer_incoming_count_total

Total number of data transfers from other workers since the worker was started

dask_worker_transfer_outgoing_bytes

Size of open data transfers to other workers

dask_worker_transfer_outgoing_bytes_total

Total size of data transfers to other workers since the worker was started

dask_worker_transfer_outgoing_count

Number of open data transfers to other workers

dask_worker_transfer_outgoing_count_total

Total number of data transfers to other workers since the worker was started

dask_worker_concurrent_fetch_requests

Deprecated: This metric has been renamed to dask_worker_transfer_incoming_count.

dask_worker_tick_count_total

Total number of ticks observed since the server started

dask_worker_tick_duration_maximum_seconds

Maximum tick duration observed since Prometheus last scraped metrics. If this is significantly higher than what’s configured in distributed.admin.tick.interval (default: 20ms), it highlights a blocked event loop, which in turn hampers timely task execution and network comms.

dask_worker_event_loop_blocked_time_max_seconds

Maximum number of seconds the event loop was continuously frozen by known causes since Prometheus last scraped metrics. This metric is broken down by cause.

Note

This is highly correlated with dask_worker_tick_duration_maximum_seconds, with the difference that the tick-based metric also includes unknown causes. On the other hand, blockages shorter than the tick interval may not be (fully) captured by dask_worker_tick_duration_maximum_seconds.

dask_worker_event_loop_blocked_time_seconds_total

Total number of seconds the event loop was frozen by known causes since the worker was started

dask_worker_spill_bytes_total

Total size of spilled/unspilled data since the worker was started; in other words, cumulative disk I/O attributable to spill activity. This includes a memory_read measure, which lets you derive the cache hit ratio (see the scraping sketch after these spill metrics):

    cache hit ratio = memory_read / (memory_read + disk_read)

dask_worker_spill_count_total

Total number of spilled/unspilled keys since the worker was started; in other words, cumulative disk accesses attributable to spill activity. This includes a memory_read measure, which lets you derive the cache hit ratio:

    cache hit ratio = memory_read / (memory_read + disk_read)

dask_worker_spill_time_seconds_total

Total amount of time that was spent spilling/unspilling since the worker was started.

Note

In aggregate, this metric matches dask_worker_event_loop_blocked_time_seconds_total; however, this one is broken down by activity (pickle, write, read, unpickle), while dask_worker_event_loop_blocked_time_seconds_total is broken down by cause (target threshold, spill threshold, local task execution, remote task execution).
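
As a worked example, the cache hit ratio above can be computed from a scraped worker endpoint. This is a minimal sketch, not an official recipe: the URL and port are hypothetical (workers pick a dashboard port unless one is configured), and it assumes the spill counters are broken down by an "activity" label with memory_read and disk_read values. It uses the text parser that ships with prometheus_client:

    from urllib.request import urlopen

    from prometheus_client.parser import text_string_to_metric_families

    # Hypothetical worker metrics endpoint; substitute your worker's
    # dashboard address
    text = urlopen("http://localhost:8788/metrics").read().decode()

    # Collect the spill counters, assuming an "activity" label
    counts = {}
    for family in text_string_to_metric_families(text):
        for sample in family.samples:
            if sample.name == "dask_worker_spill_count_total":
                counts[sample.labels.get("activity")] = sample.value

    memory_read = counts.get("memory_read", 0)
    disk_read = counts.get("disk_read", 0)
    if memory_read + disk_read:
        print("cache hit ratio:", memory_read / (memory_read + disk_read))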

If the crick package is installed, the worker additionally exposes:

dask_worker_tick_duration_median_seconds

Median tick duration at worker

dask_worker_task_duration_median_seconds

Median task runtime at worker

dask_worker_transfer_bandwidth_median_bytes

Median transfer bandwidth at worker