Prometheus MonitoringΒΆ

Prometheus is a widely popular tool for monitoring and alerting a wide variety of systems. Dask.distributed exposes scheduler and worker metrics in a prometheus text based format. Metrics are available at http://scheduler-address:8787/metrics.

Available metrics are as following

Metric name Description Scheduler Worker
python_gc_objects_collected_total Objects collected during gc. Yes Yes
python_gc_objects_uncollectable_total Uncollectable object found during GC. Yes Yes
python_gc_collections_total Number of times this generation was collected. Yes Yes
python_info Python platform information. Yes Yes
dask_scheduler_workers Number of workers connected. Yes  
dask_scheduler_clients Number of clients connected. Yes  
dask_scheduler_tasks Number of tasks at scheduler. Yes  
dask_worker_tasks Number of tasks at worker.   Yes
dask_worker_connections Number of task connections to other workers.   Yes
dask_worker_threads Number of worker threads.   Yes
dask_worker_latency_seconds Latency of worker connection.   Yes
dask_worker_tick_duration_median_seconds Median tick duration at worker.   Yes
dask_worker_task_duration_median_seconds Median task runtime at worker.   Yes
dask_worker_transfer_bandwidth_median_bytes Bandwidth for transfer at worker in Bytes.   Yes