Prometheus monitoring and alerting

Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects and stores metrics as time-series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.

Users can measure the internal status of a QuestDB instance via an HTTP endpoint exposed by QuestDB at port 9003. This document describes how to enable metrics via this endpoint, how to configure Prometheus to scrape metrics from a QuestDB instance, and how to enable alerting from QuestDB to Prometheus Alertmanager.

Prerequisites

Scraping Prometheus metrics from QuestDB

QuestDB has a /metrics HTTP endpoint on port 9003 to expose Prometheus metrics. Before being able to query metrics, they must be enabled via the metrics.enabled key in server configuration:

/path/to/server.conf
metrics.enabled=true

When running QuestDB via Docker, port 9003 must be exposed and the metrics configuration can be enabled via the QDB_METRICS_ENABLED environment variable:

Docker
docker run \
-e QDB_METRICS_ENABLED=TRUE \
-p 8812:8812 -p 9000:9000 -p 9003:9003 -p 9009:9009 \
-v "$(pwd):/var/lib/questdb" \
questdb/questdb:9.0.1

To verify that metrics are being exposed correctly by QuestDB, navigate to http://<questdb_ip>:9003/metrics in a browser, where <questdb_ip> is the IP address of an instance, or execute a basic curl like the following example:

Given QuestDB running at 127.0.0.1
curl http://127.0.0.1:9003/metrics
# TYPE questdb_json_queries_total counter
questdb_json_queries_total 0

# TYPE questdb_memory_tag_MMAP_DEFAULT gauge
questdb_memory_tag_MMAP_DEFAULT 77872

# TYPE questdb_memory_malloc_count gauge
questdb_memory_malloc_count 659

# ...

To configure Prometheus to scrape these metrics, provide the QuestDB instance IP and port 9003 as a target. The following example configuration file questdb.yml assumes there is a running QuestDB instance on localhost (127.0.0.1) with port 9003 available:

questdb.yml
global:
scrape_interval: 5s
external_labels:
monitor: 'questdb'

scrape_configs:
- job_name: 'questdb'
scrape_interval: 5s
static_configs:
- targets: ['127.0.0.1:9003']

Start Prometheus and pass this configuration on launch:

prometheus --config.file=questdb.yml

Prometheus should be available on 0.0.0.0:9090 and navigating to http://0.0.0.0:9090/targets should show that QuestDB is being scraped successfully:

Prometheus targets tab showing a QuestDB instance status

In the graphing tab of Prometheus (http://0.0.0.0:9090/graph), autocomplete can be used to graph QuestDB-specific metrics which are all prefixed with questdb_:

Prometheus graphing tab showing QuestDB instance metrics on a chart

The following metrics are available:

MetricTypeDescription
questdb_commits_totalcounterNumber of total commits of all types (in-order and out-of-order) executed on the database tables.
questdb_o3_commits_totalcounterNumber of total out-of-order (O3) commits executed on the database tables.
questdb_committed_rows_totalcounterNumber of total rows committed to the database tables.
questdb_physically_written_rows_totalcounterNumber of total rows physically written to disk. Greater than committed_rows with [out-of-order ingestion. Write amplification is questdb_physically_written_rows_total / questdb_committed_rows_total.
questdb_rollbacks_totalcounterNumber of total rollbacks executed on the database tables.
questdb_json_queries_totalcounterNumber of total REST API queries, including retries.
questdb_json_queries_completed_totalcounterNumber of successfully executed REST API queries.
questdb_pg_wire_queries_totalcounterNumber of total PGWire queries.
questdb_pg_wire_queries_completed_totalcounterNumber of successfully executed PGWire queries.
questdb_unhandled_errors_totalcounterNumber of total unhandled errors occurred in the database. Such errors usually mean a critical service degradation in one of the database subsystems.
questdb_jvm_major_gc_count_totalcounterNumber of times major JVM garbage collection was triggered.
questdb_jvm_major_gc_time_totalcounterTotal time spent on major JVM garbage collection in milliseconds.
questdb_jvm_minor_gc_count_totalcounterNumber of times minor JVM garbage collection pause was triggered.
questdb_jvm_minor_gc_time_totalcounterTotal time spent on minor JVM garbage collection pauses in milliseconds.
questdb_jvm_unknown_gc_count_totalcounterNumber of times JVM garbage collection of unknown type was triggered. Non-zero values of this metric may be observed only on some, non-mainstream JVM implementations.
questdb_jvm_unknown_gc_time_totalcounterTotal time spent on JVM garbage collection of unknown type in milliseconds. Non-zero values of this metric may be observed only on some, non-mainstream JVM implementations.
questdb_memory_tag_MMAP_DEFAULTgaugeAmount of memory allocated for mmaped files.
questdb_memory_tag_NATIVE_DEFAULTgaugeAmount of allocated untagged native memory.
questdb_memory_tag_MMAP_O3gaugeAmount of memory allocated for O3 mmapped files.
questdb_memory_tag_NATIVE_O3gaugeAmount of memory allocated for O3.
questdb_memory_tag_NATIVE_RECORD_CHAINgaugeAmount of memory allocated for SQL record chains.
questdb_memory_tag_MMAP_TABLE_WRITERgaugeAmount of memory allocated for table writer mmapped files.
questdb_memory_tag_NATIVE_TREE_CHAINgaugeAmount of memory allocated for SQL tree chains.
questdb_memory_tag_MMAP_TABLE_READERgaugeAmount of memory allocated for table reader mmapped files.
questdb_memory_tag_NATIVE_COMPACT_MAPgaugeAmount of memory allocated for SQL compact maps.
questdb_memory_tag_NATIVE_FAST_MAPgaugeAmount of memory allocated for SQL fast maps.
questdb_memory_tag_NATIVE_LONG_LISTgaugeAmount of memory allocated for long lists.
questdb_memory_tag_NATIVE_HTTP_CONNgaugeAmount of memory allocated for HTTP connections.
questdb_memory_tag_NATIVE_PGW_CONNgaugeAmount of memory allocated for PostgreSQL Wire Protocol connections.
questdb_memory_tag_MMAP_INDEX_READERgaugeAmount of memory allocated for index reader mmapped files.
questdb_memory_tag_MMAP_INDEX_WRITERgaugeAmount of memory allocated for index writer mmapped files.
questdb_memory_tag_MMAP_INDEX_SLIDERgaugeAmount of memory allocated for indexed column view mmapped files.
questdb_memory_tag_NATIVE_REPLgaugeAmount of memory mapped for replication tasks.
questdb_memory_free_countgaugeNumber of times native memory was freed.
questdb_memory_mem_usedgaugeCurrent amount of allocated native memory.
questdb_memory_malloc_countgaugeNumber of times native memory was allocated.
questdb_memory_realloc_countgaugeNumber of times native memory was reallocated.
questdb_memory_rssgaugeResident Set Size (Linux/Unix) / Working Set Size (Windows).
questdb_memory_jvm_freegaugeCurrent amount of free Java memory heap in bytes.
questdb_memory_jvm_totalgaugeCurrent size of Java memory heap in bytes.
questdb_memory_jvm_maxgaugeMaximum amount of Java heap memory that can be allocated in bytes.
questdb_http_connectionsgaugeNumber of currently active HTTP connections.
questdb_json_queries_cachedgaugeNumber of current cached REST API queries.
questdb_line_tcp_connectionsgaugeNumber of currently active InfluxDB Line Protocol TCP connections.
questdb_pg_wire_connectionsgaugeNumber of currently active PostgreSQL Wire Protocol connections.
questdb_pg_wire_select_queries_cachedgaugeNumber of current cached PostgreSQL Wire Protocol SELECT queries.
questdb_pg_wire_update_queries_cachedgaugeNumber of current cached PostgreSQL Wire Protocol UPDATE queries.
questdb_json_queries_cache_hits_totalcounterNumber of total cache hits for JSON queries.
questdb_json_queries_cache_misses_totalcounterNumber of total cache misses for JSON queries.
questdb_json_queries_completed_totalcounterTotal number of completed JSON queries.
questdb_jvm_major_gc_count_totalcounterTotal number of major garbage collection events.
questdb_jvm_major_gc_time_totalcounterTotal time spent on major garbage collection.
questdb_jvm_minor_gc_count_totalcounterTotal number of minor garbage collection events.
questdb_jvm_minor_gc_time_totalcounterTotal time spent on minor garbage collection.
questdb_jvm_unknown_gc_count_totalcounterTotal number of unknown type garbage collection events.
questdb_jvm_unknown_gc_time_totalcounterTotal time spent on unknown type garbage collection.
questdb_memory_tag_MMAP_BLOCK_WRITERgaugeAmount of memory allocated for block writer mmapped files.
questdb_memory_tag_MMAP_IMPORTgaugeAmount of memory allocated for import operations.
questdb_memory_tag_MMAP_PARALLEL_IMPORTgaugeAmount of memory allocated for parallel import operations.
questdb_memory_tag_MMAP_PARTITION_CONVERTERgaugeAmount of memory allocated for partition converter operations.
questdb_memory_tag_MMAP_SEQUENCER_METADATAgaugeAmount of memory allocated for sequencer metadata.
questdb_memory_tag_MMAP_TABLE_WAL_READERgaugeAmount of memory allocated for table WAL reader mmapped files.
questdb_memory_tag_MMAP_TABLE_WAL_WRITERgaugeAmount of memory allocated for table WAL writer mmapped files.
questdb_memory_tag_MMAP_TX_LOGgaugeAmount of memory allocated for transaction log mmapped files.
questdb_memory_tag_MMAP_TX_LOG_CURSORgaugeAmount of memory allocated for transaction log cursor mmapped files.
questdb_memory_tag_MMAP_UPDATEgaugeAmount of memory allocated for update operations.
questdb_memory_tag_NATIVE_CB1gaugeAmount of memory allocated for native circular buffer 1.
questdb_memory_tag_NATIVE_CB2gaugeAmount of memory allocated for native circular buffer 2.
questdb_memory_tag_NATIVE_CB3gaugeAmount of memory allocated for native circular buffer 3.
questdb_memory_tag_NATIVE_CB4gaugeAmount of memory allocated for native circular buffer 4.
questdb_memory_tag_NATIVE_CB5gaugeAmount of memory allocated for native circular buffer 5.
questdb_memory_tag_NATIVE_CIRCULAR_BUFFERgaugeAmount of memory allocated for native circular buffers.
questdb_memory_tag_NATIVE_DIRECT_BYTE_SINKgaugeAmount of memory allocated for native direct byte sink.
questdb_memory_tag_NATIVE_DIRECT_CHAR_SINKgaugeAmount of memory allocated for native direct char sink.
questdb_memory_tag_NATIVE_DIRECT_UTF8_SINKgaugeAmount of memory allocated for native direct UTF-8 sink.
questdb_memory_tag_NATIVE_FAST_MAP_INT_LISTgaugeAmount of memory allocated for native fast map integer list.
questdb_memory_tag_NATIVE_FUNC_RSSgaugeAmount of memory allocated for native function RSS.
questdb_memory_tag_NATIVE_GROUP_BY_FUNCTIONgaugeAmount of memory allocated for native group by function.
questdb_memory_tag_NATIVE_ILP_RSSgaugeAmount of memory allocated for native ILP RSS.
questdb_memory_tag_NATIVE_IMPORTgaugeAmount of memory allocated for native import operations.
questdb_memory_tag_NATIVE_INDEX_READERgaugeAmount of memory allocated for native index reader.
questdb_memory_tag_NATIVE_IO_DISPATCHER_RSSgaugeAmount of memory allocated for native IO dispatcher RSS.
questdb_memory_tag_NATIVE_JITgaugeAmount of memory allocated for native JIT.
questdb_memory_tag_NATIVE_JIT_LONG_LISTgaugeAmount of memory allocated for native JIT long list.
questdb_memory_tag_NATIVE_JOIN_MAPgaugeAmount of memory allocated for native join map.
questdb_memory_tag_NATIVE_LATEST_BY_LONG_LISTgaugeAmount of memory allocated for native latest by long list.
questdb_memory_tag_NATIVE_LOGGERgaugeAmount of memory allocated for native logger.
questdb_memory_tag_NATIVE_MIGgaugeAmount of memory allocated for native MIG.
questdb_memory_tag_NATIVE_MIG_MMAPgaugeAmount of memory allocated for native MIG mmapped files.
questdb_memory_tag_NATIVE_OFFLOADgaugeAmount of memory allocated for native offload.
questdb_memory_tag_NATIVE_PARALLEL_IMPORTgaugeAmount of memory allocated for native parallel import.
questdb_memory_tag_NATIVE_PATHgaugeAmount of memory allocated for native path.
questdb_memory_tag_NATIVE_ROSTIgaugeAmount of memory allocated for native rosti.
questdb_memory_tag_NATIVE_SAMPLE_BY_LONG_LISTgaugeAmount of memory allocated for native sample by long list.
questdb_memory_tag_NATIVE_SQL_COMPILERgaugeAmount of memory allocated for native SQL compiler.
questdb_memory_tag_NATIVE_TABLE_READERgaugeAmount of memory allocated for native table reader.
questdb_memory_tag_NATIVE_TABLE_WAL_WRITERgaugeAmount of memory allocated for native table WAL writer.
questdb_memory_tag_NATIVE_TABLE_WRITERgaugeAmount of memory allocated for native table writer.
questdb_memory_tag_NATIVE_TEXT_PARSER_RSSgaugeAmount of memory allocated for native text parser RSS.
questdb_memory_tag_NATIVE_TLS_RSSgaugeAmount of memory allocated for native TLS RSS.
questdb_memory_tag_NATIVE_UNORDERED_MAPgaugeAmount of memory allocated for native unordered map.
questdb_pg_wire_errors_totalcounterTotal number of errors in PostgreSQL wire protocol.
questdb_pg_wire_select_cache_hits_totalcounterTotal number of cache hits for PostgreSQL wire protocol select queries.
questdb_pg_wire_select_cache_misses_totalcounterTotal number of cache misses for PostgreSQL wire protocol select queries.
questdb_wal_apply_physically_written_rows_totalcounterTotal number of physically written rows during WAL apply.
questdb_wal_apply_rows_per_secondgaugeRate of rows applied per second during WAL apply.
questdb_wal_apply_written_rows_totalcounterTotal number of rows written during WAL apply.
questdb_wal_written_rows_totalcounterTotal number of rows written to WAL.
questdb_wal_seq_txngaugeSum of all committed transaction sequence numbers. Used in conjunction with questdb_wal_writer_txn.
questdb_wal_writer_txngaugeSum of all transaction sequence numbers applied to tables. With no pending transactions in the WAL, equal to questdb_wal_seq_txn. When its lag behind questdb_wal_seq_txn is steadily growing, indicates QuestDB is unable to keep up with writes.
questdb_workers_job_start_micros_maxgaugeMaximum time taken to start a worker job in microseconds.
questdb_workers_job_start_micros_mingaugeMinimum time taken to start a worker job in microseconds.

Most of the above metrics are volatile, i.e. they're collected since the current database start. The exception are questdb_wal_seq_txn and questdb_wal_writer_txn, because transaction sequence numbers are persistent.

Configuring Prometheus Alertmanager

note

Full details on logging configurations can be found within the Logging & Metrics documentation.

QuestDB includes a log writer that sends any message logged at critical level (by default) to Prometheus Alertmanager over a TCP/IP socket connection. To configure this writer, add it to the writers config alongside other log writers.

Alertmanager may be started via Docker with the following command:

docker run -p 127.0.0.1:9093:9093 --name alertmanager quay.io/prometheus/alertmanager

To discover the IP address of this container, run the following command which specifies alertmanager as the container name:

docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' alertmanager

To run QuestDB and point it towards Alertmanager for alerting, first create a file ./conf/log.conf with the following contents. 172.17.0.2 in this case is the IP address of the docker container for alertmanager that was discovered by running the docker inspect command above.

./conf/log.conf
# Which writers to enable
writers=stdout,alert

# stdout
w.stdout.class=io.questdb.log.LogConsoleWriter
w.stdout.level=INFO

# Prometheus Alerting
w.alert.class=io.questdb.log.LogAlertSocketWriter
w.alert.level=CRITICAL
w.alert.alertTargets=172.17.0.2:9093

Start up QuestDB in Docker using the following command:

docker run \
-p 9000:9000 -p 8812:8812 -p 9009:9009 -p 9003:9003 \
-v "$(pwd)::/var/lib/questdb" \
questdb/questdb:6.1.3

When alerts are successfully triggered, QuestDB logs will indicate the sent and received status:

2021-12-14T18:42:54.222967Z I i.q.l.LogAlertSocketWriter Sending: 2021-12-14T18:42:54.122874Z I i.q.l.LogAlertSocketWriter Sending: 2021-12-14T18:42:54.073978Z I i.q.l.LogAlertSocketWriter Received [0] 172.17.0.2:9093: {"status":"success"}
2021-12-14T18:42:54.223377Z I i.q.l.LogAlertSocketWriter Received [0] 172.17.0.2:9093: {"status":"success"}