# Monitoring & telemetry
We provide logging to stdout and an optional OpenTelemetry exporter for our traces. OpenTelemetry exporting can be enabled by specifying `--enable-otel` via the command-line or the `MIDEN_NODE_ENABLE_OTEL` environment variable when operating the node.
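As a minimal sketch, the environment-variable form might look as follows (the exact value the variable accepts is an assumption; `miden-node bundled start --help` is authoritative):

```sh
# Enable the OpenTelemetry exporter via the environment instead of the flag.
# A truthy value is assumed to be accepted here.
MIDEN_NODE_ENABLE_OTEL=true miden-node bundled start
```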
We do not export OpenTelemetry logs or metrics. Our end goal is to derive these from our tracing information. This approach is known as wide events, structured logs, and Observability 2.0.
What we export are traces, which consist of spans (covering a period of time) and events (something that happened at a specific instant in time). These are extremely useful for debugging distributed systems - even though Miden is still centralized, the node's components are distributed.
OpenTelemetry provides a Span Metrics Converter which can be used to convert our traces into more conventional metrics.
## What gets traced
We assign a unique trace (aka root span) to each RPC request, batch build, and block build process.
Span and attribute naming is unstable and should not be relied upon. This also means changes here will not be considered breaking; however, we will do our best to document them.
### RPC request/response
Not yet implemented.
### Block building
This trace covers the building, proving and submission of a block.
#### Span tree
```
block_builder.build_block
┝━ block_builder.select_block
│  ┝━ mempool.lock
│  ┕━ mempool.select_block
┝━ block_builder.get_block_inputs
│  ┝━ block_builder.summarize_batches
│  ┕━ store.client.get_block_inputs
│     ┕━ store.rpc/GetBlockInputs
│        ┕━ store.server.get_block_inputs
│           ┝━ validate_nullifiers
│           ┝━ read_account_ids
│           ┝━ validate_notes
│           ┝━ select_block_header_by_block_num
│           ┝━ select_note_inclusion_proofs
│           ┕━ select_block_headers
┝━ block_builder.prove_block
│  ┝━ execute_program
│  ┕━ block_builder.simulate_proving
┝━ block_builder.inject_failure
┕━ block_builder.commit_block
   ┝━ store.client.apply_block
   │  ┕━ store.rpc/ApplyBlock
   │     ┕━ store.server.apply_block
   │        ┕━ apply_block
   │           ┝━ select_block_header_by_block_num
   │           ┕━ update_in_memory_structs
   ┝━ mempool.lock
   ┕━ mempool.commit_block
      ┕━ mempool.revert_expired_transactions
         ┕━ mempool.revert_transactions
```
### Batch building
This trace covers the building and proving of a batch.
#### Span tree
```
batch_builder.build_batch
┝━ batch_builder.wait_for_available_worker
┝━ batch_builder.select_batch
│  ┝━ mempool.lock
│  ┕━ mempool.select_batch
┝━ batch_builder.get_batch_inputs
│  ┕━ store.client.get_batch_inputs
┝━ batch_builder.propose_batch
┝━ batch_builder.prove_batch
┝━ batch_builder.inject_failure
┕━ batch_builder.commit_batch
   ┝━ mempool.lock
   ┕━ mempool.commit_batch
```
## Verbosity
We log important spans and events at `info` level or higher, which is also the default log level. Changing this level should rarely be required - let us know if you're missing information that should be at `info`.
The available log levels are `trace`, `debug`, `info` (default), `warn` and `error`, which can be configured using the `RUST_LOG` environment variable, e.g.

```sh
export RUST_LOG=debug
```
The verbosity can also be specified by component (when running them as a single process):

```sh
export RUST_LOG=warn,block-producer=debug,rpc=error
```
The above would set the general level to `warn`, and the `block-producer` and `rpc` components would be overridden to `debug` and `error` respectively. Though as mentioned, it should be unusual to do this.
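For illustration, both knobs can be combined in a single invocation; this sketch assumes the `bundled` command shown in the Configuration section below:

```sh
# Quieter general logging, verbose block-producer, OTel traces exported.
# (Illustrative combination of the options documented above.)
RUST_LOG=warn,block-producer=debug \
miden-node bundled start --enable-otel
```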
## Configuration
The OpenTelemetry trace exporter is enabled by adding the `--enable-otel` flag to the node's start command:

```sh
miden-node bundled start --enable-otel
```
The exporter can be configured using environment variables as specified in the official OpenTelemetry documentation.

Note: we only support gRPC as the export protocol.
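For example, exporting to a locally running OpenTelemetry Collector could look like the sketch below; the endpoint uses the Collector's conventional OTLP/gRPC port, which is an assumption about your setup rather than a node default:

```sh
# Standard OTLP environment variables; 4317 is the Collector's usual gRPC port.
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
miden-node bundled start --enable-otel
```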
### Example: Honeycomb configuration
This is based on Honeycomb's OpenTelemetry setup guide.
```sh
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io:443 \
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=your-api-key" \
miden-node bundled start --enable-otel
```
### Honeycomb queries, triggers and board examples
#### Example Queries
Here are some useful Honeycomb queries to help monitor your Miden node:
Block building performance:

```
VISUALIZE
  HEATMAP(duration_ms) AVG(duration_ms)
WHERE
  name = "block_builder.build_block"
GROUP BY block.number
ORDER BY block.number DESC
LIMIT 100
```
Batch processing latency:

```
VISUALIZE
  HEATMAP(duration_ms) AVG(duration_ms) P95(duration_ms)
WHERE
  name = "batch_builder.build_batch"
GROUP BY batch.id
LIMIT 100
```
Block proving failures:

```
VISUALIZE
  COUNT
WHERE
  name = "block_builder.build_block"
  AND status = "error"
CALCULATE RATE
```
Transaction volume by block:

```
VISUALIZE
  MAX(transactions.count)
WHERE
  name = "block_builder.build_block"
GROUP BY block.number
ORDER BY block.number DESC
LIMIT 100
```
RPC request rate by endpoint:

```
VISUALIZE
  COUNT
WHERE
  name contains "rpc"
GROUP BY name
```
RPC latency by endpoint:

```
VISUALIZE
  AVG(duration_ms) P95(duration_ms)
WHERE
  name contains "rpc"
GROUP BY name
```
RPC errors by status code:

```
VISUALIZE
  COUNT
WHERE
  name contains "rpc"
GROUP BY status_code
```
#### Example Triggers
Create triggers in Honeycomb to alert you when important thresholds are crossed:
Slow block building:

- Query:

  ```
  VISUALIZE
    AVG(duration_ms)
  WHERE
    name = "block_builder.build_block"
  ```

- Trigger condition: `AVG(duration_ms) > 30000` (adjust based on your expected block time)
- Description: Alert when blocks take too long to build (more than 30 seconds on average)
High failure rate:

- Query:

  ```
  VISUALIZE
    COUNT
  WHERE
    name = "block_builder.build_block" AND error = true
  ```

- Trigger condition: `COUNT > 100 WHERE error = true`
- Description: Alert when more than 100 block builds are failing
#### Advanced investigation with BubbleUp
To identify the root cause of performance issues or errors, use Honeycomb's BubbleUp feature:
1. Create a query for a specific issue (e.g., high latency for block building)
2. Click on a specific high-latency point in the visualization
3. Use BubbleUp to see which attributes differ significantly between normal and slow operations
4. Inspect the related spans in the trace to pinpoint the exact step causing problems
This approach helps identify patterns like:
- Which types of transactions are causing slow blocks
- Which specific operations within block/batch processing take the most time
- Correlations between resource usage and performance
- Common patterns in error cases