Performance Metrics Collection
The GD node includes a performance metrics collection system that records timestamps, memory usage, and status information at each stage of the node's lifecycle.
Architecture
sequenceDiagram
participant Node
participant MetricsCollector
participant Storage
Note over Node,Storage: Start-up Phase
Node->>MetricsCollector: Create collector
Note right of MetricsCollector: Records start time
Note right of MetricsCollector: Records PID
Note right of MetricsCollector: Initial memory snapshot
Note over Node,Storage: Import Phase
Node->>MetricsCollector: record_timestamp("imports_done")
MetricsCollector->>MetricsCollector: Capture memory metrics
Note over Node,Storage: Initialization Phase
Node->>MetricsCollector: record_timestamp("init_start")
MetricsCollector->>MetricsCollector: Capture memory metrics
Node->>MetricsCollector: Update node info & transports
Node->>MetricsCollector: record_timestamp("init_done")
MetricsCollector->>MetricsCollector: Capture memory metrics
Note over Node,Storage: Runtime Phase
Node->>MetricsCollector: record_timestamp("runtime_start")
MetricsCollector->>MetricsCollector: Capture memory metrics
Note over Node,Storage: Termination Phase
Node->>MetricsCollector: record_timestamp("terminate_start")
MetricsCollector->>MetricsCollector: Capture memory metrics
Node->>MetricsCollector: mark_terminated(exit_code, reason)
Node->>MetricsCollector: record_timestamp("terminate_done")
MetricsCollector->>MetricsCollector: Capture memory metrics
MetricsCollector->>MetricsCollector: Calculate durations
MetricsCollector->>Storage: save_metrics()
Collection Phases
The metrics collection system tracks five distinct phases of node execution, using the MetricsEvent enum:
- Start-up Phase (START)
Process ID capture
Start timestamp recording
Initial memory usage snapshot
- Import Phase (IMPORTS_DONE)
Module import completion time
Memory usage after imports
Import duration calculation
- Initialization Phase (INIT_START, INIT_DONE)
Initialization start/end times
Node name and instance name recording
Active transport configuration
Memory usage during initialization
Initialization duration calculation
- Runtime Phase (RUNTIME_START)
Runtime start timestamp
Memory usage at runtime start
Continuous runtime duration tracking
- Termination Phase (TERMINATE_START, TERMINATE_DONE)
Termination start/end timestamps
Exit code recording
Termination reason logging
Final memory metrics capture
Total duration calculations
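The event list above can be sketched as an enum plus a small in-memory collector. This is only an illustration: the string values are inferred from the event names shown in the diagram and the field names such as memory_start_rss, and `SketchCollector` is a hypothetical stand-in, not the real collector class.

```python
import time
from enum import Enum


class MetricsEvent(Enum):
    """Lifecycle events; string values are assumed from the documented names."""
    START = "start"
    IMPORTS_DONE = "imports_done"
    INIT_START = "init_start"
    INIT_DONE = "init_done"
    RUNTIME_START = "runtime_start"
    TERMINATE_START = "terminate_start"
    TERMINATE_DONE = "terminate_done"


class SketchCollector:
    """Hypothetical in-memory collector, not the real gd_node class."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self.timestamps = {}
        # Creating the collector marks the start of the process.
        self.record_timestamp(MetricsEvent.START)

    def record_timestamp(self, event):
        # Store the wall-clock time in seconds, keyed by event name.
        self.timestamps[event.value] = self._clock()


collector = SketchCollector()
collector.record_timestamp(MetricsEvent.IMPORTS_DONE)
```

Passing the clock in as a parameter is a design choice of this sketch; it makes the collector easy to drive with fixed times in tests.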
Node Status Information
The metrics system tracks node state transitions:
- State: Progresses through:
“starting” (initial state)
“running” (after initialization)
“terminated” (final state)
- Exit Code: Integer value indicating how the node terminated
- Termination Reason: String description of why the node terminated
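A minimal sketch of these status fields, assuming they live in one small record. `NodeStatus` and `mark_running` are hypothetical names chosen for illustration; only `mark_terminated(exit_code, reason)` appears in the sequence diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStatus:
    """Hypothetical status record; field names are assumptions."""
    state: str = "starting"
    exit_code: Optional[int] = None
    termination_reason: Optional[str] = None

    def mark_running(self):
        # Only a node that is still starting can move to "running".
        if self.state != "starting":
            raise ValueError(f"cannot start running from state {self.state!r}")
        self.state = "running"

    def mark_terminated(self, exit_code, reason):
        # Termination is allowed from any state, for example a failed start-up.
        self.state = "terminated"
        self.exit_code = exit_code
        self.termination_reason = reason


status = NodeStatus()
status.mark_running()
status.mark_terminated(0, "shutdown requested")
```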
Memory Metrics
At each phase, the following memory metrics are collected:
RSS (Resident Set Size): Actual physical memory used by the process
VMS (Virtual Memory Size): Total virtual memory allocated
These metrics are stored in InfluxDB with event-specific prefixes, for example:
- memory_start_rss
- memory_runtime_start_rss
- memory_terminate_done_vms
Duration Calculations
The system automatically calculates several duration metrics in seconds:
Import Time: Duration of module imports
Initialization Time: Time spent in initialization
Runtime Duration: Time between runtime start and termination start (or current time for active nodes). This represents the actual execution time of the node, excluding initialization overhead.
Termination Duration: Time taken to clean up and exit
Process Time: Total time from process start until current time or termination, including all overhead (imports, initialization, runtime, and termination)
Total Process Time: Overall lifetime of the node from start to termination
Key Differences:
- Process Time vs Runtime Duration:
Process Time tracks the entire lifetime of the node process from the very start, including all overhead
Runtime Duration only measures the actual execution time of the node between RUNTIME_START and TERMINATE_START events
- This distinction is useful for:
Identifying initialization overhead
Debugging startup performance issues
Understanding total resource usage vs active runtime usage
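The difference between Process Time and Runtime Duration can be made concrete with a short sketch. It assumes timestamps are kept in a dict keyed by event name; the function name and the result keys ("import", "init", and so on) are hypothetical, since the real metric names are not listed here.

```python
def calculate_durations(timestamps, now):
    """Return durations in seconds from a dict of event name -> timestamp."""

    def span(start, end, open_ended=False):
        if start not in timestamps:
            return None  # the phase has not begun yet
        if end in timestamps:
            return timestamps[end] - timestamps[start]
        # Open-ended spans fall back to the current time for active nodes.
        return now - timestamps[start] if open_ended else None

    return {
        "import": span("start", "imports_done"),
        "init": span("init_start", "init_done"),
        "runtime": span("runtime_start", "terminate_start", open_ended=True),
        "termination": span("terminate_start", "terminate_done"),
        "process": span("start", "terminate_done", open_ended=True),
    }


# A node that is still running: no termination events recorded yet.
timestamps = {
    "start": 100.0,
    "imports_done": 101.5,
    "init_start": 101.5,
    "init_done": 103.0,
    "runtime_start": 103.0,
}
durations = calculate_durations(timestamps, now=113.0)
```

With these example values the process time is 13.0 seconds while the runtime duration is 10.0 seconds; the 3.0 second gap is the import and initialization overhead.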
Time Storage
Time values are handled consistently across the system:
Internal storage: All timestamps and durations are stored in seconds
File storage: Times are written in seconds
InfluxDB storage: Times are stored in seconds as floating-point values
Storage Fields
InfluxDB Storage Format:
- Timestamps stored as: timestamp_{event_name}
- Memory metrics stored as: memory_{event_name}_{metric_type}
- Duration metrics stored as: duration_{metric_name}
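The naming scheme can be illustrated by flattening collected values into one field dictionary. This is a sketch only: `build_fields` is a hypothetical helper, the input shapes are assumptions, and the memory values are made-up byte counts.

```python
def build_fields(timestamps, memory, durations):
    """Flatten collected metrics into field names following the scheme above."""
    fields = {}
    for event, ts in timestamps.items():
        fields[f"timestamp_{event}"] = float(ts)
    for event, snapshot in memory.items():
        for metric_type, value in snapshot.items():
            fields[f"memory_{event}_{metric_type}"] = value
    for name, seconds in durations.items():
        # Skip durations that cannot be computed yet.
        if seconds is not None:
            fields[f"duration_{name}"] = float(seconds)
    return fields


fields = build_fields(
    timestamps={"start": 100.0},
    memory={"start": {"rss": 52_428_800, "vms": 209_715_200}},
    durations={"runtime": 10.0, "termination": None},
)
```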
Storage Backends
The metrics collection system stores metrics in InfluxDB (InfluxDBMetricsCollector). If metrics collection is disabled, a NullMetricsCollector that does nothing is used instead.
- InfluxDB Storage (InfluxDBMetricsCollector)
Stores metrics in an InfluxDB time-series database
Default port: 8086
Automatically creates the database if it does not exist
Metrics are tagged with node name, instance name, and measurement tag
- Configuration via environment variables:
GD_NODE_METRICS_TAG={tag}
INFLUXDB_HOST=http://localhost:8086
PLATFORM_METRICS=1
PLATFORM_METRICS_INFLUX_DB={dbname} (optional, defaults to “metrics”)
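As a rough sketch, these variables could be read into a small configuration object. `MetricsConfig` and `load_metrics_config` are hypothetical names, and treating only the value "1" as enabled, with empty defaults for the tag and host, is an assumption of this example; only the "metrics" database default comes from the list above.

```python
import os
from dataclasses import dataclass


@dataclass
class MetricsConfig:
    """Hypothetical container for the documented environment variables."""
    enabled: bool
    tag: str
    influx_host: str
    database: str


def load_metrics_config(env=None):
    # Read from the process environment unless a mapping is supplied.
    env = os.environ if env is None else env
    return MetricsConfig(
        enabled=env.get("PLATFORM_METRICS") == "1",  # assumption: only "1" enables
        tag=env.get("GD_NODE_METRICS_TAG", ""),
        influx_host=env.get("INFLUXDB_HOST", ""),
        database=env.get("PLATFORM_METRICS_INFLUX_DB", "metrics"),
    )


config = load_metrics_config({
    "PLATFORM_METRICS": "1",
    "GD_NODE_METRICS_TAG": "my_test",
    "INFLUXDB_HOST": "http://localhost:8086",
})
```

A caller could then choose the InfluxDB collector when `config.enabled` is true and the no-op collector otherwise.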
Usage Examples
Using a tag named “my_test” and InfluxDB running on localhost:
export GD_NODE_METRICS_TAG=my_test
export INFLUXDB_HOST=http://localhost:8086
export PLATFORM_METRICS=1
export PLATFORM_METRICS_INFLUX_DB=platform_metrics
python3 -m gd_node --name node_name --inst instance_name