Performance Metrics Collection

The GD node includes a performance metrics collection system that records timestamps, memory usage, and status information at each stage of the node's lifecycle, from process start to termination.

Architecture

sequenceDiagram
    participant Node
    participant MetricsCollector
    participant Storage

    Note over Node,Storage: Start-up Phase
    Node->>MetricsCollector: Create collector
    Note right of MetricsCollector: Records start time
    Note right of MetricsCollector: Records PID
    Note right of MetricsCollector: Initial memory snapshot

    Note over Node,Storage: Import Phase
    Node->>MetricsCollector: record_timestamp("imports_done")
    MetricsCollector->>MetricsCollector: Capture memory metrics

    Note over Node,Storage: Initialization Phase
    Node->>MetricsCollector: record_timestamp("init_start")
    MetricsCollector->>MetricsCollector: Capture memory metrics
    Node->>MetricsCollector: Update node info & transports
    Node->>MetricsCollector: record_timestamp("init_done")
    MetricsCollector->>MetricsCollector: Capture memory metrics

    Note over Node,Storage: Runtime Phase
    Node->>MetricsCollector: record_timestamp("runtime_start")
    MetricsCollector->>MetricsCollector: Capture memory metrics

    Note over Node,Storage: Termination Phase
    Node->>MetricsCollector: record_timestamp("terminate_start")
    MetricsCollector->>MetricsCollector: Capture memory metrics
    Node->>MetricsCollector: mark_terminated(exit_code, reason)
    Node->>MetricsCollector: record_timestamp("terminate_done")
    MetricsCollector->>MetricsCollector: Capture memory metrics
    MetricsCollector->>MetricsCollector: Calculate durations
    MetricsCollector->>Storage: save_metrics()
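
The sequence above can be sketched as a small in-memory collector. This is an illustrative stand-in under stated assumptions, not the gd_node implementation: only the method name record_timestamp and the event names come from the diagram, while SketchCollector, MemoryProbe, fake_probe, and the injected clock are hypothetical. The memory probe is passed in as a callable so the example runs without a process-inspection library.

```python
import os
import time
from typing import Callable, Dict

# Hypothetical type: a callable returning {"rss": bytes, "vms": bytes}.
MemoryProbe = Callable[[], Dict[str, int]]


class SketchCollector:
    """Illustrative stand-in for a metrics collector; not the gd_node class."""

    def __init__(self, memory_probe: MemoryProbe,
                 clock: Callable[[], float] = time.time) -> None:
        self._memory_probe = memory_probe
        self._clock = clock
        self.pid = os.getpid()          # process ID captured at creation
        self.timestamps: Dict[str, float] = {}
        self.memory: Dict[str, Dict[str, int]] = {}
        self.record_timestamp("start")  # start time plus initial snapshot

    def record_timestamp(self, event: str) -> None:
        """Store the current time (seconds) and a memory snapshot for `event`."""
        self.timestamps[event] = self._clock()
        self.memory[event] = self._memory_probe()


def fake_probe() -> Dict[str, int]:
    # Fixed numbers (assumed values) so the example is deterministic.
    return {"rss": 50_000_000, "vms": 200_000_000}


collector = SketchCollector(fake_probe)
for event in ("imports_done", "init_start", "init_done", "runtime_start"):
    collector.record_timestamp(event)
```

Every call to record_timestamp pairs a timestamp with a memory snapshot, which mirrors the "Capture memory metrics" step that follows each event in the diagram.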
    

Collection Phases

The metrics collection system tracks five distinct phases of node execution. Each phase is marked by one or more members of the MetricsEvent enum:

  1. Start-up Phase (START)
    • Process ID capture

    • Start timestamp recording

    • Initial memory usage snapshot

  2. Import Phase (IMPORTS_DONE)
    • Module import completion time

    • Memory usage after imports

    • Import duration calculation

  3. Initialization Phase (INIT_START, INIT_DONE)
    • Initialization start/end times

    • Node name and instance name recording

    • Active transport configuration

    • Memory usage during initialization

    • Initialization duration calculation

  4. Runtime Phase (RUNTIME_START)
    • Runtime start timestamp

    • Memory usage at runtime start

    • Continuous runtime duration tracking

  5. Termination Phase (TERMINATE_START, TERMINATE_DONE)
    • Termination start/end timestamps

    • Exit code recording

    • Termination reason logging

    • Final memory metrics capture

    • Total duration calculations
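
As a rough sketch, the events named above could be declared like this. The member names come from the list, and the string values follow the event names used in the diagram and in the stored field names; the PHASE_EVENTS mapping and the phase_of helper are hypothetical and exist only to illustrate how events group into the five phases.

```python
from enum import Enum


class MetricsEvent(Enum):
    """Lifecycle events; values are assumed to match the stored event names."""
    START = "start"
    IMPORTS_DONE = "imports_done"
    INIT_START = "init_start"
    INIT_DONE = "init_done"
    RUNTIME_START = "runtime_start"
    TERMINATE_START = "terminate_start"
    TERMINATE_DONE = "terminate_done"


# Hypothetical grouping of the seven events into the five phases.
PHASE_EVENTS = {
    "start-up": (MetricsEvent.START,),
    "import": (MetricsEvent.IMPORTS_DONE,),
    "initialization": (MetricsEvent.INIT_START, MetricsEvent.INIT_DONE),
    "runtime": (MetricsEvent.RUNTIME_START,),
    "termination": (MetricsEvent.TERMINATE_START, MetricsEvent.TERMINATE_DONE),
}


def phase_of(event: MetricsEvent) -> str:
    """Return the name of the phase that an event belongs to."""
    for phase, events in PHASE_EVENTS.items():
        if event in events:
            return phase
    raise ValueError(f"unknown event: {event!r}")
```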

Node Status Information

The metrics system tracks node state transitions:

  • State: Progresses through:
    • “starting” (initial state)

    • “running” (after initialization)

    • “terminated” (final state)

  • Exit Code: Integer value indicating how the node terminated

  • Termination Reason: String description of why the node terminated
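
The state progression can be pictured with a small data class. This is a minimal sketch, assuming the three state strings listed above; NodeStatus and mark_running are hypothetical names, and only mark_terminated(exit_code, reason) mirrors a call shown in the architecture diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStatus:
    """Hypothetical holder for the status fields described above."""
    state: str = "starting"                 # initial state
    exit_code: Optional[int] = None
    termination_reason: Optional[str] = None

    def mark_running(self) -> None:
        # Assumed rule: a node can only start running once.
        if self.state != "starting":
            raise RuntimeError(f"cannot start running from state {self.state!r}")
        self.state = "running"

    def mark_terminated(self, exit_code: int, reason: str) -> None:
        # Termination is allowed from any state, e.g. a failure during start-up.
        self.state = "terminated"
        self.exit_code = exit_code
        self.termination_reason = reason


status = NodeStatus()
status.mark_running()
status.mark_terminated(0, "normal shutdown")
```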

Memory Metrics

At each phase, the following memory metrics are collected:

  • RSS (Resident Set Size): Actual physical memory used by the process

  • VMS (Virtual Memory Size): Total virtual memory allocated

These metrics are stored in InfluxDB with event-specific prefixes, for example:

  • memory_start_rss

  • memory_runtime_start_rss

  • memory_terminate_done_vms

Duration Calculations

The system automatically calculates several duration metrics in seconds:

  • Import Time: Duration of module imports

  • Initialization Time: Time spent in initialization

  • Runtime Duration: Time between runtime start and termination start (or current time for active nodes). This represents the actual execution time of the node, excluding initialization overhead.

  • Termination Duration: Time taken to clean up and exit

  • Process Time: Total time from process start until current time or termination, including all overhead (imports, initialization, runtime, and termination)

  • Total Process Time: Overall lifetime of the node from start to termination

Key Differences:

  • Process Time vs Runtime Duration:
    • Process Time tracks the entire lifetime of the node process from the very start, including all overhead

    • Runtime Duration only measures the actual execution time of the node between RUNTIME_START and TERMINATE_START events

    • This distinction is useful for:
      • Identifying initialization overhead

      • Debugging startup performance issues

      • Understanding total resource usage vs active runtime usage
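
The following sketch shows one way these durations could be derived from recorded timestamps. It is written under assumptions: the function name, the result keys, and the choice to measure import time from the start event are illustrative, while the fallback to the current time for active nodes follows the description above.

```python
import time
from typing import Dict, Optional


def calculate_durations(ts: Dict[str, float],
                        now: Optional[float] = None) -> Dict[str, float]:
    """Derive durations in seconds from event timestamps in seconds."""
    now = time.time() if now is None else now
    durations: Dict[str, float] = {}

    # Assumption: imports are measured from process start to "imports_done".
    if "start" in ts and "imports_done" in ts:
        durations["import_time"] = ts["imports_done"] - ts["start"]
    if "init_start" in ts and "init_done" in ts:
        durations["init_time"] = ts["init_done"] - ts["init_start"]
    if "runtime_start" in ts:
        # Active nodes have no "terminate_start" yet, so use the current time.
        end = ts.get("terminate_start", now)
        durations["runtime_duration"] = end - ts["runtime_start"]
    if "terminate_start" in ts and "terminate_done" in ts:
        durations["termination_duration"] = (
            ts["terminate_done"] - ts["terminate_start"])
    if "start" in ts:
        # Process time covers everything, including overhead.
        end = ts.get("terminate_done", now)
        durations["process_time"] = end - ts["start"]
    return durations


finished = {
    "start": 100.0, "imports_done": 100.5,
    "init_start": 100.5, "init_done": 101.0,
    "runtime_start": 101.0,
    "terminate_start": 111.0, "terminate_done": 111.25,
}
result = calculate_durations(finished)
```

In this example the node ran for 10.0 seconds but the process lived for 11.25 seconds; the 1.25 second difference is the import, initialization, and termination overhead that Process Time includes and Runtime Duration excludes.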

Time Storage

Time values are handled consistently across the system:

  • Internal storage: All timestamps and durations are stored in seconds

  • File storage: Times are written in seconds

  • InfluxDB storage: Times are stored in seconds as floating-point values

Storage Fields

InfluxDB Storage Format:

  • Timestamps stored as: timestamp_{event_name}

  • Memory metrics stored as: memory_{event_name}_{metric_type}

  • Duration metrics stored as: duration_{metric_name}
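
To make the naming scheme concrete, here is a minimal sketch that flattens collected values into a single field dictionary. The build_fields helper and its argument shapes are hypothetical; only the three name patterns come from the format described above.

```python
from typing import Dict


def build_fields(timestamps: Dict[str, float],
                 memory: Dict[str, Dict[str, int]],
                 durations: Dict[str, float]) -> Dict[str, float]:
    """Flatten metrics into field names following the documented patterns."""
    fields: Dict[str, float] = {}
    for event, seconds in timestamps.items():
        fields[f"timestamp_{event}"] = float(seconds)
    for event, snapshot in memory.items():
        for metric_type, value in snapshot.items():   # e.g. "rss", "vms"
            fields[f"memory_{event}_{metric_type}"] = value
    for name, seconds in durations.items():
        fields[f"duration_{name}"] = float(seconds)
    return fields


fields = build_fields(
    timestamps={"start": 100.0, "runtime_start": 101.0},
    memory={"start": {"rss": 50_000_000, "vms": 200_000_000}},
    durations={"runtime": 10.0},   # "runtime" is an assumed metric name
)
```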

Storage Backends

The metrics collection system supports one storage backend, InfluxDB (InfluxDBMetricsCollector). When metrics collection is disabled, a NullMetricsCollector is used instead, which does nothing. The InfluxDB backend:
  • Stores metrics in InfluxDB time-series database

  • Default port: 8086

  • Supports both http:// and https:// URL schemes

  • Automatically creates the database if it does not exist

  • Metrics are tagged with node name, instance name, and measurement tag

  • Configuration via environment variables:
    • GD_NODE_METRICS_TAG={tag}

    • INFLUXDB_HOST=http://localhost:8086

    • PLATFORM_METRICS=1

    • PLATFORM_METRICS_INFLUX_DB={dbname} (optional, defaults to “metrics”)
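
The sketch below shows how these variables might be read and validated before a collector is chosen. Several points are assumptions rather than documented behavior: that PLATFORM_METRICS must equal "1" to enable collection, that a missing INFLUXDB_HOST disables it, and the load_metrics_config helper itself. The default port, the accepted URL schemes, and the default database name come from the list above.

```python
import os
from typing import Dict, Mapping, Optional, Union
from urllib.parse import urlparse

Config = Dict[str, Union[str, int, None]]


def load_metrics_config(env: Optional[Mapping[str, str]] = None) -> Optional[Config]:
    """Return InfluxDB settings, or None when metrics should be a no-op."""
    env = os.environ if env is None else env
    # Assumption: only the exact value "1" enables metrics collection.
    if env.get("PLATFORM_METRICS") != "1":
        return None
    host = env.get("INFLUXDB_HOST")
    if not host:
        return None  # assumption: no host means nothing to write to

    url = urlparse(host)
    if url.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme in INFLUXDB_HOST: {host!r}")
    return {
        "scheme": url.scheme,
        "host": url.hostname,
        "port": url.port or 8086,                      # documented default port
        "database": env.get("PLATFORM_METRICS_INFLUX_DB", "metrics"),
        "tag": env.get("GD_NODE_METRICS_TAG"),
    }


config = load_metrics_config({
    "PLATFORM_METRICS": "1",
    "INFLUXDB_HOST": "https://influx.example.com",     # hypothetical host
    "GD_NODE_METRICS_TAG": "my_test",
})
```

A result of None would correspond to selecting the NullMetricsCollector; any other result carries what the InfluxDB backend needs to connect.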

Usage Examples

To run a node with a tag named “my_test” against an InfluxDB instance on localhost:

export GD_NODE_METRICS_TAG=my_test
export INFLUXDB_HOST=http://localhost:8086
export PLATFORM_METRICS=1
export PLATFORM_METRICS_INFLUX_DB=platform_metrics
python3 -m gd_node --name node_name --inst instance_name