State Machines

State machines are the core orchestration mechanism in UMH-Core. Every component is managed by a finite state machine (FSM) with clearly defined states and transitions. This provides predictable, observable behavior and enables reliable error handling and recovery.

How It Works

UMH-Core uses hierarchical state machines where components build upon each other:

  • Bridge (formerly Protocol Converter) = Connection + Source Flow + Sink Flow

  • Flow (DataFlow Component) = Benthos instance with lifecycle management

  • Benthos Flow = Individual Benthos process with detailed startup phases

  • Connection = Network probe service (typically nmap-based)

Each component inherits lifecycle states (to_be_created, creating, removing, removed) and adds operational states specific to its function. The Agent continuously reconciles desired vs actual state, triggering appropriate transitions based on observed conditions.
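As a rough illustration of the reconcile step described above, the sketch below picks the single event that would move a component one step from its current state toward its desired state. The desired-state labels (`running`, `stopped`, `removed`) and the `create` and `remove` event names are assumptions made for this example; only the lifecycle state names and the `start`/`stop` events come from this page.

```python
from typing import Optional

# Lifecycle states every component inherits (from the text above).
LIFECYCLE_STATES = frozenset({"to_be_created", "creating", "removing", "removed"})
# Operational states in which a service is considered up in some form.
RUNNING_STATES = frozenset({"starting", "idle", "active", "degraded"})


def next_event(current: str, desired: str) -> Optional[str]:
    """Return the event that moves `current` one step toward `desired`.

    `desired` is assumed to be "running", "stopped", or "removed".
    Returns None when there is nothing to do on this tick.
    """
    if current == "to_be_created":
        return "create"  # hypothetical event name
    if current in ("creating", "removing", "stopping"):
        return None  # a transition is already in progress; wait for it
    if desired == "removed":
        return None if current == "removed" else "remove"  # hypothetical event name
    if current == "removed":
        return None
    if desired == "stopped" and current in RUNNING_STATES:
        return "stop"
    if desired == "running" and current == "stopped":
        return "start"
    return None
```

Calling `next_event("stopped", "running")` yields `"start"`, while a component that is already `idle` or `active` with a desired state of `running` yields `None`.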

1 - Redpanda Service

| State | Verified | What it means | How it is entered | How it leaves |
| --- | --- | --- | --- | --- |
| stopped | ✅ | redpanda process not running. | stop_done from stopping, or initial create. | start → starting |
| starting | ✅ | S6 launching broker; health checks pending. | start event from stopped. | start_done → idle; start_failed → stopped |
| idle | ✅ | Broker healthy, no data for 30 s (default idle window). | start_done or no_data_timeout from active. | data_received → active; degraded → degraded; stop → stopping |
| active | ✅ | Broker healthy & BytesIn/OutPerSec > 0. | data_received from idle. | no_data_timeout → idle; degraded → degraded; stop → stopping |
| ⚠️ degraded | ✅ | Broker running but ≥1 health-check failing (disk-space-low, cpu-saturated, etc.). | degraded from idle/active. | recovered → idle; stop → stopping |
| stopping | ✅ | Graceful shutdown (draining clients). | stop from any running state. | stop_done → stopped |
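The Redpanda rows can be read as a transition table keyed by (state, event). The following is a minimal sketch that encodes only the exits listed in the "How it leaves" column; the `step` and `run` helpers are hypothetical names for this example, not part of UMH-Core.

```python
# (current state, event) -> next state, taken from the "How it leaves" column.
REDPANDA_TRANSITIONS = {
    ("stopped", "start"): "starting",
    ("starting", "start_done"): "idle",
    ("starting", "start_failed"): "stopped",
    ("idle", "data_received"): "active",
    ("idle", "degraded"): "degraded",
    ("idle", "stop"): "stopping",
    ("active", "no_data_timeout"): "idle",
    ("active", "degraded"): "degraded",
    ("active", "stop"): "stopping",
    ("degraded", "recovered"): "idle",
    ("degraded", "stop"): "stopping",
    ("stopping", "stop_done"): "stopped",
}


def step(state: str, event: str) -> str:
    """Apply one event; reject events the table does not list for this state."""
    try:
        return REDPANDA_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def run(events, state: str = "stopped") -> str:
    """Replay a sequence of events starting from `state`."""
    for event in events:
        state = step(state, event)
    return state
```

Rejecting unlisted pairs, instead of silently ignoring them, keeps the sketch honest about what the table actually documents.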


2 - Container Monitor

| State | Verified | Meaning | Enter trigger | Exit trigger |
| --- | --- | --- | --- | --- |
| active | ✅ | CPU < 85 %, RAM < 90 %, Disk < 90 %. | metrics_all_ok after monitor start or from degraded. | metrics_not_ok → degraded |
| ⚠️ degraded | ✅ | One of the above limits breached for 15 s. | metrics_not_ok | metrics_all_ok → active |
| monitoring_stopped | ✅ | Watchdog disabled. | stop_monitoring_done | start_monitoring → monitoring_starting |
| monitoring_starting | ✅ | Monitor service booting. | start_monitoring | start_monitoring_done → degraded (initial) |
| monitoring_stopping | ✅ | Monitor shutting down. | stop_monitoring | stop_monitoring_done → monitoring_stopped |
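To make the thresholds concrete, here is a small sketch of how the active/degraded decision could be evaluated from periodic samples. It assumes that "breached for 15 s" means a limit stays exceeded continuously for 15 seconds, and that a single healthy sample is enough to return to active; both are interpretations of the table, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

CPU_LIMIT = 85.0        # percent, from the table above
RAM_LIMIT = 90.0        # percent
DISK_LIMIT = 90.0       # percent
BREACH_WINDOW_S = 15.0  # assumed to mean a sustained breach


@dataclass
class ContainerHealth:
    # The monitor enters degraded first (see monitoring_starting) until metrics are OK.
    state: str = "degraded"
    breach_since: Optional[float] = None

    def observe(self, now: float, cpu: float, ram: float, disk: float) -> str:
        """Feed one metrics sample taken at time `now` (seconds) and return the state."""
        all_ok = cpu < CPU_LIMIT and ram < RAM_LIMIT and disk < DISK_LIMIT
        if all_ok:
            self.breach_since = None
            self.state = "active"
        else:
            if self.breach_since is None:
                self.breach_since = now  # first sample of this breach
            if now - self.breach_since >= BREACH_WINDOW_S:
                self.state = "degraded"
        return self.state
```

With this reading, a CPU spike that lasts 14 seconds leaves the monitor in active, while one that lasts 15 seconds or longer moves it to degraded.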


3 - Agent Monitor

| State | Verified | Meaning | Enter | Exit |
| --- | --- | --- | --- | --- |
| active | ✅ | Agent connected & internal tasks OK. | metrics_all_ok | metrics_not_ok → degraded |
| ⚠️ degraded | ✅ | Cloud unreachable / auth error / task panic. | metrics_not_ok | metrics_all_ok → active |
| monitoring_stopped | ✅ | Agent health monitor off. | stop_monitoring_done | start_monitoring → monitoring_starting |
| monitoring_starting | ✅ | Starting health checks. | start_monitoring | start_monitoring_done → degraded (initial) |
| monitoring_stopping | ✅ | Halting checks. | stop_monitoring | stop_monitoring_done → monitoring_stopped |


4 - DataFlow Component (Bridge)

Aggregate Bridge FSM

| State | Verified | Meaning | Enter | Exit |
| --- | --- | --- | --- | --- |
| stopped | ✅ | All sub-services stopped. | stop_done or after create. | start → starting |
| starting | ✅ | Launching source & sink Benthos + connection monitor. | start | start_done → idle; start_failed → starting_failed |
| starting_failed | ✅ | At least one sub-service failed during start. | start_failed | Manual retry (start) or removal |
| idle | ✅ | Sub-services healthy, no payload for 30 s. | start_done, no_data_received, recovered | data_received → active; benthos_degraded → degraded; stop → stopping |
| active | ✅ | Data moving through at least one flow. | data_received | no_data_received → idle; benthos_degraded → degraded; stop → stopping |
| ⚠️ degraded | ✅ | ≥1 sub-FSM degraded/down (connection lost, flow error). | benthos_degraded | benthos_recovered → idle; stop → stopping |
| stopping | ✅ | Stopping Benthos + connection monitor. | stop | stop_done → stopped |
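The "Meaning" column suggests how the aggregate state could be derived from the three sub-FSMs while the bridge is running. The function below is a sketch under that assumption; `bridge_state` is a hypothetical helper, and it deliberately ignores stopped, starting, starting_failed, and stopping, which the table ties to explicit start and stop events instead.

```python
def bridge_state(connection: str, source: str, sink: str) -> str:
    """Derive the running-phase bridge state from its sub-FSM states.

    connection: state of the Connection Service FSM (e.g. "up", "down", "degraded")
    source, sink: states of the two Benthos flows (e.g. "idle", "active", "degraded")
    """
    sub_states = (connection, source, sink)
    # One degraded or down sub-FSM is enough to degrade the whole bridge.
    if any(state in ("degraded", "down") for state in sub_states):
        return "degraded"
    # Data moving through at least one flow makes the bridge active.
    if "active" in (source, sink):
        return "active"
    return "idle"
```

Checking for degradation first means a lost connection takes precedence over a flow that still reports traffic.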

4.1 Connection Service FSM

| State | Verified | Meaning |
| --- | --- | --- |
| starting | ✅ | Probe service launching. |
| up | ✅ | Target reachable. |
| down | ✅ | Target unreachable. |
| ⚠️ degraded | ✅ | Flaky / intermittent responses. |
| stopping | ✅ | Probe shutting down. |
| stopped | ✅ | Probe disabled. |
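This page does not define what counts as "flaky". One plausible reading, sketched here, classifies a rolling window of recent probe results: all successes is up, all failures is down, and a mix is degraded. The window size and the `ProbeWindow` class are assumptions for illustration only.

```python
from collections import deque
from typing import Iterable


def classify(results: Iterable[bool]) -> str:
    """Map recent probe outcomes (True = target reachable) to a connection state."""
    outcomes = list(results)
    if not outcomes:
        return "starting"  # no probe has completed yet
    if all(outcomes):
        return "up"
    if not any(outcomes):
        return "down"
    return "degraded"      # mixed results: intermittent responses


class ProbeWindow:
    """Keeps the last `size` probe results (window size is an assumed parameter)."""

    def __init__(self, size: int = 5) -> None:
        self.results = deque(maxlen=size)

    def record(self, reachable: bool) -> str:
        """Store one probe result and return the resulting state."""
        self.results.append(reachable)
        return classify(self.results)
```

A larger window makes the state slower to report down after a hard failure but less sensitive to a single dropped probe.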

4.2 Benthos Flow (Source/Sink)

| State | Verified | Meaning |
| --- | --- | --- |
| stopped | ✅ | Service file present, process not running. |
| starting | ✅ | S6 launched process. |
| starting_config_loading | ✅ | Benthos parsing YAML pipeline. |
| starting_waiting_for_healthchecks | ✅ | Pipeline loaded; waiting for plugin health. |
| starting_waiting_for_service_to_remain_running | ✅ | Stability grace period. |
| idle | ✅ | Flow running, no msgs for idle window. |
| active | ✅ | Processing messages. |
| ⚠️ degraded | ✅ | Flow running but error state (e.g., endpoint retries). |
| stopping | ✅ | Graceful SIGTERM underway. |

Idle/Active timeout: default 30 s (DFC_IDLE_WINDOW).
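The startup phases and the idle window can be sketched as two small helpers. The phase order below follows the order in which the states are listed in this section, which is an assumption about the actual sequence, and both function names are hypothetical.

```python
from typing import Optional

# Startup phases in the order they are listed above (assumed to be the real order).
STARTUP_PHASES = [
    "starting",
    "starting_config_loading",
    "starting_waiting_for_healthchecks",
    "starting_waiting_for_service_to_remain_running",
]
IDLE_WINDOW_S = 30.0  # default idle/active timeout (DFC_IDLE_WINDOW)


def next_startup_state(state: str) -> str:
    """Advance one startup phase; the last phase hands over to idle."""
    index = STARTUP_PHASES.index(state)  # raises ValueError for non-startup states
    if index + 1 < len(STARTUP_PHASES):
        return STARTUP_PHASES[index + 1]
    return "idle"


def running_state(now: float, last_message_at: Optional[float]) -> str:
    """Return idle or active depending on how long ago the last message was seen."""
    if last_message_at is None or now - last_message_at >= IDLE_WINDOW_S:
        return "idle"
    return "active"
```

For example, a flow whose last message arrived 20 seconds ago is still active, while one that has been silent for 30 seconds or more is idle.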


Quick Defaults

| Parameter | Default | Source Const / Env |
| --- | --- | --- |
| Idle window (Redpanda) | 30 s | REDPANDA_IDLE_WINDOW |
| Idle window (Bridge) | 30 s | DFC_IDLE_WINDOW |
| Container CPU limit | 85 % | CONTAINER_CPU_LIMIT |
| Container RAM limit | 90 % | CONTAINER_RAM_LIMIT |
| Container Disk limit | 90 % | CONTAINER_DISK_LIMIT |
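For readers who want these defaults in one place, the sketch below collects them in a dictionary and applies overrides by name. The table does not say which names are compile-time constants and which can be set at runtime, so the override mechanism and the plain-number value format are purely illustrative assumptions, and `resolve` is a hypothetical helper.

```python
from typing import Dict, Mapping

# Defaults from the table above: windows in seconds, limits in percent.
DEFAULTS: Dict[str, float] = {
    "REDPANDA_IDLE_WINDOW": 30.0,
    "DFC_IDLE_WINDOW": 30.0,
    "CONTAINER_CPU_LIMIT": 85.0,
    "CONTAINER_RAM_LIMIT": 90.0,
    "CONTAINER_DISK_LIMIT": 90.0,
}


def resolve(overrides: Mapping[str, str]) -> Dict[str, float]:
    """Return the defaults with any known names replaced by override values.

    Override values are assumed to be plain numbers given as strings;
    names that are not in DEFAULTS are ignored.
    """
    values = dict(DEFAULTS)
    for name, raw in overrides.items():
        if name in DEFAULTS:
            values[name] = float(raw)
    return values
```

Passing an empty mapping returns the documented defaults unchanged.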
