State Machines
State machines are the core orchestration mechanism in UMH Core. Every component is managed by a finite state machine (FSM) with clearly defined states and transitions. This provides predictable, observable behavior and enables reliable error handling and recovery.
How It Works
UMH Core uses hierarchical state machines where components build upon each other:
Bridge (formerly Protocol Converter) = Connection + Source Flow + Sink Flow
Flow (DataFlow Component) = Benthos instance with lifecycle management
Benthos Flow = Individual Benthos process with detailed startup phases
Connection = Network probe service (typically nmap-based)
Each component inherits the lifecycle states (`to_be_created`, `creating`, `removing`, `removed`) and adds operational states specific to its function. The Agent continuously reconciles desired and actual state, triggering the appropriate transitions based on observed conditions.
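The reconciliation idea can be illustrated with a minimal sketch. It is not UMH Core's actual implementation: the function name `reconcile` and the desired-state values `"running"` and `"stopped"` are hypothetical, and only the `start` and `stop` events and the state names are taken from the Redpanda table in this document.

```python
from typing import Optional

# States from which the Redpanda table lists a `stop` transition.
STOPPABLE = {"idle", "active", "degraded"}


def reconcile(desired: str, actual: str) -> Optional[str]:
    """Return the event to send to the FSM, or None if nothing needs to change.

    `desired` is a hypothetical high-level target ("running" or "stopped");
    `actual` is the FSM's current state.
    """
    if desired == "stopped" and actual in STOPPABLE:
        return "stop"
    if desired == "running" and actual == "stopped":
        return "start"
    # Either the states already agree or a transition is still in flight.
    return None
```

Calling `reconcile("running", "stopped")` yields `"start"`, while `reconcile("running", "idle")` yields `None` because the component is already up.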
1 – Redpanda Service

| State | Description | Entered by | Exits |
| --- | --- | --- | --- |
| stopped | `redpanda` process not running. | stop_done from stopping, or initial create. | start → starting |
| starting | S6 launching broker; health checks pending. | start event from stopped. | start_done → idle; start_failed → stopped |
| idle | Broker healthy, no data for 30 s (default idle window). | start_done, or no_data_timeout from active. | data_received → active; degraded → degraded; stop → stopping |
| active | Broker healthy and `BytesIn/OutPerSec` > 0. | data_received from idle. | no_data_timeout → idle; degraded → degraded; stop → stopping |
| ⚠️ degraded | Broker running but ≥1 health check failing (`disk-space-low`, `cpu-saturated`, etc.). | degraded from idle/active. | recovered → idle; stop → stopping |
| stopping | Graceful shutdown (draining clients). | stop from any running state. | stop_done → stopped |
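The Redpanda transitions listed above can be written down as a lookup table. This is a sketch for illustration only: the names `TRANSITIONS`, `fire` and `run` are hypothetical, and the table contains only the state/event pairs given in this section.

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state, copied from the Redpanda table.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("stopped", "start"): "starting",
    ("starting", "start_done"): "idle",
    ("starting", "start_failed"): "stopped",
    ("idle", "data_received"): "active",
    ("idle", "degraded"): "degraded",
    ("idle", "stop"): "stopping",
    ("active", "no_data_timeout"): "idle",
    ("active", "degraded"): "degraded",
    ("active", "stop"): "stopping",
    ("degraded", "recovered"): "idle",
    ("degraded", "stop"): "stopping",
    ("stopping", "stop_done"): "stopped",
}


def fire(state: str, event: str) -> str:
    """Apply one event; reject pairs that the table does not list."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}")


def run(state: str, events: Iterable[str]) -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = fire(state, event)
    return state
```

For example, `run("stopped", ["start", "start_done", "data_received"])` ends in `"active"`, and `fire("stopped", "data_received")` raises `ValueError` because a stopped broker cannot receive data.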
2 – Container Monitor

| State | Description | Entered by | Exits |
| --- | --- | --- | --- |
| active | CPU < 85 %, RAM < 90 %, Disk < 90 %. | metrics_all_ok after monitor start or from degraded. | metrics_not_ok → degraded |
| ⚠️ degraded | One of the above limits breached for 15 s. | metrics_not_ok | metrics_all_ok → active |
| monitoring_stopped | Watchdog disabled. | stop_monitoring_done | start_monitoring → monitoring_starting |
| monitoring_starting | Monitor service booting. | start_monitoring | start_monitoring_done → degraded (initial) |
| monitoring_stopping | Monitor shutting down. | stop_monitoring | stop_monitoring_done → monitoring_stopped |
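The threshold check behind `metrics_all_ok` and `metrics_not_ok` might look like the sketch below. It assumes that "breached for 15 s" means a continuous breach lasting at least 15 seconds; the class name `BreachTracker` and its method are hypothetical, and only the limits and event names come from the table.

```python
from typing import Optional

CPU_LIMIT = 85.0      # percent
RAM_LIMIT = 90.0      # percent
DISK_LIMIT = 90.0     # percent
BREACH_SECONDS = 15.0


class BreachTracker:
    """Tracks how long at least one limit has been continuously exceeded."""

    def __init__(self) -> None:
        self.breach_started: Optional[float] = None

    def observe(self, now: float, cpu: float, ram: float, disk: float) -> Optional[str]:
        """Return the event to emit for this sample, or None while undecided."""
        if cpu < CPU_LIMIT and ram < RAM_LIMIT and disk < DISK_LIMIT:
            self.breach_started = None
            return "metrics_all_ok"
        if self.breach_started is None:
            self.breach_started = now
        if now - self.breach_started >= BREACH_SECONDS:
            return "metrics_not_ok"
        # A limit is exceeded, but not yet for long enough.
        return None
```

Timestamps are passed in as plain seconds rather than read from a clock, which keeps the sketch deterministic and easy to test.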
3 – Agent Monitor

| State | Description | Entered by | Exits |
| --- | --- | --- | --- |
| active | Agent connected and internal tasks OK. | metrics_all_ok | metrics_not_ok → degraded |
| ⚠️ degraded | Cloud unreachable, auth error, or task panic. | metrics_not_ok | metrics_all_ok → active |
| monitoring_stopped | Agent health monitor off. | stop_monitoring_done | start_monitoring → monitoring_starting |
| monitoring_starting | Starting health checks. | start_monitoring | start_monitoring_done → degraded (initial) |
| monitoring_stopping | Halting checks. | stop_monitoring | stop_monitoring_done → monitoring_stopped |
4 – DataFlow Component (Bridge)
Aggregate Bridge FSM

| State | Description | Entered by | Exits |
| --- | --- | --- | --- |
| stopped | All sub-services stopped. | stop_done, or after create. | start → starting |
| starting | Launching source and sink Benthos plus connection monitor. | start | start_done → idle; start_failed → starting_failed |
| starting_failed | At least one sub-service failed during start. | start_failed | Manual retry (start) or removal |
| idle | Sub-services healthy, no payload for 30 s. | start_done, no_data_received, recovered | data_received → active; benthos_degraded → degraded; stop → stopping |
| active | Data moving through at least one flow. | data_received | no_data_received → idle; benthos_degraded → degraded; stop → stopping |
| ⚠️ degraded | ≥1 sub-FSM degraded or down (connection lost, flow error). | benthos_degraded | benthos_recovered → idle; stop → stopping |
| stopping | Stopping Benthos and connection monitor. | stop | stop_done → stopped |
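How the aggregate state follows from the three sub-FSMs (Connection, Source Flow, Sink Flow) can be sketched as a pure function. This is a simplified illustration, not the real aggregation logic: the function name `aggregate` is hypothetical, and transitional states such as `starting` and `stopping` are deliberately left undetermined because they depend on events rather than on a snapshot.

```python
from typing import Optional

UNHEALTHY = {"degraded", "down"}


def aggregate(connection: str, source: str, sink: str) -> Optional[str]:
    """Derive the Bridge state from a snapshot of its sub-FSM states.

    Returns None when the snapshot alone does not determine the state
    (for example while sub-services are still starting or stopping).
    """
    subs = (connection, source, sink)
    flows = (source, sink)
    if all(s == "stopped" for s in subs):
        return "stopped"
    if any(s in UNHEALTHY for s in subs):
        return "degraded"
    if connection == "up" and all(f in {"idle", "active"} for f in flows):
        # Data moving through at least one flow makes the Bridge active.
        return "active" if "active" in flows else "idle"
    return None
```

With this rule, `aggregate("up", "active", "idle")` is `"active"`, while a lost connection (`aggregate("down", "active", "active")`) reports `"degraded"` even though both flows are still processing.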
4.1 Connection Service FSM
| State | Description |
| --- | --- |
| starting | Probe service launching. |
| up | Target reachable. |
| down | Target unreachable. |
| ⚠️ degraded | Flaky or intermittent responses. |
| stopping | Probe shutting down. |
| stopped | Probe disabled. |
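One way to picture the difference between `up`, `down` and `degraded` is to classify a window of recent probe results. This document does not say how the probe service decides, so the rule below (all succeed, all fail, or mixed) and the function name `classify` are assumptions made purely for illustration.

```python
from typing import Sequence


def classify(recent_probes: Sequence[bool]) -> str:
    """Classify a non-empty window of probe results (True = target answered)."""
    if not recent_probes:
        raise ValueError("need at least one probe result")
    if all(recent_probes):
        return "up"
    if not any(recent_probes):
        return "down"
    # Some probes succeed and some fail: flaky or intermittent responses.
    return "degraded"
```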
4.2 Benthos Flow (Source / Sink)

| State | Description |
| --- | --- |
| stopped | Service file present, process not running. |
| starting | S6 launched process. |
| starting_config_loading | Benthos parsing YAML pipeline. |
| starting_waiting_for_healthchecks | Pipeline loaded; waiting for plugin health. |
| starting_waiting_for_service_to_remain_running | Stability grace period. |
| idle | Flow running, no messages for the idle window. |
| active | Processing messages. |
| ⚠️ degraded | Flow running but in an error state (e.g., endpoint retries). |
| stopping | Graceful SIGTERM underway. |

Idle/Active timeout: default 30 s (`DFC_IDLE_WINDOW`).
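The startup phases and the idle window can be sketched together. Two assumptions are made here that this document does not state explicitly: the startup phases run in the order listed, and a flow that completes startup lands in `idle` (mirroring `start_done → idle` in the Bridge table). The function names are hypothetical.

```python
# Startup phases in the order they are listed in this section.
STARTUP_PHASES = [
    "starting",
    "starting_config_loading",
    "starting_waiting_for_healthchecks",
    "starting_waiting_for_service_to_remain_running",
]

DFC_IDLE_WINDOW = 30.0  # seconds, the documented default


def next_startup_state(state: str) -> str:
    """Advance one startup phase; after the last phase the flow is assumed idle."""
    index = STARTUP_PHASES.index(state)  # raises ValueError for non-startup states
    if index + 1 < len(STARTUP_PHASES):
        return STARTUP_PHASES[index + 1]
    return "idle"


def running_state(seconds_since_last_message: float,
                  idle_window: float = DFC_IDLE_WINDOW) -> str:
    """Choose between idle and active from the time since the last message."""
    return "idle" if seconds_since_last_message >= idle_window else "active"
```

Treating exactly 30 s of silence as idle (`>=`) is a choice made for the sketch; the real boundary behavior may differ.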
5 – Topic Browser Service
The Topic Browser service manages real-time topic discovery and caching.
| State | Description | Entered by | Exits |
| --- | --- | --- | --- |
| stopped | Service not running. | Initial state or stop_done | start → starting |
| starting | Service initialization. | start | benthos_started → starting_benthos |
| starting_benthos | Benthos starting. | benthos_started | redpanda_started → starting_redpanda |
| starting_redpanda | Redpanda connection. | redpanda_started | start_done → idle |
| idle | Healthy, no active data. | start_done or recovered | data_received → active |
| active | Processing topic data. | data_received | no_data_timeout → idle |
| ⚠️ degraded_benthos | Benthos degraded. | benthos_degraded | recovered → idle |
| ⚠️ degraded_redpanda | Redpanda degraded. | redpanda_degraded | recovered → idle |
| stopping | Graceful shutdown. | stop | stop_done → stopped |
Default: Active (runs automatically)
Transitions: idle ↔ active based on topic activity
Recovery: Automatic from degraded states when underlying services recover
Quick Defaults
| Setting | Default | Name |
| --- | --- | --- |
| Idle window (Redpanda) | 30 s | `REDPANDA_IDLE_WINDOW` |
| Idle window (Bridge) | 30 s | `DFC_IDLE_WINDOW` |
| Container CPU limit | 85 % | `CONTAINER_CPU_LIMIT` |
| Container RAM limit | 90 % | `CONTAINER_RAM_LIMIT` |
| Container Disk limit | 90 % | `CONTAINER_DISK_LIMIT` |
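The defaults above can be collected in one place with optional overrides. This document does not say whether these names are environment variables, configuration keys, or compile-time constants; the sketch assumes they can be supplied through a string-keyed mapping such as the process environment, with values given as plain numbers (seconds or percent). The function name `load_settings` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Documented defaults: idle windows in seconds, limits in percent.
DEFAULTS: Dict[str, float] = {
    "REDPANDA_IDLE_WINDOW": 30.0,
    "DFC_IDLE_WINDOW": 30.0,
    "CONTAINER_CPU_LIMIT": 85.0,
    "CONTAINER_RAM_LIMIT": 90.0,
    "CONTAINER_DISK_LIMIT": 90.0,
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, float]:
    """Return every setting, using the override from `env` when one is present."""
    source = os.environ if env is None else env
    return {name: float(source.get(name, default))
            for name, default in DEFAULTS.items()}
```

Passing an explicit mapping, as in `load_settings({"CONTAINER_CPU_LIMIT": "70"})`, overrides only that one value and leaves the other defaults in place.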