
Data Modeling 🚧

🚧 Roadmap Item - Unified data-modelling builds on our existing data contract foundation to provide a comprehensive approach to industrial data modeling.

UMH Core's unified data-modelling system provides a structured approach to defining, validating, and processing industrial data. It bridges the gap between raw sensor data and meaningful business information through a clear hierarchy of components.

Why Data Modeling Matters

Manufacturing companies typically start with implicit data modeling - using bridges to contextualize data factory by factory. This bottom-up approach works well for single sites: look at what's available in your PLC/Kepware, add basic metadata, rename cryptic tags like XYDAG324 to temperature, and publish to the UNS.

But as companies scale across multiple factories, they hit a wall:

  • Inconsistent schemas: Each site names the same equipment differently (motor_speed vs rpm vs rotational_velocity)

  • No standardization: Pump data from Factory A has different fields than identical pumps in Factory B

  • Analytics nightmares: Cross-site dashboards and analytics require custom mapping for every location

  • Knowledge silos: Each site's contextualization is trapped in local configurations

Explicit data modeling solves this by defining standardized templates that enforce consistency across the entire enterprise. Instead of each factory doing its own contextualization, you define once: "Every Pump has these exact fields: pressure, temperature, motor.current, motor.rpm" - then apply that template everywhere.
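For illustration, such an enterprise-wide template could be written in the datamodels syntax used in the Quick Example below. The nested motor folder is a sketch; the exact nesting syntax for folders and sub-models is still a roadmap item:

```yaml
datamodels:
  - name: Pump
    version: v1
    structure:
      pressure:
        type: timeseries
      temperature:
        type: timeseries
      motor:                # folder grouping motor-related fields (illustrative nesting)
        current:
          type: timeseries
        rpm:
          type: timeseries
```

Applied everywhere, this template guarantees that a pump in Factory A and a pump in Factory B publish the same four fields under the same names.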

From Implicit to Explicit

| Approach | Scope | Benefits | Limitations |
| --- | --- | --- | --- |
| Implicit (Current Bridges) | Per-factory contextualization | Quick setup, site-specific optimization | Inconsistent across sites, no templates |
| Explicit (Data Modeling) | Enterprise-wide standardization | Consistent schemas, reusable templates, cross-site analytics | Requires upfront design, more rigid |

UMH's unified data-modelling bridges this gap: keep the flexibility of per-site bridges for raw data collection, but add explicit modeling on top for enterprise standardization.

Object Hierarchy

The unified data-modelling system uses a four-layer hierarchy:

Payload-Shape → Data-Model → Data-Contract → Stream-Processor
| Layer | Purpose | Example |
| --- | --- | --- |
| Payload-Shape | Canonical schema fragment (timeseries default) | timeseries, blob |
| Data-Model | Reusable class; tree of fields, folders, sub-models | Motor, Pump, Temperature |
| Data-Contract | Binds model version; decides retention & sinks | _temperature:v1, _pump:v1 |
| Stream-Processor | Runtime pipeline for model instances | furnaceTemp_sp, pump41_sp |

Quick Example

Here's how the system transforms raw PLC data into structured, validated information:

1. Raw Data Input

Topic: umh.v1.corpA.plant-A.line-4.furnace1._raw.temperature_F
Payload: { "value": 1500, "timestamp_ms": 1733904005123 }

2. Data Model Definition

datamodels:
  - name: Temperature
    version: v1
    structure:
      temperature_in_c:
        type: timeseries

3. Data Contract

datacontracts:
  - name: _temperature
    version: v1
    model: Temperature:v1
    sinks:
      timescaledb: true

4. Stream Processor

streamprocessors:
  - name: furnaceTemp_sp
    contract: _temperature:v1
    location:
      level0: corpA
      level1: plant-A
      level2: line-4
      level3: furnace1
    sources:
      tF: "umh.v1.corpA.plant-A.line-4.furnace1._raw.temperature_F"
    mapping:
      temperature_in_c: "(tF - 32) * 5 / 9"

5. Structured Output

Topic: umh.v1.corpA.plant-A.line-4.furnace1._temperature.temperature_in_c
Payload: { "value": 815.6, "timestamp_ms": 1733904005123 }
Database: Auto-created TimescaleDB hypertable
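The output value follows directly from the mapping expression in step 4: (1500 − 32) × 5 / 9 ≈ 815.6, so the Fahrenheit reading from the raw topic arrives under the _temperature contract as degrees Celsius, with the original timestamp preserved.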

Key Benefits

  • Unified YAML Dialect: Single configuration language for all transformations

  • Schema Registry Integration: All layers pushed to Redpanda Schema Registry

  • Automatic Validation: UNS output plugin rejects non-compliant messages

  • Sub-Model Reusability: Define once, use across multiple assets

  • Enterprise Reliability: Combines MQTT simplicity with data-center-grade features

  • Generic Hierarchical Support: Built-in hierarchical naming (level0-4+) supports ISA-95, KKS, or custom standards
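As a sketch of sub-model reusability, a Motor sub-model could be defined once and embedded in several asset models. The `model:` reference syntax here is hypothetical; the final dialect is a roadmap item:

```yaml
datamodels:
  - name: Motor
    version: v1
    structure:
      current:
        type: timeseries
      rpm:
        type: timeseries
  - name: Pump
    version: v1
    structure:
      pressure:
        type: timeseries
      motor:
        model: Motor:v1   # hypothetical sub-model reference
```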

Getting Started

  1. Define Data Models - Create reusable data structures

  2. Create Data Contracts - Bind models to storage and retention policies

  3. Deploy Stream Processors - Implement real-time data transformation

  4. Configure in Management Console - Use the web interface for deployment

Architecture Context

This unified approach builds on UMH's hybrid architecture, combining:

  • MQTT for lightweight edge communication

  • Kafka for reliable enterprise messaging

  • Data Contracts for application-level guarantees

  • Schema Registry for centralized validation

For deeper technical background on why this hybrid approach is necessary, see our comprehensive analysis of MQTT limitations and data contract solutions.

Related Documentation

  • Stream Processors Implementation - Detailed runtime configuration

  • Unified Namespace - Topic structure and payload formats

  • Data Flows Overview - Integration with other flow types