# Data Contracts 🚧

> 🚧 **Roadmap Item:** Data contracts bind data models to storage, retention, and processing policies, ensuring consistent data handling across your industrial systems.
Data contracts define the operational aspects of your data models: where data gets stored, how long it is retained, and what processing rules apply. They bridge the gap between logical data structure (models) and physical data management.
## Overview
Data contracts are stored in the `datacontracts:` configuration section and reference specific versions of data models:
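Since this feature is still on the roadmap, the following is a hypothetical sketch of what such a section might look like; the exact field names (`model`, `sinks`) are assumptions:

```yaml
datacontracts:
  _temperature:               # contract name (underscore-prefixed)
    v1:                       # contract version
      model: Temperature:v1   # referenced data model and its version
      sinks:
        timescaledb: {}       # store data in a TimescaleDB hypertable
```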
## Core Properties

### Name and Versioning
**Naming convention:**

- Contract names start with an underscore (`_temperature`, `_pump`)
- Versions follow semantic versioning (`v1`, `v2`, etc.)
- Model references include the version (`Temperature:v1`)
### Model Binding
Each contract binds to exactly one data model version:
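A minimal hypothetical sketch of this binding (the syntax is an assumption pending the final roadmap design):

```yaml
datacontracts:
  _temperature:
    v1:
      model: Temperature:v1   # exactly one model version per contract version
```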
This binding is immutable: to change the model, create a new contract version.
## Data Sinks
Contracts specify where data gets stored and processed:
**Available sinks:**

- `timescaledb`: automatic TimescaleDB hypertable creation
- `custom_dfc`: custom data flow configurations
- `cloud_storage`: S3-compatible storage
- `analytics_pipeline`: stream analytics processing
## Retention Policies
Define how long data is kept:
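A hypothetical retention block might look like this; `duration` and `archive` are assumed field names, not a confirmed syntax:

```yaml
datacontracts:
  _pump:
    v1:
      model: Pump:v1
      retention:
        duration: 90d            # hypothetical: keep data for 90 days
        archive: cloud_storage   # hypothetical: hand off to an archival sink afterwards
```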
## Complete Examples
### Simple Temperature Contract
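A hypothetical minimal contract, assuming the field names sketched above:

```yaml
datacontracts:
  _temperature:
    v1:
      model: Temperature:v1   # bind to the Temperature model, version v1
      sinks:
        timescaledb: {}       # auto-create a hypertable
      retention:
        duration: 30d         # hypothetical retention setting
```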
### Complex Pump Contract
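A hypothetical contract combining several sinks; the sink option names (`format`, `target`) are illustrative assumptions:

```yaml
datacontracts:
  _pump:
    v1:
      model: Pump:v1
      sinks:
        timescaledb: {}          # time-series storage for analytics
        cloud_storage:           # long-term archive
          format: parquet
        custom_dfc:              # forward to an external system
          target: mes-gateway    # hypothetical data flow component name
      retention:
        duration: 90d
        archive: cloud_storage   # hypothetical archival hand-off
```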
## Generated Database Schema
When a contract with TimescaleDB sink is deployed, UMH automatically creates:
### Hypertable Structure
For the `_pump:v1` contract:
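A hypothetical sketch of the generated table, following the `{contract_name}_{version}` naming rule described below; the column names are assumed from a pump model with pressure, temperature, and running fields:

```sql
-- Hypothetical schema generated for the _pump:v1 contract
CREATE TABLE _pump_v1 (
    timestamp   TIMESTAMPTZ      NOT NULL,
    location    JSONB            NOT NULL,  -- ISA-95 hierarchy
    pressure    DOUBLE PRECISION,           -- from the bound Pump model
    temperature DOUBLE PRECISION,
    running     BOOLEAN
);

-- Convert the table into a TimescaleDB hypertable partitioned by time
SELECT create_hypertable('_pump_v1', 'timestamp');
```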
### Location Structure
The `location` field stores the ISA-95 hierarchy:
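An illustrative payload for that field, assuming the common ISA-95 levels (enterprise, site, area, line, workcell); the concrete values are placeholders:

```json
{
  "enterprise": "acme",
  "site": "berlin",
  "area": "assembly",
  "line": "line-4",
  "workcell": "station-2"
}
```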
## Contract Evolution

### Version Management
Contracts support controlled evolution:
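A hypothetical sketch of two contract versions coexisting, consistent with the backward-compatibility guarantees below:

```yaml
datacontracts:
  _pump:
    v1:
      model: Pump:v1   # existing stream processors keep using v1
    v2:
      model: Pump:v2   # new stream processors bind to v2
```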
### Backward Compatibility
- Multiple contract versions can coexist
- Existing stream processors continue using their bound contract version
- Database schemas adapt automatically to new fields
- No downtime is required for contract updates
## Sink Configuration Details

### TimescaleDB Sink
**Behavior:**

- Auto-creates the hypertable `{contract_name}_{version}`
- Generates appropriate column types from model constraints
- Creates location indexes for ISA-95 queries
- Flattens sub-model fields automatically
### Custom Data Flow Sink
**Behavior:**

- Sends data to external systems
- Supports authentication and batching
- Configurable retry policies
- Validates the schema before transmission
### Cloud Storage Sink
**Behavior:**

- Partitions storage by time
- Supports multiple formats (JSON, Parquet, Avro)
- Compression options
- Automated lifecycle management
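A hypothetical fragment of a cloud storage sink within a contract; the option names (`format`, `compression`, `partitioning`) are illustrative assumptions:

```yaml
      sinks:
        cloud_storage:
          format: parquet        # one of json, parquet, avro
          compression: gzip      # hypothetical compression option
          partitioning: daily    # hypothetical time-based partitioning
```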
## Validation and Enforcement

### Schema Enforcement
All data contracts are registered in the Redpanda Schema Registry:

- **Publish-time validation:** messages are validated before acceptance
- **Consumer protection:** invalid messages are rejected automatically
- **Evolution safety:** schema changes must maintain compatibility
### Runtime Validation
The UNS output plugin enforces contract compliance:
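A hypothetical sketch of how a pipeline's UNS output might reference a contract so that compliance can be enforced at publish time; the `data_contract` field is an assumption, not a confirmed plugin option:

```yaml
output:
  uns:
    # hypothetical: messages are checked against the bound
    # contract's model schema before being published
    data_contract: _pump:v1
```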
## Best Practices

### Contract Design
- **Single responsibility:** one contract per logical entity type
- **Semantic naming:** use descriptive, underscore-prefixed names
- **Version explicitly:** always specify model and contract versions
- **Plan for growth:** consider future sink requirements
### Retention Planning
- **Match business needs:** align retention with regulatory requirements
- **Consider storage costs:** balance retention against storage expenses
- **Plan for archival:** design archival strategies for historical data
### Sink Selection
- **TimescaleDB for time series:** optimal for sensor data and analytics
- **Cloud storage for archives:** long-term, cost-effective storage
- **Custom DFC for integration:** external system connectivity
### Schema Evolution
- **Additive changes:** add fields rather than modifying existing ones
- **Test compatibility:** validate schema evolution before deployment
- **Document changes:** maintain clear change logs
## Integration with Stream Processors
Data contracts are consumed by stream processors:
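A hypothetical sketch of a stream processor binding to a contract version; the section name `streamprocessors` and the `contract`, `sources`, and `mapping` fields are illustrative assumptions:

```yaml
streamprocessors:
  pump_sp:
    contract: _pump:v1        # hypothetical: bind the processor to a contract version
    sources:
      press: umh.v1.acme.berlin._raw.press   # placeholder input topic
    mapping:
      pressure: press.value   # map raw input onto a model field
```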
The stream processor:

- Validates output against the contract's model schema
- Routes data to configured sinks
- Applies retention policies automatically
- Enforces location hierarchy requirements