
Concepts

The software of the United Manufacturing Hub is designed as a modular system. Our software serves as a basic building block for connecting and using various hardware and software components quickly and easily. This enables flexible use and thus the possibility to create comprehensive solutions for various challenges in the industry.

High-level architecture

The United Manufacturing Hub consists of three layers and two packages for installation:

Overview

The following sub-chapters will explain the layers and the two packages further. If you want to deep-dive into the actual architecture, you can scroll further down.

Lower level: data acquisition

The data sources connected to the edge device provide the foundation for automatic data collection. The data sources can be external sensors (e.g. light barriers, vibration sensors), input devices (e.g. button bars), Auto-ID technologies (e.g. barcode scanners), industrial cameras and other data sources such as machine PLCs. The wide range of data sources allows the connection of all machines, either directly via the machine PLC or via simple and fast retrofitting with external sensors.

Examples:

  • sensorconnect (to automatically read out IO-Link Master and their connected sensors)
  • cameraconnect (to automatically read out GenICam compatible cameras and push the result into MQTT, in development)
  • barcodereader (to connect USB barcode readers and push data into MQTT)
  • Node-RED (e.g., for proprietary and/or machine-specific protocols)
  • PLC4X (in development)

Middle layer: data infrastructure

This layer is the central part of the United Manufacturing Hub. It provides an infrastructure, including data models, to fulfill all manufacturing needs for data processing and storage.

It starts by making all acquired data accessible in real time for data processing, using either established solutions like Node-RED or your own software built with a microservice approach and MQTT. This makes adding new data, processing it, or integrating it with other systems on the edge very easy. We recommend starting to transform data into the central data model at this step.

To send the raw and/or processed data to a central place (cloud or on-premise) we use our self-written MQTT bridge. Internet connections, and networks in general, are often unstable in manufacturing environments, so messages need to be buffered safely across internet or electricity downtimes. As existing MQTT bridge solutions proved unreliable, we developed our own.
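
The principle can be sketched in a few lines. This is not the actual implementation of our bridge, just a hedged illustration of store-and-forward buffering; broker hosts and topics are placeholders:

import time
import queue
import paho.mqtt.client as mqtt

buffer = queue.Queue()  # a real bridge would persist this queue to disk

local = mqtt.Client(client_id="bridge-local")
remote = mqtt.Client(client_id="bridge-remote")

def on_local_message(client, userdata, msg):
    # Buffer first, forward later: messages survive a broken uplink.
    buffer.put((msg.topic, msg.payload))

local.on_message = on_local_message
local.connect("localhost", 1883)
local.subscribe("ia/#")
local.loop_start()

remote.connect_async("central-broker.example.com", 1883)  # placeholder host
remote.loop_start()

while True:
    topic, payload = buffer.get()
    result = remote.publish(topic, payload, qos=1)
    if result.rc != mqtt.MQTT_ERR_SUCCESS:
        buffer.put((topic, payload))  # uplink down: requeue and retry later
        time.sleep(5)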

Once the data arrives at the server, it can be further processed using the same methods as on the edge (MQTT microservices, Node-RED, etc.). The real-time data can also be integrated into MES or ERP systems.

All processed data is then stored in databases using load-balanced microservices with caching. This achieves high availability and enormous scalability. Furthermore, common requests and operations are cached in Redis.

Relational data (e.g., data about orders and products) as well as high-resolution time series data (e.g., machine data like temperature) can be stored in the TimescaleDB database (the differences between InfluxDB and TimescaleDB have been described in detail). Blob data (e.g., camera pictures) is stored in blob storage, either directly in Minio or, using a Minio gateway, in cloud-specific storage like AWS S3 or Microsoft Azure Blob Storage.

We do not allow direct access to the databases for performance and security reasons. Instead, we’ve put an additional, self-written component in front, called factoryinsight. factoryinsight provides a REST API to access raw data from the databases as well as processed data in the form of KPIs like ‘OEE losses’. All requests are load-balanced, cached, and executed only on a replica of the database.
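
As a hedged sketch of what a request against such an API could look like (host, credentials, and the exact KPI route are assumptions, not the documented endpoints):

import requests

BASE = "https://factoryinsight.example.com/api/v1"  # placeholder host

resp = requests.get(
    f"{BASE}/dccaachen/aachen/demonstrator/oee",  # assumed customer/location/asset/KPI route
    auth=("user", "password"),                    # placeholder credentials
    timeout=10,
)
resp.raise_for_status()
print(resp.json())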

To insert data via a REST API, we’ve developed two additional services: grafana-proxy and factoryinput.

Examples:

  • TimescaleDB
  • Node-RED
  • factoryinput
  • factoryinsight
  • minio

Higher level: visualization

As a standard dashboarding tool, the United Manufacturing Hub uses Grafana in combination with self-written plugins, which allow every user (even without programming knowledge) to quickly and easily compose personally tailored dashboards with the help of modular building blocks.


The right side: deployment options

The entire stack can be deployed using only a configuration file (values.yaml) and the corresponding Helm charts factorycube-server and factorycube-edge.

This allows you to deploy the architecture in hybrid setups, from edge IIoT gateways to on-premise servers to the cloud (e.g., Azure AKS).

Low-level architecture

If you want to go into more detail, here is the detailed architecture:

Software

1 - Node-RED in Industrial IoT: a growing standard

How an open-source tool is establishing itself in a highly competitive environment against billion dollar companies

Using Node-RED and UaExpert to extract data from the PLC of a Saw

Most people know Node-RED from the areas of smart home or programming introductions (those workshops where you connect things with microcontrollers). Yet, very few people realize that it is frequently used in manufacturing as well.

For those of you that do not know it yet, here is the official self-description from the Node-RED website:

Node-RED is a programming tool for wiring together hardware devices, APIs and online services in new and interesting ways.

It provides a browser-based editor that makes it easy to wire together flows using the wide range of nodes in the palette that can be deployed to its runtime in a single-click.

And the best thing: it is open-source

The project started in early 2013 in IBM’s research centers. In 2016, it was one of the founding projects of the JS Foundation. Since the release of version 1.0 in 2019, it has been considered safe for production use.

A self-conducted survey in the same year showed that of 515 respondents, 31.5% use Node-RED in manufacturing, and of 868 respondents, 24% said they have created a PLC application using it 1. Also, 24.2% of 871 respondents said that they use InfluxDB in combination with Node-RED. The reason we think that TimescaleDB is better suited for the Industrial IoT than InfluxDB has been described in this article.

But how widespread is it really in manufacturing? What are these users doing with Node-RED? Let’s deep dive into that!

Usage of Node-RED in Industry

Gathering qualitative data on industry usage of specific solutions is hard to almost impossible, as very few companies are open about the technologies they use. However, we can still gather quantitative data, which strongly indicates heavy usage across various industries for data extraction and processing.

First, it is preinstalled on more and more automation systems like PLCs. Wikipedia has a really good overview here (and it checks out!). Siemens, in particular, is starting to use it more often, see also Node-RED with SIMATIC IOT2000 or the Visual Flow Creator.

Furthermore, various so-called “nodes” are available that can only be used in manufacturing environments, e.g., to read out data from specific devices. These nodes also have quite impressive download numbers.


We’ve talked with the former developer of two of these nodes, Klaus Landsdorf from the German company Iniationware, which offers companies support on OPC-UA, Modbus, BACnet, and data modeling.

Klaus confirmed our hypothesis:

We get many requests from German hardware manufacturers that rely on Node-RED and on these industry-specific nodes like OPC-UA. The OPC-UA project was sponsored by just two small companies with roughly 5% of the development costs in the case of the IIoT OPC-UA contribution package. But in view of using the package and testing it across multiple industrial manufacturing environments to ensure high stability, we had many and also big companies aboard. In education we have a great response from ILS, because they are using the Iniationware package node-red-contrib-iiot-opcua to teach their students about OPC-UA essentials. Unfortunately, just a few companies understand the idea of commercial backing for open-source software companies through yearly subscriptions, which could save a lot of money for each of them. Do it once, do it stable, and share the payment in open-source projects! That would bring a stable community and contribution packages tailored to specific industrial needs, like LTS versions. Simplified: it needs a bit of money to make money in the long term, as well as to provide stable and up-to-date Node-RED packages.

It is also described in the community as being production-ready and used quite frequently. In a topic discussing the question of production readiness a user with the name of SonoraTechnical says:

Although anecdotal, just Friday, I was speaking to an engineer at a major OPC Software Vendor who commented that they see Node-RED frequently deployed by industrial clients and even use it internally for proving out concepts and technology.

Another one with the name of gemini86 explains the advantages compared with commercial solutions:

I’m also (very much) late to the party on this, but I work in manufacturing and use AB, Siemens, Codesys, etc. I also use Node-RED for SCADA and database bridging. Our site has well pumps in remote areas where data and commands are sent over 900mhz ethernet radios, and Node-RED handles the MQTT <> modbusRTU processing. Node-RED has been as stable and quick, if not quicker than any Siemens or AB install with comparable network functionality. In fact, I struggled to get my S7-1200 to properly communicate with modbusRTU devices at all. I was completely baffled by their lack of documentation on getting it to work. Their answer? “Use profibus/profinet.” So, I myself prefer Node-RED for anything to do with serial or network communications.

Last but not least, it is very frequently used in scientific environments. There are over 3,000 research papers available on Google Scholar on the usage of Node-RED in industrial environments!

Therefore, it is safe to say that it is widespread, with growing numbers of users in industry. But what exactly can you do with it? Let us give some examples of how we are using it!

What you can do with it

The United Manufacturing Hub relies on Node-RED as a tool for:

  1. Extracting data from production machines using various protocols (OPC/UA, Modbus, S7, HTTP, TCP, …)
  2. Processing and unifying data points into our standardized data model
  3. Creating customer-specific integrations into existing systems, e.g., MES or ERP systems like SAP or Oracle
  4. Combining data from various machines and triggering actions (machine-to-machine communication or, in short, M2M)
  5. Creating small interactive and customer-specific dashboards to trigger actions like specifying stop reasons

Let’s explain each one by going through them step-by-step:

1. Extract data from production machines using various protocols

One central challenge of Industrial IoT is obtaining data. The shopfloor is usually fitted out with machines from various vendors and of different ages. As there is little or no standardization in protocols or semantics, the data extraction process needs to be customized for each machine.

With Node-RED, various protocols are available as so-called “nodes” - from automation protocols like OPC/UA (see earlier) to various IT protocols like TCP or HTTP. For any other automation protocol, you can use PTC Kepware, which supports over 140 PLC protocols.
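
As a sketch of what reading a single value from a PLC looks like outside of Node-RED, here is a minimal example using the community python-opcua package; the endpoint URL and node ID are placeholders:

from opcua import Client  # community "python-opcua" package

client = Client("opc.tcp://192.168.0.10:4840")  # placeholder PLC endpoint
client.connect()
try:
    # Placeholder node ID; the real one depends on the machine's address space.
    counter_node = client.get_node("ns=2;s=Machine.ProducedParts")
    print("produced parts:", counter_node.get_value())
finally:
    client.disconnect()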

2. Processing and unifying data points into our standardized data model

Node-RED was originally developed for “visualizing and manipulating mappings between MQTT topics” 2, and this is what we are still using it for today. All the data points that have been extracted from various production machines now need to be standardized to match our data model: the machine state needs to be calculated, the machines’ output converted from various formats into a simple /count message, etc.

More information about this can be found in our data model for Industrial IoT.
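
A minimal sketch of such a unification step, assuming a hypothetical raw topic and vendor payload (in the United Manufacturing Hub this logic typically lives in a Node-RED flow):

import json
import time
import paho.mqtt.client as mqtt

client = mqtt.Client()

def on_message(c, userdata, msg):
    raw = json.loads(msg.payload)      # e.g. {"pieces_total": 5, "bad": 1}
    count_msg = {
        "timestamp_ms": int(time.time() * 1000),
        "count": raw["pieces_total"],  # map the vendor field to the data model
        "scrap": raw.get("bad", 0),    # optional scrap, defaults to 0
    }
    c.publish("ia/dccaachen/aachen/demonstrator/count", json.dumps(count_msg))

client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("ia/raw/machine01")   # placeholder raw topic
client.loop_forever()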

Example of working with the United Manufacturing Hub. Everything is flow-based.

3. Customer-specific integrations into existing systems

It is not just good for extracting and processing data. It is also very good for pushing this processed data back into other systems, e.g., MES or ERP systems like Oracle or SAP. These systems usually have REST APIs, e.g., here is an example for the REST API for the Oracle ERP.

As the customer implementations of those systems usually differ, the resulting APIs mostly differ as well. Therefore, one needs a tool that can quickly be adapted to handle those APIs. And Node-RED is perfect for this.

4. Machine to machine communication

The AGV automatically gets the finished products from one machine and brings them to empty stations, which is a good example of M2M

As a result of our data architecture, machine-to-machine communication (M2M) is enabled by default. The data from all edge devices is automatically sent to a central MQTT broker and is available to all connected devices (that have been granted access to that data).

It is easy to gather data from various machines and trigger additional actions, e.g., triggering the Automated Guided Vehicle (AGV) to fetch material from the production machine when one station runs out of material.

And the perfect tool to set those small triggers is, as you might have guessed, Node-RED.
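
A minimal sketch of such a trigger, assuming hypothetical topics and payload fields:

import json
import paho.mqtt.client as mqtt

client = mqtt.Client()

def on_message(c, userdata, msg):
    data = json.loads(msg.payload)
    if data.get("materialLow"):  # assumed field published by the station
        c.publish(
            "ia/dccaachen/aachen/agv1/fetchMaterial",  # assumed command topic
            json.dumps({"destination": "station3"}),
        )

client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("ia/dccaachen/aachen/station3/processValue")
client.loop_forever()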

5. Creating small interactive and customer-specific dashboards

Example of a dashboard using node-red-dashboard. It features a multi-level stop reason selection and the visualization of production speed.

Sometimes the machine operators need time-sensitive dashboards to retrieve real-time information or to interact with the system. As many companies still do not have a good and reliable internet connection or even network infrastructure, one cannot wait until the website is fully loaded to enter a stop reason. Therefore, sometimes it is crucial to have a dashboard as close to the machine as possible (and not sitting somewhere in the cloud).

For this one, you can use the node-red-dashboard node, which allows you to easily create dashboards and interact with the data via MQTT.

Bonus: What not to do: process control

However, we strongly recommend NOT using it to intervene in the production process, e.g., process control or ensuring safety mechanisms for two reasons:

  1. IT tools and systems like Node-RED are not designed to ensure the safety of machines or people, e.g., guaranteed time-sensitive reactions to a triggered safety alert
  2. It would also be almost impossible to get such a setup certified and approved, due to reason 1. For these aspects, very good and safe tools, like PLCs or NCs, already exist in the automation world.

Summary

The slogan: “The best things in life are free” also applies in manufacturing:

Node-RED is on the same level as “professional” closed-source and commercial solutions and is used by thousands of researchers and hundreds of daily users in various manufacturing industries.

It is included and enabled in every installation of the United Manufacturing Hub - in the cloud and on the edge.

More information on how we use the system can be found in our Quick Start.


  1. https://nodered.org/about/community/survey/2019/ ↩︎

  2. https://nodered.org/about/ ↩︎

2 - The UMH datamodel / MQTT

All events or subsequent changes in production are transmitted via MQTT in the following data model

Introduction

All events or subsequent changes in production are transmitted via MQTT in the following data model. This ensures that all participants are always informed about the latest status.

The data model in the MQTT Broker can be divided into four levels. In general, the higher the level, the lower the data frequency and the more the data is prepared.

If you do not know the idea of MQTT (important keywords: “broker”, “subscribe”, “publish”, “topic”), we recommend reading the Wikipedia article first.

All MQTT messages consist of one JSON object with at least two elements in it:

Key | Data type/format | Description
timestamp_ms | int | the amount of milliseconds since 1970-01-01 (also called UNIX timestamp in milliseconds)
<valueName> | int, str, dict | a value that can be an int, str, or even a dict
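
For example, a minimal valid message with a single value named count could be built like this (the value name is just an example):

import json
import time

payload = json.dumps({
    "timestamp_ms": int(time.time() * 1000),  # UNIX timestamp in milliseconds
    "count": 1,                               # <valueName> is "count" here
})
print(payload)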

1st level: Raw data

Data from this level is all raw data, which is not yet contextualized (i.e., assigned to a machine). This is, in particular, all data from sensorconnect and cameraconnect.

Topic: ia/raw/

Topic structure: ia/raw/.+

All raw data coming in via sensorconnect.

Topic: ia/rawImage/

Topic structure: ia/rawImage/<TransmitterID>/<MAC Address of Camera>

All raw data coming in via cameraconnect.

key | data type | description
image_id | str | a unique identifier for every image acquired (e.g. format: <MACaddress>_<timestamp_ms>)
image_bytes | str | base64-encoded image in JPG format in bytes
image_height | int | height of the image in pixels
image_width | int | width of the image in pixels
image_channels | int | amount of included color channels (Mono: 1, RGB: 3)

2nd level: Contextualized data

In this level the data is already assigned to a machine.

Topic structure: ia/<customerID>/<location>/<AssetID>/<Measurement> e.g. ia/dccaachen/aachen/demonstrator/count.

An asset can be a step, machine, plant or line. It uniquely identifies the smallest location necessary for modeling the process.

By definition all topic names should be lower case only!

/count

Topic: ia/<customerID>/<location>/<AssetID>/count

Here a message is sent every time something has been counted. This can be, for example, a good product or scrap.

count in the JSON is an integer. scrap in the JSON is an optional integer; it means that scrap pieces out of count are scrap. If not specified, it defaults to 0 (all produced goods are good).

key | data type | description
count | int | quantity of produced items
scrap | int | optional; number of scrap pieces out of count (defaults to 0)

/scrapCount

Topic: ia/<customerID>/<location>/<AssetID>/scrapCount

Here a message is sent every time products should be marked as scrap. It works as follows: a message with scrap and timestamp_ms is sent. Starting with the count entry directly before timestamp_ms, the existing counts are iterated backwards in time and marked as scrap, step by step, until a total of scrap products has been scrapped. For example, if scrap is 10 and the two most recent count messages carried count = 7 and count = 5, the 7 pieces and then 3 of the 5 pieces are marked as scrap.

scrap in the JSON is an integer.

key | data type | description
scrap | int | Number of items from count that are considered scrap. When scrap equals 0, all produced goods are of good quality

/barcode

Topic: ia/<customerID>/<location>/<AssetID>/barcode

key | data type | description
barcode | str | A message is sent here each time the barcode scanner connected to the transmitter via USB reads a barcode (via barcodereader)

/activity

Topic: ia/<customerID>/<location>/<AssetID>/activity

key | data type | description
activity | bool | A message is sent here every time the machine runs or stops (regardless of whether it runs slowly or fast, or of the reason for a stop; that is covered in state)

/detectedAnomaly

Topic: ia/<customerID>/<location>/<AssetID>/detectedAnomaly

key | data type | description
detectedAnomaly | str | A message is sent here each time a stop reason has been identified, automatically or by input from the machine operator

/addShift

Topic: ia/<customerID>/<location>/<AssetID>/addShift

key | data type | description
timestamp_ms_end | int | A message is sent here each time a new shift is started. The value represents a UNIX timestamp in milliseconds

/addOrder

Topic: ia/<customerID>/<location>/<AssetID>/addOrder

A message is sent here each time a new order is added.

key | data type | description
product_id | str | Represents the current product name
order_id | str | Represents the current order name
target_units | int | Represents the amount of target units to be produced (in the same unit as count)

/addProduct

Topic: ia/<customerID>/<location>/<AssetID>/addProduct

A message is sent here each time a new product is added.

key | data type | description
product_id | str | Represents the current product name
time_per_unit_in_seconds | float | Specifies the target time per unit in seconds

/startOrder

Topic: ia/<customerID>/<location>/<AssetID>/startOrder

A message is sent here each time a new order is started.

key | data type | description
order_id | str | Represents the order name

/endOrder

Topic: ia/<customerID>/<location>/<AssetID>/endOrder

A message is sent here each time an order is ended.

key | data type | description
order_id | str | Represents the order name

/processValue

Topic: ia/<customerID>/<location>/<AssetID>/processValue

A message is sent here every time a process value has been prepared. The key must be named uniquely.

key | data type | description
<valueName> | int or float | Represents a process value, e.g., temperature

/productImage

Topic structure: ia/<customer>/<location>/<assetID>/productImage

/productImage has the same data format as ia/rawImage, only with a changed topic.

/productImage can be acquired in two ways: either from ia/rawImage or from /rawImageClassification. In the case of /rawImageClassification, only the image part is extracted to /productImage, while the classification information is stored in the relational database.

key | data type | description
image_id | str | a unique identifier for every image acquired (e.g. format: <MACaddress>_<timestamp_ms>)
image_bytes | str | base64-encoded image in JPG format in bytes
image_height | int | height of the image in pixels
image_width | int | width of the image in pixels
image_channels | int | amount of included color channels (Mono: 1, RGB: 3)

/productTag

Topic structure: ia/<customer>/<location>/<assetID>/productTag

/productTag is usually generated by contextualizing a processValue to a product.

key | data type | description
AID | str | the alternative ID (AID) of the product the value belongs to (see Explanation of IDs)
name | str | the name of the value, e.g., torque
value | int | the value to be attached to the product

See also Digital Shadow for more information on how to use this message

/productTagString

Topic structure: ia/<customer>/<location>/<assetID>/productTagString

/productTagString is usually generated by contextualizing a processValue to a product.

key | data type | description
AID | str | the alternative ID (AID) of the product the value belongs to (see Explanation of IDs)
name | str | the name of the value
value | str | the string value to be attached to the product

See also Digital Shadow for more information on how to use this message

/addParentToChild

Topic structure: ia/<customer>/<location>/<assetID>/addParentToChild

/addParentToChild is usually generated whenever a product is transformed into another product. It can be used multiple times for the same child to model that one product can consist of multiple parents.

key | data type | description
childAID | str | The AID of the child
parentAID | str | The AID of the parent

See also Digital Shadow for more information on how to use this message

3rd level: production data

This level contains only highly aggregated production data.

/state

Topic: ia/<customerID>/<location>/<AssetID>/state

A message is sent here each time the asset changes status. Subsequent changes are not possible. Different statuses can also be process steps, such as “setup”, “post-processing”, etc. You can find a list of all supported states here

key | data type | description
state | int | Value of the state according to this datamodel

/cycleTimeTrigger

Topic: ia/<customerID>/<location>/<AssetID>/cycleTimeTrigger

A message should be sent under this topic whenever an assembly cycle is started.

key | data type
currentStation | str
lastStation | str
sanityTime_in_s | int

/uniqueProduct

Topic: ia/<customerID>/<location>/<AssetID>/uniqueProduct

A message is sent here each time a product has been produced or modified. A modification can take place, for example, due to a downstream quality control.

There are two cases of when to send a message under the uniqueProduct topic:

  • The exact product doesn’t already have a UID (this is the case if it has not been produced at an asset incorporated in the digital shadow). Specify a placeholder asset = “storage” in the MQTT message for the uniqueProduct topic.
  • The product was produced at the current asset (it is now different from before, e.g. after machining or after something was screwed in). The newly produced product is always the “child” of the process. Products it was made out of are called the “parents”.

key | data type | description
begin_timestamp_ms | int | Start time
end_timestamp_ms | int | Completion time
product_id | str | The product ID that is currently produced
isScrap | bool | Information whether the current product is of poor quality and will be sorted out. Default value (if not specified otherwise) is false
uniqueProductAlternativeID | str | the alternative ID (AID) of the product

See also Digital Shadow for more information on how to use this message

/scrapUniqueProduct

Topic: ia/<customerID>/<location>/<AssetID>/scrapUniqueProduct

A message is sent here each time a unique product has been scrapped.

key | data type | description
UID | str | Unique ID of the current single product

4th level: Recommendations for action

/recommendations

Topic: ia/<customerID>/<location>/<AssetID>/recommendations

Shopfloor insights are recommendations for action that require concrete and rapid action in order to quickly eliminate efficiency losses on the shop floor.

key | data type/format | description
recommendationUID | int | Unique ID of the recommendation. Used to subsequently deactivate a recommendation (e.g., if it has become obsolete)
recommendationType | int | The ID / category of the current recommendation. Used to narrow down the group of people
recommendationValues | dict | Values used to form the actual recommendation set

Explanation of IDs

There are various different IDs that you can find in the MQTT messages. This section is designed to give an overview.

Name | data type/format | description
product_id | int | The type of product that should be produced in an order for a specific asset. Can be used to retrieve the target speed.
order_id | int | Order ID, which provides the type of product (see product_id) and the amount of pieces that should be produced.
uniqueProductAlternativeID | int | In short: AID. Used to describe a single product of the type product_id in the order order_id. This is the ID that might be written on the product (e.g., with a physical label, lasered, etc.) and is usually the relevant ID for engineers and for production planning. It usually stays the same.
UID | int | Short for unique product ID. Compared to the AID, the UID changes whenever a product changes its state. Therefore, a product will change its UID every time it is placed on a new asset. It is used mainly on the database side to look up a specific product in a specific state.

These IDs are linked together in the database.

TimescaleDB structure

Here is a scheme of the timescaleDB structure:

(right-click the image and open it for a better resolution)

3 - Available states for assets

This data model maps various machine states to relevant OEE buckets.

Introduction

This data model is based on the following specifications:

  • Weihenstephaner Standards 09.01 (for filling)
  • Omron PackML (for packaging/filling)
  • EUROMAP 84.1 (for plastic)
  • OPC 30060 (for tobacco machines)
  • VDMA 40502 (for CNC machines)

Additionally, the following literature is respected:

  • Steigerung der Anlagenproduktivität durch OEE-Management (Focke, Steinbeck)

Abbreviations

  • WS –> “TAG NAME”: Valuename (number)
  • PackML –> Statename (number)
  • EUROMAP –> Statusname (number)
  • Tobacco –> ControlModeName (number)

ACTIVE (10000-29999)

The asset is actively producing.

10000: ProducingAtFullSpeedState

The asset is running on full speed.

Examples for ProducingAtFullSpeedState

  • WS_Cur_State: Operating
  • PackML/Tobacco: Execute

20000: ProducingAtLowerThanFullSpeedState

The asset is NOT running on full speed.

Examples for ProducingAtLowerThanFullSpeedState

  • WS_Cur_Prog: StartUp
  • WS_Cur_Prog: RunDown
  • WS_Cur_State: Stopping
  • PackML/Tobacco: Stopping
  • WS_Cur_State: Aborting
  • PackML/Tobacco: Aborting
  • WS_Cur_State: Holding
  • WS_Cur_State: Unholding
  • PackML/Tobacco: Unholding
  • WS_Cur_State: Suspending
  • PackML/Tobacco: Suspending
  • WS_Cur_State: Unsuspending
  • PackML/Tobacco: Unsuspending
  • PackML/Tobacco: Completing
  • WS_Cur_Prog: Production
  • EUROMAP: MANUAL_RUN
  • EUROMAP: CONTROLLED_RUN

NOT INCLUDED FOR NOW:

  • WS_Prog_Step: all

UNKNOWN (30000-59999)

The asset is in an unspecified state.

30000: UnknownState

We do not have any data for that asset (e.g. connection to PLC aborted).

Examples for UnknownState

  • WS_Cur_Prog: Undefined
  • EUROMAP: Offline

40000: UnspecifiedStopState

The asset is not producing, but we do not know why (yet).

Examples for UnspecifiedStopState

  • WS_Cur_State: Clearing
  • PackML/Tobacco: Clearing
  • WS_Cur_State: Emergency Stop
  • WS_Cur_State: Resetting
  • PackML/Tobacco: Clearing
  • WS_Cur_State: Held
  • EUROMAP: Idle
  • Tobacco: Other
  • WS_Cur_State: Stopped
  • PackML/Tobacco: Stopped
  • WS_Cur_State: Starting
  • PackML/Tobacco: Starting
  • WS_Cur_State: Prepared
  • WS_Cur_State: Idle
  • PackML/Tobacco: Idle
  • PackML/Tobacco: Complete
  • EUROMAP: READY_TO_RUN

50000: MicrostopState

The asset is not producing for a short period (typically around 5 minutes), but we do not know why (yet).

MATERIAL (60000-99999)

The asset has issues with materials.

60000: InletJamState

The machine does not perform its intended function due to a lack of material flow in the infeed of the machine detected by the sensor system of the control system (machine stop). In the case of machines that have several inlets, the condition of lack in the inlet refers to the main flow, i.e. to the material (crate, bottle) that is fed in the direction of the filling machine (central machine). The defect in the infeed is an extraneous defect, but because of its importance for visualization and technical reporting, it is recorded separately.

Examples for InletJamState

  • WS_Cur_State: Lack

70000: OutletJamState

The machine does not perform its intended function as a result of a jam in the good flow discharge of the machine detected by the sensor system of the control system (machine stop). In the case of machines that have several discharges, the jam in the discharge condition refers to the main flow, i.e. to the good (crate, bottle) that is fed in the direction of the filling machine (central machine) or is fed away from the filling machine. The jam in the outfeed is an external fault, but it is recorded separately because of its importance for visualization and technical reporting.

Examples for OutletJamState

  • WS_Cur_State: Tailback

80000: CongestionBypassState

The machine does not perform its intended function due to a shortage in the bypass supply or a jam in the bypass discharge of the machine detected by the sensor system of the control system (machine stop). This condition can only occur in machines that have two outlets or inlets and in which the bypass is in turn the inlet or outlet of an upstream or downstream machine of the filling line (packaging and palletizing machines). The jam/shortage in the auxiliary flow is an external fault, but is recorded separately due to its importance for visualization and technical reporting.

Examples for CongestionBypassState

  • WS_Cur_State: Lack/Tailback Branch Line

90000: MaterialIssueOtherState

The asset has a material issue, but it is not further specified.

Examples for MaterialIssueOtherState

  • WS_Mat_Ready (Information about which material is lacking)
  • PackML/Tobacco: Suspended

PROCESS (100000-139999)

The asset is in a stop that belongs to the process and cannot be avoided.

100000: ChangeoverState

The asset is in a changeover process between products.

Examples for ChangeoverState

  • WS_Cur_Prog: Program-Changeover
  • Tobacco: CHANGE OVER

110000: CleaningState

The asset is currently in a cleaning process.

Examples for CleaningState

  • WS_Cur_Prog: Program-Cleaning
  • Tobacco: CLEAN

120000: EmptyingState

The asset is currently being emptied, e.g., to prevent mold in food products over long breaks like the weekend.

Examples for EmptyingState

  • Tobacco: EMPTY OUT

130000: SettingUpState

The machine is currently preparing itself for production, e.g. heating up.

Examples for SettingUpState

  • EUROMAP: PREPARING

OPERATOR (140000-159999)

The asset is stopped because of the operator.

140000: OperatorNotAtMachineState

The operator is not at the machine.

150000: OperatorBreakState

The operator is on a break. Note: this is different from a planned break between shifts, as it can count toward performance losses.

Examples for OperatorBreakState

  • WS_Cur_Prog: Program-Break

PLANNING (160000-179999)

The asset is stopped as it is planned to stop (planned idle time).

160000: NoShiftState

There is no shift planned at that asset.

170000: NoOrderState

There is no order planned at that asset.

TECHNICAL (180000-229999)

The asset has a technical issue.

180000: EquipmentFailureState

The asset itself is defective, e.g., a broken engine.

Examples for EquipmentFailureState

  • WS_Cur_State: Equipment Failure

190000: ExternalFailureState

There is an external failure, e.g., missing compressed air.

Examples for ExternalFailureState

  • WS_Cur_State: External Failure

200000: ExternalInterferenceState

There is an external interference, e.g. the crane to move the material is currently unavailable.

210000: PreventiveMaintenanceStop

A planned maintenance action.

Examples for PreventiveMaintenanceStop

  • WS_Cur_Prog: Program-Maintenance
  • PackML: Maintenance
  • EUROMAP: MAINTENANCE
  • Tobacco: MAINTENANCE

220000: TechnicalOtherStop

The asset has a technical issue, but it is not specified further.

Examples for TechnicalOtherStop

  • WS_Not_Of_Fail_Code
  • PackML: Held
  • EUROMAP: MALFUNCTION
  • Tobacco: MANUAL
  • Tobacco: SET UP
  • Tobacco: REMOTE SERVICE

4 - Digital Shadow - track and trace

A system of features allowing tracking and tracing of individual parts through the production process. This article explains how it can be applied and how it works.

Introduction

During the production process a lot of data is generated (e.g., process values like temperature or images), but typically not linked to specific products. When investigating product returns, one needs to gather data from various data sources to understand what exactly happened, which might take so much time that it is not done at all.

Life would be much easier for a quality inspector if they could enter the ID of a defective product and then receive all information related to it - from temperatures during production to test results to product images.

Solution: We’ve expanded the United Manufacturing Hub so that the end-user only needs to do three things:

  1. Connect data sources with MQTT, for example by leveraging barcodereader and sensorconnect
  2. Process this raw data with Node-RED and send MQTT messages according to the UMH specification
  3. Access the processed data either in a BI tool or using Grafana

To allow this to happen, the backend has been modified: mqtt-to-postgresql supports multiple new MQTT message types, and factoryinsight provides more endpoints to fetch digital-shadow-related data from the database.

Dataflow

Concepts of digital shadow in the United Manufacturing Hub

(right click on the image and open it for a better resolution)

This is the overview of the digital shadow concept. It follows the general design principles of the United Manufacturing Hub by sending all raw sensor data first to an MQTT broker and then continuously processing it.

The following chapters are going through the concept from left to right (from the inputs of the digital shadow to the outputs).

Step 1: Gathering data from the shopfloor

Data sources are connected by sending their data to the central MQTT broker. UMH recommends sticking to the data definition of the UMH datamodel for the topics and messages, but the implementation is client-specific and can be modeled for the individual problem.

Example

This example is acquired using barcodereader

Topic: ia/rawBarcode/2020-0102/210-156
Topic structure: ia/rawBarcode/<transmitterID>/<barcodeReaderID>

{
    "timestamp_ms": 1588879689394, 
    "barcode": "1284ABCtestbarcode"
}

Step 2: contextualizing the data using microservices and MQTT

The information is now available at the MQTT broker, and therefore to all subscribed services.

In the next step this raw data is contextualized, which means linking it to specific products. To identify a product, two different types of IDs are used: AIDs and UIDs (the identifiers are explained in detail later).

The raw data from the data sources needs to be converted into four different MQTT message types: uniqueProduct, productTag, productTagString, and addParentToChild.

To do that, we recommend writing microservices. You can do that either in Node-RED (our recommendation) or in a programming language of your choice. These microservices first convert messages under a raw topic into messages under processValue or processValueString.

This typically only requires resending the message under the appropriate topic or breaking messages with multiple values apart into single ones.
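
A minimal sketch of such a splitting step, assuming a hypothetical raw topic and payload:

import json
import paho.mqtt.client as mqtt

client = mqtt.Client()

def on_message(c, userdata, msg):
    # e.g. {"timestamp_ms": 1588879689394, "temperature": 35.4, "pressure": 2.1}
    raw = json.loads(msg.payload)
    ts = raw.pop("timestamp_ms")
    for name, value in raw.items():  # one processValue message per value
        c.publish(
            "ia/testcustomer/testlocation/testasset/processValue",
            json.dumps({"timestamp_ms": ts, name: value}),
        )

client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("ia/raw/testasset")  # placeholder raw topic
client.loop_forever()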

Generating the contextualized messages

The goal is to convert messages under the processValue and processValueString topics, containing all relevant data, into messages under the topics productTag, productTagString, and addParentToChild. The latter messages contain AIDs, which hold the contextualization information - they are tied to a single product.

The implementation of the generation of the above-mentioned messages with contextualized information is up to the user and depends heavily on the specific process. To help with this, we present a general logic and discuss its advantages and disadvantages:

General steps:

  1. Create empty containers for the predefined messages to mqtt-to-postgresql when the first production step takes place.
  2. Fill the containers step by step as relevant messages come in.
  3. If a container is full, send it.
  4. If the message from the first production step for the next product is received before the container is full, send the container with the missing fields set to null, and also send an error message.

Example process:

  1. Parent AID 1 scanned (specifically the AID explained later) -> barcode sent under the processValueString topic
  2. Screws fixed -> torque processValue sent
  3. Child AID scanned -> barcode processValueString sent
  4. Parent AID 2 scanned -> barcode processValueString sent

Example of generating a message under the productTagString topic containing the measured torque value for the example process:

  • when parent AID scanned: make empty container for message because scanning parent AID is first step
{
    "timestamp_ms": 
    "AID":
    "name": "torque",
    "value":
}
  • when torque value comes in: fill in value and timestamp
{
    "timestamp_ms": 13498435234,
    "AID":
    "name": "torque",
    "value": 1.458
}
  • when child AID comes in: fill it in:
{
    "timestamp_ms": 13498435234,
    "AID": "34258349857",
    "name": "torque",
    "value": 1.458
}

Now the container is full: send it away.
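
A minimal sketch of this container pattern, following the walkthrough above (broker and topic are placeholders; as noted below, the approach is stateful):

import json
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883)

container = {"timestamp_ms": None, "AID": None, "name": "torque", "value": None}

def try_send():
    # Publish only once every field has been filled in.
    if all(v is not None for v in container.values()):
        client.publish(
            "ia/testcustomer/testlocation/Assy1/productTagString",
            json.dumps(container),
        )

# Step 2 of the example process: the torque value arrives.
container["timestamp_ms"], container["value"] = 13498435234, 1.458
try_send()  # not sent yet, the AID is still missing

# Step 3: the child AID is scanned.
container["AID"] = "34258349857"
try_send()  # the container is complete, the message goes out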

Important: always send the uniqueProduct message first, and only afterwards the related productTag/productTagString messages and the messages on the addParentToChild topic.

Advantages and disadvantages of presented process

Pro | Con
simple | not stateless
good general usability | might need a lot of different containers if the number of e.g. productTag messages gets too big

Identifiers

The explanation of the IDs can be found in the UMH datamodel (especially the definition of the terms AID and UID).

Definition of when to change the UID

If we can move a product from point “A” in the production to point “B” or back without causing problems from a process perspective, the UID of the product should stay the same (for example, if the product only gets transported between points “A” and “B”).

If moving the object causes problems (e.g., moving a not-yet-tested object into the bin for “tested products”), the object should have received a new UID along its regular way.

Example 1: Testing

Even though testing a product doesn’t change the part itself, it changes its state in the production process:

  • it gets something like a virtual “certificate”
  • the value increases because of that

-> Make a new UID.

Example 2: Transport

Monitored Transport from China to Germany (This would be a significant distance: transport data would be useful to include into digital shadow)

  • parts value increases
  • transport is separately paid
  • not easy to revert

-> Make a new UID

Life of a single UID
Type | creation UID | death UID
without inheritance at creation | topic: storage/uniqueProduct | /addParentToChild (UID is parent)
with inheritance at creation | topic: <asset>/uniqueProduct + addParentToChild (UID is child) | /addParentToChild (UID is parent)

MQTT messages under the productTag topic should not be used to indicate transport of a part. If transport is relevant, change the UID (-> send a new MQTT message to mqtt-to-postgresql under the uniqueProduct topic).

Example process to show the usage of AIDs and UIDs in production:
Explanation of the diagram:

Assembly Station 1:

  • ProductA and ProductB are combined into ProductC
  • Because ProductA and ProductB have not been “seen” by the digital shadow, they get a new UID and asset = “storage” assigned (placeholder asset for unknown/unspecified origin).
  • After ProductC is produced, it gets a new UID and Assy1 as its asset, because it is the child produced at Assembly Station 1
  • The AID of the child can always be freely chosen from the parent AIDs. The AID of ProductA (“A”) is a physical label. Because ProductB doesn’t have a physical label, it gets a generated AID. For ProductC (child) we can now choose either the AID of ProductA or of ProductB. Because “A” is a physical label, it makes sense to use the AID of ProductA.

MQTT messages to send at Assembly 1:

  • uniqueProduct message for ProductA origin, with asset = storage, under the topic: ia/testcustomer/testlocation/storage/uniqueProduct

    {
      "begin_timestamp_ms": 1611171012717,
      "end_timestamp_ms": 1611171016443,
      "product_id": "test123",
      "is_scrap": false,
      "uniqueProductAlternativeID": "A"
    }
    
  • uniqueProduct message for ProductB origin, with asset = storage, under the topic: ia/testcustomer/testlocation/storage/uniqueProduct

    {
      "begin_timestamp_ms": 1611171012717,
      "end_timestamp_ms": 1611171016443,
      "product_id": "test124",
      "is_scrap": false,
      "uniqueProductAlternativeID": "B"
    }
    
  • uniqueProduct message for ProductC, with asset = Assy1, under the topic: ia/testcustomer/testlocation/Assy1/uniqueProduct

    {
      "begin_timestamp_ms": 1611171012717,
      "end_timestamp_ms": 1611171016443,
      "product_id": "test125",
      "is_scrap": false,
      "uniqueProductAlternativeID": "A"
    }
    
  • addParentToChild message describing the inheritance from ProductA to ProductC, under the topic: ia/testcustomer/testlocation/Assy1/addParentToChild

    {
    "timestamp_ms": 124387,
    "childAID": "A",
    "parentAID": "A"
    }
    
  • addParentToChild message describing the inheritance from ProductB to ProductC, under the topic: ia/testcustomer/testlocation/Assy1/addParentToChild

    {
    "timestamp_ms": 124387,
    "childAID": "A",
    "parentAID": "B"
    }
    
  • productTag message for e.g. a measured process value like the temperature,under the topic: ia/testcustomer/testlocation/Assy1/productTag

    {
    "timestamp_ms": 1243204549,
    "AID": "A",
    "name": "temperature",
    "value": 35.4
    }
    

Now ProductC is transported to Assembly Station 2. Because it is a short transport that doesn’t add value etc., we do not need to produce a new UID for ProductC after the transport.

Assembly Station 2:

  • ProductC stays the same (in the sense that it is keeping its UID before and after the transport), because of the easy transport.
  • ProductD is new and not produced at assembly station 2, so it gets asset = “storage” assigned
  • ProductC and ProductD are combined into ProductE. ProductE gets a new UID. Both AIDs are physical. We again freely choose the AID we want to use (AID C was chosen, maybe because after the assembly of ProductC and ProductD, the AID label on ProductD is not accessible while the AID label on ProductC is).

Assembly Station 3:

  • At Assembly Station 3, ProductE comes in and is turned into ProductF
  • ProductF gets a new UID and keeps the AID of ProductE. It now gets Assy3 assigned as its asset.

Note that the uniqueProduct MQTT message for ProductD would not be under the topic of Assembly2 as the asset, but, for example, under storage. The convention is that every part never seen by the digital shadow “comes” from storage, even though the UID and the related uniqueProduct message are created at the current station.

Batches of parts

If, for example, a batch of screws is supplied to an asset with only one data matrix code (one AID) for all screws together, only one MQTT message will be created under the topic uniqueProduct for the batch, with one AID, a newly generated UID, and the default supply asset storage.

  • The batch AID is then used as the parent in MQTT messages under the topic addParentToChild (mqtt-to-postgresql will repeatedly fetch the same parent UID for the inheritance table).
  • The batch AID only changes when a new batch AID is scanned.

Step 3: mqtt-to-postgresql

The mqtt-to-postgresql microservice takes the MQTT messages it gets from the broker and writes the information into the database. The microservice is not use-case specific, so the user just needs to send it the correct MQTT messages.

mqtt-to-postgresql needs to generate UIDs and save the information in the database, because the database uses UIDs to store and link all the generated data efficiently. Remember that the incoming MQTT messages are contextualized with AIDs.

Regarding the digital shadow, we can divide the tasks of mqtt-to-postgresql into three parts:

  1. Use the MQTT messages under the topic uniqueProduct, which give us the AID and the asset, and make an entry in the uniqueProductTable containing the AID and a newly generated UID.

    1. Generate a UID (with Snowflake: https://en.wikipedia.org/wiki/Snowflake_ID)
    2. Store the new UID and all data from the uniqueProduct MQTT message in the uniqueProductTable
  2. Use the MQTT messages under the productTag and productTagString topics. The AID and the asset ID are used to look up the uniqueProduct the messages belong to. The value information is then stored with the UID in TimescaleDB.

    1. Look in the uniqueProductTable in TimescaleDB for the uniqueProduct with the same asset and AID as in the productTag message (the child)
    2. When found, get the UID of the child (that is why it is important to send the uniqueProduct message before sending productTag/productTagString)
    3. Write the value information, without the AID but with the found UID, into the productTagTable or productTagStringTable
  3. Use the addParentToChild message. Retrieve the child UID by using the child AID and the asset. Get the parent UIDs by finding the last time the parents’ AIDs were stored in the uniqueProductTable.

    1. Look in the uniqueProductTable in TimescaleDB for the uniqueProduct with the same asset and AID as the child of the addParentToChild message
    2. Look in the uniqueProductTable across all other assets for the last time the AID of each parent was used, and get the UID
    3. Write the UID of the child and the UID of the parent into the productInheritanceTable
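
As a hedged sketch of the lookup in step 2, with table and column names assumed from the description above (the real schema is shown in the TimescaleDB structure):

import psycopg2

def get_child_uid(conn, asset_id: int, aid: str):
    # Find the most recent uniqueProduct for this asset and AID (the child).
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT uniqueProductID
            FROM uniqueProductTable
            WHERE asset_id = %s AND uniqueProductAlternativeID = %s
            ORDER BY begin_timestamp_ms DESC
            LIMIT 1
            """,
            (asset_id, aid),
        )
        row = cur.fetchone()
        return row[0] if row else None

conn = psycopg2.connect("dbname=factoryinsight user=postgres")  # placeholder DSN
print(get_child_uid(conn, 1, "34258349857"))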


Step 4: Database and the database model

The structure of the database might be changed in the future.

Four tables are especially relevant:

  • uniqueProductTable contains entries with a pair of one UID and one AID, plus other data.
  • productTagTable and productTagStringTable store information referenced to the UIDs in the uniqueProductTable. Everything from individual measurements to quality classes is stored here.
  • productInheritanceTable contains pairs of child and parent UIDs. The table as a whole thereby contains the complete inheritance information of each individual part. One entry describes one edge of the inheritance graph.

In the timescaleDB structure visualization, the new relevant tables are dotted and the uniqueProductTable changes are bold.

Step 5: factoryinsight

To make the relevant data from the digital shadow available, we need to provide new REST APIs. factoryinsight is the microservice doing that task. It accepts specific requests, accesses the TimescaleDB database, and returns the data in the desired format.

Implemented functionality for digital shadow

The following function returns all uniqueProducts for a specific asset in a specified time range. One datapoint contains one childUID, the AID, and all parentAIDs regarding the asset. All uniqueProductTags and uniqueProductTagStrings (value and timestamp) for the childUID are returned in the same datapoint.

get /{customer}/{location}/{asset}/uniqueProductsWithTags from <timestamp1> to <timestamp2> (in RFC 3339 format).
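
A hedged sketch of calling this endpoint (host, credentials, and the query parameter names are assumptions):

import requests

resp = requests.get(
    "https://factoryinsight.example.com/api/v1"
    "/testcustomer/testlocation/Assy1/uniqueProductsWithTags",
    params={  # RFC 3339 timestamps; the parameter names are assumed
        "from": "2021-08-24T00:00:00Z",
        "to": "2021-08-25T00:00:00Z",
    },
    auth=("user", "password"),  # placeholder credentials
    timeout=10,
)
resp.raise_for_status()
data = resp.json()
print(data["columnNames"], len(data["datapoints"]))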

Example Return with two data points:

{
  "columnNames":
  [
    "UID",
    "AID",
    "TimestampBegin",
    "TimestampEnd",
    "ProductID",
    "IsScrap",
    "torque2",
    "torque1",
    "torque3",
    "torque4",
    "VH_Type123",
    "Gasket_Type123"
  ],
  "datapoints":
  [
    [
      2,
      "57000458",
      1629807326485,
      null,
      15,
      false,
      5.694793469033914,
      5.500782656464146,
      5.868141105450906,
      5.780416969961664,
      "57000458",
      "12000459"
    ],
    [
      6,
      "57000459",
      1629807443961,
      null,
      15,
      false,
      5.835010327979067,
      5.9666619086350945,
      5.425482064635844,
      5.6943075975030535,
      "57000459",
      "12000460"
    ]
  ]
}

Implemented logic of factoryinsight to achieve the functionality

  1. Get all product UIDs and AIDs from the uniqueProductTable within the specified time range and from the specified asset.
  2. Get all parent UIDs from the productInheritanceTable for each of the selected UIDs.
  3. Get the AIDs for the parent UIDs from the uniqueProductTable.
  4. Get all key-value pairs from the productTagTable and productTagStringTable for the UIDs selected in step 1.
  5. Return all parent AIDs under the column names of the corresponding parent productIDs. Return the child AID and UID. Return the productTag and productTagString values under the column names of the corresponding valueNames.

Step 6: SQL Database to connect to Tableau server

This is currently not included in the stack

For the digital shadow functionality we need to give the Tableau server access to the data. Because the Tableau server can’t directly connect to the REST API, we need to either use a database in between or a Tableau web data connector. We were advised against the Tableau web data connector (general info about Tableau web data connectors: https://help.tableau.com/current/pro/desktop/en-us/examples_web_data_connector.htm).

Because of that, we implemented a SQL database in combination with Node-RED. Node-RED requests data from the REST API at regular intervals and pushes it into the SQL database. From there we can access the data with the Tableau server.
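
A hedged sketch of that polling job, expressed in Python instead of a Node-RED flow for brevity (endpoint, staging table, and credentials are assumptions):

import time
import requests
import psycopg2

conn = psycopg2.connect("dbname=tableau_buffer user=postgres")  # placeholder DSN

while True:
    resp = requests.get(
        "https://factoryinsight.example.com/api/v1"
        "/testcustomer/testlocation/Assy1/uniqueProductsWithTags",
        auth=("user", "password"),  # placeholder credentials
        timeout=10,
    )
    resp.raise_for_status()
    with conn.cursor() as cur:
        for row in resp.json()["datapoints"]:
            # Assumed staging table; only the first two columns (UID, AID)
            # are written here for brevity.
            cur.execute(
                "INSERT INTO unique_products_buffer (uid, aid) VALUES (%s, %s)",
                (row[0], row[1]),
            )
    conn.commit()
    time.sleep(300)  # poll every 5 minutes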

Industry Example

To test the digital shadow functionality and display its advantages we implemented the solution in a model factory.

This graphic displays the events and the resulting MQTT messages that mqtt-to-postgresql receives.

Long term: planned features

We plan to integrate further functionality into the digital shadow. Possible candidates are:

  • multiple new REST APIs to use the digital shadow more flexibly
  • a detailed performance analysis and subsequent optimization to enable the digital shadow for massive production speed and complexity
  • a buffer in the mqtt-to-postgresql microservice: if productTag/productTagString messages arrive before the corresponding uniqueProduct message has been written to the database, the tags should be stored until the uniqueProduct message arrives. A buffer could hold productTag/productTagString messages and regularly try to write them to the database.

5 - Integration into Azure

This article explains how the United Manufacturing Hub can be integrated into Microsoft Azure.

This section is currently in development. Published articles so far:

6 - Open source in Industrial IoT: an open and robust infrastructure instead of reinventing the wheel.

How we are keeping up with the established players in Industrial IoT and why we believe the United Manufacturing Hub is changing the future of Industrial IoT and Industry 4.0 with the help of Open Source.

Image author: Christopher Burns from Unsplash

How do we keep up with the big players in the industry despite limited resources and small market share? The best way to do this is to break new ground and draw on the collective experience of organizations and their specialists instead of trying to reinvent the wheel.

The collaborative nature of open source enables companies and individuals alike to turn their visions into reality and keep up with established players such as Siemens, Microsoft, and Rockwell, even without a large number of programmers and engineers. This is the path we are taking at United Manufacturing Hub.

Open source software has long since outgrown the insider stage and has become a veritable trend that is becoming the standard in more and more industries. Many applications that are common and intensively used in the IT world (e.g., Kubernetes, TensorFlow, f-prime by NASA 1) emerged from a collaborative approach and are available for free.

Open source on Mars: the Mars helicopter Ingenuity relies heavily on open-source components like f-prime. Image author: JPL/NASA

Typically, these applications are not aimed at production or Industry 4.0 use out of the box. Some, such as Grafana, are intended for completely different industries (observability & monitoring).

However, the source code of these software projects is freely accessible to everyone and can be individually adapted to specific needs. Thus, applying them in the Industrial IoT is no problem either. Some of these applications have been programmed over decades 2 by several thousand developers and are continuously developed further 3.

The status quo

Today, it is common to develop proprietary Industrial IoT programs and software platforms - the opposite of open source.

One reason is that companies do not want foreign code in their applications, and they want to offer the customer a self-made, end-to-end solution.

It is common for a team of 20 or even 30 people to be assigned to develop a dashboard or IoT gateway, with the focus on a pretty-looking (usually self-branded) user interface (UI) and design. Existing open-source solutions or automation standards are rarely built upon.

Self-developed, in-house architectures are often strongly influenced by company-specific know-how and therefore usually also favor the company’s own products and services in their interfaces.

The result: the wheel is often reinvented in both the software and hardware areas. The resulting architectures create a lock-in effect that leads to a dependency of the manufacturing companies on their software and hardware suppliers.

Reinventing the wheel: The software world

In our opinion, good examples in the category “reinvented the wheel” from the software world are:

  1. Self-developed visualizations such as visualizations from InfluxDB, PI Vision from OSIsoft or WAGO IoT Cloud Visualization (instead of Grafana).

  2. Flow-based low-code / no-code apps such as Wire Graph by Eurotech (instead of Node-RED)

  3. The bulk of Industrial IoT platforms that are claiming to be a “one-stop solution.” Such platforms are trying to cover every aspect from data acquisition, over processing, to visualization with in-house solutions (instead of relying on established technologies and just filling the gaps in the stack).

Both Grafana and Node-RED are highly professional solutions in their respective fields, which have been used in various software projects for several years. Orchestrating such specialized applications means that proven and tested solutions can be put to good use.

Reinventing the wheel: The hardware world

There are numerous examples in the Industrial IoT hardware world where there is a conscious or unconscious deviation from established industry standards of the automation industry.

We have particularly noticed this with vendors in the field of Overall Equipment Effectiveness (OEE) and production overviews. Although they usually have very good dashboards, they still rely on self-developed microcontrollers combined with consumer tablets (instead of established automation standards such as a PLC or an industrial edge PC) for the hardware. In this case, the microcontroller, usually called IoT gateway, is considered a black box, and the end customer only gets access to the device in rare cases.

The advantages cannot be denied:

  1. the system is easy to use,
  2. usually very inexpensive,
  3. and requires little prior knowledge.

Unfortunately, these same advantages can also become disadvantages:

  1. the in-house system integrator and regular suppliers are not able to work with the system, as it has been greatly reduced for simplicity.
  2. all software extensions and emerging problems, such as integrating software like an ERP system with the rest of the IT landscape, must be discussed with the respective supplier. This creates a one-sided market power (see also Lock-In).

Another problem arises when deviating from established automation standards: a lack of reliability.

Normally, the system always needs to work, because failures lead to production downtime (the operator must report the problem). The machine operator just wants to press a button to get a stop reason or the desired information. He does not want to deal with WLAN problems, browser updates, or updated privacy policies on the consumer tablet.

The strongest argument: Lock-In

In a newly emerging market, it is especially important for a manufacturing company not to make itself dependent on individual providers - not only to stay independent if a product or company is discontinued, but also to be able to change providers at any time.

Particularly pure SaaS (Software-as-a-Service) providers should be handled with caution:

  • A SaaS offering typically uses a centralized cloud-based server infrastructure for multiple customers simultaneously. By its very nature, this makes it difficult to integrate into the IT landscape, e.g., to link it with the MES system installed locally in the factory.
  • In addition, a change of provider is practically only possible with large-scale reconfiguration/redevelopment.
  • Lastly, there are concerns regarding data ownership and the security of closed systems and many SaaS offerings.

Basically, and exaggerating slightly to make the point: it is important to prevent highly sensitive production data with protected process parameters from ending up with foreign competitors.

One might think that the manufacturing company is initially entitled to all rights to the data - after all, it is the company that “produced” the data.

In fact, according to the current legal situation, at least in Germany, there is no comprehensive legal protection of the data if this is not explicitly regulated by contract, as the Verband der deutschen Maschinenbauer (VDMA) (Association of German Mechanical Engineering Companies) admits 4.

Even when it comes to data security, some people feel queasy about handing over their data to someone else, possibly even a US startup. Absolutely rightly so, says the VDMA, because companies based in the USA are obliged to allow US government authorities access to the data at any time 5.

An open source project can give a very good and satisfactory answer here:

United Manufacturing Hub users can always develop the product further without the original developers, as the source code is fully open and documented.

All subcomponents are fully open and run on almost any infrastructure, from the cloud to a Raspberry Pi, always giving the manufacturing company control over all its data.

Interfaces with other systems are either included directly, which greatly simplifies their development, or can be retrofitted without being tied to specific programming languages.

Unused potential

In the age of Industry 4.0, the top priority is for companies to operate as efficiently as possible by taking full advantage of their potential.

Open source software, unlike classic proprietary software, enables this potential to be fully exploited. Resources and hundreds of man-hours can be saved by using free solutions and standards from the automation industry.

Developing and offering a proprietary dashboard or IoT gateway that is reliable, stable, and free of bugs wastes valuable time.

Another hundred, if not a thousand, man-hours are needed until all relevant features such as single sign-on, user management, or logging are implemented. Thus, it is not uncommon that even large companies, the market leaders in the industry, do not operate efficiently, and the resulting products end up in the 6-to-7-digit price range.

But the efficiency goes even further:

Open source solutions also benefit from the fact that a community is available to help with questions. This service is rarely available with proprietary solutions, where all questions and problems must be discussed with a multi-level support hotline instead of simply Googling the solution.

And so, unfortunately, most companies take a path that is anything but efficient. But isn’t there a better way?

United Manufacturing Hub’s open source approach.

Who says that you have to follow thought patterns or processes that everyone else is modeling? Sometimes it’s a matter of leaving established paths, following your own convictions, and initiating a paradigm shift. That is the approach we are taking.

We cannot compete with the size and resources of the big players. That is why we do not even try to develop, in one or two years with a team of 20 to 30 programmers, what large companies have developed in hundreds of thousands of hours.

But that is not necessary, because the resulting product would be unlikely to keep up with the open source projects or established automation standards. That is why the duplicated work is not worth the struggle.

Open source code is freely accessible and thus allows maximum transparency and, at the same time, security. It offers a flexibility that is not reached by programs developed in the traditional way. By using open source software, the United Manufacturing Hub takes an efficient approach to development. It allows us to offer a product of at least equal value, but with considerably lower development costs.

Example OEE dashboard created in Grafana

Simplicity and efficiency in the age of Industrial IoT.

At United Manufacturing Hub, we combine open source technologies with industry-specific requirements. To do this, we draw on established software such as Docker, Kubernetes, or Helm 1 and create, for example, the data models, algorithms, and KPIs (e.g., the UMH data model, the factoryinsight and mqtt-to-postgresql components) that are needed in the respective industries.

By extracting all data from machine controls (OPC UA, etc.), we ensure the management and distribution of data on the shop floor. If additional data is needed, we offer individual solutions using industry-specific certified sensor retrofit kits, for example at a steel manufacturer. More on this in one of the later parts of this series.

Summary

Why should we reinvent the wheel when we can focus our expertise on the areas where we can provide the most value to our customers?

Leveraging open source solutions allows us to provide a stable and robust infrastructure that enables our customers to meet the challenges of the Industrial IoT.

Because, in fact, manufacturing and the Industrial IoT are not about developing new software at the drop of a hat. They are about solving individual problems and challenges. This is done by drawing on a global network of experts who have developed specialized applications in their respective fields. These applications allow all hardware and software components to be quickly and easily integrated into the overall architecture through a large number of interfaces.


  1. For the common technologies see also Understanding the technologies. ↩︎

  2. https://www.postgresql.org/docs/current/history.html ↩︎

  3. https://github.com/kubernetes/kubernetes ↩︎

  4. Leitfaden Datennutzung. Orientierungshilfe zur Vertragsgestaltung für den Mittelstand. Published by VDMA in 2019. ↩︎

  5. Digitale Marktabschottung: Auswirkungen von Protektionismus auf Industrie 4.0. Published by VDMA’s Impulse Foundation in 2019. ↩︎

7 - An introduction to certificates and secure communication in IoT for normal people

This article explains the two fundamental approaches to encrypting your messages with your IoT devices, from passwords (symmetric) to certificates (asymmetric).

Asymmetric vs symmetric encryption. Symmetric encryption uses the same key for encrypting as for decrypting. Asymmetric encryption uses different ones

Introduction

Newcomers to IT security are often confused about the difference between passwords and certificates. This article gives them an introduction to the topic of secure communication with IoT devices. The focus is on explaining the two fundamental methods of ensuring that nobody else can read your communication: symmetric encryption (e.g., passwords) and asymmetric encryption (e.g., certificates).

The need to ensure that nobody other than you and the recipient can read a message has existed for thousands of years, and it has always been a battle between people creating encryption methods and so-called code-breakers, who try to decipher those messages. Wars have been started, won, and lost because of a knowledge advantage gained through secure communications (see also ENIGMA or the Zimmermann Telegram).

In today’s world, the topic of secure communication is more relevant than ever: companies want to protect their intellectual property from other companies and want to protect themselves from ransomware attacks.

In IoT or Industrial IoT it is important as well:

  • How can I ensure that the data sent to the cloud from my microcontroller or PLC 1 is not read or modified?
  • How can I push updates onto my PLCs or IoT devices and prevent someone else from modifying them or pushing their own updates?
  • What are these certificates that are required for a connection to Azure IoT Hub, an MQTT broker 2, or any other API?

Let’s have fun and deep dive in!

Symmetric encryption

Let’s start with the simpler method of both approaches: symmetric encryption. In cryptography, the field of encryption, methods are always explained using Alice, Bob, and Mallory. All of them (there are way more, see also the Wikipedia article on Alice and Bob) are fictional characters. Alice usually initiates the communication and wants to send a message to Bob without any third party like Mallory being able to read it.

Alice, Bob, and Mallory

In IoT, Alice could be a Raspberry Pi with a temperature sensor that is trying to send a message to Bob, who is a service running in the cloud. In Industrial IoT, Alice could be a PLC and Bob an Industrial IoT platform.

Alice, Bob, and Mallory practically

Before Alice and Bob can communicate with each other using symmetric encryption, they need to do two things first:

  1. agree on the encryption method
  2. agree on the secret, also called a password

Let’s explain symmetric encryption using Alice and Bob and one of the oldest, yet most famous ciphers: Caesar’s cipher. This encryption technique is named after Julius Caesar, who used it for his private correspondence.

Before their first message, Alice and Bob need to agree to use this cipher and to pick a number between 1 and 26 (which is also called the secret).

When Alice wants to send a message to Bob, she needs to encrypt the plaintext using the cipher and the secret. For Caesar’s cipher, this works by shifting the alphabet x characters to the right (where x is the chosen number from above, the secret).

Caesar’s cipher

For example, a right shift of 3 results in the replacement of the letter A with D. Alice then sends the encrypted message to Bob. Bob reverts the encryption by shifting the alphabet x characters back to the left, so a D gets converted back into an A. Now Bob can read the plaintext.
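To make this concrete, here is a minimal sketch of Caesar’s cipher in Go. This is a toy illustration written for this article (not part of the United Manufacturing Hub code base), and of course not a cipher you should ever use in practice:

package main

import "fmt"

// shift rotates every letter of text by n positions through the alphabet,
// leaving all other characters (spaces, digits, ...) untouched.
func shift(text string, n int) string {
	out := []rune(text)
	for i, r := range out {
		switch {
		case r >= 'a' && r <= 'z':
			out[i] = 'a' + (r-'a'+rune(n)+26)%26
		case r >= 'A' && r <= 'Z':
			out[i] = 'A' + (r-'A'+rune(n)+26)%26
		}
	}
	return string(out)
}

func main() {
	secret := 3 // the number Alice and Bob agreed on beforehand

	encrypted := shift("ATTACK AT DAWN", secret) // Alice encrypts ...
	fmt.Println(encrypted)                       // "DWWDFN DW GDZQ"

	decrypted := shift(encrypted, -secret) // ... and Bob decrypts
	fmt.Println(decrypted)                 // "ATTACK AT DAWN"
}

Example Go code for Caesar’s cipher: encrypting shifts each letter to the right, decrypting shifts it back.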

Theoretically, a third party like Mallory could not decrypt the encrypted message because Mallory does not know the secret. You might ask now: “But hey, couldn’t Mallory just try out all numbers?”, and you would be right. Having a secret or password that can only be a number between 1 and 26 is considered unsafe, but please keep in mind that this was the technology in ancient Rome.

In the example above, we could easily crack the secret by trying out all possibilities, which is also called brute-forcing. To prevent that, more and more encryption methods emerged, and almost as many methods to crack them - from good old brute-forcing to deriving parts of the secret by already knowing certain parts of the plaintext (this is how the Allies cracked the German ENIGMA during World War 2).

Today, we have encryption methods like AES that are rooted in advanced mathematics and for which there are currently no shortcuts except brute-forcing. The keys usually have a fixed length, for example, AES-128 has a 128-bit key, and are derived from an easy-to-remember password using a key derivation function (KDF).

Brute-forcing such keys is not feasible with current or foreseeable hardware. The KDF is computationally intensive, which prevents brute-forcing of easy-to-remember passwords (as long as they are reasonably long), so this method is currently approved in the U.S. for government documents of the highest classification.
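To illustrate how these pieces fit together, here is a hedged Go sketch of modern symmetric encryption: an AES-128 key is derived from a password with PBKDF2 (one common key derivation function) and a message is encrypted with AES-GCM. The iteration count, salt handling, and message are illustrative assumptions, not UMH production code:

package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"fmt"

	"golang.org/x/crypto/pbkdf2"
)

func main() {
	password := []byte("AlexanderIs25YearsOldAndLikesToHaveHisDeskClean")

	// The salt is random and stored alongside the ciphertext; it is not secret.
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}

	// Derive a 128-bit AES key from the easy-to-remember password.
	// The (illustrative) iteration count makes brute-forcing expensive.
	key := pbkdf2.Key(password, salt, 600000, 16, sha256.New)

	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}

	// Alice encrypts; the nonce must never repeat for the same key.
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	ciphertext := gcm.Seal(nonce, nonce, []byte("secret sensor reading"), nil)

	// Bob derives the same key from the shared password and decrypts.
	plaintext, err := gcm.Open(nil, ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():], nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plaintext)) // "secret sensor reading"
}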

Small side-topic: to log into systems, for example websites, you need to enter a username and a password. In these cases, the password is not stored on the server; only a hash of it is stored, so even if the database of the website gets compromised, one cannot derive the password from the hash. If you are interested in this topic, we suggest reading the following article.

Brute-forcing can also be improved by not checking everything randomly, but by checking, e.g., the top 1000 most common passwords or words from a dictionary (= dictionary attack).

So how do you create good passwords? Several years ago, the best practice was to use short, randomly generated passwords, which should be changed frequently. But humans are humans, and most of them will just add a ! or 1 at the end of the password. So nowadays, the recommendation is to go for lengthy yet easy-to-remember passwords: if2s&sd5 is both less secure and way harder to remember than AlexanderIs25YearsOldAndLikesToHaveHisDeskClean.

Obligatory xkcd: https://xkcd.com/936/

But the approach itself - symmetric encryption with a secret / password - has two fundamental flaws:

  1. How can Alice and Bob exchange the secret without Mallory reading it? Meeting in person is nowadays not always an option.
  2. How can Bob know that the message is coming from Alice and that nothing was changed or deleted by Mallory?

For this, some very smart people invented the approach of Public-Key-Encryption, also known as asymmetric encryption.

Asymmetric encryption

In 1977, Ronald Rivest, Adi Shamir, and Leonard Adleman published an alternative method, which they named after the first letters of their surnames: RSA 3.

Little fun fact: a similar method had already been developed at the British agency GCHQ in 1970, but it was classified at the time and not published until 1997 4.

Public-Key-Encryption

How does it work? For normal people, here is a simple explanation from Computerphile on YouTube.

For the video haters out there, here is a small summary in text form:

Instead of using the same key for encrypting and decrypting, you use a mathematical scheme that gives you two separate keys that belong together, but where one cannot be derived from the other.

In practice, you have one key, the public key, that everyone knows and which is used to encrypt a message. To decrypt the message, you have to use the other key, the so-called private key. When Alice generates these keys, she keeps the private key for herself and publishes the public key to everyone.

She can also do the reverse: she can encrypt a message with her private key, so everyone will be able to decrypt it with her public key. Therefore, Bob can be sure that the message is coming from Alice (because only she has the private key and could have encrypted it like that).
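Here is a small Go sketch of both directions, using the standard library’s RSA implementation. It is for illustration only; in practice you would rarely call these primitives directly, as protocols like TLS (discussed below) do this for you:

package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

func main() {
	// Alice generates her key pair and publishes only the public key.
	aliceKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}
	alicePub := &aliceKey.PublicKey

	// Direction 1: Bob encrypts with Alice's public key; only Alice can decrypt.
	ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, alicePub, []byte("hello Alice"), nil)
	if err != nil {
		panic(err)
	}
	plaintext, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, aliceKey, ciphertext, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plaintext)) // "hello Alice"

	// Direction 2: Alice signs with her private key; everyone can verify
	// with her public key, so Bob knows the message really came from Alice.
	digest := sha256.Sum256([]byte("message from Alice"))
	signature, err := rsa.SignPSS(rand.Reader, aliceKey, crypto.SHA256, digest[:], nil)
	if err != nil {
		panic(err)
	}
	err = rsa.VerifyPSS(alicePub, crypto.SHA256, digest[:], signature, nil)
	fmt.Println("signature valid:", err == nil)
}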

If you are interested in the maths behind it, we recommend taking a look at the original paper. Or take a look at this video tutorial

The following text assumes that you have watched the video above (or read the small summary).

These public keys and private keys that Computerphile is talking about are what you would call “certificates”. They are usually stored as separate files in many different formats and with different file extensions. Here are some common ones for IoT:

  • .pub, .pem, .crt for public keys
  • .key or no file type at all for private keys
  • .csr contains no keys at all, but is related (see also Certificate Signing Requests)
  • .pfx, .p12, are other formats

What’s most important is the content of the file, so very often you can just change the file extension and load it in (most are PEM encoded, which looks like this). Sometimes the content differs (e.g., .pfx files or files in OpenSSH format are different from the usual PEM-encoded .pem files), so you need conversion tools for that. If you are familiar with Google, it should be easy to find the correct tools and commands for that :)

Public Key Infrastructure (PKI)

But how can Alice ensure that the public key is actually from Bob and not from Mallory? What Computerphile did not talk about are Certificate Authorities (CAs) and intermediaries.

A CA is basically just a public-key pair that acts as a trusted instance. It can prove that a public key actually belongs to the person, computer, or website it claims to belong to. It does that by encrypting the public key of Bob with the CA’s private key (= signing). These CAs are mostly preinstalled on your operating system and help you identify whether your bank login information is actually sent to the bank and not to someone else.

In some instances, it might make sense to create your own Public Key Infrastructure (PKI). You can find a guide on how to do that for example for MQTT brokers in our documentation

How it works in IoT

Practically, symmetric and asymmetric encryption are used in parallel and combined with more advanced security topics like, for example, the Diffie-Hellman key exchange or hashing. For more information on these advanced topics, we recommend the YouTube videos from Computerphile (Diffie-Hellman key exchange and hashing).

The main reason for combining both technologies is computing speed. As a compromise, a secret is often generated and then shared between both parties using (slow) asymmetric encryption. All the remaining messages are then encrypted using (fast) symmetric encryption.

HTTPS

You are already using these technologies every day when you visit HTTPS websites like the one you are currently on (the address starts with https://). If you send data via plain HTTP instead, everyone on the internet can read your messages (see screenshot).

Source: Ibrahim Ali Ibrahim Diyeb, Anwar Saif, Nagi Ali Al-Shaibany, “Ethical Network Surveillance using Packet Sniffing Tools: A Comparative Study”, International Journal of Computer Network and Information Security (IJCNIS), Vol. 10, No. 7, pp. 12-22, 2018. DOI: 10.5815/ijcnis.2018.07.02 (https://www.researchgate.net/publication/326419957_Ethical_Network_Surveillance_using_Packet_Sniffing_Tools_A_Comparative_Study)

Security overview of our website docs.umh.app as of 2021-10-18. The exact algorithms might change based on your browser

Example: MQTT and CAs

In IoT, we rely on the same algorithms (SSL/TLS) that allow you to do safe online banking, with the exception that we cannot rely on external CAs; instead, we create the entire chain ourselves.

We have a root CA, an intermediate CA (to avoid using the root CA all the time), a server certificate for the MQTT broker, and client certificates for the MQTT clients.

Small recap: a CA is a trusted public-key pair. The intermediate CA is a public-key pair that has been trusted by the root CA to take over the role of signing for a given time and in a given scope. The root CA is the “head” of the CA chain and can do everything.

The server certificate is the public side of the public-key pair created specifically for the MQTT broker, which the intermediate CA signed, saying: “yup, this is really the server mqtt.umh.app and it belongs to us. It can also act as a server.”

The client certificates are signed by the intermediate CA as well, with the permission to act only as clients (and not as a server).

The public key of the root CA is embedded in all of our software, so that the devices know, even if they have never talked to each other before, that they are communicating with whom they should be communicating (e.g., a client connects to an MQTT broker and not to another client or someone else entirely).

This is the reason you need to copy these certificates onto the devices.
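To make this concrete, here is a minimal Go sketch of what such a client essentially does with these files. The certificate file names match the example PKI discussed below; the broker address and port are placeholder assumptions:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

func main() {
	// Trust only our own root CA, i.e., the file copied onto the device.
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		panic(err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		panic("could not parse ca.crt")
	}

	// Present our own client certificate, signed by the intermediate CA.
	clientCert, err := tls.LoadX509KeyPair("TESTING.crt", "TESTING.key")
	if err != nil {
		panic(err)
	}

	cfg := &tls.Config{
		RootCAs:      roots,
		Certificates: []tls.Certificate{clientCert},
	}

	// 8883 is the conventional port for MQTT over TLS; the host is a placeholder.
	conn, err := tls.Dial("tcp", "mqtt.umh.app:8883", cfg)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// At this point the broker's certificate chain has been verified against
	// our root CA, and an MQTT library could take over the connection.
	fmt.Println("TLS handshake OK")
}

Note that the client never needs the CA’s private key: trusting the root CA’s public key is enough to verify the whole certificate chain presented by the server.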

Example: viewing and analyzing the MQTT certificates

Last but not least, let’s take a look at the MQTT Public-Key infrastructure in detail. Specifically, let’s analyze these certificates.

In the folder deployment/factorycube-server/developmentCertificates/pki you will find an example PKI infrastructure for factorycube-server. Almost all files are autogenerated with easy-rsa, and you can actually ignore most of them. You can create your own PKI by following our tutorial.

Let’s deep dive into 4 different certificates, to understand the structure of a PKI better:

  1. ca.crt
  2. issued/factorycube-server-vernemq.crt
  3. issued/TESTING.crt
  4. private/TESTING.key

Each certificate will be opened in xca, which is a graphical tool for creating and managing a PKI.

After importing the above certificates xca will look like this:

xca certificate section, after importing the above certificates

xca private key section, after importing the above certificates

ca.crt / “Easy-RSA CA”

This is the public key of the root certificate (also known as the root CA). We can see that the certificate in the file ca.crt has the name “Easy-RSA CA”.

When opening it in an editor, it will look like gibberish:

ca.crt in raw format

After importing it to xca and clicking on it we can see all its details:

ca.crt in xca

ca.crt in xca - extensions

This includes:

  • Fingerprint (to easily identify a key)
  • Expiration date (at the bottom)
  • That it is a certificate authority (see second screenshot, CA:TRUE)
  • That it can be used to sign other certificates (see second screenshot, X509v3 Key Usage)

In the overview panel in xca, we can see after importing the other certificates that they are shown underneath this certificate. xca automatically detects that ca.crt is the root CA and that it has signed the other certificates, and visualizes the hierarchy accordingly.
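If you do not have xca at hand, the same details can be read programmatically. Here is a hedged Go sketch using only the standard library (the SHA-1 fingerprint is just one common fingerprint format):

package main

import (
	"crypto/sha1"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

func main() {
	raw, err := os.ReadFile("ca.crt")
	if err != nil {
		panic(err)
	}

	// Strip the Base64 "gibberish" wrapper ...
	block, _ := pem.Decode(raw)
	if block == nil {
		panic("ca.crt is not PEM encoded")
	}

	// ... and parse the actual certificate.
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}

	fmt.Println("Subject:    ", cert.Subject.CommonName) // "Easy-RSA CA"
	fmt.Printf("Fingerprint: %x\n", sha1.Sum(cert.Raw))  // to easily identify the key
	fmt.Println("Expires:    ", cert.NotAfter)           // the expiration date
	fmt.Println("Is CA:      ", cert.IsCA)               // CA:TRUE
	fmt.Println("Can sign:   ", cert.KeyUsage&x509.KeyUsageCertSign != 0)
}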

factorycube-server-vernemq.crt / “factorycube-server-vernemq”

This is the public key of the MQTT server key pair. You can view the same information as in the certificate above, with some exceptions:

factorycube-server-vernemq.crt in xca

factorycube-server-vernemq.crt in xca - extensions

  1. It is signed by Easy-RSA CA (see Signature)
  2. It is not a CA and therefore cannot sign other certificates (see X509v3 Key Usage)
  3. Its whole purpose is being a web server for the domain factorycube-server-vernemq (if you access it under a different domain, the client will report the certificate as invalid). You can see this under the entries X509v3 Extended Key Usage and X509v3 Subject Alternative Name

TESTING.crt / “TESTING”

This is a client key and is only allowed to be used as a client (see X509v3 Extended Key Usage).

TESTING.crt in xca - extensions

TESTING.key / “TESTING”

This is the private key for the certificate above.

TESTING.key in xca

Outlook

“But is it really secure?”, one might ask. Probably every security expert would answer that with “It depends” (if someone says yes, they are probably not an expert), quickly followed by “it is the best that we have so far”. Technologies like these are officially recommended by basically all national and international security agencies, even for critical applications like the energy or banking sectors (NIST for the USA, BSI for Germany).

With the current rise of quantum computing, today’s algorithms might become ineffective at some point in the future, but post-quantum algorithms already exist that can safely be integrated into current infrastructures. If you are interested, you can visit the website of the current NIST standardization competition.


  1. Programmable Logic Controller, industrial computer controlling production machines, https://en.wikipedia.org/wiki/Programmable_logic_controller ↩︎

  2. Message Queuing Telemetry Transport, common IoT protocol, https://en.wikipedia.org/wiki/MQTT ↩︎

  3. http://people.csail.mit.edu/rivest/Rsapaper.pdf ↩︎

  4. https://web.archive.org/web/20080227001905/http://www.cesg.gov.uk/site/publications/media/notense.pdf ↩︎

8 - Why we chose timescaleDB over InfluxDB

TimescaleDB is better suited for the Industrial IoT than InfluxDB because it is stable, mature, and failure resistant, it uses the very common SQL as its query language, and you need a relational database for manufacturing anyway.

Introduction

The introduction and implementation of an Industrial IoT strategy is already complicated and tedious. There is no need to put unnecessary obstacles in the way through a lack of stability, new programming languages, or more databases than necessary. You need a piece of software that you can trust with your company’s most important data.

We are often asked why we chose timescaleDB instead of InfluxDB. Both are time-series databases suited for large amounts of machine and sensor data (e.g., vibration or temperature).

We started with InfluxDB (probably due to its strong presence in the home automation and Grafana communities) and then ended up with timescaleDB based on three arguments. In this article, we would like to explain our decision and provide background information on why timescaleDB makes the most sense for the United Manufacturing Hub.

Argument 1: Reliability & Scalability

A central requirement for a database: it cannot lose or corrupt your data. Furthermore, as a central element in an Industrial IoT stack, it must scale with growing requirements.

TimescaleDB

TimescaleDB is built on PostgreSQL, which has been continuously developed for over 25 years and has a central place in the architecture of many large companies like Uber, Netflix, Spotify, or reddit. This has resulted in a fault-tolerant database that can scale horizontally across multiple servers. In short: it is boring, and it works.

InfluxDB

In contrast, InfluxDB is a relatively young startup that has raised 119.9 million USD in funding (as of 2021-05-03) but still does not have 25+ years of expertise to fall back on.

On the contrary: Influx has completely rewritten the database twice in the last 5 years 1 2. Rewriting software can fix fundamental issues or add exciting new features. However, it is usually associated with breaking changes in the API and new unintended bugs. This results in additional migration projects, which take time and risk system downtime or data loss.

Due to its massive funding, we get the impression that they add quite a lot of exciting new features and functionalities (e.g., their own visualization tool). However, after testing, we noticed that stability suffers under these new features.

In addition, Influx only offers the horizontally scalable version of the database in the paid edition, which will scare off companies wanting to use it on a larger scale, as you become fully dependent on the provider of that software (vendor lock-in).

Summary

With databases, the principle applies: Better boring and working than exciting and unreliable.

We can also strongly recommend an article by timescaleDB.

Argument 2: SQL is better known than flux

The second argument refers to the query language, i.e., the way information can be retrieved from the database.

SQL (timescaleDB)

TimescaleDB, like PostgreSQL, relies on SQL, the de facto standard language for relational databases. The advantages: a programming language established for over 45 years, which almost every programmer knows or has used at least once. Any problem? No problem, just Google it, and some smart person has already solved it on Stack Overflow. Integration with PowerBI? A standard interface that is already integrated!

SELECT time, (memUsed / procTotal / 1000000) AS value
FROM measurements
WHERE time > now() - INTERVAL '1 hour';

Example SQL code to get the average memory usage for the last hour.

flux (InfluxDB)

InfluxDB, on the other hand, relies on the homegrown Flux, which is supposed to simplify time-series queries. It sees time-series data as a continuous stream upon which functions, calculations, and transformations are applied 3.

The problem: as a programmer, you have to rethink a lot because the language is flow-based and not based on relational algebra. It takes some time to get used to, and it is an unnecessary hurdle for those not-so-tech-savvy companies that already struggle with Industrial IoT.

From experience, we can also say that the language quickly reaches its limits. In the past, we had to work with additional Python scripts that extracted the data from InfluxDB via Flux, processed it, and then played it back again.

// Memory used (in bytes)
memUsed = from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) =>
    r._measurement == "mem" and
    r._field == "used"
  )

// Total processes running
procTotal = from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) =>
    r._measurement == "processes" and
    r._field == "total"
    )

// Join memory used with total processes and calculate
// the average memory (in MB) used for running processes.
join(
    tables: {mem:memUsed, proc:procTotal},
    on: ["_time", "_stop", "_start", "host"]
  )
  |> map(fn: (r) => ({
    _time: r._time,
    _value: (r._value_mem / r._value_proc) / 1000000
  })
)

Example Flux code for the same query as the SQL code above.

Summary

In summary, InfluxDB puts unnecessary obstacles in the way of not-so-tech-savvy companies with Flux, while timescaleDB relies on SQL, which just about every programmer knows.

We can also strongly recommend the blog post by timescaleDB on exactly this topic.

Argument 3: relational data

Finally, the argument that is particularly important for production: Production data is more relational than time-series based.

Relational data is, simply put, all table-based data that you can store in Excel in a meaningful way, for example, shift schedules, orders, component lists, or inventory.

Relational data. Author: AutumnSnow, License: CC BY-SA 3.0

TimescaleDB provides this by default through the PostgreSQL base, whereas with InfluxDB, you always have to run a second relational database like PostgreSQL in parallel.

If you have to run two databases anyway, you can reduce complexity and directly use PostgreSQL/timescaleDB.

Not an argument: Performance for time-series data

Often the duel between timescaleDB and InfluxDB is fought on the performance level. Both databases are efficient, and 30% better or worse does not matter if both databases are 10x-100x faster 4 than classical relational databases like PostgreSQL or MySQL.

Even if it is not decisive, there is strong evidence that timescaleDB is actually more performant. Both databases regularly compare their performance against other databases, yet InfluxDB never compares itself to timescaleDB. timescaleDB, however, has published a detailed performance comparison with InfluxDB.

Summary

Who do you trust more? The nerdy and boring accountant, or the good-looking one with 25 exciting new tools?

The same goes for databases: Boring is awesome.


  1. https://www.influxdata.com/blog/new-storage-engine-time-structured-merge-tree/ ↩︎

  2. https://www.influxdata.com/blog/influxdb-2-0-open-source-beta-released/ ↩︎

  3. https://www.influxdata.com/blog/why-were-building-flux-a-new-data-scripting-and-query-language/ ↩︎

  4. https://docs.timescale.com/latest/introduction/timescaledb-vs-postgres ↩︎

9 - The UMH datamodel (preview for v1.0.0)

This is a preview of the datamodel for version v1.0.0.

This page is still a work in progress.

9.1 - Kafka

This documents our Kafka structure and settings

Default settings

By default, the following important settings are used:

| setting | value | description |
| --- | --- | --- |
| retention.ms | 604800000 | After 7 days, messages will be deleted |
| retention.bytes | -1 | We don’t limit the amount of messages stored by Kafka |

Topics

Our Kafka topics are structured as follows:

  ia.CUSTOMER.LOCATION.MACHINE.EVENT

There are two exceptions to this rule:

  • ia.raw.TRANSMITTERID.GATEWAYSERIALNUMBER.PORTNUMBER.IOLINKSENSORID
  • ia.rawImage.TRANSMITTERID.CAMERAMACADDRESS

Specifications for those can be found on the UMH datamodel page.

All topics may have suffixes, which might be ignored by the different microservices. For example:

  ia.testCustomer.testLocation.testAsset.processValue.temperature

All names have to match the following regex:

^[a-zA-Z0-9_\-]+$
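As an illustration, a producer could validate every part against this rule before assembling a topic. The helper below is hypothetical (it is not an existing UMH function):

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validName mirrors the naming rule above; each topic part must match it.
var validName = regexp.MustCompile(`^[a-zA-Z0-9_\-]+$`)

// buildTopic is a hypothetical helper that validates all parts and
// assembles an ia.CUSTOMER.LOCATION.MACHINE.EVENT topic.
func buildTopic(customer, location, machine, event string) (string, error) {
	for _, part := range []string{customer, location, machine, event} {
		if !validName.MatchString(part) {
			return "", fmt.Errorf("invalid topic part: %q", part)
		}
	}
	return strings.Join([]string{"ia", customer, location, machine, event}, "."), nil
}

func main() {
	topic, err := buildTopic("testCustomer", "testLocation", "testAsset", "processValue")
	if err != nil {
		panic(err)
	}
	fmt.Println(topic) // ia.testCustomer.testLocation.testAsset.processValue
}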

Customer

This is the name of the customer (e.g., united-manufacturing-hub). It can also be an abbreviation of the customer name (e.g., umh).

Location

This is the name of the location the sender belongs to. It can be a physical location (aachen), a virtual location (rack-1), or any other unique specifier.

Machine

This is the name of the machine the sender belongs to. It can be a physical machine (printer-1), a virtual machine (vmachine-1), or any other unique specifier.

Event

Our Kafka stack currently supports the following events:

  • addMaintenanceActivity
  • addOrder
  • addParentToChild
  • addProduct
  • addShift
  • count
  • deleteShiftByAssetIdAndBeginTimestamp
  • deleteShiftById
  • endOrder
  • modifyProducedPieces
  • modifyState
  • processValue
  • processValueFloat64
  • processValueString
  • productTag
  • productTagString
  • recommendation
  • scrapCount
  • startOrder
  • state
  • uniqueProduct
  • scrapUniqueProduct

Further information about these events can be found at the UMH datamodel site.

Routing

Below you can find an example flow of messages.

Example kafka flow

Edge PC

In this example, we have an edge PC that is connected to multiple sensors and a camera. It also receives data via MQTT, Node-RED, and a barcode reader.

In our dataflow, we handle any IO-Link compatible sensor with sensorconnect, which reads the IO-Link data and publishes it to Kafka. Compatible cameras and barcode readers are handled by cameraconnect and barcodereader, respectively.

Node-RED can be used to pre-process arbitrary data and then publish it to Kafka.

MQTT-Kafka-Bridge takes MQTT data and publishes it to Kafka.

Once the data is published to the edge Kafka broker, other microservices can subscribe to it and produce higher-level data, which gets re-published to the edge Kafka broker.

Kafka bridge

This microservice can sit on either the edge or the server and connects two Kafka brokers. It has a regex-based filter for selecting which messages to forward, which can work bi- or unidirectionally. It also filters out duplicated messages to prevent loops.

Every bridge adds an entry to the Kafka message header to identify the source of the message and all hops taken.

Server

On the server, we have two microservices listening for incoming data.

Kafka-to-blob is a microservice that listens to Kafka and publishes the data to blob storage, in our case Minio. Kafka-to-postgresql is a microservice that listens to Kafka and publishes the data to a PostgreSQL database with the TimescaleDB extension installed.

Guarantees

Our system is built to provide at-least-once delivery guarantees once a message first enters any Kafka broker, except for high-throughput data (processValue, processValueFloat64, processValueString).

A message taken out of the broker will only have its offset committed back to the broker once it has been processed, or once it has been successfully returned to the broker (in case of an error).

For this, we use the following kafka settings:

{
  "enable.auto.commit":       true,
  "enable.auto.offset.store": false,
  "auto.offset.reset":        "earliest"
}
  • enable.auto.commit
    • This automatically commits all offsets in the local offset store every couple of seconds.
  • enable.auto.offset.store
    • We manually store offsets in the local offset store only once we have confirmed that the message has been processed.
  • auto.offset.reset
    • This returns the offset pointer to the earliest unprocessed message in case of a re-connect.

Note that we could have disabled “enable.auto.commit” entirely, but in our testing, that was significantly slower.
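As a hedged sketch of this consume-process-store loop, here is what it could look like with the confluent-kafka-go client. The broker address, group id, and topic are placeholders, and the error handling is heavily simplified:

package main

import (
	"fmt"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	consumer, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers":        "localhost:9092", // placeholder broker address
		"group.id":                 "example-consumer",
		"enable.auto.commit":       true,  // offsets in the local store get committed periodically
		"enable.auto.offset.store": false, // ...but we decide when an offset enters that store
		"auto.offset.reset":        "earliest",
	})
	if err != nil {
		panic(err)
	}
	defer consumer.Close()

	if err := consumer.SubscribeTopics([]string{"ia.testCustomer.testLocation.testAsset.count"}, nil); err != nil {
		panic(err)
	}

	for {
		msg, err := consumer.ReadMessage(-1) // block until the next message arrives
		if err != nil {
			continue
		}
		if err := process(msg); err != nil {
			// Not stored: after a restart or rebalance the message is re-delivered,
			// which is what gives us the at-least-once guarantee.
			continue
		}
		// Only after successful processing is the offset stored; the
		// auto-committer then persists it to the broker in the background.
		_, err = consumer.StoreOffsets([]kafka.TopicPartition{{
			Topic:     msg.TopicPartition.Topic,
			Partition: msg.TopicPartition.Partition,
			Offset:    msg.TopicPartition.Offset + 1, // next offset to consume
		}})
		if err != nil {
			fmt.Println("failed to store offset:", err)
		}
	}
}

// process stands in for the real business logic of a microservice.
func process(msg *kafka.Message) error {
	fmt.Printf("processing %s\n", string(msg.Value))
	return nil
}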

For in-depth information about how we handle messages inside our microservices, please see their documentation: