Enrich data with external datasets in data flow graphs

Sometimes the incoming message doesn't contain everything you need. A temperature reading might arrive with a device ID, but the display name, location, and calibration offset live in a separate lookup table. Enrichment lets you pull that external data into your transform rules.

For an overview of data flow graphs, see Data flow graphs overview.

Prerequisites

An instance of Azure IoT Operations deployed in a Kubernetes cluster. For more information, see Deploy Azure IoT Operations.

A default registry endpoint named default that points to mcr.microsoft.com is automatically created during deployment.

What is enrichment

You can augment incoming messages with data from an external state store, called a contextualization dataset. During processing, the runtime looks up records in the dataset and matches them against the incoming message using a condition you define. The matched fields then become available to your rules.

Enrichment works with map, filter, and branch transforms. It isn't supported in window transforms.

Configure a dataset

Datasets are defined in the datasets array at the top level of your rules configuration, alongside map, filter, or branch.

In the transform configuration, add a dataset. Configure:

Setting	Description
State store key	The key where dataset records are stored. Use `as` to assign an alias (for example, `device-metadata as device`).
Match inputs	Fields to compare: one from the source message (`$source.<field>`) and one from the dataset (`$context.<field>`).
Match expression	A boolean expression (for example, `$1 == $2`).

The CLI applies the whole graph from one config file, so add this to the transform node's configuration in your graph.json and apply it with az iot ops dataflowgraph apply.

The rules are a JSON object:

{
  "datasets": [
    {
      "key": "device-metadata as device",
      "inputs": ["$source.deviceId", "$context.deviceId"],
      "expression": "$1 == $2"
    }
  ],
  "map": [
    {
      "inputs": ["$context(device).displayName"],
      "output": "deviceName"
    }
  ]
}

These rules go in the value field as an escaped string:

"configuration": [
  {
    "key": "rules",
    "value": "{\"datasets\":[{\"key\":\"device-metadata as device\",\"inputs\":[\"$source.deviceId\",\"$context.deviceId\"],\"expression\":\"$1 == $2\"}],\"map\":[{\"inputs\":[\"$context(device).displayName\"],\"output\":\"deviceName\"}]}"
  }
]

Tip

To generate the escaped string, save the rules to a file like rules.json, then run jq -c . rules.json and paste the single-line output into the value field.
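
If jq isn't available, the same step can be sketched with Python's standard json module. This is a minimal sketch, not part of the Azure tooling, and the helper name to_escaped_value is hypothetical. It serializes the rules compactly, then serializes that string a second time so the inner quotes are escaped for embedding in graph.json.

```python
import json

# Rules object from the example above.
rules = {
    "datasets": [
        {
            "key": "device-metadata as device",
            "inputs": ["$source.deviceId", "$context.deviceId"],
            "expression": "$1 == $2",
        }
    ],
    "map": [
        {"inputs": ["$context(device).displayName"], "output": "deviceName"}
    ],
}


def to_escaped_value(rules_obj: dict) -> str:
    """Return the rules as a JSON string literal (hypothetical helper)."""
    # First pass: compact, single-line JSON with no extra whitespace.
    compact = json.dumps(rules_obj, separators=(",", ":"))
    # Second pass: encode that text as a JSON string, which escapes the quotes.
    return json.dumps(compact)


escaped = to_escaped_value(rules)

if __name__ == "__main__":
    print(escaped)
```

The printed value includes the surrounding double quotes, so it can be pasted directly after "value": in the configuration entry.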

In Bicep, the dataset configuration is part of the rules JSON string:

configuration: [
  {
    key: 'rules'
    value: '{"datasets":[{"key":"device-metadata as device","inputs":["$source.deviceId","$context.deviceId"],"expression":"$1 == $2"}],"map":[{"inputs":["$context(device).displayName"],"output":"deviceName"}]}'
  }
]

Important

The use of Kubernetes deployment manifests isn't supported in production environments and should only be used for debugging and testing.

{
  "datasets": [
    {
      "key": "device-metadata as device",
      "inputs": ["$source.deviceId", "$context.deviceId"],
      "expression": "$1 == $2"
    }
  ],
  "map": [
    {
      "inputs": ["$context(device).displayName"],
      "output": "deviceName"
    }
  ]
}

Each dataset entry has these properties:

Property	Required	Description
`key`	Yes	The state store key where the dataset records are stored. Supports an optional alias with the `as` keyword.
`inputs`	Yes	List of field references used in the match expression. Each entry uses a `$source.` or `$context.` prefix.
`expression`	Yes	A boolean expression that determines which dataset record matches the incoming message.

Key and alias

The key value is the state store key that the runtime reads. Assign a shorter alias with the as keyword. For example, datasets.parag10.rule42 as position lets you reference fields as $context(position).WorkingHours.

Dataset inputs

Each entry in the inputs array uses a prefix to indicate where the value comes from:

$source.<field>: reads from the incoming message.
$context.<field>: reads from the dataset record being evaluated.

Inputs can appear in any order and you can mix $source and $context references freely. Wildcard inputs aren't supported in dataset definitions.

Match expression

The expression evaluates to a boolean. The runtime loads the dataset from the state store as NDJSON (one JSON object per line), iterates through the records, and returns the first record where the expression evaluates to true.

If no record matches, the enrichment fields aren't available and any rule that depends on them is skipped for that message.
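
The lookup described above can be sketched in Python. This is an illustration of first-match semantics only, not the runtime's implementation. The sample records, the resolve and first_match helpers, and the lambda standing in for the expression "$1 == $2" are all hypothetical, and only top-level fields are handled.

```python
import json
from typing import Callable, Optional

# Hypothetical NDJSON payload stored under the state store key "device-metadata".
NDJSON = "\n".join([
    '{"deviceId": "dev-1", "displayName": "Boiler A"}',
    '{"deviceId": "dev-2", "displayName": "Boiler B"}',
    '{"deviceId": "dev-2", "displayName": "Duplicate entry"}',
])


def resolve(ref: str, source: dict, record: dict):
    """Resolve a '$source.<field>' or '$context.<field>' reference (top level only)."""
    prefix, _, field = ref.partition(".")
    scope = {"$source": source, "$context": record}[prefix]
    return scope.get(field)


def first_match(source: dict, ndjson: str, inputs: list,
                predicate: Callable[..., bool]) -> Optional[dict]:
    """Return the first record for which the predicate is true, else None."""
    for line in ndjson.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        # Inputs are passed positionally, mirroring $1, $2, ... in the expression.
        args = [resolve(ref, source, record) for ref in inputs]
        if predicate(*args):
            return record
    return None


inputs = ["$source.deviceId", "$context.deviceId"]
equals = lambda a, b: a == b  # stands in for the expression "$1 == $2"

match = first_match({"deviceId": "dev-2", "temperature": 21.5}, NDJSON, inputs, equals)
miss = first_match({"deviceId": "dev-9"}, NDJSON, inputs, equals)
```

Here match is the "Boiler B" record because it appears before the duplicate, and miss is None because no record has the device ID dev-9.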

Use enriched data in rules

Reference matched record fields in any rule's inputs array using $context(<alias>).<fieldPath>.

Map example

Add map rules that reference enriched fields:

Input	Output
`$context(position).WorkingHours`	`WorkingHours`
`rawValue` and `$context(product).multiplier`	`adjustedValue` (expression: `$1 * $2`)

The enriched field references are part of the map rules in your graph.json config file. Add the rules to the transform node's configuration and apply the graph with az iot ops dataflowgraph apply:

"map": [
  {
    "inputs": [
      "$context(position).WorkingHours"
    ],
    "output": "WorkingHours"
  },
  {
    "inputs": [
      "rawValue",
      "$context(product).multiplier"
    ],
    "output": "adjustedValue",
    "expression": "$1 * $2"
  }
]

In Bicep, the enriched field references are part of the rules JSON string:

'{"datasets":[...],"map":[{"inputs":["$context(position).WorkingHours"],"output":"WorkingHours"},{"inputs":["rawValue","$context(product).multiplier"],"output":"adjustedValue","expression":"$1 * $2"}]}'

Important

The use of Kubernetes deployment manifests isn't supported in production environments and should only be used for debugging and testing.

- inputs:
    - "$context(position).WorkingHours"
  output: WorkingHours

- inputs:
    - rawValue                          # $1
    - "$context(product).multiplier"    # $2
  output: adjustedValue
  expression: "$1 * $2"
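
To make the behavior of these two rules concrete, here is a small Python simulation. It is a sketch under stated assumptions, not the runtime's code: the apply_map and lookup helpers, the MissingEnrichment exception, the compute key (a Python callable standing in for the rule's expression), and the sample values are all hypothetical. It handles only top-level fields.

```python
class MissingEnrichment(Exception):
    """Raised when a rule references an alias that has no matched record."""


def lookup(ref, message, matches):
    """Resolve 'field' from the message or '$context(alias).field' from a match."""
    if ref.startswith("$context("):
        alias, _, field = ref[len("$context("):].partition(").")
        if alias not in matches:
            raise MissingEnrichment(alias)
        return matches[alias][field]
    return message[ref]


def apply_map(rules, message, matches):
    """Apply each rule in order; skip rules whose dataset had no match."""
    output = {}
    for rule in rules:
        try:
            args = [lookup(ref, message, matches) for ref in rule["inputs"]]
        except MissingEnrichment:
            continue  # no matched record for this alias: the rule is skipped
        compute = rule.get("compute")
        output[rule["output"]] = compute(*args) if compute else args[0]
    return output


rules = [
    {"inputs": ["$context(position).WorkingHours"], "output": "WorkingHours"},
    {
        "inputs": ["rawValue", "$context(product).multiplier"],
        "output": "adjustedValue",
        "compute": lambda a, b: a * b,  # stands in for "$1 * $2"
    },
]
message = {"rawValue": 10}

# Only the "position" dataset matched, so the second rule is skipped.
partial = apply_map(rules, message, {"position": {"WorkingHours": 8}})

# Both datasets matched, so both outputs are produced.
full = apply_map(rules, message, {
    "position": {"WorkingHours": 8},
    "product": {"multiplier": 2.5},
})
```

With these sample values, partial contains only WorkingHours, and full also contains adjustedValue with the value 25.0.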

Filter example

Add a filter rule with inputs rawValue, $context(limits).multiplier, and $context(limits).baseLimit, and expression $1 * $2 > $3.

The CLI applies the whole graph from one config file. The rules are a JSON object:

{
  "datasets": [
    {
      "key": "device_limits as limits",
      "inputs": [
        "$source.deviceId",
        "$context.deviceId"
      ],
      "expression": "$1 == $2"
    }
  ],
  "filter": [
    {
      "inputs": [
        "rawValue",
        "$context(limits).multiplier",
        "$context(limits).baseLimit"
      ],
      "expression": "$1 * $2 > $3"
    }
  ]
}

Add these rules to the transform node's configuration in your graph.json as an escaped string in the value field, then apply it with az iot ops dataflowgraph apply:

"configuration": [
  {
    "key": "rules",
    "value": "{\"datasets\":[{\"key\":\"device_limits as limits\",\"inputs\":[\"$source.deviceId\",\"$context.deviceId\"],\"expression\":\"$1 == $2\"}],\"filter\":[{\"inputs\":[\"rawValue\",\"$context(limits).multiplier\",\"$context(limits).baseLimit\"],\"expression\":\"$1 * $2 > $3\"}]}"
  }
]

In Bicep, the same rules are a single-quoted string:

'{"datasets":[{"key":"device_limits as limits","inputs":["$source.deviceId","$context.deviceId"],"expression":"$1 == $2"}],"filter":[{"inputs":["rawValue","$context(limits).multiplier","$context(limits).baseLimit"],"expression":"$1 * $2 > $3"}]}'

Important

The use of Kubernetes deployment manifests isn't supported in production environments and should only be used for debugging and testing.

{
  "datasets": [
    {
      "key": "device_limits as limits",
      "inputs": ["$source.deviceId", "$context.deviceId"],
      "expression": "$1 == $2"
    }
  ],
  "filter": [
    {
      "inputs": ["rawValue", "$context(limits).multiplier", "$context(limits).baseLimit"],
      "expression": "$1 * $2 > $3"
    }
  ]
}
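
As a worked example of the filter expression, the following Python sketch evaluates $1 * $2 > $3 for two readings. The limits record and the reading values are hypothetical, and the sketch only computes the boolean result. What the filter transform does with a true or false result is defined by the transform itself and isn't assumed here.

```python
def filter_condition(raw_value, multiplier, base_limit):
    """Python equivalent of the expression "$1 * $2 > $3"."""
    return raw_value * multiplier > base_limit


# Hypothetical record matched from the "device_limits" dataset.
limits = {"deviceId": "dev-1", "multiplier": 1.5, "baseLimit": 100}

# 80 * 1.5 = 120.0, which is greater than 100.
above = filter_condition(80, limits["multiplier"], limits["baseLimit"])

# 60 * 1.5 = 90.0, which is not greater than 100.
below = filter_condition(60, limits["multiplier"], limits["baseLimit"])
```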

Branch example

Configure a branch rule with inputs quantity, $context(mult).factor, and $context(mult).threshold, and expression $1 * $2 > $3.

The CLI applies the whole graph from one config file. The rules are a JSON object:

{
  "datasets": [
    {
      "key": "multipliers as mult",
      "inputs": [
        "$source.productCode",
        "$context.productCode"
      ],
      "expression": "$1 == $2"
    }
  ],
  "branch": {
    "inputs": [
      "quantity",
      "$context(mult).factor",
      "$context(mult).threshold"
    ],
    "expression": "$1 * $2 > $3"
  }
}

Add these rules to the transform node's configuration in your graph.json as an escaped string in the value field, then apply it with az iot ops dataflowgraph apply:

"configuration": [
  {
    "key": "rules",
    "value": "{\"datasets\":[{\"key\":\"multipliers as mult\",\"inputs\":[\"$source.productCode\",\"$context.productCode\"],\"expression\":\"$1 == $2\"}],\"branch\":{\"inputs\":[\"quantity\",\"$context(mult).factor\",\"$context(mult).threshold\"],\"expression\":\"$1 * $2 > $3\"}}"
  }
]

In Bicep, the same rules are a single-quoted string:

'{"datasets":[{"key":"multipliers as mult","inputs":["$source.productCode","$context.productCode"],"expression":"$1 == $2"}],"branch":{"inputs":["quantity","$context(mult).factor","$context(mult).threshold"],"expression":"$1 * $2 > $3"}}'

Important

The use of Kubernetes deployment manifests isn't supported in production environments and should only be used for debugging and testing.

{
  "datasets": [
    {
      "key": "multipliers as mult",
      "inputs": ["$source.productCode", "$context.productCode"],
      "expression": "$1 == $2"
    }
  ],
  "branch": {
    "inputs": ["quantity", "$context(mult).factor", "$context(mult).threshold"],
    "expression": "$1 * $2 > $3"
  }
}

Wildcards with datasets

In map rules, use $context(<alias>).* to copy all top-level fields from the matched dataset record:

Add a map rule with input $context(device).* and output *.

The CLI applies the whole graph from one config file, so add this to the corresponding place in your graph.json and apply it with az iot ops dataflowgraph apply:

{
  "inputs": [
    "$context(device).*"
  ],
  "output": "*"
}

In Bicep, the same rule is:

{
  inputs: [ '$context(device).*' ]
  output: '*'
}

Important

The use of Kubernetes deployment manifests isn't supported in production environments and should only be used for debugging and testing.

- inputs:
    - "$context(device).*"
  output: "*"

You can also target a nested object within the dataset record. For example, $context(device).configuration.* copies only the fields under configuration.

Wildcard enrichment inputs are supported only in map rules. Filter and branch rules don't support wildcard inputs.
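
The difference between the top-level and nested wildcard can be sketched in Python. The expand_wildcard helper and the sample device record are hypothetical. The sketch shows only which fields each wildcard selects, under the assumption that the fields directly under the given path are the ones copied.

```python
def expand_wildcard(record: dict, path: str = "") -> dict:
    """Return the fields directly under `path` (an empty path means top level)."""
    node = record
    for part in filter(None, path.split(".")):
        node = node[part]
    return dict(node)


# Hypothetical matched record for the alias "device".
device = {
    "displayName": "Boiler A",
    "site": "Plant 1",
    "configuration": {"unit": "C", "offset": 0.5},
}

# $context(device).* selects every top-level field of the record.
top_level = expand_wildcard(device)

# $context(device).configuration.* selects only the fields under "configuration".
config_only = expand_wildcard(device, "configuration")
```

In this sketch, top_level has the keys displayName, site, and configuration, while config_only has only unit and offset.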

Set up the state store

The runtime reads dataset records from the Azure IoT Operations distributed state store. Each dataset key maps to one or more records in NDJSON format (one JSON object per line). The runtime caches records and receives change notifications, so state store updates are reflected in processing.

For information on configuring the distributed state store, see State store overview.
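
A dataset payload in NDJSON form can be produced with a few lines of Python. This is a minimal sketch with hypothetical records and a hypothetical to_ndjson helper. It covers only the formatting step. Writing the payload to the state store under your dataset key is out of scope here.

```python
import json

# Hypothetical device metadata records for the "device-metadata" key.
records = [
    {"deviceId": "dev-1", "displayName": "Boiler A", "location": "Plant 1"},
    {"deviceId": "dev-2", "displayName": "Boiler B", "location": "Plant 2"},
]


def to_ndjson(items) -> str:
    """Serialize each record as compact JSON on its own line."""
    # json.dumps never emits raw newlines, so each record stays on one line.
    return "\n".join(json.dumps(item, separators=(",", ":")) for item in items)


payload = to_ndjson(records)

if __name__ == "__main__":
    print(payload)
```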

Deploy a data flow graph with enrichment

In the Operations experience, create a data flow graph with enrichment:

Add a source that reads from your MQTT topic.
Add a map transform. In the dataset configuration, add a dataset with the state store key and match condition.
In the map rules, reference enriched fields using $context(<alias>).<field> syntax.
Add a destination that sends to your output topic.

The Azure CLI applies a data flow graph from a single JSON config file. Create a graph.json file with the graph properties. In the graph.json file, each transform's rules are stored in the value field as an escaped JSON string. For the readable form of each transform's rules, see the how-to for that transform type.

{
  "mode": "Enabled",
  "nodes": [
    {
      "nodeType": "Source",
      "name": "sensors",
      "sourceSettings": {
        "endpointRef": "default",
        "dataSources": [
          "telemetry/sensors"
        ]
      }
    },
    {
      "nodeType": "Graph",
      "name": "enrich-and-map",
      "graphSettings": {
        "registryEndpointRef": "default",
        "artifact": "azureiotoperations/graph-dataflow-map:1.0.0",
        "configuration": [
          {
            "key": "rules",
            "value": "{\"datasets\":[{\"key\":\"device-metadata as device\",\"inputs\":[\"$source.deviceId\",\"$context.deviceId\"],\"expression\":\"$1 == $2\"}],\"map\":[{\"inputs\":[\"*\"],\"output\":\"*\"},{\"inputs\":[\"$context(device).displayName\"],\"output\":\"deviceName\"},{\"inputs\":[\"$context(device).location\"],\"output\":\"location\"}]}"
          }
        ]
      }
    },
    {
      "nodeType": "Destination",
      "name": "output",
      "destinationSettings": {
        "endpointRef": "default",
        "dataDestination": "telemetry/enriched"
      }
    }
  ],
  "nodeConnections": [
    {
      "from": {
        "name": "sensors"
      },
      "to": {
        "name": "enrich-and-map"
      }
    },
    {
      "from": {
        "name": "enrich-and-map"
      },
      "to": {
        "name": "output"
      }
    }
  ]
}
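
Because the rules are stored as an escaped string, a typo in them is easy to miss. One way to check them before applying is to decode the value back into an object, sketched here in Python. The decode_rules helper is hypothetical and not part of the Azure CLI, and the graph dictionary is a trimmed stand-in for the contents of graph.json.

```python
import json


def decode_rules(graph: dict) -> dict:
    """Map each Graph node name to its decoded rules object."""
    decoded = {}
    for node in graph.get("nodes", []):
        if node.get("nodeType") != "Graph":
            continue
        for entry in node["graphSettings"].get("configuration", []):
            if entry["key"] == "rules":
                # json.loads raises ValueError if the escaped string is malformed.
                decoded[node["name"]] = json.loads(entry["value"])
    return decoded


# Trimmed stand-in for graph.json, with the rules stored as a JSON string.
graph = {
    "nodes": [
        {"nodeType": "Source", "name": "sensors"},
        {
            "nodeType": "Graph",
            "name": "enrich-and-map",
            "graphSettings": {
                "configuration": [
                    {
                        "key": "rules",
                        "value": json.dumps({
                            "datasets": [{
                                "key": "device-metadata as device",
                                "inputs": ["$source.deviceId", "$context.deviceId"],
                                "expression": "$1 == $2",
                            }],
                            "map": [{"inputs": ["*"], "output": "*"}],
                        }),
                    }
                ]
            },
        },
    ]
}

rules_by_node = decode_rules(graph)
```

With a real file, load it first with json.load and pass the result to decode_rules.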

Apply the config file. The extendedLocation is added automatically from the instance and resource group, so don't include it in the file.

az iot ops dataflowgraph apply \
  --name enrich-example \
  --instance <INSTANCE_NAME> \
  --resource-group <RESOURCE_GROUP> \
  --config-file graph.json

In Bicep, the equivalent resource definition is:

resource dataflowGraph 'Microsoft.IoTOperations/instances/dataflowProfiles/dataflowGraphs@2026-03-01' = {
  name: 'enrich-example'
  parent: dataflowProfile
  properties: {
    mode: 'Enabled'
    nodes: [
      {
        nodeType: 'Source'
        name: 'sensors'
        sourceSettings: {
          endpointRef: 'default'
          dataSources: [ 'telemetry/sensors' ]
        }
      }
      {
        nodeType: 'Graph'
        name: 'enrich-and-map'
        graphSettings: {
          registryEndpointRef: 'default'
          artifact: 'azureiotoperations/graph-dataflow-map:1.0.0'
          configuration: [
            {
              key: 'rules'
              value: '{"datasets":[{"key":"device-metadata as device","inputs":["$source.deviceId","$context.deviceId"],"expression":"$1 == $2"}],"map":[{"inputs":["*"],"output":"*"},{"inputs":["$context(device).displayName"],"output":"deviceName"},{"inputs":["$context(device).location"],"output":"location"}]}'
            }
          ]
        }
      }
      {
        nodeType: 'Destination'
        name: 'output'
        destinationSettings: {
          endpointRef: 'default'
          dataDestination: 'telemetry/enriched'
        }
      }
    ]
    nodeConnections: [
      { from: { name: 'sensors' }, to: { name: 'enrich-and-map' } }
      { from: { name: 'enrich-and-map' }, to: { name: 'output' } }
    ]
  }
}

Important

The use of Kubernetes deployment manifests isn't supported in production environments and should only be used for debugging and testing.

apiVersion: connectivity.iotoperations.azure.com/v1
kind: DataflowGraph
metadata:
  name: enrich-example
  namespace: azure-iot-operations
spec:
  profileRef: default
  nodes:
    - nodeType: Source
      name: sensors
      sourceSettings:
        endpointRef: default
        dataSources:
          - telemetry/sensors

    - nodeType: Graph
      name: enrich-and-map
      graphSettings:
        registryEndpointRef: default
        artifact: azureiotoperations/graph-dataflow-map:1.0.0
        configuration:
          - key: rules
            value: |
              {
                "datasets": [
                  {
                    "key": "device-metadata as device",
                    "inputs": ["$source.deviceId", "$context.deviceId"],
                    "expression": "$1 == $2"
                  }
                ],
                "map": [
                  { "inputs": ["*"], "output": "*" },
                  { "inputs": ["$context(device).displayName"], "output": "deviceName" },
                  { "inputs": ["$context(device).location"], "output": "location" }
                ]
              }

    - nodeType: Destination
      name: output
      destinationSettings:
        endpointRef: default
        dataDestination: telemetry/enriched

  nodeConnections:
    - from: { name: sensors }
      to: { name: enrich-and-map }
    - from: { name: enrich-and-map }
      to: { name: output }

Limitations

Not supported in window transforms: Enrichment datasets aren't available in window (accumulate) transforms.
First match wins: The runtime uses the first record where the expression evaluates to true.
Missing matches skip enriched rules: If no dataset record matches, rules that reference $context(<alias>) fields are skipped. The transformation doesn't fail.
State store errors propagate: If the state store is unreachable, the transformation fails for that message.
No wildcard inputs in dataset definitions: Each input must be a specific $source.<field> or $context.<field> reference.

Last updated on 2026-06-23
