Event-Driven Architecture and Message Queues
An order in the shop, a stock movement in the warehouse, a paid invoice in accounting: in a connected commerce landscape, each of these events triggers a chain of follow-up actions in other systems. If these systems are wired together directly and synchronously, a fragile mesh emerges in which the failure of a single service brings the whole chain to a halt. Event-driven architecture inverts this principle: instead of one system calling another directly, it publishes an event to a message queue, and interested consumers react at their own pace. German B2B e-commerce, including electronic data interchange, has reached a volume of around 1.4 trillion euros (ECC Cologne / IFH Cologne, B2B Market Monitor 2024), and with that volume the demand for resilient, decoupled data flows rises. This article shows how events and message queues enable decoupled and resilient shop integrations, and how retry, ordering and idempotency play together cleanly. If you want to dig into the underlying push mechanism, the article on webhooks and polling provides the foundation, and our API development puts these patterns into practice.
Why Direct Point-to-Point Coupling Hits Its Limits
The simplest form of integration is the direct call: on order completion, the shop synchronously calls the ERP interface and waits for the response. As long as both systems are fast and available, this works. In reality, however, the ERP is sometimes in a maintenance window, the database is under load or the network is briefly disrupted. In a synchronous chain, every one of these cases means the order in the shop fails or hangs, even though the cause has nothing to do with the shop itself. The more systems are involved, the faster these dependencies compound into a system that is only as strong as its weakest link.
On top of this comes the sheer number of connections. An average enterprise runs over 900 applications, of which only about 28 percent are integrated at all (MuleSoft Connectivity Benchmark, 2024). If every system is wired directly to every other, the number of interfaces grows quadratically: N systems need up to N(N-1)/2 point-to-point links. This is exactly where decoupling via a message queue comes in: every system talks only to the broker, no longer to each partner individually. The N-by-N mesh becomes a hub-and-spoke model in which each additional system adds just one connection. The economic relevance is considerable: 81 percent of respondents see integration obstacles as a brake on digital transformation (MuleSoft Connectivity Benchmark, 2024).
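The difference in growth can be made concrete with a few lines of arithmetic. The following is a minimal sketch; the function names are chosen for illustration, and it assumes that a bidirectional link between two systems counts once.

```javascript
// Number of direct links if every system is wired to every other one
// (each pair counted once).
function pointToPointLinks(systems) {
  return (systems * (systems - 1)) / 2;
}

// Number of links if every system only connects to the broker.
function hubAndSpokeLinks(systems) {
  return systems;
}

for (const n of [5, 10, 50]) {
  console.log(
    `${n} systems: ${pointToPointLinks(n)} direct links vs ${hubAndSpokeLinks(n)} broker links`
  );
}
// 5 systems: 10 direct links vs 5 broker links
// 10 systems: 45 direct links vs 10 broker links
// 50 systems: 1225 direct links vs 50 broker links
```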
Coupling Is Not a Purely Technical Detail
How Event-Driven Architecture and Message Queues Work
At its core, an event-driven integration consists of three roles. A producer creates an event, such as order-received, and places it on a message queue or topic. The broker, the queue infrastructure itself, receives the event, stores it durably and makes it available. One or more consumers read the event and process it, each for its own purpose. The decisive difference from direct coupling is the temporal and spatial decoupling: the producer does not need to know who reads the event, how many consumers there are, or whether they are currently available.
Two basic distribution patterns are common. With a classic work queue, each message is delivered to exactly one of a group of consumers to share the load. With the publish-subscribe model, every interested subscriber receives its own copy of the event, so the same order event can reach the ERP, accounting and warehouse logic simultaneously. Which pattern fits depends on the data flow. Market dynamics underline the importance of these building blocks: the message queue service market grows at an annual rate of around 17.6 percent and is projected to rise from 1.41 billion US dollars in 2024 to about 1.66 billion in 2025 (The Business Research Company, 2025).
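The two distribution patterns can be illustrated with a small in-memory sketch. It is not a broker: delivery is synchronous, nothing is stored durably, and the class names `WorkQueue` and `Topic` as well as the round-robin strategy are assumptions made for the example.

```javascript
// Work queue: each message goes to exactly one consumer (round-robin here).
class WorkQueue {
  constructor() {
    this.consumers = [];
    this.next = 0;
  }
  subscribe(handler) {
    this.consumers.push(handler);
  }
  publish(message) {
    if (this.consumers.length === 0) throw new Error('no consumer registered');
    const handler = this.consumers[this.next % this.consumers.length];
    this.next += 1;
    handler(message);
  }
}

// Publish-subscribe: every subscriber receives its own (shallow) copy.
class Topic {
  constructor() {
    this.subscribers = [];
  }
  subscribe(handler) {
    this.subscribers.push(handler);
  }
  publish(message) {
    for (const handler of this.subscribers) {
      handler({ ...message });
    }
  }
}

const received = { erp: [], accounting: [], workerA: [], workerB: [] };

// One order event reaches both the ERP and accounting.
const orders = new Topic();
orders.subscribe((event) => received.erp.push(event));
orders.subscribe((event) => received.accounting.push(event));
orders.publish({ type: 'order-received', orderNo: '10245' });

// Three jobs are shared between two workers.
const jobs = new WorkQueue();
jobs.subscribe((job) => received.workerA.push(job));
jobs.subscribe((job) => received.workerB.push(job));
jobs.publish({ job: 1 });
jobs.publish({ job: 2 });
jobs.publish({ job: 3 });
```

In this run the topic delivers the order event once to each subscriber, while the work queue hands jobs 1 and 3 to the first worker and job 2 to the second.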
Producer
The source system, such as the shop, publishes a business event. It knows only the broker, not the recipients, and is free again for the next operation immediately after writing.
Broker and Queue
The infrastructure accepts events, holds them durably and buffers load spikes. It decouples the speed of producers from the speed of consumers.
Consumer
The target system reads events at its own pace, processes them and acknowledges receipt. If it fails, the events remain safely in the queue.
Work Queue
Each message goes to exactly one consumer of a group. This allows load to be distributed horizontally by having more workers process the same queue.
Publish-Subscribe
Each subscriber receives its own copy of the event. An order event can thus trigger several downstream processes in parallel without the producer knowing.
Event Log
Some brokers store events as an ordered, re-readable stream. New consumers can re-read the history and rebuild their state from the event stream.
These building blocks are not tied to a specific product. Whether a lightweight broker suffices for a modest integration or a distributed event log is needed for high throughput depends on the load and on how fresh the data must be. More important than the choice of tool is that the architecture consistently exploits the three properties of decoupling, buffering and replayability. A well-designed middleware implements exactly this layer between shop and ERP.
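Replayability, the third property, is easiest to see in code. The sketch below assumes a hypothetical in-memory `EventLog` and a `stock-changed` event with the fields `sku` and `delta`; real event logs persist the entries and track offsets per consumer.

```javascript
// Append-only log: entries are never removed, only read by offset.
class EventLog {
  constructor() {
    this.entries = [];
  }
  append(event) {
    this.entries.push(event);
    return this.entries.length - 1; // offset of the new entry
  }
  readFrom(offset) {
    return this.entries.slice(offset);
  }
}

// A new consumer rebuilds its state by replaying the history from offset 0.
function rebuildStock(log) {
  const stock = new Map();
  for (const event of log.readFrom(0)) {
    if (event.type !== 'stock-changed') continue;
    stock.set(event.sku, (stock.get(event.sku) ?? 0) + event.delta);
  }
  return stock;
}

const log = new EventLog();
log.append({ type: 'stock-changed', sku: 'A-1', delta: 10 });
log.append({ type: 'order-received', orderNo: '10245' });
log.append({ type: 'stock-changed', sku: 'A-1', delta: -3 });

const stock = rebuildStock(log);
console.log(stock.get('A-1')); // 7
```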
Decoupling and Load Buffering as a Stability Gain
Perhaps the greatest practical advantage of a queue is its function as a load buffer. In commerce, load spikes are the rule, not the exception: a discount campaign, a newsletter send or the Christmas season can multiply the number of orders within minutes. A synchronously connected ERP would have to absorb this spike in real time, otherwise order acceptance collapses. A queue, by contrast, accepts events as fast as they arrive and lets consumers process them at their sustainable pace. The spike is stretched over time instead of overrunning the slowest system.
An analysis of production incidents shows how real this risk is: around 34 percent of event-driven systems could not handle load spikes beyond three times their baseline load (arXiv, 2025). That sounds sobering at first, but it underlines exactly the point: a queue alone does not solve the problem automatically; it must be deliberately designed for buffering, throttling and horizontal scaling of consumers. Properly sized, it turns a dangerous spike into a manageable wave. At the same time, the need for exactly this capability is growing, as IDC estimates that by 2025 around 90 percent of the world's largest companies will work with real-time data (IDC, cited in industry analysis 2025).
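How a buffer stretches a spike over time can be estimated with a simple tick-based simulation. This is a rough sizing sketch under stated assumptions: arrivals and consumer capacity are given as events per tick, the numbers are invented, and `simulateBuffer` is a name chosen for this example.

```javascript
// arrivals: events arriving per tick; capacity: events the consumers
// can process per tick. Returns the peak backlog and the total ticks
// needed until the queue is empty again.
function simulateBuffer(arrivals, capacity) {
  if (capacity <= 0) throw new Error('capacity must be positive');
  let depth = 0;
  let maxDepth = 0;
  let totalTicks = 0;
  for (const incoming of arrivals) {
    depth += incoming;
    depth -= Math.min(depth, capacity);
    maxDepth = Math.max(maxDepth, depth);
    totalTicks += 1;
  }
  // Keep draining after the arrivals have stopped.
  while (depth > 0) {
    depth -= Math.min(depth, capacity);
    totalTicks += 1;
  }
  return { maxDepth, totalTicks };
}

const spike = [100, 100, 500, 500, 100, 100];
console.log(simulateBuffer(spike, 200)); // { maxDepth: 600, totalTicks: 8 }
console.log(simulateBuffer(spike, 400)); // { maxDepth: 200, totalTicks: 6 }
```

Doubling the consumer capacity, for example by adding workers, cuts the peak backlog from 600 to 200 events in this scenario and removes the two extra ticks of catch-up time.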
| Property | Direct synchronous coupling | Event-driven with queue |
|---|---|---|
| Failure of a target | blocks the entire chain | events remain in the queue |
| Load spike | hits target system unbuffered | is buffered over time |
| New consumer | change the source code | add a subscriber, source untouched |
| Latency | instant, while everything runs | seconds, asynchronous |
| Error retry | caller must solve it itself | broker and consumer handle it |
| Coupling | tight, shared availability | loose, independent operation |
Decoupling also has an organizational effect. Because the producer knows nothing about the consumers, a new downstream process, such as a reporting service or a tax integration, can be added as an additional subscriber without touching the shop. This extensibility is a key reason why event-driven approaches prevail in complex ERP integrations. They turn individual point-to-point integrations into reusable infrastructure.
Delivery Guarantee, Retry and Dead Letter Queue
A message queue does not promise that every processing attempt succeeds the first time, only that no accepted event is silently lost. The pragmatic standard is at-least-once delivery: a consumer reads a message, processes it and only then acknowledges receipt with an ACK or commit. Only this acknowledgement removes the message from the queue. If the consumer crashes before the acknowledgement, the message is retained and delivered again. The price of this safety is possible duplicate deliveries, which are defused by idempotency on the receiver side.
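The acknowledgement rule can be shown with a toy queue. The sketch assumes a hypothetical in-memory `AckQueue`; the method `requeueUnacked` stands in for what a broker does when a consumer connection drops or a timeout expires.

```javascript
class AckQueue {
  constructor() {
    this.pending = [];         // waiting for delivery
    this.inFlight = new Map(); // delivered, not yet acknowledged
    this.nextId = 1;
  }
  publish(body) {
    this.pending.push({ id: this.nextId++, body, deliveries: 0 });
  }
  // Hand out the next message; it stays tracked until ack() is called.
  receive() {
    const msg = this.pending.shift();
    if (!msg) return null;
    msg.deliveries += 1;
    this.inFlight.set(msg.id, msg);
    return msg;
  }
  // Only the acknowledgement removes the message for good.
  ack(id) {
    this.inFlight.delete(id);
  }
  // Simulates a consumer crash: unacknowledged messages become available again.
  requeueUnacked() {
    for (const msg of this.inFlight.values()) this.pending.push(msg);
    this.inFlight.clear();
  }
}

const queue = new AckQueue();
queue.publish({ orderNo: '10245' });

const first = queue.receive();  // consumer "crashes" before acknowledging
queue.requeueUnacked();

const second = queue.receive(); // the same message is delivered again
queue.ack(second.id);           // processed successfully this time
console.log(second.deliveries); // 2
```

The second delivery is exactly the duplicate that at-least-once delivery accepts, which is why the receiver has to be idempotent.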
If processing fails repeatedly, for example because a target system is unreachable for longer, a retry with exponential backoff kicks in. After the first failed attempt, the consumer waits briefly, then progressively longer, supplemented by a random spread (jitter) so that not all backed-up messages restart simultaneously and overload the just-recovered system again. Events that cannot be processed even after all attempts move to a dead letter queue. There they rest safely for later analysis, trigger an alert and do not block the processing of subsequent, healthy messages.
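```javascript
// Simplified consumer with ACK, retry counting and DLQ.
// `channel` stands for a generic broker client; publish, ack and
// nackWithDelay are illustrative names, not a specific library's API.
const MAX_ATTEMPTS = 5;      // example value
const BASE_DELAY_MS = 500;   // example value
const MAX_DELAY_MS = 60000;  // example value

// Exponential backoff with full jitter: random delay below a doubling ceiling
function backoffWithJitter(attempt) {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** (attempt - 1));
  return Math.floor(Math.random() * ceiling);
}

async function handleMessage(msg, channel) {
  const attempt = Number(msg.headers?.['x-attempt'] ?? 0) + 1;
  try {
    await processOrder(msg.body);  // business processing (defined elsewhere)
    channel.ack(msg);              // acknowledge only AFTER success
  } catch (err) {
    if (attempt >= MAX_ATTEMPTS) {
      // Secure the message in the DLQ first, then remove it from the main queue
      await channel.publish('orders.dlq', msg, { reason: String(err) });
      channel.ack(msg);
    } else {
      const delay = backoffWithJitter(attempt);
      channel.nackWithDelay(msg, delay, { 'x-attempt': attempt });
    }
  }
}
```

Acknowledge Only After Successful Processing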
This mechanism makes an event-driven integration robust against the routine disturbances of normal operation, namely maintenance windows, deployments and short-lived disruptions. An in-depth treatment of retry logic and the question of how many attempts make sense is available in the sister article on idempotency and retry strategies. It also explains how backoff curves can be adapted to the recovery time of the target systems.
Preserving Order: Why Sequence Matters in Commerce
In many business processes, the order of events is functionally significant. If a change to an order is reported first and a cancellation afterwards, processing the two in swapped order leads to a wrong final state. The same applies to consecutive stock movements of the same item or to status changes of a shipment. Naive parallel processing across many consumers can tear apart the original order because faster workers overtake slower ones.
The usual solution is partitioned ordering. Instead of enforcing a global sort across all events, which would negate parallelism, events are grouped by a key. All events for one order number or one item number land in the same partition and are processed there strictly in sequence. Different keys are distributed across different partitions and processed in parallel. This preserves order where it counts without sacrificing overall throughput.
- Choose the partition key carefully: the order or item number is usually a good fit because it bundles all related events and spreads evenly.
- Avoid global ordering where it is not needed: a system-wide total order is expensive and rarely required. Order per business object is usually enough.
- Carry a version or sequence number: a running number in the event lets the consumer detect and discard stale or duplicate messages.
- Idempotency as a safety net: even with correct partitioning, a retry can deliver an already-processed event again. Idempotent processing catches this.
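The points above can be sketched in a few lines. The hash function (an FNV-1a-style hash over the key's character codes), the field names `orderNo` and `version` and the class `OrderedConsumer` are assumptions for illustration; brokers with built-in partitioning usually ship their own partitioner.

```javascript
// FNV-1a-style 32-bit hash over the key's character codes.
function hashKey(key) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// The same key always maps to the same partition, so all events of one
// order are processed in sequence by the same partition.
function partitionFor(key, partitionCount) {
  return hashKey(key) % partitionCount;
}

// Tracks the last applied version per order and rejects anything
// that is not newer (stale or duplicate events).
class OrderedConsumer {
  constructor() {
    this.lastVersion = new Map();
  }
  accept(event) {
    const last = this.lastVersion.get(event.orderNo) ?? 0;
    if (event.version <= last) return false; // stale or duplicate
    this.lastVersion.set(event.orderNo, event.version);
    return true;
  }
}

const consumer = new OrderedConsumer();
const results = [1, 2, 1, 3].map((version) =>
  consumer.accept({ orderNo: '10245', version })
);
console.log(results); // [ true, true, false, true ]
```

The repeated version 1 is discarded because version 2 has already been applied. The check only detects stale or duplicate events; keeping events in sequence in the first place remains the job of the partitioning.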
Order and Parallelism Are Not a Contradiction
Idempotency: Making Duplicate Deliveries Harmless
Because at-least-once delivery explicitly accepts duplicate deliveries, idempotency on the receiver side is the decisive counterpart. Idempotent means that processing the same event multiple times produces the same final state as processing it once. An order must not be created twice, a payment must not be booked twice and a stock level must not be reduced twice, even if the same event arrives twice for technical reasons.
The practical means is a unique idempotency key per event, such as the order number combined with the event type. The consumer maintains a table of already-processed keys and checks every incoming message against it. If the key is known, the message is acknowledged but not processed again. It is important that the check and the business processing happen in the same transaction, so that no gap arises between checking and writing through which a second delivery could slip.
```sql
-- Idempotent processing within one transaction (PostgreSQL syntax)
BEGIN;
-- Reserve the key; on conflict the event was already processed
-- and RETURNING yields no row
WITH reserved AS (
  INSERT INTO processed_events (event_key)
  VALUES ('order:10245:created')
  ON CONFLICT (event_key) DO NOTHING
  RETURNING event_key
)
-- Only continues if the INSERT above created a new row
-- (otherwise: event is a duplicate, nothing is written)
INSERT INTO erp_orders (order_no, payload)
SELECT '10245', '{...}'
FROM reserved;
COMMIT;
```

Idempotency is therefore not an add-on feature but a basic condition of every resilient event-driven integration. It allows the simpler and more robust at-least-once delivery to be used instead of striving for costly exactly-once delivery, which is hard to achieve across system boundaries. How to concretely design idempotent endpoints and what role idempotency keys play in REST interfaces is explored in the article on idempotency and retry strategies.
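On the consumer side, the key check can be sketched in application code as follows. The in-memory `Set` merely stands in for the table of processed keys; in a real integration the check and the business write belong in the same database transaction, as described above. The names `idempotencyKey` and `IdempotentConsumer` are chosen for this example.

```javascript
// Key built from order number and event type, e.g. "order:10245:created".
function idempotencyKey(event) {
  return `order:${event.orderNo}:${event.type}`;
}

class IdempotentConsumer {
  constructor(process) {
    this.process = process;         // business processing callback
    this.processedKeys = new Set(); // stands in for the processed_events table
  }
  handle(event) {
    const key = idempotencyKey(event);
    if (this.processedKeys.has(key)) return 'duplicate'; // acknowledge, do not reprocess
    this.process(event);
    // The key is recorded only after success, so a failed attempt can be retried.
    this.processedKeys.add(key);
    return 'processed';
  }
}

const createdOrders = [];
const orderConsumer = new IdempotentConsumer((event) => createdOrders.push(event.orderNo));

const event = { orderNo: '10245', type: 'created' };
console.log(orderConsumer.handle(event)); // processed
console.log(orderConsumer.handle(event)); // duplicate
console.log(createdOrders.length);        // 1
```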
In Practice: Building an Event-Driven Shop-to-ERP Integration
In a real integration between shop and ERP, there is not one data flow but many, each with its own requirements for data freshness, ordering and volume. The job of the integration layer is to model these flows as events and route them through the broker. The typical setup follows a clear pattern that can be introduced step by step without interrupting ongoing operations.
- Cut events along business lines: define clear business events such as order-received or stock-changed instead of passing through technical database updates. Each event carries a unique key.
- Separate acceptance from processing: the entry point only writes the event to the queue and acknowledges immediately. The heavy processing toward the ERP runs asynchronously through workers.
- Define partitioning: partition order-critical flows such as status changes by the order number so that the sequence per order is preserved.
- Anchor idempotency: every consumer checks the idempotency key, so each event takes effect only once, even on duplicate delivery.
- Set up the failure path: retry with backoff and jitter, a dead letter queue with alerting and a procedure for controlled replay after the cause is fixed.
- Ensure observability: monitor queue depth, processing time and DLQ fill level to detect bottlenecks early, before they turn into incidents.
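Several of these steps can be combined in one small sketch: an entry point that only enqueues and acknowledges, a worker that drains the queue with a bounded number of attempts, and a metrics function for queue depth and DLQ fill level. Everything here is an assumption for illustration: the arrays stand in for a broker, retries happen immediately without backoff, and the status code 202 is one plausible way to signal acceptance without completed processing.

```javascript
const inbox = [];        // stands in for the broker queue
const deadLetters = [];  // stands in for the dead letter queue

// Entry point: builds the business event, enqueues it and returns at once.
function acceptOrder(order) {
  const event = {
    key: `order:${order.orderNo}:received`, // unique idempotency key
    partitionKey: order.orderNo,            // keeps events of one order in sequence
    type: 'order-received',
    payload: order,
  };
  inbox.push(event);
  return { status: 202, eventKey: event.key };
}

// Worker: processes queued events; after maxAttempts failures the event
// moves to the dead letter queue and no longer blocks later events.
function drain(handler, maxAttempts = 3) {
  while (inbox.length > 0) {
    const event = inbox.shift();
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        handler(event);
        break; // success: stop retrying
      } catch (err) {
        if (attempt === maxAttempts) {
          deadLetters.push({ event, reason: String(err) });
        }
      }
    }
  }
}

// Observability: the two numbers worth alerting on first.
function metrics() {
  return { queueDepth: inbox.length, dlqDepth: deadLetters.length };
}
```

A caller would invoke `acceptOrder` from the shop's order endpoint, run `drain` in a separate worker process and export `metrics()` to the monitoring system.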
This step-by-step approach makes it possible to first switch the most critical flow, such as order receipt, to event-driven and to follow up with the rest later. That such investments pay off is shown by a look at the cost side: IT teams spend more than a third of their time on integration projects, and custom integrations cost large enterprises considerable sums in annual labour effort on average (MuleSoft Connectivity Benchmark, 2024). A reusable, event-driven layer noticeably reduces this recurring effort. In over 50 integration projects (project experience), the setup described here has proven to be a resilient standard. If you want to weigh the difference between direct API coupling and a mediating layer, the article on REST API versus middleware provides a classification.
The question is rarely whether one system can call another, but what happens when the other system is not responding right now. A queue answers exactly this question by safely holding the event until the recipient is ready again.
For all its robustness, maturity remains a topic: only around 13 percent of companies consider themselves at a mature stage in adopting event-driven architectures (industry analysis, 2024). At the same time, this means that a cleanly implemented event-driven data flow can be a genuine differentiator. An accompanying integration consultation helps to align the order of the migration with your concrete processes and to size the architecture neither too small nor too large.