137 changes: 137 additions & 0 deletions drafts/messaging-first-architecture.md
---
title: "Messaging-First Architectures: Resilient Systems with Azure Service Bus"
date: "2025-06-24"
saveAsDraft: true
hideFromHashnodeCommunity: false
publishAs: gayatri-potawad
tags:
- Azure
- architecture
- software-architecture
- messaging
- motivation
- microservice
enableToc: true
cover: https://cdn.hashnode.com/res/hashnode/image/upload/v1750774772267/_rMXC2BJP.jpg?auto=format
---

# Messaging-First Architectures: Resilient Systems with Azure Service Bus


In one of my recent projects, I worked on a large-scale retail platform where nearly every critical business flow, from orders to inventory updates, relied on Azure Service Bus. This was my first dive into a messaging-first architecture on Azure.

This post is my attempt to capture what I learned and the design principles that shaped the system, and hopefully to help anyone walking a similar path, especially if you’re transitioning from synchronous REST-based APIs to asynchronous messaging.


## 1. Azure Service Bus

Azure Service Bus is a fully managed enterprise message broker that enables decoupled communication between services using queues and topics.
If you’ve worked with something like ActiveMQ, Kafka, or RabbitMQ, a lot will feel familiar, but as a managed Azure service it also brings features like auto-scaling, integration with Azure Functions, and built-in dead-letter queues.
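
To make the queue/topic distinction concrete, here is a toy in-memory model in plain Python. It is not the Azure SDK, and `ToyQueue` and `ToyTopic` are hypothetical names. A queue hands each message to exactly one of several competing consumers, while a topic gives every subscription its own copy of each message. Real Service Bus adds persistence, peek-lock settlement, and dead-lettering on top of these basic semantics.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional


class ToyQueue:
    """Each message is received by exactly one consumer (competing consumers)."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        # Once a consumer takes a message, no other consumer can see it.
        return self._messages.popleft() if self._messages else None


class ToyTopic:
    """Every subscription gets its own copy of each message (publish/subscribe)."""

    def __init__(self, subscriptions: Iterable[str]) -> None:
        self.subscriptions: Dict[str, ToyQueue] = {name: ToyQueue() for name in subscriptions}

    def send(self, message: str) -> None:
        # Fan out: each subscription behaves like an independent queue.
        for queue in self.subscriptions.values():
            queue.send(message)
```

Two workers calling `receive()` on the same queue split the work between them, whereas hypothetical `inventory` and `billing` subscriptions on a topic would each see every order message.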


## 2. Why & When Messaging-First?

In most of the systems I’ve worked on, HTTP APIs were the standard architectural approach: service A calls service B, often in a tightly coupled chain. That works fine for many workflows, especially when you need quick, direct responses. But in a recent project, we leaned into a messaging-first approach using Azure Service Bus. Instead of services calling each other directly, they communicated through messages, and that changed a lot.

It wasn’t about replacing REST, but about picking the right model for the problem.
Messaging brought clear benefits in areas like:
- Decoupling services so they could evolve independently.
- Smoothing out traffic spikes with queues.
- Handling retries and failures more gracefully.
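
A rough way to see the "smoothing out traffic spikes" effect: the sketch below assumes a consumer that can process a fixed number of messages per tick (a hypothetical simplification of real throughput) and lets the queue absorb whatever arrives beyond that.

```python
from typing import List, Tuple


def level_load(arrivals_per_tick: List[int], capacity_per_tick: int) -> Tuple[List[int], List[int]]:
    """Return (messages processed per tick, backlog left in the queue after each tick)."""
    backlog = 0
    processed: List[int] = []
    backlogs: List[int] = []
    for arrived in arrivals_per_tick:
        backlog += arrived                      # the queue accepts the whole burst
        done = min(backlog, capacity_per_tick)  # the consumer works at its own pace
        backlog -= done
        processed.append(done)
        backlogs.append(backlog)
    return processed, backlogs
```

With arrivals of `[5, 60, 40, 0, 0, 0]` and a capacity of 25, the consumer never processes more than 25 messages per tick; the burst shows up as a backlog (peaking at 50) that drains over the following ticks instead of overloading the downstream service.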

That said, messaging isn't a silver bullet. It introduces latency and adds complexity in tracking, ordering, and debugging.
But where it fits, especially in async-heavy workflows, it can make systems more resilient and scalable.

For me, messaging-first became less about abandoning APIs, and more about using the right tool where it made sense.


## 3. Designing Around the Bus

In a messaging-first architecture, the Service Bus becomes the backbone of your system. Services are designed to react to messages, rather than respond to requests.
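
One way to picture services that react to messages rather than respond to requests is a chain of handlers, each publishing a follow-up event. The sketch below is a single-process toy: `InMemoryBus`, `wire_services`, and the event names `OrderPlaced`, `StockReserved`, and `PaymentCaptured` are hypothetical and not taken from the project. In a real system each handler would sit behind its own Service Bus subscription.

```python
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque, List, Tuple

Handler = Callable[[dict], None]


class InMemoryBus:
    """Single-process stand-in for topics and subscriptions: no persistence, no locks."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Tuple[str, dict]] = deque()
        self.delivered: List[str] = []  # event types in delivery order, for inspection

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Publishing only enqueues; the publisher never waits for a consumer.
        self._pending.append((event_type, payload))

    def run(self) -> None:
        # Deliver until nothing is pending; handlers may publish follow-up events.
        while self._pending:
            event_type, payload = self._pending.popleft()
            self.delivered.append(event_type)
            for handler in self._handlers[event_type]:
                handler(payload)


def wire_services(bus: InMemoryBus) -> None:
    """Hypothetical services: each reacts to one event and publishes the next."""
    bus.subscribe("OrderPlaced", lambda p: bus.publish("StockReserved", p))
    bus.subscribe("StockReserved", lambda p: bus.publish("PaymentCaptured", p))
```

Publishing `OrderPlaced` for a hypothetical order and calling `run()` delivers `OrderPlaced`, `StockReserved`, and `PaymentCaptured` in that order. No service calls another directly, so adding, say, a notification service means subscribing one more handler rather than changing the order service.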

We might ask: “If everything goes through Service Bus, isn’t that a single point of failure?”

The reality is, Azure Service Bus (especially on the Premium tier) is built for high availability.
It’s redundant across zones, fully managed, and handles all the scaling, patching, and infrastructure stuff behind the scenes.
You’re not babysitting a broker; Microsoft does that for you.

That said, putting messaging at the center of your system does mean you have to take it seriously.
Things like Dead Letter Queues, lock timeouts, or message retries can become blind spots if you’re not monitoring them properly.
The team had to invest in observability early (logs, alerts, correlation IDs) to make sure we weren’t flying blind.

So yes, Service Bus is central. But with the right setup, it’s not fragile. In fact, it ended up being the most reliable part of the stack.

Overview of a simple ordering service with minimal processes

![Overview of a simple ordering service with minimal processes ](https://cdn.hashnode.com/res/hashnode/image/upload/v1750763150332/cwv8WTrXn.png?auto=format)


## 4. DLQs Done Right

Dead-letter queues (DLQs) are where messages end up when something goes wrong: too many delivery attempts, serialization issues, or unhandled exceptions.
In our case, DLQs turned out to be a quiet but critical signal.

We started seeing messages pile up in the DLQ with reasons like `MaxDeliveryCountExceeded`. At first glance, it wasn’t obvious what the problem was; the functions were technically healthy. But when we dug deeper, we realized that Azure Service Bus was redelivering messages because our functions were simply taking too long to finish.
Not because they failed, but because they slowed down under high CPU load.

The root cause? Several functions running in the same App Service Plan were fighting for compute.
CPU was hitting 100%, and as a result, some functions would run past the 5-minute lock window, so Service Bus treated those messages as unprocessed and delivered them again.
Since Service Bus didn’t emit a clear diagnostic log indicating a lock timeout, we had to correlate it ourselves using App Insights and the DLQ metadata.

The fix: we tuned the App Service Plan to auto-scale more aggressively, aiming to bring the CPU load down within 10 minutes (two lock timeouts) instead of letting it hover for 30 minutes (six lock timeouts).
Once that was in place, the DLQ entries dropped, and message flow stabilized.

Moral of the story: DLQs don’t just catch errors; they reveal when your system is struggling.
They can help you fine-tune not just code but scaling policies too.
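
The failure mode above can be reproduced with a toy model of peek-lock redelivery. This is a simulation, not the Service Bus SDK. It assumes that every attempt made while the system is overloaded runs longer than the lock, that a lost lock makes the message available again immediately, and that the message is dead-lettered once its delivery count reaches `max_delivery_count` (mirroring Service Bus's `maxDeliveryCount`, which defaults to 10). `deliver_until_settled`, `overloaded_for`, and the 7-minute and 1-minute run times are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Message:
    body: str
    delivery_count: int = 0  # mirrors the broker-maintained delivery count


def deliver_until_settled(
    msg: Message,
    lock_minutes: float,
    processing_minutes_at: Callable[[float], float],
    max_delivery_count: int = 10,
) -> Tuple[str, float]:
    """Redeliver msg until it completes or is dead-lettered; return (outcome, minute)."""
    clock = 0.0
    while True:
        msg.delivery_count += 1
        duration = processing_minutes_at(clock)
        if duration <= lock_minutes:
            # The handler finished while still holding the lock: message completed.
            return "completed", clock + duration
        # The lock expired first: the message becomes available again at expiry.
        clock += lock_minutes
        if msg.delivery_count >= max_delivery_count:
            return "dead-lettered", clock


def overloaded_for(minutes: float) -> Callable[[float], float]:
    """Hypothetical load profile: 7-minute runs during the overload, 1-minute runs after."""
    return lambda t: 7.0 if t < minutes else 1.0
```

With a 5-minute lock, a 10-minute overload costs two failed deliveries before the message completes on the third, while a 30-minute overload costs six. With the default limit of 10 the message still gets through, but if `maxDeliveryCount` were set to 5 (a hypothetical value), the longer overload would dead-letter it. In reality the handler from an expired attempt may keep running, so the work can also be executed twice.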


## 5. Retry Strategies

Azure Service Bus provides built-in retry handling, but you can (and often should) tune it.

- `maxDeliveryCount` controls how many delivery attempts a message gets before it is moved to the DLQ (the default is 10).
- Set `autoComplete` to false so that a message is completed only after it has been processed successfully.
- Use custom retry queues or scheduled retries for long-tail errors.

Coming from Java, this felt a bit like using Spring Retry, but without annotations: you control retries in your message loop or function binding.
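
The three retry knobs above can be combined into a single settlement decision per message. The sketch below is an SDK-free illustration: `TransientError`, `PermanentError`, `decide`, the `retry_attempt` counter (which you would carry in a custom application property), and the backoff values are all hypothetical. In the Python `azure-servicebus` SDK, the outcomes would map roughly to `ServiceBusReceiver.complete_message`, `ServiceBusReceiver.dead_letter_message`, and `ServiceBusSender.schedule_messages` followed by completing the original message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that retrying will not fix (invalid payload)."""


@dataclass
class Decision:
    action: str  # "complete", "schedule_retry" or "dead_letter"
    retry_at: Optional[datetime] = None
    next_attempt: int = 0
    reason: str = ""


def decide(handler: Callable[[str], None], body: str, retry_attempt: int,
           now: Optional[datetime] = None, max_retries: int = 3,
           base_delay: timedelta = timedelta(seconds=30)) -> Decision:
    """Run the handler and decide how to settle the message (hypothetical policy).

    Any other exception propagates; the unsettled message is then redelivered
    by Service Bus and counts toward maxDeliveryCount.
    """
    now = now or datetime.now(timezone.utc)
    try:
        handler(body)
        return Decision("complete")  # settle only after success (auto-complete off)
    except PermanentError as exc:
        return Decision("dead_letter", reason=f"permanent failure: {exc}")
    except TransientError as exc:
        if retry_attempt >= max_retries:
            return Decision("dead_letter", reason=f"retries exhausted: {exc}")
        # Exponential backoff for long-tail errors: 30 s, 60 s, 120 s, ...
        delay = base_delay * (2 ** retry_attempt)
        return Decision("schedule_retry", retry_at=now + delay, next_attempt=retry_attempt + 1)
```

Scheduling a delayed copy, rather than abandoning the message, keeps slow-to-recover errors from burning through `maxDeliveryCount` in quick succession.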


## 6. Observability + Fail-Safes

A messaging-first system only works if you can see what’s happening.
- Enable diagnostic settings to stream logs and metrics to Log Analytics.
- Add Application Insights and propagate correlation IDs.
- Include message IDs and payloads (truncated!) in logs for traceability.
- Track processing times and delivery counts to detect slow consumers.
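
Here is a minimal sketch of the logging side of that list, assuming JSON-formatted log lines and a 200-character payload limit (both hypothetical choices). The correlation ID is reused from the incoming message if present and generated at the edge of the system otherwise; with the Python SDK you would set it through the `correlation_id` argument of `ServiceBusMessage`. The function names, logger name, and field names are illustrative.

```python
import json
import logging
import uuid
from typing import Optional

logger = logging.getLogger("order-processor")  # hypothetical logger name
MAX_PAYLOAD_CHARS = 200  # hypothetical limit to keep log lines small


def ensure_correlation_id(incoming: Optional[str]) -> str:
    """Propagate the caller's correlation ID, or start a new trace at the system edge."""
    return incoming or str(uuid.uuid4())


def truncate(payload: str, limit: int = MAX_PAYLOAD_CHARS) -> str:
    if len(payload) <= limit:
        return payload
    return payload[:limit] + f"...[+{len(payload) - limit} chars]"


def log_processed(message_id: str, correlation_id: str, delivery_count: int,
                  elapsed_ms: float, payload: str) -> str:
    """Emit one structured line per processed message and return it."""
    line = json.dumps({
        "event": "message_processed",
        "message_id": message_id,
        "correlation_id": correlation_id,
        "delivery_count": delivery_count,  # values above 1 hint at retries or slow consumers
        "elapsed_ms": elapsed_ms,
        "payload": truncate(payload),
    }, sort_keys=True)
    logger.info(line)
    return line
```

Because every line carries both the message ID and the correlation ID, a query in Log Analytics or Application Insights can stitch one business flow back together across services.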

When a message fails silently, it’s hard to debug unless you’ve wired in visibility.

⚠️ Note of Caution:
Don’t treat observability as an afterthought - it’s a classic case of spoiling the ship for a ha’porth of tar. Skimping on logging and telemetry might save a little now, but it'll cost far more when failures strike and you're flying blind.




## 7. Gotchas to Avoid

Even with a solid design, messaging-first systems have a few sharp edges. Here are the ones to watch for (so you hopefully don’t have to learn them the hard way):

- Ignoring DLQs

  It’s easy to treat DLQs like a trash bin, but they often surface subtle bugs, timeouts, or performance issues we might otherwise miss. We learned to monitor them as a first-class signal.

- Sending Large Messages

  On the Standard tier, messages larger than 256 KB are rejected when you send them (Premium supports much larger messages), and if that send error isn’t handled, the failure can easily go unnoticed. While we didn’t hit this ourselves, it’s a common pitfall.
  If you’re close to the limit, compress the payload or store large data in blob storage and just pass a reference.

- Lock Timeouts

  By default, a message lock lasts one minute (configurable per queue or subscription, up to a maximum of five minutes). If your function or processor takes longer, Azure assumes it failed and redelivers the message. We found that implementing lock renewal avoided these duplicate executions and made processing more efficient.
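
For the large-message item, storing the data in blob storage and passing only a reference is usually called the claim-check pattern. The sketch below assumes an in-memory stand-in for blob storage (`InMemoryBlobStore`, hypothetical) and a 200 KB body threshold, an assumed headroom under the 256 KB Standard-tier limit, since headers and application properties also count toward the message size.

```python
import hashlib
import json
from typing import Any, Dict

# Assumed threshold: below the 256 KB Standard-tier limit, leaving headroom
# for headers and application properties, which also count toward the size.
MAX_BODY_BYTES = 200 * 1024


class InMemoryBlobStore:
    """Hypothetical stand-in for blob storage, keyed by content hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def to_message_body(payload: Any, store: InMemoryBlobStore, limit: int = MAX_BODY_BYTES) -> bytes:
    """Producer side: send small payloads inline, park large ones and send a reference."""
    inline = json.dumps({"kind": "inline", "payload": payload}).encode("utf-8")
    if len(inline) <= limit:
        return inline
    key = store.put(json.dumps(payload).encode("utf-8"))
    return json.dumps({"kind": "blob_ref", "key": key}).encode("utf-8")


def from_message_body(body: bytes, store: InMemoryBlobStore) -> Any:
    """Consumer side: resolve the reference if the payload was stored out of band."""
    envelope = json.loads(body)
    if envelope["kind"] == "inline":
        return envelope["payload"]
    return json.loads(store.get(envelope["key"]))
```

In a real system the reference would be a blob name or URL, and you would also want a cleanup policy for blobs whose messages have already been processed.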
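
For the lock-timeout item, the idea behind renewal is to keep extending the lock from a background heartbeat while the handler is still working. The toy below uses a fake lock and a thread (`FakeLock` and `process_with_renewal` are hypothetical, and the intervals are shrunk to fractions of a second so it runs quickly). With the real tooling you would normally rely on built-in support, such as `AutoLockRenewer` in the Python `azure-servicebus` package or `maxAutoLockRenewalDuration` in the Azure Functions `host.json`, rather than writing your own.

```python
import threading
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class FakeLock:
    """Hypothetical stand-in for a message lock that expires unless renewed."""

    def __init__(self, lock_seconds: float) -> None:
        self.lock_seconds = lock_seconds
        self.renewals = 0
        self._mutex = threading.Lock()
        self._expires_at = time.monotonic() + lock_seconds

    def renew(self) -> None:
        with self._mutex:
            self._expires_at = time.monotonic() + self.lock_seconds
            self.renewals += 1

    def is_valid(self) -> bool:
        with self._mutex:
            return time.monotonic() < self._expires_at


def process_with_renewal(lock: FakeLock, work: Callable[[], T], renew_every: float) -> Tuple[T, bool]:
    """Run work() while a background thread keeps renewing the lock."""
    stop = threading.Event()

    def renewer() -> None:
        # Event.wait returns True once stop is set, which ends the loop promptly.
        while not stop.wait(renew_every):
            lock.renew()

    thread = threading.Thread(target=renewer, daemon=True)
    thread.start()
    try:
        result = work()
    finally:
        stop.set()
        thread.join()
    return result, lock.is_valid()
```

Without the heartbeat, a 0.5-second job would outlive a 0.2-second lock, which is exactly the situation that leads to redelivery and duplicate execution. Renewal does not make the handler faster; it only stops the broker from handing the same message to someone else in the meantime.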



## Wrapping Up

This project really changed the way I think about service communication. Messaging-first isn’t just about queues and topics. It’s about designing for resilience, decoupling, and scale from day one.

But here’s the nuance: messaging-first doesn’t mean messaging-only.

Some interactions are still best done synchronously, like fetching user details for a UI in real time or validating input. The real strength comes from knowing where async fits best: background jobs, cross-system workflows, retries, or anything that shouldn’t block the user.

Systems can be hybrid. It’s not one or the other. It’s about picking the right tool for the job.

If you're building distributed systems on Azure, or transitioning from a synchronous mindset like I was, I hope this gives you a good head start.