Merged
67 commits
da0eea9
dkim check
davidhuser Mar 12, 2026
e27cc9b
Merge pull request #5 from davidhuser/feat/dkim-classification
davidhuser Mar 12, 2026
fbb5fe4
confirm ms365 via getuserrealm.srf
davidhuser Mar 12, 2026
5093d50
fetch mun. email via bfs, wikidata, scrape and review+override
davidhuser Mar 14, 2026
34dd56c
fix zh
davidhuser Mar 14, 2026
a9f928a
override unknown muni.
davidhuser Mar 14, 2026
c6cbfb3
weighting v1
davidhuser Mar 15, 2026
82975fc
wip v2
davidhuser Mar 15, 2026
aa643af
wip v2 flattened
davidhuser Mar 15, 2026
208b7bf
consider txt verification as confirmation signal
davidhuser Mar 15, 2026
12df49c
autodiscover, map
davidhuser Mar 15, 2026
631741d
fix typo3 scrape
davidhuser Mar 15, 2026
2c7dbd3
fix empty tooltips by storing raw spf
davidhuser Mar 15, 2026
5bb6d85
classify spf_ip as confirmation only
davidhuser Mar 15, 2026
b9b8e4d
logging
davidhuser Mar 16, 2026
c4bbe26
remove double mxlookup
davidhuser Mar 16, 2026
70d17ef
overnight run
davidhuser Mar 17, 2026
59b75c6
logging improvements
davidhuser Mar 17, 2026
4f6ac3c
update readme
davidhuser Mar 17, 2026
e9672f1
simplify weighted scores in classifier
davidhuser Mar 17, 2026
a312c03
map colors for dev
davidhuser Mar 17, 2026
058ea70
map colors
davidhuser Mar 18, 2026
1db30f9
legend fix
davidhuser Mar 18, 2026
862a365
add new view for ms365 tenant
davidhuser Mar 18, 2026
26cd52e
refactor frontend files
davidhuser Mar 18, 2026
4cdc574
menu
davidhuser Mar 18, 2026
cc4f94d
navigation
davidhuser Mar 18, 2026
ce0b120
improve classifier
davidhuser Mar 18, 2026
632174c
improve resolver robustness
davidhuser Mar 18, 2026
5f5e065
improve classifier and fix gaps in combinations
davidhuser Mar 18, 2026
d151b93
follow redirects
davidhuser Mar 19, 2026
bad8bf5
fix malformed domain b/c resolvers will cough
davidhuser Mar 19, 2026
f5593fa
boost signals for independents too
davidhuser Mar 19, 2026
39df85e
add canton code to tooltip
davidhuser Mar 19, 2026
b0fb5ac
improve domain -> country detection for gateways
davidhuser Mar 19, 2026
a01b1f5
improve resolver to reduce double cname lookups and logging
davidhuser Mar 19, 2026
608473a
log to file
davidhuser Mar 19, 2026
0954cc8
reduce dns calls for spf and spf_raw
davidhuser Mar 19, 2026
94fb700
data run
davidhuser Mar 19, 2026
4f51ae3
reduce cdns to one
davidhuser Mar 19, 2026
c9e5e83
preview pages
davidhuser Mar 19, 2026
ea6df99
preview pages
davidhuser Mar 19, 2026
0649c10
images
davidhuser Mar 19, 2026
4d4cca9
readme
davidhuser Mar 19, 2026
6346c71
tests cleanup
davidhuser Mar 19, 2026
b66108b
boost dkim behind gateways, issue #13
davidhuser Mar 20, 2026
f423971
improve tooltip ux
davidhuser Mar 20, 2026
f35578e
css cleanup
davidhuser Mar 20, 2026
3e80e2a
git hash into info-bar
davidhuser Mar 20, 2026
c76f4fd
update fork deployments in readme
davidhuser Mar 20, 2026
44898a8
lower base when no mx found at all in independents
davidhuser Mar 23, 2026
d0eb6af
bump mx-only to 0.8
davidhuser Mar 23, 2026
a937f8d
introduce autodiscovery to rule chain
davidhuser Mar 23, 2026
a026acb
display classification rule hit count
davidhuser Mar 23, 2026
bdd66cf
remove 0-hit rules
davidhuser Mar 23, 2026
dd3b96f
a few more rules and removed those with 0 counts
davidhuser Mar 23, 2026
6de73af
fix email detection for user(at)domain.ch and add override for ssl ex…
davidhuser Mar 24, 2026
b16aaf0
handle ssl cert errors
davidhuser Mar 24, 2026
e3753f0
overrides
davidhuser Mar 24, 2026
1dfa606
colors
davidhuser Mar 24, 2026
8cd2614
switch rule to prefer autodiscover in rule summary
davidhuser Mar 24, 2026
7fe98f6
github release workflow to tag on push to main
davidhuser Mar 24, 2026
26db692
colors
davidhuser Mar 24, 2026
61309b3
run analysis module
davidhuser Mar 24, 2026
9c7d25f
detect mx.microsoft
davidhuser Mar 24, 2026
1432d4f
images
davidhuser Mar 24, 2026
dd4c898
format
davidhuser Mar 24, 2026
75 changes: 0 additions & 75 deletions .github/workflows/nightly.yml

This file was deleted.

26 changes: 26 additions & 0 deletions .github/workflows/preview.yml
```yaml
name: Deploy preview

on:
  push:
    branches: [dev]

jobs:
  preview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Create Cloudflare Pages project (if needed)
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          command: pages project create mxmap-ch-preview --production-branch=dev
        continue-on-error: true

      - name: Deploy to Cloudflare Pages
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          command: pages deploy . --project-name=mxmap-ch-preview
```
32 changes: 32 additions & 0 deletions .github/workflows/release.yml
```yaml
name: Create Release

on:
  push:
    branches: [main]

permissions:
  contents: write

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0

      - name: Determine tag
        id: tag
        run: |
          base="v$(date -u +%Y.%m.%d)"
          existing=$(git tag -l "${base}*" | wc -l | tr -d ' ')
          if [ "$existing" -eq 0 ]; then
            echo "tag=${base}" >> "$GITHUB_OUTPUT"
          else
            echo "tag=${base}.${existing}" >> "$GITHUB_OUTPUT"
          fi

      - name: Create release
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh release create "${{ steps.tag.outputs.tag }}" --generate-notes
```
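
The `Determine tag` step gives the first release of a day the tag `vYYYY.MM.DD` and later releases a numeric suffix based on how many matching tags already exist. A standalone sketch of that suffix logic (hypothetical re-implementation for illustration only; the workflow counts existing git tags, whereas here the "existing tags" are passed as arguments):

```shell
#!/bin/sh
# Illustrative stand-in for the workflow's tag step: args after the base
# simulate the tags that already match "${base}*" in the repository.
next_tag() {
  base="$1"; shift
  existing="$#"                      # number of already-existing tags
  if [ "$existing" -eq 0 ]; then
    echo "$base"                     # first release of the day
  else
    echo "${base}.${existing}"       # append a numeric suffix
  fi
}

next_tag v2026.03.24                                  # -> v2026.03.24
next_tag v2026.03.24 v2026.03.24                      # -> v2026.03.24.1
next_tag v2026.03.24 v2026.03.24 v2026.03.24.1        # -> v2026.03.24.2
```

Note that a second release on the same day reuses the count of existing tags, so tags stay unique without any state outside git itself.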
35 changes: 35 additions & 0 deletions CLAUDE.md
# CLAUDE.md

MXmap (mxmap.ch) — an automated system that classifies where ~2100 Swiss municipalities host their email by fingerprinting DNS records and network infrastructure. Results are displayed on an interactive Leaflet map.

## Commands

```bash
# Setup
uv sync --group dev

# Run pipeline (two stages, in order)
uv run resolve-domains # Stage 1: resolve municipality domains
uv run classify-providers # Stage 2: classify email providers

# Test
uv run pytest --cov --cov-report=term-missing # 90% coverage threshold enforced
uv run pytest tests/test_probes.py -k test_mx # single test
uv run pytest tests/test_data_validation.py -v # data validation (requires JSON files)

# Lint & format
uv run ruff check src tests
uv run ruff format src tests
```

### Data Files

- `overrides.json` — manual classification corrections with reasons
- `municipality_domains.json` — intermediate output from resolve stage
- `data.json` — final classifications served to the frontend


### What not to do

- Modify any data files directly without approval (especially `data.json` and `municipality_domains.json`, which are generated by the pipeline)
- Run the pipeline directly, since it can time out and hide warnings
100 changes: 53 additions & 47 deletions README.md
# MXmap — Email Providers of Swiss Municipalities

[![CI](https://github.com/davidhuser/mxmap/actions/workflows/ci.yml/badge.svg)](https://github.com/davidhuser/mxmap/actions/workflows/ci.yml)

An interactive map showing where Swiss municipalities host their email — whether with US hyperscalers (Microsoft, Google, AWS) or Swiss providers or other solutions.


## How it works

The data pipeline has two stages:

1. **Resolve domains** — Fetches all ~2100 Swiss municipalities from Wikidata and the BFS (Swiss Statistics) API, applies manual overrides, scrapes municipal websites for email addresses, guesses domains from municipality names, and verifies candidates with MX lookups. Scores source agreement to pick the best domain. Outputs `municipality_domains.json`.

2. **Classify providers** — For each resolved domain, looks up all MX hosts, pattern-matches them, then runs 10 concurrent probes (SPF, DKIM, DMARC, Autodiscover, CNAME chain, SMTP banner, Tenant, ASN, TXT verification, SPF IP). Aggregates weighted evidence, computes confidence scores (0–100). Outputs `data.json` (full) and `data.min.json` (minified for the frontend).
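
The probe fan-out in stage 2 might be sketched roughly like this (the probe functions below are hypothetical stand-ins named after the list above; the real implementations do DNS and SMTP work and return weighted evidence):

```python
import asyncio

# Hypothetical stand-ins for two of the ten probes; illustrative only.
async def probe_spf(domain: str) -> tuple[str, str]:
    return ("spf", f"checked {domain}")

async def probe_dkim(domain: str) -> tuple[str, str]:
    return ("dkim", f"checked {domain}")

PROBES = [probe_spf, probe_dkim]  # the real pipeline fans out 10 probes

async def run_probes(domain: str) -> dict[str, str]:
    # All probes for one domain run concurrently, so a slow SMTP banner
    # check does not block the DNS-based probes.
    results = await asyncio.gather(*(probe(domain) for probe in PROBES))
    return dict(results)

evidence = asyncio.run(run_probes("example.ch"))
```

Running the probes with `asyncio.gather` keeps per-domain latency close to the slowest single probe rather than the sum of all ten.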

```mermaid
flowchart TD
    subgraph resolve ["1 · Resolve domains"]
        bfs[/"BFS Statistics API"/] --> merge["Merge ~2100 municipalities"]
        wikidata[/"Wikidata SPARQL"/] --> merge
        overrides[/"overrides.json"/] --> per_muni
        merge --> per_muni["Per municipality"]
        per_muni --> scrape["Scrape website for<br/>email addresses"]
        per_muni --> guess["Guess domains<br/>from name"]
        scrape --> mx_verify["MX lookup to<br/>verify domains"]
        guess --> mx_verify
        mx_verify --> score["Score source<br/>agreement"]
    end

    score --> domains[("municipality_domains.json")]
    domains --> classify_in

    subgraph classify ["2 · Classify providers"]
        classify_in["Per unique domain"] --> mx_lookup["MX lookup<br/>(all hosts)"]
        mx_lookup --> mx_match["Pattern-match MX<br/>+ detect gateway"]
        mx_match --> concurrent["10 concurrent probes<br/>SPF · DKIM · DMARC<br/>Autodiscover · CNAME chain<br/>SMTP · Tenant · ASN<br/>TXT verification · SPF IP"]
        concurrent --> aggregate["Aggregate weighted<br/>evidence"]
        aggregate --> vote["Primary vote<br/>+ confidence scoring"]
    end

    vote --> data[("data.json + data.min.json")]
    data --> frontend["Leaflet map<br/>mxmap.ch"]
```

## Classification system

See [`classifier.py`](src/mail_sovereignty/classifier.py) for the full implementation details. In summary, we use a weighted evidence system where each probe contributes signals of varying strength toward different provider classifications.
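
As a rough sketch of such a weighted-evidence vote (the signal names, weights, and example MX/SPF hints below are invented for illustration; the real signals and weights live in `classifier.py`):

```python
from collections import defaultdict

# Each probe emits (provider, weight) signals; stronger probes carry
# more weight. These example values are illustrative only.
signals = [
    ("microsoft", 0.8),    # e.g. MX points at *.mail.protection.outlook.com
    ("microsoft", 0.3),    # e.g. SPF includes spf.protection.outlook.com
    ("swiss_hoster", 0.2), # e.g. a weak ASN hint
]

def vote(signals):
    # Sum weights per provider, pick the highest, and express its share
    # of the total evidence as a 0-100 confidence score.
    scores = defaultdict(float)
    for provider, weight in signals:
        scores[provider] += weight
    provider = max(scores, key=scores.get)
    confidence = round(100 * scores[provider] / sum(scores.values()))
    return provider, confidence

provider, confidence = vote(signals)  # -> ("microsoft", 85)
```

The share-of-total form means that a single strong signal with no contradicting evidence yields high confidence, while conflicting probes pull the score down even when a winner emerges.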


## Quick start

```bash
uv sync

# Stage 1: resolve municipality domains
uv run resolve-domains

# Stage 2: classify email providers
uv run classify-providers

# Serve the map locally
python -m http.server
```

## Development
```bash
uv sync --group dev

# Run tests (90% coverage threshold enforced)
uv run pytest --cov --cov-report=term-missing

# Lint & format
uv run ruff check src tests
uv run ruff format src tests
```


## Related work

* [hpr4379 :: Mapping Municipalities' Digital Dependencies](https://hackerpublicradio.org/eps/hpr4379/index.html)
* If you know of other similar projects, please open an issue or submit a PR to add them here!

## Forks

* DE https://b42labs.github.io/mxmap/
* NL https://mxmap.nl/
* NO https://kommune-epost-norge.netlify.app/
* BE https://mxmap.be/
* EU https://livenson.github.io/mxmap/
* LV https://securit.lv/mxmap
* See also the forks of this repository


## Contributing

If you spot a misclassification, please open an issue with the BFS number and the correct provider.
For municipalities where automated detection fails, corrections can be added to [`overrides.json`](overrides.json).

## Licence

[MIT](LICENCE)
25 changes: 25 additions & 0 deletions css/content.css
/* content.css — content page styles (datenschutz, impressum) */
body { color: #222; background: #fff; }

main {
max-width: 720px; margin: 0 auto; padding: 32px 20px 48px;
line-height: 1.7; font-size: 15px;
}
h1 { font-size: 22px; font-weight: 700; color: #1a1a2e; margin-bottom: 24px; }
h2 { font-size: 17px; font-weight: 600; color: #1a1a2e; margin-top: 28px; margin-bottom: 8px; }
p { margin-bottom: 12px; }
ul { margin: 0 0 12px 20px; }
li { margin-bottom: 4px; }
a { color: #2563eb; text-decoration: none; }
a:hover { text-decoration: underline; }
code {
font-family: ui-monospace, monospace; font-size: 13px;
background: #f3f4f6; padding: 1px 5px; border-radius: 3px;
}

footer {
max-width: 720px; margin: 0 auto; padding: 24px 20px;
border-top: 1px solid #e2e4e8; font-size: 13px; color: #888;
}
footer a { color: #888; }
footer a:hover { color: #555; }