Add models for labels and assign first labels to buckets by ksy36 · Pull Request #219 · MozillaSecurity/WebCompatManager

ksy36 · 2026-05-26T01:08:28Z

This PR adds automatic labeling based on the two domain lists we have at the moment (worldcup2026 and nsfw). The labeling command runs when a bucket is created and on each domain list update.

jgraham · 2026-05-27T13:25:59Z

+class Label(models.Model):
+    name: models.CharField = models.CharField(max_length=50, unique=True)
+    description: models.TextField = models.TextField(blank=True, default="")
+    domain_source: models.OneToOneField = models.OneToOneField(


So is the idea that this would be NULL if we want to add arbitrary labels that aren't defined by a domain list?

Yeah, it would be NULL for manually created labels. I added it mainly so that we have a link back to the source if we ever want to bulk delete or merge labels based on their sources.

jgraham · 2026-05-27T13:44:47Z



+@receiver(post_save, sender=Bucket)
+def Bucket_save(sender, instance, created, **kwargs):


This means that every time we save any bucket we rerun labeling for that bucket irrespective of whether the domain property on the bucket actually changed (which AIUI is the only case at present where the automatic labels might change). That doesn't necessarily seem bad if we planned to add the ability to auto-label on more than just the domain, but the examples I can think of also depend on the reports in the bucket (e.g. if we wanted to label something "Android" if 80% of the reports were on Android or "Japan" if some reports came from Japan), and I don't think we'd go through this codepath if we were just adding entries to a bucket rather than updating the bucket properties?

The idea I had is to run this only on bucket creation (there is a check below that returns early if not created). We don't really edit the bucket domain at the moment, so I didn't add it for every save/change.

So with this PR there are two ways a bucket can receive labels:

on creation (if a given source list already exists)

on domain list creation / update (this runs call_command("label_buckets", source_name=name))
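The two labeling paths described above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the PR's code: `DOMAIN_LISTS`, `label_bucket`, `on_bucket_saved`, and `on_domain_list_updated` are hypothetical names, the `Bucket` class is an in-memory stand-in for the Django model, and using the source name as the label name is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the stored domain lists.
DOMAIN_LISTS = {
    "nsfw": {"adult.example"},
    "worldcup2026": {"tickets.example"},
}


@dataclass
class Bucket:
    """Stand-in for the real model: a domain plus the labels attached so far."""
    domain: str
    labels: set = field(default_factory=set)


def label_bucket(bucket, source_names):
    """Attach the label of every given source whose list contains the bucket's domain."""
    for name in source_names:
        if bucket.domain in DOMAIN_LISTS.get(name, set()):
            bucket.labels.add(name)  # assumption: label name == source name


def on_bucket_saved(bucket, created):
    """Path 1: mimics a post_save handler that returns early unless the bucket is new."""
    if not created:
        return
    label_bucket(bucket, list(DOMAIN_LISTS))


def on_domain_list_updated(source_name, domains, buckets):
    """Path 2: a list was created or updated, so relabel every bucket for that source."""
    DOMAIN_LISTS[source_name] = set(domains)
    for bucket in buckets:
        label_bucket(bucket, [source_name])
```

Note that this sketch only ever adds labels; whether a label should be removed when a domain drops out of a list is not covered here.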

> e.g. if we wanted to label something "Android" if 80% of the reports were on Android or "Japan" if some reports came from Japan

This could probably run in a similar manner: on bucket creation, plus a scheduled run a few times a day over a set of rules that we define in some config?

I think rules that don't depend on fixed properties of the bucket should probably be applied at the point that the bucket is updated. Making everything async makes it hard to reason about the system.
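To make the "apply at update time" suggestion concrete, here is a rough sketch of the two example rules from this thread being re-evaluated synchronously whenever a report is added. Everything in it is an assumption for illustration: the `Bucket` class, the `add_report` method, the `os` and `country` report fields, and reading "80%" as "at least 80%".

```python
from collections import Counter

ANDROID_SHARE_THRESHOLD = 0.8  # assumption: "80%" means at least 80% of reports


def report_based_labels(reports):
    """Derive labels that depend on the reports in a bucket, not on its fixed properties."""
    labels = set()
    if not reports:
        return labels
    os_counts = Counter(report.get("os") for report in reports)
    if os_counts["Android"] / len(reports) >= ANDROID_SHARE_THRESHOLD:
        labels.add("Android")
    if any(report.get("country") == "JP" for report in reports):
        labels.add("Japan")
    return labels


class Bucket:
    """In-memory stand-in for a bucket that recomputes report-based labels on update."""

    def __init__(self):
        self.reports = []
        self.report_labels = set()

    def add_report(self, report):
        # Recompute in the same code path that changes the data, so the labels
        # are never stale and no scheduled job is needed for these rules.
        self.reports.append(report)
        self.report_labels = report_based_labels(self.reports)
```

Because the labels are recomputed from scratch on every update, a label such as "Android" also disappears again once the share of Android reports falls below the threshold.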

jgraham · 2026-05-27T16:37:49Z

+        source_names = get_label_source_names(source_name)
+
+        if bucket_id is not None:
+            for mapped_source_name in source_names:


It seems like we could make these queries operate over all the labels at once rather than doing them one at a time (but not a blocker).
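One way to read this suggestion, sketched in plain Python rather than as ORM queries: build a single domain-to-labels index from all sources, then resolve every bucket in one pass instead of looping over the sources one at a time. The function names and the input shapes (a mapping of source name to domains, and of bucket id to domain) are hypothetical and not taken from the PR.

```python
from collections import defaultdict


def build_domain_index(domain_lists):
    """Map each domain to the set of source names whose list contains it."""
    index = defaultdict(set)
    for source_name, domains in domain_lists.items():
        for domain in domains:
            index[domain].add(source_name)
    return index


def labels_for_buckets(bucket_domains, domain_lists):
    """Resolve the labels for all buckets in one pass over a shared index."""
    index = build_domain_index(domain_lists)
    return {
        bucket_id: set(index.get(domain, ()))
        for bucket_id, domain in bucket_domains.items()
    }
```

In the database the equivalent would presumably be a single join or an IN-style filter across all sources, but the exact query depends on the schema in the PR.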

jgraham · 2026-05-27T17:10:06Z

    # store the domain outside the signature only if the signature includes
    # a non-regex domain symptom and no other symptoms (for quick exclusion)
    domain: models.CharField = models.CharField(max_length=255, null=True)
+    domain_normalized: models.CharField = models.CharField(


On the BigQuery side we never started storing this; instead we just have a routine that knows how to make the normalized comparisons, which makes it easier to change things in the future. Storing a normalized domain is probably fine, but it does mean that something that's basically business logic ends up directly in the data layer.

Yeah, I couldn't come up with a clean way to do the comparison-time normalization across both SQLite and MySQL without making the join query pretty awkward, so I decided to store it.
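For illustration, the trade-off discussed here looks roughly like this. The normalization rules below (trim, lowercase, drop a trailing dot and a leading "www.") are assumptions for the sketch, not the rules the PR implements, and `normalize_domain` is a hypothetical helper name.

```python
def normalize_domain(domain):
    """Return a canonical form of a domain for comparison, or None if there is none."""
    if domain is None:
        return None
    normalized = domain.strip().lower().rstrip(".")
    if normalized.startswith("www."):
        normalized = normalized[len("www."):]
    return normalized or None


def same_domain(left, right):
    """Comparison-time normalization: nothing is stored, both sides are normalized on use."""
    normalized_left = normalize_domain(left)
    return normalized_left is not None and normalized_left == normalize_domain(right)
```

Storing the result of `normalize_domain` in a column such as domain_normalized lets the database join on plain equality, at the cost of having to backfill the column whenever the rules change. Normalizing at comparison time keeps the rules in one routine but is harder to express portably in SQL.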


ksy36 force-pushed the auto_labeling branch from d34f792 to a90fefc Compare May 26, 2026 04:04

Add models for labels and assign first labels to buckets

ef3a7c0

ksy36 force-pushed the auto_labeling branch from a90fefc to ef3a7c0 Compare May 26, 2026 04:07

ksy36 marked this pull request as ready for review May 26, 2026 04:09

ksy36 requested a review from jgraham May 26, 2026 04:12

jgraham requested changes May 27, 2026


Code review changes

d48fa05

ksy36 requested a review from jgraham May 30, 2026 01:42

jgraham approved these changes Jun 4, 2026

ksy36 wants to merge 2 commits into main from auto_labeling
