Move make_identifier function to collect.py by Lotram · Pull Request #1215 · GothenburgBitFactory/bugwarrior

Lotram · 2026-05-19T11:28:02Z

In aggregate_issues, we have both the Issue object and the deserialized data, so we can compute the identifier without relying on a fragile iteration over the list of key_lists. Also adds some more precise typing.
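The motivation can be illustrated with a minimal sketch. This is not bugwarrior's actual code: the `Issue` class, its `UNIQUE_KEY` attribute, the field names, and both helper functions below are hypothetical stand-ins, and the ordering problem shown is only one way such a search could plausibly go wrong.

```python
import json
from typing import Any


class Issue:
    """Hypothetical stand-in for a service issue (not bugwarrior's class)."""

    # Assumption: each issue type declares which fields identify it uniquely.
    UNIQUE_KEY: tuple[str, ...] = ("githuburl",)

    def __init__(self, url: str, title: str) -> None:
        self.url = url
        self.title = title

    def to_taskwarrior(self) -> dict[str, Any]:
        return {"githuburl": self.url, "description": self.title}


def identifier_by_key_search(
    key_lists: list[tuple[str, ...]], data: dict[str, Any]
) -> str:
    """Guess the identifying keys by finding the first key list present in data."""
    for keys in key_lists:
        if all(key in data for key in keys):
            return json.dumps({key: data[key] for key in keys}, sort_keys=True)
    raise RuntimeError(f"Could not determine unique identifier for {data}")


def make_identifier(issue: Issue, data: dict[str, Any]) -> str:
    """Ask the issue which keys identify it; no searching required."""
    return json.dumps({key: data[key] for key in issue.UNIQUE_KEY}, sort_keys=True)


issue = Issue("https://example.invalid/1", "Fix bug")
data = issue.to_taskwarrior()

# Another service's key list happens to match first, so the search picks it.
key_lists = [("description",), ("githuburl",)]
searched = identifier_by_key_search(key_lists, data)
direct = make_identifier(issue, data)
```

In this sketch `searched` is built from `description` while `direct` is built from `githuburl`, which shows why having the Issue object at hand removes the guesswork.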

ryneeverett · 2026-05-23T03:24:04Z

@@ -0,0 +1,14 @@
+from typing import Any, NamedTuple
+
+TaskwarriorData = dict[str, Any]


I find this name confusing because it is unclear that it is data for a single task. (Everything in the taskwarrior database is "taskwarrior data".)

I had the thought that it might be ideal to align naming with taskchampion but I'm not strongly opinionated. Here are some ideas and rambling thoughts:

Task: Don't all data types hold data? What are we trying to communicate with "data" that Task wouldn't cover? Or perhaps we should differentiate from taskw.Task?

TaskData: Similar to what you have, but taskchampion's data structure is slightly higher level.

TaskMap: The most directly comparable data structure, but named after the rust-specific HashMap.

On the other hand, does such a simple data structure really deserve a custom type? What's the point?

> I find this name confusing because it is unclear that it is data for a single task. (Everything in the taskwarrior database is "taskwarrior data".)

Good point, that name is bad.

> On the other hand, does such a simple data structure really deserve a custom type? What's the point?

Here is my chain of thoughts:

We should stop passing dicts everywhere, as it's really hard to understand what these dicts are exactly.

We should use our own Task object, and that's what to_taskwarrior() should return, but that's a separate issue.

In the meantime, let's have a type alias, so we know where this dict is used and can modify all the types at once when switching to a real model (pydantic, probably).

But I can remove this type for now.

I agree with your line of reasoning but I also think it should be removed for now until we have a specified data type that actually constrains the value it holds.
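The point about an alias not constraining its value can be sketched as follows. `TaskwarriorData` is the alias from the diff; `summarize` and `TaskRecord` are hypothetical names, and the dataclass is only a standard-library stand-in for the "real model" mentioned above (pydantic was suggested, not chosen).

```python
from dataclasses import dataclass
from typing import Any, Optional

TaskwarriorData = dict[str, Any]  # the alias from the diff


def summarize(task: TaskwarriorData) -> str:
    """Hypothetical consumer; the annotation documents intent only."""
    return str(task.get("description", ""))


# The alias is just another name for dict[str, Any]: nothing is validated.
alias_is_plain_dict = TaskwarriorData == dict[str, Any]
accepted_garbage = summarize({"unrelated": object()})


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical constrained model; field names are assumptions."""

    description: str
    project: Optional[str] = None

    def __post_init__(self) -> None:
        # Unlike the alias, this rejects malformed values at construction time.
        if not isinstance(self.description, str) or not self.description:
            raise ValueError("description must be a non-empty string")


try:
    TaskRecord(description="")
    rejected_empty = False
except ValueError:
    rejected_empty = True
```

The alias still has value as a search handle for a later migration, which is the argument made above; it just does not check anything by itself.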

ryneeverett · 2026-05-23T03:32:53Z

What is the advantage of centralizing types here rather than in the modules in which they're instantiated? To me this pattern seems to obfuscate the dependency graph between modules and possibly encourage such dependence. (Using a type which is only instantiated in one module is effectively dependence on that module.)

To be honest, the main reason to put it there is laziness, as I was sure it wouldn't create any dependency issue, but I mostly agree with you.

Would you move this code alongside Issue and Service in services/__init__.py, or into a new file services/types.py?

It's such a small amount of code that I would just put it in __init__.py.

Part of the reason I was asking is that I've seen this pattern in other code bases and wondered if there was something special about types.py or if that was considered a best practice for some reason. I think your answer confirmed that there is nothing special about it and no reason to break it out unless we feel a module has become too big.

On bigger codebases, I still think it improves readability, much like most projects choose to have an exceptions.py. But I have no problem with dropping this for now.

> Would you move this code alongside Issue and Service in services/__init__.py, or into a new file services/types.py?

Is there a reason not to put them in collect.py, the place where they originate and db.py was already importing from?

Good point, I moved it back to collect.py.

ryneeverett · 2026-05-23T03:34:43Z

+            record = TaskConstructor(issue).get_data_to_sync()
            yield record


Suggested change

-            record = TaskConstructor(issue).get_data_to_sync()
-            yield record
+            yield TaskConstructor(issue).get_data_to_sync()

Lotram · 2026-05-28T11:48:36Z

Turns out I had a types.py to avoid cyclic dependency issues. I changed two things:

* `get_service` is now defined in `services/__init__.py`, as that makes more sense than defining it in `collect.py`

* `config/__init__.py` now only imports (and re-exports) objects used in the public API. It used to import all files, creating cyclic dependencies much more "easily".

ryneeverett · 2026-05-29T14:07:57Z

> Turns out I had a types.py to avoid cyclic dependency issues. I changed two things:
>
> * `get_service` is now defined in `services/__init__.py`, as it makes more sense than to define it in `collect.py`
>
> * `config/__init__.py` now only imports (and re-exports) objects used in the public API. It used to import all files, creating cyclic dependencies much more "easily".

It isn't clear to me that these changes are desirable. Could they be dropped now that the types are in collect.py?

Lotram · 2026-05-29T15:05:52Z

I'd keep the change on get_service. IMO, there is really no good reason to have it in collect.py. For now, config, as a module, depends on collect.py, but it should not. If not in services/__init__.py, I think it should be somewhere in config/ (maybe validation.py?), so that config does not depend on anything.

If we make sure of that (config not depending on anything), then I can revert the change on config/__init__.py, since we're sure that importing all config files can't be an issue for other modules.

ryneeverett · 2026-05-29T16:09:25Z

Keeping get_service in services/__init__.py and reverting the config/__init__.py changes makes sense to me. (Now that I look at get_service from a dependency perspective I see what you mean.)
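The dependency concern in this exchange can be reproduced with a small, self-contained sketch. The layouts below are deliberately simplified assumptions, not bugwarrior's real modules: all `cyc_*` and `fix_*` names are hypothetical, and the "fixed" layout assumes the services module does not import config.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Before: the config package eagerly imports a file that needs get_service,
# which lives in the collect module, which itself imports config.
CYCLIC_LAYOUT = {
    "cyc_config/__init__.py": "from .schema import Schema\n",
    "cyc_config/schema.py": (
        "from cyc_collect import get_service\n"
        "class Schema: pass\n"
    ),
    "cyc_collect.py": (
        "from cyc_config import Schema\n"
        "def get_service(): return 'service'\n"
    ),
}

# After: get_service lives in a services module that imports nothing else.
FIXED_LAYOUT = {
    "fix_config/__init__.py": "from .schema import Schema\n",
    "fix_config/schema.py": (
        "from fix_services import get_service\n"
        "class Schema: pass\n"
    ),
    "fix_services.py": "def get_service(): return 'service'\n",
    "fix_collect.py": (
        "from fix_config import Schema\n"
        "from fix_services import get_service\n"
    ),
}


def import_outcome(layout: dict[str, str], module_name: str) -> str:
    """Write the layout to a temp dir, import module_name, report the result."""
    prefix = module_name.split("_")[0] + "_"
    with tempfile.TemporaryDirectory() as tmp:
        for relative_path, source in layout.items():
            target = Path(tmp) / relative_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(module_name)
            return "ok"
        except ImportError:
            return "ImportError"
        finally:
            sys.path.remove(tmp)
            # Forget the demo modules so repeated runs start clean.
            for name in [n for n in sys.modules if n.startswith(prefix)]:
                del sys.modules[name]


cyclic_result = import_outcome(CYCLIC_LAYOUT, "cyc_collect")
fixed_result = import_outcome(FIXED_LAYOUT, "fix_collect")
```

Importing `cyc_collect` fails because `cyc_config/schema.py` asks for `get_service` while `cyc_collect` is still only partially initialized. Once `get_service` no longer lives in the collect module, the eager import in the config package's `__init__.py` is harmless, which matches the conclusion that the `config/__init__.py` change can be reverted.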

ryneeverett reviewed May 23, 2026

Lotram force-pushed the move-identifier-function branch from e8e23de to 3315a89 on May 28, 2026 11:46

Lotram force-pushed the move-identifier-function branch from 3315a89 to b2bb879 on May 28, 2026 11:55

Lotram added 3 commits May 29, 2026 09:49

Move make_identifier function to collect.py

b9ecf04

In aggregate_issues, we have both the Issue object and the deserialized data, so we can compute the identifier without relying on a fragile iteration over the list of key_lists. Also adds some more precise typing.

Change import dependency graph

2bc9041

config/__init__.py now only imports objects for the public API. The get_service function is now defined in services/__init__.py.

move CollectedIssue to collect.py

56754e9

Lotram force-pushed the move-identifier-function branch from b2bb879 to 56754e9 on May 29, 2026 07:53
