
Zeeschuimer auto map w/ CI #593

Open
dale-wahl wants to merge 12 commits into master from zeeschuimer_auto_map

Conversation

@dale-wahl
Member

This is additive only. I need to push the CI to master to test it properly.

  • New GitHub Actions workflow, zeeschuimer_map_item_sync.yml, to automate translating map_item functions from Python to JavaScript for Zeeschuimer datasources. The workflow (hopefully) detects changes in relevant files and should exit fast if none of them changed. It then translates each changed file's map_item function using an LLM (this has been tested and improved a few times over) and opens a draft pull request against the Zeeschuimer repository.
  • Relies on the helper script map_item_converter.py. This was... fun and interesting to iterate on.
  • On the model: I expect we would have even better success with a larger model, but I think qwen2.5-coder:14b is the best fit for our setup. From what I understand, qwen3 may actually be worse at this, as it was designed for agentic workflows with bug reports. We could test it as well if we can find a variant that will run on our system (the larger qwen2.5 ended up half on CPU).
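
A minimal sketch of the "fast exit when nothing relevant changed" step, assuming the workflow hands a script the list of changed paths. The glob pattern and the function names are hypothetical and are not taken from zeeschuimer_map_item_sync.yml or map_item_converter.py:

```python
from fnmatch import fnmatch

# Hypothetical pattern: the paths the real workflow watches are not listed in this PR text.
WATCHED_PATTERNS = ["datasources/*/search_*.py"]


def relevant_files(changed_paths, patterns=WATCHED_PATTERNS):
    """Return the changed paths that match any watched pattern, keeping input order."""
    return [
        path for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]


def should_run(changed_paths, patterns=WATCHED_PATTERNS):
    """True if at least one relevant file changed; otherwise the job can exit early."""
    return bool(relevant_files(changed_paths, patterns))
```

Calling `should_run(["README.md"])` returns `False`, so a job built this way could stop before the LLM is ever invoked. Note that `fnmatch` lets `*` match `/` as well, so the pattern above also matches files nested more deeply under `datasources/`.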

There is a Zeeschuimer branch as well, with some changes that go in sync with this.

@dale-wahl
Member Author

dale-wahl commented May 7, 2026

Notes for me:

  • workflow_dispatch triggers work for either specific datasource files or all of them. Yay.
  • One PR per module; a new run should kill any job still running for the same module and overwrite that module's existing PR. Useful.
  • Ran into some recurring issues.
    • Added dos and don'ts to the prompt, and qwen sometimes listens.
    • Consistently has trouble with regex (both streaming and JSON structured output fail to escape certain characters; added a warning for review).
    • Consistently has trouble translating 'substring' in str to str.includes('substring'); added a warning.
    • Linting... forced linting will erroneously "fix" the regex issue. Could add prettier --check just to note unparseable output in PRs that would otherwise be missed.
  • Worth pursuing: branch https://github.com/digitalmethodsinitiative/4cat/tree/zeeschuimer_auto_map_reduce_false_pos
    • Right now we diff-check whole files, not just map_item. Probably fine, since datasources are not likely to be modified. But if there is any diff, we get a PR. I do check that the new JS map_item differs from the existing one, but with an LLM the output is liable to differ somehow even when the functionality is the same.

OK. That means this can be merged. When it is, we need to run workflow_dispatch for all datasources and test their map_item functions. That should all end up in one PR (technically I have not tested that since splitting into one PR per module, but it worked before). This was dependent on digitalmethodsinitiative/zeeschuimer#79, which I just merged.

@dale-wahl dale-wahl marked this pull request as ready for review May 7, 2026 15:39
