
@rerpha (Contributor) commented Jan 7, 2026

This PR adds a few ADRs to the dev wiki under the data streaming section as we have had a very informative talk with DSG and some things have become much clearer.

I am writing this now for HRPD-X, so there are some corners we can cut, i.e. no histogramming and no spectrum mapping, but this will likely change in the future to support the other instruments that we'll roll data streaming out to.

Please leave any initial comments here, but we will have a proper meeting to discuss - I'll organise this soon.

Closes ISISComputingGroup/DataStreaming#15
Closes ISISComputingGroup/DataStreaming#12
Closes ISISComputingGroup/DataStreaming#7
Closes ISISComputingGroup/DataStreaming#3

ISISComputingGroup/DataStreaming#26 and ISISComputingGroup/DataStreaming#24 should be done before creating any other tickets. We have some of the answers now, but how the topics for each of those tickets actually operate will be worked out through those two pieces of work. We should create tickets at the end of prototyping to flesh them out.

@rerpha rerpha changed the title Data streaming docs 2 Add more data streaming documentation - ADRs, more hardware architecture. Jan 7, 2026
@rerpha rerpha marked this pull request as ready for review January 7, 2026 16:18
@Tom-Willemsen (Member)

@FreddieAkeroyd @GRyall @danielmaclaren did you have thoughts on this PR? Or do we want a full meeting to talk about it?

@GRyall (Member) commented Jan 14, 2026

Reviewing it is on my to-do list.

Remove mermaid_params configuration from Sphinx.
@rerpha rerpha force-pushed the data_streaming_docs_2 branch from a0e3d30 to 9fd57d7 Compare January 19, 2026 11:52
## Decision

We are not going to support the old-style spectra files or any spectrum mapping/grouping in general.

Member

The spectra file could also be used to disable collecting from a noisy detector (using spectrum 0) - is this possible via a different route?

Member

If it's a noisy detector, we probably don't want it streamed at all. We probably want to just not map it (before it ever hits Kafka)?

@FreddieAkeroyd (Member) commented Jan 20, 2026

Saving spectrum 0 to file was optional, so mapping detectors to spectrum 0 was a workaround for discarding their data, since DAE3 would always send data. Is it easy for a scientist to unmap a detector?
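To make the spectrum 0 workaround concrete, here is a minimal Python sketch of a detector-to-spectrum mapping in which any detector mapped to spectrum 0 is dropped unless spectrum 0 is explicitly saved. The mapping, the `map_events` function and the detector IDs are hypothetical illustrations only, not the ISISICP or DAE3 implementation.

```python
from collections import Counter

# Hypothetical detector -> spectrum mapping, in the spirit of an old-style
# spectra file. Detector 3 is treated as "noisy", so it is mapped to spectrum 0.
SPECTRUM_MAP = {1: 1, 2: 1, 3: 0, 4: 2}

DISCARD_SPECTRUM = 0  # events mapped here are dropped unless explicitly saved


def map_events(detector_ids, spectrum_map, save_spectrum_zero=False):
    """Count events per spectrum, dropping spectrum 0 unless asked to keep it.

    Detectors missing from the map are ignored (an assumption for this sketch).
    """
    counts = Counter()
    for det in detector_ids:
        spec = spectrum_map.get(det)
        if spec is None:
            continue  # unmapped detector: never counted
        if spec == DISCARD_SPECTRUM and not save_spectrum_zero:
            continue  # the spectrum 0 workaround: silently discard
        counts[spec] += 1
    return dict(counts)


events = [1, 2, 3, 3, 3, 4, 4]
print(map_events(events, SPECTRUM_MAP))  # {1: 2, 2: 2}
```

With no spectrum mapping at all, the equivalent would be to stop the noisy detector's events before they are produced to Kafka, as suggested above.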

Member

This may be a file-writer concern, but there was also a third table, detector.dat, that contained detector angle details; it was similar in concept to a Mantid instrument geometry. ISISICP could read detector.dat or a saved Mantid workspace to extract detector details to add to a NeXus file. Excitations used to adjust these files each cycle after calibration, so I'm just noting that scientists will ultimately need a way to adjust detector metadata for an experiment.

- We are able to use Linux-centric technologies and tools, without needing to spend large amounts of time inventing workarounds for Windows.
- The OS will be different. Developers will need _some_ understanding of Linux to maintain these servers.
  * Mitigation: do as little as possible on the host, ideally limit it to just having a container engine installed via a configuration management tool such as Ansible.
- Data-streaming infrastructure will not be on the NDH/NDX machine with the rest of IBEX. This is fine - EPICS is explicitly designed to run in a distributed way.

Member

I would like to understand a bit more about what will run on which machine (NDX or Linux) and how they will interact in various failure/restart modes.

@rerpha (Contributor, Author)

I can add a bit more detail, but I was planning on running everything on the top-level page in Docker on a Linux machine, probably something like Fedora CoreOS, which has Docker installed by default and auto-updates, so it should take less sysadmin effort to keep it alive and patched.

In terms of failure modes, we'll use health checks etc. to make sure containers don't fall over, and we can add monitoring tools to send alerts if they are continually restarting.

Compared with running everything on the NDX, there is a network link that could fail, but this is going to be the Aruba switch, so I'd say it is fairly unlikely to fail. If it does fail, the FPGAs probably can't stream anything anyway.
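As a rough illustration of the "alert if continually restarting" idea, the sketch below counts container restarts in a sliding time window and flags an alert once a threshold is exceeded. The `RestartMonitor` class and its default thresholds are hypothetical; in practice the restart events would come from the container engine or a monitoring tool rather than being passed in by hand.

```python
from collections import deque


class RestartMonitor:
    """Flag a container as crash-looping if it restarts too often.

    max_restarts and window_s are hypothetical defaults for illustration only.
    """

    def __init__(self, max_restarts=3, window_s=600.0):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._restarts = deque()

    def record_restart(self, timestamp):
        """Record a restart at `timestamp` (seconds); return True if an alert should fire."""
        self._restarts.append(timestamp)
        # Forget restarts that have fallen out of the sliding window.
        while self._restarts and timestamp - self._restarts[0] > self.window_s:
            self._restarts.popleft()
        return len(self._restarts) > self.max_restarts


monitor = RestartMonitor(max_restarts=3, window_s=600.0)
for t in (0, 100, 200, 300):
    alert = monitor.record_restart(t)
print(alert)  # True: 4 restarts within 10 minutes
```

A sliding window means a single restart after a long quiet period does not trigger an alert, while a crash loop does.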

Member

Happy to join in this discussion too if useful

Member

It was the NDX–Linux interaction behaviour, in case of failures/restarts on either end or other issues that might need resolving, that I was interested in. At the moment we run everything on the NDX and a clean restart is relatively simple.
Rather than Linux vs. just NDX, I was comparing against having only Kafka on Linux and everything else on the NDX. If it is just Kafka on Linux (and no IOCs/other services), that seemed like a potentially simpler setup? The option of switching the Kafka broker to another cluster or a central hall service, as a backup/recovery option or for testing, also seemed easier?

@rerpha (Contributor, Author)

Maybe, but we still need a Linux machine for the container stuff. I don't think it's a good idea to run things on Windows or in WSL. Running on a VM on the NDH would be OK, but we can't do that with their current specifications.

Contributor

Would the (first) Linux/Kafka machine be in the local (DAE) rack in HRPD-X?

@rerpha (Contributor, Author)

Yes, I think the streaming software should be in the HRPD-X rack. If we get asked to run Kafka, it will probably be in that rack too.

Tom-Willemsen and others added 6 commits January 21, 2026 09:19
Added considerations for Linux server specifications related to data rates, including disk write performance, network interface speeds, and memory requirements.
Expanded on the need for containerized data streaming software due to new detector technology and the limitations of WSL on Windows.
Added considerations for data streaming stack and container configuration.
Updated status to reflect pending discussions with HRPD-X parties.
@rerpha (Contributor, Author) commented Jan 21, 2026

I have added pros/cons/risks of each approach. Is that any better, @FreddieAkeroyd?

6 participants