Add more data streaming documentation - ADRs, more hardware architecture. #135
base: master
Conversation
@FreddieAkeroyd @GRyall @danielmaclaren did you have thoughts on this PR? Or do we want a full meeting to talk about it?

Reviewing it is on my todo list.
Force-pushed (a0e3d30 to 9fd57d7): Remove mermaid_params configuration from Sphinx.
> ## Decision
>
> We are not going to support the old-style spectra files or any spectrum mapping/grouping in general
The spectra file could also be used to disable collecting from a noisy detector (using spectrum 0) - is this possible via a different route?
If it's a noisy detector we probably don't want it streamed at all - we probably want to just not map it (before it ever hits Kafka)?
Saving spectrum 0 to file was optional, so using spectrum 0 was a workaround for DAE3 to discard data, as it would always send data. Is it easy for a scientist to unmap a detector?
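To illustrate the "just not map it" idea being discussed, here is a minimal sketch of dropping data from unmapped or spectrum-0 detectors before anything is streamed. The mapping table, event shape, and function name are illustrative assumptions, not the real DAE/ISISICP spectra-file format.

```python
# Hypothetical sketch: discard events from unmapped/noisy detectors
# before they are ever produced to Kafka. Names and formats are
# assumptions for illustration only.

# detector id -> spectrum number; spectrum 0 means "discard" (the old
# DAE3 workaround), and an absent detector is treated the same way.
DETECTOR_TO_SPECTRUM = {
    1: 1,
    2: 2,
    3: 0,  # noisy detector mapped to spectrum 0: discarded
    # detector 4 deliberately unmapped: also discarded
}

def filter_events(events):
    """Keep only events whose detector maps to a non-zero spectrum."""
    kept = []
    for det_id, timestamp in events:
        spectrum = DETECTOR_TO_SPECTRUM.get(det_id, 0)  # unmapped -> 0
        if spectrum != 0:
            kept.append((spectrum, timestamp))
    return kept

events = [(1, 10.0), (3, 11.5), (2, 12.0), (4, 13.2)]
print(filter_events(events))  # only detectors 1 and 2 survive
```

The point of the sketch is that the filtering happens in the mapping step, so a noisy detector's data never reaches the streaming layer at all.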
This may be a file-writer concern, but there was a third table, detector.dat, that contained detector angle details; it was similar in idea to a Mantid instrument geometry. ISISICP could read detector.dat or a saved Mantid workspace to extract detector details to add to a NeXus file. Excitations used to adjust these files each cycle post-calibration, so just noting that there would ultimately need to be a way for scientists to adjust detector metadata for an experiment.
> - We are able to use Linux-centric technologies and tools, without needing to spend large amounts of time inventing workarounds for Windows.
> - The OS will be different. Developers will need _some_ understanding of Linux to maintain these servers.
>   * Mitigation: do as little as possible on the host, ideally limit it to just having a container engine installed via a configuration management tool such as Ansible.
> - Data-streaming infrastructure will not be on the NDH/NDX machine with the rest of IBEX. This is fine - EPICS is explicitly designed to run in a distributed way.
I would like to understand a bit more about what will be run on which machine (NDX or Linux) and their potential interactions in various failure/restart modes.
I can add a bit more detail, but I was planning on running everything on the top-level page in Docker on a Linux machine, probably something like Fedora CoreOS, which has Docker installed by default and is auto-updating, so it should be less sysadmin effort to keep it alive and patched.

In terms of failure modes, we'll use health checks etc. to make sure containers don't fall over. We can add monitoring tools to send alerts if they're continually restarting and so on.

Versus just running everything on the NDX, there is a network link that could fail, but this is going to be the Aruba switch, so it is fairly unlikely to fail, I'd say. If it does fail, the FPGAs probably can't stream anything anyway.
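As a sketch of the "alert if containers are continually restarting" idea: the dicts below loosely mimic the shape of a Docker Engine container-inspect response (`RestartCount`, `State.Health.Status`), but the helper, thresholds, and container names are illustrative assumptions, not part of any planned tooling.

```python
# Hypothetical monitoring helper: flag containers that look unhealthy
# or are stuck in a restart loop. The dict shape loosely follows the
# Docker Engine API's inspect output; thresholds are illustrative.

def needs_alert(state, max_restarts=3):
    """Return True if a container has restarted too often or reports unhealthy."""
    if state.get("RestartCount", 0) > max_restarts:
        return True
    health = state.get("Health", {}).get("Status", "none")
    return health == "unhealthy"

# Example snapshot, e.g. as gathered by a periodic cron/systemd timer job.
containers = {
    "kafka":       {"RestartCount": 0, "Health": {"Status": "healthy"}},
    "file-writer": {"RestartCount": 7, "Health": {"Status": "healthy"}},
    "forwarder":   {"RestartCount": 1, "Health": {"Status": "unhealthy"}},
}

alerts = [name for name, state in containers.items() if needs_alert(state)]
print(alerts)  # -> ['file-writer', 'forwarder']
```

In practice this check would sit behind whatever alerting route the group already uses; the sketch only shows the decision logic.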
Happy to join in this discussion too if useful
It was the NDX-Linux interaction behaviour in case of failures/restarts on either end, or other issues that might need resolving, that I was interested in. At the moment we run everything on the NDX and a clean restart is relatively simple.

Rather than Linux vs. just NDX, it was a comparison with having only Kafka on Linux and everything else on the NDX. If it is just Kafka on Linux (and no IOCs/other services), then that seemed a potentially simpler setup? Also, the option of changing the Kafka broker to another cluster or a central hall service as a backup/recovery option, or for testing etc., seemed easier?
Maybe - but we still need a Linux machine for the container stuff. I don't think it's a good idea to run things on Windows or in WSL. I think running in a VM on the NDH would be OK, but we can't do that with their current specifications.
Would the (first) Linux/Kafka machine be in the local (DAE) rack in HRPD-X?
Yes - I think the streaming software should be in the HRPD-X rack. If we get asked to run Kafka, it will probably be in that rack too.
Added considerations for Linux server specifications related to data rates, including disk write performance, network interface speeds, and memory requirements.
Expanded on the need for containerized data streaming software due to new detector technology and the limitations of WSL on Windows.
Added considerations for data streaming stack and container configuration.
Updated status to reflect pending discussions with HRPD-X parties.
Have added pros/cons/risks of each approach - is that any better, @FreddieAkeroyd?
This PR adds a few ADRs to the dev wiki under the data streaming section as we have had a very informative talk with DSG and some things have become much clearer.
I am writing this now for HRPD-X, so there are some ways we can cut corners, i.e. no histogramming and no spectrum mapping, but this will likely change in the future to support the other instruments that we'll roll data streaming out to.
Please leave any initial comments here, but we will have a proper meeting to discuss - I'll organise this soon.
Closes ISISComputingGroup/DataStreaming#15
Closes ISISComputingGroup/DataStreaming#12
Closes ISISComputingGroup/DataStreaming#7
Closes ISISComputingGroup/DataStreaming#3
ISISComputingGroup/DataStreaming#26
and ISISComputingGroup/DataStreaming#24 should be done before creating any other tickets - we have some of the answers now, but the actual operation of the topics for each of those tickets will be done by those two processes. We should create tickets at the end of prototyping to flesh them out.