\label{computing}
% moved to main doc \section{Computing requirements, data handling and software}
DUNE-PT builds upon the technology and expertise developed during the
design and operation of its smaller predecessor, the 35~t detector at Fermilab.
This includes elements of the front-end electronics, data acquisition, run control and related systems. We also expect that, for the most part, the Monte Carlo studies needed to support this program will be conducted using software evolved from current (2015) tools. Likewise,
event reconstruction and analysis will rely on the evolving software tools developed for DUNE.
The volume of the recorded data will depend on the number of events to be collected in each measurement,
as specified in the run plan (see Table~\ref{tab:RunPlan}). Cosmic ray muons have a very large impact on the data volume due to the large detector dimensions and surface operation.
The optimal approach is to first stage the data collected from DUNE-PT on disk at CERN and then archive it to tape, also at CERN,
while simultaneously replicating it to data centers in the US. For the latter, Fermilab will be the primary site, with Brookhaven National Laboratory (BNL) and the National Energy Research Scientific Computing Center (NERSC) serving as additional locations for better redundancy and more efficient access to the data from a greater number of sites.
%\subsubsection{Cosmic ray muons and readout window}
%\label{readout_windows}
\subsection{Event size estimate and data volume}
The data volume will be dominated by the TPC data. Although the photon detector and other elements of the experimental apparatus (muon counters, trigger systems) also contribute to the data stream, their contributions to the data volume are expected to be small enough that realistic data volume estimates can be obtained from the TPC event data sizes alone.
Event sizes can be estimated from ``first principles'' by assuming an event topology and accounting for the significant cosmic ray contamination at the surface. As shown in Sec.~\ref{calibration}, an average of $\sim$68 cosmic muon track segments are expected per readout window, on top of the actual beam event. Given that a 4~GeV Minimum Ionizing Particle (MIP) produces on average 80~kB of data, the resulting number of particles in a readout window leads to a readout of roughly 6~MB per beam event.
%This assumes that you have 80~kbyte per 4 GeV mip and that cosmic muons at the surface have an averagae enegy of 4 GeV
%Multiply by (68 muons + 1 beam particle) we get 5.5 MB
In addition to the data associated with the individual particles in the event, channel overhead information needs to be accounted for. Experience from the 35~t detector at Fermilab indicates that, with zero suppression, the overhead amounts to 6~kB per channel for a readout of three drift windows.
%
%This is based on 600kB zero suppressed readout of 3 drift windows in 35 t for 2048 channels. There are 15,360 channels inn the CERN prototype, corresponding to the proposed TPC which consists of 6 APAs with 2560 channels each.
%
Since DUNE-PT has 15,360 channels, the expected overhead corresponds to 92~MB per event. Combining the overhead with the charge data from the individual particles yields a total event size of approximately 100~MB.
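The per-event estimate above can be reproduced with a short back-of-the-envelope calculation (a sketch; the muon multiplicity, MIP data size and per-channel overhead figures are those quoted in the text, and decimal kB/MB are assumed throughout):

```python
# Back-of-the-envelope per-event data size for DUNE-PT (inputs from the text).
MIP_DATA_KB = 80              # data per 4 GeV minimum-ionizing particle, in kB
COSMIC_MUONS = 68             # average cosmic muon track segments per readout window
BEAM_PARTICLES = 1            # the triggered beam particle itself
CHANNELS = 15360              # 6 APAs x 2560 channels each
OVERHEAD_PER_CHANNEL_KB = 6   # zero-suppressed overhead for 3 drift windows (35 t experience)

particle_data_mb = (COSMIC_MUONS + BEAM_PARTICLES) * MIP_DATA_KB / 1000.0
overhead_mb = CHANNELS * OVERHEAD_PER_CHANNEL_KB / 1000.0
total_mb = particle_data_mb + overhead_mb

print(f"particle data:    {particle_data_mb:.1f} MB")  # ~5.5 MB, quoted as 6 MB
print(f"channel overhead: {overhead_mb:.1f} MB")       # ~92.2 MB, quoted as 92 MB
print(f"total per event:  {total_mb:.0f} MB")          # ~98 MB, quoted as ~100 MB
```

The channel overhead, not the particle charge data, clearly dominates the event size, which is why surface operation does not blow up the estimate further.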
%Background tracks due to cosmic ray particles must be properly identified and accounted for, in order to ensure high quality of the measurements and subsequent detector characterization. Since overlay of cosmic ray muons over beam events is stochastic in nature, the optimal way to achieve this is by recording signals which were produced ``just before'' and ``just after'' the arrival of the test particle from the beam line. It will be possible because the design of the DAQ contains buffer memory that can be accessed after the trigger decision is made. This technique will enable us to record and reconstruct either partial or complete background tracks present in the ``main'' event.
%\subsection{Event size estimate and data volume}
%The data volume will be dominated by the TPC data. Even though the photon detector as well as other elements of the experimental apparatus (muon counters, trigger systems) contribute to the data stream their contributions to the data volume are expected to be sufficiently small such that realistic data volume estimates can be obtained from the TPC event data sizes alone.
%Event sizes can be estimated from "first principles" under the assumption of some event topology. Based on the track range and the number of needed samples to capture the track, a given drift velocity and sample rate a generous over-estimate is that a 1 GeV MIP needs about 20~kbyte and a 5~GeV MIP needs no more than 100~kbyte. Showering events will require less data for the same energy deposition due to some portion of the activity overlapping in the same voxels. The estimate assumes that all particles are minimum ionizing so this is another source of over-estimation. The estimate is based on signals above the threshold of zero-suppression and neglects radioactivity or assumes that signals from radioactivity (predominantly from $^{39}$Ar) are below the zero-suppression threshold.
%This basic estimate serves to set the scale but does neglect channel overhead information that may be useful in interpreting the saved data. Considering that the beam event readout window will contain $\sim$45 muons with an average energy of 4 GeV, a signal beam event is expected to result in 3.7 MB data record.
%In the LArSoft framework which is presently being used for the 35t detector data are zero suppressed (ZS). Our assumption is that the nominal readout policy for the bulk of the data will be to use ZS in order to follow the plan for the DUNE FD. However, even for ZS data LArSoft saves 2 bytes for every channel even if that channel is ZS'ed away.
%The data event size can be calculated as
%\begin{equation}
% Event size = (\#channels) \times (clock\, rate) \times (readout \, time) \times (sample \, size)
%\end{equation}
%where the $\# channels$ equals 15,360 channels, corresponding to the proposed TPC which consists of 6 APAs with 2560 channels each, the $clock \, rate$ is 2~MHz, the $readout \, time$ is assumed to be 3 $ \times $ 2.25 ms = 6.75 ms corresponding to three drift times and the $sample \, size$ is 2~bytes.
%
%{\color{red} This results in an event size of $\sim$ 400~Mbyte.\\
%HOW WERE THE 2.5 MB CALCULATED ??! EMAIL FROM BRETT IN RESPONSE TO QIUGUANG AND DATED MAY 5TH INDICATED THAT 2 BYTES PER CHANNEL WERE %ASSUMED ?!\\
%FOR NOW WILL MOVE FORWARD WITH 2.5 MB\\}
%
%
%Based on these assumptions which leave room for optimization
%we arrive at an event size of 2.5 MB/event which is ZS but uncompressed.
%With compression event sizes are expected to reduce to 0.1MB/event (ZS + compressed).
%
%Since each sample is 16 bit (or 12 bit in more recent design), we arrive to the limit of approximately 20MB per single charged track.
%For this class of events, the amount of data will scale roughly linearly with the length of the track, i.e. in cases when a track is
%stopped or leaves the sensitive volume there will be less data.
%
%Further, in most cases the data will be zero-suppressed by the front-end
%electronics (e.g signals below a certain threshold
%will not be included into the outgoing data stream).
%The exact data reduction factor will depend on a variety of factors (cf. threshold, which is yet to be chosen), but as a rule of
%thumb it's an order of magnitude. \textit{We conclude therefore the events will typically be a few megabytes in size}.
%The estimate is supported by Monte Carlo studies.
%In view of the factors due to the cosmic ray muons presented above, the actual beam particle event data will represent only a fraction ofthe total volume being read out.
%As a concrete example, for an incident
%electron of 4GeV/c momentum MC calculations indicate an average event size of $\sim$2MB, after zero-suppression.
%This is less than 1\% of the estimate quoted above.
The preliminary run plan (Table~\ref{tab:RunPlan}) calls for approximately 5M events in total. Taking into account the data load per event, this leads to an estimate of $\sim$500~TB of nominal data volume to be collected in this experiment.
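The total raw-data volume follows directly from the run-plan event count and the per-event size derived earlier (a simple sketch; both inputs are the values quoted in the text):

```python
# Total raw-data volume from the run plan event count and per-event size.
N_EVENTS = 5_000_000     # ~5M events from the preliminary run plan
EVENT_SIZE_MB = 100      # ~100 MB per event (particle data + channel overhead)

total_tb = N_EVENTS * EVENT_SIZE_MB / 1_000_000  # MB -> TB (decimal)
print(f"raw data volume: {total_tb:.0f} TB")     # 500 TB
```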
In summary, we expect that tape storage of $O(1$~PB$)$ will be required, along with more modest disk space at CERN for raw data staging and replication. We envisage storing the primary copy of the raw data at CERN, with replicas at additional locations.
%
Processed and Monte Carlo data placement will require additional resources, which are addressed in Section~\ref{dataprocess}.
\subsubsection{Data transmission and distribution}
Moving data to remote locations outside of CERN is subject to a number of requirements that include
automation, monitoring and error checking and recovery.
A number of candidate systems satisfy these requirements; we would like to explore CERN-based options and take advantage of
local know-how. An alternative with which we have expertise and experience is Spade, first used in IceCube~\cite{spade_icecube} and then enhanced and successfully utilized in the Daya Bay experiment~\cite{spade_dayabay}.
\subsection{Databases}
Databases will be required to store Run Logs, Slow Control records and detector conditions, as well as (offline) calibration information.
Most database servers will need to be local to the experiment (i.e. at CERN) in order to reduce latency, guarantee reliability and minimize
downtime due to network outages. A replication mechanism is foreseen to make data readily available at the US and other sites.
The volume of data stored in these databases is expected to be modest and of the order of 100~GB.
%Talk to Jon Paley who is in charge of the 35t DB. He did not have a size for DUNE, but he did have a number for NOvA. He suggested that the 40kton DUNE will have the same number of channels as NOvA and the NOvA DB uses 200 GB per year. We should use significantly less. 100 GB is signficantly less than total requested disk space.
\subsection{Computing and software}
%\subsubsection{Distributed Processing}
%\label{distr_proc}
Fermilab provides the bulk of computational power to DUNE via Fermigrid and other facilities.
We plan to leverage these resources to process the data coming from the DUNE-PT beam test.
One of the principal goals will be quick validation of the data collected in each measurement, in
order to be able to make adjustments during the run as necessary.
This is common practice in other experiments which have ``express streams'' to assess data quality~\cite{atlas_express}.
Given that tracking, reconstruction and other algorithms are still under development, with significant improvements
and optimizations expected, estimates of the CPU power required to process the data are necessarily rough.
Our current estimates range from 10 to 100 seconds on a typical CPU core to reconstruct a single event.
This means that by utilizing a few thousand cores through Grid facilities, it will be possible to ensure timely processing of these data.
To ensure adequate capacity, we envisage a distributed computing model where Grid resources are utilized in addition to Fermilab's computing resources.
As an example, we have had good experience working with the Open Science Grid Consortium.
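The turnaround time for a full reconstruction pass can be sketched from the figures above (an illustrative estimate; the per-event CPU time is the pessimistic end of the quoted range, and the core count of 2000 is an assumption standing in for ``a few thousand cores''):

```python
# Rough wall-clock estimate for one full reconstruction pass over the data.
N_EVENTS = 5_000_000      # ~5M events from the preliminary run plan
SECONDS_PER_EVENT = 100   # pessimistic end of the 10-100 s/event range
CORES = 2000              # illustrative "few thousand cores" via Grid facilities

wall_days = N_EVENTS * SECONDS_PER_EVENT / CORES / 86400
print(f"full reconstruction pass: ~{wall_days:.1f} days")  # ~2.9 days
```

Even at the pessimistic end of the CPU range, a few thousand cores turn the full sample around in a few days, consistent with the goal of validating data and adjusting the run plan while taking data.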
\subsubsection{Data processing}
\label{dataprocess}
In addition to the raw data, preparations are being made for offline data handling, processing and storage.
The offline data can be classified as follows:
\begin{itemize}
\item Monte Carlo data, which will contain multiple event samples to cover various event types and other conditions during the measurements with DUNE-PT
\item Data derived from Monte Carlo events, and produced with a variety of tracking and pattern recognition algorithms in order to create a basis for the detector characterization
\item Intermediate calibration files, derived from calibration data
\item Processed experimental data, which will likely exist in several parallel branches corresponding to different reconstruction algorithms being applied, with the purpose of evaluating the performance of the different algorithms.
\end{itemize}
For the latter category, there will likely be more than one processing step, thus multiplying the data volume.
To keep the data manageable, the derived data will contain at most a small fraction of the raw data;
hence the processed data will likely be significantly smaller than their input (the raw data).
Given the considerations presented above, we plan for
$O(1$~PB$)$ of tape storage to keep the processed data.
For efficient processing, disk storage will be necessary
to stage a considerable portion of both raw data (inputs) and one or a few steps in processing (outputs).
Extrapolating from our previous experience running Monte Carlo for the former LBNE Far Detector, we estimate that we will need a few hundred TB of continuously available
disk space. In summary, we expect the need for a few~PB of disk storage at Fermilab to ensure optimal data availability and
processing efficiency.
\subsubsection{Data distribution}
We foresee that data analysis (both experimental data and Monte Carlo) will be performed by collaborators residing in many
institutions and geographically dispersed. In our
estimate above, we mostly outlined storage space requirements for major data centers like CERN and Fermilab. When it comes to making these data available to collaborators, we will utilize a combination of the following:
\begin{itemize}
\item Managed replication of data in bulk, performed with tools like Spade. Copies will be made according to the wishes and capabilities of participating institutions.
\item Network-centric federated storage, based on XRootD. This allows for agile, just-in-time delivery of data to worker nodes and workstations over the network. This
technology has been evolving rapidly in the past few years, and solutions have been found to mitigate the performance drops of remote data access by implementing caching and other techniques.
\end{itemize}
In order to act on the latter item, we plan to implement a global XRootD redirector, which will make it possible to transparently access data from anywhere.
A concrete technical feature of storage at Fermilab is the dCache system, which has substantial capacity and can be leveraged
for the needs of the DUNE-PT data analysis. This dCache instance is equipped with an XRootD ``door'' which makes it accessible to the outside world, subject
to proper configuration, authentication and authorization.
Copies of a significant portion of the raw and derived data are planned to be hosted at NERSC and at Brookhaven National Laboratory.
These two institutions have substantial expertise in data handling and processing at scale and will serve as ``hubs'' for data archival and distribution.
\subsubsection{Software infrastructure}
The DUNE-PT effort will benefit from simulation toolkits, tracking and other reconstruction software
that have been, and continue to be, developed for DUNE, the 35~t detector and the short-baseline program at Fermilab, as well as the
neutrino platform development efforts, in particular the WA105 experiment.
The software tools will need to be portable, well maintained and validated. To ensure that this happens,
we plan to establish close cooperation among participating laboratories and other research institutions.
%\end{document}