Commit 6083af1
[prof] in gux_taptamggux.mad counters.h, improve the handling of counter overhead
These are the results:
(1) keep overhead
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING RDTSC-BASED TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.4766s
[COUNTERS] Fortran Other ( 0 ) : 0.1202s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0685s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 3.2400s for 1087437 events => throughput is 3.36E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1007s for 32768 events => throughput is 3.25E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1673s for 16384 events => throughput is 9.79E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0521s for 16384 events => throughput is 3.14E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0687s for 16384 events => throughput is 2.38E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1237s for 1087437 events => throughput is 8.79E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4728s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0269s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.3496s for 14136681 events => throughput is 6.02E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.4409s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
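Each "=> throughput is ..." figure in these logs appears to be the event count divided by the time accumulated by that counter. For example, 1087437 / 3.2400s ≈ 3.36E+05 events/s, and 16384 / 0.0357s ≈ 4.59E+05 events/s. Under that assumption, a minimal C++ sketch reproduces the printed values. The function names are hypothetical and are not taken from counters.h:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Throughput as it appears on the [COUNTERS] lines: events / seconds.
// (Assumption inferred from the logged numbers, not from counters.h.)
double throughput( long long nevents, double seconds )
{
  assert( seconds > 0 );
  return nevents / seconds;
}

// Format a value like the logs do: scientific notation, two decimals,
// e.g. "3.36E+05".
std::string formatThroughput( double value )
{
  char buf[32];
  std::snprintf( buf, sizeof( buf ), "%.2E", value );
  return std::string( buf );
}
```

For example, `formatThroughput( throughput( 1087437, 3.2400 ) )` yields "3.36E+05", which matches the PhaseSpaceSampling line above.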
CUDACPP_RUNTIME_USECHRONOTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
[COUNTERS] *** USING STD::CHRONO TIMERS (do not remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 5.3144s
[COUNTERS] Fortran Other ( 0 ) : 0.1588s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 4.0191s for 1087437 events => throughput is 2.71E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0996s for 32768 events => throughput is 3.29E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1660s for 16384 events => throughput is 9.87E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0508s for 16384 events => throughput is 3.22E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0704s for 16384 events => throughput is 2.33E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.1482s for 1087437 events => throughput is 7.34E+06 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4718s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 2.8646s for 14136681 events => throughput is 4.94E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 5.2787s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
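Compared with the RDTSC run, the std::chrono run is slower overall (5.3144s vs 4.4766s). Most of the difference comes from PhaseSpaceSampling (3.2400s vs 4.0191s for 1087437 events). The counters called only 16384 or 32768 times are essentially unchanged. This suggests that each std::chrono timer start/stop costs more than an RDTSC-based one.

The sketch below shows what such a start/stop counter might look like. The class name and the use of std::chrono::steady_clock are assumptions; this is not the actual counters.h code:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical start/stop counter accumulating elapsed time with
// std::chrono::steady_clock. Every start()/stop() pair reads the clock
// twice; that per-cycle cost is the "timer overhead" seen in the logs.
class ChronoCounter
{
public:
  void start()
  {
    assert( !m_running );
    m_running = true;
    m_t0 = std::chrono::steady_clock::now();
  }
  void stop()
  {
    const auto t1 = std::chrono::steady_clock::now();
    assert( m_running );
    m_running = false;
    m_total += t1 - m_t0; // accumulate this cycle's elapsed time
    ++m_cycles;           // count completed start/stop cycles
  }
  // Accumulated time in seconds.
  double seconds() const { return std::chrono::duration<double>( m_total ).count(); }
  // Number of completed start/stop cycles.
  std::uint64_t cycles() const { return m_cycles; }

private:
  std::chrono::steady_clock::time_point m_t0{};
  std::chrono::steady_clock::duration m_total{ 0 };
  std::uint64_t m_cycles = 0;
  bool m_running = false;
};
```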
(2) remove overhead
CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0338s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.8244s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.8905s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.9339s
[COUNTERS] Fortran Other ( 0 ) : 0.2954s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0674s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.7332s for 1087437 events => throughput is 3.98E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.1003s for 32768 events => throughput is 3.27E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1688s for 16384 events => throughput is 9.71E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0507s for 16384 events => throughput is 3.23E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0695s for 16384 events => throughput is 2.36E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0924s for 1087437 events => throughput is 1.18E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4692s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0263s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.8723s for 14136681 events => throughput is 7.55E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 3.8982s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0357s for 16384 events => throughput is 4.59E+05 events/s
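With CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1, the numbers in this run are self-consistent in two ways:

- PROGRAM TOTAL equals PROGRAM TOTAL+COUNTEROVERHEAD minus PROGRAM COUNTEROVERHEAD (4.8244s - 0.8905s = 3.9339s).
- Dividing the total overhead by the calibrated cost of 1M cycles gives the implied number of start/stop cycles (0.8905 / 0.0338 ≈ 26.3M).

That cycle count is an inference, not a value printed by the program. It is also only approximate, because the logged times are rounded to 0.1 ms. A small C++ check of both relations follows; the helper names are hypothetical:

```cpp
#include <cassert>

// "PROGRAM TOTAL" = "PROGRAM TOTAL+COUNTEROVERHEAD" - "PROGRAM COUNTEROVERHEAD".
double totalWithoutOverhead( double totalPlusOverhead, double overhead )
{
  assert( overhead >= 0 && overhead <= totalPlusOverhead );
  return totalPlusOverhead - overhead;
}

// Implied number of start/stop cycles, in millions, from the total overhead
// and the calibrated cost of 1M cycles (an inference from the log).
double impliedMillionsOfCycles( double overhead, double secondsPer1MCycles )
{
  assert( secondsPer1MCycles > 0 );
  return overhead / secondsPer1MCycles;
}
```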
CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0637s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 5.8826s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 1.6786s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 4.2040s
[COUNTERS] Fortran Other ( 0 ) : 0.4831s
[COUNTERS] Fortran Initialise(I/O) ( 1 ) : 0.0691s
[COUNTERS] Fortran PhaseSpaceSampling ( 3 ) : 2.9924s for 1087437 events => throughput is 3.63E+05 events/s
[COUNTERS] Fortran PDFs ( 4 ) : 0.0983s for 32768 events => throughput is 3.33E+05 events/s
[COUNTERS] Fortran UpdateScaleCouplings ( 5 ) : 0.1669s for 16384 events => throughput is 9.81E+04 events/s
[COUNTERS] Fortran Reweight ( 6 ) : 0.0506s for 16384 events => throughput is 3.24E+05 events/s
[COUNTERS] Fortran Unweight(LHE-I/O) ( 7 ) : 0.0676s for 16384 events => throughput is 2.42E+05 events/s
[COUNTERS] Fortran SamplePutPoint ( 8 ) : 0.0698s for 1087437 events => throughput is 1.56E+07 events/s
[COUNTERS] CudaCpp Initialise ( 11 ) : 0.4712s
[COUNTERS] CudaCpp Finalise ( 12 ) : 0.0267s
[COUNTERS] CudaCpp MEs ( 19 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s
[COUNTERS] TEST SampleGetX ( 21 ) : 1.9227s for 14136681 events => throughput is 7.35E+06 events/s
[COUNTERS] OVERALL NON-MEs ( 31 ) : 4.1690s
[COUNTERS] OVERALL MEs ( 32 ) : 0.0350s for 16384 events => throughput is 4.68E+05 events/s
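The INFO line reports a calibrated cost for 1M start/stop cycles: 0.0637s here for std::chrono, against 0.0338s for RDTSC. Once that cost is removed, the frequently called counters shrink the most. For example, TEST SampleGetX goes from 2.8646s in the keep-overhead chrono run to 1.9227s here.

The sketch below shows one way to calibrate and subtract such an overhead. It assumes the cost is measured as the wall time of empty start/stop cycles. The function names are hypothetical, and counters.h may do this differently:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical calibration: run n empty start/stop cycles (two clock reads
// each, nothing timed in between) and return the average wall time of one
// cycle in seconds.
double calibrateSecondsPerCycle( std::uint64_t n )
{
  assert( n > 0 );
  using clock = std::chrono::steady_clock;
  const auto begin = clock::now();
  for( std::uint64_t i = 0; i < n; ++i )
  {
    (void)clock::now(); // empty "start"
    (void)clock::now(); // empty "stop"
  }
  const auto end = clock::now();
  return std::chrono::duration<double>( end - begin ).count() / static_cast<double>( n );
}

// Subtract the estimated overhead of nCycles cycles from a raw counter time,
// clamping at zero in case the calibration overestimates the cost.
double removeOverhead( double rawSeconds, std::uint64_t nCycles, double secondsPerCycle )
{
  return std::max( 0.0, rawSeconds - static_cast<double>( nCycles ) * secondsPerCycle );
}
```

In this sketch, calibration runs once (for instance at startup). The subtraction is applied only when results are printed, so the timed hot path itself does no extra work.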
(3) remove overhead, disable individual timers (so here the overhead is 0)
CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0333s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.1897s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.3330s
-------------------------------------------------------------
[COUNTERS] *** USING RDTSC-BASED TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8567s
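Case (3) adds a third switch. As the command lines show, all three switches are plain environment variables set to 1:

- CUDACPP_RUNTIME_USECHRONOTIMERS
- CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD
- CUDACPP_RUNTIME_DISABLECALLTIMERS

How counters.h parses them is not part of this commit. The minimal sketch below assumes that a switch counts as enabled only when it is set to exactly "1". The helper and struct names are hypothetical:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true only if the variable is set to exactly "1".
// The real counters.h may instead accept any non-empty value.
bool envFlagEnabled( const char* name )
{
  assert( name != nullptr );
  const char* value = std::getenv( name );
  return value != nullptr && std::strcmp( value, "1" ) == 0;
}

// Hypothetical aggregate of the three switches used in this commit message.
struct CounterOptions
{
  bool useChronoTimers;
  bool removeCounterOverhead;
  bool disableCallTimers;
};

CounterOptions readCounterOptions()
{
  return CounterOptions{ envFlagEnabled( "CUDACPP_RUNTIME_USECHRONOTIMERS" ),
                         envFlagEnabled( "CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD" ),
                         envFlagEnabled( "CUDACPP_RUNTIME_DISABLECALLTIMERS" ) };
}
```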
CUDACPP_RUNTIME_USECHRONOTIMERS=1 CUDACPP_RUNTIME_REMOVECOUNTEROVERHEAD=1 CUDACPP_RUNTIME_DISABLECALLTIMERS=1 \
./build.cuda_d_inl0_hrd0/madevent_cuda < /tmp/avalassi/input_ggtt_x1_cudacpp
INFO: COUNTERS overhead : 0.0659s for 1M start/stop cycles
[COUNTERS] PROGRAM TOTAL+COUNTEROVERHEAD : 4.5119s
[COUNTERS] PROGRAM COUNTEROVERHEAD : 0.6594s
-------------------------------------------------------------
[COUNTERS] *** USING STD::CHRONO TIMERS (remove timer overhead) ***
[COUNTERS] PROGRAM TOTAL : 3.8525s
1 parent 3577a55 commit 6083af1
1 file changed
Lines changed: 12 additions & 10 deletions