Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 37 additions & 5 deletions platform/linux-generic/odp_queue_basic.c
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While testing this I noticed that now we could have separate optimized implementations for single and multi event enq/deq functions (e.g. odp_queue_enq(), odp_queue_enq_multi()). Currently, the single event variants e.g. plain_queue_enq()) share the same code and I measured some performance penalty from these changes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this makes sense.

Original file line number Diff line number Diff line change
Expand Up @@ -515,20 +515,52 @@ static inline void event_index_from_hdr(uint32_t event_index[],
_odp_event_hdr_t *event_hdr[], int num)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since event_index_from_hdr() and event_index_to_hdr() functions are identical for basic and spsc queues, their code could be moved to platform/linux-generic/include/odp_event_internal.h. The convention has been to prefix implementation internal functions with _odp, so _odp_event_index_from_hdr() and _odp_event_index_to_hdr().

{
int i;
int idx = 0;

for (i = 0; i < (num & ~0x3); i += 4, idx += 4) {
event_index[i] = event_hdr[idx]->index.u32;
event_index[i+1] = event_hdr[idx+1]->index.u32;
event_index[i+2] = event_hdr[idx+2]->index.u32;
event_index[i+3] = event_hdr[idx+3]->index.u32;
}
switch (num & 0x3) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is going to increase code size and add a few new branches (although I did not look at any generated assembly). It is not clear to me that the change is a net positive in actual applications and not just in microbenchmarks that presumably always enqueue long bursts. I believe some ODP apps have high instruction cache pressure.

Another possible optimization might be to get rid of the whole header-pointer-to-index conversion and just store the pointers in rings (but that would then use more cache lines in the rings, so I do not know if that is good or not).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, cache pressure is something we need to understand. Is there any benchmarking application (not a micro benchmark like we used here) that can be executed to confirm the behaviour?

case 3:
event_index[i++] = event_hdr[idx++]->index.u32;
__attribute__((fallthrough));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Project convention has been to use simply /* Fallthrough */ comments. i386 build seems to have issues with these attributes.

case 2:
event_index[i++] = event_hdr[idx++]->index.u32;
__attribute__((fallthrough));
case 1:
event_index[i++] = event_hdr[idx++]->index.u32;
}

for (i = 0; i < num; i++)
event_index[i] = event_hdr[i]->index.u32;
}

static inline void event_index_to_hdr(_odp_event_hdr_t *event_hdr[],
uint32_t event_index[], int num)
{
int i;
int idx = 0;

for (i = 0; i < num; i++) {
event_hdr[i] = _odp_event_hdr_from_index_u32(event_index[i]);
odp_prefetch(event_hdr[i]);
for (i = 0; i < (num & ~0x3); idx += 4, i += 4) {
event_hdr[i] = _odp_event_hdr_from_index_u32(event_index[idx]);
event_hdr[i+1] = _odp_event_hdr_from_index_u32(event_index[idx+1]);
event_hdr[i+2] = _odp_event_hdr_from_index_u32(event_index[idx+2]);
event_hdr[i+3] = _odp_event_hdr_from_index_u32(event_index[idx+3]);
}

switch (num & 0x3) {
case 3:
event_hdr[i++] = _odp_event_hdr_from_index_u32(event_index[idx++]);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary newline.

__attribute__((fallthrough));
case 2:
event_hdr[i++] = _odp_event_hdr_from_index_u32(event_index[idx++]);
__attribute__((fallthrough));
case 1:
event_hdr[i++] = _odp_event_hdr_from_index_u32(event_index[idx++]);
}

}

static inline int _plain_queue_enq_multi(odp_queue_t handle,
Expand Down
41 changes: 36 additions & 5 deletions platform/linux-generic/odp_queue_spsc.c
Original file line number Diff line number Diff line change
Expand Up @@ -15,20 +15,51 @@
_odp_event_hdr_t *event_hdr[], int num)
{
int i;
int idx = 0;

for (i = 0; i < num; i++)
event_index[i] = event_hdr[i]->index.u32;
for (i = 0; i < (num & ~0x3); i += 4, idx += 4) {
event_index[i] = event_hdr[idx]->index.u32;
event_index[i+1] = event_hdr[idx+1]->index.u32;
event_index[i+2] = event_hdr[idx+2]->index.u32;
event_index[i+3] = event_hdr[idx+3]->index.u32;
}
switch (num & 0x3) {
case 3:
event_index[i++] = event_hdr[idx++]->index.u32;
__attribute__((fallthrough));
case 2:
event_index[i++] = event_hdr[idx++]->index.u32;
__attribute__((fallthrough));
case 1:
event_index[i++] = event_hdr[idx++]->index.u32;
}
}

static inline void event_index_to_hdr(_odp_event_hdr_t *event_hdr[],
uint32_t event_index[], int num)
{
int i;
int idx = 0;

for (i = 0; i < (num & ~0x3); idx += 4, i += 4) {
event_hdr[i] = _odp_event_hdr_from_index_u32(event_index[idx]);
event_hdr[i+1] = _odp_event_hdr_from_index_u32(event_index[idx+1]);
event_hdr[i+2] = _odp_event_hdr_from_index_u32(event_index[idx+2]);
event_hdr[i+3] = _odp_event_hdr_from_index_u32(event_index[idx+3]);
}

switch (num & 0x3) {
case 3:
event_hdr[i++] = _odp_event_hdr_from_index_u32(event_index[idx++]);

for (i = 0; i < num; i++) {
event_hdr[i] = _odp_event_hdr_from_index_u32(event_index[i]);
odp_prefetch(event_hdr[i]);
__attribute__((fallthrough));
case 2:
event_hdr[i++] = _odp_event_hdr_from_index_u32(event_index[idx++]);
__attribute__((fallthrough));
case 1:
event_hdr[i++] = _odp_event_hdr_from_index_u32(event_index[idx++]);
}

Check failure on line 62 in platform/linux-generic/odp_queue_spsc.c

View workflow job for this annotation

GitHub Actions / Checkpatch

ERROR: trailing whitespace
}

static inline int spsc_enq_multi(odp_queue_t handle,
Expand Down
Loading