|
| 1 | ++++ |
| 2 | +date = '2026-02-25T17:09:56+01:00' |
| 3 | +draft = false |
| 4 | +title = '0% Loops vs 100% Lambdas, TMP and Views: Maximal Inlining' |
| 5 | +tags = ["advanced-level", "performance", "lambdas", "views", "parallelization"] |
| 6 | ++++ |
| 7 | + |
| 8 | + |
| 9 | +In this article, we walk through a small real-world example of getting rid of manual, imperative loops. Such loops say little about intent, are hard to read and maintain, and give the compiler less to work with when inlining, so they can leave performance on the table. We refactor one step at a time: first with lambdas, then by passing the lambdas to a custom template function that handles the iteration internally (a pattern that mimics the logic of modern libraries and resembles the idea behind `views`), and finally by compiling with `C++20` and using modern `views` directly. |
| 10 | + |
| 11 | + |
| 12 | +Consider the example below. We have a `NetworkPacket` and a `NetworkBuffer` that stores a vector of packets. We would like to filter the packets based on, for instance, the encryption flag or the source IP, gather the filtered packets from the buffer, and maybe apply some logic to them. |
| 13 | + |
| 14 | + |
| 15 | + |
| 16 | +You can find the full code in the [GitHub repo](https://github.com/konstantd/konstantd.github.io). |
| 17 | + |
| 18 | + |
| 19 | +``` cpp |
| 20 | +struct NetworkPacket { |
| 21 | + |
| 22 | + // Source and Destination |
| 23 | + std::string m_sourceIp; |
| 24 | + std::string m_destinationIp; |
| 25 | + |
| 26 | + // Let's skip the payload and use size of payload for simplicity on the ctor |
| 27 | + size_t m_packetSize; |
| 28 | + |
| 29 | + // Encryption and Priority |
| 30 | + bool m_isEncrypted; |
| 31 | + Priority m_priority; |
| 32 | + |
| 33 | + NetworkPacket(std::string src, std::string dest, |
| 34 | + size_t size, bool encrypted = false, |
| 35 | + Priority priority = Priority::LOW) |
| 36 | + : |
| 37 | + m_sourceIp(std::move(src)), m_destinationIp(std::move(dest)), |
| 38 | + m_packetSize(size), m_isEncrypted(encrypted), |
| 39 | + m_priority(priority) {} |
| 40 | + |
| 41 | + // Move ctor default and noexcept |
| 42 | + NetworkPacket(NetworkPacket&& other) noexcept = default; |
| 43 | + |
| 44 | + // Declaring the move ctor implicitly deletes the copy ctor |
| 45 | + // We need copies for the filtered vectors, so let's default it explicitly |
| 46 | + NetworkPacket(const NetworkPacket& other) = default; |
| 47 | +}; |
| 48 | + |
| 49 | + |
| 50 | +struct NetworkBuffer { |
| 51 | + |
| 52 | + // Container for the Packets |
| 53 | + std::vector<NetworkPacket> m_packetBuffer; |
| 54 | + |
| 55 | + // Forward a packet to the container |
| 56 | + template <typename T> |
| 57 | + void addPacketForward(T&& packet) { |
| 58 | + m_packetBuffer.emplace_back(std::forward<T>(packet)); |
| 59 | + } |
| 60 | + |
| 61 | +}; |
| 62 | +``` |
| 63 | +
|
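
The snippet above uses a `Priority` type that is not shown, and it relies on perfect forwarding in `addPacketForward`. Below is a minimal sketch of both, under stated assumptions: `Priority` is assumed to be a scoped enum (only `LOW` and `HIGH` appear in this article, `MEDIUM` is a guess), and `Tracer` and `Buffer` are hypothetical stand-ins that only count copies and moves. The real definitions are in the full code and may differ.

```cpp
#include <utility>
#include <vector>

// Assumed definition: the article only shows LOW and HIGH being used.
enum class Priority { LOW, MEDIUM, HIGH };

// Hypothetical stand-in for NetworkPacket that counts copies and moves.
struct Tracer {
    static inline int copies = 0;
    static inline int moves = 0;

    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};

// Same shape as NetworkBuffer::addPacketForward, for any element type.
template <typename Element>
struct Buffer {
    std::vector<Element> m_items;

    template <typename T>
    void addForward(T&& item) {
        // std::forward keeps the value category:
        // lvalues are copied, rvalues are moved.
        m_items.emplace_back(std::forward<T>(item));
    }
};
```

Passing a named object to `addForward` should bump only `Tracer::copies`, while passing a temporary should bump only `Tracer::moves` (as long as the vector does not reallocate in between).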
| 64 | +## Populate the Buffer |
| 65 | +
|
| 66 | +
|
| 67 | +Given the buffer above, I populate it with random packets, reserving space for 2^17 (131,072) packets. The random generators are not of interest here (you can find them in the full code); just note that I keep the seed fixed, so the same random packets are generated on every run. |
| 68 | +
|
| 69 | +
|
| 70 | +``` cpp |
| 71 | + // We know the size, let's reserve it to avoid reallocations |
| 72 | + const int N = 1 << 17; |
| 73 | + buffer.m_packetBuffer.reserve(N); |
| 74 | +
|
| 75 | + // Create N random packets in the buffer |
| 76 | + for (int i = 0; i < N; ++i) { |
| 77 | + // Create them as temporaries (rvalues) so they are moved into the buffer |
| 78 | + buffer.addPacketForward(NetworkPacket(getRandomSrc(), |
| 79 | + getRandomDst(), |
| 80 | + getRandomSize(), |
| 81 | + getRandomEncryptionBool(), |
| 82 | + getRandomPriority() |
| 83 | + )); |
| 84 | + } |
| 85 | +``` |
| 86 | + |
| 87 | + |
| 88 | +## Code with Manual for loops |
| 89 | + |
| 90 | + |
| 91 | +Here is our logic. As mentioned, we filter packets from the buffer and gather them in a new vector. There are 3 filters here; we could also operate on the data, but you get the idea. |
| 92 | + |
| 93 | + |
| 94 | +``` cpp |
| 95 | +// 1. Filter packets by IP "10.0.0.5" source |
| 96 | +std::vector<NetworkPacket> filteredPacketsfromSrc; |
| 97 | +for (const auto& packet : buffer.m_packetBuffer) { |
| 98 | + if (packet.m_sourceIp == "10.0.0.5") { |
| 99 | + filteredPacketsfromSrc.push_back(packet); |
| 100 | + } |
| 101 | +} |
| 102 | + |
| 103 | +// 2. Filter packets that are encrypted with HIGH priority |
| 104 | +std::vector<NetworkPacket> filteredHighPriorEncrypted; |
| 105 | +for (const auto& packet : buffer.m_packetBuffer) { |
| 106 | + if ( (packet.m_isEncrypted) && (packet.m_priority == Priority::HIGH) ) { |
| 107 | + filteredHighPriorEncrypted.push_back(packet); |
| 108 | + } |
| 109 | +} |
| 110 | + |
| 111 | +// 3. Filter packets by IP "6.8.8.8" destination and size > 128 bytes |
| 112 | +std::vector<NetworkPacket> filteredPacketsfromDst_128; |
| 113 | +for (const auto& packet : buffer.m_packetBuffer) { |
| 114 | + if ( (packet.m_destinationIp == "6.8.8.8") && (packet.m_packetSize > 128) ) { |
| 115 | + filteredPacketsfromDst_128.push_back(packet); |
| 116 | + } |
| 117 | +} |
| 118 | +``` |
| 119 | + |
| 120 | + |
| 121 | +The loops do not really show intent here. Imagine we also had some hard-coded extra filtering, or transformations whose logic is hard to follow. |
| 122 | + |
| 123 | + |
| 124 | +## 1st Improvement - `for_each` is slightly better |
| 125 | + |
| 126 | +As a 1st step, we can replace every for loop with `std::for_each` and a lambda. Each lambda has its own unique type, so the call is resolved at compile time and the compiler can inline the body directly into the algorithm. |
| 127 | + |
| 128 | +``` cpp |
| 129 | +// 1. Filter packets by IP "10.0.0.5" source |
| 130 | +std::vector<NetworkPacket> filteredPacketsfromSrc; |
| 131 | +std::for_each(buffer.m_packetBuffer.begin(), buffer.m_packetBuffer.end(), [&](const auto& packet) { |
| 132 | + if (packet.m_sourceIp == "10.0.0.5") { |
| 133 | + filteredPacketsfromSrc.push_back(packet); |
| 134 | + } |
| 135 | +}); |
| 136 | + |
| 137 | +// 2. Filter packets that are encrypted with HIGH priority |
| 138 | +std::vector<NetworkPacket> filteredHighPriorEncrypted; |
| 139 | +std::for_each(buffer.m_packetBuffer.begin(), buffer.m_packetBuffer.end(), [&](const auto& packet) { |
| 140 | + if (packet.m_isEncrypted && packet.m_priority == Priority::HIGH) { |
| 141 | + filteredHighPriorEncrypted.push_back(packet); |
| 142 | + } |
| 143 | +}); |
| 144 | + |
| 145 | +// 3. Filter packets by IP "6.8.8.8" destination and size > 128 bytes |
| 146 | +std::vector<NetworkPacket> filteredPacketsfromDst_128; |
| 147 | +std::for_each(buffer.m_packetBuffer.begin(), buffer.m_packetBuffer.end(), [&](const auto& packet) { |
| 148 | + if (packet.m_destinationIp == "6.8.8.8" && packet.m_packetSize > 128) { |
| 149 | + filteredPacketsfromDst_128.push_back(packet); |
| 150 | + } |
| 151 | +}); |
| 152 | +``` |
| 153 | +
|
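
Since each `for_each` above only copies the matching packets into a new vector, the same step can also be written with `std::copy_if`, which names the intent directly. This is a minimal sketch, not the article's code: `Packet` is a hypothetical, simplified stand-in for `NetworkPacket`, and `filterBySource` is a helper name made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for NetworkPacket.
struct Packet {
    std::string sourceIp;
    std::size_t size;
};

// Copy every packet whose source IP matches into a new vector.
std::vector<Packet> filterBySource(const std::vector<Packet>& packets,
                                   const std::string& ip) {
    std::vector<Packet> result;
    std::copy_if(packets.begin(), packets.end(), std::back_inserter(result),
                 [&ip](const Packet& p) { return p.sourceIp == ip; });
    return result;
}
```

The lambda now only expresses the criterion, and the copying is left to the algorithm.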
| 154 | +
|
| 155 | +### Lambdas & Parallelization |
| 156 | +
|
| 157 | +Lambdas put parallelization within reach. By switching from a loop to a lambda-based algorithm, you can parallelize very easily, just by passing `std::execution::par` (available since C++17) as the first argument. |
| 158 | +
|
| 159 | +
|
| 160 | +However, the lambda body is not synchronized for us. Several threads may call `push_back` on the same vector at the same time, which is a data race, so we have to protect the vector manually, for example with a mutex. |
| 161 | +
|
| 162 | +
|
| 163 | +As an example, the 1st filter above, parallelized, would be: |
| 164 | +
|
| 165 | +
|
| 166 | +``` cpp |
| 167 | +#include <execution> |
| 168 | +#include <mutex> |
| 169 | +
|
| 170 | +
|
| 171 | +// A mutex to lock |
| 172 | +std::mutex mtx; |
| 173 | +
|
| 174 | +std::vector<NetworkPacket> filteredPacketsfromSrc; |
| 175 | +std::for_each(std::execution::par, buffer.m_packetBuffer.begin(), buffer.m_packetBuffer.end(), [&](const auto& packet) { |
| 176 | + if (packet.m_sourceIp == "10.0.0.5") { |
| 177 | + std::lock_guard<std::mutex> lock(mtx); // Lock here, unlock is provided by RAII |
| 178 | + filteredPacketsfromSrc.push_back(packet); |
| 179 | + } |
| 180 | +}); |
| 181 | +``` |
| 182 | + |
| 183 | + |
| 184 | +## 2nd Improvement - Advanced Predicate and Action template function |
| 185 | + |
| 186 | +Next, we can identify the pattern and implement a template function that accepts lambdas to filter and act on the buffer. This decouples the traversal mechanics (the How) from the business logic (the What). This **internal iteration** pattern allows the compiler to inline the lambdas directly into the loop while significantly improving code reuse. The idea is preferred in modern C++ libraries as well, since it is great for encapsulation: even if we change the underlying container that we iterate over, the calling code keeps working without changes in many places. |
| 187 | + |
| 188 | + |
| 189 | + |
| 190 | + |
| 191 | + |
| 192 | +We add a member function template to the `NetworkBuffer` struct that accepts a Predicate and an Action. |
| 193 | + |
| 194 | + |
| 195 | +``` cpp |
| 196 | +template <typename Predicate, typename Action> |
| 197 | +inline void filter_and_execute(Predicate&& filter, Action&& work) { |
| 198 | + // Because this is a template, 'filter' and 'work' are NOT function pointers. |
| 199 | + // They are unique types, allowing the compiler to 'paste' their logic here. |
| 200 | + for (const NetworkPacket& packet: m_packetBuffer) { |
| 201 | + if (filter(packet)) { |
| 202 | + work(packet); |
| 203 | + } |
| 204 | + } |
| 205 | +} |
| 206 | +``` |
| 207 | +
|
| 208 | +
|
| 209 | +And now we use it like: |
| 210 | +
|
| 211 | +
|
| 212 | +```cpp |
| 213 | +// 1. Filter packets by IP "10.0.0.5" source |
| 214 | +std::vector<NetworkPacket> filteredPacketsfromSrc; |
| 215 | +buffer.filter_and_execute( |
| 216 | + [](const NetworkPacket& packet) { |
| 217 | + return packet.m_sourceIp == "10.0.0.5"; |
| 218 | + }, |
| 219 | + [&](const NetworkPacket& packet) { |
| 220 | + filteredPacketsfromSrc.push_back(packet); |
| 221 | + } |
| 222 | +); |
| 223 | +
|
| 224 | +// 2. Filter packets that are encrypted with HIGH priority |
| 225 | +std::vector<NetworkPacket> filteredHighPriorEncrypted; |
| 226 | +buffer.filter_and_execute( |
| 227 | + [](const auto& p) { return p.m_isEncrypted && p.m_priority == Priority::HIGH; }, |
| 228 | + [&](const auto& p) { filteredHighPriorEncrypted.push_back(p); } |
| 229 | +); |
| 230 | +
|
| 231 | +// 3. Filter packets by IP "6.8.8.8" destination and size > 128 bytes |
| 232 | +std::vector<NetworkPacket> filteredPacketsfromDst_128; |
| 233 | +buffer.filter_and_execute( |
| 234 | + [](const auto& p) { return p.m_destinationIp == "6.8.8.8" && p.m_packetSize > 128; }, |
| 235 | + [&](const auto& p) { filteredPacketsfromDst_128.push_back(p); } |
| 236 | +); |
| 237 | +``` |
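
The Action does not have to copy packets into a vector. As a hedged illustration of the same internal-iteration pattern, the sketch below sums payload sizes instead. `Packet`, `Buffer`, and `totalBytesTo` are hypothetical, simplified stand-ins, and the member function is marked `const` here as an assumption, since it only reads the packets.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for NetworkPacket and NetworkBuffer.
struct Packet {
    std::string destinationIp;
    std::size_t size;
};

struct Buffer {
    std::vector<Packet> m_packets;

    // Same shape as filter_and_execute above; const because it only reads.
    template <typename Predicate, typename Action>
    void filter_and_execute(Predicate&& filter, Action&& work) const {
        for (const Packet& packet : m_packets) {
            if (filter(packet)) {
                work(packet);
            }
        }
    }
};

// Total payload size sent to one destination, computed without copying packets.
std::size_t totalBytesTo(const Buffer& buffer, const std::string& ip) {
    std::size_t total = 0;
    buffer.filter_and_execute(
        [&ip](const Packet& p) { return p.destinationIp == ip; },
        [&total](const Packet& p) { total += p.size; });
    return total;
}
```

The traversal stays inside `Buffer`, and the caller only supplies the criterion and the work to do.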
| 238 | + |
| 239 | + |
| 240 | + |
| 241 | + |
| 242 | +## 3rd Improvement - Views are even more readable and give highest performance |
| 243 | + |
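| 244 | +Finally, we can achieve the same with `views` from `C++20` (header `<ranges>`). Note that we pipe the underlying vector, `buffer.m_packetBuffer`, since `NetworkBuffer` as defined above is not itself a range. |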
| 245 | + |
| 246 | + |
| 247 | + |
| 248 | +```cpp |
| 249 | +// 1. Filter packets by IP "10.0.0.5" source |
| 250 | +auto filter1 = buffer.m_packetBuffer |
| 251 | + | std::views::filter([](const auto& p) { return p.m_sourceIp == "10.0.0.5"; }); |
| 252 | + |
| 253 | +// 2. Filter packets that are encrypted with HIGH priority |
| 254 | +auto filter2 = buffer.m_packetBuffer |
| 255 | + | std::views::filter([](const auto& p) { return p.m_isEncrypted && p.m_priority == Priority::HIGH; }); |
| 256 | + |
| 257 | +// 3. Filter packets by IP "6.8.8.8" destination and size > 128 bytes |
| 258 | +auto filter3 = buffer.m_packetBuffer |
| 259 | + | std::views::filter([](const auto& p) { return p.m_destinationIp == "6.8.8.8"; }) |
| 260 | + | std::views::filter([](const auto& p) { return p.m_packetSize > 128; }); |
| 261 | +``` |
| 262 | +
|
| 263 | +
|
| 264 | +The advantages now: |
| 265 | +
|
| 266 | +- Obviously way more readable |
| 267 | +
|
| 268 | +- Previously we were manually doing a `push_back`, which copies every matching packet and can trigger memory allocations as the result vector grows. (We reserved memory only for the buffer itself, not for the filtered vectors.) Views do not create a new vector: `std::views::filter` is lazy, meaning that it does not move or copy anything and only evaluates the predicate when the view is iterated. This saves us from allocating 3 separate temporary vectors. |
| 269 | +
|
| 270 | +- Also, with our `filter_and_execute` template, we have to run a new loop for every filter. With `views`, we can chain filters, as in the 3rd example, and the compiler can optimize the logic into a single pass over the data, which is much better for the CPU and cache. |
| 271 | +
|
| 272 | +
|
| 273 | +
|
| 274 | +## Concluding |
| 275 | +
|
| 276 | +
|
| 277 | +
|
| 278 | +To summarize the evolution from manual loops to modern C++ abstractions: |
| 279 | +
|
| 280 | +
|
| 281 | +1. Manual for loops with `if` logic hide the intent and are harder for the compiler to optimize |
| 282 | +2. `lambdas` are more readable and easier for the compiler to inline |
| 283 | +3. `lambdas` put parallelization within reach with `std::execution::par`, which is far easier than parallelizing a manual loop |
| 284 | +4. A custom template is used often in modern libraries and is great for splitting the traversal (the **How**) from the applied logic (the **What**), completely independent of the underlying container. It handles the **internal iteration** and is powerful for encapsulation. The forwarded lambdas, `Predicate` and `Action`, simply define the criteria and the work to do. |
| 285 | +5. `views` are even more readable and perform better, for the reasons in the next two points |
| 286 | +6. `views` are lazy: the caller drives the iteration (**external iteration**), and they do not create temporary vectors for copies |
| 287 | +7. `views` are perfect for piping many filters at once. With loops, we would have to iterate over the packets again to apply each new filter, adding significant overhead. |
| 288 | +8. `views` give maximum inlining and performance |
| 289 | +9. `views` do not offer parallelization as easily as lambda-based algorithms |
| 290 | +
|