Added several performance optimizations to chunk encoding and decoding. Low-latency stores that do not benefit from
async operations can now implement synchronous IO methods which will be used when available during chunk processing.
Similarly, codecs can implement a synchronous API which will be used if available during chunk processing.
These changes remove unnecessary interactions with the event loop.
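
As a rough illustration of the dispatch described above, the sketch below prefers a synchronous method when an object provides one and falls back to the async method otherwise. The names `get_sync`, `get`, `MemoryLikeStore`, `AsyncOnlyStore`, and `read_chunk` are hypothetical, chosen only for this example, and are not taken from the library's API.

```python
import asyncio


class MemoryLikeStore:
    """Hypothetical low-latency store exposing both async and sync reads."""

    def __init__(self, data):
        self._data = data

    async def get(self, key):
        return self._data.get(key)

    def get_sync(self, key):
        # A plain dictionary lookup: no event loop is involved.
        return self._data.get(key)


class AsyncOnlyStore:
    """Hypothetical store that only offers an async read."""

    def __init__(self, data):
        self._data = data

    async def get(self, key):
        await asyncio.sleep(0)  # stand-in for real asynchronous IO
        return self._data.get(key)


def read_chunk(store, key):
    """Use the synchronous method if the store has one, else run the coroutine."""
    sync_get = getattr(store, "get_sync", None)
    if sync_get is not None:
        return sync_get(key)
    # Fallback path: requires starting an event loop for a single read.
    return asyncio.run(store.get(key))
```

The same pattern would apply to codecs: check for a synchronous entry point first and only fall back to the async one when it is missing.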
The synchronous chunk processing path optionally uses a thread pool to parallelize codec work across chunks. The pool is skipped for single-chunk operations and for pipelines that only contain cheap codecs (e.g. endian swap, transpose, checksum).
The thread pool can be disabled in the global configuration, which also controls the minimum and maximum number of threads.
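
The rules for skipping the pool can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `decode_chunks`, `CHEAP_CODECS`, `use_pool`, and `max_workers` are hypothetical names, the codec names in the set are placeholders, and the minimum-threads setting is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

# Hypothetical set of codec names treated as too cheap to be worth threading.
CHEAP_CODECS = {"endian", "transpose", "checksum"}


def decode_chunks(chunks, decode, codec_names, use_pool=True, max_workers=4):
    """Decode chunks, using a thread pool only when it is likely to help."""
    only_cheap = all(name in CHEAP_CODECS for name in codec_names)
    if not use_pool or len(chunks) <= 1 or only_cheap:
        # Single chunk, cheap pipeline, or pool disabled: decode inline.
        return [decode(chunk) for chunk in chunks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input chunks.
        return list(pool.map(decode, chunks))


# Usage: zlib stands in for an expensive codec.
compressed = [zlib.compress(b"a" * 10), zlib.compress(b"b" * 10)]
decoded = decode_chunks(compressed, zlib.decompress, ["zlib"])
```

Passing `use_pool=False` mirrors disabling the pool in configuration: every chunk is then decoded inline on the calling thread.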