let processors always take care of generating their own serialized outputs

There is currently a mix of responsibilities when generating the final payload for the outputs of a process execution request. It shows up when doing sync execution:

- If the media type of the outputs is something other than `application/json`, the processor is allowed to generate their final form. For example, if the processor returns something with a media type of `image/tiff`, the returned output is assumed to be a binary stream of already serialized data and is not processed further by downstream pygeoapi components;
- However, if the media type is `application/json`, both the process manager and the (flask) route handler do further post-processing of the output. The processor cannot simply return a JSON string, because the base process manager tries to wrap the output in a dict if the requested response is of type `document`, and the webapp handler is the one that finally serializes this dict into a JSON string. In other words, the processor claims to have generated something with a media type of `application/json`, but in reality it returns a Python `dict` or `list`, which is only turned into a JSON string downstream, just before the response is sent back to the client.

In my opinion, it would be better to have a more uniform contract between these components (processor, process manager, webapp route handler) and simply have the processor prepare and own the representation of its generated output(s), regardless of their media type.

The processor would thus always return a list of serialized outputs (something like `list[tuple[str, bytes]]`), where each element in the list is a tuple of media type and serialized output. The process manager could then handle this list (maybe persist each output to disk or send it to some remote storage, maybe pack them together in a `multipart/related` message for immediate response, etc.), potentially even building a JSON document if requested, but it would assume that all outputs are already serialized. Let's take a closer look at this last scenario:
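Before looking at that scenario, here is a minimal sketch of what the contract itself could look like from the processor's side. The `SerializedOutput` alias and the `HelloProcessor` class are hypothetical names used only for illustration; they are not existing pygeoapi APIs.

```python
import json

# Hypothetical alias for one output: (media type, already serialized payload)
SerializedOutput = tuple[str, bytes]


class HelloProcessor:
    """Hypothetical processor (not pygeoapi's actual base class)."""

    def execute(self, data: dict) -> list[SerializedOutput]:
        greeting = {"message": f"Hello {data.get('name', 'world')}!"}
        # The processor owns serialization: every output leaves as bytes,
        # paired with the media type that describes those bytes.
        return [
            ("application/json", json.dumps(greeting).encode("utf-8")),
            ("text/plain", greeting["message"].encode("utf-8")),
        ]


outputs = HelloProcessor().execute({"name": "pygeoapi"})
```

With this shape, JSON is no longer a special case: it is just bytes with a media type of `application/json`, like any other output.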

If the process manager wanted to build a response of type `document`, it would:

- go through each pair of `media_type, serialized_output` returned by the processor;
- if the `serialized_output` has a media type of `application/json`, or any other JSON-related media type, parse it with `json.loads`, which would produce a Python data structure;
- if the `serialized_output` has a text-based media type, decode it (e.g. with `bytes.decode()`), thus producing a string;
- if the `serialized_output` has any other media type, encode it as base64 and then decode the result to a string;
- finally, build a JSON object with all of these parsed outputs by putting them in a Python dict, calling `json.dumps` on it, and encoding the resulting string into `bytes`.
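The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the function names are hypothetical, text payloads are assumed to be UTF-8, and because the proposed list carries no output identifiers, the `output_<index>` keys are made up for the example.

```python
import base64
import json


def is_json(media_type: str) -> bool:
    # Treat application/json and "+json" suffixed types as JSON-related.
    base = media_type.split(";")[0].strip().lower()
    return base == "application/json" or base.endswith("+json")


def build_document(outputs: list[tuple[str, bytes]]) -> tuple[str, bytes]:
    """Hypothetical helper: merge serialized outputs into one JSON document."""
    document = {}
    for index, (media_type, payload) in enumerate(outputs):
        key = f"output_{index}"  # assumed naming, not part of the proposal
        if is_json(media_type):
            document[key] = json.loads(payload)
        elif media_type.lower().startswith("text/"):
            document[key] = payload.decode("utf-8")  # assumes UTF-8 text
        else:
            # Binary payloads become base64 text so they fit inside JSON.
            document[key] = base64.b64encode(payload).decode("ascii")
    # The document is itself returned as a serialized output.
    return "application/json", json.dumps(document).encode("utf-8")


media_type, body = build_document(
    [
        ("application/json", b'{"a": 1}'),
        ("text/plain", b"hi"),
        ("image/tiff", b"\x00\x01"),
    ]
)
```

Note that the result has the same `(media type, bytes)` shape as a single processor output, so whatever sits downstream does not need to know a document was assembled.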

Finally, the webapp handler would not tamper with the serialized outputs in any way either, and would rely on the media type(s) returned alongside them to prepare its response.
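As a rough illustration of that last point, a handler could reduce to something like the sketch below. It is framework-agnostic on purpose: `to_http_response` is a hypothetical name, and it assumes the process manager has already reduced the result to a single `(media type, bytes)` pair (for instance a JSON document or a `multipart/related` message).

```python
def to_http_response(output: tuple[str, bytes]) -> tuple[int, dict, bytes]:
    """Hypothetical handler step: returns (status, headers, body)."""
    media_type, payload = output
    headers = {
        # The media type comes straight from the processor / process manager.
        "Content-Type": media_type,
        "Content-Length": str(len(payload)),
    }
    # The payload is passed through untouched: no parsing, no re-serialization.
    return 200, headers, payload


status, headers, body = to_http_response(("image/tiff", b"II*\x00"))
```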

I'd be willing to provide a PR for this, if it is seen as a useful change.

let processors always take care of generating their own serialized outputs #2305

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

let processors always take care of generating their own serialized outputs #2305

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions