let processors always take care of generating their own serialized outputs #2305

@ricardogsilva

Responsibilities are currently mixed when generating the final payload for the outputs of a process execution request. The problem shows up during sync execution:

  • If the media type of the outputs is something other than application/json, the processor is allowed to generate their final form. For example, if the processor returns something with a media type of image/tiff, the returned output is assumed to be a binary stream of already serialized data and is not processed further by downstream pygeoapi components;
  • However, if the media type is application/json, both the process manager and the (Flask) route handler post-process the output. The processor cannot simply return a JSON string, because the base process manager tries to wrap the output in a dict if the requested response is of type document, and the webapp handler is then the one that serializes this dict into a JSON string. As a result, the processor claims to have generated something with a media type of application/json, but in reality it returns a Python dict or list, which is only turned into a JSON string downstream, before the response is sent back to the client.

In my opinion, it would be better to have a more uniform contract between these components (processor, process manager, webapp route handler) and simply have the processor prepare and own the representation of its generated output(s), regardless of their media type.

The processor would thus always return a list of serialized outputs (something like list[tuple[str, bytes]]), where each element in the list is a tuple of media type and serialized output. The process manager could then handle this list (maybe persist each output to disk or send it to some remote storage, maybe pack them together in a multipart/related message for immediate response, etc.), potentially even building a JSON document, if requested, but it would assume that all outputs are already serialized. Let's take a closer look at this last scenario:
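To make the proposed contract concrete, here is a minimal sketch of a processor that serializes its own outputs. The names SerializedOutput and execute_hello are hypothetical and are not part of pygeoapi; the only thing taken from the proposal is the list[tuple[str, bytes]] return shape.

```python
import json

# Hypothetical alias for the proposed return shape:
# one (media type, serialized bytes) pair per output.
SerializedOutput = tuple[str, bytes]


def execute_hello(name: str) -> list[SerializedOutput]:
    """Hypothetical processor that owns the serialization of every output."""
    greeting = {"id": "echo", "value": f"Hello {name}!"}
    return [
        # JSON output: the processor itself produces the final JSON bytes
        ("application/json", json.dumps(greeting).encode("utf-8")),
        # Plain-text output: also already encoded to bytes
        ("text/plain", f"Hello {name}!".encode("utf-8")),
    ]
```

With this shape, downstream components never need to know whether an output started life as a dict, a string or a raster; they only see a media type and bytes.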

If the process manager wanted to build a response of type document, it would:

  • go through each (media_type, serialized_output) pair returned by the processor;
  • if the serialized_output has a media type of application/json, or any other JSON-related media type, parse it with json.loads, which produces a Python data structure;
  • if the serialized_output has a text-based media type, decode it into a string (for example with str(serialized_output, "utf-8"), since a bare str() call on bytes would return their repr);
  • if the serialized_output has any other media type, encode it as base64 and decode the result into a string;
  • finally, build a JSON object from all of these parsed outputs by putting them in a Python dict, calling json.dumps on it and encoding the resulting string into bytes.
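The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: build_document is a hypothetical helper, not an existing pygeoapi function, and because the proposal does not say how outputs would be keyed in the document, the output_N keys used here are placeholders.

```python
import base64
import json


def build_document(outputs: list[tuple[str, bytes]]) -> tuple[str, bytes]:
    """Hypothetical helper: fold serialized outputs into one JSON document."""
    document = {}
    for index, (media_type, payload) in enumerate(outputs):
        # Ignore parameters such as "; charset=utf-8" when classifying
        base_type = media_type.split(";")[0].strip().lower()
        if base_type == "application/json" or base_type.endswith("+json"):
            # JSON-related media type: parse back into Python data
            value = json.loads(payload)
        elif base_type.startswith("text/"):
            # Text-based media type: decode to a string (UTF-8 is assumed)
            value = payload.decode("utf-8")
        else:
            # Anything else: base64-encode, then decode to an ASCII string
            value = base64.b64encode(payload).decode("ascii")
        # Placeholder key; the proposal does not define output identifiers
        document[f"output_{index}"] = value
    # The document itself is returned serialized, like any other output
    return "application/json", json.dumps(document).encode("utf-8")
```

Note that the helper returns the same (media type, bytes) shape that processors would return, so the webapp handler can treat a document response like any other output.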

Finally, the webapp handler would not tamper with the serialized outputs in any way either, and would rely on the accompanying media type(s) to prepare its response.
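On the webapp side, the pass-through could be as small as the sketch below. The function to_http_response is a hypothetical name; it returns a (body, status, headers) tuple, which is one of the return shapes Flask accepts from a view function, and it assumes a successful sync execution with a single serialized payload.

```python
def to_http_response(media_type: str, body: bytes) -> tuple[bytes, int, dict[str, str]]:
    """Hypothetical pass-through: the body is sent exactly as received."""
    # The handler only sets the Content-Type from the media type it was given;
    # it never parses or re-serializes the payload.
    headers = {"Content-Type": media_type}
    return body, 200, headers
```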

I'd be willing to provide a PR for this, if this would be seen as a useful change.

Labels: enhancement (New feature or request)
