let processors always take care of generating their own serialized outputs #2305
Description
There is currently a mix of responsibilities when generating the final payload for outputs of a process execution request that manifests when doing sync execution:
- If the media type of outputs is something other than `application/json`, the processor is allowed to generate the final form of outputs. For example, if the processor returns something with a media type of `image/tiff`, then the returned output is assumed to be a binary stream of already serialized data and is not further processed by downstream pygeoapi components;
- However, if the media type is `application/json`, then both the process manager and the (flask) route handler do some further post-processing of the output. The processor cannot simply return a JSON string, because the base process manager tries to wrap the output in a dict if the requested response is of type `document`, and the webapp handler is then the one that serializes this dict into a JSON string. In other words, the processor says that it generated something with the media type of `application/json`, but in reality it returns a Python `dict` or `list`, which is turned into a JSON string downstream before the response is sent back to the client.
In my opinion, it would be better to have a more uniform contract between these components (processor, process manager, webapp route handler) and simply have the processor prepare and own the representation of its generated output(s), regardless of their media type.
The processor would thus always return a list of serialized outputs (something like `list[tuple[str, bytes]]`), where each element in the list is a tuple of media type and serialized output. The process manager could then handle this list (maybe persist each output to disk or send it to some remote storage, maybe pack them together in a `multipart/related` message for immediate response, etc.), potentially even building a JSON document, if requested, but it would assume that all outputs are already serialized. Let's take a closer look at this last scenario:
If the process manager wanted to build a response of type `document`, it would:
- go through each pair of `media_type, serialized_output` returned by the processor;
- if the serialized_output has a media type of `application/json`, or any other JSON-related media type, then it would be parsed with `json.loads`, which would produce a Python data structure;
- if the serialized_output has a text-based media type, it would be parsed with `str()`, thus producing a string;
- if the serialized_output has another media type, it would be transcoded to a base64 representation and then parsed to a string;
- the process manager would then build a JSON object with all of these parsed outputs by putting them all in a Python dict, doing a `json.dumps`, and finally serializing the resulting string into a `bytes` type.
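As a rough illustration of the steps above, here is a minimal sketch of how a process manager might assemble the `document` response. None of this is existing pygeoapi code: `SerializedOutput`, `is_json_media_type` and `build_document` are hypothetical names, and keying the document by output position is an assumption, since the proposal does not say how outputs would be identified. The sketch decodes text with `bytes.decode` (assuming UTF-8) rather than `str()`, because calling `str()` on a `bytes` object without an encoding yields its `b'...'` representation.

```python
import base64
import json

# Hypothetical alias for the proposed contract:
# each output is a (media_type, serialized_output) pair.
SerializedOutput = tuple[str, bytes]


def is_json_media_type(media_type: str) -> bool:
    """Return True for application/json and '+json' suffixed media types."""
    essence = media_type.split(";")[0].strip().lower()
    return essence == "application/json" or essence.endswith("+json")


def build_document(outputs: list[SerializedOutput]) -> SerializedOutput:
    """Combine already-serialized outputs into one JSON document.

    The "output_<n>" keys are an assumption of this sketch.
    """
    document = {}
    for index, (media_type, payload) in enumerate(outputs):
        essence = media_type.split(";")[0].strip().lower()
        if is_json_media_type(media_type):
            # JSON outputs are parsed back into Python data structures.
            value = json.loads(payload)
        elif essence.startswith("text/"):
            # Text outputs become plain strings (UTF-8 is assumed here).
            value = payload.decode("utf-8")
        else:
            # Anything else is embedded as a base64 string.
            value = base64.b64encode(payload).decode("ascii")
        document[f"output_{index}"] = value
    # The manager hands back serialized bytes, just like a processor would.
    return "application/json", json.dumps(document).encode("utf-8")
```

With this shape, the process manager's own result follows the same `(media_type, bytes)` contract as the processor outputs, so the webapp handler could treat both cases identically.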
Finally, the webapp handler would not tamper with the serialized outputs in any way either, and would rely on the media type(s) returned alongside them to prepare its response.
I'd be willing to provide a PR for this, if this would be seen as a useful change.