Description
The 06_jupyter_notebook_workflow.md in the training_docs has the following section:
Put the following dataset entry in conf/base/catalog.yml:
my_dataset:
  type: pandas.JSONDataSet
  filepath: data/01_raw/my_dataset.json
Next, you need to reload Kedro variables by calling %reload_kedro line magic in your Jupyter notebook.
Finally, you can save the data by executing the following command:
my_dict = {"key1": "some_value", "key2": None}
catalog.save("my_dataset", my_dict)
Error
The specified dataset type, type: pandas.JSONDataSet, seems to throw an AttributeError when saving the data:
Traceback:
2021-11-10 23:11:44,722 - kedro.io.data_catalog - INFO - Saving data to `my_dataset` (JSONDataSet)...
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/core.py in save(self, data)
237 self._logger.debug("Saving %s", str(self))
--> 238 self._save(data)
239 except DataSetError:
/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/extras/datasets/pandas/json_dataset.py in _save(self, data)
161 with self._fs.open(save_path, **self._fs_open_args_save) as fs_file:
--> 162 data.to_json(path_or_buf=fs_file, **self._save_args)
163
AttributeError: 'dict' object has no attribute 'to_json'
The above exception was the direct cause of the following exception:
DataSetError Traceback (most recent call last)
/var/folders/ps/m62g53713k7_knw76b1s566r0000gn/T/ipykernel_55726/2517118377.py in <module>
1 my_dict = {"key1": "some_value", "key2": None}
----> 2 catalog.save("my_dataset", my_dict)
/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/data_catalog.py in save(self, name, data)
447
448 func = self._get_transformed_dataset_function(name, "save", dataset)
--> 449 func(data)
450
451 version = (
/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/core.py in save(self, data)
623 save_version = self.resolve_save_version() # Make sure last save version is set
624 try:
--> 625 super().save(data)
626 except (FileNotFoundError, NotADirectoryError) as err:
627 # FileNotFoundError raised in Win, NotADirectoryError raised in Unix
/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/core.py in save(self, data)
243 except Exception as exc:
244 message = f"Failed while saving data to data set {str(self)}.\n{str(exc)}"
--> 245 raise DataSetError(message) from exc
246
247 def __str__(self):
DataSetError: Failed while saving data to data set JSONDataSet(filepath=/Volumes/GoogleDrive/My Drive/projects/kedro-training/spaceflights/data/01_raw/my_dataset.json, load_args={}, protocol=file, save_args={}).
'dict' object has no attribute 'to_json'
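The root cause in the traceback can be reproduced without Kedro. The sketch below is only an illustration: `save_like_pandas_dataset` is a hypothetical stand-in that mirrors the `data.to_json(path_or_buf=...)` call shown in the traceback, not Kedro's actual implementation.

```python
import io


def save_like_pandas_dataset(data, buffer):
    # Hypothetical stand-in: mirrors the call on line 162 of the traceback,
    # which expects an object with a pandas-style `to_json` method.
    data.to_json(path_or_buf=buffer)


my_dict = {"key1": "some_value", "key2": None}

try:
    save_like_pandas_dataset(my_dict, io.StringIO())
    error_message = None
except AttributeError as exc:
    # A plain dict has no `to_json` method, so the call fails here.
    error_message = str(exc)

print(error_message)
```

The message matches the underlying AttributeError in the traceback, which suggests the dataset type expects a pandas object rather than a plain dict.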
Possible Resolution
Specifying the dataset type as type: json.JSONDataSet resolves the problem.
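As a rough illustration of why a JSON-oriented dataset type suits this data, the sketch below writes the same dict with Python's standard `json` module and reads it back. It assumes that json.JSONDataSet serialises data in a comparable way, which is not confirmed here; the temporary directory is used only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path

my_dict = {"key1": "some_value", "key2": None}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for data/01_raw/my_dataset.json from the catalog entry.
    filepath = Path(tmp_dir) / "my_dataset.json"

    # A plain dict can be serialised directly; None is written as null.
    with filepath.open("w") as fs_file:
        json.dump(my_dict, fs_file)

    written_text = filepath.read_text()
    loaded = json.loads(written_text)

print(written_text)
print(loaded == my_dict)
```

The round trip preserves the dict, including the `None` value, without needing a pandas object.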