This repository was archived by the owner on Mar 3, 2023. It is now read-only.

Specification of dataset type in training docs (06_jupyter_notebook_workflow.md) as pandas.JSONDataSet throws an AttributeError when saving #21

@abhi8893

Description

The 06_jupyter_notebook_workflow.md in the training_docs has the following section:

Put the following dataset entry in conf/base/catalog.yml:

my_dataset:
  type: pandas.JSONDataSet
  filepath: data/01_raw/my_dataset.json

Next, you need to reload Kedro variables by calling %reload_kedro line magic in your Jupyter notebook.

Finally, you can save the data by executing the following command:

my_dict = {"key1": "some_value", "key2": None}
catalog.save("my_dataset", my_dict)

Error

The specified dataset type, type: pandas.JSONDataSet, throws an AttributeError when saving the data, because pandas.JSONDataSet calls to_json on the object being saved and a plain dict has no such method:

Traceback:

2021-11-10 23:11:44,722 - kedro.io.data_catalog - INFO - Saving data to `my_dataset` (JSONDataSet)...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/core.py in save(self, data)
    237             self._logger.debug("Saving %s", str(self))
--> 238             self._save(data)
    239         except DataSetError:

/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/extras/datasets/pandas/json_dataset.py in _save(self, data)
    161         with self._fs.open(save_path, **self._fs_open_args_save) as fs_file:
--> 162             data.to_json(path_or_buf=fs_file, **self._save_args)
    163 

AttributeError: 'dict' object has no attribute 'to_json'

The above exception was the direct cause of the following exception:

DataSetError                              Traceback (most recent call last)
/var/folders/ps/m62g53713k7_knw76b1s566r0000gn/T/ipykernel_55726/2517118377.py in <module>
      1 my_dict = {"key1": "some_value", "key2": None}
----> 2 catalog.save("my_dataset", my_dict)

/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/data_catalog.py in save(self, name, data)
    447 
    448         func = self._get_transformed_dataset_function(name, "save", dataset)
--> 449         func(data)
    450 
    451         version = (

/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/core.py in save(self, data)
    623         save_version = self.resolve_save_version()  # Make sure last save version is set
    624         try:
--> 625             super().save(data)
    626         except (FileNotFoundError, NotADirectoryError) as err:
    627             # FileNotFoundError raised in Win, NotADirectoryError raised in Unix

/Volumes/GoogleDrive/My Drive/projects/kedro-training/venv/lib/python3.8/site-packages/kedro/io/core.py in save(self, data)
    243         except Exception as exc:
    244             message = f"Failed while saving data to data set {str(self)}.\n{str(exc)}"
--> 245             raise DataSetError(message) from exc
    246 
    247     def __str__(self):

DataSetError: Failed while saving data to data set JSONDataSet(filepath=/Volumes/GoogleDrive/My Drive/projects/kedro-training/spaceflights/data/01_raw/my_dataset.json, load_args={}, protocol=file, save_args={}).
'dict' object has no attribute 'to_json'
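
The root cause can be reproduced without Kedro. The sketch below is only an illustration: it calls to_json on the same dict, mirroring the data.to_json(...) call shown in the traceback, and captures the resulting error. The variable names has_to_json and error_message are hypothetical and exist only for this example.

```python
# The same object the training docs ask the reader to save.
my_dict = {"key1": "some_value", "key2": None}

# pandas.JSONDataSet calls data.to_json(...) in _save (see the traceback);
# a plain dict does not provide that method.
has_to_json = hasattr(my_dict, "to_json")

error_message = ""
try:
    my_dict.to_json()
except AttributeError as exc:
    # This is the inner exception that Kedro wraps in a DataSetError.
    error_message = str(exc)

print(has_to_json)
print(error_message)
```

Kedro's save wrapper then re-raises this AttributeError as the DataSetError shown at the end of the traceback.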

Possible Resolution

Specifying the dataset type as type: json.JSONDataSet resolves the problem, since that dataset accepts a plain dict rather than an object with a to_json method.
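
As a rough sketch of why the json-based dataset works with this input, the snippet below serializes the same dict with Python's standard json module and reads it back. It assumes json.JSONDataSet relies on the standard library's json serialization; it is not Kedro's actual implementation, and the temporary directory stands in for data/01_raw.

```python
import json
import tempfile
from pathlib import Path

my_dict = {"key1": "some_value", "key2": None}

# A temporary directory stands in for data/01_raw (assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    filepath = Path(tmp_dir) / "my_dataset.json"

    # Saving: the json module accepts a plain dict directly.
    with filepath.open("w") as fs_file:
        json.dump(my_dict, fs_file)

    # Loading: None round-trips through JSON null.
    with filepath.open() as fs_file:
        reloaded = json.load(fs_file)

print(reloaded)
```

The dict round-trips unchanged, which is the behavior the training docs expect from catalog.save followed by catalog.load.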
