
concepts.data_storage

Justin Joyce edited this page Jul 15, 2021 · 2 revisions
The datasets are laid out like this:

path/to/dataset                 <-- this is the value in the dataset params
    /year_2021
        /month_03
            /day_20             <-- datasets are for specific dates; this part
                                    is added automatically if not specified
                /as_at_<id>     <-- datasets can't be deleted; to make a
                                    correction, you create the dataset again.
                                    Each run puts its data into a timestamped
                                    frame. Usually, you read from the latest
                                    frame when you read the dataset
                    /blob       <-- the data is split into 64MB blobs, which
                                    keep memory requirements low and help
                                    avoid small-file problems
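
The layout above can be sketched in a few lines of Python. This is only an illustration, not the library's own code: the function names `dated_path` and `latest_frame` are hypothetical, and the sketch assumes that the `<id>` in `as_at_<id>` is a timestamp-like string that sorts lexicographically in time order, which the page does not confirm.

```python
import datetime
from pathlib import PurePosixPath
from typing import Iterable, Optional


def dated_path(dataset: str, date: datetime.date) -> PurePosixPath:
    """Append the year/month/day folders to the dataset path (hypothetical helper)."""
    return (
        PurePosixPath(dataset)
        / f"year_{date.year:04d}"
        / f"month_{date.month:02d}"
        / f"day_{date.day:02d}"
    )


def latest_frame(frame_names: Iterable[str]) -> Optional[str]:
    """Pick the most recent as_at_<id> frame (hypothetical helper).

    Assumption: the <id> sorts lexicographically in time order,
    so the largest name is the latest frame.
    """
    frames = [name for name in frame_names if name.startswith("as_at_")]
    return max(frames) if frames else None


if __name__ == "__main__":
    folder = dated_path("path/to/dataset", datetime.date(2021, 3, 20))
    print(folder)
    # The frame ids below are made up for illustration.
    print(latest_frame(["as_at_20210320-120000", "as_at_20210321-080000"]))
```

A reader would resolve the dated folder first, list the `as_at_` frames inside it, and then read the blobs from whichever frame `latest_frame` returns.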
