make optional loading of data into pandas dataframe in sf.load()

# Feature Suggestion

## Description

The `load()` function (in `load.py`) downloads the data (zip + unzip) and then loads it directly into a Pandas DataFrame.
Would it be possible to add an argument to only download the data (and thus return nothing), skipping the DataFrame creation altogether?
That would let people who only want to download the data, or who use something other than Pandas, still benefit from the nice functionality implemented in `_maybe_download_dataset` (URL, caching, filename...).
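
To illustrate the kind of caching logic worth reusing, here is a minimal sketch of a file-age check. It is not simfin's actual implementation: `needs_download` is a hypothetical helper, and the meaning of `refresh_days` is assumed from the comment in `load()` ("download file if it does not exist on local disk, or if it is too old").

```python
import os
import time


def needs_download(path, refresh_days=30):
    """Return True if `path` is missing or older than `refresh_days` days.

    Hypothetical helper for illustration only; not part of simfin.
    """
    if not os.path.exists(path):
        return True

    # Age of the file in days, based on its last-modified time.
    age_days = (time.time() - os.path.getmtime(path)) / 86400
    return age_days > refresh_days
```

A download-only mode would give non-Pandas users access to this kind of check without paying for the DataFrame creation.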

## Code

An easy non-breaking change could be to add an argument `create_pandas_dataframe=True`:

```
def load(dataset, variant=None, market=None, start_date=None, end_date=None,
         parse_dates=None, index=None, refresh_days=30, create_pandas_dataframe=True):
    """
    Download the dataset if needed and optionally return it as a
    Pandas DataFrame.
    ....
    :param create_pandas_dataframe:
        If True (default), load the data into a Pandas DataFrame.
        If False, only download the data and return None.

    :return:
        Pandas DataFrame with the data, or None if
        `create_pandas_dataframe` is False.
    """

    assert dataset is not None

    # Convert dataset name, variant, and market to lower-case.
    dataset = dataset.lower()
    if variant is not None:
        variant = variant.lower()
    if market is not None:
        market = market.lower()

    # Dict with dataset arguments.
    dataset_args = {'dataset': dataset, 'variant': variant, 'market': market}

    # Download file if it does not exist on local disk, or if it is too old.
    _maybe_download_dataset(**dataset_args, refresh_days=refresh_days)

    # Load and return a Pandas DataFrame, unless only the download was requested.
    if create_pandas_dataframe:
        # Lambda function for converting strings to dates. Format: YYYY-MM-DD
        date_parser = lambda x: pd.to_datetime(x, yearfirst=True, dayfirst=False)
    
        # Print status message.
        print('- Loading from disk ... ', end='')
    
        # Full path for the CSV-file on local disk.
        path = _path_dataset(**dataset_args)
        if start_date or end_date:
            print('\n- Applying filter ... ', end='')
            path = _filtered_file(path, start_date, end_date=end_date)
        
        # Load dataset into Pandas DataFrame.
        df = pd.read_csv(path, sep=';', header=0,
                        parse_dates=parse_dates, date_parser=date_parser)
    
        # Set the index and sort the data.
        if index is not None:
            # Set the index.
            df.set_index(index, inplace=True)
    
            # Sort the rows of the DataFrame according to the index.
            df.sort_index(ascending=True, inplace=True)
    
        # Print status message.
        print('Done!')
        
        return df

    # Download-only mode: there is nothing to return.
    return None
```

## Example

```
import simfin as sf

# only download data
sf.load_income(variant='quarterly-full-asreported', market='us', create_pandas_dataframe=False)

# download data and return pandas dataframe
sf.load_balance(variant='quarterly-full-asreported', market='us')
```
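
For the download-only case, the file could then be read without Pandas. Below is a minimal sketch using only the standard library, assuming the downloaded file is a semicolon-separated CSV with a header row (as the `pd.read_csv(..., sep=';', header=0)` call above suggests). `read_rows` is a hypothetical helper, the UTF-8 encoding is an assumption, and how to obtain the file path is left open since `_path_dataset` is a private function.

```python
import csv


def read_rows(path):
    """Yield each row of a semicolon-separated CSV file as a dict.

    Hypothetical helper for illustration; `path` is wherever the
    downloaded CSV file ends up on local disk. UTF-8 is assumed.
    """
    with open(path, newline='', encoding='utf-8') as f:
        # The first line is treated as the header, like header=0 in Pandas.
        yield from csv.DictReader(f, delimiter=';')
```

All values come back as strings, so any date or number parsing would be up to the caller.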


Happy to make a PR if necessary. Thanks!  
