6 changes: 6 additions & 0 deletions docs/data-science/basics/setup.md
@@ -143,6 +143,12 @@ uv init --vcs none # (1)!
By default, `--vcs git` is set, which initializes a Git repository. Since
Git is not within the scope of this project, we set `--vcs` to `none`.

???+ warning "Restart VS Code if command fails"

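    If the command returns an error saying `uv` was not found, close and reopen
    VS Code. VS Code and its terminals keep the environment (including `PATH`)
    they were started with, so they may not see the newly installed `uv`
    executable until VS Code is restarted. Then run the command again.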

This initializes the project. `uv` creates a few files in your folder. Your
workspace should look like this:

24 changes: 22 additions & 2 deletions docs/data-science/data/basics.md
@@ -71,6 +71,26 @@ Examples include number of students (5) or age (22).
A simple rule of thumb: If you can meaningfully have fractional values, it's
continuous. If counting whole units makes more sense, it's discrete.

???+ warning "Numbers aren't always numerical data"

    Just because data is stored as numbers doesn't make it numerical.
    Consider ZIP codes: their mean is mathematically possible but conceptually
    meaningless.

    ```python
    import pandas as pd

    zip_codes = pd.Series([6020, 1050, 6011, 1010])
    print(f"Average ZIP code: {zip_codes.mean()}")  # Makes no sense!
    ```

    ```title=">>> Output"
    Average ZIP code: 3522.75
    ```

    If you can't meaningfully add, subtract or average the values, it's
    categorical data in disguise.

    Other examples are customer IDs or coordinates.
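
One way to make the categorical nature of such columns explicit is pandas' `category` dtype. The sketch below reuses the ZIP codes from the warning above; the name `zip_labels` is illustrative, and converting to strings first is an optional choice (storing ZIP codes as text from the outset also avoids losing leading zeros, as in some US ZIP codes). With the `category` dtype, counting values works, while pandas refuses arithmetic like the mean.

```python
import pandas as pd

# ZIP codes stored as numbers, as in the example above
zip_codes = pd.Series([6020, 1050, 6011, 1010])

# Treat them as labels: strings first, then the categorical dtype
zip_labels = zip_codes.astype(str).astype("category")

# Counting occurrences is a meaningful operation on categorical data
print(zip_labels.value_counts())

# Averaging is not: pandas raises a TypeError for the category dtype
try:
    zip_labels.mean()
except TypeError as err:
    print(f"Mean not supported: {err}")
```

The exact error message differs between pandas versions, but the operation is rejected either way.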

### Categorical (Qualitative)

Categorical data represents qualities or characteristics that place
@@ -160,8 +180,8 @@ How many rows and columns has the `penguin` dataset?
- [ ] 5 rows and 8 columns
- [x] 344 rows and 7 columns

The data set has 344 rows (penguins) and 7 columns (features). Use `data.shape`
to quickly get the datasets dimensions.
The dataset has 344 rows (penguins) and 7 columns (features). Use
`penguins.shape` to quickly get the dataset's dimensions.
</quiz>
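
`.shape` returns a `(rows, columns)` tuple, so it can be unpacked directly. A minimal sketch, using a hypothetical three-row stand-in (`penguins_sample`) rather than the full penguins dataset:

```python
import pandas as pd

# Hypothetical stand-in for the penguins data: three penguins, two features
penguins_sample = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Chinstrap"],
        "body_mass_g": [3750, 5000, 3800],
    }
)

# .shape is a (rows, columns) tuple
n_rows, n_cols = penguins_sample.shape
print(f"{n_rows} rows and {n_cols} columns")  # prints "3 rows and 2 columns"
```

On the full dataset, the same unpacking yields 344 rows and 7 columns.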

???+ question "Identify attribute types"