Commits
55 commits
f7bf7c9
Created WhatTheHack template stub
Jul 18, 2023
c668964
Initial Add
Aug 11, 2023
2884ea4
Updates
Aug 13, 2023
08d16fe
Add spell checks
Aug 15, 2023
3643b19
Merge branch 'master' into xxx-FabricLakehouse
Aug 16, 2023
36df6b9
Minor updates
Aug 16, 2023
864c818
Rename hackathon folder
Sep 7, 2023
3e5d37b
Merge branch 'master' into xxx-FabricLakehouse
Sep 7, 2023
414ef97
Update spellcheck word list
Sep 7, 2023
7878596
Updated the Success Criteria to a numbered list
jcbendernh Sep 16, 2023
ccd1054
Updated the Solution 00, grammatical.
jcbendernh Sep 16, 2023
e163c61
Update .wordlist.txt
jordanbean-msft Sep 19, 2023
215f4b9
Update README.md
jordanbean-msft Sep 19, 2023
c280b80
Update Solution-00.md
jordanbean-msft Sep 19, 2023
0c9b932
Update Solution-04.md
jordanbean-msft Sep 19, 2023
d851303
Update Solution-05.md
jordanbean-msft Sep 19, 2023
798b97a
Update Solution-05.md
jordanbean-msft Sep 19, 2023
0679b0f
Update Challenge-00.md
jordanbean-msft Sep 19, 2023
2326f84
Update Challenge-00.md
jordanbean-msft Sep 19, 2023
5bd503c
Update Challenge-01.md
jordanbean-msft Sep 19, 2023
b8b91bf
Update Challenge-04.md
jordanbean-msft Sep 19, 2023
f20975f
Update Challenge-05.md
jordanbean-msft Sep 19, 2023
2853206
Update Solution-05.md
jordanbean-msft Sep 19, 2023
e0104ec
Update Solution-00.md
jordanbean-msft Sep 19, 2023
7b704dc
Update Solution-03.md
jordanbean-msft Sep 19, 2023
b3ac670
Merge branch 'xxx-FabricLakehouse' into xxx-FabricLakehouse
liesel-h Sep 20, 2023
50199a1
Merge pull request #2 from jcbendernh/xxx-FabricLakehouse
liesel-h Sep 20, 2023
94e1740
Merge branch 'microsoft:master' into xxx-FabricLakehouse
liesel-h Oct 18, 2023
719cc05
Additional lakehouse notes. Re-work coach guides
Oct 18, 2023
cdafc43
Add shipwrecks geojson data
liesel-h Oct 24, 2023
234482a
Merge branch 'xxx-FabricLakehouse' of https://github.com/liesel-h/Wha…
Oct 25, 2023
36fa62d
Move shipwrecks geojson to Solutions folder
Oct 26, 2023
213b2e1
Rework student and coach guides.
Oct 27, 2023
b68fb06
Updates post feedback. Reworked Coach guides, solution files, cleaned…
Jan 5, 2024
5c0f446
Update solution and resources code. Cleanup files
Jan 5, 2024
441523b
Jekyll complaining about fenced M code :(
Jan 5, 2024
f93a201
Jekyll parsing M again :(
Jan 5, 2024
d7222e0
Make word change
kriation Jun 17, 2024
32d3a1e
Make whitespace adjustment
kriation Jun 17, 2024
bdabe8e
Make wording adjustment of verb tense
kriation Jun 17, 2024
cab0282
Make change to position of markdown link
kriation Jun 17, 2024
5058dd3
Make change to example abbreviation
kriation Jun 17, 2024
4238541
Make correction to abbreviation
kriation Jun 17, 2024
50635e1
Make correction to abbreviation
kriation Jun 17, 2024
b31d8cc
Cut rogue comma
kriation Jun 17, 2024
520fc37
Cut rogue comma
kriation Jun 17, 2024
0e634e3
Make capitalization change
kriation Jun 17, 2024
619590c
Make change to markdown link
kriation Jun 17, 2024
76a07c4
Make slight word change to sentence
kriation Jun 17, 2024
646579f
Make minor spelling corrections
kriation Jun 17, 2024
ed45c26
Make minor spelling change
kriation Jun 17, 2024
448d958
Start optimizing .wordlist.txt for this content
kriation Jun 17, 2024
d88a159
Optimize .wordlist.txt
kriation Jun 17, 2024
defc4f1
Make minor spelling and grammatical corrections
kriation Jun 17, 2024
7499b67
Add word to .wordlist.txt
kriation Jun 17, 2024
48 changes: 48 additions & 0 deletions 067-FabricLakehouse/.wordlist.txt
@@ -0,0 +1,48 @@
lakehouse
datalake
Margie
datasource
datasources
shapefile
shapefiles
ECMWF
yarrr
pre-reqs
pyspark
T-SQL
datafow
dataflows
TMTOWTDI
doco
Nemo
Leeuwin
Boorloo
Whadjuk
DAX
Workspaces
workspaces
workspace
Datamesh
Realtime
WTHs
OneLake
BOM
bom
shtml
spatialdata
WAM
Batavia
snorkellers
Liesel
FabricTrial
getpowerbi









impactful
Binary file added 067-FabricLakehouse/Coach/Lectures.pptx
Binary file not shown.
140 changes: 140 additions & 0 deletions 067-FabricLakehouse/Coach/README.md
@@ -0,0 +1,140 @@
# What The Hack - Fabric Lakehouse - Coach Guide

## Introduction

Welcome to the coach's guide for the Fabric Lakehouse What The Hack. Here you will find links to specific guidance for coaches for each of the challenges.

This hack includes an awesome [lecture presentation](Lectures.pptx) that features short presentations to introduce key topics associated with each challenge. It is recommended that the host present each short presentation before attendees kick off that challenge.

You may also want to customise the deck to include logistics, timings, your own and the other coaches' details, and other information specific to your event. Additional sea-themed puns are also encouraged. Yarrr...

**NOTE:** If you are a Hackathon participant, this is the answer guide. Don't cheat yourself by looking at these during the hack! Go learn something. :)

## Coach's Guides

- Challenge 00: **[Prerequisites - Grab your fins and a full tank!](Solution-00.md)**
  - Provision your Fabric Lakehouse
- Challenge 01: **[Finding Data](Solution-01.md)**
  - Head out into open waters to find your data
- Challenge 02: **[Land ho!](Solution-02.md)**
  - Land your data in your Fabric Lakehouse
- Challenge 03: **[Swab the Decks!](Solution-03.md)**
  - Clean and combine your datasets ready for analysis
- Challenge 04: **[Make a Splash](Solution-04.md)**
  - Build a data story to bring your findings to life
- Challenge 05: **[Giant Stride](Solution-05.md)**
  - Take a giant stride and share your data story with the world

## Coach Prerequisites

This hack has pre-reqs that a coach is responsible for understanding and/or setting up BEFORE hosting an event. Please review the [What The Hack Hosting Guide](https://aka.ms/wthhost) for information on how to host a hack event.

The guide covers the common preparation steps a coach needs to do before any What The Hack event.

### About the Audience

Students will most likely have a wide range of backgrounds and experience, from Data Engineers comfortable with PySpark, T-SQL and datalake technology to business analysts with little or no coding experience. The hack is intended to be accessible to all, but coaches should be aware of the following:

- The hack is designed to be completed by teams of 2-4 students working together. Coaches should consider how to arrange students into teams to ensure a mix of skills and experience. However, for an event with more experienced (or daring) students the hack can be completed individually.
- The hack is designed to be completed in a linear fashion. Encourage students to work outside of their areas of expertise, but be aware that some challenges may be more difficult for some students than others. Use your discretion - students may commence challenges in parallel. For example, a Power BI reporting expert may wish to start Challenge 4 while the other team members are data wrangling in Challenge 2/3.


### Duration & Agenda

Ideally this hack can be completed in a day, but this is highly dependent on student profiles. For a more relaxed pace with a less experienced group, consider running the hack over 2 days - one day for challenges 1-3 and the second day for challenges 3-5.

Times are very fluid and almost certainly will vary, but a *very* indicative agenda for a 1 day hack is as follows:

|Time|Duration|Activity|
|----|--------|--------|
|9:30 AM|15 mins|Welcome!|
|9:45 AM|30 mins|An Overview of Microsoft Fabric|
|10:15 AM|15 mins|About The Hack & Challenge 0 - Get Your Gear Ready|
|10:30 AM|15 mins|Break|
|10:45 AM|45 mins|Challenge 1 - Finding Data|
|11:30 AM|30 mins|Challenge 2 - Land Ho!|
|12:00 PM|60 mins|Lunch|
|1:00 PM|30 mins|Challenge 2 cont.|
|1:30 PM|60 mins|Challenge 3 - Swab the Decks!|
|2:30 PM|15 mins|Break|
|2:45 PM|60 mins|Challenge 4 - Make A Splash!|
|3:45 PM|30 mins|Challenge 5 - Giant Stride|
|4:15 PM|15 mins|Wrap Up|

### Additional Coach Prerequisites

This hack has been left very open-ended - [TMTOWTDI](https://perl.fandom.com/wiki/TIMTOWTDI). Coaches should be familiar with the following technologies and concepts:

- Microsoft Fabric (duh!)
- Python / PySpark
- Dataflow Gen 2 / M / Power Query
- Power BI / DAX

Coaches should deploy the solution files before the event to ensure they are familiar with the approach and the technologies used. Coaches should also be familiar with the datasets and source government agency websites used in the example solutions. Links are provided in the various [Solutions](./Solutions).

### Coach Resources

- [What is Microsoft Fabric?](https://aka.ms/learnfabric)
- [Microsoft Fabric Blog](https://aka.ms/FabricBlog)

- [Import Existing Notebooks](https://learn.microsoft.com/en-us/fabric/data-engineering/how-to-use-notebook#import-existing-notebooks)
- [Importing a Dataflow Gen2 template](https://learn.microsoft.com/en-us/fabric/data-factory/move-dataflow-gen1-to-dataflow-gen2)

- [Fabric (trial) Known Issues](https://learn.microsoft.com/en-gb/fabric/get-started/fabric-known-issues)

### Student Resources

Always refer students to the [What The Hack website](https://aka.ms/wth) for the student guide: [https://aka.ms/wth](https://aka.ms/wth)

There are no specific student resources for this hack, but you may of course share parts of the solutions, hints, doco links, etc. with students who are struggling, as you see fit.

**NOTE:** Students should **not** be given a link to the What The Hack repo before or during a hack. The student guide does **NOT** have any links to the Coach's guide or the What The Hack repo on GitHub.

## Azure / Fabric Requirements

This hack requires students to have access to a Microsoft Fabric tenant account with an active subscription. These tenant requirements should be shared with stakeholders in the organization that will be providing the Fabric environment that will be used by the students.

There are no specific Azure resources required for this hack, beyond a Fabric capacity (see below).

### Enabling Microsoft Fabric

Fabric needs to be enabled in the tenant (see [Enable Microsoft Fabric for your organization](https://learn.microsoft.com/en-us/fabric/admin/fabric-switch)) and students need to be granted permission to create Fabric resources. Depending on the host organisation's configuration and policy, students could be added to a security group that has permission to create Fabric resources:

![Fabric admin portal tenant setting for enabling Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/admin/media/fabric-switch/fabric-switch-enabled.png)

### Microsoft Fabric Licenses

At the time of writing, Fabric is in preview and a trial license is available. See [Start a Fabric (Preview) trial](https://learn.microsoft.com/en-us/fabric/get-started/fabric-trial). Post-trial, a paid Fabric capacity will be required, although a small F2-F4 capacity should be sufficient for this hack.

See [Microsoft Fabric Licenses](https://learn.microsoft.com/en-us/fabric/enterprise/licenses) for details.

### Microsoft Fabric Workspace

Students will require a Microsoft Fabric-enabled Workspace where they can create Fabric artefacts (Lakehouse, Dataflow, Pipeline, Notebook, Report, etc.). If the group is arranged into pods, provision one workspace per pod; if students are working individually, provision one workspace per student. We recommend granting students the Admin role on their workspace so they can create and manage all artefacts.

*Note:* for pods collaborating in a per-pod Workspace, students may also require a Power BI Pro license to publish reports to this workspace.

See [Microsoft Fabric Workspaces](https://learn.microsoft.com/en-us/fabric/get-started/workspaces) and [Create a workspace](https://learn.microsoft.com/en-us/fabric/get-started/create-workspace) for details.
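Coaches provisioning one workspace per pod may prefer to script it. The sketch below assumes the Fabric REST API endpoints `POST /v1/workspaces` and `POST /v1/workspaces/{workspaceId}/roleAssignments` and their request-body shapes; check both against the current Fabric REST API reference before use. The workspace name prefix, the helper names and the way you obtain the bearer token (an Entra ID access token for the Fabric API) are all assumptions, and error handling is omitted.

```python
import json
import urllib.request

# Assumed base URL of the Fabric REST API; verify against the current API reference.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def workspace_payload(pod_name: str, capacity_id: str, prefix: str = "WTH-FabricLakehouse") -> dict:
    """Request body (assumed shape) for one pod workspace assigned to a Fabric capacity."""
    return {
        "displayName": f"{prefix}-{pod_name}",
        "description": f"What The Hack Fabric Lakehouse workspace for {pod_name}",
        "capacityId": capacity_id,
    }


def admin_role_payload(user_object_id: str) -> dict:
    """Request body (assumed shape) granting a student the Admin role on a workspace."""
    return {"principal": {"id": user_object_id, "type": "User"}, "role": "Admin"}


def post_json(url: str, body: dict, token: str) -> dict:
    """POST a JSON body with a bearer token and return the parsed JSON response, if any."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def provision_pod(pod_name: str, capacity_id: str, student_ids: list, token: str) -> str:
    """Create a pod workspace, then make every student in the pod a workspace Admin."""
    workspace = post_json(f"{FABRIC_API}/workspaces", workspace_payload(pod_name, capacity_id), token)
    workspace_id = workspace["id"]
    for student_id in student_ids:
        post_json(
            f"{FABRIC_API}/workspaces/{workspace_id}/roleAssignments",
            admin_role_payload(student_id),
            token,
        )
    return workspace_id
```

For example, `provision_pod("Pod1", "<capacity-id>", ["<student-object-id>"], token)` would return the new workspace ID. Keeping the payload builders separate from the HTTP calls means they can be checked offline; the calls themselves need a real tenant and capacity.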

### Power BI Desktop

Students will require Power BI Desktop to be installed on their PC. Either the Microsoft Store or the downloaded version is fine.

## Repository Contents

- `./Coach`
  - Coach's Guide and related files
- `./Coach/Solutions`
  - Solution files with completed example answers to a challenge
- `./Student`
  - Student's Challenge Guide
- `./Student/Resources`
  - Resource files, sample code, scripts, etc. meant to be provided to students. (Must be packaged up by the coach and provided to students at the start of the event)


## Other Fabric What The Hacks

These WTHs are currently in development and will be released soon:

- Fabric Datamesh
- Fabric Realtime

9 changes: 9 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-00.md
@@ -0,0 +1,9 @@
# Challenge 00 - Prerequisites - Grab your fins and a full tank! - Coach's Guide

**[Home](./README.md)** - [Next Solution >](./Solution-01.md)

## Notes & Guidance

Please make sure that the students review the [introduction](../README.md) and [Challenge 00 - Prerequisites - Grab your fins and a full tank! - Student's Guide](../Student/Challenge-00.md) ahead of time. Also ensure you have read the prerequisites section of the [Coach's Guide](./README.md).

Students will need access to a Fabric-enabled workspace and Power BI Desktop. Again, see the prerequisites in the [Coach's Guide](./README.md) for more details.
33 changes: 33 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-01.md
@@ -0,0 +1,33 @@
# Challenge 01 - Finding Nemo - Coach's Guide

[< Previous Solution](./Solution-00.md) - **[Home](./README.md)** - [Next Solution >](./Solution-02.md)

## Notes & Guidance

This first challenge is all about finding the data, not importing it (yet). The output is a list of datasets that meet the requirements, a strategy for ingesting and processing them, and a selection of the "best" tool (notebook, dataflow, etc.). Actual development starts in Challenge 2.

For this challenge, the students will be searching for suitable datasources online. You should ensure that they are aware of the following:

- Licensing
- Copyright

The objective of this challenge is to get the students to think about:

- the data they need to meet the requirements
- what sources are available
- how the data is licensed
- how they can land this data automatically in OneLake

### Solution

The example solutions have been built using Australian Bureau of Meteorology and Western Australian Museum datasets:

- BOM FTP data services: http://www.bom.gov.au/catalogue/data-feeds.shtml and http://www.bom.gov.au/catalogue/anon-ftp.shtml
  - ``IDW11160`` - Coastal Waters Forecast - All Districts (WA)
  - ``IDM000003`` - Marine Zones - http://reg.bom.gov.au/catalogue/spatialdata.pdf
- WA Museum
  - ``WAM-002`` https://catalogue.data.wa.gov.au/dataset/shipwrecks (requires a free SLIP account; licensed under CC BY 4.0)
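One way for teams to capture their Challenge 1 output is as a small, structured inventory they fill in while searching. The sketch below is illustrative only: the field and function names are hypothetical, the tool choices follow the Challenge 3 guidance (BOM forecast XML suits a dataflow, WAM-002 suits a notebook, and the marine zones are merged in the shipwrecks notebook), and the BOM licence entries are left as "TBC" because this guide does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCandidate:
    """One row of the Challenge 1 output (hypothetical field names)."""
    code: str
    name: str
    source: str
    licence: str  # licence/copyright terms found at the source; "TBC" until confirmed
    tool: str  # the "best" processing tool, e.g. "notebook" or "dataflow"


CANDIDATES = [
    DatasetCandidate("IDW11160", "Coastal Waters Forecast - All Districts (WA)",
                     "BOM anonymous FTP", "TBC", "dataflow"),
    DatasetCandidate("IDM000003", "Marine Zones", "BOM spatial data", "TBC", "notebook"),
    DatasetCandidate("WAM-002", "Shipwrecks", "catalogue.data.wa.gov.au (free SLIP account)",
                     "CC BY 4.0", "notebook"),
]


def unconfirmed_licences(candidates: list) -> list:
    """Codes of datasets whose licence still needs checking before they are used."""
    return [c.code for c in candidates if c.licence == "TBC"]
```

Listing `unconfirmed_licences(CANDIDATES)` at the end of the challenge gives coaches a quick prompt for the licensing and copyright discussion above.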

More advanced students might like to include climate (temperature and wave) models from [ECMWF Open Data](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) available via the Microsoft Planetary Computer. An example notebook is included in the [Solutions](./Solutions) folder.
15 changes: 15 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-02.md
@@ -0,0 +1,15 @@
# Challenge 02 - Land Ho! - Coach's Guide

[< Previous Solution](./Solution-01.md) - **[Home](./README.md)** - [Next Solution >](./Solution-03.md)

## Notes & Guidance

This challenge implements the design developed in Challenge 1, landing data in a "raw" format ready to import in [the next Challenge](./Solution-03.md).

## Solution

Notebooks were used to download the datasets discussed in [Challenge 1](./Solution-01.md), using examples from the source documentation. See the notebooks in the [Solutions](./Solutions) folder and details in [Solution 3](./Solution-03.md).

More advanced students might like to include climate (temperature and wave) models from [ECMWF Open Data](https://planetarycomputer.microsoft.com/dataset/ecmwf-forecast) available via the Microsoft Planetary Computer. An example notebook is included in the [Solutions](./Solutions) folder.

Depending on the skill set of the group, coding automated retrieval may prove challenging. It's OK to skip ahead and download the datasets manually. The important thing is to have attempted this challenge and to have the data ready in OneLake for the next one.
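A minimal landing sketch for one BOM product, assuming the product files are published under `/anon/gen/fwo/` on `ftp.bom.gov.au` (confirm against the BOM anonymous FTP catalogue page linked in Challenge 1) and that the notebook has a default lakehouse attached, so its Files area is mounted at `/lakehouse/default/Files` (the same path the `Cleanup.ipynb` solution uses). The dated folder layout and the helper names are hypothetical; the dated folders keep each raw download so earlier runs are never overwritten.

```python
from datetime import date
from ftplib import FTP
from pathlib import Path

# Assumptions: BOM forecast products live under /anon/gen/fwo/ on ftp.bom.gov.au,
# and a default lakehouse is mounted at /lakehouse/default.
BOM_FTP_HOST = "ftp.bom.gov.au"
BOM_FTP_DIR = "/anon/gen/fwo"
RAW_ROOT = Path("/lakehouse/default/Files/BOM")


def raw_path(product_id: str, run_date: date, root: Path = RAW_ROOT) -> Path:
    """Dated 'raw' location for one download, e.g. BOM/IDW11160/2024-01-05/IDW11160.xml."""
    return root / product_id / run_date.isoformat() / f"{product_id}.xml"


def download_product(product_id: str, run_date: date, root: Path = RAW_ROOT) -> Path:
    """Fetch one BOM product over anonymous FTP and write it, unchanged, to OneLake."""
    target = raw_path(product_id, run_date, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    with FTP(BOM_FTP_HOST) as ftp, open(target, "wb") as out:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(BOM_FTP_DIR)
        ftp.retrbinary(f"RETR {product_id}.xml", out.write)
    return target
```

In a Fabric notebook, `download_product("IDW11160", date.today())` would write the current Coastal Waters Forecast to `Files/BOM/IDW11160/<date>/IDW11160.xml`, ready to be cleaned and loaded in Challenge 3.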
26 changes: 26 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-03.md
@@ -0,0 +1,26 @@
# Challenge 03 - Swab the Decks! - Coach's Guide

[< Previous Solution](./Solution-02.md) - **[Home](./README.md)** - [Next Solution >](./Solution-04.md)

## Notes & Guidance

Challenge Three is about cleaning and loading the data to delta tables. The students should be encouraged to explore both notebooks and dataflows (for example, processing BOM forecast XML is easy in a dataflow, but the WAM-002 data is better suited to a notebook).

Ultimately, the method each team uses is a design choice for that team.
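For teams that choose a notebook rather than a dataflow for the forecasts, the sketch below flattens a coastal waters forecast XML file into one row per forecast area and period. It assumes the BOM product layout of `<area aac=... description=...>` elements containing `<forecast-period start-time-local=... index=...>` elements, each holding `<text type=...>` elements such as `forecast_winds`; check this against a downloaded `IDW11160.xml` before relying on it. The function and column names are hypothetical.

```python
import xml.etree.ElementTree as ET


def forecast_rows(xml_text):
    """Flatten BOM forecast XML (str or bytes) into one dict per area and forecast period."""
    root = ET.fromstring(xml_text)
    rows = []
    for area in root.iter("area"):
        for period in area.findall("forecast-period"):
            row = {
                "aac": area.get("aac"),
                "description": area.get("description"),
                "start_time_local": period.get("start-time-local"),
                "period_index": int(period.get("index", "0")),
            }
            # Each <text type="..."> child becomes its own column, e.g. forecast_winds.
            for text in period.findall("text"):
                kind = text.get("type")
                if kind:
                    row[kind] = (text.text or "").strip()
            rows.append(row)
    # Periods carry different <text> types, so give every row the same columns (None if absent).
    columns = sorted({key for row in rows for key in row})
    return [{column: row.get(column) for column in columns} for row in rows]
```

Because every row carries the same keys, the result could then be turned into a Spark DataFrame (for example with `spark.createDataFrame(rows)`) and saved as a Delta table such as `Forecasts`, the table name the `Cleanup.ipynb` notebook drops.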

## Solutions

Solutions are contained in the [Solutions](./Solutions) folder:

__Dataflow Gen2__

- ``Load BOM Forecasts.pqt`` - dataflow template
- ``Dataflow Load IDW11160 M Code.txt`` - M code for the above dataflow

__Notebooks__

- ``Download BOM Forecasts To OneLake.ipynb`` - downloads BOM forecasts to OneLake
- ``Load Shipwrecks.ipynb`` - loads WAM-002 data, merging it with the BOM Marine Zones data
- ``Loading Planetary Computer Climate Prediction Models.ipynb`` - bonus level - loads ECMWF climate models from the Planetary Computer

__Misc__

- ``Cleanup.ipynb`` - cleans up OneLake tables and files
- ``troubleshooting/Cancel-Dataflow.ps1`` - a PowerShell script to cancel a dataflow (or at least mark its metadata as cancelled)
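The `Load Shipwrecks.ipynb` solution merges WAM-002 with the BOM Marine Zones. The sketch below only illustrates the idea behind that step, a point-in-polygon spatial join, in plain Python with no dependencies; it is not the notebook's code. It assumes both datasets have already been converted to GeoJSON features and that each zone carries an identifying property, assumed here to be named `AAC`; the helper names are hypothetical. In a notebook, a library such as geopandas (`geopandas.sjoin`) does the same job far more efficiently. Points lying exactly on a zone boundary may land on either side.

```python
def point_in_ring(x, y, ring):
    """Ray casting: inside if a ray from (x, y) towards +x crosses the ring an odd number of times."""
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % len(ring)][0], ring[(i + 1) % len(ring)][1]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, rings):
    """GeoJSON Polygon: inside the outer ring (rings[0]) and outside every hole (rings[1:])."""
    return point_in_ring(x, y, rings[0]) and not any(point_in_ring(x, y, hole) for hole in rings[1:])


def in_geometry(x, y, geometry):
    """Support GeoJSON Polygon and MultiPolygon zone geometries."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        return False
    return any(point_in_polygon(x, y, rings) for rings in polygons)


def assign_zones(wrecks, zones, zone_key="AAC"):
    """Copy each wreck's properties and add the key of the first zone containing its point."""
    tagged = []
    for wreck in wrecks:
        geometry = wreck.get("geometry")
        zone = None
        if geometry and geometry["type"] == "Point":
            lon, lat = geometry["coordinates"][:2]  # GeoJSON order is [longitude, latitude]
            zone = next(
                (z["properties"].get(zone_key) for z in zones if in_geometry(lon, lat, z["geometry"])),
                None,
            )
        tagged.append({**wreck["properties"], "zone": zone})
    return tagged
```

Wrecks without a point geometry, or outside every zone, get `zone = None`, so they can be reported rather than silently dropped.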
11 changes: 11 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-04.md
@@ -0,0 +1,11 @@
# Challenge 04 - Make a Splash! - Coach's Guide

[< Previous Solution](./Solution-03.md) - **[Home](./README.md)** - [Next Solution >](./Solution-05.md)

## Notes & Guidance

Here the challenge is to visualise the data. Power BI is the obvious choice, but if a group wishes to build a dashboard in Excel, or to use Power Apps or React, then that is fine too.

Emphasis should be placed on the data story and meeting the requirements.

A good data story will have a clear narrative, and will be easy to follow. It will also be visually appealing, address the specific needs of the audience, and will make good use of the data. Students should be encouraged to use other media (whilst keeping within any licensing terms) such as images, to enhance their story.
11 changes: 11 additions & 0 deletions 067-FabricLakehouse/Coach/Solution-05.md
@@ -0,0 +1,11 @@
# Challenge 05 - Giant Stride - Coach's Guide

[< Previous Solution](./Solution-04.md) - **[Home](./README.md)**

## Notes & Guidance

This challenge is about showing value to the customer and reflecting on the day. Although it might be tempting to drop this challenge (especially if time is tight), it's important for the teams to present their work, so please try to reserve time for this activity.

You will play the part of the customer. Make up a character (CDO, Dive Shop Manager, salty old sea dog) and make it a fun retro. How's your pirate accent?

You might like to award small prizes for the best team, the best presentation, or the best pirate accent, but this challenge is not about winning or losing, it's about reflecting on the day and having fun. It's all about the bubbles we blew along the way.
Empty file.
83 changes: 83 additions & 0 deletions 067-FabricLakehouse/Coach/Solutions/Cleanup.ipynb
@@ -0,0 +1,83 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ff6af748-d15b-4f95-b740-d370a113ba63",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Clean Up OneLake"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c3c10d0-3210-4f39-ac8f-aaf791260aac",
"metadata": {},
"outputs": [],
"source": [
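"from shutil import rmtree\n",
"\n",
"# Remove the raw BOM and WAM files landed in the default lakehouse.\n",
"# ignore_errors=True keeps the cleanup safe to re-run if a folder is already gone.\n",
"rmtree(\"/lakehouse/default/Files/BOM\", ignore_errors=True)\n",
"rmtree(\"/lakehouse/default/Files/WAM\", ignore_errors=True)\n"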
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4099c04-73a8-40b4-a4c1-13c22459dd3f",
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"microsoft": {
"language": "sparksql"
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS Forecasts;\n",
"DROP TABLE IF EXISTS Shipwrecks;"
]
}
],
"metadata": {
"kernel_info": {
"name": "synapse_pyspark"
},
"kernelspec": {
"display_name": "Synapse PySpark",
"language": "Python",
"name": "synapse_pyspark"
},
"language_info": {
"name": "python"
},
"notebook_environment": {},
"save_output": true,
"spark_compute": {
"compute_id": "/trident/default",
"session_options": {
"conf": {},
"enableDebugMode": false
}
},
"synapse_widget": {
"state": {},
"version": "0.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}