-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathPizza Runner Solutions.sql
More file actions
369 lines (307 loc) · 14.5 KB
/
Pizza Runner Solutions.sql
File metadata and controls
369 lines (307 loc) · 14.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
/**********************
Case Study Questions
This case study has LOTS of questions - they are broken up by area of focus including:
Pizza Metrics
Runner and Customer Experience
Ingredient Optimisation
Pricing and Ratings
Bonus DML Challenges (DML = Data Manipulation Language)
Each of the following case study questions can be answered using a single SQL statement.
Again, there are many questions in this case study - please feel free to pick and choose which ones you’d like to try!
Before you start writing your SQL queries however - you might want to investigate the data, you may want to do something with some of those null values and data types in the customer_orders and runner_orders tables!
A. Pizza Metrics
How many pizzas were ordered?
How many unique customer orders were made?
How many successful orders were delivered by each runner?
How many of each type of pizza was delivered?
How many Vegetarian and Meatlovers were ordered by each customer?
What was the maximum number of pizzas delivered in a single order?
For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
How many pizzas were delivered that had both exclusions and extras?
What was the total volume of pizzas ordered for each hour of the day?
What was the volume of orders for each day of the week?
B. Runner and Customer Experience
How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
Is there any relationship between the number of pizzas and how long the order takes to prepare?
What was the average distance travelled for each customer?
What was the difference between the longest and shortest delivery times for all orders?
What was the average speed for each runner for each delivery and do you notice any trend for these values?
What is the successful delivery percentage for each runner?
C. Ingredient Optimisation
What are the standard ingredients for each pizza?
What was the most commonly added extra?
What was the most common exclusion?
Generate an order item for each record in the customers_orders table in the format of one of the following:
Meat Lovers
Meat Lovers - Exclude Beef
Meat Lovers - Extra Bacon
Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers
Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
D. Pricing and Ratings
If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?
What if there was an additional $1 charge for any pizza extras?
Add cheese is $1 extra
The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset - generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
customer_id
order_id
runner_id
rating
order_time
pickup_time
Time between order and pickup
Delivery duration
Average speed
Total number of pizzas
If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?
E. Bonus Questions
If Danny wants to expand his range of pizzas - how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?
************************/
/*****
noticed issues and observations in the data:
1. null values are allowed in every column of every table.
2. While inserting values, sometimes proper NULL, sometimes 'null' and sometimes, '' is provided.
3. although we can understand that there are primary keys and foreign keys, they are not specified while creating tables.
4. column 'order_time' in table 'customer_orders' is of type TIMESTAMP wheras column 'pickup_time' in table 'runner_orders' is a varchar.
5. column 'registration_date' in 'runners' table is of type DATE.
6. 'topping_id' in pizza_toppings, 'toppings' in pizza_recipes, and 'exclusions' & 'extras' in customer_orders seem related.
7. entries in columns 'distance' and 'duration' of table 'runner_order' need to be cleaned.
*****/
use pizza_runner;
show tables;
/**** Cleaning Data ****/
update customer_orders set exclusions = null where exclusions in ('null', '');
update customer_orders set extras = null where extras in ('null', '');
update runner_orders set pickup_time = null where pickup_time in ('null', '');
update runner_orders set distance = null where distance in ('null', '');
update runner_orders set duration = null where duration in ('null', '');
update runner_orders set cancellation = null where cancellation in ('null', '');
alter table runner_orders modify column pickup_time timestamp;
alter table runner_orders modify column pickup_time timestamp;
update runner_orders set distance=replace(distance, 'km', '');
alter table runner_orders modify column distance float;
update runner_orders set duration= left(duration, 2);
alter table runner_orders modify column duration int;
describe runner_orders;
-- A. Pizza Metrics
-- 1. How many pizzas were ordered?
select count(pizza_id) as total_pizzas_ordered
from customer_orders;
-- 2. How many unique customer orders were made?
select count(distinct order_id) as unique_customer_orders
from customer_orders;
-- 3. How many successful orders were delivered by each runner?
select count(*) -- count(cancellation) will give 0
from runner_orders
where cancellation is null;
-- 4. How many of each type of pizza was delivered?
select co.pizza_id, count(*) as total_delivered
from customer_orders co
join runner_orders ro
using (order_id)
where ro.cancellation is null
group by co.pizza_id;
-- 5. How many Vegetarian and Meatlovers were ordered by each customer?
select pn.pizza_name, count(*) as total_orders
from customer_orders co
join pizza_names pn
using (pizza_id)
group by pn.pizza_name;
-- 6. What was the maximum number of pizzas delivered in a single order?
select co.order_id, count(*) as pizzas_delivered
from customer_orders co
join runner_orders ro
using (order_id)
where ro.cancellation is null
group by co.order_id
order by pizzas_delivered desc
limit 1;
-- 7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
with temp as
(select *
from customer_orders co
join runner_orders ro
using (order_id)
where ro.cancellation is null ),
x as
(select distinct customer_id, count(*) as pizzas_with_atleast_1_change
from temp
where exclusions is not null or extras is not null
group by customer_id),
y as
(select distinct customer_id, count(*) as pizzas_with_no_change
from temp
where not (exclusions is not null or extras is not null)
group by customer_id)
-- select * from x right join y using(customer_id);
-- /**
select * from
( select customer_id, pizzas_with_atleast_1_change, pizzas_with_no_change from x left join y using(customer_id) -- specify columns order, don't use *.
union
select customer_id, pizzas_with_atleast_1_change, pizzas_with_no_change from x right join y using(customer_id) -- specify columns order. If we use *, MySQl gives different order of columns from above left join query, and hence, union will be wrong.
) z
order by customer_id;
-- **/
-- 8. How many pizzas were delivered that had both exclusions and extras?
with temp as
(select *
from customer_orders co
join runner_orders ro
using (order_id)
where ro.cancellation is null )
select count(*) as pizzas_with_exclusion_and_extras_both from temp
where exclusions is not null and extras is not null
;
-- 9. What was the total volume of pizzas ordered for each hour of the day?
select extract(hour from order_time) as hour_of_day, count(pizza_id) as orders_volume
from customer_orders
group by hour_of_day
order by hour_of_day
;
-- 10. What was the volume of orders for each day of the week?
select dayofweek(order_time) as day_of_week, count(pizza_id) as orders_volume
from customer_orders
group by day_of_week
order by day_of_week
;
-- B. Runner and Customer Experience
-- 1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
-- 2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
with timing as
(select *, timestampdiff(minute, order_time, pickup_time) as arrival_time_mins
from (
(select distinct order_id, order_time from customer_orders) co
join
(select distinct order_id, runner_id, pickup_time from runner_orders where pickup_time is not null) ro
using(order_id)
)
)
select runner_id, round(avg(arrival_time_mins),2) as avg_arrival_time_mins
from timing
group by runner_id
;
-- 3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
with temp as
(select *, timestampdiff(minute, order_time, pickup_time) as order_to_pickup_time_mins
from customer_orders co
join
runner_orders ro
using(order_id)
where pickup_time is not null)
select order_id, count(1)as npizzas_ordered, round(avg(order_to_pickup_time_mins),0) as order_to_pickup_time_mins
from temp
group by order_id
order by npizzas_ordered desc
;
-- It can be observed that in general, as the number of pizzas ordered is increasing, so the time to prepare them. looks like there is some relation.
-- 4. What was the average distance travelled for each customer?
with temp as
( select * from customer_orders co
join
runner_orders ro
using(order_id)
where ro.duration is not null )
select customer_id, round(avg(distance), 2) as avg_distance_travelled_by_runner_km
from temp
group by customer_id
order by avg_distance_travelled_by_runner_km desc;
-- 5. What was the difference between the longest and shortest delivery times for all orders?
select max(duration) - min(duration) as longest_minus_shortest_delivery from runner_orders;
/** -- another apporach
with
lt as
(select max(duration) as longest_time from runner_orders),
st as
(select min(duration) as shortest_time from runner_orders)
select lt.longest_time - st.shortest_time as longest_minus_shortest_delivery
from lt, st;
**/
-- 6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
select *, round(distance/duration,2) -- runner_id, round( avg(distance/duration), 2) as avg_speed
from runner_orders
where duration is not null
order by duration desc
-- group by runner_id
;
-- 7. What is the successful delivery percentage for each runner?
with temp as
( select * from customer_orders co
join
runner_orders ro
using(order_id)
-- where ro.duration is not null
)
select * from temp;
-- C. Ingredient Optimisation
-- 1. What are the standard ingredients for each pizza?
select *
from pizza_recipes
;
-- 2. What was the most commonly added extra?
select extras from customer_orders;
-- 3. What was the most common exclusion?
select exclusions from customer_orders;
/* 4. Generate an order item for each record in the customers_orders table in the format of one of the following:
Meat Lovers
Meat Lovers - Exclude Beef
Meat Lovers - Extra Bacon
Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers */
/* 5. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
For example: "Meat Lovers: 2xBacon, Beef, ... , Salami" */
-- 6. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?
-- D. Pricing and Ratings
/* 1. If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes
- how much money has Pizza Runner made so far if there are no delivery fees?*/
with temp as
(select pizza_id,
case when co.pizza_id=1 then 10
else 12
end as pizza_amount
from customer_orders co
join
runner_orders ro
using(order_id)
where ro.cancellation is null
)
select sum(pizza_amount) as money_made
from temp;
/* 2. What if there was an additional $1 charge for any pizza extras?
Add cheese is $1 extra */
-- case1: any number of extras cost 1$ only
with temp as
(select pizza_id, extras,
case when co.pizza_id=1 and extras is null then 10
when co.pizza_id=1 and extras is not null then 11
when co.pizza_id=2 and extras is null then 12
else 13
end as pizza_amount
from customer_orders co
join
runner_orders ro
using(order_id)
where ro.cancellation is null
)
select sum(pizza_amount) as money_made
from temp;
-- case 2: each extra adds 1$ to amount.
/* 3. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset
- generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5. */
/* 4. Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
customer_id
order_id
runner_id
rating
order_time
pickup_time
Time between order and pickup
Delivery duration
Average speed
Total number of pizzas */
/* 5. If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled
- how much money does Pizza Runner have left over after these deliveries? */
-- E. Bonus Questions
/*If Danny wants to expand his range of pizzas - how would this impact the existing data design?
Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu? */