-
Notifications
You must be signed in to change notification settings - Fork 113
Expand file tree
/
Copy path9_to_5_sql_project.sql
More file actions
337 lines (254 loc) · 10.2 KB
/
9_to_5_sql_project.sql
File metadata and controls
337 lines (254 loc) · 10.2 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
--1. Use this space to make note of each table in the database, the columns within each table, each column’s data type, and how the tables are connected. You can write this down or draw a diagram. Whatever method helps you get an
--understanding of what is going on with `LaborStatisticsDB`.
--Answer-
--explore the LaborStatisticsDB database
--Step 1: List all tables in the database
SELECT TABLE_NAME
FROM LaborStatisticsDB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
--9 rows affected
--Step 2: List columns and data types for each table
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'annual_2016';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'datatype';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'footnote';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'industry';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'january_2017';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'period';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'seasonal';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'series';
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'supersector';
--Step 3: Identify relationships between tables
--series_id appears in both series and emplyoement table , it’s likely a relationship.
--2. What is the datatype for women employees?
SELECT TOP 10 *
FROM LaborStatisticsDB.dbo.datatype;
SELECT COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'datatype'
AND COLUMN_NAME LIKE '%women%';
SELECT *
FROM LaborStatisticsDB.dbo.datatype;
SELECT TOP 10 series_id, data_type_code
FROM LaborStatisticsDB.dbo.series;
SELECT TOP 20 *
FROM LaborStatisticsDB.dbo.series
WHERE data_type_code = '10';
--Answer -The data type for Women employees is 10.
--3. What is the series id for women employees in the commercial banking industry in the
--financial activities supersector?
SELECT TOP 10
s.series_id,
s.data_type_code,
i.industry_code,
i.industry_name
FROM LaborStatisticsDB.dbo.series s
JOIN LaborStatisticsDB.dbo.industry i
ON s.industry_code = i.industry_code
WHERE s.data_type_code = '10'
AND i.industry_name LIKE '%Commercial Banking%';
--Answer- 2 rows affected
--## Aggregate Your Friends and Code some SQL
--1. How many employees were reported in 2016 in all industries? Round to the nearest whole number.
SELECT TOP 10 *
FROM LaborStatisticsDB.dbo.series
SELECT
ROUND(SUM(value), 0) AS total_employees_2016
FROM LaborStatisticsDB.dbo.annual_2016
WHERE year = 2016;
--Answer- 2351408916
--2. How many women employees were reported in 2016 in all industries?
--Round to the nearest whole number.
SELECT
ROUND(SUM(value), 0) AS total_women_employees_2016
FROM LaborStatisticsDB.dbo.annual_2016
WHERE series_id = 'CES0000000010';
--Answer- NULL
--3. How many production/nonsupervisory employees were reported in 2016?
-- Round to the nearest whole number.
SELECT TOP 10 *
FROM LaborStatisticsDB.dbo.series
where series_title = 'Production and nonsupervisory employees '
SELECT
ROUND(SUM(value), 0) AS Production_nonsupervisory_employees
FROM LaborStatisticsDB.dbo.annual_2016
WHERE series_id = 'CES0600000006';
--Answer- NULL
--4. In January 2017, what is the average weekly hours worked by production and nonsupervisory
--employees across all industries?
SELECT
ROUND(AVG(value), 2) AS avg_weekly_hours_jan2017
FROM LaborStatisticsDB.dbo.january_2017
WHERE series_id = 'CES0600000006'
AND month = 1;
SELECT *
FROM LaborStatisticsDB.dbo.[period]
WHERE [YEAR]='2017'
GROUP BY january_2017.[period]
SELECT
ROUND(AVG(value), 2) AS avg_weekly_hours_jan2017
FROM LaborStatisticsDB.dbo.january_2017
WHERE series_id = 'CES0600000007' -- avg weekly hours series_id
GROUP BY [period]
--answer- 41.1
--5. What is the total weekly payroll for production and nonsupervisory employees across all
-- industries in January 2017? Round to the nearest penny.
SELECT TOP 10 *
FROM LaborStatisticsDB.dbo.series
WITH employees AS (
SELECT SUM(value) AS total_employees
FROM LaborStatisticsDB.dbo.january_2017
WHERE series_id = 'CES0500000006' AND period = 'M01'
),
avg_hours AS (
SELECT AVG(value) AS avg_weekly_hours
FROM LaborStatisticsDB.dbo.january_2017
WHERE series_id = 'CES0500000007' AND period = 'M01'
),
avg_earnings AS (
SELECT AVG(value) AS avg_hourly_earnings
FROM LaborStatisticsDB.dbo.january_2017
WHERE series_id = 'CES0500000008' AND period = 'M01'
)
SELECT
ROUND(
(SELECT total_employees FROM employees) *
(SELECT avg_weekly_hours FROM avg_hours) *
(SELECT avg_hourly_earnings FROM avg_earnings),
2) AS total_weekly_payroll_jan2017;
--Answer- 148996351.39
--6. In January 2017, for which industry was the average weekly hours worked by production and nonsupervisory employees the highest?
--Which industry was the lowest?
SELECT TOP 1
i.industry_name,
j.value AS avg_weekly_hours
FROM LaborStatisticsDB.dbo.january_2017 j
JOIN LaborStatisticsDB.dbo.series s
ON j.series_id = s.series_id
JOIN LaborStatisticsDB.dbo.industry i
ON s.industry_code = i.industry_code
WHERE j.series_id = 'CES0500000007'
AND j.period = 'M01'
ORDER BY j.value DESC;
--lowest
SELECT TOP 1
i.industry_name,
j.value AS avg_weekly_hours
FROM LaborStatisticsDB.dbo.january_2017 j
JOIN LaborStatisticsDB.dbo.series s
ON j.series_id = s.series_id
JOIN LaborStatisticsDB.dbo.industry i
ON s.industry_code = i.industry_code
WHERE j.series_id = 'CES0500000007'
AND j.period = 'M01'
ORDER BY j.value ASC;
--Answer- Total private = 33.6
--## Join in on the Fun
--Time to start joining! You can choose the type of join you use, just make sure to make a note!
--1. Join `annual_2016` with `series` on `series_id`. We only want the data in the `annual_2016`
--table to be included in the result.
SELECT
a.*,
s.series_title,
s.industry_code,
s.data_type_code
FROM LaborStatisticsDB.dbo.annual_2016 AS a
LEFT JOIN LaborStatisticsDB.dbo.series AS s
ON a.series_id = s.series_id;
--Answer- 29042 rows affected
SELECT TOP 50
a.*,
s.series_title,
s.industry_code,
s.data_type_code
FROM LaborStatisticsDB.dbo.annual_2016 AS a
LEFT JOIN LaborStatisticsDB.dbo.series AS s
ON a.series_id = s.series_id
ORDER BY a.id;
--Answer- 50 rows affected
--2. Join `series` and `datatype` on `data_type_code`.
-- Limiting rows returned from query, uncomment the line below to start on your query!
-- SELECT TOP 50 *
-- Uncomment the line below when you are ready to run the query, leaving it as your last!
-- ORDER BY id
SELECT TOP 50
s.series_id,
s.series_title,
s.data_type_code,
d.data_type_text
FROM LaborStatisticsDB.dbo.series AS s
LEFT JOIN LaborStatisticsDB.dbo.datatype AS d
ON s.data_type_code = d.data_type_code
ORDER BY s.series_id;
--Answer- 50 rows affected
--3. Join `series` and `industry` on `industry_code`.
-- Limiting rows returned from query, uncomment the line below to start on your query!
-- SELECT TOP 50 *
-- Uncomment the line below when you are ready to run the query, leaving it as your last!
-- ORDER BY id
SELECT TOP 50
s.series_id,
s.series_title,
s.industry_code,
i.industry_name
FROM LaborStatisticsDB.dbo.series AS s
LEFT JOIN LaborStatisticsDB.dbo.industry AS i
ON s.industry_code = i.industry_code
ORDER BY s.series_id;
--Answer- 50 rows affected
--## Subqueries, Unions, Derived Tables, Oh My!
--1. Write a query that returns the `series_id`, `industry_code`, `industry_name`, and `value` from the `january_2017` table but only if that value is greater than the average
--value for `annual_2016` of `data_type_code` 82.
SELECT
j.series_id,
s.industry_code,
i.industry_name,
j.value
FROM LaborStatisticsDB.dbo.january_2017 AS j
LEFT JOIN LaborStatisticsDB.dbo.series AS s
ON j.series_id = s.series_id
LEFT JOIN LaborStatisticsDB.dbo.industry AS i
ON s.industry_code = i.industry_code
WHERE j.value > (
SELECT AVG(a.value)
FROM LaborStatisticsDB.dbo.annual_2016 AS a
LEFT JOIN LaborStatisticsDB.dbo.series AS s2
ON a.series_id = s2.series_id
WHERE s2.data_type_code = '82'
);
--Answer- 754 rows affected
--1. During which time period did production and nonsupervisory employees fare better?
SELECT [value], [period]
FROM LaborStatisticsDB.dbo.january_2017
WHERE series_id IN (select series_id from LaborStatisticsDB.dbo.series where series_title like '%production and nonsupervisory employees%')
AND [value] = (SELECT MAX([value]) from LaborStatisticsDB.dbo.january_2017)
SELECT value, period
FROM LaborStatisticsDB.dbo.january_2017
HAVING [value] IN (SELECT MAX([value]) where series_id=(Select series_id where series_title like '%production and nonsupervisory employees%'))
Select TOP 10*
FROM LaborStatisticsDB.dbo.series
SELECT top 10*
from LaborStatisticsDB.dbo.[period]
--3. Now that you have explored the datasets, is there any data or information that you wish you had in this analysis?
--Answer-Yes. While the data was helpful, a few more things would have made the analysis better:
--Information by age, gender, or location This would help us see if all groups of workers benefited the same way.
--Cost of living or inflation Just knowing wages is not enough. It would be helpful to know if wages actually kept up with rising prices.
--Benefits or overtime pay This would show the full income employees are getting, not just their regular pay.
--Unemployment or job loss data This would help us understand how stable jobs were during that time.