Skip to content

Latest commit

 

History

History
417 lines (349 loc) · 16.2 KB

File metadata and controls

417 lines (349 loc) · 16.2 KB

Intro

Aggregation is one of those capabilities that really make you appreciate the power of relational database systems. It allows you to move beyond merely persisting your data, into the realm of asking truly interesting questions that can be used to inform decision making. This category covers aggregation at length, making use of standard grouping as well as more recent window functions. If you struggle with these questions, I strongly recommend Learning SQL, by Alan Beaulieu and SQL Cookbook by Anthony Molinaro. In fact, get the latter anyway - it'll take you beyond anything you find on this site, and on multiple different database systems to boot.

Q1: count the number of facilities

For our first foray into aggregates, we're going to stick to something simple. We want to know how many facilities exist - simply produce a total count. image

select count(*) as count from cd.facilities;

Q2: count the number of enpensive faciliteis

Produce a count of the number of facilities that have a cost to guests of 10 or more. image

select count(*)
	from cd.facilities
	where guestcost >= 10;

Q3: Count the number of recommendations each member makes.

Produce a count of the number of recommendations each member has made. Order by member ID. image

select recommendedby, count(*)
	from cd.members
	where recommendedby is not null
	group by recommendedby
order by recommendedby;

--or--
select recommendedby, count(*)
	from cd.members
	where not recommendedby is null
	group by recommendedby
order by recommendedby;

Q4: list the total slots booked per facility

Produce a list of the total number of slots booked per facility. For now, just produce an output table consisting of facility id and slots, sorted by facility id. image

select facid, sum(slots) as "TotalSlot"
	from cd.bookings
	group by facid
order by facid;

Q5: list the total slots booked per facility in a given month

Produce a list of the total number of slots booked per facility in the month of September 2012. Produce an output table consisting of facility id and slots, sorted by the number of slots. image

select facid, sum(slots) as "Total Slots"
	from cd.bookings
	where starttime between '2012-09-01'  and '2012-10-1'
	group by facid -- group by必须放在被过滤后的表格后面
order by "Total Slots";

Q6: list total slots booked per facility per month

Produce a list of the total number of slots booked per facility per month in the year of 2012. Produce an output table consisting of facility id and slots, sorted by the id and month. image

-- extract(month from starttime) -- 

select facid, extract(month from starttime) as month, sum(slots) as "Total Slots"
	from cd.bookings
	where starttime between '2012-01-01' and '2013-01-01'
	group by facid, month
order by facid, month;

解析

这里因为要select facid, 所以用group by分组的时候也要用facid分组,否则会报错,原因是: group by就是将一个数据集划分为若干个小区域, 然后对若干个小区域进行数据处理; select中出现的字段必须要么是在group by中作为分组的依据,要么被包含在聚集函数内部

Q7: find the count of members who have made at least one booking

Find the total number of members who have made at least one booking. image

select count(distinct memid)
	from cd.bookings
	where memid is not null;

Q8: lists facilities with more than 1000 slots booked

Produce a list of facilities with more than 1000 slots booked. Produce an output table consisting of facility id and hours, sorted by facility id. image

select facid, sum(slots) as "Total Slots"
	from cd.bookings
	group by facid
	having sum(slots) > 1000
order by facid;

解析 这里使用having而不能使用where是因为:

Where 是一个约束声明,使用Where约束来自数据库的数据,Where是在结果返回之前起作用的,Where中不能使用聚合函数。 Having是一个过滤声明,是在查询返回结果集以后对查询结果进行的过滤操作,在Having中可以使用聚合函数。 在查询过程中聚合语句(sum,min,max,avg,count)要比having子句优先执行。而where子句在查询过程中执行优先级高于聚合语句。

Q9: find the total revenue of each facility

Produce a list of facilities along with their total revenue. The output table should consist of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members! image

select f.name,
	sum(b.slots * case
	   		when b.memid = 0 then f.guestcost
			else f.membercost
	   end) as revenue
	from cd.facilities as f
		inner join cd.bookings as b
			on f.facid = b.facid
	group by f.name
order by revenue;

sql语句的执行顺序

不了解这个那么在写一个sql语句时,就不能有一个清晰的逻辑. 就不能理解为什么having能够执行聚集函数而where不行,不能理解为什么若有group by的话,为什么select中的列必须是group by指定的,或者是在聚集函数中的 SQL SELECT语句的执行顺序: from子句组装来自不同数据源的数据; where子句基于指定的条件对记录行进行筛选; group by子句将数据划分为多个分组; 使用聚集函数进行计算; 使用having子句筛选分组; 计算所有的表达式; 使用order by对结果集进行排序; select 集合输出。

Q10: find a facility with total revenue less than 1000

Produce a list of facilities with a total revenue less than 1000. Produce an output table consisting of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members! image

select name, revenue
	from(select f.name as name,
		 	sum(b.slots*case
			   	when b.memid = 0 then f.guestcost
				else f.membercost
			   end) as revenue
		 	from cd.facilities as f
		 	inner join cd.bookings as b
		 		on f.facid = b.facid
		 	group by f.name
		) as revenues
	where revenue < 1000
order by revenue;

Q11: outpu the facility with the hieghtest number of slots booked

Output the facility id that has the highest number of slots booked. For bonus points, try a version without a LIMIT clause. This version will probably look messy! image

select facid, sum(slots) as "Total Slots"
		  	from cd.bookings
		  	group by facid
order by "Total Slots" 
limit 1;

--使用通用表达式书写--
with sum as (select facid, sum(slots) as totalslots
	from cd.bookings
	group by facid
)
select facid, totalslots 
	from sum
	where totalslots = (select max(totalslots) from sum);

Q12: List the total slots booked per facility per month, part 2

Produce a list of the total number of slots booked per facility per month in the year of 2012. In this version, include output rows containing totals for all months per facility, and a total for all months for all facilities. The output table should consist of facility id, month and slots, sorted by the id and month. When calculating the aggregated values for all months and all facids, return null values in the month and facid columns. image

--原始的写法
with bookings as (select facid, extract(month from starttime) as month, slots
				 	from cd.bookings
				 	where starttime between '2012-01-01' and '2013-01-01'
				)
select facid, month, sum(slots) as slots
	from bookings
	group by facid, month
union
select facid, null as month, sum(slots)
	from bookings
	group by facid
union
select null as facid, null as month, sum(slots)
	from bookings
order by facid, month

-- 使用rollup函数简化
select facid, extract(month from starttime) as month, sum(slots) as slots
	from cd.bookings
	where
		starttime >= '2012-01-01'
		and starttime < '2013-01-01'
	group by rollup(facid, month)
order by facid, month;    

Q13: list the total hours booked per named facility

Produce a list of the total number of hours booked per facility, remembering that a slot lasts half an hour. The output table should consist of the facility id, name, and hours booked, sorted by facility id. Try formatting the hours to two decimal places. image

--使用格式化函数格式化浮点型数据
--pg里没有直接保留几位有效数字的功能,只能通过转换为字符串进行格式化)
select b.facid, f.name, trim(to_char(sum(b.slots*0.5), '999D99')) as "Total Hours"
	from cd.bookings as b
	inner join cd.facilities as f
		on b.facid = f.facid
	group by b.facid, f.name
order by b.facid

Q14: list each member's first booking after septmember 1st 2012

Produce a list of each member name, id, and their first booking after September 1st 2012. Order by member ID. image

select m.surname, m.firstname, m.memid, min(b.starttime)
	from cd.members as m
	inner join cd.bookings as b
		on m.memid = b.memid
	where b.starttime > '2012-09-01'
	group by m.surname, m.firstname, m.memid
order by m.memid;

Q15: produce a list of member name, with each row containing the total member count

Produce a list of member names, with each row containing the total member count. Order by join date. image

--普通模式
select (select count(*) from cd.members) as count,
	firstname, surname
	from cd.members
order by joindate;

--使用窗口函数贱货操作
select count(*) over(), firstname, surname
	from cd.members
order by joindate

Q16: produce a numberd list of members

Produce a monotonically increasing numbered list of members, ordered by their date of joining. Remember that member IDs are not guaranteed to be sequential. image

select row_number() over(order by joindate), firstname, surname
	from cd.members
order by joindate

解析

row_number用于为窗口函数排序后的集合添加序号,序号是递增不重复的 rank负责为窗口函数排序后的集合添加排名,不同于row_num,rank需要考虑被排序的那一栏是否相等

Q17: output the facility id that has highest number of slots booked

Output the facility id that has the highest number of slots booked. Ensure that in the event of a tie, all tieing results get output. image

--不建议
select facid, sum(slots) as total
	from cd.bookings
order by total
limit 1;

--使用ranke函数
select facid, total
	from (
		select facid, sum(slots) as total, rank() over(order by sum(slots) desc) as rank
	  		from cd.bookings
	  		group by facid
	) as bookings
	where rank = 1;

Q18: rank members by hours(rounded) used

Produce a list of members, along with the number of hours they've booked in facilities, rounded to the nearest ten hours. Rank them by this rounded figure, producing output of first name, surname, rounded hours, rank. Sort by rank, surname, and first name. image

select m.firstname,
	m.surname,
	round(sum(b.slots)/2.0/10)*10 as hours,
	rank() over(order by round(sum(b.slots)/2.0/10)*10 desc) as rank
	from cd.members as m
		inner join cd.bookings as b
			on m.memid = b.memid
	group by m.firstname, m.surname
order by rank, surname, firstname

Q19: find the top three revenue generating facilities

Produce a list of the top three revenue generating facilities (including ties). Output facility name and rank, sorted by rank and facility name. image

with revenues as (select f.name,
		sum(case
		when b.memid = 0 then
			b.slots*f.guestcost
		else
			b.slots*f.membercost
		end) as revenue
		from cd.facilities as f
		inner join cd.bookings as b
			on f.facid = b.facid
		group by f.name)
select name, rank
	from(select name, rank() over(order by revenue desc) as rank
		from revenues) as ranks
	where rank in (1, 2, 3)
order by rank, name

Q20: classify facilities by value

Classify facilities into equally sized groups of high, average, and low based on their revenue. Order by classification and facility name. image

select name, case when class=1 then 'high'
		when class=2 then 'average'
		else 'low'
		end revenue
	from (
		select facs.name as name, ntile(3) over (order by sum(case
				when memid = 0 then slots * facs.guestcost
				else slots * membercost
			end) desc) as class
		from cd.bookings bks
		inner join cd.facilities facs
			on bks.facid = facs.facid
		group by facs.name
	) as subq
order by class, name;   

解析

使用ntile进行分类

Q21: calculate the payback for time each facility

Based on the 3 complete months of data so far, calculate the amount of time each facility will take to repay its cost of ownership. Remember to take into account ongoing monthly maintenance. Output facility name and payback time in months, order by facility name. Don't worry about differences in month lengths, we're only looking for a rough value here! image

select 	facs.name as name,
	facs.initialoutlay/((sum(case
			when memid = 0 then slots * facs.guestcost
			else slots * membercost
		end)/3) - facs.monthlymaintenance) as months
	from cd.bookings bks
	inner join cd.facilities facs
		on bks.facid = facs.facid
	group by facs.facid
order by name; 

Q22: calculate a rolling average of total revenue

For each day in August 2012, calculate a rolling average of total revenue over the previous 15 days. Output should contain date and revenue columns, sorted by the date. Remember to account for the possibility of a day having zero revenue. This one's a bit tough, so don't be afraid to check out the hint! image

--这个答案不是我自己想的, 我只是理解了参考答案的思路
select 	dategen.date,
	(
		-- correlated subquery that, for each day fed into it,
		-- finds the average revenue for the last 15 days
		select sum(case
			when memid = 0 then slots * facs.guestcost
			else slots * membercost
		end) as rev

		from cd.bookings bks
		inner join cd.facilities facs
			on bks.facid = facs.facid
		where bks.starttime > dategen.date - interval '14 days'
			and bks.starttime < dategen.date + interval '1 day'
	)/15 as revenue
	from
	(
		-- generates a list of days in august
		select 	cast(generate_series(timestamp '2012-08-01',
			'2012-08-31','1 day') as date) as date
	)  as dategen
order by dategen.date;