Aggregation is one of those capabilities that really make you appreciate the power of relational database systems. It allows you to move beyond merely persisting your data, into the realm of asking truly interesting questions that can be used to inform decision making. This category covers aggregation at length, making use of standard grouping as well as more recent window functions. If you struggle with these questions, I strongly recommend Learning SQL, by Alan Beaulieu and SQL Cookbook by Anthony Molinaro. In fact, get the latter anyway - it'll take you beyond anything you find on this site, and on multiple different database systems to boot.
For our first foray into aggregates, we're going to stick to something simple. We want to know how many facilities exist - simply produce a total count.
select count(*) as count from cd.facilities;Produce a count of the number of facilities that have a cost to guests of 10 or more.
select count(*)
from cd.facilities
where guestcost >= 10;Produce a count of the number of recommendations each member has made. Order by member ID.
select recommendedby, count(*)
from cd.members
where recommendedby is not null
group by recommendedby
order by recommendedby;
--or--
select recommendedby, count(*)
from cd.members
where not recommendedby is null
group by recommendedby
order by recommendedby;Produce a list of the total number of slots booked per facility. For now, just produce an output table consisting of facility id and slots, sorted by facility id.
select facid, sum(slots) as "TotalSlot"
from cd.bookings
group by facid
order by facid;Produce a list of the total number of slots booked per facility in the month of September 2012. Produce an output table consisting of facility id and slots, sorted by the number of slots.
select facid, sum(slots) as "Total Slots"
from cd.bookings
where starttime between '2012-09-01' and '2012-10-1'
group by facid -- group by必须放在被过滤后的表格后面
order by "Total Slots";Produce a list of the total number of slots booked per facility per month in the year of 2012. Produce an output table consisting of facility id and slots, sorted by the id and month.
-- extract(month from starttime) --
select facid, extract(month from starttime) as month, sum(slots) as "Total Slots"
from cd.bookings
where starttime between '2012-01-01' and '2013-01-01'
group by facid, month
order by facid, month;解析
这里因为要select facid, 所以用group by分组的时候也要用facid分组,否则会报错,原因是: group by就是将一个数据集划分为若干个小区域, 然后对若干个小区域进行数据处理; select中出现的字段必须要么是在group by中作为分组的依据,要么被包含在聚集函数内部
Find the total number of members who have made at least one booking.
select count(distinct memid)
from cd.bookings
where memid is not null;Produce a list of facilities with more than 1000 slots booked. Produce an output table consisting of facility id and hours, sorted by facility id.
select facid, sum(slots) as "Total Slots"
from cd.bookings
group by facid
having sum(slots) > 1000
order by facid;解析 这里使用having而不能使用where是因为:
Where 是一个约束声明,使用Where约束来自数据库的数据,Where是在结果返回之前起作用的,Where中不能使用聚合函数。 Having是一个过滤声明,是在查询返回结果集以后对查询结果进行的过滤操作,在Having中可以使用聚合函数。 在查询过程中聚合语句(sum,min,max,avg,count)要比having子句优先执行。而where子句在查询过程中执行优先级高于聚合语句。
Produce a list of facilities along with their total revenue. The output table should consist of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members!
select f.name,
sum(b.slots * case
when b.memid = 0 then f.guestcost
else f.membercost
end) as revenue
from cd.facilities as f
inner join cd.bookings as b
on f.facid = b.facid
group by f.name
order by revenue;sql语句的执行顺序
不了解这个那么在写一个sql语句时,就不能有一个清晰的逻辑. 就不能理解为什么having能够执行聚集函数而where不行,不能理解为什么若有group by的话,为什么select中的列必须是group by指定的,或者是在聚集函数中的 SQL SELECT语句的执行顺序: from子句组装来自不同数据源的数据; where子句基于指定的条件对记录行进行筛选; group by子句将数据划分为多个分组; 使用聚集函数进行计算; 使用having子句筛选分组; 计算所有的表达式; 使用order by对结果集进行排序; select 集合输出。
Produce a list of facilities with a total revenue less than 1000. Produce an output table consisting of facility name and revenue, sorted by revenue. Remember that there's a different cost for guests and members!
select name, revenue
from(select f.name as name,
sum(b.slots*case
when b.memid = 0 then f.guestcost
else f.membercost
end) as revenue
from cd.facilities as f
inner join cd.bookings as b
on f.facid = b.facid
group by f.name
) as revenues
where revenue < 1000
order by revenue;Output the facility id that has the highest number of slots booked. For bonus points, try a version without a LIMIT clause. This version will probably look messy!
select facid, sum(slots) as "Total Slots"
from cd.bookings
group by facid
order by "Total Slots"
limit 1;
--使用通用表达式书写--
with sum as (select facid, sum(slots) as totalslots
from cd.bookings
group by facid
)
select facid, totalslots
from sum
where totalslots = (select max(totalslots) from sum);Produce a list of the total number of slots booked per facility per month in the year of 2012. In this version, include output rows containing totals for all months per facility, and a total for all months for all facilities. The output table should consist of facility id, month and slots, sorted by the id and month. When calculating the aggregated values for all months and all facids, return null values in the month and facid columns.
--原始的写法
with bookings as (select facid, extract(month from starttime) as month, slots
from cd.bookings
where starttime between '2012-01-01' and '2013-01-01'
)
select facid, month, sum(slots) as slots
from bookings
group by facid, month
union
select facid, null as month, sum(slots)
from bookings
group by facid
union
select null as facid, null as month, sum(slots)
from bookings
order by facid, month
-- 使用rollup函数简化
select facid, extract(month from starttime) as month, sum(slots) as slots
from cd.bookings
where
starttime >= '2012-01-01'
and starttime < '2013-01-01'
group by rollup(facid, month)
order by facid, month; Produce a list of the total number of hours booked per facility, remembering that a slot lasts half an hour. The output table should consist of the facility id, name, and hours booked, sorted by facility id. Try formatting the hours to two decimal places.
--使用格式化函数格式化浮点型数据
--pg里没有直接保留几位有效数字的功能,只能通过转换为字符串进行格式化)
select b.facid, f.name, trim(to_char(sum(b.slots*0.5), '999D99')) as "Total Hours"
from cd.bookings as b
inner join cd.facilities as f
on b.facid = f.facid
group by b.facid, f.name
order by b.facidProduce a list of each member name, id, and their first booking after September 1st 2012. Order by member ID.
select m.surname, m.firstname, m.memid, min(b.starttime)
from cd.members as m
inner join cd.bookings as b
on m.memid = b.memid
where b.starttime > '2012-09-01'
group by m.surname, m.firstname, m.memid
order by m.memid;Produce a list of member names, with each row containing the total member count. Order by join date.
--普通模式
select (select count(*) from cd.members) as count,
firstname, surname
from cd.members
order by joindate;
--使用窗口函数贱货操作
select count(*) over(), firstname, surname
from cd.members
order by joindateProduce a monotonically increasing numbered list of members, ordered by their date of joining. Remember that member IDs are not guaranteed to be sequential.
select row_number() over(order by joindate), firstname, surname
from cd.members
order by joindate解析
row_number用于为窗口函数排序后的集合添加序号,序号是递增不重复的 rank负责为窗口函数排序后的集合添加排名,不同于row_num,rank需要考虑被排序的那一栏是否相等
Output the facility id that has the highest number of slots booked. Ensure that in the event of a tie, all tieing results get output.
--不建议
select facid, sum(slots) as total
from cd.bookings
order by total
limit 1;
--使用ranke函数
select facid, total
from (
select facid, sum(slots) as total, rank() over(order by sum(slots) desc) as rank
from cd.bookings
group by facid
) as bookings
where rank = 1;Produce a list of members, along with the number of hours they've booked in facilities, rounded to the nearest ten hours. Rank them by this rounded figure, producing output of first name, surname, rounded hours, rank. Sort by rank, surname, and first name.
select m.firstname,
m.surname,
round(sum(b.slots)/2.0/10)*10 as hours,
rank() over(order by round(sum(b.slots)/2.0/10)*10 desc) as rank
from cd.members as m
inner join cd.bookings as b
on m.memid = b.memid
group by m.firstname, m.surname
order by rank, surname, firstnameProduce a list of the top three revenue generating facilities (including ties). Output facility name and rank, sorted by rank and facility name.
with revenues as (select f.name,
sum(case
when b.memid = 0 then
b.slots*f.guestcost
else
b.slots*f.membercost
end) as revenue
from cd.facilities as f
inner join cd.bookings as b
on f.facid = b.facid
group by f.name)
select name, rank
from(select name, rank() over(order by revenue desc) as rank
from revenues) as ranks
where rank in (1, 2, 3)
order by rank, nameClassify facilities into equally sized groups of high, average, and low based on their revenue. Order by classification and facility name.
select name, case when class=1 then 'high'
when class=2 then 'average'
else 'low'
end revenue
from (
select facs.name as name, ntile(3) over (order by sum(case
when memid = 0 then slots * facs.guestcost
else slots * membercost
end) desc) as class
from cd.bookings bks
inner join cd.facilities facs
on bks.facid = facs.facid
group by facs.name
) as subq
order by class, name; 解析
使用ntile进行分类
Based on the 3 complete months of data so far, calculate the amount of time each facility will take to repay its cost of ownership. Remember to take into account ongoing monthly maintenance. Output facility name and payback time in months, order by facility name. Don't worry about differences in month lengths, we're only looking for a rough value here!
select facs.name as name,
facs.initialoutlay/((sum(case
when memid = 0 then slots * facs.guestcost
else slots * membercost
end)/3) - facs.monthlymaintenance) as months
from cd.bookings bks
inner join cd.facilities facs
on bks.facid = facs.facid
group by facs.facid
order by name; For each day in August 2012, calculate a rolling average of total revenue over the previous 15 days. Output should contain date and revenue columns, sorted by the date. Remember to account for the possibility of a day having zero revenue. This one's a bit tough, so don't be afraid to check out the hint!
--这个答案不是我自己想的, 我只是理解了参考答案的思路
select dategen.date,
(
-- correlated subquery that, for each day fed into it,
-- finds the average revenue for the last 15 days
select sum(case
when memid = 0 then slots * facs.guestcost
else slots * membercost
end) as rev
from cd.bookings bks
inner join cd.facilities facs
on bks.facid = facs.facid
where bks.starttime > dategen.date - interval '14 days'
and bks.starttime < dategen.date + interval '1 day'
)/15 as revenue
from
(
-- generates a list of days in august
select cast(generate_series(timestamp '2012-08-01',
'2012-08-31','1 day') as date) as date
) as dategen
order by dategen.date; 




















