Preface

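In data warehouse and BI development there is always a metric-development stage. To implement a metric's calculation definition, we rely on Hive's functions and a few query patterns. Many BI tools can also compute such metrics directly from detail data, but to avoid overloading the BI tool and to understand the calculation logic in depth, it is worth working through the metrics in Hive.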

I. Year-over-year, period-over-period, and monthly share metrics

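Year-over-year (daily/weekly/monthly/yearly): (current period / same period last year - 1) * 100%

Period-over-period (daily/weekly/monthly/yearly): (current period / previous period - 1) * 100%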
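As a quick sanity check of the two formulas, the arithmetic can be run directly in Hive. This is a minimal sketch: the figures 1220 (current month), 600 (previous month), and 800 (same month last year) are illustrative assumptions, and a `select` without `from` assumes Hive 0.13 or later.

```sql
-- Illustrative figures only: current = 1220, previous period = 600, same period last year = 800.
-- In Hive, dividing two integers with "/" yields a double, so no cast is needed.
select
    round((1220 / 600 - 1) * 100, 2) as pop_pct,  -- period-over-period, in percent
    round((1220 / 800 - 1) * 100, 2) as yoy_pct;  -- year-over-year, in percent
```

With these inputs the period-over-period figure is roughly 103.33% and the year-over-year figure roughly 52.5%.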

1. Create the table

CREATE TABLE `saleorder` (
  `order_id` int,
  `order_time` date,
  `order_num` int
);

2. Insert the sample data

INSERT INTO `saleorder` VALUES
(1, '2020-04-20', 420),
(2, '2020-04-04', 800),
(3, '2020-03-28', 500),
(4, '2020-03-13', 100),
(5, '2020-02-27', 300),
(6, '2020-01-07', 450),
(7, '2019-04-07', 800),
(8, '2019-03-15', 1200),
(9, '2019-02-17', 200),
(10, '2019-02-07', 600),
(11, '2019-01-13', 300);
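Before computing any ratios, it can help to confirm the monthly totals that every later query builds on. A minimal check, using only the `saleorder` table and columns defined above:

```sql
-- Monthly totals of the sample data; substr() on the date column takes the 'yyyy-MM' prefix.
select
    substr(order_time, 1, 7) as order_month,
    sum(order_num)           as order_sum
from saleorder
group by substr(order_time, 1, 7)
order by order_month;

-- Totals implied by the eleven sample rows:
-- 2019-01   300
-- 2019-02   800
-- 2019-03  1200
-- 2019-04   800
-- 2020-01   450
-- 2020-02   300
-- 2020-03   600
-- 2020-04  1220
```

Note that there are no rows between 2019-05 and 2019-12, which matters for the period-over-period queries.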

3. Monthly share

-- Approach 1: compute the monthly totals (numerator) and the yearly totals (denominator)
-- in two grouped subqueries, join them on the year, then divide.
-- Tip: besides date_format(), substr() can also be used to cut the year or year-month out of a date.
set hive.exec.mode.local.auto=true;

select
    aa.order_month,
    aa.order_sum,
    nvl(round(aa.order_sum / bb.order_total * 100, 2), 0) as rate  -- share of the year's total, in percent
from (
    select
        substr(order_time, 1, 7) as order_month,
        sum(order_num) as order_sum
    from `saleorder`
    group by substr(order_time, 1, 7)
) aa  -- monthly totals
left join (
    select
        substr(order_time, 1, 4) as order_year,
        sum(order_num) as order_total
    from `saleorder`
    group by substr(order_time, 1, 4)
) bb  -- yearly totals
    on substr(aa.order_month, 1, 4) = bb.order_year
order by aa.order_month
;
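The tip about `date_format()` can be sketched as follows. This assumes Hive 1.2.0 or later, where `date_format()` is available; the query is otherwise the same monthly aggregation used as the numerator in approach 1.

```sql
-- Monthly totals using date_format() instead of substr().
-- The pattern follows Java date-format conventions: 'yyyy' is the calendar year, 'MM' the month.
select
    date_format(order_time, 'yyyy-MM') as order_month,
    sum(order_num)                     as order_sum
from saleorder
group by date_format(order_time, 'yyyy-MM')
order by order_month;
```

One thing to watch: use lowercase `yyyy`. Uppercase `YYYY` means week-based year in Java patterns and can return the wrong year for dates around New Year.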

-- Approach 2: window functions
set hive.exec.mode.local.auto=true;
set hive.fetch.task.conversion=more;

select
    aa.order_month,
    aa.order_sum,
    aa.order_total,
    nvl(round(aa.order_sum / aa.order_total * 100, 2), 0) as rate  -- share of the year's total, in percent
from (
    select
        substr(order_time, 1, 7) as order_month,
        sum(order_num) over (partition by substr(order_time, 1, 7)) as order_sum,    -- monthly total
        sum(order_num) over (partition by substr(order_time, 1, 4)) as order_total,  -- yearly total
        row_number() over (partition by substr(order_time, 1, 7)) as rn              -- used to keep one row per month
    from `saleorder`
) aa
where aa.rn = 1
order by order_month
;
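A variant of the window-function approach aggregates to one row per month first and only then applies the window, which avoids the `row_number()` de-duplication step. This is a sketch rather than the original author's method; the alias `m` is introduced here for illustration.

```sql
-- Step 1 (inner query): one row per month.
-- Step 2 (outer query): yearly total as a window sum over the monthly rows of the same year.
select
    order_month,
    order_sum,
    sum(order_sum) over (partition by substr(order_month, 1, 4)) as order_total,
    round(order_sum / sum(order_sum) over (partition by substr(order_month, 1, 4)) * 100, 2) as rate
from (
    select
        substr(order_time, 1, 7) as order_month,
        sum(order_num)           as order_sum
    from saleorder
    group by substr(order_time, 1, 7)
) m
order by order_month;
```

On the sample data, 2020-04 should come out at about 47.47 (1220 out of a 2020 total of 2570).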

4. Period-over-period and year-over-year

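-- Approach 1: the lag window function (month-over-month only)
set hive.exec.mode.local.auto=true;
set hive.fetch.task.conversion=more;

select
    order_month,
    order_sum,
    nvl(last_order_sum, 0) as last_order_sum,
    round(if(last_order_sum is not null, order_sum / last_order_sum - 1, 0) * 100, 2) as rate  -- in percent
from (
    select
        order_month,
        order_sum,
        lag(order_sum, 1) over (order by order_month) as last_order_sum  -- total of the previous row
    from (
        select
            substr(order_time, 1, 7) as order_month,
            sum(order_num) as order_sum
        from `saleorder`
        group by substr(order_time, 1, 7)
    ) aa
) bb
;

-- Note: approach 1 has a bug. If the months are not contiguous, lag() returns the previous month
-- that has records, not the previous calendar month. In the sample data, 2020-01 is compared
-- with 2019-04 because 2019-05 through 2019-12 have no rows.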
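One way to keep the `lag()` approach while guarding against gaps is to also fetch the previous row's month and use its total only when that month really is the preceding calendar month. This is a sketch under assumptions: `prev_month`, `prev_sum`, and `expected_prev_month` are names introduced here, and `add_months()` is assumed to return a `yyyy-MM-dd` value as it does in Hive 1.1.0 and later.

```sql
select
    order_month,
    order_sum,
    -- Keep the lagged total only if the lagged row is the calendar month just before this one.
    case when prev_month = expected_prev_month then prev_sum end as last_order_sum
from (
    select
        order_month,
        order_sum,
        lag(order_month, 1) over (order by order_month) as prev_month,
        lag(order_sum, 1)   over (order by order_month) as prev_sum,
        -- Turn 'yyyy-MM' into a date string, step back one month, and trim to 'yyyy-MM' again.
        substr(add_months(concat(order_month, '-01'), -1), 1, 7) as expected_prev_month
    from (
        select
            substr(order_time, 1, 7) as order_month,
            sum(order_num)           as order_sum
        from saleorder
        group by substr(order_time, 1, 7)
    ) aa
) bb
order by order_month;
```

With the sample data, `last_order_sum` for 2020-01 becomes NULL instead of the 2019-04 total of 800.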

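-- Approach 2: the add_months approach, showing both period-over-period and year-over-year.
-- Column prefixes: h_ = period-over-period (huanbi), t_ = year-over-year (tongbi).
select
    aa.order_month,
    aa.order_sum,
    bb.last_order_sum as h_last_order_sum,  -- total of the previous month
    cc.last_order_sum as t_last_order_sum,  -- total of the same month last year
    round((aa.order_sum / bb.last_order_sum - 1) * 100, 2) as h_rate,  -- in percent; NULL when there is no previous month
    round((aa.order_sum / cc.last_order_sum - 1) * 100, 2) as t_rate   -- in percent; NULL when there is no prior-year month
from (
    select
        substr(order_time, 1, 7) as order_month,
        sum(order_num) as order_sum
    from `saleorder`
    group by substr(order_time, 1, 7)
) aa
left join (
    -- Shift every order forward by 1 month, so the row keyed by month M holds the total of month M-1.
    select
        substr(add_months(order_time, 1), 1, 7) as order_month,
        sum(order_num) as last_order_sum
    from `saleorder`
    group by substr(add_months(order_time, 1), 1, 7)
) bb  -- period-over-period
    on aa.order_month = bb.order_month
left join (
    -- Shift every order forward by 12 months, so the row keyed by month M holds the total of the same month last year.
    select
        substr(add_months(order_time, 12), 1, 7) as order_month,
        sum(order_num) as last_order_sum
    from `saleorder`
    group by substr(add_months(order_time, 12), 1, 7)
) cc  -- year-over-year
    on aa.order_month = cc.order_month
;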
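Approach 2 scans and aggregates `saleorder` three times. A possible variant aggregates once in a common table expression and joins that result to itself. This is a sketch, not the original author's method: the names `monthly`, `cur`, `prev`, and `ly` are introduced here, and `with` assumes Hive 0.13 or later.

```sql
with monthly as (
    select
        substr(order_time, 1, 7) as order_month,
        sum(order_num)           as order_sum
    from saleorder
    group by substr(order_time, 1, 7)
)
select
    cur.order_month,
    cur.order_sum,
    prev.order_sum as h_last_order_sum,                               -- previous calendar month
    ly.order_sum   as t_last_order_sum,                               -- same month last year
    round((cur.order_sum / prev.order_sum - 1) * 100, 2) as h_rate,   -- percent, NULL if no previous month
    round((cur.order_sum / ly.order_sum - 1) * 100, 2)   as t_rate    -- percent, NULL if no prior-year month
from monthly cur
left join monthly prev
    on prev.order_month = substr(add_months(concat(cur.order_month, '-01'), -1), 1, 7)
left join monthly ly
    on ly.order_month = substr(add_months(concat(cur.order_month, '-01'), -12), 1, 7)
order by cur.order_month;
```

On the sample data, 2020-04 should show a period-over-period rate of about 103.33 (1220 against 600) and a year-over-year rate of about 52.5 (1220 against 800), while 2020-01 has no period-over-period value because 2019-12 has no orders.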

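Reference: "hive SQL实现占比、同比、环比计算(lag函数,lead函数)" (Hive SQL for share, year-over-year, and period-over-period calculations with the lag and lead functions), 雾岛与鲸的博客, CSDN.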
