Elasticsearch基础(十九)——聚合分析:histogram区间分组
我们在上一章讲到了Elasticsearch在聚合分析中,可以使用“term”对指定field按照数量分组。本章,我们将介绍另一种区间分组的方式——histogram。
一、简介
1.1 histogram
histogram,它会接收一个field,然后按照这个field值的各个范围区间,进行bucket分组操作:
GET /tvs/_search
{
"size" : 0,
"aggs":{
"price":{
"histogram":{
"field": "price",
"interval": 2000
}
}
}
}
上述请求中,我们对“price”字段进行区间分组,区间间隔为2000,返回结果如下:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"price" : {
"buckets" : [
{
"key" : 0.0,
"doc_count" : 3
},
{
"key" : 2000.0,
"doc_count" : 4
},
{
"key" : 4000.0,
"doc_count" : 0
},
{
"key" : 6000.0,
"doc_count" : 0
},
{
"key" : 8000.0,
"doc_count" : 1
}
]
}
}
}
按照区间分组之后,我们就可以对各个bucket执行metric操作了,比如计算总和:
GET /tvs/_search
{
"size" : 0,
"aggs":{
"price":{
"histogram":{
"field": "price",
"interval": 2000
},
"aggs":{
"revenue": {
"sum": {
"field" : "price"
}
}
}
}
}
}
1.2 date_histogram
如果我们希望的按区间分组的字段是date类型的,那么需要用到date_histogram
关键字。比如:
GET /tvs/_search
{
"size" : 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "2016-01-01",
"max" : "2017-12-31"
}
}
}
}
}
解释下上述几个关键参数:
- min_doc_count:某个日期区间内的doc数量至少要等于这个参数,这个区间才会返回;
- extended_bounds:划分bucket的时候,会限定在这个起始日期和截止日期内。
二、实战
2.1 date_histogram
假设我们现在的需求是:统计每季度每个品牌的电视销售额,那么可以这样构造请求:
GET /tvs/_search
{
"size": 0,
"aggs": {
"group_by_sold_date": {
"date_histogram": {
"field": "sold_date",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-01-01",
"max": "2017-12-31"
}
},
"aggs": {
"total_sum_price": {
"sum": {
"field": "price"
}
},
"group_by_brand": {
"terms": {
"field": "brand"
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
上述请求其实就是先按日期进行分组,然后下钻到组内再按照品牌分组,最后对每个子组执行求和metric操作,结果如下:
{
"took" : 97,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_sold_date" : {
"buckets" : [
{
"key_as_string" : "2016-01-01",
"key" : 1451606400000,
"doc_count" : 0,
"total_sum_price" : {
"value" : 0.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key_as_string" : "2016-04-01",
"key" : 1459468800000,
"doc_count" : 1,
"total_sum_price" : {
"value" : 3000.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "小米",
"doc_count" : 1,
"sum_price" : {
"value" : 3000.0
}
}
]
}
},
{
"key_as_string" : "2016-07-01",
"key" : 1467331200000,
"doc_count" : 2,
"total_sum_price" : {
"value" : 2700.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "TCL",
"doc_count" : 2,
"sum_price" : {
"value" : 2700.0
}
}
]
}
},
{
"key_as_string" : "2016-10-01",
"key" : 1475280000000,
"doc_count" : 3,
"total_sum_price" : {
"value" : 5000.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "长虹",
"doc_count" : 3,
"sum_price" : {
"value" : 5000.0
}
}
]
}
},
{
"key_as_string" : "2017-01-01",
"key" : 1483228800000,
"doc_count" : 2,
"total_sum_price" : {
"value" : 10500.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "三星",
"doc_count" : 1,
"sum_price" : {
"value" : 8000.0
}
},
{
"key" : "小米",
"doc_count" : 1,
"sum_price" : {
"value" : 2500.0
}
}
]
}
},
{
"key_as_string" : "2017-04-01",
"key" : 1491004800000,
"doc_count" : 0,
"total_sum_price" : {
"value" : 0.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key_as_string" : "2017-07-01",
"key" : 1498867200000,
"doc_count" : 0,
"total_sum_price" : {
"value" : 0.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
},
{
"key_as_string" : "2017-10-01",
"key" : 1506816000000,
"doc_count" : 0,
"total_sum_price" : {
"value" : 0.0
},
"group_by_brand" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
}
}
三、总结
本章,我介绍了聚合分析中的区间分组,Elasticsearch采用histogram
关键字来完成对指定字段值的区间分组,如果我们想要分组的字段类型为日期,则需要使用date_histogram
关键字。
正文到此结束
感谢赞赏~
- 本文标签: Elasticsearch
- 版权声明: 本站原创文章,于2019年05月10日由ressmix发布,作者保留所有权利,未经作者允许,禁止转载和演绎!