原创

Elasticsearch基础(三)——快速上手案例

本章,我将通过引入一个实战案例,带领新手童鞋了解下Elasticsearch的基本使用,请先确保按照上一章的讲解搭建好了Elasticsearch开发环境。

一、案例背景

假设我们有一个电商网站,现在需要基于Elasticsearch构建一个商品管理后台系统。这个系统需要提供以下功能:

  1. 对商品信息进行CRUD(增删改查)操作;
  2. 执行简单的结构化查询;
  3. 执行简单的全文检索,以及复杂的phrase(短语)检索;
  4. 对于全文检索的结果,进行高亮显示;
  5. 对数据进行简单的聚合分析

我将基于Elasticsearch,带领新手童鞋实现上述功能,从而掌握ES的基本使用。

二、CRUD操作

我们先来进行商品的增删改查操作。

2.1 新增商品

新增商品数据,其实就是新增一条document,ES的语法如下:

#index表示索引名,type是类型名,id是document的唯一标识
PUT /{index}/{type}/{id}
{
    #document数据
}

我们新建三条document:

PUT /ecommerce/product/1
{
    "name" : "gaolujie yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  30,
    "producer" :      "gaolujie producer",
    "tags": [ "meibai", "fangzhu" ]
}

PUT /ecommerce/product/2
{
    "name" : "jiajieshi yagao",
    "desc" :  "youxiao fangzhu",
    "price" :  25,
    "producer" :      "jiajieshi producer",
    "tags": [ "fangzhu" ]
}

PUT /ecommerce/product/3
{
    "name" : "zhonghua yagao",
    "desc" :  "caoben zhiwu",
    "price" :  40,
    "producer" :      "zhonghua producer",
    "tags": [ "qingxin" ]
}

Elasticsearch会自动建立Index和Type,同时,ES默认会对document的每个field都建立倒排索引,让其可以被搜索。

2.2 查询商品

查询商品数据,ES语法如下:

#index表示索引名,type是类型名,id是document的唯一标识
GET /{index}/{type}/{id}

比如,我们执行GET /ecommerce/product/1,返回如下信息:

{
  "_index" : "ecommerce",
  "_type" : "product",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "gaolujie yagao",
    "desc" : "gaoxiao meibai",
    "price" : 30,
    "producer" : "gaolujie producer",
    "tags" : [
      "meibai",
      "fangzhu"
    ]
  }
}

2.3 修改商品

修改商品数据,有两种方式:替换更新

替换方式的ES语法如下:

#index表示索引名,type是类型名,id是document的唯一标识
PUT /{index}/{type}/{id}
{
    #document的所有字段
}

比如,我们执行:

PUT /ecommerce/product/1
{
    "name" : "jiaqiangban gaolujie yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  30,
    "producer" :      "gaolujie producer",
    "tags": [ "meibai", "fangzhu" ]
}

返回如下信息:

{
  "_index" : "ecommerce",
  "_type" : "product",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}

注意,使用替换方式去修改商品信息时,document的所有field都需要带上。

我们再来看下更新方式。更新方式的语法如下:

#index表示索引名,type是类型名,id是document的唯一标识
POST /{index}/{type}/{id}/_update
{
    "doc":{
        #document的指定field
    }
}

比如,我们执行:

POST /ecommerce/product/1/_update
{
  "doc": {
    "name": "jiaqiangban gaolujie yagao"
  }
}

返回如下信息:

{
  "_index" : "ecommerce",
  "_type" : "product",
  "_id" : "1",
  "_version" : 4,
  "result" : "noop",
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "_seq_no" : 4,
  "_primary_term" : 1
}

2.4 删除商品

删除商品数据,ES语法如下:

#index表示索引名,type是类型名,id是document的唯一标识
DELETE /{index}/{type}/{id}

比如,我们执行DELETE /ecommerce/product/1,返回如下信息:

{
  "_index" : "ecommerce",
  "_type" : "product",
  "_id" : "1",
  "_version" : 5,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 5,
  "_primary_term" : 1
}

三、数据检索

本节,我们来讲解下Elasticsearch中的各种常见的数据检索方式。

语法:GET /{index}/{type}/_search

我们在很多网站进行搜索的时候,搜索词一般都是跟在search参数后面,以query string的http请求形式发起检索。比如,我们要搜索商品名中包含“yagao”关键子的所有商品,并将结果按照售价排序。那么就可以执行以下语句:

GET /ecommerce/product/_search?q=name:yagao&sort=price:desc

返回结果:

{
  "took" : 530,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "name" : "zhonghua yagao",
          "desc" : "caoben zhiwu",
          "price" : 40,
          "producer" : "zhonghua producer",
          "tags" : [
            "qingxin"
          ]
        },
        "sort" : [
          40
        ]
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "gaolujie yagao",
          "desc" : "gaoxiao meibai",
          "price" : 30,
          "producer" : "gaolujie producer",
          "tags" : [
            "meibai",
            "fangzhu"
          ]
        },
        "sort" : [
          30
        ]
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "name" : "jiajieshi yagao",
          "desc" : "youxiao fangzhu",
          "price" : 25,
          "producer" : "jiajieshi producer",
          "tags" : [
            "fangzhu"
          ]
        },
        "sort" : [
          25
        ]
      }
    ]
  }
}

我们对上面的一些关键返回字段的含义说明下:
took:耗费时长(毫秒)
timed_out:是否超时
_shards:数据拆成的分片数,所以对于搜索请求,会打到所有的primary shard(或者是primary shard对应的某个replica shard);
hits.total:查询结果的数量,3个document;
hits.max_score:score的含义,就是document对于一个search的相关度的匹配分数,越相关,就越匹配,分数也高;
hits.hits:包含了匹配搜索的document的详细数据。

query string search适用于一些临时、快速的检索请求,如果查询请求很复杂,那么query string search是不太适用的,所以在生产环境中,几乎很少使用query string search

3.2 query DSL

所谓DSL,就是Domain Specified Language,是Elasticsearch中很常用的一种检索方式。

语法:

GET /{index}/{type}/_search
{
    "query":{}
}

我们通过“查询商品名中包含yagao关键子的所有商品,并将结果按照售价排序”这个示例来看下如何使用:

GET /ecommerce/product/_search
{
    "query" : {
        "match" : {
            "name" : "yagao"
        }
    },
    "sort": [
        { "price": "desc" }
    ]
}

可以看到,所有查询请求参数全部放到了一个http requstbody里面,所以可以构建复杂的查询,适合生产环境使用。

3.3 query filter

query filter主要用于对数据进行过滤。比如,我们要搜索商品名称包含”yagao“,且售价大于25元的商品:

GET /ecommerce/product/_search
{
    "query" : {
        "bool" : {
            "must" : {
                "match" : {
                    "name" : "yagao" 
                }
            },
            "filter" : {
                "range" : {
                    "price" : { "gt" : 25 } 
                }
            }
        }
    }
}

上述bool里面可以封装多个查询条件。

full-text search,就是全文检索。比如我们要查询producer字段中包含”producer“或”jiajieshi“的所有商品:

GET /ecommerce/product/_search
{
    "query" : {
        "match" : {
            "producer" : "producer jiajieshi"
        }
    }
}

可以看到,我们在上述的”match“里面有空格分隔开了"producer jiajieshi",这样只要producer字段中包含了上述任意一个字符,就都会被检索到。结果如下:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.1143606,
    "hits" : [
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "2",
        "_score" : 1.1143606,
        "_source" : {
          "name" : "jiajieshi yagao",
          "desc" : "youxiao fangzhu",
          "price" : 25,
          "producer" : "jiajieshi producer",
          "tags" : [
            "fangzhu"
          ]
        }
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "1",
        "_score" : 0.13353139,
        "_source" : {
          "name" : "gaolujie yagao",
          "desc" : "gaoxiao meibai",
          "price" : 30,
          "producer" : "gaolujie producer",
          "tags" : [
            "meibai",
            "fangzhu"
          ]
        }
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "3",
        "_score" : 0.13353139,
        "_source" : {
          "name" : "zhonghua yagao",
          "desc" : "caoben zhiwu",
          "price" : 40,
          "producer" : "zhonghua producer",
          "tags" : [
            "qingxin"
          ]
        }
      }
    ]
  }
}

注意,采用全文检索时,返回的每一项数据中有一个“_score”字段,这个就是“相关度分数”的意思,分数越高,则越接近检索词。

phrase search(短语搜索),跟全文检索刚好相反。在全文检索会将输入的搜索串拆解开来,去倒排索引里面去一一匹配,只要能匹配上任意一个拆解后的单词,就可以作为结果返回。而phrase search则要求输入的关键字必须在出现指定的文本中,一模一样才算匹配。

举个例子,我们要查询producer字段中包含关键字”jiajieshi producer“的所有商品:

GET /ecommerce/product/_search
{
    "query" : {
        "match_phrase" : {
            "producer" : "jiajieshi producer"
        }
    }
}

返回结果如下,可以看到,只包含一条producer字段为“jiajieshi producer”的记录:

{
  "took" : 48,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.1143606,
    "hits" : [
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "2",
        "_score" : 1.1143606,
        "_source" : {
          "name" : "jiajieshi yagao",
          "desc" : "youxiao fangzhu",
          "price" : 25,
          "producer" : "jiajieshi producer",
          "tags" : [
            "fangzhu"
          ]
        }
      }
    ]
  }
}

highlight search(高亮搜索结果)。举个例子,我们希望检索name字段包含“yaogao”的所有商品,然后对检索结果中的“yaogao”文本进行高亮:

GET /ecommerce/product/_search
{
    "query" : {
        "match" : {
            "name" : "yagao"
        }
    },
    "highlight": {
        "fields" : {
            "name" : {}
        }
    }
}

返回结果如下,可以看到,匹配项的结果里多了个“highlight”字段,里面用<em>高亮了匹配的文本:

{
  "took" : 97,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.13353139,
    "hits" : [
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "1",
        "_score" : 0.13353139,
        "_source" : {
          "name" : "gaolujie yagao",
          "desc" : "gaoxiao meibai",
          "price" : 30,
          "producer" : "gaolujie producer",
          "tags" : [
            "meibai",
            "fangzhu"
          ]
        },
        "highlight" : {
          "name" : [
            "gaolujie <em>yagao</em>"
          ]
        }
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "2",
        "_score" : 0.13353139,
        "_source" : {
          "name" : "jiajieshi yagao",
          "desc" : "youxiao fangzhu",
          "price" : 25,
          "producer" : "jiajieshi producer",
          "tags" : [
            "fangzhu"
          ]
        },
        "highlight" : {
          "name" : [
            "jiajieshi <em>yagao</em>"
          ]
        }
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "3",
        "_score" : 0.13353139,
        "_source" : {
          "name" : "zhonghua yagao",
          "desc" : "caoben zhiwu",
          "price" : 40,
          "producer" : "zhonghua producer",
          "tags" : [
            "qingxin"
          ]
        },
        "highlight" : {
          "name" : [
            "zhonghua <em>yagao</em>"
          ]
        }
      }
    ]
  }
}

四、数据分析

这一节,我们来看下如何利用Elasticsearch进行数据分析。首先,我们还是录入以下document数据:

PUT /ecommerce/product/1
{
    "name" : "gaolujie yagao",
    "desc" :  "gaoxiao meibai",
    "price" :  30,
    "producer" :      "gaolujie producer",
    "tags": [ "meibai", "fangzhu" ]
}

PUT /ecommerce/product/2
{
    "name" : "jiajieshi yagao",
    "desc" :  "youxiao fangzhu",
    "price" :  25,
    "producer" :      "jiajieshi producer",
    "tags": [ "fangzhu" ]
}

PUT /ecommerce/product/3
{
    "name" : "zhonghua yagao",
    "desc" :  "caoben zhiwu",
    "price" :  40,
    "producer" :      "zhonghua producer",
    "tags": [ "qingxin" ]
}

注意,要使用聚合分析,需要将文本field的fielddata属性设置为true,所以还要执行:

PUT /ecommerce/_mapping
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}

关于mapping,我后续章节会详细讲解,这里读者只要跟着这么操作就行了。

4.1 聚合分析

语法:

GET /{index}/{type}/_search
{
  "aggs": {
    "聚合名称": {
      "terms": { "field": "分组字段" }
    }
  }
}

我们的第一个需求是:计算每个tag下的商品数量。比如,"fangzhu"这个tag,一共有两个商品包含它。

请求如下,意思是新建一个名称为“group_by_tags”的聚合,按照tags这个字段进行分组:

GET /ecommerce/product/_search
{
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags" }
    }
  }
}

响应如下:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "gaolujie yagao",
          "desc" : "gaoxiao meibai",
          "price" : 30,
          "producer" : "gaolujie producer",
          "tags" : [
            "meibai",
            "fangzhu"
          ]
        }
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "jiajieshi yagao",
          "desc" : "youxiao fangzhu",
          "price" : 25,
          "producer" : "jiajieshi producer",
          "tags" : [
            "fangzhu"
          ]
        }
      },
      {
        "_index" : "ecommerce",
        "_type" : "product",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhonghua yagao",
          "desc" : "caoben zhiwu",
          "price" : 40,
          "producer" : "zhonghua producer",
          "tags" : [
            "qingxin"
          ]
        }
      }
    ]
  },
  "aggregations" : {
    "group_by_tags" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "fangzhu",
          "doc_count" : 2
        },
        {
          "key" : "meibai",
          "doc_count" : 1
        },
        {
          "key" : "qingxin",
          "doc_count" : 1
        }
      ]
    }
  }
}

我们关键看:

"aggregations" : {
  "group_by_tags" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "fangzhu",
        "doc_count" : 2
      },
      {
        "key" : "meibai",
        "doc_count" : 1
      },
      {
        "key" : "qingxin",
        "doc_count" : 1
      }
    ]
  }
}

上面的“buckets”就是按照tags进行分组的结果,key和doc_count就表示每个tag对应的数量。


我们再来个复杂点的需求:按照指定的价格范围区间进行分组,然后在每组内再按照tag进行分组,最后再计算组内的平均价格。

请求:

GET /ecommerce/product/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 20
          },
          {
            "from": 20,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": {
            "field": "tags"
          },
          "aggs": {
            "average_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

响应:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_price" : {
      "buckets" : [
        {
          "key" : "0.0-20.0",
          "from" : 0.0,
          "to" : 20.0,
          "doc_count" : 0,
          "group_by_tags" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ ]
          }
        },
        {
          "key" : "20.0-40.0",
          "from" : 20.0,
          "to" : 40.0,
          "doc_count" : 2,
          "group_by_tags" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "fangzhu",
                "doc_count" : 2,
                "average_price" : {
                  "value" : 27.5
                }
              },
              {
                "key" : "meibai",
                "doc_count" : 1,
                "average_price" : {
                  "value" : 30.0
                }
              }
            ]
          }
        },
        {
          "key" : "40.0-50.0",
          "from" : 40.0,
          "to" : 50.0,
          "doc_count" : 1,
          "group_by_tags" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "qingxin",
                "doc_count" : 1,
                "average_price" : {
                  "value" : 40.0
                }
              }
            ]
          }
        }
      ]
    }
  }
}

五、总结

本章,我通过一个简单的案例讲解了Elasticsearch的数据增删改查,基本的数据检索和数据分析,相信新手童鞋们对Elasticsearch的基本操作已经掌握了。下一章,我将讲解Elasticsearch的分布式架构。

正文到此结束
本文目录