Elasticsearch.Nest Tutorial Series 8, Aggregations: Writing Aggregations


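Aggregations in ES can be loosely compared to the aggregate functions in SQL Server (such as SUM and COUNT). Aggregations can be nested, and they can compute the maximum, minimum, average, or sum of a field, as well as build more complex summaries of the data.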

ES also introduces the concept of buckets. A bucket is roughly equivalent to a group produced by GROUP BY in SQL Server; in other words, what SQL calls grouping, ES calls bucketing.
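As a rough in-memory analogy (not how Elasticsearch executes aggregations), the sketch below uses LINQ to show the difference between a metric computed over all documents and a metric computed per bucket. The `SampleDoc` and `AggregationAnalogy` names and the three sample rows are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document type used only for this illustration.
public class SampleDoc
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public static class AggregationAnalogy
{
    // Metric analogy: one value of each kind computed over all documents,
    // like SELECT AVG(quantity), MAX(quantity), MIN(quantity).
    public static (double Average, int Max, int Min) Metrics(IEnumerable<SampleDoc> docs)
    {
        var list = docs.ToList();
        return (list.Average(d => d.Quantity),
                list.Max(d => d.Quantity),
                list.Min(d => d.Quantity));
    }

    // Bucket analogy: split documents into groups by name (GROUP BY name),
    // then compute a metric inside each group.
    public static Dictionary<string, double> AverageQuantityByName(IEnumerable<SampleDoc> docs) =>
        docs.GroupBy(d => d.Name)
            .ToDictionary(g => g.Key, g => g.Average(d => d.Quantity));
}

public static class AnalogyDemo
{
    public static void Main()
    {
        var docs = new List<SampleDoc>
        {
            new SampleDoc { Name = "a", Quantity = 1 },
            new SampleDoc { Name = "a", Quantity = 3 },
            new SampleDoc { Name = "b", Quantity = 5 },
        };

        var metrics = AggregationAnalogy.Metrics(docs);
        Console.WriteLine($"avg={metrics.Average}, max={metrics.Max}, min={metrics.Min}");

        foreach (var bucket in AggregationAnalogy.AverageQuantityByName(docs))
        {
            Console.WriteLine($"bucket {bucket.Key}: avg={bucket.Value}");
        }
    }
}
```

Computing a metric inside each group is also a small picture of nesting: the grouping plays the role of the outer bucket aggregation and the average plays the role of the inner metric aggregation.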

For more detail on aggregations in Elasticsearch, see the official Elasticsearch documentation.

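Writing aggregations

NEST offers three ways to write aggregations:

  • Using lambda expressions.
  • Using the built-in request object AggregationDictionary.
  • Combining aggregations with binary operators to simplify the use of AggregationDictionary.

Assume the following Project class: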

public class Project
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

All three approaches send the following request:

POST /project/_search?typed_keys=true
{
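    "aggs": { // keyword "aggregations"; "aggs" is the short form
        "average_quantity": { // name of the aggregation
            "avg": {  // aggregation type, comparable to an aggregate function in SQL Server
                "field": "quantity"  // aggregation body: the field to aggregate on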
            }
        },
        "max_quantity": {
            "max": {
                "field": "quantity"
            }
        },
        "min_quantity": {
            "min": {
                "field": "quantity"
            }
        }
    }
}

The lambda approach

Lambda expressions are a concise way to write aggregations:

var searchResponse = _client.Search<Project>(s => s
    .Aggregations(aggs => aggs
        .Average("average_quantity", avg => avg.Field(p => p.Quantity))
        .Max("max_quantity", max => max.Field(p => p.Quantity))
        .Min("min_quantity", min => min.Field(p => p.Quantity))
    )
);

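The response looks like this (the avg#, max# and min# prefixes on the aggregation names come from typed_keys=true):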

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "name": "Emma",
                    "quantity": 1
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "name": "Tran",
                    "quantity": 2
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "name": "Lucy",
                    "quantity": 3
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "4",
                "_score": 1.0,
                "_source": {
                    "name": "Geo",
                    "quantity": 4
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "5",
                "_score": 1.0,
                "_source": {
                    "name": "Luby",
                    "quantity": 5
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "6",
                "_score": 1.0,
                "_source": {
                    "name": "Han",
                    "quantity": 6
                }
            }
        ]
    },
    "aggregations": {
        "avg#average_quantity": {
            "value": 3.5
        },
        "max#max_quantity": {
            "value": 6.0
        },
        "min#min_quantity": {
            "value": 1.0
        }
    }
}

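Aggregation queries usually do not need the matching documents (the hits and their _source). In that case, set size to 0 on the request so that the response carries only the aggregation results: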

var searchResponse = _client.Search<Project>(s => s
    .Size(0)  // explicitly request zero hits
    .Aggregations(aggs => aggs
        .Average("average_quantity", avg => avg.Field(p => p.Quantity))
        .Max("max_quantity", max => max.Field(p => p.Quantity))
        .Min("min_quantity", min => min.Field(p => p.Quantity))
    )
);

With this change, the response becomes:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "avg#average_quantity": {
            "value": 3.5
        },
        "max#max_quantity": {
            "value": 6.0
        },
        "min#min_quantity": {
            "value": 1.0
        }
    }
}

Using the built-in AggregationDictionary object

The following code has the same effect as the lambda version:

var searchRequest = new SearchRequest<Project>
{
    Size = 0,
    Aggregations = new AggregationDictionary
    {
        {"average_quantity", new AverageAggregation("average_quantity", "quantity")},
        {"max_quantity", new MaxAggregation("max_quantity", "quantity")},
        {"min_quantity", new MinAggregation("min_quantity", "quantity")},
    }
};
var searchResponse = _client.Search<Project>(searchRequest);

  • This approach is less readable.

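Combining aggregations with binary operators to simplify AggregationDictionary

Combining aggregations with the binary && operator makes the code more readable. The following is equivalent to the code above: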

var searchRequest = new SearchRequest<Project>
{
    Size = 0,
    Aggregations = new AverageAggregation("average_quantity", "quantity")
        && new MaxAggregation("max_quantity", "quantity")
        && new MinAggregation("min_quantity", "quantity")
};
var searchResponse = _client.Search<Project>(searchRequest);

Reading the response

The aggregation results are available through the .Aggregations property of the response object.
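A minimal sketch of reading the three metric aggregations by name might look like the following. It assumes a local cluster at http://localhost:9200 and a default index named project; both are assumptions for illustration, as are the class name `ReadAggregationsDemo` and the standalone `client` variable that stands in for the `_client` field used above.

```csharp
using System;
using Nest;

public class Project
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public static class ReadAggregationsDemo
{
    public static void Main()
    {
        // Assumption: a local single-node cluster and an index named "project".
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("project");
        var client = new ElasticClient(settings);

        var searchResponse = client.Search<Project>(s => s
            .Size(0) // only aggregation results are needed
            .Aggregations(aggs => aggs
                .Average("average_quantity", avg => avg.Field(p => p.Quantity))
                .Max("max_quantity", max => max.Field(p => p.Quantity))
                .Min("min_quantity", min => min.Field(p => p.Quantity))
            )
        );

        if (!searchResponse.IsValid)
        {
            // Print the request/response diagnostics and stop.
            Console.WriteLine(searchResponse.DebugInformation);
            return;
        }

        // Each accessor looks the aggregation up by the name used in the request
        // and returns null when no aggregation with that name is present.
        ValueAggregate average = searchResponse.Aggregations.Average("average_quantity");
        ValueAggregate max = searchResponse.Aggregations.Max("max_quantity");
        ValueAggregate min = searchResponse.Aggregations.Min("min_quantity");

        // Value is a nullable double; it can be null when no documents matched.
        Console.WriteLine($"average = {average?.Value}");
        Console.WriteLine($"max     = {max?.Value}");
        Console.WriteLine($"min     = {min?.Value}");
    }
}
```

Run against the six sample documents shown earlier, the values should correspond to the 3.5, 6.0 and 1.0 in the JSON response. The null-conditional operator guards against a misspelled aggregation name, which would otherwise surface as a NullReferenceException.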

Reserved keywords

When naming aggregations, avoid names that clash with reserved keywords, such as the following (the list is not exhaustive):

  • "score"
  • "value_as_string"
  • "keys"
  • "max_score"
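
One way to catch such clashes early is a small guard applied to aggregation names before a request is built. The sketch below is a hypothetical helper, not part of NEST: the `AggregationNames` class and its methods are invented for illustration, it covers only the four names listed above, and it assumes an exact, case-sensitive match is what matters.

```csharp
using System;
using System.Collections.Generic;

public static class AggregationNames
{
    // Only the four names listed in the text; the real list is longer.
    private static readonly HashSet<string> Reserved =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "score",
            "value_as_string",
            "keys",
            "max_score",
        };

    // True when the name exactly matches one of the listed reserved keywords.
    public static bool IsReserved(string name) => Reserved.Contains(name);

    // Returns the name unchanged, or throws if it is empty or reserved.
    public static string EnsureAllowed(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
        {
            throw new ArgumentException("Aggregation name must not be empty.", nameof(name));
        }
        if (IsReserved(name))
        {
            throw new ArgumentException($"'{name}' is a reserved keyword.", nameof(name));
        }
        return name;
    }
}
```

For example, `AggregationNames.EnsureAllowed("average_quantity")` returns the name unchanged, while `AggregationNames.EnsureAllowed("max_score")` throws an ArgumentException.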