Merged
4 changes: 2 additions & 2 deletions docs/build.gradle
@@ -393,9 +393,9 @@ buildRestTests.setups['exams'] = '''
refresh: true
body: |
{"index":{}}
{"grade": 100}
{"grade": 100, "weight": 2}
{"index":{}}
{"grade": 50}'''
{"grade": 50, "weight": 3}'''

buildRestTests.setups['stored_example_script'] = '''
# Simple script to load a field. Not really a good example, but a simple one.
2 changes: 2 additions & 0 deletions docs/reference/aggregations/metrics.asciidoc
@@ -13,6 +13,8 @@ bucket aggregations (some bucket aggregations enable you to sort the returned bu

include::metrics/avg-aggregation.asciidoc[]

include::metrics/weighted-avg-aggregation.asciidoc[]

include::metrics/cardinality-aggregation.asciidoc[]

include::metrics/extendedstats-aggregation.asciidoc[]
185 changes: 185 additions & 0 deletions docs/reference/aggregations/metrics/weighted-avg-aggregation.asciidoc
@@ -0,0 +1,185 @@
[[search-aggregations-metrics-weight-avg-aggregation]]
=== Weighted Avg Aggregation

A `single-value` metrics aggregation that computes the weighted average of numeric values that are extracted from the aggregated documents.
These values can be extracted from specific numeric fields in the documents, or provided by a script.

When calculating a regular average, each datapoint has an equal "weight": it contributes equally to the final value. Weighted averages,
on the other hand, weight each datapoint differently. The amount that each datapoint contributes to the final value is extracted from the
document, or provided by a script.

As a formula, a weighted average is `∑(value * weight) / ∑(weight)`.

A regular average can be thought of as a weighted average where every value has an implicit weight of `1`.
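
The formula can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not the aggregation's implementation: the class name, method name, and sample numbers are all hypothetical.

```java
// Illustrative only: hypothetical names and data, not Elasticsearch code.
final class WeightedAverageSketch {

    /** Computes sum(value * weight) / sum(weight) over parallel arrays. */
    static double weightedAverage(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += values[i] * weights[i];
            weightSum += weights[i];
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        double[] values = {10, 20};
        // Weights 1 and 3: (10 * 1 + 20 * 3) / (1 + 3) = 17.5
        System.out.println(weightedAverage(values, new double[] {1, 3}));
        // Every weight is 1, so this is the regular average: 15.0
        System.out.println(weightedAverage(values, new double[] {1, 1}));
    }
}
```

With every weight set to `1` the denominator is just the number of values, which is why the second call matches the plain average.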


.`weighted_avg` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`value` | The configuration for the field or script that provides the values |Required |
|`weight` | The configuration for the field or script that provides the weights |Required |
|`format` | The numeric response formatter |Optional |
|`value_type` | A hint about the values for pure scripts or unmapped fields |Optional |
|===

The `value` and `weight` objects have per-field specific configuration:

.`value` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`field` | The field that values should be extracted from |Required |
|`missing` | A value to use if the field is missing entirely |Optional |
|`multi` | If a document has multiple values for the field, how should the values be combined |Optional | `avg`
|`script` | A script which provides the values for the document. This is mutually exclusive with `field` |Optional |
|===

.`weight` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`field` | The field that weights should be extracted from |Required |
|`missing` | A weight to use if the field is missing entirely |Optional |
|`multi` | If a document has multiple values for the field, how should the values be combined |Optional | `avg`
Contributor

Should this say weights instead of values?

|`script` | A script which provides the values for the document. This is mutually exclusive with `field` |Optional |
Contributor

Should this say weights instead of values?

|===


==== Examples

If our documents have a `"grade"` field that holds a 0-100 numeric score, and a `"weight"` field which holds an arbitrary numeric weight,
we can calculate the weighted average using:

[source,js]
--------------------------------------------------
POST /exams/_search
{
    "size": 0,
    "aggs" : {
        "weighted_grade": {
            "weighted_avg": {
                "value": {
                    "field": "grade"
                },
                "weight": {
                    "field": "weight"
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:exams]

Which yields a response like:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "weighted_grade": {
            "value": 70.0
        }
    }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
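
As a quick check of the `70.0` in the response, assuming the index holds only the two documents from the `exams` setup in `docs/build.gradle` (grade 100 with weight 2, and grade 50 with weight 3), the arithmetic works out as:

```latex
\[
\frac{\sum (\text{value} \times \text{weight})}{\sum \text{weight}}
  = \frac{100 \cdot 2 + 50 \cdot 3}{2 + 3}
  = \frac{350}{5}
  = 70
\]
```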


==== Script

Both the value and the weight can be derived from a script, instead of a field. As a simple example, the following
will add one to the grade and weight in the document using a script:

[source,js]
--------------------------------------------------
POST /exams/_search
{
    "size": 0,
    "aggs" : {
        "weighted_grade": {
            "weighted_avg": {
                "value": {
                    "script": "doc.grade.value + 1"
                },
                "weight": {
                    "script": "doc.weight.value + 1"
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:exams]
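
No response is shown for this request. Assuming the same two `exams` documents (grade 100 with weight 2, and grade 50 with weight 3), the scripts shift the values to 101 and 51 and the weights to 3 and 4, so the result should come out to roughly:

```latex
\[
\frac{101 \cdot 3 + 51 \cdot 4}{3 + 4}
  = \frac{303 + 204}{7}
  = \frac{507}{7}
  \approx 72.43
\]
```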


==== Missing values

The `missing` parameter defines how documents that are missing a value should be treated.
The default behavior is different for `value` and `weight`:

By default, if the `value` field is missing the document is ignored and the aggregation moves on to the next document.
If the `weight` field is missing, it is assumed to have a weight of `1` (like a normal average).

Both of these defaults can be overridden with the `missing` parameter:

[source,js]
--------------------------------------------------
POST /exams/_search
{
    "size": 0,
    "aggs" : {
        "weighted_grade": {
            "weighted_avg": {
                "value": {
                    "field": "grade",
                    "missing": 2
                },
                "weight": {
                    "field": "weight",
                    "missing": 3
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:exams]
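
The defaults and overrides described in this section can be sketched as follows. This is a hypothetical model of the behavior as this page describes it, not the aggregator's code: the `Doc` class, the method, and the three sample documents are invented for illustration (the `exams` setup itself has no missing fields).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names and data, not the Elasticsearch implementation.
final class MissingValueSketch {

    /** A document whose grade and/or weight may be absent (null). */
    static final class Doc {
        final Double grade;
        final Double weight;

        Doc(Double grade, Double weight) {
            this.grade = grade;
            this.weight = weight;
        }
    }

    /** Pass null for missingValue or missingWeight to get the default behavior. */
    static double weightedAverage(List<Doc> docs, Double missingValue, Double missingWeight) {
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (Doc doc : docs) {
            Double value = doc.grade != null ? doc.grade : missingValue;
            if (value == null) {
                continue; // default for a missing value: ignore the document
            }
            Double weight = doc.weight != null ? doc.weight : missingWeight;
            if (weight == null) {
                weight = 1.0; // default for a missing weight: treat it as 1
            }
            weightedSum += value * weight;
            weightSum += weight;
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc(100.0, 2.0),
            new Doc(40.0, null),  // missing weight
            new Doc(null, 3.0));  // missing value
        // Defaults: (100 * 2 + 40 * 1) / (2 + 1) = 80.0
        System.out.println(weightedAverage(docs, null, null));
        // "missing": 2 for value, 3 for weight: (200 + 120 + 6) / (2 + 3 + 3) = 40.75
        System.out.println(weightedAverage(docs, 2.0, 3.0));
    }
}
```

Note how the third document only contributes once a `missing` value is supplied; by default it drops out of both the numerator and the denominator.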

==== Multi-value mode

If a document has multiple values, you can configure the `multi` mode of both `value` and `weight`. This controls
how the multiple values should be combined when calculating the average. Acceptable values are:

- `avg`: average the multiple values together
- `min`: use the minimum value
- `max`: use the maximum value
- `sum`: sum all the values together

The default if unspecified is `avg`.

[source,js]
--------------------------------------------------
POST /exams/_search
{
    "size": 0,
    "aggs" : {
        "weighted_grade": {
            "weighted_avg": {
                "value": {
                    "field": "grade",
                    "multi": "avg"
                },
                "weight": {
                    "field": "weight",
                    "multi": "min"
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:exams]
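
As a rough sketch of what the four `multi` modes listed above do to one document's values before they enter the weighted average: the `combine` helper and the sample arrays below are hypothetical and only mirror the descriptions on this page, not the actual `MultiValueMode` code.

```java
import java.util.Arrays;

// Illustrative only: hypothetical helper, not the Elasticsearch MultiValueMode class.
final class MultiModeSketch {

    /** Collapses one document's multiple values into a single number. */
    static double combine(double[] values, String mode) {
        switch (mode) {
            case "avg":
                return Arrays.stream(values).average().orElse(Double.NaN);
            case "min":
                return Arrays.stream(values).min().orElse(Double.NaN);
            case "max":
                return Arrays.stream(values).max().orElse(Double.NaN);
            case "sum":
                return Arrays.stream(values).sum();
            default:
                throw new IllegalArgumentException("unknown multi mode: " + mode);
        }
    }

    public static void main(String[] args) {
        double[] grades = {80, 100};  // one document with two grades
        double[] weights = {1, 3};    // and two weights
        // Mirrors the request above: "avg" for the value, "min" for the weight.
        System.out.println(combine(grades, "avg"));  // 90.0
        System.out.println(combine(weights, "min")); // 1.0
    }
}
```

Under these assumptions the document would contribute a value of 90 with a weight of 1 to the overall weighted average.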
@@ -26,7 +26,7 @@
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregatorFactories;
import org.elasticsearch.search.aggregations.AggregatorFactory;
import org.elasticsearch.search.aggregations.support.MultiValuesSourceAggregationBuilder;
import org.elasticsearch.search.aggregations.support.ArrayValuesSourceAggregationBuilder;
import org.elasticsearch.search.aggregations.support.ValueType;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.aggregations.support.ValuesSource.Numeric;
@@ -38,7 +38,7 @@
import java.util.Map;

public class MatrixStatsAggregationBuilder
extends MultiValuesSourceAggregationBuilder.LeafOnly<ValuesSource.Numeric, MatrixStatsAggregationBuilder> {
extends ArrayValuesSourceAggregationBuilder.LeafOnly<ValuesSource.Numeric, MatrixStatsAggregationBuilder> {
public static final String NAME = "matrix_stats";

private MultiValueMode multiValueMode = MultiValueMode.AVG;
@@ -30,7 +30,7 @@
import org.elasticsearch.search.aggregations.LeafBucketCollectorBase;
import org.elasticsearch.search.aggregations.metrics.MetricsAggregator;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.elasticsearch.search.aggregations.support.MultiValuesSource.NumericMultiValuesSource;
import org.elasticsearch.search.aggregations.support.ArrayValuesSource.NumericArrayValuesSource;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.internal.SearchContext;

@@ -43,7 +43,7 @@
**/
final class MatrixStatsAggregator extends MetricsAggregator {
/** Multiple ValuesSource with field names */
private final NumericMultiValuesSource valuesSources;
private final NumericArrayValuesSource valuesSources;

/** array of descriptive stats, per shard, needed to compute the correlation */
ObjectArray<RunningStats> stats;
@@ -53,7 +53,7 @@ final class MatrixStatsAggregator extends MetricsAggregator {
Map<String,Object> metaData) throws IOException {
super(name, context, parent, pipelineAggregators, metaData);
if (valuesSources != null && !valuesSources.isEmpty()) {
this.valuesSources = new NumericMultiValuesSource(valuesSources, multiValueMode);
this.valuesSources = new NumericArrayValuesSource(valuesSources, multiValueMode);
stats = context.bigArrays().newObjectArray(1);
} else {
this.valuesSources = null;
@@ -23,7 +23,7 @@
import org.elasticsearch.search.aggregations.AggregatorFactories;
import org.elasticsearch.search.aggregations.AggregatorFactory;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.elasticsearch.search.aggregations.support.MultiValuesSourceAggregatorFactory;
import org.elasticsearch.search.aggregations.support.ArrayValuesSourceAggregatorFactory;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.aggregations.support.ValuesSourceConfig;
import org.elasticsearch.search.internal.SearchContext;
@@ -33,7 +33,7 @@
import java.util.Map;

final class MatrixStatsAggregatorFactory
extends MultiValuesSourceAggregatorFactory<ValuesSource.Numeric, MatrixStatsAggregatorFactory> {
extends ArrayValuesSourceAggregatorFactory<ValuesSource.Numeric, MatrixStatsAggregatorFactory> {

private final MultiValueMode multiValueMode;

@@ -21,14 +21,14 @@
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.search.MultiValueMode;
import org.elasticsearch.search.aggregations.support.MultiValuesSourceParser.NumericValuesSourceParser;
import org.elasticsearch.search.aggregations.support.ArrayValuesSourceParser.NumericValuesSourceParser;
import org.elasticsearch.search.aggregations.support.ValueType;
import org.elasticsearch.search.aggregations.support.ValuesSourceType;

import java.io.IOException;
import java.util.Map;

import static org.elasticsearch.search.aggregations.support.MultiValuesSourceAggregationBuilder.MULTIVALUE_MODE_FIELD;
import static org.elasticsearch.search.aggregations.support.ArrayValuesSourceAggregationBuilder.MULTIVALUE_MODE_FIELD;

public class MatrixStatsParser extends NumericValuesSourceParser {

@@ -28,13 +28,13 @@
/**
* Class to encapsulate a set of ValuesSource objects labeled by field name
*/
public abstract class MultiValuesSource <VS extends ValuesSource> {
public abstract class ArrayValuesSource<VS extends ValuesSource> {
protected MultiValueMode multiValueMode;
protected String[] names;
protected VS[] values;

public static class NumericMultiValuesSource extends MultiValuesSource<ValuesSource.Numeric> {
public NumericMultiValuesSource(Map<String, ValuesSource.Numeric> valuesSources, MultiValueMode multiValueMode) {
public static class NumericArrayValuesSource extends ArrayValuesSource<ValuesSource.Numeric> {
public NumericArrayValuesSource(Map<String, ValuesSource.Numeric> valuesSources, MultiValueMode multiValueMode) {
super(valuesSources, multiValueMode);
if (valuesSources != null) {
this.values = valuesSources.values().toArray(new ValuesSource.Numeric[0]);
@@ -51,8 +51,8 @@ public NumericDoubleValues getField(final int ordinal, LeafReaderContext ctx) th
}
}

public static class BytesMultiValuesSource extends MultiValuesSource<ValuesSource.Bytes> {
public BytesMultiValuesSource(Map<String, ValuesSource.Bytes> valuesSources, MultiValueMode multiValueMode) {
public static class BytesArrayValuesSource extends ArrayValuesSource<ValuesSource.Bytes> {
public BytesArrayValuesSource(Map<String, ValuesSource.Bytes> valuesSources, MultiValueMode multiValueMode) {
super(valuesSources, multiValueMode);
this.values = valuesSources.values().toArray(new ValuesSource.Bytes[0]);
}
@@ -62,14 +62,14 @@ public Object getField(final int ordinal, LeafReaderContext ctx) throws IOExcept
}
}

public static class GeoPointValuesSource extends MultiValuesSource<ValuesSource.GeoPoint> {
public static class GeoPointValuesSource extends ArrayValuesSource<ValuesSource.GeoPoint> {
public GeoPointValuesSource(Map<String, ValuesSource.GeoPoint> valuesSources, MultiValueMode multiValueMode) {
super(valuesSources, multiValueMode);
this.values = valuesSources.values().toArray(new ValuesSource.GeoPoint[0]);
}
}

private MultiValuesSource(Map<String, ?> valuesSources, MultiValueMode multiValueMode) {
private ArrayValuesSource(Map<String, ?> valuesSources, MultiValueMode multiValueMode) {
if (valuesSources != null) {
this.names = valuesSources.keySet().toArray(new String[0]);
}