Description
Elasticsearch version (bin/elasticsearch --version):
6.8.3
Plugins installed: [discovery-ec2, repository-s3]
JVM version (java -version):
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux es-data-2 4.14.143-91.122.amzn1.x86_64 #1 SMP Wed Sep 11 00:43:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
We have an index with a field mapping of type 'completion' defined like so:
{
  "mappings" : {
    "_doc" : {
      "dynamic" : "false",
      "properties" : {
        "id" : {
          "type" : "keyword"
        },
        "term" : {
          "type" : "completion",
          "analyzer" : "analyzer_search_term_suggestions",
          "search_analyzer" : "standard",
          "preserve_separators" : true,
          "preserve_position_increments" : true,
          "max_input_length" : 50
        },
        "type" : {
          "type" : "keyword"
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "analysis" : {
        "filter" : {
          "custom_stop_filter" : {
            "type" : "stop",
            "stopwords" : [
              "within",
              "without"
            ]
          }
        },
        "analyzer" : {
          "analyzer_search_term_suggestions" : {
            "filter" : [
              "standard",
              "lowercase",
              "stop",
              "custom_stop_filter"
            ],
            "char_filter" : [
              "html_strip"
            ],
            "type" : "custom",
            "tokenizer" : "standard"
          }
        }
      }
    }
  }
}
In ES 6.3.0, which we ran until now, indexing a malformed document (an empty string in the completion field) failed, as expected:
curl -XPUT 'localhost:9200/app_store_suggested_search_term_test/_doc/app-name' -H 'Content-type: application/json' -d'{
  "id": "app-name",
  "term": "",
  "type" : "full-app-name"
}'
{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"value must have a length > 0"}},"status":400}
After upgrading to 6.8.3, indexing such a document succeeds, and the term field is listed under _ignored:
curl -XPUT 'localhost:9200/app_store_suggested_search_term_test/_doc/full_app_name-' -H 'Content-type: application/json' -d'{
  "id": "full_app_name-",
  "term": "",
  "type" : "full-app-name"
}'
{"_index":"app_store_suggested_search_term_test","_type":"_doc","_id":"full_app_name-","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":2,"_primary_term":1}
...
{
"_index" : "app_store_suggested_search_term_test",
"_type" : "_doc",
"_id" : "full_app_name-",
"_score" : 1.0,
"_ignored" : [
"term"
],
"_source" : {
"id" : "full_app_name-",
"term" : "",
"type" : "full-app-name"
}
}
}
The documentation suggests that this should happen only when ignore_malformed is set to true on the index or on the field itself. We have not set it, yet we still see this behaviour. In fact, setting the index-level ignore_malformed setting to either true or false makes no difference.
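To rule out a misapplied setting, one possible check (a sketch only, not something run for this report) is to read back the index settings and to search for documents whose _ignored metadata field lists term. The helper names show_ignore_malformed and find_ignored_term_docs, and the ES_URL and INDEX variables, are assumptions made for illustration; my_index is the index from the reproduction steps.

```shell
#!/bin/sh
# Assumed defaults for illustration; adjust to your cluster and index.
ES_URL="${ES_URL:-localhost:9200}"
INDEX="${INDEX:-my_index}"

# Term query on the _ignored metadata field: matches documents
# for which the "term" field was ignored at index time.
IGNORED_QUERY='{"query":{"term":{"_ignored":"term"}}}'

# Hypothetical helper: print the index settings in flat form and keep
# only the ignore_malformed lines. Prints nothing (and returns non-zero)
# if the setting is not explicitly set on the index.
show_ignore_malformed() {
  curl -s -XGET "$ES_URL/$INDEX/_settings?flat_settings=true&pretty" \
    | grep 'ignore_malformed'
}

# Hypothetical helper: list documents whose "term" field was ignored.
find_ignored_term_docs() {
  curl -s -XGET "$ES_URL/$INDEX/_search?pretty" \
    -H 'Content-Type: application/json' -d "$IGNORED_QUERY"
}
```

Calling show_ignore_malformed and then find_ignored_term_docs against the affected cluster would show whether the setting is applied as intended and which documents were silently accepted.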
Steps to reproduce:
- Create an index with ignore_malformed explicitly set to false:
curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "index" : {
      "mapping" : {
        "ignore_malformed": false
      },
      "number_of_shards" : 1,
      "number_of_replicas" : 1,
      "analysis" : {
        "filter" : {
          "custom_stop_filter" : {
            "type" : "stop",
            "stopwords" : [
              "within",
              "without"
            ]
          }
        },
        "analyzer" : {
          "analyzer_search_term_suggestions" : {
            "filter" : [
              "standard",
              "lowercase",
              "stop",
              "custom_stop_filter"
            ],
            "char_filter" : [
              "html_strip"
            ],
            "type" : "custom",
            "tokenizer" : "standard"
          }
        }
      }
    }
  }
}'
- Add the mapping with the completion field:
curl -XPUT 'localhost:9200/my_index/_mapping/_doc?pretty' -H 'Content-Type: application/json' -d'
{
  "dynamic" : "false",
  "properties" : {
    "id" : {
      "type" : "keyword"
    },
    "term" : {
      "type" : "completion",
      "analyzer" : "analyzer_search_term_suggestions",
      "search_analyzer" : "standard",
      "preserve_separators" : true,
      "preserve_position_increments" : true,
      "max_input_length" : 50
    },
    "type" : {
      "type" : "keyword"
    }
  }
}'
- Index a document with the malformed field:
curl -XPUT 'localhost:9200/my_index/_doc/full_app_name-' -H 'Content-type: application/json' -d'{
  "id": "full_app_name",
  "term": "",
  "type" : "full-app-name"
}'
- Indexing should fail with a mapper_parsing_exception, as it did in 6.3.0. Instead it succeeds.
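The expected and actual outcomes above can be told apart mechanically. A minimal sketch, assuming the compact (non-pretty) response bodies shown earlier in this report; is_rejected and classify_response are hypothetical helper names, not part of any Elasticsearch tooling:

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 when an indexing response body
# reports a mapper_parsing_exception, non-zero otherwise.
# Relies on compact JSON, i.e. a request made without ?pretty.
is_rejected() {
  printf '%s' "$1" | grep -q '"type":"mapper_parsing_exception"'
}

# Hypothetical helper: print "rejected" or "accepted" for a response body.
classify_response() {
  if is_rejected "$1"; then
    echo "rejected"
  else
    echo "accepted"
  fi
}
```

Passing the body returned by the last curl command (captured with curl -s) to classify_response would be expected to print rejected on 6.3.0 and, per this report, prints accepted on 6.8.3.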