Ignore_malformed enabled by default and cannot be disabled on fields of type completion #47166

@andrejbl

Elasticsearch version (bin/elasticsearch --version):
6.8.3

Plugins installed: [discovery-ec2, repository-s3]

JVM version (java -version):
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux es-data-2 4.14.143-91.122.amzn1.x86_64 #1 SMP Wed Sep 11 00:43:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
We have an index with a field mapping of type 'completion' defined like so:

{
  "mappings" : {
    "_doc" : {
      "dynamic" : "false",
      "properties" : {
        "id" : {
          "type" : "keyword"
        },
        "term" : {
          "type" : "completion",
          "analyzer" : "analyzer_search_term_suggestions",
          "search_analyzer" : "standard",
          "preserve_separators" : true,
          "preserve_position_increments" : true,
          "max_input_length" : 50
        },
        "type" : {
          "type" : "keyword"
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "analysis" : {
        "filter" : {
          "custom_stop_filter" : {
            "type" : "stop",
            "stopwords" : [
              "within",
              "without"
            ]
          }
        },
        "analyzer" : {
          "analyzer_search_term_suggestions" : {
            "filter" : [
              "standard",
              "lowercase",
              "stop",
              "custom_stop_filter"
            ],
            "char_filter" : [
              "html_strip"
            ],
            "type" : "custom",
            "tokenizer" : "standard"
          }
        }
      }
    }
  }
}

In ES version 6.3.0, which we ran until now, indexing a malformed document failed, as expected:

curl -XPUT 'localhost:9200/app_store_suggested_search_term_test/_doc/app-name' -H 'Content-type: application/json' -d'{
   "id": "app-name",
   "term": "",
   "type" : "full-app-name"
}'

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}],"type":"mapper_parsing_exception","reason":"failed to parse","caused_by":{"type":"illegal_argument_exception","reason":"value must have a length > 0"}},"status":400}

After upgrading to ES 6.8.3, indexing such a document succeeds, and the term field is listed under _ignored:

curl -XPUT 'localhost:9200/app_store_suggested_search_term_test/_doc/full_app_name-' -H 'Content-type: application/json' -d'{
   "id": "full_app_name-",
   "term": "",
   "type" : "full-app-name"
}'
{"_index":"app_store_suggested_search_term_test","_type":"_doc","_id":"full_app_name-","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":2,"_primary_term":1}

...
{
        "_index" : "app_store_suggested_search_term_test",
        "_type" : "_doc",
        "_id" : "full_app_name-",
        "_score" : 1.0,
        "_ignored" : [
          "term"
        ],
        "_source" : {
          "id" : "full_app_name-",
          "term" : "",
          "type" : "full-app-name"
        }
      }
}
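
To find every document affected by this, the _ignored metadata field can be queried with an exists query. The following is a minimal sketch, assuming the node address and index name used in the examples above; find_ignored_docs and has_ignored_fields are hypothetical helper names, not part of Elasticsearch.

```shell
#!/bin/sh
# Assumed defaults, taken from the examples in this report.
ES_URL="${ES_URL:-localhost:9200}"
INDEX="${INDEX:-app_store_suggested_search_term_test}"

# Query body: match documents that have at least one ignored field.
IGNORED_QUERY='{"query":{"exists":{"field":"_ignored"}}}'

# Hypothetical helper: run the search against the index.
find_ignored_docs() {
  curl -s -XGET "$ES_URL/$INDEX/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$IGNORED_QUERY"
}

# Hypothetical helper: succeed if the JSON on stdin contains an "_ignored" key.
# This is a plain text match, not a JSON parser.
has_ignored_fields() {
  grep -q '"_ignored"[[:space:]]*:'
}

# Usage (needs a running node):
#   find_ignored_docs | has_ignored_fields && echo "some documents have ignored fields"
```

The text match is deliberately crude; a JSON-aware tool would be more robust if one is available on the host.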

The documentation suggests that this should only happen when ignore_malformed has been set to true, either at the index level or on the field itself. We have not set it anywhere, yet we still see this behaviour. In fact, setting the index-level ignore_malformed setting explicitly to either true or false makes no difference.

Steps to reproduce:

  1. Create an index with ignore_malformed explicitly set to false, then add the mapping:
curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'  
{
  "settings" : {
    "index" : {
     "mapping" : {
        "ignore_malformed": false
      },
      "number_of_shards" : 1, 
      "number_of_replicas" : 1,
      "analysis" : {
        "filter" : {
          "custom_stop_filter" : {
            "type" : "stop",
            "stopwords" : [
              "within",
              "without"
            ]
          }
        },
        "analyzer" : {
          "analyzer_search_term_suggestions" : {
            "filter" : [
              "standard",
              "lowercase",
              "stop",
              "custom_stop_filter"
            ],
            "char_filter" : [
              "html_strip"
            ],
            "type" : "custom",
            "tokenizer" : "standard"
          }
        }
      }
    }
  }
}'

curl -XPUT 'localhost:9200/my_index/_mapping/_doc?pretty' -H 'Content-Type: application/json' -d'  
{
 "dynamic" : "false",
 "properties" : {
   "id" : {
     "type" : "keyword"
   },
   "term" : {
     "type" : "completion",
     "analyzer" : "analyzer_search_term_suggestions",
     "search_analyzer" : "standard",
     "preserve_separators" : true,
     "preserve_position_increments" : true,
     "max_input_length" : 50
   },
   "type" : {
     "type" : "keyword"
   }
 }
}'
  2. Index a document with the malformed field:
curl -XPUT 'localhost:9200/my_index/_doc/full_app_name-' -H 'Content-type: application/json' -d'{
  "id": "full_app_name",
  "term": "",
  "type" : "full-app-name"
}'
  3. Ingestion should fail with a mapper_parsing_exception, as it did on 6.3.0. Instead, it succeeds.
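
The last step can be checked mechanically by looking at the HTTP status of the indexing request. This is a minimal sketch, assuming the my_index setup from the steps above; index_status and was_rejected are hypothetical helper names. The 400 status is the one returned in the 6.3.0 error response quoted earlier.

```shell
#!/bin/sh
# Assumed default, taken from the examples in this report.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: index the malformed document and print only the HTTP status code.
index_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -XPUT "$ES_URL/my_index/_doc/full_app_name-" \
    -H 'Content-Type: application/json' \
    -d '{"id": "full_app_name", "term": "", "type": "full-app-name"}'
}

# Hypothetical helper: succeed if the status shows the document was rejected (400 on 6.3.0).
was_rejected() {
  [ "$1" = "400" ]
}

# Usage (needs a running node):
#   if was_rejected "$(index_status)"; then echo "rejected"; else echo "accepted"; fi
```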
