Skip to content

Conversation

@dakrone
Copy link
Member

@dakrone dakrone commented Feb 25, 2015

This behaves similar to the way that minimum_should_match works for
the match query (in fact it is implemented in the exact same way)

Fixes #6449

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the best place to apply minShouldMatch. The difference between this and say match query is that it supports multiple fields (i realize the default is _all and this patch will work with that).

So in this case, this query will return a boolean query across say title, body, author fields. the current application of minShouldMatch will be unintuitive there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay, I will do this in SimpleQueryParser instead

@dakrone
Copy link
Member Author

dakrone commented Feb 25, 2015

@rmuir I dug into how this was changing the queries, and I think the current behavior is correct (I pushed more tests):

For these 4 documents:

{"body": "foo"} // id: 1
{"body": "bar"} // id: 2
{"body": "foo bar"} // id: 3
{"body": "foo bar baz"} // id: 4

I have 3 queries

First, an SQS query with foo bar and minumum_should_match of 2, this gets
transformed to

(_all:foo _all:bar)~2

Which is correct

Second an SQS query with foo bar for the fields "body" and "body2" (which
doesn't exist) with MSM=2, transformed to:

((body2:foo body:foo) (body2:bar body:bar))~2

This seems correct to be also, it can occur in either body or body2, but
more "foo" and "bar" have to be present.

Third, an SQS query with foo bar baz on the "body" and "body2" fields with
MSM=70%, which becomes:

((body2:foo body:foo) (body2:bar body:bar) (body2:baz body:baz))~2

Which seems correct to me also.

Next, I indexed 4 more documents to test the cross-field matching:

{"body2": "foo", "other": "foo"} // id: 5
{"body2": "bar", "other": "foo"} // id: 6
{"body2": "foo bar", "other": "foo"} // id: 7
{"body2": "foo bar baz", "other": "foo"} // id: 8

Then I issue 3 more queries.

First, a query for foo bar on the "body" and "body2" field with MSM=2

((body2:foo body:foo) (body2:bar body:bar))~2

This correctly matches documents 3, 4, 7, and 8

Second, a query for foo bar with no fields set (meaning "_all") and MSM=2

(_all:foo _all:bar)~2

This matches 3, 4, 6, 7, & 8, which is what I would expect

Finally, a query for foo bar baz on "body2" and "other" with MSM=70%

((other:foo body2:foo) (other:bar body2:bar) (other:baz body2:baz))~2

Matching 6, 7, and 8, which is what I would expect, because to me it is
analogous to saying "find 2 of the following: 'foo, bar, baz' in the fields
'body2' and 'other'".

Can you explain more about what you mean with the unintuitive multi-field
application of the minimum_should_match?

@rmuir
Copy link
Contributor

rmuir commented Feb 25, 2015

I think you are right (and great to have those new tests), in that it works for simple multi-field cases too. My brain must have mixed this up with other parsers.

I think it still degrades to the behavior I am concerned about when the language doesn't use whitespace to separate words, or when WHITESPACE operator is disabled (useful for query-time multi-word synonym support), or other cases. But in general it works and that is something to just fix about the parser one day.

+1 for this simple approach... sorry for the noise

@dakrone dakrone force-pushed the sqs-minimum-should-match branch from c85912b to 2e9ea4a Compare February 25, 2015 18:35
This behaves similar to the way that `minimum_should_match` works for
the `match` query (in fact it is implemented in the exact same way)

Fixes elastic#6449
@rmuir
Copy link
Contributor

rmuir commented Feb 25, 2015

as far as confusion over some of that stuff, i think it can apply in general with MSM, and will happen when things like stopwords are different across the fields too. Perhaps we should just doc some kind of warning, and keep it simple. I see all kinds of confusion about this e.g. on the solr lists around such things, and i fixed a bug in one of its parsers where it didnt apply it for chinese, which will certainly happen here. This is better than a lot of complexity I think. I always thought MSM was hard to integrate into parsers.

@dakrone dakrone merged commit 2e9ea4a into elastic:master Feb 25, 2015
@dakrone dakrone deleted the sqs-minimum-should-match branch April 6, 2015 17:50
@im-denisenko im-denisenko mentioned this pull request Apr 6, 2015
16 tasks
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>enhancement :Search/Search Search-related issues that do not fall into other categories v1.5.0 v2.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Minimium_should_match doesn't work with simple_query_string

3 participants