-
Notifications
You must be signed in to change notification settings - Fork 115
Adds weight
parameter type definition for RRF retrievers.
#5502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for weighted retrievers in the RRF (Reciprocal Rank Fusion) retriever by introducing a new weight
parameter. The change allows individual retrievers to have different influence levels when combining their results.
- Introduced
RRFRetrieverContainer
class extendingRetrieverContainer
with optionalweight
parameter - Updated
RRFRetriever.retrievers
type fromRetrieverContainer[]
toRRFRetrieverContainer[]
- Added comprehensive documentation for the weight parameter behavior and constraints
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Following you can find the validation changes against the target branch for the APIs.
You can validate these APIs yourself by using the |
You'll need to backport this to 9.2 right? |
Thanks for the reminder! I've added the |
The backport job only runs after the PR is merged (if there are conflicts might need to use the manual backport tool to clean those up) |
*/ | ||
export class RRFRetrieverContainer extends RetrieverContainer { | ||
/** Weight multiplier for this retriever's contribution to the RRF score. Higher values increase influence. Defaults to 1.0 if not specified. Must be non-negative. */ | ||
weight?: float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we also have to include the RetrieverContainer
definition here as we have:
retrievers: [
{
"retriever": {},
"weight": 1
]
Maybe we could also combine with the nested structure of LinearRetriever
which uses an InnerRetriever
component.
Thanks for the PR @mridula-s109 ! From looking at the JSON example provided in elastic/elasticsearch#130658, @pmpailis seems to be right that the structure is not correct this way. "retrievers": [ // <- RRFRetriever.retrievers
{
"retriever": { // <- RRFRetrieverComponent.retriever
"standard": { // <- RetrieverContainer variant
"query": { "match": { "description": "pizza" } }
}
},
"weight": 0.7 // <- RRFRetrieverComponent.weigth
}, I think it should rather look like this: export class RRFRetrieverComponent { // name taken from the server code
retriever: RetrieverContainer
weigth?: float
} and // In RRFRetriever:
retrievers: RRFRetrieverComponent[] Btw.: I'm a little bit surprised that we ship this breaking change in a minor release. Is this expected? Side note: For |
I think that this is an extension, i.e. an alternative format that we support, and not a breaking change. The previous format has not gone away (which also implies some adjustments in this PR) |
Ah right, I just saw the corresponding test cases 👍
Unfortunately. This kind of extension will result in an untagged/undiscriminated union in the specification - which is the worst possible construct to have for all statically typed clients 😐 |
I have removed the |
Changes
RRFRetriever.retrievers
:RetrieverContainer[]
→RRFRetrieverContainer[]
weight?: float
Related: elastic/elasticsearch#130658, elastic/elasticsearch#136477