Feature/vllm/input embedding completion api #17590
Conversation
Signed-off-by: Andrew Sansom <[email protected]>
Co-authored-by: Nan2018 <[email protected]> Signed-off-by: Andrew Sansom <[email protected]>
…mpty tensors instead of none
Signed-off-by: Andrew Sansom <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
…oid having two vLLM instances in memory at once
…ion endpoint while remaining type safe for non-completions endpoints
@DarkLight1337 any ideas about this? Do you think it is a blocker for this PR?
I'm fine with not supporting LoRA for now, unless LoRA is a very important use case for this.
Can you add an example script to the documentation for both offline and online inference?
… engine is chosen implicitly
I don't think this is an important use case at this time. I think it only came up because the existing completion tests checked for LoRA compatibility and @Nan2018 tried to use both of them together.
I added the
Yeah, they should be added automatically.
@DarkLight1337 It looks like the docs build timed out. All of the fast checks are passing. I do think this PR is ready for review. Thanks for your help with this!
Regarding the subprocess issue, it may be related to #18308 (comment)
DarkLight1337 left a comment:
Let's merge this first though
@DarkLight1337 will this make it into the v0.9.0 release?

Yes
```python
@pytest.fixture(scope="module")
def zephyr_lora_added_tokens_files(zephyr_lora_files):
```
What is this lora module used for?
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Nan2018 <[email protected]>
Co-authored-by: 临景 <[email protected]>
Co-authored-by: Bryce1010 <[email protected]>
Co-authored-by: Andrew Sansom <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
Adds support for passing `prompt_embeds` as base64-encoded bytes to the Completions API.
Start the server with:
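The original launch command was not preserved in this page, so here is a sketch; the model name is a placeholder, and the `--enable-prompt-embeds` flag is assumed from the vLLM serve options introduced alongside this feature.

```shell
# Hypothetical example: model name is a placeholder;
# --enable-prompt-embeds is assumed to gate this feature.
vllm serve meta-llama/Llama-3.2-1B-Instruct --enable-prompt-embeds
```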
Query example:
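The original query example was also not preserved. The sketch below follows the PR description (base64-encoded bytes in a `prompt_embeds` field): it serializes a `[seq_len, hidden_size]` tensor with `torch.save`, base64-encodes it, and builds a request body for `POST /v1/completions`. The model name, tensor shape, and endpoint URL are placeholders, and the exact field names are assumptions based on the description above.

```python
import base64
import io

import torch


def encode_prompt_embeds(embeds: torch.Tensor) -> str:
    """Serialize an embedding tensor with torch.save, then base64-encode it."""
    buf = io.BytesIO()
    torch.save(embeds, buf)
    return base64.b64encode(buf.getvalue()).decode("utf-8")


def completion_request(embeds: torch.Tensor) -> dict:
    """Build the JSON body for the completions endpoint (field names assumed)."""
    return {
        "model": "meta-llama/Llama-3.2-1B-Instruct",  # placeholder model name
        "prompt_embeds": encode_prompt_embeds(embeds),
        "max_tokens": 16,
    }


if __name__ == "__main__":
    import requests  # assumed available in the client environment

    # Placeholder shape: 8 token positions, 2048-dim hidden states.
    body = completion_request(torch.randn(8, 2048))
    resp = requests.post("http://localhost:8000/v1/completions", json=body)
    print(resp.json())
```

The server decodes the base64 string back into a tensor, so the round trip (`torch.save` → base64 → decode → `torch.load`) must be lossless.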
Note: this does not work with LoRA or prompt adapters.