@ok2sh ok2sh commented Nov 21, 2023

Description

This PR fixes the ExLlama backend's fixed 2048-token cache size, which caused the following error once the 2048-token context was exceeded: `RuntimeError: start (2009) + length (44) exceeds dimension size (2048)`

This PR also passes the RoPE scaling config through to the ExLlama backend.

Notes for Reviewers

These attributes are set by default here and are not loaded from the model's config JSON, so we need to set them from LocalAI's side. It would also be nice to add warnings, for every backend, when the user's YAML defines a config flag that the backend does not implement.
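As a rough sketch of the kind of pass-through this PR describes: the attribute names below mirror ExLlama's hard-coded config defaults (`max_seq_len`, `compress_pos_emb`, `alpha_value`), but the stub class, the `apply_model_settings` helper, and the `rope_freq_scale` mapping are illustrative assumptions for this example, not LocalAI's actual code.

```python
from dataclasses import dataclass

# Hypothetical stand-in for ExLlama's config object. ExLlama sets these
# attributes to fixed defaults rather than reading them from the model's
# config.json, which is why the caller must override them explicitly.
@dataclass
class ExLlamaConfigStub:
    max_seq_len: int = 2048        # fixed default that caused the overflow
    compress_pos_emb: float = 1.0  # linear RoPE position-compression factor
    alpha_value: float = 1.0       # NTK-style RoPE alpha

def apply_model_settings(config: ExLlamaConfigStub, settings: dict) -> ExLlamaConfigStub:
    """Override the ExLlama defaults with values from a model YAML (illustrative)."""
    if settings.get("context_size"):
        config.max_seq_len = settings["context_size"]
    if settings.get("rope_freq_scale"):
        # Assumed mapping: a frequency scale of 0.5 corresponds to 2x
        # linear position compression.
        config.compress_pos_emb = 1.0 / settings["rope_freq_scale"]
    return config

cfg = apply_model_settings(
    ExLlamaConfigStub(),
    {"context_size": 4096, "rope_freq_scale": 0.5},
)
print(cfg.max_seq_len, cfg.compress_pos_emb)  # 4096 2.0
```

Without an override like this, any request whose prompt plus generated tokens crosses 2048 trips the dimension-size error quoted above.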

Signed commits

  • Yes, I signed my commits.

Owner

@mudler mudler left a comment


looking good, thanks

Collaborator

@lunamidori5 lunamidori5 left a comment


@mudler why does it keep asking me for a review every time I add you lol

@mudler mudler marked this pull request as ready for review November 21, 2023 18:26
@mudler mudler merged commit 20d637e into mudler:master Nov 21, 2023