
Commit f5ca4f7

docs: add fine-tuning example
Signed-off-by: Ettore Di Giacinto <[email protected]>
1 parent e94a34b commit f5ca4f7

File tree

7 files changed (+1969 lines, -15 lines)


README.md

Lines changed: 18 additions & 12 deletions
@@ -81,10 +81,15 @@ Note that this started just as a [fun weekend project](https://localai.io/#backs
 
 ## 🔥🔥 Hot topics / Roadmap
 
-- [Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
+[Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
 
-Hot topics:
-- https://github.com/mudler/LocalAI/issues/1126
+🆕 New! [LLM finetuning guide](https://localai.io/advanced/fine-tuning/)
+
+Hot topics (looking for contributors):
+- Backends v2: https://github.com/mudler/LocalAI/issues/1126
+- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
+
+If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22
 
 ## 🚀 [Features](https://localai.io/features/)

@@ -98,20 +103,13 @@ Hot topics:
 - 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
 - 🆕 [Vision API](https://localai.io/features/gpt-vision/)
 
-## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)
-
-- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
-- [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
-- [Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All](https://mudler.pm/posts/localai-question-answering/)
-- [Tutorial to use k8sgpt with LocalAI](https://medium.com/@tyler_97636/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65)
-
 ## 💻 Usage
 
 Check out the [Getting started](https://localai.io/basics/getting_started/index.html) section in our documentation.
 
-### Community
+### 🔗 Community and integrations
 
-WebUI
+WebUIs:
 - https://github.com/Jirubizu/localai-admin
 - https://github.com/go-skynet/LocalAI-frontend

@@ -123,11 +121,19 @@ Other:
 
 ### 🔗 Resources
 
+- 🆕 New! [LLM finetuning guide](https://localai.io/advanced/fine-tuning/)
 - [How to build locally](https://localai.io/basics/build/index.html)
 - [How to install in Kubernetes](https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes)
 - [Projects integrating LocalAI](https://localai.io/integrations/)
 - [How tos section](https://localai.io/howtos/) (curated by our community)
 
+## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)
+
+- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
+- [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
+- [Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All](https://mudler.pm/posts/localai-question-answering/)
+- [Tutorial to use k8sgpt with LocalAI](https://medium.com/@tyler_97636/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65)
+
 ## Citation
 
 If you utilize this repository, data in a downstream project, please consider citing it with:

docs/content/_index.en.md

Lines changed: 8 additions & 3 deletions
@@ -89,10 +89,15 @@ Note that this started just as a [fun weekend project](https://localai.io/#backs
 
 ## 🔥🔥 Hot topics / Roadmap
 
-- [Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
+[Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
 
-Hot topics:
-- https://github.com/mudler/LocalAI/issues/1126
+🆕 New! [LLM finetuning guide](https://localai.io/advanced/fine-tuning/)
+
+Hot topics (looking for contributors):
+- Backends v2: https://github.com/mudler/LocalAI/issues/1126
+- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
+
+If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22
 
 ## How does it work?

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+++
disableToc = false
title = "Fine-tuning LLMs for text generation"
weight = 3
+++

{{% notice note %}}
Section under construction
{{% /notice %}}

This section covers how to fine-tune a language model for text generation and consume it in LocalAI.
## Requirements

For this example you will need a GPU with at least 12GB of VRAM and a Linux box.
## Fine-tuning

Fine-tuning a language model is a process that requires a lot of computational power and time.

Currently LocalAI doesn't expose a fine-tuning endpoint, but there are [plans](https://github.com/mudler/LocalAI/issues/596) to support one. For the time being, this guide gives a simple starting point on how to fine-tune a model and use it with LocalAI (but also with llama.cpp).

There is an e2e example of fine-tuning an LLM to use with [LocalAI](https://github.com/mudler/LocalAI) written by [@mudler](https://github.com/mudler) available [here](https://github.com/mudler/LocalAI/tree/master/examples/e2e-fine-tuning/).

The steps involved are:

- Prepare a dataset
- Prepare the environment and install dependencies
- Fine-tune the model
- Merge the LoRA adapter with the base model
- Convert the model to gguf
- Use the model with LocalAI
## Dataset preparation

We are going to need a dataset or a set of datasets.

Axolotl supports a variety of formats. In the notebook and in this example we aim for a very simple dataset and build it manually, so we are going to use the `completion` format, which requires the full text to be used for fine-tuning.

A dataset for an instruct model (like Alpaca) can look like the following:

```json
[
  {
    "text": "As an AI language model you are trained to reply to an instruction. Try to be as much polite as possible\n\n## Instruction\n\nWrite a poem about a tree.\n\n## Response\n\nTrees are beautiful, ..."
  },
  {
    "text": "As an AI language model you are trained to reply to an instruction. Try to be as much polite as possible\n\n## Instruction\n\nWrite a poem about a tree.\n\n## Response\n\nTrees are beautiful, ..."
  }
]
```

Each entry's `text` field is the whole text used for fine-tuning. For example, for an instruct model it follows (more or less) this format:

```
<System prompt>

## Instruction

<Question, instruction>

## Response

<Expected response from the LLM>
```

The instruction format works like this: at inference time we feed the model only the first part, up to and including the `## Instruction` block, and the model completes the text with the `## Response` block.

Prepare a dataset and upload it to your Google Drive if you are using the Google Colab notebook. Otherwise, place it next to the `axolotl.yaml` file as `dataset.json`.
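If you are assembling a small dataset by hand, you can write the JSON directly from the shell. Below is a minimal sketch (not part of the example files) that creates a two-entry `dataset.json` in the `completion` format and validates it; the sample texts are placeholders to replace with your own data:

```bash
# Write a tiny two-entry dataset in the `completion` format.
# The quoted heredoc keeps the \n escapes literal inside the JSON strings.
cat > dataset.json <<'EOF'
[
  {
    "text": "As an AI language model you are trained to reply to an instruction. Try to be as much polite as possible\n\n## Instruction\n\nWrite a poem about a tree.\n\n## Response\n\nTrees are beautiful, ..."
  },
  {
    "text": "As an AI language model you are trained to reply to an instruction. Try to be as much polite as possible\n\n## Instruction\n\nWrite a haiku about the sea.\n\n## Response\n\nWaves kiss the shore, ..."
  }
]
EOF

# Optional sanity check: make sure the file is valid JSON before training
python3 -m json.tool dataset.json > /dev/null && echo "dataset.json is valid JSON"
```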
### Install dependencies

```bash
# Install axolotl and dependencies
git clone https://github.com/OpenAccess-AI-Collective/axolotl && pushd axolotl && git checkout 797f3dd1de8fd8c0eafbd1c9fdb172abd9ff840a && popd #0.3.0
pip install packaging
pushd axolotl && pip install -e '.[flash-attn,deepspeed]' && popd

# Install a pre-built flash-attention wheel (workaround, see https://github.com/oobabooga/text-generation-webui/issues/4238)
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.0/flash_attn-2.3.0+cu117torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

Configure accelerate:

```bash
accelerate config default
```
## Fine-tuning

We will need to configure axolotl. This example provides an `axolotl.yaml` file that uses openllama-3b for fine-tuning. Copy the `axolotl.yaml` file and edit it to your needs. The dataset needs to be next to it as `dataset.json`. You can find the axolotl.yaml file [here](https://github.com/mudler/LocalAI/tree/master/examples/e2e-fine-tuning/).

If you have a big dataset, you can pre-tokenize it to speed up the fine-tuning process:

```bash
# Optional pre-tokenize (run only if you have a big dataset)
python -m axolotl.cli.preprocess axolotl.yaml
```

Now we are ready to start the fine-tuning process:

```bash
# Fine-tune
accelerate launch -m axolotl.cli.train axolotl.yaml
```

After the fine-tuning has finished, we merge the LoRA adapter with the base model:

```bash
# Merge lora
python3 -m axolotl.cli.merge_lora axolotl.yaml --lora_model_dir="./qlora-out" --load_in_8bit=False --load_in_4bit=False
```

And we convert it to the gguf format that LocalAI can consume:
```bash
# Convert to gguf
git clone https://github.com/ggerganov/llama.cpp.git
pushd llama.cpp && make LLAMA_CUBLAS=1 && popd

# We need to convert the pytorch model into ggml for quantization
# It creates 'ggml-model-f16.gguf' in the 'merged' directory.
pushd llama.cpp && python convert.py --outtype f16 \
    ../qlora-out/merged/pytorch_model-00001-of-00002.bin && popd

# Start off by making a basic q4_0 4-bit quantization.
# It's important to have 'ggml' in the name of the quant for some
# software to recognize its file format.
pushd llama.cpp && ./quantize ../qlora-out/merged/ggml-model-f16.gguf \
    ../custom-model-q4_0.bin q4_0 && popd
```

You should now have a `custom-model-q4_0.bin` file that you can copy into the LocalAI models directory and use with LocalAI.
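As a rough sketch of this final step, you can drop the file into the LocalAI models directory together with a minimal model config and query the OpenAI-compatible API. The file names, the `custom-model` name, and the config values below are illustrative assumptions; adjust the paths and the prompt to the template used during fine-tuning:

```bash
# Copy the quantized model into the LocalAI models directory (example path)
cp custom-model-q4_0.bin models/

# Minimal model definition (assumed names and values; tune to your setup)
cat > models/custom-model.yaml <<'EOF'
name: custom-model
backend: llama
parameters:
  model: custom-model-q4_0.bin
  temperature: 0.2
EOF

# Ask the model to complete the instruction template used for fine-tuning
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "custom-model",
  "prompt": "As an AI language model you are trained to reply to an instruction. Try to be as much polite as possible\n\n## Instruction\n\nWrite a poem about a tree.\n\n## Response\n\n"
}'
```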

examples/README.md

Lines changed: 8 additions & 0 deletions
@@ -41,6 +41,14 @@ This example show how to use LocalAI inside Kubernetes with [k8sgpt](https://k8s
 
 ![Screenshot from 2023-06-19 23-58-47](https://github.com/go-skynet/go-ggml-transformers.cpp/assets/2420543/cab87409-ee68-44ae-8d53-41627fb49509)
 
+### Fine-tuning a model and converting it to gguf to use it with LocalAI
+
+_by [@mudler](https://github.com/mudler)_
+
+This is an end-to-end (e2e) example of fine-tuning a model with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) and converting it to gguf to use it with LocalAI.
+
+[Check it out here](https://github.com/mudler/LocalAI/tree/master/examples/e2e-fine-tuning/)
+
 ### Flowise
 
 _by [@mudler](https://github.com/mudler)_

examples/e2e-fine-tuning/README.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
This is an example of fine-tuning an LLM to use with [LocalAI](https://github.com/mudler/LocalAI) written by [@mudler](https://github.com/mudler).

Specifically, this example shows how to use [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) to fine-tune an LLM and consume it with LocalAI as a `gguf` model.

A notebook is provided that currently works on _very small_ datasets on a free Google Colab instance. It is far from producing good models, but it gives a sense of how to use the code with a better dataset and configuration, and how to use the resulting model with LocalAI.

## Requirements

For this example you will need a GPU with at least 12GB of VRAM and a Linux box.
The notebook is tested on Google Colab with a Tesla T4 GPU.
## Clone this directory

Clone the repository and enter the example directory:

```bash
git clone https://github.com/mudler/LocalAI
cd LocalAI/examples/e2e-fine-tuning
```

## Install dependencies

```bash
# Install axolotl and dependencies
git clone https://github.com/OpenAccess-AI-Collective/axolotl && pushd axolotl && git checkout 797f3dd1de8fd8c0eafbd1c9fdb172abd9ff840a && popd #0.3.0
pip install packaging
pushd axolotl && pip install -e '.[flash-attn,deepspeed]' && popd

# Install a pre-built flash-attention wheel (workaround, see https://github.com/oobabooga/text-generation-webui/issues/4238)
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.0/flash_attn-2.3.0+cu117torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```

Configure accelerate:

```bash
accelerate config default
```
## Fine-tuning

We will need to configure axolotl. This example provides an `axolotl.yaml` file that uses openllama-3b for fine-tuning. Copy the `axolotl.yaml` file and edit it to your needs. The dataset needs to be next to it as `dataset.json`. The dataset uses the `completion` format: a list of JSON objects, each with a `text` field containing the full text to train the LLM on.

If you have a big dataset, you can pre-tokenize it to speed up the fine-tuning process:

```bash
# Optional pre-tokenize (run only if you have a big dataset)
python -m axolotl.cli.preprocess axolotl.yaml
```

Now we are ready to start the fine-tuning process:

```bash
# Fine-tune
accelerate launch -m axolotl.cli.train axolotl.yaml
```

After the fine-tuning has finished, we merge the LoRA adapter with the base model:

```bash
# Merge lora
python3 -m axolotl.cli.merge_lora axolotl.yaml --lora_model_dir="./qlora-out" --load_in_8bit=False --load_in_4bit=False
```

And we convert it to the gguf format that LocalAI can consume:
```bash
# Convert to gguf
git clone https://github.com/ggerganov/llama.cpp.git
pushd llama.cpp && make LLAMA_CUBLAS=1 && popd

# We need to convert the pytorch model into ggml for quantization
# It creates 'ggml-model-f16.gguf' in the 'merged' directory.
pushd llama.cpp && python convert.py --outtype f16 \
    ../qlora-out/merged/pytorch_model-00001-of-00002.bin && popd

# Start off by making a basic q4_0 4-bit quantization.
# It's important to have 'ggml' in the name of the quant for some
# software to recognize its file format.
pushd llama.cpp && ./quantize ../qlora-out/merged/ggml-model-f16.gguf \
    ../custom-model-q4_0.bin q4_0 && popd
```

You should now have a `custom-model-q4_0.bin` file that you can copy into the LocalAI models directory and use with LocalAI.
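Before pointing LocalAI at the file, it can be useful to smoke-test the quantized model directly with the llama.cpp build from the previous step. This is a rough sketch; the `main` binary and its flags refer to the llama.cpp checkout used above, and the prompt simply follows the instruction template assumed in this example:

```bash
# Quick smoke test with llama.cpp: generate up to 128 tokens from the fine-tuned model
# ($'...' makes bash expand the \n escapes into real newlines)
pushd llama.cpp && ./main -m ../custom-model-q4_0.bin -n 128 \
  -p $'As an AI language model you are trained to reply to an instruction. Try to be as much polite as possible\n\n## Instruction\n\nWrite a poem about a tree.\n\n## Response\n\n' && popd
```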
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
base_model: openlm-research/open_llama_3b_v2
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
push_dataset_to_hub: false
datasets:
  - path: dataset.json
    ds_type: json
    type: completion
dataset_prepared_path:
val_set_size: 0.05
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: true
lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./qlora-out
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
gptq_groupsize:
gptq_model_v1:
warmup_steps: 20
eval_steps: 0.05
save_steps:
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
