Low-rank adaptation (LoRA) methods offer an affordable solution for scaling general-purpose LLMs to hundreds of use scenarios. However, their efficacy in high-stakes financial tasks, such as passing CFA exams and analyzing SEC filings, is rarely explored.
This open-source FinLoRA project benchmarks LoRA methods on both general and highly professional financial tasks. First, we curated 19 datasets covering diverse financial applications; in particular, four novel XBRL analysis datasets are based on 150 SEC filings. Second, we evaluated five LoRA methods and five base LLMs. Finally, we provide extensive experimental results in terms of accuracy, F1, and BERTScore and report computational cost in terms of time and GPU memory during fine-tuning and inference stages. We find that LoRA methods achieved substantial performance gains of 36% on average over base models. Our FinLoRA project provides an affordable and scalable approach to democratize financial intelligence to the general public.
The proprietary BloombergGPT model, announced in April 2023, highlighted the potential of financial Large Language Models (FinLLMs). However, such a "train-from-scratch" approach is resource-intensive: it required about one million GPU hours on 512 A100 GPUs, at an estimated cost of $3 million ($3 per GPU hour in 2023). This substantial investment underscores the need for a cost-effective alternative.
We propose to leverage open-source models, such as Llama 3.1, and employ the LoRA (Low-Rank Adaptation) fine-tuning method. It dramatically reduces the number of trainable parameters to as little as 0.01% of the full model's parameters. This enables fine-tuning on 4 A5000 GPUs and brings the cost of fine-tuning down to less than $100, making FinLLMs accessible to the general public.
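As a back-of-the-envelope check on that figure, the sketch below counts the trainable parameters added by rank-8 LoRA adapters on the attention projections of an 8B-parameter model. The dimensions are illustrative assumptions, not the exact FinLoRA configuration.

```python
# Rough count of LoRA trainable parameters (illustrative assumptions:
# rank-8 adapters on the q/k/v/o projections of a model with hidden
# size 4096 and 32 layers; not the exact FinLoRA configuration).
hidden = 4096   # hidden dimension
layers = 32     # transformer layers
rank = 8        # LoRA rank r

# Each adapted square weight W gains two low-rank factors A (hidden x r)
# and B (r x hidden), i.e. 2 * hidden * r trainable parameters.
per_matrix = 2 * hidden * rank
lora_params = layers * 4 * per_matrix   # q, k, v, o per layer

total = 8e9  # ~8B base parameters
print(f"{lora_params:,} LoRA params = {lora_params / total:.3%} of base")
# ~8.4M params, about 0.1% of 8B; smaller ranks or fewer target modules
# bring this down toward the 0.01% figure quoted above.
```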
Our goal is to develop models capable of performing a range of financial tasks, from general applications to professional-level functions. A critical area within professional finance is the eXtensible Business Reporting Language (XBRL), the global standard for digital business reporting. XBRL, being XML-based, is inherently complex, making it challenging for humans to curate and interpret directly.
We are particularly interested in two key XBRL applications:
1. Financial Reporting: Assisting small and medium-sized businesses (SMBs) in generating compliant financial reports in the XBRL format.
2. Financial Statement Analysis: Facilitating the extraction of data from XBRL financial reports and enabling insightful analysis.
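To make the XBRL structure concrete, below is a minimal, hypothetical XBRL fact and how it can be read programmatically. Real SEC filings contain thousands of such facts plus contexts, units, and linkbases, which is what makes manual curation and interpretation hard.

```python
# A minimal, hypothetical XBRL fact (not from a real filing) parsed
# with the standard library. Real filings add contexts, units, and
# dimensional qualifiers around every such fact.
import xml.etree.ElementTree as ET

fact_xml = """
<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2024">
  <us-gaap:Revenues contextRef="FY2024" unitRef="usd"
                    decimals="-6">391035000000</us-gaap:Revenues>
</xbrl>
"""

root = ET.fromstring(fact_xml)
ns = {"us-gaap": "http://fasb.org/us-gaap/2024"}
for fact in root.findall("us-gaap:Revenues", ns):
    print(fact.attrib["contextRef"], fact.text)  # FY2024 391035000000
```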
We test Llama 3.1 8B Instruct with our LoRA adapters on 19 datasets across four task categories, ranging from general financial tasks to professional-level XBRL-based financial statement analysis. The train/test splits for the four categories are: General Financial Tasks, 122.9k/31.7k; Financial Certificate Tasks, 472/346; Financial Reporting Tasks, 15.9k/8.3k; and Financial Statement Analysis Tasks, 27.9k/7.3k. For each task, we compute accuracy and F1 score, except for XBRL Term and FinanceBench, for which we compute BERTScore F1. The dataset statistics are shown below.
Each dataset has a specific format:
General Financial Tasks:
- Sentiment Analysis (FPB, FiQA SA, TFNS): A financial sentence must be classified with a sentiment from {negative, neutral, positive}.
- NWGI Sentiment: In NWGI, financial text is classified into a 7-level sentiment from {strong negative, moderately negative, mildly negative, neutral, mildly positive, moderately positive, strong positive}. In our testing, we simplified this to the {negative, neutral, positive} set of sentiments.
- Headline Analysis: In the Headline dataset, financial headlines are classified with binary answers from {Yes, No} based on various questions, such as whether the headline indicates a share price going up.
- Named Entity Recognition: In NER, financial text with a highlighted entity is classified into entity types from {person, location, organization}.
Financial Certificate Tasks:
- CFA Level I/II/III & CPA REG: The CFA and CPA exam datasets include multiple-choice questions from mock exams. The LLM must select an answer from {A, B, C, D}, or from {A, B, C} based on the question and context in the case of CFA Level II and CFA Level III. Our question set has a particular focus on ethics and regulations questions.
Financial Reporting Tasks:
- XBRL Term: In XBRL Term, the LLM must provide a brief explanation for XBRL terminology from the XBRL International website.
- FiNER/FNXL Tagging: In FiNER/FNXL, financial text contains numerical entities that must be tagged with appropriate US GAAP tags. Answers are comma-separated when multiple entities need tagging.
Financial Statement Analysis Tasks:
- XBRL Tag Extraction: In XBRL tag extraction, the LLM analyzes XBRL context and must respond with an XBRL tag for a specific element.
- XBRL Value Extraction: In XBRL value extraction, the LLM analyzes XBRL context to find specific numerical values.
- XBRL Formula Construction: In XBRL formula construction, the LLM creates financial formulas using US GAAP tags.
- XBRL Formula Calculation: In XBRL formula calculation, the LLM substitutes actual numerical values from the XBRL context into financial formulas.
- Financial Math: In Financial Math, the LLM applies financial formulas to solve numerical problems given a formula and specific values.
- FinanceBench: In FinanceBench, the LLM answers various questions based on XBRL financial reports.
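Across all of these tasks, each example reduces to an instruction-style prompt and a short target answer. The snippet below sketches what one sentiment-analysis training example might look like as a JSONL line; the field names are illustrative assumptions, so check the files in `data/train/` for the exact schema.

```python
# Illustrative JSONL training record for sentiment analysis.
# Field names are assumptions; see data/train/ for the exact schema.
import json

record = {
    "context": ("Analyze the sentiment of this statement. "
                "Answer with negative, neutral, or positive.\n"
                "Text: Operating profit rose to EUR 13.1 mn from EUR 8.7 mn."),
    "target": "positive",
}
print(json.dumps(record))
```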
We use Llama 3.1 8B Instruct as the base model.
As illustrated in the performance comparison above, Llama 3.1 8B Instruct with our LoRA adapters demonstrates substantial improvements across all financial task categories. The fine-tuned Llama 3.1 8B model using various LoRA methods achieves remarkable performance gains, with improvements ranging from +36.4% to +67.1% across different task types. Most notably, LoRA methods show exceptional effectiveness in Financial Certificate tasks (professional exams like CFA and CPA), where models achieve over 80% accuracy compared to the base model's 13-32% range. Similarly, our LoRA adapters show significant improvements of +40% to +52% in Financial Statement Analysis tasks, particularly in our novel XBRL analysis datasets, highlighting LoRA's capability in handling complex, structured financial data.
The results reveal that, while larger base models like GPT-4o and DeepSeek V3 perform well on general financial tasks, our cost-effective LoRA-adapted Llama 3.1 8B models often match or exceed their performance while requiring only a fraction of the computational resources. This validates our approach of democratizing financial intelligence through parameter-efficient fine-tuning, making sophisticated financial AI accessible to organizations without massive computational budgets.
Our models achieve the following performance on financial tasks. The table below reports accuracy/F1 scores; entries of the form -/value report BERTScore F1.
Full Results
Datasets | Llama 3.1 8B Instruct (base) | Llama 3.1 70B Instruct (base) | DeepSeek V3 (base) | GPT-4o (base) | Gemini 2.0 FL (base) | Llama 3.1 8B Instruct LoRA | Llama 3.1 8B Instruct QLoRA | Llama 3.1 8B Instruct DoRA | Llama 3.1 8B Instruct rsLoRA | Gemini 2.0 FL (fine-tuned)
---|---|---|---|---|---|---|---|---|---|---
General Financial Tasks | ||||||||||
FPB | 68.73/0.677 | 74.50/0.736 | 78.76/0.764 | 81.13/0.818 | 81.02/0.894 | 85.64/0.922 | 84.16/0.909 | 81.93/0.901 | 82.84/0.853 | 87.62/0.878 |
FiQA SA | 46.55/0.557 | 47.27/0.565 | 60.43/0.686 | 72.34/0.773 | 68.09/0.810 | 81.28/0.884 | 78.30/0.874 | 78.72/0.874 | 73.19/0.806 | 88.09/0.879 |
TFNS | 69.97/0.683 | 68.42/0.686 | 84.38/0.846 | 73.32/0.740 | 26.38/0.385 | 88.02/0.932 | 83.84/0.910 | 59.09/0.702 | 59.51/0.655 | 89.49/0.896 |
NWGI | 43.86/0.583 | 50.14/0.596 | 7.44/0.097 | 66.61/0.656 | 48.16/0.614 | 54.16/0.690 | 49.96/0.645 | 19.57/0.281 | 35.80/0.464 | 62.59/0.581 |
NER | 48.89/0.569 | 46.28/0.454 | 40.82/0.360 | 52.11/0.523 | 65.13/0.769 | 98.05/0.981 | 96.63/0.966 | 71.59/0.834 | 95.92/0.963 | 97.29/0.973 |
Headline | 45.34/0.558 | 71.68/0.729 | 76.06/0.779 | 80.53/0.814 | 76.60/0.847 | 84.66/0.852 | 88.03/0.886 | 64.93/0.781 | 71.75/0.828 | 97.32/0.973 |
Financial Certificate Tasks | ||||||||||
CFA Level 1 | 13.33/0.133 | 42.22/0.418 | 54.44/0.556 | 63.33/0.631 | 55.56/0.556 | 86.67/0.867 | 87.78/0.878 | 87.78/0.878 | 87.78/0.878 | 52.22/0.530 |
CFA Level 2 | 19.48/0.199 | 29.87/0.303 | 46.75/0.485 | 55.84/0.563 | 56.67/0.567 | 88.31/0.883 | 83.12/0.835 | 90.91/0.909 | 92.21/0.922 | 51.11/0.519 |
CFA Level 3 | 16.67/0.179 | 24.36/0.271 | 47.44/0.496 | 51.28/0.517 | 52.56/0.538 | 70.51/0.705 | 66.67/0.675 | 69.23/0.697 | 79.49/0.795 | 51.28/0.557 |
CPA REG | 31.68/0.317 | 41.58/0.426 | 65.35/0.654 | 67.33/0.667 | 63.37/0.638 | 80.20/0.802 | 88.12/0.885 | 90.10/0.901 | 90.10/0.901 | 51.28/0.557 |
Financial Reporting Tasks | ||||||||||
FiNER | 21.28/0.232 | 61.82/0.606 | 68.92/0.699 | 72.29/0.725 | 63.91/0.638 | 74.10/0.759 | 74.32/0.760 | 70.92/0.732 | 70.72/0.724 | 80.32/0.802 |
FNXL | 3.64/0.045 | 20.14/0.210 | 27.33/0.288 | 42.41/0.398 | 37.75/0.356 | 23.57/0.250 | 23.05/0.253 | 33.50/0.311 | 35.68/0.348 | 47.98/0.438 |
XBRL Term | -/0.574 | -/0.587 | -/0.573 | -/0.584 | -/0.572 | -/0.599 | -/0.606 | -/0.606 | -/0.630 | -/0.666 |
Financial Statement Analysis Tasks | ||||||||||
Tag Extraction | 69.16/0.739 | 69.64/0.782 | 85.03/0.849 | 81.60/0.864 | 80.27/0.811 | 89.13/0.886 | 86.89/0.872 | 80.44/0.896 | 85.26/0.879 | 85.03/0.907 |
Value Extraction | 52.46/0.565 | 88.19/0.904 | 98.01/0.982 | 97.01/0.974 | 98.02/0.980 | 98.49/0.986 | 97.14/0.974 | 98.57/0.988 | 99.13/0.992 | 99.20/0.992 |
Formula Construction | 12.92/0.201 | 59.28/0.665 | 22.75/0.315 | 79.76/0.820 | 61.90/0.644 | 77.61/0.876 | 89.34/0.898 | 88.02/0.882 | 89.46/0.893 | 67.85/0.786 |
Formula Calculation | 27.27/0.317 | 77.49/0.783 | 85.99/0.868 | 83.59/0.857 | 53.57/0.536 | 98.68/0.990 | 92.81/0.947 | 98.92/0.993 | 98.80/0.988 | 54.76/0.548 |
FinanceBench | -/0.443 | -/0.528 | -/0.573 | -/0.564 | -/0.552 | -/0.511 | -/0.542 | -/0.477 | -/0.575 | -/0.544 |
Financial Math | 11.00/0.136 | 10.50/0.134 | 21.50/0.255 | 27.00/0.296 | 19.00/0.204 | 30.00/0.332 | 26.50/0.307 | 28.50/0.317 | 34.50/0.370 | 66.00/0.785 |
Overall Average | 37.05 | 52.36 | 57.16 | 63.39 | 58.97 | 74.74 | 74.29 | 69.53 | 73.82 | 71.08 |
We use four LoRA methods: LoRA, QLoRA, DoRA, and rsLoRA.
You can download LoRA adapters from the `lora_adapters` directory or from Hugging Face. The adapters are fine-tuned on financial datasets using various configurations (e.g., 8-bit quantization with rank 8, and 4-bit quantization with rank 4).
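If you prefer to pull an adapter from the Hub programmatically, something like the following works; the repo id here is a placeholder, so substitute the adapter repository you actually want.

```python
# Sketch: download a LoRA adapter folder from the Hugging Face Hub.
# The repo_id is a placeholder, not a real FinLoRA repository name.
from huggingface_hub import snapshot_download

adapter_dir = snapshot_download(repo_id="your-org/finlora-llama3.1-8b-8bits-r8")
print("Adapter files downloaded to:", adapter_dir)
```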
FinLoRA/
├── data/
│ ├── *.py # Dataset processing code
│ ├── test/ # Test datasets
│ └── train/ # Training datasets
├── docs/ # Documentation
├── lora/
│ ├── finetune.py # Fine-tuning code using Axolotl
│ ├── flowertune-llm/ # Federated learning implementation
│ └── lora/ # Fine-tuning using HF PEFT
├── lora_adapters/
│ ├── 4bits_r4/
│ ├── 8bits_r8/
│ ├── 8bits_r8_dora/
│ ├── 8bits_r8_rslora/
│   └── fp16_r8/
├── test/
│ ├── __init__.py
│ ├── fingpt_tests/
│ ├── inference.py
│ ├── README.md
│ ├── *.sh # Test shell scripts
│ └── *.py
├── environment.yml
├── LICENSE
├── README.md
├── readthedocs.yml
├── requirements.txt
├── setup.sh
└── sphinx_requirements.txt
This guide will help you set up the environment for FinLoRA.
FinLoRA requires CUDA-enabled GPUs, with CUDA version 11.8 or later.
GPU memory requirements depend on the size of the LLM, quantization, batch size, and prompt length. For Llama 3.1 8B Instruct, we recommend the following (a rough estimate follows the list below):
- NVIDIA GPU with at least 24GB VRAM for 8-bit quantization
- NVIDIA GPU with at least 16GB VRAM for 4-bit quantization
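As a rough sanity check on these numbers, the frozen base weights alone account for most of the VRAM; the sketch below estimates their footprint under each quantization level (approximate figures only, since activations, KV cache, and framework overhead also consume memory).

```python
# Approximate VRAM footprint of the frozen 8B base weights alone.
# Activations, the (tiny) LoRA optimizer state, and framework
# overhead consume the rest of the recommended budget.
params = 8e9
for mode, bytes_per_param in [("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{mode}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")
# 8-bit: ~8 GB, 4-bit: ~4 GB -> leaves headroom within 24 GB / 16 GB.
```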
If you don't have access to GPUs with sufficient VRAM, you can rent them affordably from cloud providers like RunPod. To create a proper Runpod environment, you can follow these steps:
- After you have created a Runpod account, go to the "Billing" tab and add $10 of credits. In our testing, when we rented 4 A5000 GPUs, we spent an average of $1.05/hr.
- Next, click on the "Storage" tab. This tab allows you to create network volumes for persistent storage of uploaded files and models if you disconnect from the service.
- Click on "New Network Volume" and select a Datacenter that shows that RTX A5000s are available.
- Name your network volume and make the size of the volume 50 GB. This should only cost $3.50 a month. Then click "Create Network Volume."
- Under the storage tab, click "Deploy" on your network volume. Select the RTX A5000 GPU.
- Name your pod, set "GPU Count" to 4, and select the "Runpod Pytorch 2.8.0" pod template. Note: If you only want to run inference instead of fine-tuning, you can set "GPU Count" to 1.
- Make sure the instance pricing is set to on-demand. This should cost $0.26/hr per A5000 GPU.
- Click "Deploy On-Demand."
You can set up the environment using either the provided setup script or conda environment file.
The easiest way to set up the environment is to use the provided setup script:
git clone https://github.com/Open-Finance-Lab/FinLoRA.git
cd FinLoRA
chmod +x setup.sh
./setup.sh
This script will install all the required dependencies, including:
- PyTorch with CUDA support
- Transformers library
- Axolotl for fine-tuning
- Other required libraries
Alternatively, you can use the provided conda environment file:
conda env create -f environment.yml
conda activate finenv
When using Llama models, you need to log in to Hugging Face because these models are gated. Run the following command:
huggingface-cli login
You will be prompted to enter your Hugging Face token. You can find your token at https://huggingface.co/settings/tokens.
Alternatively, you can set the HF_TOKEN environment variable:
export HF_TOKEN=your_token_here
- To perform fine-tuning, first navigate to the `lora` directory and fetch the DeepSpeed configs. The DeepSpeed configs allow the fine-tuning framework to parallelize fine-tuning across GPUs:
cd lora
axolotl fetch deepspeed_configs
- Add your fine-tuning dataset (e.g., `your_dataset_train.jsonl`) to the `../data/train/` folder.
- Open `finetune_configs.json` and add the configuration for the LoRA adapter you want to create, with its hyperparameters defined. There are examples you can reference in the file. The following is an example:
"your_config_name": {
"base_model": "meta-llama/Llama-3.1-8B-Instruct",
"dataset_path": "../data/train/your_dataset_train.jsonl",
"lora_r": 8,
"quant_bits": 8,
"learning_rate": 0.0001,
"num_epochs": 1,
"batch_size": 4,
"gradient_accumulation_steps": 2
}
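For reference, the effective global batch size equals `batch_size` × `gradient_accumulation_steps` × number of GPUs; with the values above on the 4 A5000s recommended earlier, that is 4 × 2 × 4 = 32.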
- Run fine-tuning with your configuration by executing the following command:
python finetune.py your_config_name
For example, to use the existing formula configuration:
python finetune.py formula_llama_3_1_8b_8bits_r8
- After fine-tuning completes, the adapter will be saved in the `axolotl-output` subfolder within the `lora` folder. Download the adapter files from this directory. You can remove intermediate checkpoint folders to save disk space.
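If you want to serve the result without PEFT at inference time, you can merge the adapter into the base weights. This is a minimal sketch with placeholder paths; note that merging an adapter trained against quantized weights into full-precision weights is only an approximation.

```python
# Minimal sketch: merge a trained LoRA adapter into the base model so
# it can be served without PEFT. Paths are placeholders. Merging an
# adapter trained on quantized weights into fp16 weights is approximate.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype="auto"
)
merged = PeftModel.from_pretrained(base, "lora/axolotl-output").merge_and_unload()
merged.save_pretrained("./merged-finlora-model")
```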
If you don't have compute resources, you can rent 4 A5000s at a low cost from RunPod.
Once you have trained a LoRA adapter, you can use it for inference by modifying the following code:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model and tokenizer
base_model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
# Load and apply the LoRA adapter
adapter_path = "./path/to/your/adapter" # Path to your adapter
model = PeftModel.from_pretrained(base_model, adapter_path)
# Generate text
prompt = "What is the formula for the Black-Scholes model?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,  # greedy decoding, so responses are reproducible
        pad_token_id=tokenizer.eos_token_id
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
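Since most of the released adapters were trained with the base model quantized (e.g., the `8bits_r8` family), you may get results closer to training conditions by loading the base model in 8-bit before attaching the adapter. This is a sketch assuming `bitsandbytes` is installed:

```python
# Optional: load the base model in 8-bit to mirror the quantization
# used when training the 8bits_r8 adapters (requires bitsandbytes).
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```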
To run federated LoRA, navigate to the federated learning directory:
cd lora/flowertune-llm
Install dependencies:
pip install -e .
Run the federated learning simulation:
flwr run .
You can customize the configuration:
# Use OpenLLaMA 7B instead of 3B, with 8-bit quantization
flwr run . --run-config "model.name='openlm-research/open_llama_7b_v2' model.quantization=8"
# Run for 50 rounds with 25% client participation
flwr run . --run-config "num-server-rounds=50 strategy.fraction-fit=0.25"
Please note that the fp16 adapters we created are experimental and untested. They may not be suitable for use.
To test adapters, navigate to the test directory:
cd test
Define the adapters and tasks you want to run in the script, then execute:
bash run_all_adapters.sh
To run a base model (e.g., OpenAI):
bash run_openai.sh
Enter your API key in the file, set the tasks to run, then execute:
bash run_openai.sh
We welcome contributions to the FinLoRA project! Please feel free to submit issues, feature requests, and pull requests.
This project is released under OpenMDW-1.0. Please check individual dataset licenses for specific usage terms.
If you use this work, please cite:
@article{wang2025finlora,
title={FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets},
author={Wang, Dannong and Patel, Jaisal and Zha, Daochen and Yang, Steve Y and Liu, Xiao-Yang},
journal={arXiv preprint arXiv:2505.19819},
year={2025}
}