lemon42-ai/ThreatDetect-code-vulnerability-detection

ThreatDetect-code-vulnerability-detection

A repository for finetuning a ModernBERT-based model to detect vulnerabilities in code. This project adapts answerdotai/ModernBERT-base using LoRA techniques to classify code segments into vulnerability categories.

Overview

ThreatDetect-code-vulnerability-detection analyzes code and flags potential vulnerabilities. Finetuned from ModernBERT on a dedicated dataset of code samples, the model classifies code into six CWE weakness categories or marks it as safe.

Key features:

  • LoRA finetuning of the Q and V matrices, so only a small set of adapter weights is trained.
  • Classification into 7 labels: six CWE-based vulnerability classes and one safe label.
  • Training scripts designed for high-performance computing environments using SLURM.
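To give a feel for why LoRA makes training efficient, here is a minimal sketch in plain Python. It is not taken from the repository's training script. The hidden size of 768 and the rank of 8 are assumptions chosen for illustration, and the helper names are hypothetical. The sketch counts trainable parameters for one weight matrix and shows the low-rank update W' = W + (alpha / r) · B · A.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_in * d_out            # training the whole matrix
    lora = rank * (d_in + d_out)   # training only A (rank x d_in) and B (d_out x rank)
    return full, lora


def lora_effective_weight(w, b, a, alpha, rank):
    """Compute W' = W + (alpha / rank) * B @ A with nested lists.

    w: d_out x d_in frozen weight, b: d_out x rank, a: rank x d_in.
    """
    scale = alpha / rank
    d_out, d_in = len(w), len(w[0])
    return [
        [
            w[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(rank))
            for j in range(d_in)
        ]
        for i in range(d_out)
    ]


# Assumed sizes for illustration only: hidden size 768, LoRA rank 8.
full, lora = lora_param_counts(768, 768, 8)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")
```

With these assumed sizes, the adapter trains only a few percent of the parameters of each adapted matrix; the base weights stay frozen.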

Vulnerability Labels

Label     Description
CWE-119   Improper Restriction of Operations within the Bounds of a Memory Buffer
CWE-125   Out-of-bounds Read
CWE-20    Improper Input Validation
CWE-416   Use After Free
CWE-703   Improper Check or Handling of Exceptional Conditions
CWE-787   Out-of-bounds Write
safe      Safe code
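Classification heads work with integer class ids, so the seven labels are usually paired with an id mapping. The sketch below shows one way to build it. The index order is an assumption (it simply follows the table), and the order actually used by the training script may differ.

```python
# Label order is an assumption: it mirrors the table, not the training script.
LABELS = ["CWE-119", "CWE-125", "CWE-20", "CWE-416", "CWE-703", "CWE-787", "safe"]

# Two-way mappings between class ids and label names.
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}

print(id2label)
print(label2id)
```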

Repository Structure

.
├── data
│   ├── data_cleaning.ipynb                   # Notebook for cleaning and preparing the dataset
│   └── minified-diverseful-multilabels.parquet # Processed dataset for training
├── scripts
│   ├── torch_accelerate_lora.py                # Finetuning script using torch & accelerate frameworks
│   └── run_finetuning.sh                        # SLURM batch script to run training using sbatch
├── environment.yml                             # Environment configuration for micromamba/conda users
├── requirements.txt                            # Python dependencies for venv users
└── LICENSE                                     # MIT License

Getting Started

Environment Setup

You can set up your environment using one of the following methods:

Using Conda/Micromamba

  1. Ensure you have micromamba or conda installed.
  2. Create and activate the environment:
    micromamba env create -f environment.yml
    micromamba activate ThreatDetect-env
    (Alternatively, use conda env create -f environment.yml and conda activate ThreatDetect-env.)

Using Virtualenv

  1. Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate   # On Windows use: venv\Scripts\activate
  2. Install dependencies:
    pip install -r requirements.txt

Finetuning the Model

The main finetuning script, torch_accelerate_lora.py, is located in the scripts folder. It uses PyTorch and Hugging Face Accelerate to train the model with LoRA.

Running Training on SLURM

A SLURM batch file (run_finetuning.sh) is provided to run training on a cluster:

  1. Submit the job with:
    sbatch scripts/run_finetuning.sh
  2. Monitor the job logs for progress and accuracy metrics.
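If you prefer to submit the job from Python, for example as part of a larger experiment script, a small wrapper around sbatch could look like the sketch below. The helper names are hypothetical and not part of the repository. It relies on sbatch's usual "Submitted batch job <id>" confirmation line.

```python
import re
import subprocess


def parse_job_id(sbatch_stdout):
    """Extract the numeric job id from sbatch's confirmation line, or None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return int(match.group(1)) if match else None


def submit_job(script_path):
    """Submit a batch script with sbatch and return its job id."""
    result = subprocess.run(
        ["sbatch", script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch rejects the job
    )
    return parse_job_id(result.stdout)


# Usage on a cluster where sbatch is on PATH:
#   job_id = submit_job("scripts/run_finetuning.sh")
```

The returned id can then be passed to squeue or sacct to follow the job.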

Running Locally

If you wish to run training locally (without SLURM), execute:

python scripts/torch_accelerate_lora.py

Before launching, make sure your environment can see the device you intend to train on (GPU or CPU).
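A quick way to check which device is visible before starting a local run is sketched below. The pick_device helper is hypothetical and independent of the training script, which may handle device placement differently through Accelerate. It falls back to "cpu" when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing; report CPU so the caller can decide what to do.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print("Training would run on:", pick_device())
```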


Model & Dataset Details

Model

  • Base Model: Finetuned from answerdotai/ModernBERT-base
  • Training Method: LoRA applied to Q and V matrices
  • Classification: Detects code vulnerabilities across 7 labels (six CWE-based classes and 'safe')
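To illustrate how a 7-way classifier output turns into a prediction, the sketch below applies a softmax to a vector of logits and picks the most likely label. It assumes single-label classification and the label order shown in the code, neither of which is confirmed by the repository, and the logits are made-up numbers rather than real model output.

```python
import math

# Assumed label order, for illustration only.
LABELS = ("CWE-119", "CWE-125", "CWE-20", "CWE-416", "CWE-703", "CWE-787", "safe")


def softmax(logits):
    """Convert raw scores into probabilities, shifting by the max for stability."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits: the fourth entry is the largest.
label, prob = decode([0.1, 0.2, 0.0, 3.0, 0.1, 0.3, 0.5])
print(label, round(prob, 3))
```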

For further details and to explore the model, check out its Hugging Face Model Card.

Dataset


License

This project is licensed under the MIT License. See the LICENSE file for details.


Contributing

Contributions are welcome! Feel free to open issues or submit pull requests for improvements or bug fixes.

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and test them.
  4. Submit a pull request with a detailed description of your changes.

Acknowledgements


Developed by Abdellah Oumida and Mohammed Sbaihi.


Happy coding and safe programming!
