
Distributed Model Inference Examples on Databricks with Hugging Face and Accelerate

This repository provides tools and scripts for performing distributed model inference on Databricks using Hugging Face and Accelerate. The focus is on leveraging data parallelism and model parallelism to make efficient use of GPU resources.

This is not official Databricks documentation, nor are these official Databricks assets.

Models Used

  • Meta-Llama-3-8B-Instruct from Hugging Face, split across 2 V100 GPUs [16GB VRAM each]
  • Llama-3.3-70B-Instruct from Hugging Face, split across 2 A100 GPUs [80GB VRAM each]
  • Meta-Llama-3-8B-Instruct from Hugging Face, loaded onto 1 V100 with 8-bit quantization (see the quantized-loading sketch after this list)
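
As a minimal sketch of the 8-bit path above (illustrative only; the repo's notebooks may differ), the quantized load can be expressed with transformers' `BitsAndBytesConfig`. The model ID matches the list above; the prompt and generation settings are placeholders:

```python
# Minimal sketch: load Meta-Llama-3-8B-Instruct in 8-bit on a single GPU.
# Assumes transformers, accelerate, and bitsandbytes are installed, and that
# you have accepted the model's license on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # places the quantized model on the available GPU
)

inputs = tokenizer("What is data parallelism?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```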

Data Used

Concepts

Data Parallelism

Data parallelism involves splitting the input data across multiple devices: each device holds a full copy of the model and processes its portion of the data independently. This approach scales inference throughput, since batches of data are processed simultaneously.
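
As a hedged sketch of this pattern with Accelerate (not necessarily the exact code in this repo's notebooks), `PartialState.split_between_processes` shards a list of prompts across processes, with each GPU running a full model replica on its shard:

```python
# Minimal data-parallel inference sketch with Hugging Face Accelerate:
# every process holds a full model copy and handles a slice of the prompts.
# Launch with `accelerate launch script.py` on a multi-GPU node.
from accelerate import PartialState
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative; any causal LM works

state = PartialState()
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(state.device)

prompts = ["Explain GPUs.", "Explain TPUs.", "Explain CPUs.", "Explain RAM."]

# Each process receives roughly len(prompts) / num_processes of the inputs.
with state.split_between_processes(prompts) as shard:
    for prompt in shard:
        inputs = tokenizer(prompt, return_tensors="pt").to(state.device)
        outputs = model.generate(**inputs, max_new_tokens=32)
        print(f"[rank {state.process_index}]",
              tokenizer.decode(outputs[0], skip_special_tokens=True))
```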

Model Parallelism

Model parallelism involves splitting the model itself across multiple devices. This is useful for large models that cannot fit into the memory of a single device. By distributing the model, each device handles a portion of the model's layers or operations.
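
On a single multi-GPU node, a common way to get this (likely close to how the Llama models above are split across two GPUs, though the repo's notebooks may differ) is Accelerate's big-model inference via `device_map="auto"`, which assigns the model's layers across the visible GPUs:

```python
# Minimal model-parallelism sketch: device_map="auto" lets Accelerate shard
# the model's layers across all visible GPUs (spilling to CPU if needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative example model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # shard layers across e.g. cuda:0 and cuda:1
    torch_dtype=torch.float16,
)

print(model.hf_device_map)    # shows which device each block was placed on

inputs = tokenizer("Why split a model across GPUs?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```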

You can also use pipeline parallelism, where as many micro-batches are fed in as there are GPUs and each GPU works on its own chunk of the model at the same time. This keeps the GPUs busy instead of letting them idle while other model chunks execute during generation.
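
To make the scheduling concrete, here is a toy two-stage pipeline in plain PyTorch (an illustration of the idea only, not this repo's implementation): the model is split into two chunks on two devices, and micro-batches are staggered so both chunks can work concurrently:

```python
# Toy pipeline-parallelism schedule: two model chunks on two devices,
# with micro-batches staggered so both stages stay busy.
import torch
import torch.nn as nn

multi_gpu = torch.cuda.device_count() > 1
dev0 = torch.device("cuda:0" if multi_gpu else "cpu")
dev1 = torch.device("cuda:1" if multi_gpu else "cpu")

stage0 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev0)  # first chunk
stage1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU()).to(dev1)  # second chunk

batch = torch.randn(8, 512, device=dev0)
micro_batches = batch.chunk(2)   # one micro-batch in flight per stage

outputs, in_flight = [], None
with torch.no_grad():
    for mb in micro_batches:
        hidden = stage0(mb)                    # stage 0: newest micro-batch
        if in_flight is not None:
            outputs.append(stage1(in_flight))  # stage 1: previous micro-batch
        in_flight = hidden.to(dev1)
    outputs.append(stage1(in_flight))          # drain the pipeline

result = torch.cat([out.to(dev0) for out in outputs])
print(result.shape)  # torch.Size([8, 512])
```

Because CUDA kernel launches are asynchronous, the stage-0 and stage-1 work enqueued on different devices can genuinely overlap; for real models, libraries such as PyTorch's pipelining utilities automate this schedule.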


For more details, refer to the Hugging Face Accelerate Distributed Inference Guide.

Configuration

Before running the scripts, ensure you update the config.yaml file with the appropriate settings for your environment.
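
The exact schema of config.yaml depends on this repo's notebooks; the keys below are hypothetical placeholders, shown only to illustrate reading the file:

```python
# Hypothetical sketch of reading config.yaml; the key names here are
# assumptions for illustration, not the file's actual schema.
import yaml  # PyYAML, typically preinstalled on Databricks ML runtimes

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Pull out hypothetical settings, with defaults as a fallback.
model_id = config.get("model_id", "meta-llama/Meta-Llama-3-8B-Instruct")
max_new_tokens = config.get("max_new_tokens", 64)
print(model_id, max_new_tokens)
```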

Cluster Configurations

General Cluster Config

  • DBR Version: 16.3 ML: 16.3.x-gpu-ml-scala2.12
  • Instance Type: Standard_NC48ads_A100_v4 [A100]
  • Memory: 440GB
  • GPUs: 2
  • VRAM: 80GB x 2 GPUs

Cluster Config for Llama 8b

  • DBR Version: 16.3 ML: 16.3.x-gpu-ml-scala2.12
  • Instance Type: Standard_NC12s_v3 [V100]
  • Memory: 224GB
  • GPUs: 2
  • VRAM: 16GB x 2 GPUs

Ensure your cluster is configured according to the specifications above to achieve optimal performance.

License

This project is licensed under the Apache License.
