[WIP] Large scale rejection sampling with VLLM #18

Draft: wants to merge 17 commits into base: main

@msaroufim (Member) commented Jul 2, 2025

Not intended to merge.

This is a POC using the DeepSeek R1 70B distillation; for some reason I'm running into connectivity issues with Qwen for large-scale rejection sampling.

If you squint, it's a form of RL where you have a policy (generated kernels) with a tradeoff between explore (generate completely random samples) and exploit (generate using the best existing samples). It's not really gradient-based RL, but infra for that tends to be more complex, whereas with this approach you can rely purely on inference engines.
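A minimal sketch of what that explore/exploit choice could look like when building a prompt. This is not the code in this PR: `build_prompt`, the `epsilon` and `top_k` parameters, the prompt wording, and the `(score, source)` sample format are all hypothetical, chosen only to illustrate the idea.

```python
import random


def build_prompt(operation, best_samples, epsilon=0.3, top_k=2, rng=random):
    """Return a generation prompt for `operation`.

    best_samples: list of (score, kernel_source) pairs, higher score is better.
    epsilon: probability of exploring, i.e. ignoring existing samples.
    All names and the prompt text are illustrative assumptions.
    """
    # Explore: nothing to build on yet, or the coin flip says start fresh.
    if not best_samples or rng.random() < epsilon:
        return f"Write a kernel for `{operation}`."

    # Exploit: condition the model on the best-scoring samples so far.
    top = sorted(best_samples, key=lambda s: s[0], reverse=True)[:top_k]
    examples = "\n\n".join(source for _, source in top)
    return (
        f"Write a kernel for `{operation}`. "
        f"Improve on these working kernels:\n\n{examples}"
    )
```

Setting `epsilon=1.0` gives pure exploration and `epsilon=0.0` gives pure exploitation; anything in between mixes the two without needing gradients.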

python scripts/run_vllm_prototype.py --operations relu

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

The core idea is that we have:

  1. Workers running VLLM that generate samples
  2. Workers that execute those samples
  3. A coordinator running on CPU and collecting metrics
  4. Redis, used to queue requests and store metrics
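The data flow between those four pieces can be sketched roughly as follows. This is a single-process illustration, not the PR's implementation: `queue.Queue` and `Counter` stand in for Redis, the generator emits placeholder strings instead of calling VLLM, and every function name and dictionary key here is hypothetical.

```python
import queue
from collections import Counter


def generate(operation, n, requests):
    # Stand-in for a VLLM worker: enqueue n candidate samples.
    for i in range(n):
        requests.put({"op": operation, "sample_id": i,
                      "source": f"# candidate {i} for {operation}"})


def execute(requests, results, passes):
    # Stand-in for an executor worker: drain the queue, record pass/fail.
    while True:
        try:
            sample = requests.get_nowait()
        except queue.Empty:
            return
        results.put({**sample, "passed": bool(passes(sample))})


def coordinate(results):
    # Stand-in for the CPU coordinator: keep passing samples, count the rest.
    metrics, accepted = Counter(), []
    while True:
        try:
            result = results.get_nowait()
        except queue.Empty:
            return metrics, accepted
        metrics["executed"] += 1
        if result["passed"]:
            metrics["accepted"] += 1
            accepted.append(result)
        else:
            metrics["rejected"] += 1


def run_pipeline(operation, n, passes):
    # In-memory queues play the role that Redis plays in the description above.
    requests, results = queue.Queue(), queue.Queue()
    generate(operation, n, requests)
    execute(requests, results, passes)
    return coordinate(results)
```

For example, `run_pipeline("relu", 6, lambda s: s["sample_id"] % 2 == 0)` keeps the three even-numbered candidates and rejects the other three. The rejection step is just the `passed` filter in `coordinate`; in a real run the three roles would be separate processes talking through a shared queue.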

An obvious problem is that executor utilization is insanely low, but that might be unavoidable given noisy-neighbor problems.

Prototyping on 8 GPUs, but the goal is for this to run on 1,000.
[Image: Screenshot 2025-07-02 at 8 44 55 PM]

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jul 2, 2025
@msaroufim marked this pull request as draft July 2, 2025 16:51
@msaroufim changed the title from "[WIP] VLLM backend" to "[WIP] Large scale rejection sampling with VLLM" Jul 2, 2025