From f48dd5b92fcf040dce221ce8bec9e8ebfa4ef7eb Mon Sep 17 00:00:00 2001
From: Tong Li
Date: Wed, 28 May 2025 11:20:16 +0800
Subject: [PATCH 1/2] update readme

---
 .../ColossalChat/coati/distributed/README.md | 22 +++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/applications/ColossalChat/coati/distributed/README.md b/applications/ColossalChat/coati/distributed/README.md
index e0773d838d1a..123796844a74 100644
--- a/applications/ColossalChat/coati/distributed/README.md
+++ b/applications/ColossalChat/coati/distributed/README.md
@@ -2,6 +2,8 @@

 This repository implements a distributed Reinforcement Learning (RL) training framework designed to fine-tune large language models using algorithms such as **GRPO** and **DAPO**. It supports multi-node and multi-GPU setups, scalable rollout generation, and policy optimization using libraries like VLLM.

+**Please note that this project is still under intensive development; stay tuned.**
+
 ---

 ## 🚀 Features
@@ -85,6 +87,23 @@ export HCCL_SOCKET_IFNAME=eno0
 export RAY_COLLECTIVE_MEET_TIMEOUT_SECONDS=7200
 ```

+
+## Architecture Design
+
+
+

+ +

+
+Producer-Consumer Pattern: a classic software design pattern used for managing resources, data, or tasks between two different processes or threads.
+
+* Producer: the inference engine, which rolls out examples and saves them into a shared buffer.
+* Consumer: the training framework, which takes training examples from the shared buffer and trains the policy model.
+
+Key features of the Producer-Consumer Pattern:
+* Buffer: Acts as a shared queue where the producer adds data and the consumer removes data.
+* Concurrency: Rollout and training can run concurrently.
+
 ## 🧠 Data Format

 Each data sample in the training or evaluation `.jsonl` file should follow this format:
@@ -287,5 +306,4 @@ python rl_example.py
 ```

 ## Acknowledgement
-
----
+Colossal-RL is a distributed version of ColossalChat and is inspired by a few awesome open-source projects. We would like to express our gratitude to the Fuyao-ray team and the vllm-ascend team for their support throughout the development of this project. We also thank the following awesome open-source projects and algorithms: GRPO, DAPO, TRL, Verl, OpenRLHF, StreamRL, Qwen, Logic-RL.

From 621afa79d23a019e0f67773ca76690b3d21975b2 Mon Sep 17 00:00:00 2001
From: duanjunwen <935724073@qq.com>
Date: Wed, 28 May 2025 11:28:21 +0800
Subject: [PATCH 2/2] [fix] add vllm & vllm-ascend installation

---
 applications/ColossalChat/coati/distributed/README.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/applications/ColossalChat/coati/distributed/README.md b/applications/ColossalChat/coati/distributed/README.md
index 123796844a74..68c5e5c68dbe 100644
--- a/applications/ColossalChat/coati/distributed/README.md
+++ b/applications/ColossalChat/coati/distributed/README.md
@@ -30,6 +30,15 @@ pip install -e .
 cd ./applications/ColossalChat
 pip install -e .
 ```
+
+Install vllm and vllm-ascend
+```bash
+apt update -y
+apt install -y libnuma-dev
+pip install vllm==0.7.3
+pip install vllm-ascend==0.7.3 --extra-index https://download.pytorch.org/whl/cpu/
+```
+
 Install Fuyao Ray. Please update CANN before install fuyao ray
 ```bash