Merged

115 commits
17c23cd
add llama3.2 GPU example (#12137)
ch1y0q Sep 29, 2024
9b75806
Update Windows GPU quickstart regarding demo (#12124)
ch1y0q Sep 29, 2024
644af2a
add basic llama 3.2 vision support (#12163)
MeouSker77 Oct 8, 2024
e2ef9e9
Delete deprecated docs/readthedocs directory (#12164)
liu-shaojun Oct 8, 2024
3d044db
add llama3.2-vision Pytorch example (#12165)
lzivan Oct 9, 2024
412cf8e
[UPDATE] update mddocs/DockerGuides/vllm_docker_quickstart.md (#12166)
ACupofAir Oct 9, 2024
78d2531
optimize qwen2 vl perf again (#12167)
MeouSker77 Oct 9, 2024
aef1f67
Support LNL Windows release (#12169)
Oscilloscope98 Oct 9, 2024
535bee5
fix qwen2 vl again (#12174)
MeouSker77 Oct 10, 2024
8906626
Fix auto importer for LNL release (#12175)
Oscilloscope98 Oct 10, 2024
0ef7e1d
fix vllm docs (#12176)
gc-fu Oct 10, 2024
ac44e98
Update Windows guide regarding LNL support (#12178)
Oscilloscope98 Oct 11, 2024
4d93bb8
Initial support of NPU level0 Model (#12177)
rnwang04 Oct 11, 2024
724b2ae
add npu-level0 pipeline.dll to ipex-llm (#12181)
liu-shaojun Oct 11, 2024
1daab45
Upgrade oneccl to 0.0.4 in serving-xpu image (#12185)
liu-shaojun Oct 11, 2024
310f18c
update NPU pipeline generate (#12182)
rnwang04 Oct 11, 2024
f983f1a
Add Qwen2-VL gpu example (#12135)
ada-jt1725 Oct 11, 2024
ddcdf47
Support Windows ARL release (#12183)
Oscilloscope98 Oct 11, 2024
6ffaec6
[UPDATE] add prefix caching document into `vllm_docker_quickstart.md`…
ACupofAir Oct 11, 2024
49eb206
add --blocksize to doc and script (#12187)
liu-shaojun Oct 12, 2024
a768d71
Small fix to LNL installation guide (#12192)
Oscilloscope98 Oct 14, 2024
8e35800
Add llama 3.1 in igpu perf (#12194)
JinBridger Oct 14, 2024
7d80db7
Add benchmark_util for `transformers >= 4.44.0` (#12171)
lzivan Oct 14, 2024
7da3ab7
Add missing link for Llama3.2-Vision (#12197)
Oscilloscope98 Oct 14, 2024
516b578
Support cpp release for ARL on Windows (#12189)
Oscilloscope98 Oct 14, 2024
f8d1adc
Fix Llama 3.2 & 3.1 on LNL (#12196)
Oscilloscope98 Oct 14, 2024
d534458
optimize internvl2 vision model's attention (#12198)
MeouSker77 Oct 15, 2024
9b81236
optimize qwen2-vl vision (#12203)
MeouSker77 Oct 15, 2024
f6611f9
optimize llama3.2 vision attention again (#12204)
MeouSker77 Oct 15, 2024
c9ac39f
Add Llama 3.2 to iGPU performance test (`transformers 4.45`) (#12209)
Oscilloscope98 Oct 15, 2024
f17cc4f
feat: add llama3.2-11b-vision in all in one (#12207)
cranechu0131 Oct 16, 2024
e279148
optimize llama3.2 vision again (#12211)
MeouSker77 Oct 16, 2024
bb247e9
refactor merge_qkv and attention_softmax (#12213)
MeouSker77 Oct 16, 2024
9104a16
refactor phi-2 to reduce old fuse rope usage (#12214)
MeouSker77 Oct 16, 2024
a4a7586
refactor gemma to reduce old fuse rope usage (#12215)
MeouSker77 Oct 16, 2024
26390f9
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217)
liu-shaojun Oct 17, 2024
667f0db
Update Eagle example to Eagle2+ipex-llm integration (#11717)
jenniew Oct 17, 2024
324bcb0
refactor to reduce old rope usage (#12219)
MeouSker77 Oct 17, 2024
9ea6944
refactor to remove old rope usage (#12224)
MeouSker77 Oct 17, 2024
b88c1df
Add Llama 3.1 & 3.2 to Arc Performance test (#12225)
Oscilloscope98 Oct 17, 2024
7825dc1
Upgrade oneccl to 0.0.5 (#12223)
liu-shaojun Oct 18, 2024
fe3b5cd
[Update] mmdocs/dockerguide vllm-quick-start awq,gptq online serving …
ACupofAir Oct 18, 2024
b10fc89
Update new reference link of xpu/docker/readme.md (#12188)
ACupofAir Oct 18, 2024
9d7f42f
Support manually trigger of dGPU perf test on Windows (#12229)
Oscilloscope98 Oct 18, 2024
ef65962
Small update to Windows dGPU perf test (#12230)
Oscilloscope98 Oct 18, 2024
5935b25
Further update windows gpu perf test regarding results integrity chec…
Oscilloscope98 Oct 18, 2024
da9270b
Further update to Windows dGPU perf test (#12233)
Oscilloscope98 Oct 18, 2024
ea5154d
Further update to Windows dGPU perf test (#12237)
Oscilloscope98 Oct 21, 2024
ac2dac8
Disable 4k input test for now for Windows dGPU performance test (#12239)
Oscilloscope98 Oct 21, 2024
b3df474
Fix Gemma 2 on LNL (#12240)
Oscilloscope98 Oct 21, 2024
a35cf4d
Update README.md (#12242)
jason-dai Oct 22, 2024
d8c1287
Further update for Windows dGPU performance tests (#12244)
Oscilloscope98 Oct 22, 2024
ec465fb
Add lookup generate in load_low_bit (#12243)
cyita Oct 22, 2024
8fa98e2
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimen…
JinBridger Oct 22, 2024
aedc4ed
[ADD] add open webui + vllm serving (#12246)
ACupofAir Oct 23, 2024
e37f951
[NPU] Groupwise (#12241)
cyita Oct 23, 2024
aae2490
fix UT (#12247)
liu-shaojun Oct 23, 2024
e8cf7f3
npu gw small fix (#12249)
cyita Oct 23, 2024
88dc120
fix fp16 linear (#12250)
MeouSker77 Oct 23, 2024
578aef2
Fix models auto choose SdpaAttention with ipex 2.3 (#12252)
MeouSker77 Oct 23, 2024
567b77a
Support IR and blob format for llama level0 pipeline (#12251)
plusbang Oct 23, 2024
b685cf4
Fix npu group size setting of optimize_model=False (#12256)
plusbang Oct 23, 2024
cacc891
Fix PR validation (#12253)
MeouSker77 Oct 23, 2024
821fd96
Initial integration of our L0 Llama impl into ipex-llm (#12255)
rnwang04 Oct 24, 2024
f3a2b20
Optimize gpt2 (#12259)
MeouSker77 Oct 24, 2024
39c9d1d
fix code geex (#12261)
qiuxin2012 Oct 24, 2024
e0a95eb
Add llama_cpp_quickstart.zh-CN.md (#12221)
joan726 Oct 24, 2024
48fc638
use oneccl 0.0.5.1 (#12262)
liu-shaojun Oct 24, 2024
b5e6638
[NPU] Support llama groupwise (#12260)
cyita Oct 24, 2024
ae57e23
fix incompatibility between llama GW & llama pipeline (#12267)
rnwang04 Oct 25, 2024
f7f62a3
Add OpenVINO performance tests to all-in-one benchmark (#12238)
lzivan Oct 25, 2024
93895b2
Openvino all in one benchmark small fix (#12269)
Oscilloscope98 Oct 25, 2024
94c4568
Update windows installation guide regarding troubleshooting (#12270)
Oscilloscope98 Oct 25, 2024
43b25a2
Fix llama 3.2 vision on LNL (#12264)
Oscilloscope98 Oct 25, 2024
e713296
Update all-in-one benchmark (#12272)
Oscilloscope98 Oct 25, 2024
854398f
update example to reduce peak memory usage (#12274)
rnwang04 Oct 25, 2024
a0c6432
[NPU] Add support for loading a FunASR model (#12073)
sgwhat Oct 25, 2024
08cb065
hot-fix redundant import funasr (#12277)
sgwhat Oct 25, 2024
ec362e6
Add llama3 level0 example (#12275)
plusbang Oct 28, 2024
16074ae
Update Linux prerequisites installation guide for MTL iGPU (#12263)
Oscilloscope98 Oct 28, 2024
42a528d
Small update to MTL iGPU Linux Prerequisites installation guide (#12281)
Oscilloscope98 Oct 28, 2024
3fe2ea3
[NPU] Reuse prefill of acc lib for pipeline (#12279)
rnwang04 Oct 28, 2024
67014cb
Add benchmark_latency.py to docker serving image (#12283)
gc-fu Oct 28, 2024
1cef0c4
Update README.md (#12286)
jason-dai Oct 28, 2024
4467645
[NPU] Support l0 Llama groupwise (#12276)
cyita Oct 28, 2024
821b003
[NPU L0] update layernorm & code refactor (#12287)
rnwang04 Oct 29, 2024
0bbc04b
Add ollama_quickstart.zh-CN.md (#12284)
joan726 Oct 29, 2024
3700e81
[fix] vllm-online-benchmark first token latency error (#12271)
ACupofAir Oct 29, 2024
546f455
Patch sdpa check function in specific module attributes table (#12285)
Oct 29, 2024
3feb58d
Support baichuan2 for level0 pipeline (#12289)
plusbang Oct 29, 2024
5a15098
Initial support for quantized forward on CPU when `quantization_group…
Oscilloscope98 Oct 29, 2024
2b2cb9c
[NPU pipeline] Support save & load and update examples (#12293)
rnwang04 Oct 30, 2024
540eaeb
refactor attention_softmax (#12295)
MeouSker77 Oct 30, 2024
70037ad
Groupwise prefill optimization (#12291)
cyita Oct 30, 2024
46d8300
bugfix for qlora finetuning on GPU (#12298)
ada-jt1725 Oct 30, 2024
41b8064
Support minicpm-1B in level0 pipeline (#12297)
plusbang Oct 30, 2024
0763268
[NPU]Qwen2 groupwise performance opt (#12299)
cyita Oct 30, 2024
6f22133
Update AWQ and GPTQ GPU example (#12300)
lzivan Oct 31, 2024
29400e2
feat: change oneccl to internal (#12296)
cranechu0131 Oct 31, 2024
4cf1ccc
Update DPO README.md (#12162)
rahulunair Oct 31, 2024
416c191
Add Qwen pipeline and example (#12292)
hkvision Oct 31, 2024
72605c7
fix llama3.1/3.2 quantize kv check (#12302)
MeouSker77 Oct 31, 2024
97a0f7f
Codegeex support (#12303)
qiuxin2012 Oct 31, 2024
30f668c
updated transformers & accelerate requirements (#12301)
ada-jt1725 Oct 31, 2024
4892df6
Add qwen2-1.5b in l0 pipeline example (#12306)
plusbang Oct 31, 2024
3df6195
Fix application quickstart (#12305)
JinBridger Oct 31, 2024
b9853f9
fix qwen2 attention_mask slice (#12307)
MeouSker77 Oct 31, 2024
eda7649
Add minicpm-2b in L0 pipeline (#12308)
plusbang Nov 1, 2024
05c5d02
[NPU] Llama2 prefill use ov sdp (#12310)
cyita Nov 1, 2024
126f95b
Fix DPO finetuning example (#12313)
JinBridger Nov 1, 2024
d409d9d
[NPU L0] Update streaming mode of example (#12312)
plusbang Nov 1, 2024
f53bb4e
[NPU L0] Update 1st token generation (#12314)
plusbang Nov 1, 2024
cd5e22c
Update Llava GPU Example (#12311)
lzivan Nov 1, 2024
48123af
add `npu_group_size` for `transformers_int4_npu_win` in all-in-one be…
ch1y0q Nov 1, 2024
20755e8
Small fix to all-in-one benchmark scripts (#12317)
Oscilloscope98 Nov 1, 2024
2 changes: 2 additions & 0 deletions .github/actions/llm/download-llm-binary/action.yml
@@ -27,6 +27,7 @@ runs:
mv windows-avx2/* python/llm/llm-binary/
mv windows-avx-vnni/* python/llm/llm-binary/
mv windows-avx/* python/llm/llm-binary/
mv windows-npu-level0/* python/llm/llm-binary/
fi
rm -rf linux-avx2 || true
rm -rf linux-avx512 || true
@@ -36,3 +37,4 @@
rm -rf windows-avx2 || true
rm -rf windows-avx-vnni || true
rm -rf windows-avx || true
rm -rf windows-npu-level0 || true
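
The two lines added above extend the action's existing per-platform staging pattern: each downloaded artifact directory is merged into python/llm/llm-binary/ inside the Windows branch, and the staging directory is then removed with `|| true` so its absence never fails the step. A minimal shell sketch of the same pattern for a hypothetical extra platform (the directory name windows-new-platform is a placeholder, not part of this PR):

    # Hedged sketch of the staging pattern; "windows-new-platform" is hypothetical.
    if [ -d windows-new-platform ]; then
        mv windows-new-platform/* python/llm/llm-binary/
    fi
    # Clean up the staging directory; tolerate it being absent.
    rm -rf windows-new-platform || true
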
56 changes: 56 additions & 0 deletions .github/workflows/llm-binary-build.yml
@@ -443,6 +443,62 @@ jobs:
path: |
release

check-windows-npu-level0-artifact:
if: ${{contains(inputs.platform, 'Windows')}}
runs-on: [Shire]
outputs:
if-exists: ${{steps.check_artifact.outputs.exists}}
steps:
- name: Check if built
id: check_artifact
uses: xSAVIKx/artifact-exists-action@v0
with:
name: windows-npu-level0

windows-build-npu-level0:
runs-on: [self-hosted, Windows, npu-level0]
needs: check-windows-npu-level0-artifact
if: needs.check-windows-npu-level0-artifact.outputs.if-exists == 'false'
steps:
- name: Set access token
run: |
echo "github_access_token=$env:GITHUB_ACCESS_TOKEN" >> $env:GITHUB_ENV
echo "github_access_token=$env:GITHUB_ACCESS_TOKEN"
- uses: actions/checkout@f43a0e5ff2bd294095638e18286ca9a3d1956744 # actions/checkout@v3
with:
repository: "intel-analytics/llm.cpp"
ref: ${{ inputs.llmcpp-ref }}
token: ${{ env.github_access_token }}
submodules: "recursive"
- name: Add msbuild to PATH
uses: microsoft/[email protected]
with:
msbuild-architecture: x64
- name: Add cmake to PATH
uses: ilammy/msvc-dev-cmd@v1
- name: Build binary
shell: powershell
run: |
cd bigdl-core-npu-level0
mkdir build
cd build
cmake ..
cmake --build . --config Release -j
- name: Move release binary
shell: powershell
run: |
cd bigdl-core-npu-level0
if (Test-Path ./release) { rm -r -fo release }
mkdir release
mv build/Release/pipeline.dll release/pipeline.dll
- name: Archive build files
uses: actions/upload-artifact@v3
with:
name: windows-npu-level0
path: |
bigdl-core-npu-level0/release


# to make llm-binary-build optionally skippable
dummy-step:
if: ${{ inputs.platform == 'Dummy' }}
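
Taken together, the workflow additions implement an artifact-gated build: check-windows-npu-level0-artifact probes for an existing windows-npu-level0 artifact with the third-party xSAVIKx/artifact-exists-action, and windows-build-npu-level0 runs only when that probe reports 'false', checking out intel-analytics/llm.cpp, building pipeline.dll with CMake and MSBuild, and uploading the result under the same artifact name so subsequent runs skip the rebuild. A condensed sketch of this gating pattern, with job and step names shortened for illustration (not the literal workflow text):

    jobs:
      check-artifact:
        runs-on: [Shire]
        outputs:
          if-exists: ${{ steps.check.outputs.exists }}
        steps:
          # Probe whether the artifact already exists for this build.
          - id: check
            uses: xSAVIKx/artifact-exists-action@v0
            with:
              name: windows-npu-level0

      build:
        needs: check-artifact
        # Skipped entirely when the artifact was already produced.
        if: needs.check-artifact.outputs.if-exists == 'false'
        runs-on: [self-hosted, Windows, npu-level0]
        steps:
          - name: Build binary
            shell: powershell
            run: |
              cd bigdl-core-npu-level0
              mkdir build; cd build
              cmake ..
              cmake --build . --config Release -j
          - name: Upload artifact
            uses: actions/upload-artifact@v3
            with:
              name: windows-npu-level0
              path: bigdl-core-npu-level0/release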