Releases: aws-neuron/aws-neuron-sdk
Neuron SDK Release - October 29, 2025
Overview
Release 2.26.1 of the AWS Neuron SDK includes bug fixes applied to the AWS Neuron SDK v2.26.0. See the Neuron SDK v2.26.0 release notes for the full set of changes that shipped with the 2.26.0 release.
Bug fixes in this release
Fix: This release addresses out-of-memory errors in torch-neuronx by enabling direct memory allocation through the Neuron Runtime API.
Resources
For the set of SDK package version changes in 2.26.1, see Release Content.
Neuron SDK Release - September 18, 2025
AWS Neuron SDK 2.26.0 adds support for PyTorch 2.8, JAX 0.6.2, and Python 3.11, and introduces inference improvements on Trainium2 (Trn2). This release includes expanded model support, enhanced parallelism features, new Neuron Kernel Interface (NKI) APIs, and improved development tools for optimization and profiling.
Inference Updates
NxD Inference - Model support expands with beta releases of Llama 4 Scout and Maverick variants on Trn2. The FLUX.1-dev image generation models are now available in beta on Trn2 instances.
Expert parallelism is now supported in beta, enabling MoE expert distribution across multiple NeuronCores. This release introduces on-device forward pipeline execution in beta and adds sequence parallelism in MoE routers for model deployment flexibility.
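The MoE routing that expert parallelism distributes can be sketched in a few lines: a router scores each token against the experts, the top-k experts process the token, and their outputs are combined by the router probabilities. The NumPy sketch below is illustrative only (toy shapes and names, not the NxD Inference API); under expert parallelism each entry of the `experts` list would live on a different NeuronCore.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, top_k = 8, 16, 4, 2

tokens = rng.standard_normal((n_tokens, d_model))
router_w = rng.standard_normal((d_model, n_experts))
# One weight matrix per expert; with expert parallelism these would be
# sharded across NeuronCores (shown here as a plain Python list).
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

# Route: softmax over expert logits, keep the top-k experts per token.
logits = tokens @ router_w
top = np.argsort(-logits, axis=1)[:, :top_k]               # (n_tokens, top_k)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

out = np.zeros_like(tokens)
for e in range(n_experts):
    mask = (top == e).any(axis=1)                          # tokens routed to expert e
    if mask.any():
        w = probs[mask, e:e + 1]                           # router weight (renorm over top-k omitted)
        out[mask] += w * (tokens[mask] @ experts[e])
```

Because each token touches only `top_k` of the `n_experts` weight matrices, total compute stays roughly constant as experts are added, which is what makes distributing experts across cores attractive.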
Neuron Kernel Interface (NKI)
New APIs enable additional optimization capabilities:
- gelu_apprx_sigmoid: GELU activation with sigmoid approximation
- select_reduce: Selective element copying with maximum reduction
- sequence_bounds: Sequence bounds computation
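The math behind a sigmoid-approximated GELU, as named by gelu_apprx_sigmoid, is the standard substitution of x·σ(1.702x) for the exact Gaussian-CDF form x·Φ(x). The NumPy sketch below illustrates only that approximation and its error, not the NKI implementation:

```python
import math
import numpy as np

def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF (via erf)
    return x * 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def gelu_sigmoid_approx(x):
    # Sigmoid approximation: x * sigmoid(1.702 * x)
    return x / (1.0 + np.exp(-1.702 * x))

x = np.linspace(-4.0, 4.0, 101)
err = np.max(np.abs(gelu_exact(x) - gelu_sigmoid_approx(x)))
# The approximation error stays small (on the order of 1e-2) across this range.
print(f"max |exact - approx| on [-4, 4]: {err:.4f}")
```

The sigmoid form trades a small pointwise error for a much cheaper activation, since a sigmoid maps directly onto a single hardware activation-function pass.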
API enhancements include:
- tile_size: Added total_available_sbuf_size field
- dma_transpose: Added axes parameter for 4D transpose
- activation: Added gelu_apprx_sigmoid operation
Developer Tools
Neuron Profiler improvements include the ability to select multiple semaphores at once to correlate pending activity with semaphore waits and increments. Additionally, system profile grouping now uses a global NeuronCore ID instead of a process local ID for visibility across distributed workloads. The Profiler also adds warnings for dropped events due to limited buffer space.
The nccom-test utility adds State Buffer support on Trn2 for collective operations, including all-reduce, all-gather, and reduce-scatter. Error reporting now provides messages for invalid all-to-all collective sizes to help developers identify and resolve issues.
Deep Learning AMI and Containers
The Deep Learning AMI now supports PyTorch 2.8 on Amazon Linux 2023 and Ubuntu 22.04. Container updates include PyTorch 2.8.0 and Python 3.11 across all DLCs. The transformers-neuronx environment and package have been removed from PyTorch inference DLAMI/DLC.
Component release highlights
These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.
Neuron SDK Release - July 31, 2025
Neuron 2.25.0 delivers updates across several key areas: inference performance optimizations, expanded model support, enhanced profiling capabilities, improved monitoring and observability tools, framework updates, and refreshed development environments and container offerings. The release includes bug fixes across the SDK components, along with updated tutorials and documentation for new features and model deployments.
Inference Optimizations (NxD Core and NxDI)
Neuron 2.25.0 introduces performance optimizations and new capabilities including:
- On-device Forward Pipeline, reducing latency by up to 43% in models like Pixtral
- Context and Data Parallel support for improved batch scaling
- Chunked Attention for efficient long sequence processing
- 128k context length support for Llama 70B models
- Automatic Aliasing (Beta) for faster tensor operations
- Disaggregated Serving (Beta) showing 20% improvement in ITL/TTST
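Of the optimizations above, chunked attention is the most mechanical: keys and values are processed in fixed-size chunks with running softmax statistics, so the full seq x seq score matrix is never materialized. The NumPy sketch below shows the online-softmax accumulation idea and checks it against full attention; it is illustrative only, not the NxD Inference kernel.

```python
import numpy as np

def full_attention(q, k, v):
    # Reference: materializes the full score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def chunked_attention(q, k, v, chunk=32):
    # Online-softmax accumulation over key/value chunks: carry a running
    # max (m), running normalizer (l), and unnormalized output (acc).
    d = q.shape[-1]
    m = np.full((q.shape[0], 1), -np.inf)
    l = np.zeros((q.shape[0], 1))
    acc = np.zeros((q.shape[0], v.shape[-1]))
    for i in range(0, k.shape[0], chunk):
        s = q @ k[i:i + chunk].T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)          # rescale previous partial sums
        p = np.exp(s - m_new)
        l = l * scale + p.sum(axis=-1, keepdims=True)
        acc = acc * scale + p @ v[i:i + chunk]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
assert np.allclose(chunked_attention(q, k, v), full_attention(q, k, v))
```

Because peak memory scales with the chunk size rather than the sequence length, this is what makes long-context cases like the 128k Llama configurations tractable.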
Model Support (NxDI)
Neuron 2.25.0 expands model support to include:
- Qwen3 dense models (0.6B to 32B parameters)
- Flux.1-dev model for text-to-image generation (Beta)
- Pixtral-Large-Instruct-2411 for image-to-text generation (Beta)
Profiling Updates
Enhancements to profiling capabilities include:
- Addition of timestamp sync points to align device execution with CPU events
- Expanded JSON output providing the same detailed data set used by the Neuron Profiler UI
- New total active time metric showing accelerator utilization as percentage of total runtime
- Fixed DMA active time calculation for more accurate measurements
- neuron-ls now displays CPU and NUMA node affinity information
- neuron-ls adds NeuronCore IDs display for each Neuron Device
- neuron-monitor improves accuracy of device utilization metrics
- JAX 0.6.1 support added, maintaining compatibility with versions 0.4.31-0.4.38 and 0.5
- vLLM support upgraded to version 0.9.x V0
Development Environment Updates
Neuron SDK updated to version 2.25.0 in:
- Deep Learning AMIs on Ubuntu 22.04 and Amazon Linux 2023
- Multi-framework DLAMI with environments for both PyTorch and JAX
- PyTorch 2.7 Single Framework DLAMI
- JAX 0.6 Single Framework DLAMI
Container Support
Neuron SDK updated to version 2.25.0 in:
- PyTorch 2.7 Training and Inference DLCs
- JAX 0.6 Training DLC
- vLLM 0.9.1 Inference DLC
- Neuron Device Plugin and Scheduler container images for Kubernetes integration
Neuron SDK Release - June 24, 2025
Neuron version 2.24 introduces new inference capabilities including prefix caching, disaggregated inference (Beta), and context parallelization support (Beta). This release also includes NKI language enhancements and enhanced profiling visualizations for improved debugging and performance analysis. Neuron 2.24 adds support for PyTorch 2.7 and JAX 0.6, updates existing DLAMIs and DLCs, and introduces a new vLLM inference container.
Neuron SDK Release - May 20, 2025
With the Neuron 2.23 release, we move the NxD Inference (NxDI) library out of beta. It is now recommended for all multi-chip inference use cases. In addition, Neuron has new training capabilities, including Context Parallelism and ORPO, NKI improvements (new operators and ISA features), and new Neuron Profiler debugging and performance analysis optimizations. Finally, Neuron now supports PyTorch 2.6 and JAX 0.5.3.
Inference: NxD Inference (NxDI) moves from beta to GA. NxDI now supports Persistent Cache to reduce compilation times, and optimizes model loading with improved weight sharding performance.
Training: NxD Training (NxDT) adds Context Parallelism support (beta) for Llama models, enabling sequence lengths up to 32K. NxDT now supports model alignment via ORPO using DPO-style datasets. NxDT has also upgraded its support for third-party libraries, specifically PyTorch Lightning 2.5, Transformers 4.48, and NeMo 2.1.
Neuron Kernel Interface (NKI): New support for 32-bit integer nki.language.add and nki.language.multiply on GPSIMD Engine. NKI.ISA improvements include range_select for Trainium2, fine-grained engine control, and enhanced tensor operations. New performance tuning API no_reorder has been added to enable user-scheduling of instructions. When combined with allocation, this enables software pipelining. Language consistency has been improved for arithmetic operators (+=, -=, /=, *=) across loop types, PSUM, and SBUF.
Neuron Profiler: Profiling performance has improved, allowing users to view profile results 5x faster on average. New features include timeline-based error tracking and JSON error event reporting, supporting execution and OOB error detection. Additionally, this release improves multiprocess visualization with Perfetto.
Neuron Monitoring: Added Kubernetes context information (pod_name, namespace, and container_name) to neuron monitor prometheus output, enabling resource utilization tracking by pod, namespace, and container.
Neuron DLCs: This release updates containers with PyTorch 2.6 support for inference and training. For JAX DLC, this release adds JAX 0.5.0 training support.
Neuron DLAMIs: This release updates MultiFramework AMIs to include PyTorch 2.6, JAX 0.5, and TensorFlow 2.10 and Single Framework AMIs for PyTorch 2.6 and JAX 0.5.
Neuron SDK Release - May 12, 2025
Neuron 2.22.1 release includes a Neuron Driver update that resolves DMA abort errors on Trainium2 devices. These errors were previously occurring in the Neuron Runtime during specific workload executions.
Neuron SDK Release - April 3, 2025
The Neuron 2.22 release includes performance optimizations, enhancements and new capabilities across the Neuron software stack.
For inference workloads, the NxD Inference library now supports the Llama-3.2-11B model and multi-LoRA serving, allowing customers to load and serve multiple LoRA adapters. Flexible quantization features have been added, enabling users to specify which model layers or NxDI modules to quantize. Asynchronous inference mode has also been introduced, improving performance by overlapping input preparation with model execution.
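The mechanics of multi-LoRA serving are that a single base weight is shared while each request selects a low-rank adapter pair (B, A), so the served projection becomes y = Wx + scale·B(Ax). The NumPy sketch below illustrates only that arithmetic; the adapter names and registry are hypothetical, not the NxDI serving API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 32, 32, 4

W = rng.standard_normal((d_out, d_in))  # shared base weight, loaded once

# Hypothetical adapter registry: each adapter is a low-rank (B, A) pair,
# far smaller than W (d_out*rank + rank*d_in vs d_out*d_in parameters).
adapters = {
    name: (rng.standard_normal((d_out, rank)) * 0.01,
           rng.standard_normal((rank, d_in)) * 0.01)
    for name in ("adapter_math", "adapter_code")
}

def lora_forward(x, adapter_name, scale=1.0):
    # y = W x + scale * B (A x): base projection plus low-rank update
    B, A = adapters[adapter_name]
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
y_math = lora_forward(x, "adapter_math")
y_code = lora_forward(x, "adapter_code")
```

Because only the small (B, A) pairs differ per adapter, many adapters can stay resident and be switched per request without reloading the base model.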
For training, we added LoRA supervised fine-tuning to NxD Training to enable additional model customization and adaptation.
Neuron Kernel Interface (NKI): This release adds new APIs in nki.isa, nki.language, and nki.profile. These enhancements provide customers with greater flexibility and control.
The updated Neuron Runtime includes optimizations for reduced latency and improved device memory footprint. On the tooling side, the Neuron Profiler 2.0 (beta) has added UI enhancements and new event type support.
Neuron DLCs: This release reduces DLC image size by up to 50% and enables faster build times with an updated Dockerfile structure. On the Neuron DLAMI side, new PyTorch 2.5 single framework DLAMIs have been added for Ubuntu 22.04 and Amazon Linux 2023, along with several new virtual environments within the Neuron Multi Framework DLAMIs.
Neuron SDK Release - January 14, 2025
Neuron 2.21.1 release pins Transformers NeuronX dependency to transformers<4.48 and fixes DMA abort errors on Trn2.
Additionally, this release addresses NxD Core and Training improvements, including fixes for sequence parallel support in quantized models and a new flag for dtype control in Llama3/3.1 70B configurations. See NxD Training Release Notes (neuronx-distributed-training) for details.
NxD Inference update includes minor bug fixes for sampling parameters. See NxD Inference Release Notes.
Neuron supported DLAMIs and DLCs have been updated to Neuron 2.21.1 SDK. Users should be aware of an incompatibility between Tensorflow-Neuron 2.10 (Inf1) and Neuron Runtime 2.21 in DLAMIs, which will be addressed in the next minor release. See Neuron DLAMI Release Notes.
The Neuron Compiler includes bug fixes and performance enhancements specifically targeting the Trn2 platform.
Neuron SDK Release - December 20, 2024
Overview: Neuron 2.21.0 introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer. The release adds new capabilities for both training and inference of large-scale models. It introduces NxD Inference (beta), a PyTorch-based library for model deployment; Neuron Profiler 2.0 (beta); PyTorch 2.5 support across the Neuron SDK; and Logical NeuronCore Configuration (LNC) for optimizing NeuronCore allocation. The release enables Llama 3.1 405B model inference on a single trn2.48xlarge instance.
NxD Inference: NxD Inference (beta) is a new PyTorch-based inference library for deploying large-scale models on AWS Inferentia and Trainium instances. It enables PyTorch model onboarding with minimal code changes and integrates with vLLM. NxDI supports various model architectures, including Llama versions for text processing (Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 3.3), Llama 3.2 multimodal for multimodal tasks, and Mixture-of-Experts (MoE) model architectures including Mixtral and DBRX. The library supports quantization methods, includes dynamic sampling, and is compatible with HuggingFace checkpoints and the generate() API. NxDI also supports distributed strategies including tensor parallelism and incorporates speculative decoding techniques (Draft model and EAGLE). The release includes a sample demonstrating Llama 3.1 405B model inference on a single trn2.48xlarge instance.
For more information, see NxD Inference documentation and check the NxD Inference Github repository: aws-neuron/neuronx-distributed-inference
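Draft-model speculative decoding, mentioned above, works by letting a cheap draft model propose several tokens that the target model then verifies in one pass; with greedy verification the output is provably identical to decoding with the target model alone. The toy sketch below demonstrates that lossless property with deterministic stand-in "models" (hypothetical names, not the NxDI API).

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16

def greedy_next(weights, ctx):
    # Toy deterministic "model": next token from a hash of the context.
    return int(weights[sum(ctx) % VOCAB])

target_w = rng.integers(0, VOCAB, VOCAB)
draft_w = target_w.copy()
draft_w[::3] = rng.integers(0, VOCAB, len(draft_w[::3]))  # draft sometimes disagrees

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft cheaply proposes k tokens.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = greedy_next(draft_w, ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target verifies; keep the agreeing prefix, then one correction.
        ctx = list(out)
        for t in draft:
            target_t = greedy_next(target_w, ctx)
            out.append(target_t)
            ctx.append(target_t)
            if target_t != t or len(out) - len(prompt) >= n_tokens:
                break
    return out[len(prompt):len(prompt) + n_tokens]

def target_only(prompt, n_tokens):
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(greedy_next(target_w, out))
    return out[len(prompt):]
```

Every emitted token comes from the target model's own greedy choice, so the speedup comes purely from verifying several draft tokens per target pass rather than from changing the output distribution.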
Transformers NeuronX (TNx): This release introduces several new features, including flash decoding support for speculative decoding, and on-device generation in speculative decoding flows. It adds Eagle speculative decoding with greedy and lossless sampling, as well as support for CPU compilation and sharded model saving. Performance improvements include optimized MLP and QKV for Llama models with sequence parallel norm and control over concurrent compilation workers.
Training Highlights: NxD Training in this release adds support for HuggingFace Llama3/3.1 70B on trn2 instances, introduces DPO support for post-training model alignment, and adds support for Mixture-of-Experts (MoE) models including Mixtral 7B. The release includes improved checkpoint conversion capabilities and supports MoE with Tensor, Sequence, Pipeline, and Expert parallelism.
ML Frameworks: Neuron 2.21.0 adds PyTorch 2.5 support, which brings improved eager mode, FP8, and Automatic Mixed Precision capabilities. JAX support extends to version 0.4.35, including support for JAX caching APIs.
Logical NeuronCore Configuration (LNC): This release introduces LNC for Trainium2 instances, optimizing NeuronCore allocation for ML applications. LNC offers two configurations: default (LNC=2) combining two physical cores, and alternative (LNC=1) mapping each physical core individually. This feature allows users to efficiently manage resources for large-scale model training and deployment through runtime variables and compiler flags.
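In practice, selecting an LNC configuration means setting a runtime variable and a matching compiler flag. The sketch below follows the variable and flag names in the Neuron documentation for Trn2; verify them against your installed SDK version before use.

```shell
# Runtime: LNC=2 (default) pairs two physical cores per logical NeuronCore;
# LNC=1 exposes each physical core individually.
export NEURON_LOGICAL_NC_CONFIG=1

# Compiler: the compiled NEFF must be built for the same configuration,
# e.g. pass --logical-nc-config=1 to neuronx-cc at compile time.
```

Mismatched runtime and compile-time settings will fail at load, so both should be driven from one place in deployment scripts.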
Neuron Profiler 2.0: The new profiler provides system and device-level profiling, timeline annotations, container integration, and support for distributed workloads. It includes trace export capabilities for Perfetto visualization and integration with JAX and PyTorch profilers, and support for Logical NeuronCore Configuration (LNC).
Neuron Kernel Interface (NKI): NKI now supports Trainium2 including Logical NeuronCore Configuration (LNC), adds SPMD capabilities for multi-core operations, and includes new modules and APIs including support for float8_e5m2 datatype.
Deep Learning AMIs (DLAMIs): This release expands support for JAX 0.4 within the Multi Framework DLAMI. It also introduces NeuronX Distributed Training (NxDT), Inference (NxDI), and Core (NxD) with PyTorch 2.5 support. Additionally, a new Single Framework DLAMI for TensorFlow 2.10 on Ubuntu 22 is now available.
Deep Learning Containers (DLCs): This release introduces new DLCs for JAX 0.4 training and PyTorch 2.5.1 inference and training. All DLCs have been updated to Ubuntu 22, and the pytorch-inference-neuronx DLC now supports both NxD Inference and TNx libraries.
Documentation: Documentation updates include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.
Software Maintenance: This release includes the following announcements:
- Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
- Announcing end of support for Neuron DET tool starting next release
- PyTorch Neuron versions 1.9 and 1.10 no longer supported
- Announcing end of support for PyTorch 2.1 for Trn1, Trn2 and Inf2 starting next release
- Announcing end of support for PyTorch 1.13 for Trn1 and Inf2 starting next release
- Announcing end of support for Python 3.8 in future releases
- Announcing end of support for Ubuntu20 DLCs and DLAMIs
Amazon Q: Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.
Neuron SDK Release - December 3, 2024
Neuron 2.21 beta introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and u-trn2 UltraServer. The release showcases Llama 3.1 405B model inference using NxD Inference on a single trn2.48xlarge instance, and FUJI 70B model training using the AXLearn library across eight trn2.48xlarge instances.
NxD Inference, a new PyTorch-based library for deploying large language models and multi-modality models, is introduced in this release. It integrates with vLLM and enables PyTorch model onboarding with minimal code changes. The release also adds support for AXLearn training for JAX models.
The new Neuron Profiler 2.0 introduced in this release offers system and device-level profiling, timeline annotations, and container integration. The profiler supports distributed workloads and provides trace export capabilities for Perfetto visualization.
The documentation has been updated to include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and u-trn2 UltraServer.
Note:
This release (Neuron 2.21 Beta) was only tested with Trn2 instances. The next release (Neuron 2.21) will support all instances (Inf1, Inf2, Trn1, and Trn2).
For access to this release (Neuron 2.21 Beta) contact your account manager.