Releases: aws-neuron/aws-neuron-sdk
Neuron SDK Release - October 29, 2025
Overview
Release 2.26.1 of the AWS Neuron SDK includes bug fixes applied to the AWS Neuron SDK v2.26.0. See the Neuron SDK v2.26.0 release notes for the full set of changes that shipped with the 2.26.0 release.
Bug fixes in this release
Fix: This release addresses out-of-memory errors in torch-neuronx by enabling direct memory allocation through the Neuron Runtime API.
Resources
For the set of SDK package version changes in 2.26.1, see Release Content.
Neuron SDK Release - September 18, 2025
AWS Neuron SDK 2.26.0 adds support for PyTorch 2.8, JAX 0.6.2, and Python 3.11, and introduces inference improvements on Trainium2 (Trn2). This release includes expanded model support, enhanced parallelism features, new Neuron Kernel Interface (NKI) APIs, and improved development tools for optimization and profiling.
Inference Updates
NxD Inference - Model support expands with beta releases of Llama 4 Scout and Maverick variants on Trn2. The FLUX.1-dev image generation models are now available in beta on Trn2 instances.
Expert parallelism is now supported in beta, enabling MoE expert distribution across multiple NeuronCores. This release introduces on-device forward pipeline execution in beta and adds sequence parallelism in MoE routers for model deployment flexibility.
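The MoE routing that expert parallelism distributes can be sketched in a few lines: a router scores each token against the experts, the top-k experts process the token, and their outputs are combined by the router probabilities. The NumPy sketch below is illustrative only (toy shapes and names, not the NxD Inference API); under expert parallelism each entry of the `experts` list would live on a different NeuronCore.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, top_k = 8, 16, 4, 2

tokens = rng.standard_normal((n_tokens, d_model))
router_w = rng.standard_normal((d_model, n_experts))
# One weight matrix per expert; with expert parallelism these would be
# sharded across NeuronCores (shown here as a plain Python list).
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

# Route: softmax over expert logits, keep the top-k experts per token.
logits = tokens @ router_w
top = np.argsort(-logits, axis=1)[:, :top_k]               # (n_tokens, top_k)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

out = np.zeros_like(tokens)
for e in range(n_experts):
    mask = (top == e).any(axis=1)                          # tokens routed to expert e
    if mask.any():
        w = probs[mask, e:e + 1]                           # router weight (renorm over top-k omitted)
        out[mask] += w * (tokens[mask] @ experts[e])
```

Because each token touches only `top_k` of the `n_experts` weight matrices, total compute stays roughly constant as experts are added, which is what makes distributing experts across cores attractive.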
Neuron Kernel Interface (NKI)
New APIs enable additional optimization capabilities:
- gelu_apprx_sigmoid: GELU activation with sigmoid approximation
- select_reduce: Selective element copying with maximum reduction
- sequence_bounds: Sequence bounds computation
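The math behind a sigmoid-approximated GELU, as named by gelu_apprx_sigmoid, is the standard substitution of x·σ(1.702x) for the exact Gaussian-CDF form x·Φ(x). The NumPy sketch below illustrates only that approximation and its error, not the NKI implementation:

```python
import math
import numpy as np

def gelu_exact(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF (via erf)
    return x * 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def gelu_sigmoid_approx(x):
    # Sigmoid approximation: x * sigmoid(1.702 * x)
    return x / (1.0 + np.exp(-1.702 * x))

x = np.linspace(-4.0, 4.0, 101)
err = np.max(np.abs(gelu_exact(x) - gelu_sigmoid_approx(x)))
# The approximation error stays small (on the order of 1e-2) across this range.
print(f"max |exact - approx| on [-4, 4]: {err:.4f}")
```

The sigmoid form trades a small pointwise error for a much cheaper activation, since a sigmoid maps directly onto a single hardware activation-function pass.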
API enhancements include:
- tile_size: Added total_available_sbuf_size field
- dma_transpose: Added axes parameter for 4D transpose
- activation: Added gelu_apprx_sigmoid operation
Developer Tools
Neuron Profiler improvements include the ability to select multiple semaphores at once to correlate pending activity with semaphore waits and increments. Additionally, system profile grouping now uses a global NeuronCore ID instead of a process local ID for visibility across distributed workloads. The Profiler also adds warnings for dropped events due to limited buffer space.
The nccom-test utility adds State Buffer support on Trn2 for collective operations, including all-reduce, all-gather, and reduce-scatter. Error reporting now provides messages for invalid all-to-all collective sizes to help developers identify and resolve issues.
Deep Learning AMI and Containers
The Deep Learning AMI now supports PyTorch 2.8 on Amazon Linux 2023 and Ubuntu 22.04. Container updates include PyTorch 2.8.0 and Python 3.11 across all DLCs. The transformers-neuronx environment and package have been removed from PyTorch inference DLAMI/DLC.
Component release highlights
These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.
Neuron SDK Release - July 31, 2025
Neuron 2.25.0 delivers updates across several key areas: inference performance optimizations, expanded model support, enhanced profiling capabilities, improved monitoring and observability tools, framework updates, and refreshed development environments and container offerings. The release includes bug fixes across the SDK components, along with updated tutorials and documentation for new features and model deployments.
Inference Optimizations (NxD Core and NxDI)
Neuron 2.25.0 introduces performance optimizations and new capabilities including:
- On-device Forward Pipeline, reducing latency by up to 43% in models like Pixtral
- Context and Data Parallel support for improved batch scaling
- Chunked Attention for efficient long sequence processing
- 128k context length support for Llama 70B models
- Automatic Aliasing (Beta) for faster tensor operations
- Disaggregated Serving (Beta) showing 20% improvement in ITL/TTST
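Of the optimizations above, chunked attention is the most mechanical: keys and values are processed in fixed-size chunks with running softmax statistics, so the full seq x seq score matrix is never materialized. The NumPy sketch below shows the online-softmax accumulation idea and checks it against full attention; it is illustrative only, not the NxD Inference kernel.

```python
import numpy as np

def full_attention(q, k, v):
    # Reference: materializes the full score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def chunked_attention(q, k, v, chunk=32):
    # Online-softmax accumulation over key/value chunks: carry a running
    # max (m), running normalizer (l), and unnormalized output (acc).
    d = q.shape[-1]
    m = np.full((q.shape[0], 1), -np.inf)
    l = np.zeros((q.shape[0], 1))
    acc = np.zeros((q.shape[0], v.shape[-1]))
    for i in range(0, k.shape[0], chunk):
        s = q @ k[i:i + chunk].T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)          # rescale previous partial sums
        p = np.exp(s - m_new)
        l = l * scale + p.sum(axis=-1, keepdims=True)
        acc = acc * scale + p @ v[i:i + chunk]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
assert np.allclose(chunked_attention(q, k, v), full_attention(q, k, v))
```

Because peak memory scales with the chunk size rather than the sequence length, this is what makes long-context cases like the 128k Llama configurations tractable.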
Model Support (NxDI)
Neuron 2.25.0 expands model support to include:
- Qwen3 dense models (0.6B to 32B parameters)
- Flux.1-dev model for text-to-image generation (Beta)
- Pixtral-Large-Instruct-2411 for image-to-text generation (Beta)
Profiling Updates
Enhancements to profiling capabilities include:
- Addition of timestamp sync points to align device execution with CPU events
- Expanded JSON output providing the same detailed data set used by the Neuron Profiler UI
- New total active time metric showing accelerator utilization as percentage of total runtime
- Fixed DMA active time calculation for more accurate measurements
- neuron-ls now displays CPU and NUMA node affinity information
- neuron-ls adds NeuronCore IDs display for each Neuron Device
- neuron-monitor improves accuracy of device utilization metrics
- JAX 0.6.1 support added, maintaining compatibility with versions 0.4.31-0.4.38 and 0.5
- vLLM support upgraded to version 0.9.x V0
Development Environment Updates
Neuron SDK updated to version 2.25.0 in:
- Deep Learning AMIs on Ubuntu 22.04 and Amazon Linux 2023
- Multi-framework DLAMI with environments for both PyTorch and JAX
- PyTorch 2.7 Single Framework DLAMI
- JAX 0.6 Single Framework DLAMI
Container Support
Neuron SDK updated to version 2.25.0 in:
- PyTorch 2.7 Training and Inference DLCs
- JAX 0.6 Training DLC
- vLLM 0.9.1 Inference DLC
- Neuron Device Plugin and Scheduler container images for Kubernetes integration
Neuron SDK Release - June 24, 2025
Neuron version 2.24 introduces new inference capabilities including prefix caching, disaggregated inference (Beta), and context parallelization support (Beta). This release also includes NKI language enhancements and enhanced profiling visualizations for improved debugging and performance analysis. Neuron 2.24 adds support for PyTorch 2.7 and JAX 0.6, updates existing DLAMIs and DLCs, and introduces a new vLLM inference container.
Neuron SDK Release - May 20, 2025
With the Neuron 2.23 release, we move the NxD Inference (NxDI) library out of beta. It is now recommended for all multi-chip inference use cases. In addition, Neuron has new training capabilities, including Context Parallelism and ORPO, NKI improvements (new operators and ISA features), and new Neuron Profiler debugging and performance analysis optimizations. Finally, Neuron now supports PyTorch 2.6 and JAX 0.5.3.
Inference: NxD Inference (NxDI) moves from beta to GA. NxDI now supports Persistent Cache to reduce compilation times, and optimizes model loading with improved weight sharding performance.
Training: NxD Training (NxDT) adds Context Parallelism support (beta) for Llama models, enabling sequence lengths up to 32K. NxDT now supports model alignment via ORPO using DPO-style datasets. NxDT has also upgraded its support for third-party libraries, specifically PyTorch Lightning 2.5, Transformers 4.48, and NeMo 2.1.
Neuron Kernel Interface (NKI): New support for 32-bit integer nki.language.add and nki.language.multiply on GPSIMD Engine. NKI.ISA improvements include range_select for Trainium2, fine-grained engine control, and enhanced tensor operations. New performance tuning API no_reorder has been added to enable user-scheduling of instructions. When combined with allocation, this enables software pipelining. Language consistency has been improved for arithmetic operators (+=, -=, /=, *=) across loop types, PSUM, and SBUF.
Neuron Profiler: Profiling performance has improved, allowing users to view profile results 5x faster on average. New features include timeline-based error tracking and JSON error event reporting, supporting execution and OOB error detection. Additionally, this release improves multiprocess visualization with Perfetto.
Neuron Monitoring: Added Kubernetes context information (pod_name, namespace, and container_name) to neuron monitor prometheus output, enabling resource utilization tracking by pod, namespace, and container.
Neuron DLCs: This release updates containers with PyTorch 2.6 support for inference and training. For JAX DLC, this release adds JAX 0.5.0 training support.
Neuron DLAMIs: This release updates MultiFramework AMIs to include PyTorch 2.6, JAX 0.5, and TensorFlow 2.10 and Single Framework AMIs for PyTorch 2.6 and JAX 0.5.
Neuron SDK Release - May 12, 2025
Neuron 2.22.1 release includes a Neuron Driver update that resolves DMA abort errors on Trainium2 devices. These errors were previously occurring in the Neuron Runtime during specific workload executions.
Neuron SDK Release - April 3, 2025
The Neuron 2.22 release includes performance optimizations, enhancements and new capabilities across the Neuron software stack.
For inference workloads, the NxD Inference library now supports the Llama-3.2-11B model and multi-LoRA serving, allowing customers to load and serve multiple LoRA adapters. Flexible quantization features have been added, enabling users to specify which model layers or NxDI modules to quantize. Asynchronous inference mode has also been introduced, improving performance by overlapping input preparation with model execution.
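The mechanics of multi-LoRA serving are that a single base weight is shared while each request selects a low-rank adapter pair (B, A), so the served projection becomes y = Wx + scale·B(Ax). The NumPy sketch below illustrates only that arithmetic; the adapter names and registry are hypothetical, not the NxDI serving API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 32, 32, 4

W = rng.standard_normal((d_out, d_in))  # shared base weight, loaded once

# Hypothetical adapter registry: each adapter is a low-rank (B, A) pair,
# far smaller than W (d_out*rank + rank*d_in vs d_out*d_in parameters).
adapters = {
    name: (rng.standard_normal((d_out, rank)) * 0.01,
           rng.standard_normal((rank, d_in)) * 0.01)
    for name in ("adapter_math", "adapter_code")
}

def lora_forward(x, adapter_name, scale=1.0):
    # y = W x + scale * B (A x): base projection plus low-rank update
    B, A = adapters[adapter_name]
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
y_math = lora_forward(x, "adapter_math")
y_code = lora_forward(x, "adapter_code")
```

Because only the small (B, A) pairs differ per adapter, many adapters can stay resident and be switched per request without reloading the base model.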
For training, we added LoRA supervised fine-tuning to NxD Training to enable additional model customization and adaptation.
Neuron Kernel Interface (NKI): This release adds new APIs in nki.isa, nki.language, and nki.profile. These enhancements provide customers with greater flexibility and control.
The updated Neuron Runtime includes optimizations for reduced latency and improved device memory footprint. On the tooling side, the Neuron Profiler 2.0 (beta) has added UI enhancements and new event type support.
Neuron DLCs: This release reduces DLC image size by up to 50% and enables faster build times with an updated Dockerfile structure. On the Neuron DLAMI side, new PyTorch 2.5 single framework DLAMIs have been added for Ubuntu 22.04 and Amazon Linux 2023, along with several new virtual environments within the Neuron Multi Framework DLAMIs.
Neuron SDK Release - January 14, 2025
Neuron 2.21.1 release pins Transformers NeuronX dependency to transformers<4.48 and fixes DMA abort errors on Trn2.
Additionally, this release addresses NxD Core and Training improvements, including fixes for sequence parallel support in quantized models and a new flag for dtype control in Llama3/3.1 70B configurations. See NxD Training Release Notes (neuronx-distributed-training) for details.
NxD Inference update includes minor bug fixes for sampling parameters. See NxD Inference Release Notes.
Neuron supported DLAMIs and DLCs have been updated to Neuron 2.21.1 SDK. Users should be aware of an incompatibility between Tensorflow-Neuron 2.10 (Inf1) and Neuron Runtime 2.21 in DLAMIs, which will be addressed in the next minor release. See Neuron DLAMI Release Notes.
The Neuron Compiler includes bug fixes and performance enhancements specifically targeting the Trn2 platform.
Neuron SDK Release - December 20, 2024
Overview: Neuron 2.21.0 introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer. The release adds new capabilities for both training and inference of large-scale models. It introduces NxD Inference (beta), a PyTorch-based library for model deployment; Neuron Profiler 2.0 (beta); PyTorch 2.5 support across the Neuron SDK; and Logical NeuronCore Configuration (LNC) for optimizing NeuronCore allocation. The release enables Llama 3.1 405B model inference on a single trn2.48xlarge instance.
NxD Inference: NxD Inference (beta) is a new PyTorch-based inference library for deploying large-scale models on AWS Inferentia and Trainium instances. It enables PyTorch model onboarding with minimal code changes and integrates with vLLM. NxDI supports various model architectures, including Llama versions for text processing (Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 3.3), Llama 3.2 multimodal for multimodal tasks, and Mixture-of-Experts (MoE) model architectures including Mixtral and DBRX. The library supports quantization methods, includes dynamic sampling, and is compatible with HuggingFace checkpoints and the generate() API. NxDI also supports distributed strategies including tensor parallelism and incorporates speculative decoding techniques (Draft model and EAGLE). The release includes a sample demonstrating Llama 3.1 405B model inference on a single trn2.48xlarge instance.
For more information, see NxD Inference documentation and check the NxD Inference Github repository: aws-neuron/neuronx-distributed-inference
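Draft-model speculative decoding, mentioned above, works by letting a cheap draft model propose several tokens that the target model then verifies in one pass; with greedy verification the output is provably identical to decoding with the target model alone. The toy sketch below demonstrates that lossless property with deterministic stand-in "models" (hypothetical names, not the NxDI API).

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16

def greedy_next(weights, ctx):
    # Toy deterministic "model": next token from a hash of the context.
    return int(weights[sum(ctx) % VOCAB])

target_w = rng.integers(0, VOCAB, VOCAB)
draft_w = target_w.copy()
draft_w[::3] = rng.integers(0, VOCAB, len(draft_w[::3]))  # draft sometimes disagrees

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft cheaply proposes k tokens.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = greedy_next(draft_w, ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Target verifies; keep the agreeing prefix, then one correction.
        ctx = list(out)
        for t in draft:
            target_t = greedy_next(target_w, ctx)
            out.append(target_t)
            ctx.append(target_t)
            if target_t != t or len(out) - len(prompt) >= n_tokens:
                break
    return out[len(prompt):len(prompt) + n_tokens]

def target_only(prompt, n_tokens):
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(greedy_next(target_w, out))
    return out[len(prompt):]
```

Every emitted token comes from the target model's own greedy choice, so the speedup comes purely from verifying several draft tokens per target pass rather than from changing the output distribution.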
Transformers NeuronX (TNx): This release introduces several new features, including flash decoding support for speculative decoding, and on-device generation in speculative decoding flows. It adds Eagle speculative decoding with greedy and lossless sampling, as well as support for CPU compilation and sharded model saving. Performance improvements include optimized MLP and QKV for Llama models with sequence parallel norm and control over concurrent compilation workers.
Training Highlights: NxD Training in this release adds support for HuggingFace Llama3/3.1 70B on trn2 instances, introduces DPO support for post-training model alignment, and adds support for Mixture-of-Experts (MoE) models including Mixtral 7B. The release includes improved checkpoint conversion capabilities and supports MoE with Tensor, Sequence, Pipeline, and Expert parallelism.
ML Frameworks: Neuron 2.21.0 adds PyTorch 2.5 support, which brings improved eager mode, FP8, and Automatic Mixed Precision capabilities. JAX support extends to version 0.4.35, including support for JAX caching APIs.
Logical NeuronCore Configuration (LNC): This release introduces LNC for Trainium2 instances, optimizing NeuronCore allocation for ML applications. LNC offers two configurations: default (LNC=2) combining two physical cores, and alternative (LNC=1) mapping each physical core individually. This feature allows users to efficiently manage resources for large-scale model training and deployment through runtime variables and compiler flags.
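In practice, selecting an LNC configuration means setting a runtime variable and a matching compiler flag. The sketch below follows the variable and flag names in the Neuron documentation for Trn2; verify them against your installed SDK version before use.

```shell
# Runtime: LNC=2 (default) pairs two physical cores per logical NeuronCore;
# LNC=1 exposes each physical core individually.
export NEURON_LOGICAL_NC_CONFIG=1

# Compiler: the compiled NEFF must be built for the same configuration,
# e.g. pass --logical-nc-config=1 to neuronx-cc at compile time.
```

Mismatched runtime and compile-time settings will fail at load, so both should be driven from one place in deployment scripts.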
Neuron Profiler 2.0: The new profiler provides system and device-level profiling, timeline annotations, container integration, and support for distributed workloads. It includes trace export capabilities for Perfetto visualization and integration with JAX and PyTorch profilers, and support for Logical NeuronCore Configuration (LNC).
Neuron Kernel Interface (NKI): NKI now supports Trainium2 including Logical NeuronCore Configuration (LNC), adds SPMD capabilities for multi-core operations, and includes new modules and APIs including support for float8_e5m2 datatype.
Deep Learning AMIs (DLAMIs): This release expands support for JAX 0.4 within the Multi Framework DLAMI. It also introduces NeuronX Distributed Training (NxDT), Inference (NxDI), and Core (NxD) with PyTorch 2.5 support. Additionally, a new Single Framework DLAMI for TensorFlow 2.10 on Ubuntu 22 is now available.
Deep Learning Containers (DLCs): This release introduces new DLCs for JAX 0.4 training and PyTorch 2.5.1 inference and training. All DLCs have been updated to Ubuntu 22, and the pytorch-inference-neuronx DLC now supports both NxD Inference and TNx libraries.
Documentation: Documentation updates include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.
Software Maintenance: This release includes the following announcements:
- Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
- Announcing end of support for Neuron DET tool starting next release
- PyTorch Neuron versions 1.9 and 1.10 no longer supported
- Announcing end of support for PyTorch 2.1 for Trn1, Trn2 and Inf2 starting next release
- Announcing end of support for PyTorch 1.13 for Trn1 and Inf2 starting next release
- Announcing end of support for Python 3.8 in future releases
- Announcing end of support for Ubuntu20 DLCs and DLAMIs
Amazon Q: Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.
Neuron SDK Release - December 3, 2024
Neuron 2.21 beta introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and u-trn2 UltraServer. The release showcases Llama 3.1 405B model inference using NxD Inference on a single trn2.48xlarge instance, and FUJI 70B model training using the AXLearn library across eight trn2.48xlarge instances.
NxD Inference, a new PyTorch-based library for deploying large language models and multi-modality models, is introduced in this release. It integrates with vLLM and enables PyTorch model onboarding with minimal code changes. The release also adds support for AXLearn training for JAX models.
The new Neuron Profiler 2.0 introduced in this release offers system and device-level profiling, timeline annotations, and container integration. The profiler supports distributed workloads and provides trace export capabilities for Perfetto visualization.
The documentation has been updated to include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and u-trn2 UltraServer.
Note:
This release (Neuron 2.21 Beta) was only tested with Trn2 instances. The next release (Neuron 2.21) will support all instances (Inf1, Inf2, Trn1, and Trn2).
For access to this release (Neuron 2.21 Beta) contact your account manager.