Releases: aws-neuron/aws-neuron-sdk
Neuron SDK Release - March 04, 2021
This release includes bug fixes and minor enhancements to the Neuron Runtime and Tools.
Neuron SDK Release - February 24, 2021
This release updates all Neuron packages and libraries in response to the Python security issue CVE-2021-3177, described here: https://nvd.nist.gov/vuln/detail/CVE-2021-3177. This vulnerability potentially exists in multiple versions of Python, including 3.5, 3.6, and 3.7. Python is used by various components of Neuron, including the Neuron compiler, as well as by machine learning frameworks including TensorFlow, PyTorch, and MXNet. We recommend also updating the Python interpreters in any AMIs and containers used with Neuron.
Python 3.5 reached end-of-life as described here: https://devguide.python.org/devcycle/?highlight=python%203.5%20end%20of%20life#end-of-life-branches
Starting with this release, Neuron packages no longer support Python 3.5. Users should upgrade to the latest DLAMI or, if they are using another AMI, upgrade to a newer Python version.
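A quick way to confirm that an environment is not still on Python 3.5 is to check the interpreter version at startup. The sketch below is illustrative only: the `MIN_SUPPORTED` constant and the `is_supported` helper are hypothetical names, and the 3.6 floor is an assumption based on this note dropping only 3.5.

```python
import sys

# Assumption: 3.6 is the lowest supported version, since this release drops only 3.5.
MIN_SUPPORTED = (3, 6)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is at or above MIN_SUPPORTED."""
    return tuple(version_info[:2]) >= MIN_SUPPORTED


current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
if is_supported():
    print("Python " + current + " is at or above the assumed minimum.")
else:
    print("Python " + current + " is too old; upgrade the interpreter or the AMI.")
```

Running the same check inside each container image is a simple way to catch interpreters that were not updated along with the host AMI.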
Neuron SDK Release - January 30, 2021
This release continues to improve the NeuronCore Pipeline performance for BERT models. For example, running BERT Base with the neuroncore-pipeline-cores compile option, at batch=3, seqlen=32 using 16 Neuron Cores, results in throughput of up to 5340 sequences per second and P99 latency of 9ms using TensorFlow Serving.
This release also adds operator support and performance improvements for the PyTorch-based DistilBERT model for sequence classification.
Neuron SDK Release - December 23, 2020
This release introduces a PyTorch 1.7-based torch-neuron package as part of the Neuron SDK. Support for PyTorch model serving with TorchServe 0.2 is added and will be demonstrated with a tutorial. This release also provides an example tutorial for a PyTorch-based YOLO v4 model for Inferentia.
To aid visibility into compiler activity, the Neuron-extended frameworks TensorFlow and PyTorch display a new compilation status indicator that prints a dot (.) to the console every 20 seconds while compilation is running.
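The status indicator described above can be pictured as a background timer that runs alongside a long task. The following is a minimal sketch of that general pattern using only the Python standard library. It is not the Neuron frameworks' actual implementation, and `run_with_progress` is a hypothetical helper name.

```python
import sys
import threading
import time


def run_with_progress(task, interval=20.0, stream=sys.stdout):
    """Run task() while a background thread writes '.' every `interval` seconds."""
    done = threading.Event()

    def ticker():
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(interval):
            stream.write(".")
            stream.flush()

    thread = threading.Thread(target=ticker, daemon=True)
    thread.start()
    try:
        return task()
    finally:
        # Stop the ticker and end the line of dots, even if task() raised.
        done.set()
        thread.join()
        stream.write("\n")
        stream.flush()


if __name__ == "__main__":
    # Stand-in for a long compilation; a short interval keeps the demo quick.
    run_with_progress(lambda: time.sleep(0.35), interval=0.1)
```

Using an event with a timed wait, rather than a plain sleep, lets the indicator stop as soon as the task finishes instead of waiting out the rest of the interval.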
Important to know:
- This update continues to support the PyTorch 1.5.1 version of torch-neuron for backwards compatibility.
Neuron SDK Release - November 17, 2020
This release improves NeuronCore Pipeline performance. For example, running BERT Small, batch=4, seqlen=32 using 4 Neuron Cores, results in throughput of up to 7000 sequences per second and P99 latency of 3ms using TensorFlow Serving.
Neuron tools now include all inf1 compute engines and DMAs in the NeuronCore utilization metric. This release also adds a neuron-monitor example that connects to Grafana via Prometheus: a new sample script exports most of neuron-monitor's metrics to a Prometheus monitoring server, and a sample Grafana dashboard is provided. More details here
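To give a feel for what exporting metrics to Prometheus involves, the sketch below flattens a nested dictionary of numeric metrics into the Prometheus text exposition format (one `name value` line per sample), which an HTTP endpoint could then serve for scraping. This is not the sample script shipped with the release. The `to_prometheus` helper, the `neuron` prefix, and the input field names are all hypothetical and do not reflect neuron-monitor's real output schema.

```python
import re


def to_prometheus(metrics, prefix="neuron"):
    """Flatten a nested dict of numeric metrics into Prometheus text lines."""
    lines = []

    def walk(node, path):
        for key, value in sorted(node.items()):
            parts = path + [str(key)]
            if isinstance(value, dict):
                walk(value, parts)
            elif isinstance(value, (int, float)) and not isinstance(value, bool):
                # Metric names may only contain letters, digits, and underscores here.
                name = re.sub(r"[^a-zA-Z0-9_]", "_", "_".join(parts))
                lines.append("{} {}".format(name, value))

    walk(metrics, [prefix])
    return "\n".join(lines) + "\n"


# Hypothetical sample input; real neuron-monitor reports use a different layout.
sample = {
    "neuroncore_utilization": {"nc0": 41.5, "nc1": 38.0},
    "memory_used_bytes": 1048576,
}
print(to_prometheus(sample), end="")
```

In practice a client library would normally handle metric types, labels, and the HTTP endpoint. The sketch only shows the shape of the exported data.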
ONNX support is limited, and from this version onward we do not plan to add further ONNX capabilities. We recommend running models in TensorFlow, PyTorch or MXNet for best performance and support.
Neuron SDK Release - October 22, 2020
This release adds a Neuron kernel mode driver (KMD). The Neuron KMD simplifies Neuron Runtime deployments by removing the need for elevated privileges, improves memory management by removing the need for huge pages configuration, and eliminates the need for running neuron-rtd as a sidecar container. Documentation throughout the repo has been updated to reflect the new support. The new Neuron KMD is backwards compatible with prior versions of Neuron ML Frameworks and Compilers; no changes are required to existing application code.
More details in the Neuron Runtime release notes here.
Neuron SDK Release - September 22, 2020
This release improves performance of YOLO v3 and v4, VGG16, SSD300, and BERT. As part of these improvements, Neuron Compiler doesn’t require any special compilation flags for most models. Details on how to use the prior optimizations are outlined in the neuron-cc release notes.
The release also improves operational deployments of large-scale inference applications, with a session management agent incorporated into all supported ML frameworks and a new Neuron tool called neuron-monitor that makes it easy to scale monitoring of large fleets of inference applications. A sample script for connecting neuron-monitor to Amazon CloudWatch metrics is provided as well. Read more about using neuron-monitor.
Neuron SDK Release - August 19, 2020
Bug fix for an error-reporting issue in the Neuron Runtime. Previous versions of the runtime reported uncorrectable errors on only half of the DRAM per Inferentia. Other Neuron packages are unchanged.
Neuron SDK Release - August 08, 2020
This release of the Neuron SDK delivers performance enhancements for the BERT Base model. Sequence lengths of 128, 256, and 512 were found to perform best at batch sizes of 6, 3, and 1, respectively, using publicly available versions of both PyTorch-based (1.5.x) and TensorFlow-based (1.15.x) models. The compiler option "-O2" was used in all cases.
A new Kubernetes scheduler extension is included in this release to improve pod scheduling on inf1.6xlarge and inf1.24xlarge instance sizes. Details on how the scheduler works and how to apply the scheduler can be found here. Check the Neuron K8 release notes for detailed changes to K8 components going forward.
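The sequence-length and batch-size pairings reported above can be kept as a small lookup table when scripting compilation or benchmarking runs. This is only a sketch: `BEST_BATCH_SIZE` and `tokens_per_batch` are hypothetical names, and the values are simply the ones quoted in this note for BERT Base with "-O2".

```python
# Best-performing batch size per sequence length, as reported in this release note.
BEST_BATCH_SIZE = {128: 6, 256: 3, 512: 1}


def tokens_per_batch(seq_len):
    """Number of tokens processed per batch at the reported best batch size."""
    return seq_len * BEST_BATCH_SIZE[seq_len]


for seq_len in sorted(BEST_BATCH_SIZE):
    print(
        "seqlen={} batch={} tokens_per_batch={}".format(
            seq_len, BEST_BATCH_SIZE[seq_len], tokens_per_batch(seq_len)
        )
    )
```

The batch size falls as the sequence length grows, so the token count per batch comes to 768 for sequence lengths 128 and 256 and to 512 for sequence length 512.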
Neuron SDK Release - August 5, 2020
Bug fix for a latent issue caused by a race condition in the Neuron Runtime that could lead to crashes. The crash was observed under stress load conditions. All customers are encouraged to update to the latest Neuron Runtime package (aws-neuron-runtime), version 1.0.8813.0 or newer. Other Neuron packages are being updated as well, but those updates are considered non-critical.
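To check whether an installed aws-neuron-runtime version already contains this fix, the dotted version string can be compared numerically against 1.0.8813.0. The sketch below assumes the version string has already been obtained from the system's package manager. `parse_version` and `runtime_has_fix` are hypothetical helper names, not part of any Neuron tool.

```python
def parse_version(text):
    """Turn a dotted version such as '1.0.8813.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


# First runtime version that contains the race-condition fix, per this note.
MIN_RUNTIME = parse_version("1.0.8813.0")


def runtime_has_fix(installed):
    """Return True if the installed version is 1.0.8813.0 or newer."""
    return parse_version(installed) >= MIN_RUNTIME


print(runtime_has_fix("1.0.8813.0"))
print(runtime_has_fix("1.0.900.0"))
```

Comparing integer tuples rather than raw strings matters here: as text, "900" sorts after "8813", which would wrongly report the older build as newer.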