(n_tracks, batch_size, sequence_length) to (batch_size, sequence_length, n_tracks). This change was long overdue and eliminates the need for (potentially memory-expensive) transpose operations downstream. If you're upgrading from an earlier version, please update your code accordingly (you probably need to delete one transpose).
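To check which transpose to delete, the sketch below uses plain NumPy arrays as stand-ins for a batch (the sizes are made up, and the library's real output may be a different array type). Code written against the old (n_tracks, batch_size, sequence_length) layout usually moved the track axis last with axes `(1, 2, 0)`, which yields exactly the new layout, so that call can simply be removed.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
n_tracks, batch_size, sequence_length = 3, 4, 5

# Stand-in for a batch in the NEW layout: (batch_size, sequence_length, n_tracks).
new_layout = np.arange(
    batch_size * sequence_length * n_tracks, dtype=np.float32
).reshape(batch_size, sequence_length, n_tracks)

# The same data in the OLD layout: (n_tracks, batch_size, sequence_length).
old_layout = np.transpose(new_layout, (2, 0, 1))

# The transpose older downstream code typically applied to move tracks last.
converted = np.transpose(old_layout, (1, 2, 0))

# It reproduces the new layout exactly, so with the new version it can be deleted.
print(converted.shape, np.array_equal(converted, new_layout))
```

Note that `np.transpose` itself returns a view; the memory cost usually appears later, when a contiguous copy of the transposed array is made.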
✨ NEW FEATURE (v0.3.0+): Full bfloat16 support! You can now specify dtype="bfloat16" to get output tensors in bfloat16 format, which halves memory usage compared with float32 (2 bytes per element instead of 4).
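As a rough worked example of the savings, the snippet below computes the size of one dense (batch_size, sequence_length, n_tracks) batch in float32 and in bfloat16. The batch dimensions are hypothetical, and the figures cover only the output tensor itself, not any other buffers the library may allocate.

```python
# Bytes per element for each dtype (float32 is 32 bits, bfloat16 is 16 bits).
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2}


def tensor_nbytes(batch_size, sequence_length, n_tracks, dtype):
    """Bytes needed for a dense (batch_size, sequence_length, n_tracks) tensor."""
    return batch_size * sequence_length * n_tracks * BYTES_PER_ELEMENT[dtype]


# Hypothetical batch: 256 sequences of length 1000 with 16 tracks each.
fp32 = tensor_nbytes(256, 1000, 16, "float32")
bf16 = tensor_nbytes(256, 1000, 16, "bfloat16")
print(fp32, bf16)  # 16384000 8192000
```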