compiler: Support long UTF-8 encoded atoms #8913
Merged
Support for atoms containing any Unicode code point was added
in Erlang/OTP 20 (PR-1078).
After that change, an atom can contain up to 255 Unicode code
points. However, atoms used in Erlang source code are still limited to
255 bytes because the atom table in the BEAM file only has a byte for
holding the length in bytes of the atom text. For instance, the `🟦`
character has a four-byte encoding (`<<240,159,159,166>>`), meaning
that Erlang source code containing a literal atom consisting of 64 or
more such characters cannot be compiled.
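
The byte arithmetic behind that limit can be checked directly. The following is a small illustrative sketch in Python (it is not part of this change, and the names `SQUARE` and `utf8_size` are made up for the example):

```python
# Illustrative check of the one-byte length limit described above.
SQUARE = "\U0001F7E6"  # the 🟦 character (U+1F7E6)


def utf8_size(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


# A single character is encoded as four bytes.
print(list(SQUARE.encode("utf-8")))

# 63 characters still fit in a length field of one byte (maximum 255),
# while 64 characters need 256 bytes and no longer fit.
print(utf8_size(SQUARE * 63))
print(utf8_size(SQUARE * 64))
```

An atom of 64 such characters is well within the limit of 255 code points, but its UTF-8 text is one byte too long for the old length field.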
This commit changes the atom table in BEAM files to use a variable
length encoding for the length of each atom. For atoms up to 15 bytes,
the length is encoded in one byte. The header for the atom table is
also changed to indicate that the new encoding of lengths is used.
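
As a rough illustration of what such a variable-length encoding can look like, here is a Python sketch. The text above only states that lengths up to 15 bytes take one byte; the exact bit layout below is an assumption, modeled on the compact term encoding that BEAM files use for small unsigned integers, and should be checked against the OTP sources. The function names are hypothetical.

```python
from typing import Tuple


def encode_length(n: int) -> bytes:
    """Encode an atom length in bytes (ASSUMED layout, see the text above)."""
    if not 0 <= n < 0x800:
        raise ValueError("length out of range for this sketch")
    if n < 16:
        # One byte: the length sits in the upper four bits.
        return bytes([n << 4])
    # Two bytes: bit 3 of the first byte marks the two-byte form, bits 5-7
    # hold the top three bits of the length, and the second byte holds the
    # low eight bits.
    return bytes([((n >> 3) & 0xE0) | 0x08, n & 0xFF])


def decode_length(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a length at pos; return (length, position after the length)."""
    first = data[pos]
    if first & 0x08 == 0:
        return first >> 4, pos + 1
    if first & 0x10:
        raise ValueError("longer forms are not handled in this sketch")
    return ((first & 0xE0) << 3) | data[pos + 1], pos + 2


print(encode_length(15).hex())   # one byte
print(encode_length(256).hex())  # two bytes
```

Under this assumed layout, two bytes cover lengths up to 2047, which is enough for the longest possible atom text (255 code points of four bytes each, 1020 bytes).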
Attempting to load a BEAM file compiled with Erlang/OTP 28 in
Erlang/OTP 27 or earlier will result in the following error message:
1> l(t).
=ERROR REPORT==== 8-Oct-2024::08:49:01.750424 ===
beam/beam_load.c(150): Error loading module t:
corrupt atom table
{error,badfile}
`beam_lib` is updated to handle the new format. External tools that
use `beam_lib:chunks(Beam, [atoms])` to read the atom table will
continue to work. External tools that do their own parsing of the atom
table will need to be updated.
jhogberg approved these changes on Oct 8, 2024
This was referenced Oct 13, 2024
bettio added a commit to atomvm/AtomVM that referenced this pull request on Oct 13, 2024

CI: build-and-test: disable OTP master

A recent change made impossible to use beam files compiled with OTP master, so let's disable it.

See also #1321 and erlang/otp#8913

These changes are made under both the "Apache 2.0" and the "GNU Lesser General Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
bettio added a commit to bettio/AtomVM-fork that referenced this pull request on Jan 24, 2025

Atom length prefix now is a varlen int, so up to 255 codepoints can be supported. AtomVM will not support this. When going embedded some features should be left out, and this is one of them.

See also:
- erlang/otp#8913
- erlang/otp#9336

Signed-off-by: Davide Bettio <[email protected]>
bettio added a commit to atomvm/AtomVM that referenced this pull request on Feb 12, 2025

Atom length prefix now is a varlen int, so up to 255 codepoints can be supported. AtomVM will not support this. When going embedded some features should be left out, and this is one of them.

See also:
- erlang/otp#8913
- erlang/otp#9336

Signed-off-by: Davide Bettio <[email protected]>
ansd added a commit to rabbitmq/rabbitmq-server that referenced this pull request on Jun 25, 2025

Test case two_nodes_different_otp_version was introduced in #14042. In CI, the code was compiled on OTP 27. The test case then applied the Ra commands on OTP 27 and OTP 26 at runtime. For that to work the test transferred the compiled BEAM modules to the OTP 26 node.

Bumping OTP to 28 causes the lower version node (be it OTP 26 or OTP 27) to error when parsing the atom table chunk of the BEAM file that was compiled in OTP 28:
```
corrupt atom table
{error,badfile}
```

That's expected as described in erlang/otp#8913 (comment) since erlang/otp#8913 changes the atom table chunk format in the BEAM files.
```
beam_lib:chunks("deps/rabbit/ebin/lqueue.beam", [atoms]).
```
will parse successfully if the file gets loaded on OTP 28 irrespective of whether the file was compiled with OTP 27 or 28. However, this file fails to load if it is compiled with 28 and loaded on 27.

There is the `no_long_atoms` option that we could use just for this test case. However, given that we have a similar test case two_nodes_same_otp_version we skip test case two_nodes_different_otp_version in this commit.

Really, the best solution would be to use different OTP versions in RabbitMQ mixed version testing in addition to using different RabbitMQ versions. This way, the test case could just RPC into the different RabbitMQ nodes and apply the Ra commands there.
Labels
enhancement
team:VM
Assigned to OTP team VM
testing
currently being tested, tag is used by OTP internal CI