@bjorng bjorng commented Oct 8, 2024

Support for atoms containing any Unicode code point was added in Erlang/OTP 20 (#1078).

After that change, an atom can contain up to 255 Unicode code points. However, atoms used in Erlang source code are still limited to 255 bytes because the atom table in the BEAM file has only one byte for holding the length in bytes of the atom text. For instance, the 🟦 character has a four-byte encoding (<<240,159,159,166>>), meaning that Erlang source code containing a literal atom consisting of 64 or more such characters cannot be compiled.
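
The byte arithmetic behind that limit can be checked with a short sketch. Python is used here only for illustration; the helper `fits_old_atom_table` and the constant `OLD_MAX_ATOM_BYTES` are hypothetical names, not part of Erlang/OTP.

```python
# Illustration only: compare an atom's code point count with its UTF-8 size.
OLD_MAX_ATOM_BYTES = 255  # the old atom table stores each length in one byte


def fits_old_atom_table(atom_text: str) -> bool:
    """Return True if the UTF-8 encoding fits in a one-byte length field."""
    return len(atom_text.encode("utf-8")) <= OLD_MAX_ATOM_BYTES


square = "\U0001F7E6"  # the 🟦 character

print(list(square.encode("utf-8")))      # [240, 159, 159, 166]
print(fits_old_atom_table(square * 63))  # True: 63 * 4 = 252 bytes
print(fits_old_atom_table(square * 64))  # False: 64 * 4 = 256 bytes
```

An atom of 64 such characters is well within the 255 code point limit, yet its text needs 256 bytes, one more than a single length byte can express.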

This pull request changes the atom table in BEAM files to use a variable-length encoding for the length of each atom. For atoms up to 15 bytes, the length is encoded in one byte. The header for the atom table is also changed to indicate that the new encoding of lengths is used. Attempting to load a BEAM file compiled with Erlang/OTP 28 in Erlang/OTP 27 or earlier will result in the following error message:

1> l(t).
=ERROR REPORT==== 8-Oct-2024::08:49:01.750424 ===
beam/beam_load.c(150): Error loading module t:
  corrupt atom table

{error,badfile}

beam_lib is updated to handle the new format. External tools that use beam_lib:chunks(Beam, [atoms]) to read the atom table will continue to work. External tools that do their own parsing of the atom table will need to be updated.
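
For tools that parse the atom table themselves, the length prefix might be handled roughly as sketched below. The description above does not spell out the byte layout, so this sketch assumes it mirrors the compact term encoding that BEAM uses for instruction operands, with the tag bits set to 0. The names `encode_atom_length` and `decode_atom_length` are hypothetical, and Python is used only for illustration.

```python
# Sketch under an assumption: lengths use a compact-term-style encoding
# with tag bits 0. This is not confirmed by the pull request text.

def encode_atom_length(n: int) -> bytes:
    """Encode an atom length in one or two bytes."""
    if not 0 <= n < 2048:
        raise ValueError("length out of range for this sketch")
    if n < 16:
        # One byte: the length sits in the high four bits.
        return bytes([n << 4])
    # Two bytes: bits 10..8 of the length go into bits 7..5 of the first
    # byte, bit 3 flags the two-byte form, the second byte holds bits 7..0.
    return bytes([((n >> 3) & 0xE0) | 0x08, n & 0xFF])


def decode_atom_length(data: bytes):
    """Return (length, number of bytes consumed)."""
    first = data[0]
    if first & 0x08 == 0:
        return first >> 4, 1
    return ((first & 0xE0) << 3) | data[1], 2


print(encode_atom_length(15).hex())    # f0
print(encode_atom_length(1020).hex())  # 68fc
```

Under that assumption, two bytes cover lengths up to 2047, which is enough for an atom of 255 four-byte code points (1020 bytes).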

@bjorng bjorng added the team:VM (Assigned to OTP team VM), enhancement, and testing (currently being tested, tag is used by OTP internal CI) labels Oct 8, 2024
@bjorng bjorng self-assigned this Oct 8, 2024

github-actions bot commented Oct 8, 2024

CT Test Results

     5 files    538 suites   1h 46m 46s ⏱️
 4 247 tests 4 154 ✅  93 💤 0 ❌
10 049 runs  9 932 ✅ 117 💤 0 ❌

Results for commit 04b168d.


@bjorng bjorng requested a review from jhogberg October 8, 2024 07:16
Support for atoms containing any Unicode code point was added
in Erlang/OTP 20 (PR-1078).

After that change, an atom can contain up to 255 Unicode code
points. However, atoms used in Erlang source code are still limited to
255 bytes because the atom table in the BEAM file only has a byte for
holding the length in bytes of the atom text. For instance, the `🟦`
character has a four-byte encoding (`<<240,159,159,166>>`), meaning
that Erlang source code containing a literal atom consisting of 64 or
more such characters cannot be compiled.

This commit changes the atom table in BEAM files to use a
variable-length encoding for the length of each atom. For atoms up to
15 bytes, the length is encoded in one byte. The header for the atom
table is also changed to indicate that the new encoding of lengths is
used. Attempting to load a BEAM file compiled with Erlang/OTP 28 in
Erlang/OTP 27 or earlier will result in the following error message:

    1> l(t).
    =ERROR REPORT==== 8-Oct-2024::08:49:01.750424 ===
    beam/beam_load.c(150): Error loading module t:
      corrupt atom table

    {error,badfile}

`beam_lib` is updated to handle the new format. External tools that
use `beam_lib:chunks(Beam, [atoms])` to read the atom table will
continue to work. External tools that do their own parsing of the atom
table will need to be updated.
@bjorng bjorng force-pushed the bjorn/long-utf8-atoms/OTP-19285 branch from 0b18e23 to 04b168d Compare October 8, 2024 11:01
@bjorng bjorng merged commit 26d72b4 into erlang:master Oct 10, 2024
19 checks passed
@bjorng bjorng deleted the bjorn/long-utf8-atoms/OTP-19285 branch October 10, 2024 13:16
bettio added a commit to atomvm/AtomVM that referenced this pull request Oct 13, 2024
CI: build-and-test: disable OTP master

A recent change made it impossible to use BEAM files compiled with OTP master,
so let's disable it.

See also #1321 and erlang/otp#8913

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
bettio added a commit to bettio/AtomVM-fork that referenced this pull request Jan 24, 2025
Atom length prefix now is a varlen int, so up to 255 codepoints can be
supported.

AtomVM will not support this. When going embedded some features should
be left out, and this is one of them.

See also:
- erlang/otp#8913
- erlang/otp#9336

Signed-off-by: Davide Bettio <[email protected]>
bettio added a commit to atomvm/AtomVM that referenced this pull request Feb 12, 2025
Atom length prefix now is a varlen int, so up to 255 codepoints can be
supported.

AtomVM will not support this. When going embedded some features should
be left out, and this is one of them.

See also:
- erlang/otp#8913
- erlang/otp#9336

Signed-off-by: Davide Bettio <[email protected]>
ansd added a commit to rabbitmq/rabbitmq-server that referenced this pull request Jun 25, 2025
Test case two_nodes_different_otp_version was introduced in
#14042.

In CI, the code was compiled on OTP 27. The test case then applied the
Ra commands on OTP 27 and OTP 26 at runtime. For that to work the test
transferred the compiled BEAM modules to the OTP 26 node.

Bumping OTP to 28 causes the lower version node (be it OTP 26 or OTP 27)
to error when parsing the atom table chunk of the BEAM file that was
compiled in OTP 28:
```
  corrupt atom table

{error,badfile}
```
That's expected as described in erlang/otp#8913 (comment)
since erlang/otp#8913 changes the atom table chunk
format in the BEAM files.

```
beam_lib:chunks("deps/rabbit/ebin/lqueue.beam", [atoms]).
```
will parse successfully if the file gets loaded on OTP 28 irrespective
of whether the file was compiled with OTP 27 or 28.
However, this file fails to load if it is compiled with 28 and loaded on
27.

There is the `no_long_atoms` option that we could use just for this test
case.
However, given that we have a similar test case
two_nodes_same_otp_version we skip test case two_nodes_different_otp_version
in this commit.

Really, the best solution would be to use different OTP versions in
RabbitMQ mixed version testing in addition to using different RabbitMQ
versions. This way, the test case could just RPC into the different
RabbitMQ nodes and apply the Ra commands there.
LoisSotoLopez pushed a commit to cloudamqp/rabbitmq-server that referenced this pull request Jul 16, 2025
Test case two_nodes_different_otp_version was introduced in
rabbitmq#14042.

In CI, the code was compiled on OTP 27. The test case then applied the
Ra commands on OTP 27 and OTP 26 at runtime. For that to work the test
transferred the compiled BEAM modules to the OTP 26 node.

Bumping OTP to 28 causes the lower version node (be it OTP 26 or OTP 27)
to error when parsing the atom table chunk of the BEAM file that was
compiled in OTP 28:
```
  corrupt atom table

{error,badfile}
```
That's expected as described in erlang/otp#8913 (comment)
since erlang/otp#8913 changes the atom table chunk
format in the BEAM files.

```
beam_lib:chunks("deps/rabbit/ebin/lqueue.beam", [atoms]).
```
will parse successfully if the file gets loaded on OTP 28 irrespective
of whether the file was compiled with OTP 27 or 28.
However, this file fails to load if it is compiled with 28 and loaded on
27.

There is the `no_long_atoms` option that we could use just for this test
case.
However, given that we have a similar test case
two_nodes_same_otp_version we skip test case two_nodes_different_otp_version
in this commit.

Really, the best solution would be to use different OTP versions in
RabbitMQ mixed version testing in addition to using different RabbitMQ
versions. This way, the test case could just RPC into the different
RabbitMQ nodes and apply the Ra commands there.