@tamaroning tamaroning commented Jul 4, 2023

Addresses #2287

gcc/rust/ChangeLog:

	* lex/rust-lex.cc (Lexer::input_source_is_valid_utf8): New method of `Lexer`.
	* lex/rust-lex.h: Likewise.
	* rust-session-manager.cc (Session::compile_crate): Add error.

gcc/testsuite/ChangeLog:

	* rust/compile/broken_utf8.rs: New test.

Comment on lines 221 to 225
 {
-  uint8_t input = next_byte ();
+  uint32_t input = next_byte ();
 
-  if ((int8_t) input == EOF)
+  if ((int32_t) input == EOF)
     return Codepoint::eof ();
Changed input from uint8_t to uint32_t so as to differentiate 0xff and EOF.
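
The collision between 0xff and EOF can be reproduced in isolation. The following is a minimal sketch, not the gccrs lexer code: `is_eof_narrow` and `is_eof_wide` are hypothetical stand-ins for the old and new checks, and it assumes `EOF` is -1, as it is on mainstream C libraries.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio> // EOF

// Hypothetical stand-in for the old check: the int returned by the byte
// source is narrowed to 8 bits before being compared with EOF.
inline bool is_eof_narrow (int from_source)
{
  uint8_t input = static_cast<uint8_t> (from_source);
  // (int8_t) 0xff is -1, so a real 0xff byte looks exactly like EOF.
  return static_cast<int8_t> (input) == EOF;
}

// Hypothetical stand-in for the new check: all 32 bits are kept, so only
// a genuine EOF compares equal.
inline bool is_eof_wide (int from_source)
{
  uint32_t input = static_cast<uint32_t> (from_source);
  return static_cast<int32_t> (input) == EOF;
}

// Small usage example (assumes EOF == -1).
inline void eof_sketch_demo ()
{
  assert (is_eof_narrow (0xff)); // false positive with the 8-bit variable
  assert (!is_eof_wide (0xff));  // 0xff is an ordinary byte
  assert (is_eof_wide (EOF));    // a real EOF is still detected
}
```

With the 8-bit variable, a source file containing a 0xff byte would be treated as ending at that byte; the wider type keeps the two cases apart.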

just a bugfix

Comment on lines 364 to 368
 {
   if (offs >= buffer.size ())
     return EOF;
 
-  return buffer.at (offs++);
+  return (uint8_t) buffer.at (offs++);
 }
Added a cast to prevent bytes whose MSB is 1 from being sign-extended.
Without the cast, 0xfe, for example, becomes 0xfffffffe.
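
The sign extension can be shown with a small stand-alone sketch. It assumes the buffer stores `char`-like elements; `signed char` is used here so the result does not depend on whether plain `char` is signed on a given platform. The helper names are hypothetical and are not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a one-element buffer holding the given byte.
inline std::vector<signed char> one_byte (uint8_t b)
{
  return std::vector<signed char> (1, static_cast<signed char> (b));
}

// Without the cast: the signed element is promoted to int, so 0xfe
// (stored as -2) stays -2, i.e. 0xfffffffe when viewed as 32 bits.
inline int widen_without_cast (const std::vector<signed char> &buffer,
                               std::size_t offs)
{
  return buffer.at (offs);
}

// With the cast: the value is first reinterpreted as an unsigned byte,
// so the resulting int is always in the range 0..255.
inline int widen_with_cast (const std::vector<signed char> &buffer,
                            std::size_t offs)
{
  return static_cast<uint8_t> (buffer.at (offs));
}

// Small usage example.
inline void sign_extension_demo ()
{
  assert (widen_without_cast (one_byte (0xfe), 0) == -2);
  assert (widen_with_cast (one_byte (0xfe), 0) == 0xfe);
}
```

Because the function also returns `EOF` (a negative `int`) at the end of the buffer, keeping real bytes in 0..255 is what lets the caller tell data apart from the end-of-input marker.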

bugfix too

Comment on lines +1 to +2
// { dg-excess-errors "stream did not contain valid UTF-8" }
@tamaroning tamaroning Jul 4, 2023

Contains a 0xff in line 2.

That is the raw byte 0xff, not the character ÿ (U+00FF) that the page displays.
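
The difference matters because ÿ (U+00FF) is encoded in UTF-8 as the two bytes 0xc3 0xbf, while a lone 0xff byte can never appear in valid UTF-8. The sketch below is a simplified, hypothetical validator written for illustration; it is not the PR's `Lexer::input_source_is_valid_utf8`, whose implementation is not shown in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
inline bool is_valid_utf8_sketch (const std::string &bytes)
{
  // Smallest code point that legitimately needs a sequence of each length.
  static const uint32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};

  std::size_t i = 0;
  while (i < bytes.size ())
    {
      uint8_t lead = static_cast<uint8_t> (bytes[i]);
      std::size_t len;
      uint32_t cp;

      if (lead < 0x80)
        { len = 1; cp = lead; }
      else if ((lead & 0xE0) == 0xC0)
        { len = 2; cp = lead & 0x1F; }
      else if ((lead & 0xF0) == 0xE0)
        { len = 3; cp = lead & 0x0F; }
      else if ((lead & 0xF8) == 0xF0)
        { len = 4; cp = lead & 0x07; }
      else
        return false; // stray continuation byte, or 0xf8..0xff

      if (i + len > bytes.size ())
        return false; // sequence is cut off by the end of the input

      for (std::size_t k = 1; k < len; ++k)
        {
          uint8_t cont = static_cast<uint8_t> (bytes[i + k]);
          if ((cont & 0xC0) != 0x80)
            return false; // not a 10xxxxxx continuation byte
          cp = (cp << 6) | (cont & 0x3F);
        }

      if (cp < min_for_len[len])
        return false; // overlong encoding
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false; // UTF-16 surrogate
      if (cp > 0x10FFFF)
        return false; // beyond the Unicode range

      i += len;
    }
  return true;
}

// Small usage example.
inline void utf8_sketch_demo ()
{
  assert (!is_valid_utf8_sketch (std::string ("\xff")));    // raw 0xff
  assert (is_valid_utf8_sketch (std::string ("\xc3\xbf"))); // U+00FF
}
```

A test file containing the raw byte is therefore rejected, whereas a file containing the properly encoded character would pass.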

Signed-off-by: Raiki Tamura <[email protected]>
@tamaroning tamaroning mentioned this pull request Jul 4, 2023
@philberty philberty requested review from CohenArthur, P-E-P and philberty and removed request for CohenArthur and philberty July 6, 2023 10:55
@philberty philberty left a comment

LGTM

@philberty
The only thing I might say is that it would be nice to have these constants named in some way, but UTF-8 stuff is so specific that I don't think it would really help.

@philberty philberty added this pull request to the merge queue Jul 6, 2023
Merged via the queue into Rust-GCC:master with commit 46a61f0 Jul 6, 2023