Skip to content

Conversation

@c-git
Copy link
Contributor

@c-git c-git commented Nov 10, 2022

Foreword: I tried my best to not affect the runtime performance of the regex matching but I was only able to account for the cases I could understand / think of. I've listed those below and how I tried to mitigate them.

The cases of main detection that are intended to be added by this PR are:

  • Code with no comments but ends in a semi colon before the declaration of main. No comments was required to prevent // being included to comment out the line. This was done by not allowing / as part of the code before the ; While this is more restrictive than necessary it didn't seem likely that the code before the main would include / so didn't seem worthwhile to take the performance hit to make the check less restrictive. To prevent long regex scans going multiple lines I also disallowed newlines in the code by including \n and \r as disallowed characters. (The whole point is code on the same line as main so this seemed fine and would help prevent unnecessarily matching all lines before main when it doesn't matter).

  • Allow multiline comments within the arguments area of main that both start and end within the brackets. Not expecting a substantial performance cost from this one because this case is near the end of the matching.

I used the following test cases:

fn main() { // Should work
    println!("Hello, world!");
}

//;fn main() { //Shouldn't work
    println!("Hello, world!");
}


use std; fn main(){ println!("Hello, world!"); } // Should work

const fn main() { // Should work
    println!("Hello, world!");
}

/* fn main() {} */ // Shouldn't work

fn main(/* comment */) { // Should work
   // snip
}

@adwinwhite
Copy link
Contributor

adwinwhite commented Jan 13, 2023

It's really nice to have detailed explanation of the regex change.

The small caveat in your PR,

This was done by not allowing / as part of the code before the ; While this is more restrictive than necessary it didn't seem likely that the code before the main would include / so didn't seem worthwhile to take the performance hit to make the check less restrictive.

While it's indeed uncommon to write / before main(), it will confuse people greatly if happens (In code golf like #624).

I wonder if it's possible to solve the problem of main() in comments once for all.
One approach would be filter out all comments before detecting main().
Since the editor detects comments in real-time, the runtime cost shouldn't high and it's only executed when the user runs/builds.

I just did a small benckmark with following code,

const SINGLELINE_COMMENTS_RE = /\/\/.*$/gm;
const BLOCK_COMMENTS_RE = /\/\*.*\*\//gs;

const url = "https://raw.githubusercontent.com/rust-lang/rust/master/library/alloc/src/collections/btree/map.rs";
fetch(url).then(res => res.text())
            .then(text => {
                const code = text;
                console.log("length of code: ", code.length);
                const startTime = performance.now();
                const code_without_comments = code.replaceAll(SINGLELINE_COMMENTS_RE, "").replaceAll(BLOCK_COMMENTS_RE, "");
                const endTime = performance.now();

                console.log("It takes ", (endTime - startTime) / 1000, "seconds");
                console.log("trimmed code length: ", code_without_comments.length);
            });

and got result(repeated, it usually cost less than 1 millisecond which is barely tangible to user),

[LOG]: "length of code: ",  77816 
[LOG]: "It takes ",  0.00040000009536743164,  "seconds" 
[LOG]: "trimmed code length: ",  49181 

Without comments, detection of main() could be much easier.
And the bug that main() might be contained in block comments is eliminated.
New regex:

const HAS_MAIN_FUNCTION_RE = /^(.*;)?\s*(pub\s+)?\s*(const\s+)?\s*(async\s+)?\s*fn\s+main\s*\(\s*)?\s*\)/m;
  • \n\r is no longer needed since . won't match newline.




However considering all the forms main() may appears,

  1. main<>()
  2. pub extern "C" fn main(...)
  3. #[main]
  4. nested main() in submodule

We'd better just add a notice when main is not detected.

Current warning doesn't guide user to change mode to RUN.



BTW, you can add the regex tests to this file .

@c-git
Copy link
Contributor Author

c-git commented Jan 13, 2023

Thank you for the response. I'm sorry but I'm not quite sure what action is being requested.

@adwinwhite
Copy link
Contributor

Sorry for the lengthy words.

I just realized that detection is done in real-time.
Then comments filtering will affect performance much more.

My overall suggestion is that we'd better add a notice to guide user to change execution mode manually when main() detection fails.

And change of regex should be accompanied by tests.

@c-git
Copy link
Contributor Author

c-git commented Jan 17, 2023

Thanks for explaining,

I have suggested a change to the notice here and I'm not sure where the tests should be added. If you can guide me to them, I'd be happy to add them as they are documented in the comments here already.

@shepmaster
Copy link
Member

I wonder if it's possible to solve the problem of main() in comments once for all.

It's certainly possible, but the cost of such a fix is unclear to me. The best answer I can think of would involve running the Rust parser itself (or something equivalent) and having it report if a function named main is found.

The current architecture of the playground doesn't have a great way of doing this. Right now, it would involve sending the code to the backend, running something, then getting a response. That's far too slow for something that should roughly correspond to every keystroke.

Another technique would be to compile something into Wasm and execute that in the browser. It would allow us to reuse some authoritative Rust parser without a network connection. However, it would bring in a large new dependency which might not work for all of our browsers.

I don't think either of these directions is appropriate right now, but are worth considering!

Since the editor detects comments in real-time

I'm not sure exactly which editor you are referring to here, but if you mean Ace / Monaco, this is another interesting idea I hadn't considered. Those libraries already have some established JS for parsing Rust code (with differing quality). If we could piggyback off of the work that they've already done, we could get the result "for free". However, it's not clear how that would work for the simple editor, or if Ace/Monaco have appropriate APIs to query what we want.

One approach would be filter out all comments before detecting main().

This seems like the best compromise at this point in time. A quick search found Regex to match a C-style multiline comment and there are a few JS libraries out there that claim to remove comments (although many have lots of open issues). However, a purpose-built JS (or just a file here) that removes Rust comments1 may be more appropriate.

A benefit to this approach is that it would also apply to cases beyond just fn main, such as:

/*
#[test]
*/
fn x() {}

Footnotes

  1. For example, this is invalid Rust, but I think most regex solutions would accept it: /* /* */

@shepmaster
Copy link
Member

All that being said, I'm OK with incremental improvements to the current scheme. Your unique test cases do need to be added to the test file (as mentioned above, those are in selectors/index.spec.ts). You can run the tests via yarn test

@c-git
Copy link
Contributor Author

c-git commented Jan 18, 2023

Ok thanks, will add them when I get a chance.

@Ezrashaw
Copy link

Ezrashaw commented Apr 7, 2023

Has anyone considered macro-expanded main functions?

This is naive but: couldn't this logic be done with the compiler? Then you have an authoritative answer as to whether the crate has a main function or not. Trying to parse Rust's grammar with regexs probably isn't going to work in the long run; I'm sure we can come up with failing examples indefinitely.

@shepmaster
Copy link
Member

This is naive but: couldn't this logic be done with the compiler?

Please take the time to read previous comments.

@Ezrashaw
Copy link

Gosh, I can't believe I missed that lol. I will note that just the lexing/parsing stage is very fast (especially if it wasn't over the network), rustc spends most of its time after it has a list of function names.

@c-git
Copy link
Contributor Author

c-git commented Nov 9, 2023

TLDR

I don't think I will add the testing myself. Happy to help if someone else wants to take it to the finish line but I won't be able to get it there myself.

Details

I've really wanted to setup the testing but I haven't been able to make the time to install and become familiar with yarn. And I've been doing some reflection on the things on my todo list and doing a bit of house keeping. Realistically this isn't something I can do quickly so it will never get done in that category and hence it would have to be otherwise productive time that I'd have to allocate to this. And while I really want to help out I don't realistically think I will schedule the time to learn yarn over something else. If someone else wants to take what I've done and bring it to the finish line, I'm happy to help with that but don't think I will get around to doing it myself.

shepmaster and others added 2 commits December 1, 2023 22:04
The cases that are intended to be added by this commit are:

- Code with no comments but ends in a semicolon before the declaration
  of `main`.

  No comments was required to prevent `//` being included to comment
  out the line. This was done by not allowing `/` as part of the code
  before the `;` While this is more restrictive than necessary, it
  didn't seem likely that the code before `main` would include `/`
  so didn't seem worthwhile to take the performance hit to make the
  check less restrictive. To prevent long regex scans going multiple
  lines I also disallowed newlines in the code by including `\n` and
  `\r` as disallowed characters. The whole point is code on the same
  line as `main` so this seemed fine and would help prevent
  unnecessarily matching all lines before main when it doesn't
  matter.

- Allow multiline comments within the arguments area of main that both
  start and end within the brackets.

  I'm not expecting a substantial performance cost from this one
  because this case is near the end of the matching.

I used the following test cases

```rust
fn main() { // Should work
    println!("Hello, world!");
}

//;fn main() { //Shouldn't work
    println!("Hello, world!");
}

use std; fn main(){ println!("Hello, world!"); } // Should work

const fn main() { // Should work
    println!("Hello, world!");
}

/* fn main() {} */ // Shouldn't work

fn main(/* comment */) { // Should work
   // snip
}
```

Co-authored-by: Jake Goulding <[email protected]>
@shepmaster shepmaster added the CI: approved Allowed access to CI secrets label Dec 2, 2023
Copy link
Member

@shepmaster shepmaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@shepmaster shepmaster merged commit 49a08d9 into rust-lang:main Dec 3, 2023
@c-git
Copy link
Contributor Author

c-git commented Dec 3, 2023

I'm sorry I wasn't able to bring this to completion. Thanks for the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI: approved Allowed access to CI secrets

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants