
Conversation

shilangyu
Collaborator

No description provided.


Copilot AI left a comment


Pull Request Overview

This PR optimizes the starting position calculation for bounded look-behinds in the regex automata engine. Key changes include:

  • Replacing the flat list of look-behind start states with a tree structure to capture nesting.
  • Constructing and sorting a vector of look-behind tuples in PikeVM to ensure correct evaluation order.
  • Updating the compiler and builder components to handle the new look-behind tree structure.
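
To make the first two bullets concrete, here is a minimal sketch of what a nesting tree and a sorted tuple vector could look like. Everything in it is hypothetical: `StateId`, `LookBehindNode`, `node`, `flatten`, and `evaluation_order` are illustrative names, not the PR's types, and the sort key (deepest nesting first) is an assumption, since the PR's actual ordering is not shown in this conversation.

```rust
// Hypothetical stand-in for the crate's `StateID`; a plain integer keeps
// the sketch self-contained.
type StateId = u32;

/// One look-behind in a hypothetical nesting tree (not the PR's real type).
#[derive(Debug, Clone, PartialEq)]
struct LookBehindNode {
    start_id: StateId,
    /// `None` stands for "no known bound" in this sketch.
    offset_from_start: Option<usize>,
    /// Look-behinds nested directly inside this one.
    children: Vec<LookBehindNode>,
}

/// Small constructor helper, used only to keep examples short.
fn node(
    start_id: StateId,
    offset_from_start: Option<usize>,
    children: Vec<LookBehindNode>,
) -> LookBehindNode {
    LookBehindNode { start_id, offset_from_start, children }
}

/// Walk one tree in pre-order and record `(depth, start_id, offset)` per node.
fn flatten(
    node: &LookBehindNode,
    depth: usize,
    out: &mut Vec<(usize, StateId, Option<usize>)>,
) {
    out.push((depth, node.start_id, node.offset_from_start));
    for child in &node.children {
        flatten(child, depth + 1, out);
    }
}

/// Flatten every root, then sort the tuples.
/// ASSUMPTION: "deepest first" is an illustrative sort key only.
fn evaluation_order(
    roots: &[LookBehindNode],
) -> Vec<(usize, StateId, Option<usize>)> {
    let mut tuples = Vec::new();
    for root in roots {
        flatten(root, 0, &mut tuples);
    }
    // `sort_by_key` is stable, so nodes at equal depth keep their
    // pre-order position relative to each other.
    tuples.sort_by_key(|&(depth, _, _)| std::cmp::Reverse(depth));
    tuples
}
```

A flat `Vec<StateId>` loses the parent/child relation between look-behinds; the tree keeps it, and the flattening step recovers a linear order when one is needed.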

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| regex-automata/src/nfa/thompson/pikevm.rs | Updated look-behind evaluation logic and added sorting of look-behind states. |
| regex-automata/src/nfa/thompson/nfa.rs | Refactored look-behind representation from a vector of state IDs to a tree structure. |
| regex-automata/src/nfa/thompson/compiler.rs | Modified look-around compilation to incorporate look-behind offset and nesting. |
| regex-automata/src/nfa/thompson/builder.rs | Updated builder API to construct the look-behind tree according to nesting paths. |

Comments suppressed due to low confidence (3)

regex-automata/src/nfa/thompson/nfa.rs:1579

  • [nitpick] The condition using preorder in the Debug implementation is not immediately clear. Adding an inline comment to clarify its intent would help future maintainers understand the logic.
                .iter().any(|i| !i.preorder(&|e| e.start_id() != sid))
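
One plausible reading of that expression, sketched under the assumption that `preorder` visits nodes in pre-order and returns `false` as soon as the callback returns `false`: the negated call then means "some node in this tree starts at `sid`". The `Tree` type and the `contains_start` helper below are hypothetical and only mirror the method names visible in the quoted line.

```rust
/// Hypothetical tree node; only the method names `start_id` and `preorder`
/// are taken from the quoted line, the rest is assumed.
struct Tree {
    start_id: u32, // stand-in for the crate's `StateID`
    children: Vec<Tree>,
}

impl Tree {
    fn start_id(&self) -> u32 {
        self.start_id
    }

    /// Visit this node, then its children, in pre-order.
    /// ASSUMPTION: returns `true` only if `f` returned `true` for every
    /// visited node, and stops at the first `false`.
    fn preorder<F: Fn(&Tree) -> bool>(&self, f: &F) -> bool {
        f(self) && self.children.iter().all(|child| child.preorder(f))
    }
}

/// True if any tree in `roots` contains a node whose start state is `sid`.
fn contains_start(roots: &[Tree], sid: u32) -> bool {
    // The callback says "keep going while this is not `sid`", so a `false`
    // result from `preorder` means `sid` was found somewhere in the tree.
    roots
        .iter()
        .any(|i| !i.preorder(&|e: &Tree| e.start_id() != sid))
}
```

If this reading is right, a one-line comment such as "true if `sid` is the start state of any look-behind" would address the nitpick.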

regex-automata/src/nfa/thompson/compiler.rs:1060

  • [nitpick] Clarify in a comment the assumption behind combining relative_start and maximum_len to calculate start_offset, ensuring that the resulting value correctly bounds the look-around start position.
let start_offset = match (relative_start, maximum_len) {

regex-automata/src/nfa/thompson/builder.rs:722

  • [nitpick] The new start_lookbehind API relies on a nesting path. Adding a brief example or more detailed inline comment about the structure and expected format of 'path' would improve clarity.
pub fn start_lookbehind(&mut self, start_id: StateID, offset_from_start: Option<usize>, path: &[usize]) {
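
As an illustration of the kind of example the nitpick asks for, the sketch below assumes that `path` lists child indices from the root level downward and that its last element is the position of the new node among its siblings. That format is a guess, not something the PR confirms; `LookBehindForest`, `Node`, and the `Result` return type are hypothetical, and `u32` stands in for `StateID`.

```rust
/// Hypothetical node type for this sketch.
#[derive(Debug, Default)]
struct Node {
    start_id: u32, // stand-in for the crate's `StateID`
    offset_from_start: Option<usize>,
    children: Vec<Node>,
}

/// Hypothetical container holding the top-level look-behinds.
#[derive(Debug, Default)]
struct LookBehindForest {
    roots: Vec<Node>,
}

impl LookBehindForest {
    /// Insert a look-behind at the position described by `path`.
    /// ASSUMED FORMAT: `[0]` is the first top-level look-behind, `[0, 0]`
    /// is its first nested look-behind, `[1]` is the second top-level one.
    fn start_lookbehind(
        &mut self,
        start_id: u32,
        offset_from_start: Option<usize>,
        path: &[usize],
    ) -> Result<(), String> {
        let (last, parents) = path
            .split_last()
            .ok_or_else(|| "path must not be empty".to_string())?;

        // Descend through the already-inserted ancestors.
        let mut siblings = &mut self.roots;
        for &idx in parents {
            if idx >= siblings.len() {
                return Err(format!("no ancestor at index {idx}"));
            }
            siblings = &mut siblings[idx].children;
        }

        // In this sketch, new nodes are always appended in order.
        if *last != siblings.len() {
            return Err(format!("expected index {}, got {last}", siblings.len()));
        }
        siblings.push(Node { start_id, offset_from_start, children: Vec::new() });
        Ok(())
    }
}
```

Under these assumptions, calling it with `&[0]`, then `&[0, 0]`, then `&[1]` builds two top-level look-behinds, the first of which contains one nested look-behind.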

Comment on lines 977 to 979
*self.lookaround_index.borrow_mut() = SmallIndex::ZERO;
*self.lookbehind_nesting_path.borrow_mut() = vec![0];
*self.current_lookbehind_offset_from_start.borrow_mut() = Some(0);
Collaborator Author


Can/should this initialization be moved to the Builder?

shilangyu force-pushed the feat/pikevm_bounded_lb branch from 4740ceb to 7bf7ba8 on August 25, 2025, 08:38
1 participant