Skip to content

Conversation

@el-yawd
Copy link

@el-yawd el-yawd commented Nov 19, 2025

Description

What began with a simple change to apply zero-copy on data end up in a saga.

This PR is entirely and unashamed based on the incredible crate hannoy, the intent with the rewrite is to make the vector core easier to work with, more performant (by reducing allocations and copies), and for free we gained future support for quantization and other distance methods. There is only three caveats:

  1. Hannoy uses u32 for vector ids, but we use u128 as a global id. To don't loose hannoy's optimizations I chose to map the "global" u128 to a "local" u32 id, and back-and-forth. Probably there is a better approach here, so any light on this is welcome.
  2. It doesn't support filtering search for now, I do plan to add it while in draft, but only after having a good "normal" search working.
  3. Again for performance reasons it switches the current vector representation from f64 to f32, since it seems to be the default approach among vector databases.

Checklist when merging to main

  • No compiler warnings (if applicable)
  • Code is formatted with rustfmt
  • No useless or dead code (if applicable)
  • Code is easy to understand
  • Doc comments are used for all functions, enums, structs, and fields (where appropriate)
  • All tests pass
  • Performance has not regressed (assuming change was not to fix a bug)
  • Version number has been updated in helix-cli/Cargo.toml and helixdb/Cargo.toml

pantShrey and others added 6 commits November 13, 2025 19:10
## Description
<!-- Provide a brief description of the changes in this PR -->
implementing docker build reading env vars 
## Related Issues
<!-- Link to any related issues using #issue_number -->

Closes #

## Checklist when merging to main
<!-- Mark items with "x" when completed -->

- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`

## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->
xav-db and others added 14 commits November 19, 2025 10:54
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
## Description
<!-- Provide a brief description of the changes in this PR -->

## Related Issues
<!-- Link to any related issues using #issue_number -->

Closes #

## Checklist when merging to main
<!-- Mark items with "x" when completed -->

- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`

## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<h3>Greptile Summary</h3>


This PR fixes critical panic issues in HQL's EXISTS operation and field
validation logic. The changes replace unsafe `unwrap()` calls with
proper error handling using match statements.

## Key Changes

- **Panic Prevention**: Replaced `.unwrap()` calls with match statements
in `infer_expr_type.rs` to gracefully handle missing schema fields/types
instead of panicking
- **EXISTS Validation**: Added validation to ensure identifiers used in
EXISTS operations are either in scope or are query parameters (generates
E301 error if not)
- **Code Generation Fix**: Fixed EXISTS optimization in `bool_ops.rs` to
only apply when source step is `Identifier` or `Anonymous`, preventing
incorrect code generation
- **Test Coverage**: Added comprehensive test case with cloud queries
schema to verify EXISTS operation works correctly

## Impact

The changes improve robustness by preventing runtime panics when users
write queries with invalid field names or types that don't match the
schema. Errors are now properly reported during compilation rather than
causing panics.

<details><summary><h3>Important Files Changed</h3></summary>



File Analysis



| Filename | Score | Overview |
|----------|-------|----------|
| helix-db/src/helixc/analyzer/methods/infer_expr_type.rs | 4/5 |
replaced unwrap() calls with proper match statements to prevent panics
when fields/types don't exist in schema; added validation for EXISTS
identifiers to ensure they exist in scope or are parameters |
| helix-db/src/helixc/generator/bool_ops.rs | 5/5 | added check to
ensure EXISTS optimization only applies to Identifier/Anonymous source
steps, preventing incorrect code generation |
| hql-tests/tests/cloud_queries_2/queries.hx | 5/5 | added test query
for EXISTS operation with User node and github_id parameter |

</details>


</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant User as HQL Query
    participant Analyzer as Type Inference Analyzer
    participant Validator as Field Validator
    participant Generator as Code Generator
    participant Schema as Schema Store
    
    User->>Analyzer: EXISTS(N<User>({github_id: github_id}))
    Analyzer->>Analyzer: infer_expr_type() for EXISTS
    Analyzer->>Validator: Check identifier in scope or param
    alt Identifier not in scope and not param
        Validator-->>Analyzer: Generate E301 error
    end
    Analyzer->>Schema: Get node_fields for User type
    alt Type doesn't exist
        Schema-->>Analyzer: None
        Note over Analyzer: Skip validation, error already generated
    else Type exists
        Schema-->>Analyzer: Some(fields)
        Analyzer->>Schema: Get field github_id
        alt Field doesn't exist
            Schema-->>Analyzer: None
            Note over Analyzer: Skip validation, error already generated
        else Field exists
            Schema-->>Analyzer: Some(field)
            Analyzer->>Validator: Validate value type matches field type
            alt Type mismatch
                Validator-->>Analyzer: Generate E205 error
            end
        end
    end
    Analyzer->>Generator: Generate traversal code
    Generator->>Generator: Check if optimization applies
    Note over Generator: Only optimize if source is<br/>Identifier or Anonymous
    Generator-->>User: Generated Rust code
```
</details>


<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->
…elixDB#713)

## Description
<!-- Provide a brief description of the changes in this PR -->
tested locally with failing init and add 

## Related Issues
<!-- Link to any related issues using #issue_number -->

Closes #

## Checklist when merging to main
<!-- Mark items with "x" when completed -->

- [ ] No compiler warnings (if applicable)
- [ ] Code is formatted with `rustfmt`
- [ ] No useless or dead code (if applicable)
- [ ] Code is easy to understand
- [ ] Doc comments are used for all functions, enums, structs, and
fields (where appropriate)
- [ ] All tests pass
- [ ] Performance has not regressed (assuming change was not to fix a
bug)
- [ ] Version number has been updated in `helix-cli/Cargo.toml` and
`helixdb/Cargo.toml`

## Additional Notes
<!-- Add any additional information that would be helpful for reviewers
-->
@el-yawd el-yawd force-pushed the hnsw-fr branch 3 times, most recently from 9610560 to b40fa3a Compare November 22, 2025 04:25
@el-yawd el-yawd force-pushed the hnsw-fr branch 5 times, most recently from dac2fd4 to a0cfd81 Compare November 22, 2025 16:18
the idea is to delay the HVector creation as much as possible, since its
creation is expensive
Still room for improvements since it does two allocations for vector :/
Yeah, this sucks. Althought the current approach incrementally rebuilds the index, it's way more expensive than it should
Only supports Cosine for now, but we definitely want to support more
distance methods
@el-yawd el-yawd force-pushed the hnsw-fr branch 4 times, most recently from 83a5991 to 02f02fe Compare November 24, 2025 02:02
What's life but a collection of tests to pass?
get_item bound HVector to 'txn lifetime, doing zero-copy, but this
complicates some traversal operations. So now, when data is required, we
allocates using the arena and therefore binds HVector's lifetime to the
arena
@el-yawd el-yawd changed the base branch from main to rocks-impl November 25, 2025 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants