@RGBCube RGBCube commented Nov 13, 2025

This PR marks the start of the new Helix DSL, in Python.

We chose Python instead of creating a new language because:

  1. Python is already very familiar to any data scientist, ML engineer, or LLM.
  2. We don't lose any performance, as we can efficiently compile this DSL into static Rust code. It's similar to sympy in the way it constructs the database representation. There is no Python at runtime; it is strictly compile-time and is only used to construct an AST/DB schema.
  3. There's nothing we would want to represent in a custom language that we can't represent in the Python DSL, so creating a new tokenizer, parser, MIR, generator, and LSP is not required when a Python library can do all of this in less code.

Greptile Overview

Greptile Summary

Introduces a new Python DSL for HelixDB as a compile-time schema definition language that compiles to static Rust code. The example demonstrates a bookstore RAG system with chapters, subchapters, and vector embeddings.

Critical Issues:

  • @index decorator on line 62 will cause NameError (not imported)
  • searchdocs_rag retrieves wrong node type - should get SubChapter nodes from vectors, not Chapter nodes

Style Inconsistencies:

  • Property naming differs from old DSL (index vs chapter_index)
  • Vector dimension specification missing in ArgSubchapter.chunk
  • Misleading variable name in searchdocs_rag (chapters should be subchapters)

The Python DSL approach is sound for compile-time schema generation, but this example needs fixes before it can run successfully.

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| examples/bookstore.py | 2/5 | New Python DSL example for bookstore RAG system - contains critical syntax error (`@index` decorator) and logic error in vector search query that retrieves wrong node type |

Sequence Diagram

sequenceDiagram
    participant User
    participant PythonDSL as Python DSL (bookstore.py)
    participant HelixCompiler as Helix Compiler
    participant Database as HelixDB

    User->>PythonDSL: Define schema (Chapter, SubChapter, EmbeddingVector)
    PythonDSL->>HelixCompiler: Compile to AST
    HelixCompiler->>Database: Generate static Rust code
    
    User->>Database: Call loaddocs_rag(chapters)
    loop For each chapter
        Database->>Database: Create Chapter node
        loop For each subchapter
            Database->>Database: Create SubChapter node with embedding
            Database->>Database: Create Contains edge (Chapter→SubChapter)
        end
    end
    Database-->>User: Return "Success"
    
    User->>Database: Call searchdocs_rag(query, k)
    Database->>Database: Search vectors by similarity
    Database->>Database: Traverse to SubChapter nodes
    Database-->>User: Return subchapter data

@RGBCube RGBCube marked this pull request as draft November 13, 2025 18:17

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 5 comments

db = helix.Db()

class Chapter(db.Node):
    @index

syntax: the `@index` decorator is not imported and will raise a `NameError` when the module is executed (examples/bookstore.py, line 62).

Suggested change:

```suggestion
    index: helix.I64
```

def searchdocs_rag(query: helix.Vector, k: helix.I32) -> helix.Iterator[dict[str, helix.Value]]:
    # TODO
    vecs = db.search_vector(query, k)
    chapters = vecs.incoming_nodes[Contains]

logic: wrong node type retrieved. Based on the schema, vectors are embedded in `SubChapter` nodes, not `Chapter` nodes. The old DSL shows `subchapters <- vecs::In<EmbeddingOf>` (examples/bookstore.py, line 106).

Suggested change:

```suggestion
    subchapters = vecs.incoming_nodes[SubChapter]
```

class ArgSubchapter(helix.Struct):
    title: helix.String
    content: helix.String
    chunk: helix.Vector

style: `helix.Vector` lacks a dimension specification; it should match `EmbeddingVector` (1536 dimensions) for consistency (examples/bookstore.py, line 84).

@db.query
def loaddocs_rag(chapters: helix.List[ArgChapter]) -> str:
    for c in chapters:
        c_node = db.add_node(Chapter(index=c.id))

style: property name mismatch. The old DSL uses `chapter_index` (line 31); the new one uses `index` (examples/bookstore.py, line 89).

    # TODO
    vecs = db.search_vector(query, k)
    chapters = vecs.incoming_nodes[Contains]
    return chapters.map(lambda c: {"index": c.index})

style: misleading variable name. These are subchapters, not chapters (examples/bookstore.py, line 107).

el-yawd commented Nov 13, 2025

dumb question: Can users call third-party, access the std library etc? If not, maybe we should use another file suffix to avoid confusion

el-yawd commented Nov 13, 2025

> dumb question: Can users call third-party, access the std library etc? If not, maybe we should use another file suffix to avoid confusion

AFAIC we're just stealing Python's syntax right?

RGBCube (Author) commented Nov 13, 2025

> dumb question: Can users call third-party, access the std library etc? If not, maybe we should use another file suffix to avoid confusion

They can, but I don't think everything will work (we will try to error when it doesn't). But for example, a function such as:

def foo(x): return x ** 2 // 9

will work perfectly fine, because you can pass it an input that records every operation applied to it, effectively building up the AST.
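The recording idea can be sketched roughly as follows. `Expr`, `lift`, and `show` are hypothetical names made up for this illustration and say nothing about how the actual DSL is implemented; the sketch only overloads the two operators that `foo` uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """One node of the recorded expression tree (hypothetical type)."""
    op: str
    args: tuple = ()

    def __pow__(self, other):
        return Expr("pow", (self, lift(other)))

    def __floordiv__(self, other):
        return Expr("floordiv", (self, lift(other)))


def lift(value):
    """Wrap plain Python values as constant nodes."""
    return value if isinstance(value, Expr) else Expr("const", (value,))


def show(expr):
    """Render a recorded tree as a prefix-notation string."""
    if expr.op == "var":
        return expr.args[0]
    if expr.op == "const":
        return repr(expr.args[0])
    return f"{expr.op}({', '.join(show(a) for a in expr.args)})"


def foo(x):
    return x ** 2 // 9


# Calling foo with a recording object yields a tree instead of a number.
traced = foo(Expr("var", ("x",)))
print(show(traced))
```

The same `foo` still works on ordinary numbers, which is the appeal of this approach: the function body is unchanged, and only the type of the argument decides whether it computes a value or records a tree.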

> AFAIC we're just stealing Python's syntax right?

We aren't just stealing Python's syntax; this is literally Python code that will be run by a Python runtime, and by abusing how dynamic Python is, we will construct a DB schema and a query AST.

@misterclayt0n (Contributor)

How are you thinking about enforcing the DSL boundary? I thought about validating the module via an AST pass and only allowing constructs that are statically analyzable. However, would arbitrary Python code raise an error, or just be ignored for the sake of simplicity?
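A minimal sketch of the AST-pass idea, using Python's standard `ast` module. The list of forbidden constructs and the `check_dsl_source` name are assumptions made up for this example, not anything defined in this PR. This variant takes the "raise an error" side of the question by reporting unsupported constructs instead of ignoring them.

```python
import ast

# Hypothetical deny-list; a real boundary would need a deliberate design.
FORBIDDEN = (ast.While, ast.Try, ast.With, ast.Global, ast.Nonlocal)


def check_dsl_source(source: str) -> list[str]:
    """Return one message per construct that falls outside the DSL subset."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FORBIDDEN):
            problems.append(
                f"line {node.lineno}: {type(node).__name__} is not allowed in the DSL"
            )
    return problems


ok_source = "def q(xs):\n    for x in xs:\n        pass\n"
bad_source = "def q(xs):\n    while True:\n        pass\n"

print(check_dsl_source(ok_source))
print(check_dsl_source(bad_source))
```

A static pass like this only sees the source text, so it complements rather than replaces the tracing approach described earlier, where unsupported operations would fail when the recording object is used.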

@RGBCube RGBCube closed this Nov 21, 2025
@RGBCube RGBCube deleted the python-dsl branch November 21, 2025 17:58