@dai-chen dai-chen commented Jun 16, 2025

Description

This PR introduces a new api module containing the UnifiedQueryPlanner class, which provides a high-level interface for parsing and planning PPL queries. This module is designed to support external consumers such as Spark and CLI without exposing Calcite or OpenSearch internals. README and unit tests are included to document usage and verify correctness.

Related Issues

Resolves #3734

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this Jun 16, 2025
@dai-chen dai-chen added the enhancement New feature or request label Jun 16, 2025
@dai-chen dai-chen changed the title from "Add unified query API module for external integration" to "Add unified query API for external integration" Jun 18, 2025
.cacheSchema(true)
.build();

RelNode plan = planner.plan("source = opensearch.test");
Collaborator

How does one execute the plan after they receive it?

Collaborator Author

Good question. Currently, the plan isn't directly executable. As noted in the README, the planner is designed to eventually return an executable plan: either a Calcite physical plan for immediate execution in the current JVM (useful for the OpenSearch plugin and CLI), or a SparkSQL plan for distributed execution by Spark (useful for PPL in Spark).

I initially considered designing the API this way, but haven’t yet found a clean way to model everything within Calcite’s optimizer. I plan to work on this later, especially since PPL in Spark Phase 2 may require it.
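
For illustration only, the direction described above could look roughly like the sketch below. This is not the module's API: `ExecutablePlan`, `LocalJvmPlan`, `DistributedPlan`, `ExecutionTarget`, and `SketchPlanner` are all hypothetical names, and the `execute()` bodies are stand-ins that return a string instead of running Calcite or Spark.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical: a plan that knows how to run itself. Not part of this PR. */
interface ExecutablePlan {
    /** Where the plan would run, e.g. "local-jvm" or "spark". */
    String target();

    /** Stand-in for execution; a real plan would return rows, not strings. */
    List<String> execute();
}

/** Stand-in for a Calcite physical plan executed in the current JVM. */
final class LocalJvmPlan implements ExecutablePlan {
    private final String query;

    LocalJvmPlan(String query) { this.query = query; }

    @Override public String target() { return "local-jvm"; }

    @Override public List<String> execute() {
        // Assumption: a real implementation would run a physical plan here.
        return Collections.singletonList("ran locally: " + query);
    }
}

/** Stand-in for a SparkSQL plan handed to Spark for distributed execution. */
final class DistributedPlan implements ExecutablePlan {
    private final String query;

    DistributedPlan(String query) { this.query = query; }

    @Override public String target() { return "spark"; }

    @Override public List<String> execute() {
        // Assumption: a real implementation would submit the plan to Spark.
        return Collections.singletonList("submitted to Spark: " + query);
    }
}

/** Hypothetical switch the caller would set when building the planner. */
enum ExecutionTarget { LOCAL_JVM, SPARK }

/** Hypothetical planner that returns a plan matching the chosen target. */
final class SketchPlanner {
    private final ExecutionTarget target;

    SketchPlanner(ExecutionTarget target) { this.target = target; }

    ExecutablePlan plan(String query) {
        if (target == ExecutionTarget.SPARK) {
            return new DistributedPlan(query);
        }
        return new LocalJvmPlan(query);
    }

    public static void main(String[] args) {
        ExecutablePlan plan =
            new SketchPlanner(ExecutionTarget.LOCAL_JVM).plan("source = opensearch.test");
        System.out.println(plan.target() + " -> " + plan.execute());
    }
}
```

The point of the sketch is only that callers would depend on one small interface while the planner decides which backend-specific plan to hand back; how that maps onto Calcite's optimizer is the open question mentioned above.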

@dai-chen
Collaborator Author

@LantaoJin @penghuo Please have a look when you have a moment. This is currently only for the initial phase of opensearch-project/opensearch-spark#1136, so we can begin publishing PRs on the Spark side. Thanks!

@dai-chen
Collaborator Author

There seems to be a flaky test.

2025-06-24T17:39:21.3968640Z 3577 tests completed, 1 failed, 540 skipped
2025-06-24T17:39:21.3969870Z Tests with failures:
2025-06-24T17:39:21.4097140Z  - org.opensearch.sql.calcite.tpch.CalcitePPLTpchIT.testQ19

@dai-chen dai-chen merged commit c0858b5 into opensearch-project:feature/unified-ppl Jun 24, 2025
27 of 29 checks passed
@dai-chen dai-chen deleted the add-unified-query-api-module branch June 24, 2025 18:01
dai-chen added a commit to dai-chen/sql-1 that referenced this pull request Jun 24, 2025
* Add api module with API and UT

Signed-off-by: Chen Dai <[email protected]>

* Refactor catalog API and clean up build.gradle

Signed-off-by: Chen Dai <[email protected]>

* Add cache schema API and refactor UT

Signed-off-by: Chen Dai <[email protected]>

* Add readme

Signed-off-by: Chen Dai <[email protected]>

* Add comment for hardcoding query size limit

Signed-off-by: Chen Dai <[email protected]>

* Add default namespace API with more UTs

Signed-off-by: Chen Dai <[email protected]>

---------

Signed-off-by: Chen Dai <[email protected]>