Adamyuanyuan (Contributor) commented Aug 5, 2025

Purpose of this pull request

X2SeaTunnel is a tool for converting DataX and other configuration files to SeaTunnel configuration files, designed to help users quickly migrate from other data integration platforms to SeaTunnel.
This is the implementation of the first version.

Does this PR introduce any user-facing change?

Added a new tool with the following functions:

  • Standard Configuration Conversion: DataX → SeaTunnel configuration file conversion
  • Custom Template Conversion: Support for user-defined conversion templates
  • Detailed Conversion Reports: Generate Markdown format conversion reports
  • Regular Expression Variable Extraction: Extract variables from configuration using regex, supporting custom scenarios
  • Batch Conversion Mode: Support directory and file wildcard batch conversion, automatic report and summary report generation
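
As a rough illustration of the regular-expression variable extraction listed above, the sketch below pulls a database name out of a MySQL JDBC URL. It is written in Python for brevity and is not X2SeaTunnel's implementation or template syntax; the sample reader fragment, the pattern, and the `extract_database` helper are all assumptions made for this example.

```python
import re

# Hypothetical DataX-style reader fragment, made up for illustration.
datax_reader = {
    "name": "mysqlreader",
    "parameter": {
        "connection": [
            {
                "jdbcUrl": ["jdbc:mysql://db-host:3306/sales_db?useSSL=false"],
                "table": ["orders"],
            }
        ]
    },
}

# Assumed pattern: capture the database name that follows host:port in the URL.
JDBC_DB_PATTERN = re.compile(r"jdbc:mysql://[^/]+/(\w+)")


def extract_database(jdbc_url, default="default"):
    """Return the database name from a MySQL JDBC URL, or `default` if none is found."""
    match = JDBC_DB_PATTERN.search(jdbc_url)
    return match.group(1) if match else default


jdbc_url = datax_reader["parameter"]["connection"][0]["jdbcUrl"][0]
database = extract_database(jdbc_url)
print(database)
```

Falling back to a default when the pattern does not match mirrors the "default values used" category that the conversion report tracks, although how the tool itself handles a failed match is not stated here.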

Basic Usage

# Standard conversion: Use default template system with built-in common Sources and Sinks
./bin/x2seatunnel.sh -s examples/source/datax-mysql2hdfs.json -t examples/target/mysql2hdfs-result.conf -r examples/report/mysql2hdfs-report.md

# Custom task: Implement customized conversion requirements through custom templates
# Scenario: MySQL → Hive (DataX doesn't have HiveWriter)
# DataX configuration: MySQL → HDFS; custom task: convert to MySQL → Hive
./bin/x2seatunnel.sh -s examples/source/datax-mysql2hdfs2hive.json -t examples/target/mysql2hive-result.conf -r examples/report/mysql2hive-report.md -T templates/datax/custom/mysql-to-hive.conf

# YAML configuration method (equivalent to above command line parameters)
./bin/x2seatunnel.sh -c examples/yaml/datax-mysql2hdfs2hive.yaml

# Batch conversion mode: Process by directory
./bin/x2seatunnel.sh -d examples/source -o examples/target2 -R examples/report2

# Batch mode supports wildcard filtering
./bin/x2seatunnel.sh -d examples/source -o examples/target3 -R examples/report3 --pattern "*-full.json" --verbose

# View help
./bin/x2seatunnel.sh --help
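
To make the wildcard filtering in batch mode concrete, here is a minimal Python sketch of shell-style pattern matching over a directory listing. It only illustrates the idea behind `--pattern "*-full.json"`; the `select_sources` helper, its default pattern, and most of the file names are hypothetical, and the tool's actual matching rules may differ.

```python
from fnmatch import fnmatchcase


def select_sources(file_names, pattern="*.json"):
    """Return the names matching a shell-style wildcard, sorted for stable output."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return sorted(name for name in file_names if fnmatchcase(name, pattern))


# Hypothetical directory listing, made up for illustration.
candidates = [
    "datax-mysql2hdfs.json",
    "datax-mysql2hdfs-full.json",
    "datax-oracle2hdfs-full.json",
    "README.md",
]

for name in select_sources(candidates, "*-full.json"):
    print(name)
```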

Conversion Report

After conversion is completed, view the generated Markdown report file, which includes:

  • Basic Information: Conversion time, source/target file paths, connector types, conversion status, etc.
  • Conversion Statistics: Counts and percentages of direct mappings, smart transformations, default values used, and unmapped fields
  • Detailed Field Mapping Relationships: Source values, target values, filters used for each field
  • Default Value Usage: List of all fields using default values
  • Unmapped Fields: Fields present in DataX but not converted
  • Possible Error and Warning Information: Issue prompts during conversion process
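
The counts and percentages in the statistics section can be thought of as a simple tally over per-field outcomes. The Python sketch below shows one way to compute them; the category names, the `summarize` helper, and the sample field paths are assumptions for illustration, not the report generator's actual code or output format.

```python
from collections import Counter

# Assumed category labels, mirroring the four groups named in the report description.
CATEGORIES = ("direct", "transformed", "default", "unmapped")


def summarize(field_statuses):
    """Map each category to (count, percentage of all fields), rounded to one decimal."""
    counts = Counter(field_statuses.values())
    total = len(field_statuses)
    return {
        category: (
            counts[category],
            round(100.0 * counts[category] / total, 1) if total else 0.0,
        )
        for category in CATEGORIES
    }


# Hypothetical per-field outcomes for a single conversion.
statuses = {
    "reader.username": "direct",
    "reader.password": "direct",
    "reader.table": "direct",
    "writer.path": "direct",
    "reader.jdbcUrl": "transformed",
    "writer.fileType": "transformed",
    "writer.compress": "default",
    "reader.splitPk": "unmapped",
}

for category, (count, percent) in summarize(statuses).items():
    print(f"{category}: {count} ({percent}%)")
```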

For batch conversions, a batch summary report summary.md will be generated in the batch report directory, including:

  • Conversion Overview: Overall statistics, success rate, duration, etc.
  • Successful Conversion List: Complete list of successfully converted files
  • Failed Conversion List: Failed files and error messages (if any)

How was this patch tested?

Tested with unit tests, local tests, and production tests.

Adamyuanyuan (Contributor, Author) commented

#9507

Adamyuanyuan mentioned this pull request Aug 5, 2025
Hisoka-X (Member) commented Aug 5, 2025

I think we should create a new sub-repository, just like mcp. cc @davidzollo @TyrantLucifer

github-actions bot added the dependencies label Aug 5, 2025
Hisoka-X (Member) commented Aug 6, 2025

> I think we should create a new sub-repository, just like mcp. cc @davidzollo @TyrantLucifer

We need to reach consensus on the mailing list first. @Adamyuanyuan, please start a discussion on the dev list.

davidzollo (Contributor) commented

> I think we should create a new sub-repository, just like mcp. cc @davidzollo @TyrantLucifer
>
> We need to reach consensus on the mailing list first. @Adamyuanyuan, please start a discussion on the dev list.

Yes, we need to discuss creating a new repository for this PR.

@Adamyuanyuan, you did an amazing job. Please start a discussion at [email protected]; then we can create a new repository. Thank you.

Adamyuanyuan (Contributor, Author) commented

> I think we should create a new sub-repository, just like mcp. cc @davidzollo @TyrantLucifer

We first need to discuss whether a new repo is necessary. Is it better to integrate the tool into the SeaTunnel project or to maintain it as a separate project?

  1. If we add a new repo, I cannot conveniently reuse the current code management setup. Building a new project may also take a lot of work, and I don't have much experience with that.
  2. We chose Java so that we could reuse the technical stack of the current SeaTunnel project. If it's a new project, should we consider Python? Each has advantages and disadvantages, but Python generally needs fewer lines of code and has better support for template engines such as Jinja2.
  3. As a separate project, X2SeaTunnel's functionality seems a bit thin, and I'm worried that it may not offer enough value on its own.

Of course, the advantages of a separate project are obvious: it can iterate faster and is easier to control.

Hisoka-X (Member) commented

Hi @Adamyuanyuan,
We created a new repo: https://github.com/apache/seatunnel-tools.
Please open a PR there to commit the x2seatunnel code.
Thanks!

Hisoka-X (Member) commented Sep 2, 2025

Moved to apache/seatunnel-tools#2

Hisoka-X closed this Sep 2, 2025