SATIF AI

AI-powered toolkit for transforming any input files into any output files with minimal code.

⚠️ Disclaimer

EXPERIMENTAL STATUS: This project is in early development and not production-ready. The API may change significantly between versions.

Installation

pip install satif-ai

Overview

SATIF AI automates the transformation of heterogeneous data sources (CSV, Excel, PDF, XML, etc.) into any desired output format in two steps:

  1. Standardization: Ingests the source files and converts them into SDIF, a structured intermediate format.
  2. Transformation: Applies business logic to the standardized data to generate the target output files, using transformation code generated by AI.

Key Features

  • Any Format Support: Process virtually any input, even challenging unstructured content (PDFs, complex Excel sheets)
  • AI-Powered Code Generation: Automatically generate transformation code from examples and natural language instructions
  • Robust Schema Enforcement: Handle input data drift and schema inconsistencies through configurable validation
  • SQL-Based Data Processing: Query and manipulate all data using SQL
  • Decoupled Processing Stages: Standardize once, transform many times with different logic

Usage

Basic Workflow

import asyncio
from satif_ai import astandardize, atransform

async def main():
    # Step 1: Standardize input files into SDIF
    sdif_path = await astandardize(
        datasource=["data.csv", "reference.xlsx"],
        output_path="standardized.sdif",
        overwrite=True
    )

    # Step 2: Transform SDIF into desired output using AI
    await atransform(
        sdif=sdif_path,
        output_target_files="output.json",
        instructions="Extract customer IDs and purchase totals, calculate the average purchase value per customer, and output as JSON with customer_id and avg_purchase_value fields.",
        llm_model="o4-mini"  # Choose AI model based on needs
    )

if __name__ == "__main__":
    asyncio.run(main())
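
Because the two stages are decoupled, one SDIF file can feed several transformations. The sketch below illustrates that pattern using only the `astandardize` and `atransform` parameters shown above. The helper `build_transform_jobs`, the `run_all` wrapper, the output file names, and the instruction strings are hypothetical, and it assumes that `atransform` can be called repeatedly against the same SDIF file.

```python
import asyncio
from typing import Any


def build_transform_jobs(
    sdif_path: Any, jobs: dict[str, str], llm_model: str = "o4-mini"
) -> list[dict[str, Any]]:
    """Build one set of atransform keyword arguments per output file.

    Hypothetical helper: it only reuses the keyword names from the
    Basic Workflow example above.
    """
    return [
        {
            "sdif": sdif_path,
            "output_target_files": output_file,
            "instructions": instructions,
            "llm_model": llm_model,
        }
        for output_file, instructions in jobs.items()
    ]


async def run_all(datasource: list[str], jobs: dict[str, str]) -> None:
    # Imported here so the helper above can be used without satif-ai installed.
    from satif_ai import astandardize, atransform

    # Standardize once...
    sdif_path = await astandardize(
        datasource=datasource,
        output_path="standardized.sdif",
        overwrite=True,
    )
    # ...then transform once per output file (assumes repeated calls are fine).
    for kwargs in build_transform_jobs(sdif_path, jobs):
        await atransform(**kwargs)


# Hypothetical outputs and instructions, for illustration only.
JOBS = {
    "averages.json": "Output customer_id and avg_purchase_value per customer as JSON.",
    "totals.csv": "Output customer_id and the total purchase amount per customer as CSV.",
}

# To run against real files:
# asyncio.run(run_all(["data.csv", "reference.xlsx"], JOBS))
```

Keeping the job list as plain data makes it easy to add another output without touching the standardization step.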

Architecture

┌─────────────────┐     ┌───────────────────────┐     ┌─────────────────┐
│  Source Files   │────▶│ Standardization Layer │────▶│   SDIF File     │
│ CSV/Excel/PDF/  │     │                       │     │ (SQLite-based)  │
│ XML/JSON/etc.   │     └───────────────────────┘     └────────┬────────┘
└─────────────────┘                                            │
                                                               │
┌─────────────────┐     ┌───────────────────────┐              │
│  Output Files   │◀────│  Transformation Layer │◀─────────────┘
│ Any format      │     │  (AI-generated code)  │
└─────────────────┘     └───────────────────────┘

SDIF (Standardized Data Interoperable Format) is the intermediate SQLite-based format that:

  • Stores structured tables alongside JSON objects and binary media
  • Maintains rich metadata about data origins and relationships
  • Provides direct SQL queryability for complex transformations
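
Since SDIF is SQLite-based, its contents can in principle be explored with ordinary SQL. The sketch below shows the idea with Python's standard `sqlite3` module. It runs against a throwaway SQLite database standing in for an SDIF file: the `purchases` table, its columns, and the sample rows are invented for illustration and do not reflect SDIF's actual internal schema. It also assumes that an SDIF file can be opened directly as a SQLite database.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def average_purchase(db_path: str) -> list[tuple[str, float]]:
    """Average purchase total per customer from the hypothetical 'purchases' table."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT customer_id, AVG(total) FROM purchases "
            "GROUP BY customer_id ORDER BY customer_id"
        ).fetchall()
    finally:
        conn.close()


# Stand-in for an SDIF file: a temporary SQLite database with made-up data.
demo_path = str(Path(tempfile.mkdtemp()) / "demo.sqlite")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE purchases (customer_id TEXT, total REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("c1", 10.0), ("c1", 30.0), ("c2", 5.0)],
)
conn.commit()
conn.close()

print(list_tables(demo_path))       # ['purchases']
print(average_purchase(demo_path))  # [('c1', 20.0), ('c2', 5.0)]
```

Listing tables through `sqlite_master` is a generic SQLite technique, so it is a reasonable first step for discovering what a standardized file actually contains before writing queries against it.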

Documentation

For detailed documentation, examples, and advanced features, visit SATIF Documentation.

Contributing

Contributions are welcome! Whether it's bug reports, feature requests, or code contributions, please feel free to get involved.

Contribution Workflow

  1. Fork the repository on GitHub.

  2. Clone your fork locally:

    git clone https://github.com/<your-username>/satif.git
    cd satif/libs/ai
  3. Create a new branch for your feature or bug fix:

    git checkout -b feature/your-feature-name

    or

    git checkout -b fix/your-bug-fix-name
  4. Set up the development environment:

    make install  # or poetry install
  5. Make your changes. Ensure your code follows the project's style guidelines.

  6. Format and lint your code:

    make format
    make lint
  7. Run type checks:

    make typecheck
  8. Run tests to ensure your changes don't break existing functionality:

    make test

    To also generate a coverage report:

    make coverage
  9. Commit your changes with a clear and descriptive commit message.

  10. Push your changes to your fork on GitHub:

    git push origin feature/your-feature-name
  11. Submit a Pull Request (PR) to the main branch of the original syncpulse-solutions/satif repository.

License

This project is licensed under the MIT License.

Maintainer: Bryan Djafer ([email protected])
