AI-powered toolkit for transforming any input files into any output files with minimal code.
EXPERIMENTAL STATUS: This project is in early development and not production-ready. The API may change significantly between versions.
pip install satif-ai
SATIF AI enables automated transformation of heterogeneous data sources (CSV, Excel, PDF, XML, etc.) into any desired output format in two steps:
- Standardization: Ingests heterogeneous source files (CSV, Excel, PDF, XML, etc.) and transforms them into SDIF, a structured intermediate format.
- Transformation: Applies business logic to the standardized data to generate the target output files, with transformation code generated by AI.
Key features:
- Any Format Support: Process virtually any input, even challenging unstructured content (PDFs, complex Excel sheets)
- AI-Powered Code Generation: Automatically generate transformation code from examples and natural language instructions
- Robust Schema Enforcement: Handle input data drift and schema inconsistencies through configurable validation
- SQL-Based Data Processing: Query and manipulate all data using SQL
- Decoupled Processing Stages: Standardize once, transform many times with different logic
import asyncio

from satif_ai import astandardize, atransform


async def main():
    # Step 1: Standardize input files into SDIF
    sdif_path = await astandardize(
        datasource=["data.csv", "reference.xlsx"],
        output_path="standardized.sdif",
        overwrite=True,
    )

    # Step 2: Transform SDIF into desired output using AI
    await atransform(
        sdif=sdif_path,
        output_target_files="output.json",
        instructions="Extract customer IDs and purchase totals, calculate the average purchase value per customer, and output as JSON with customer_id and avg_purchase_value fields.",
        llm_model="o4-mini",  # Choose AI model based on needs
    )


if __name__ == "__main__":
    asyncio.run(main())

┌─────────────────┐     ┌───────────────────────┐     ┌─────────────────┐
│  Source Files   │────▶│ Standardization Layer │────▶│    SDIF File    │
│ CSV/Excel/PDF/  │     │                       │     │ (SQLite-based)  │
│ XML/JSON/etc.   │     └───────────────────────┘     └────────┬────────┘
└─────────────────┘                                            │
                                                               │
┌─────────────────┐     ┌───────────────────────┐              │
│  Output Files   │◀────│ Transformation Layer  │◀─────────────┘
│   Any format    │     │  (AI-generated code)  │
└─────────────────┘     └───────────────────────┘
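Because standardization and transformation are separate stages, one SDIF file can feed several transformations. The sketch below illustrates that pattern under stated assumptions: `standardize_once_transform_many` is a hypothetical helper, not part of `satif_ai`, and it only assumes the keyword arguments (`datasource`, `output_path`, `overwrite`, `sdif`, `output_target_files`, `instructions`) used in the example above. The two coroutines are passed in as parameters so the helper can be exercised without the package installed.

```python
import asyncio
from typing import Any, Awaitable, Callable, Mapping, Sequence


async def standardize_once_transform_many(
    standardize: Callable[..., Awaitable[Any]],
    transform: Callable[..., Awaitable[Any]],
    sources: Sequence[str],
    sdif_path: str,
    jobs: Mapping[str, str],
    **transform_kwargs: Any,
) -> Any:
    """Standardize the sources once, then run one transformation per job.

    `jobs` maps an output file name to the natural-language instructions
    used to produce it. Extra keyword arguments (for example `llm_model`)
    are forwarded unchanged to every `transform` call.
    """
    # Stage 1: a single standardization pass produces the SDIF file.
    sdif = await standardize(
        datasource=list(sources),
        output_path=sdif_path,
        overwrite=True,
    )

    # Stage 2: reuse the same SDIF file for each requested output.
    for output_file, instructions in jobs.items():
        await transform(
            sdif=sdif,
            output_target_files=output_file,
            instructions=instructions,
            **transform_kwargs,
        )
    return sdif
```

With the package installed, this would presumably be called as `asyncio.run(standardize_once_transform_many(astandardize, atransform, ["data.csv"], "standardized.sdif", {"totals.json": "...", "summary.csv": "..."}, llm_model="o4-mini"))`, where the output file names are placeholders.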
SDIF (Standardized Data Interoperable Format) is the intermediate SQLite-based format that:
- Stores structured tables alongside JSON objects and binary media
- Maintains rich metadata about data origins and relationships
- Provides direct SQL queryability for complex transformations
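Since SDIF is described as SQLite-based, a file could in principle be inspected with Python's standard `sqlite3` module. The following is a minimal sketch under that assumption. The internal SDIF table layout is not documented here, so the `purchases` table and its `customer_id` and `total` columns are purely hypothetical names chosen to mirror the example instructions above.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the sorted names of user tables in an SQLite file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    # Skip SQLite's own bookkeeping tables such as sqlite_sequence.
    return [name for (name,) in rows if not name.startswith("sqlite_")]


def average_purchase_per_customer(db_path: str) -> list[tuple[str, float]]:
    """Average `total` per `customer_id` in a hypothetical `purchases` table."""
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            "SELECT customer_id, AVG(total) "
            "FROM purchases "
            "GROUP BY customer_id "
            "ORDER BY customer_id"
        ).fetchall()
```

Calling `list_tables("standardized.sdif")` first is a reasonable way to discover which tables a given file actually contains before writing any query against it.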
For detailed documentation, examples, and advanced features, visit SATIF Documentation.
Contributions are welcome! Whether it's bug reports, feature requests, or code contributions, please feel free to get involved.
- Fork the repository on GitHub.
- Clone your fork locally:
    git clone https://github.com/syncpulse-solutions/satif.git
    cd satif/libs/ai
- Create a new branch for your feature or bug fix:
    git checkout -b feature/your-feature-name
  or
    git checkout -b fix/your-bug-fix-name
- Set up the development environment as described in the From Source (for Development) section:
    make install  # or poetry install
- Make your changes. Ensure your code follows the project's style guidelines.
- Format and lint your code:
    make format
    make lint
- Run type checks:
    make typecheck
- Run tests to ensure your changes don't break existing functionality:
    make test
  To also generate a coverage report:
    make coverage
- Commit your changes with a clear and descriptive commit message.
- Push your changes to your fork on GitHub:
    git push origin feature/your-feature-name
- Submit a Pull Request (PR) to the main branch of the original syncpulse-solutions/satif repository.
This project is licensed under the MIT License.
Maintainer: Bryan Djafer ([email protected])