ClickHouse-FlatFile-Sync is a data integration tool designed to allow bidirectional data transfer between ClickHouse databases and flat files (primarily CSV). Built with a modern tech stack and optimized for performance, this application provides an intuitive interface for database administrators, data engineers, and analysts to move data efficiently between different storage formats.
Whether you need to export query results with JOINs across multiple tables to CSV files or ingest data from CSV files into ClickHouse with proper column mapping and type detection, ClickHouse-FlatFile-Sync streamlines the entire process through an easy-to-use web interface.
- **ClickHouse → Flat File**: Export data from ClickHouse tables to CSV files
  - Support for complex queries with JOINs across multiple tables
  - Column selection with custom ordering
  - Custom delimiter configuration
  - Optimized streaming for large datasets
  - Real-time progress tracking
- **Flat File → ClickHouse**: Import CSV data into new or existing ClickHouse tables
  - Automatic schema detection and data type inference
  - Column mapping with validation
  - Batch processing with progress monitoring
  - Support for various delimiters, including tabs
  - Handling of quoted fields in complex CSV formats
- **Multiple Authentication Methods**:
  - Password-based authentication
  - JWT token-based authentication
  - Connection pooling for optimal performance
- **Connection Testing and Validation**:
  - Ping functionality to verify connectivity
  - Database permissions validation
  - Connection health monitoring
  - Detailed error reporting
- **Automatic Schema Discovery**:
  - Table listing and metadata exploration
  - Column information with data types
  - Primary key detection
  - Index recommendations
- **Smart Data Type Mapping**:
  - Automatic CSV data type inference
  - ClickHouse-specific type conversion
  - Custom type mapping overrides
  - Support for complex nested types
- **Modern, Responsive UI**:
  - Mobile- and desktop-friendly design
  - Dark mode support
  - Intuitive workflow with clear visual feedback
  - Drag-and-drop file uploads
- **Progress Monitoring**:
  - Real-time progress indicators during data transfer
  - Transfer speed metrics
  - Time-remaining estimation for large datasets
  - Detailed operation logs
- **Advanced Error Handling**:
  - Detailed error messages with suggestions
  - Retry mechanisms for transient errors
  - Comprehensive logging and diagnostics
  - Validation before operations start
- **Advanced CSV Preview**:
  - Support for headerless files with auto-generated column names
  - Proper handling of tab-delimited files with quoted fields
  - Custom delimiter detection
  - Preview of large files with efficient streaming
- **Format Flexibility**:
  - Support for various CSV dialects
  - Handling of quoted fields and escape characters
  - Support for multi-line text within fields
  - Custom date/time format parsing
- Java 24+ (OpenJDK or Oracle JDK)
- Node.js 18+ and NPM 9+
- ClickHouse Server (local or remote)
- Maven 3.8+ (for building from source)
- Git (for cloning the repository)
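A quick way to confirm these prerequisites are in place (standard version flags for each tool):

```bash
java -version      # expect 24+
node -v && npm -v  # expect Node 18+ and NPM 9+
mvn -v             # expect 3.8+ (only needed when building from source)
git --version
```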
The easiest way to get started is using our Docker Compose setup:
```bash
# Clone the repository
git clone https://github.com/imshrishk/ClickHouse-FlatFile-Sync.git
cd ClickHouse-FlatFile-Sync

# Start the application (backend + frontend + clickhouse)
docker-compose up -d
```
This will start the application with the following services:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8080/api
- ClickHouse: http://localhost:8123
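Once the stack is up, a quick smoke test can confirm each service is reachable; this assumes the default ports above and the `/api/health-test` endpoint documented in the API section:

```bash
# Show container status
docker-compose ps

# Backend health check
curl http://localhost:8080/api/health-test

# ClickHouse HTTP interface returns "Ok." when healthy
curl http://localhost:8123/ping
```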
```bash
# Clone the repository
git clone https://github.com/imshrishk/ClickHouse-FlatFile-Sync.git
cd ClickHouse-FlatFile-Sync/backend

# Build the application
./mvnw clean package -DskipTests

# Run the application
java -jar target/bidirectional-0.0.1-SNAPSHOT.jar
```
The backend will be available at http://localhost:8080/api.
```bash
# Navigate to the frontend directory
cd ../frontend

# Install dependencies
npm install

# Start the development server
npm run dev
```
The frontend will be available at http://localhost:5173.
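For a production build of the frontend, the standard Vite scripts should apply (this assumes the default `build` and `preview` scripts in package.json):

```bash
# Type-check and bundle the frontend into dist/
npm run build

# Serve the production bundle locally for a final check
npm run preview
```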
Create an `application.yml` file in the `backend/src/main/resources` directory, or use environment variables to customize the configuration:
```yaml
# Server configuration
server:
  port: 8080
  servlet:
    context-path: /api
  compression:
    enabled: true
    mime-types: text/html,text/css,application/javascript,application/json
  tomcat:
    max-http-form-post-size: -1
    max-swallow-size: -1
    connection-timeout: 300000

# ClickHouse default connection settings (optional)
clickhouse:
  default:
    host: localhost
    port: 8123
    protocol: http
    database: default
    username: default
    password: ""
    auth-type: basic
    connect-timeout: 60000
    socket-timeout: 300000

# File storage configuration
file:
  upload:
    temp-dir: ./temp
    max-size: 10GB
    chunk-size: 10MB
    streaming-enabled: true
```
Or via environment variables:
```bash
export SERVER_PORT=8080
export CLICKHOUSE_DEFAULT_HOST=localhost
export CLICKHOUSE_DEFAULT_PORT=8123
export SPRING_PROFILES_ACTIVE=docker
```
When using Docker, you can configure the application via environment variables in the `docker-compose.yml` file:
```yaml
backend:
  environment:
    - SPRING_PROFILES_ACTIVE=docker
    - CLICKHOUSE_DEFAULT_HOST=clickhouse
    - CLICKHOUSE_DEFAULT_PORT=8123
    - CLICKHOUSE_DEFAULT_USERNAME=default
    - CLICKHOUSE_DEFAULT_PASSWORD=clickhouse
    - CONFIG_FRONTEND=http://frontend:80, http://localhost:5173
    - JAVA_OPTS=-Dserver.port=8080 -Dserver.servlet.context-path=/api
frontend:
  environment:
    - VITE_SPRING_BOOT_URL=http://backend:8080/api
```
1. **Launch the Application**:
   - Open your browser and navigate to the application URL
   - For a local setup: http://localhost:5173
2. **Choose Operation Type**:
   - Select "Upload to ClickHouse" (CSV → Database) or
   - Select "Download from ClickHouse" (Database → CSV)
3. **Database Connection**:
   - Configure the ClickHouse connection parameters
   - Test the connection to ensure connectivity
   - Save the connection details for future use
4. **Data Selection and Mapping**:
   - For uploads: select the CSV file, configure the delimiter, and map columns
   - For downloads: select table(s), configure JOIN operations, and select columns
5. **Execute Transfer**:
   - Review the configuration
   - Click "Start Transfer" to begin the operation
   - Monitor progress in real time
6. **View Results**:
   - Download the generated files, or
   - View the import summary with row counts
To export data from multiple joined tables (a command-line equivalent is sketched after these steps):
1. Select "Download from ClickHouse" from the main menu
2. Configure your ClickHouse connection
3. Select the main table from the tables list
4. Click "Add Join" to add additional tables
5. Configure the join type (LEFT, INNER, etc.) and join conditions
6. Select columns from all tables to include in the export
7. Preview the data to verify query correctness
8. Execute the export to download the CSV file
For large file uploads (>1GB):
- Increase the JVM heap size: `java -Xmx4g -jar bidirectional-0.0.1-SNAPSHOT.jar`
- Enable streaming mode in the UI by checking "Use streaming mode"
- Consider using tab-delimited files for better performance
- For very large files, use the batch import feature to process chunks (a pre-chunking sketch follows this list)
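If you prefer to pre-chunk a very large CSV outside the application, standard coreutils are enough; this sketch assumes the file has a header row:

```bash
# Keep the header, then split the body into ~1M-line chunks
head -n 1 big.csv > header.csv
tail -n +2 big.csv | split -l 1000000 - chunk_

# Re-attach the header to each chunk before uploading
for f in chunk_*; do
  cat header.csv "$f" > "$f.csv" && rm "$f"
done
```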
For CSV files without headers:
- Upload your file and select the appropriate delimiter
- Uncheck the "Has Header" option in the CSV preview screen
- The application will generate column names automatically (Column1, Column2, etc.)
- You can rename these columns before proceeding with the import
- Map the generated columns to appropriate ClickHouse data types
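To script this instead of using the UI, a preview request might look like the following; the multipart field names here are assumptions, so check the browser's network tab or the backend controller for the real schema:

```bash
# Hypothetical programmatic preview of a headerless tab-delimited file
curl -X POST http://localhost:8080/api/clickhouse/preview-csv \
  -F "file=@headerless.tsv" \
  -F $'delimiter=\t' \
  -F "hasHeader=false"
```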
The backend provides a RESTful API that can be consumed programmatically:
| Endpoint | Method | Description |
|---|---|---|
| `/api/clickhouse/test-connection` | POST | Test connection to ClickHouse |
| `/api/clickhouse/tables` | POST | List available tables |
| `/api/clickhouse/columns` | POST | Get columns for a table |
| `/api/clickhouse/query-selected-columns` | POST | Query selected columns with optional JOINs |
| `/api/clickhouse/preview-csv` | POST | Preview CSV file contents |
| `/api/clickhouse/upload` | POST | Upload CSV to ClickHouse |
| `/api/clickhouse/download` | POST | Export ClickHouse data to CSV |
| `/api/clickhouse/types` | POST | Get available data types |
| `/api/health-test` | GET | Check API health status |
All API endpoints are accessible via the `/api` context path (e.g., `http://localhost:8080/api/clickhouse/test-connection`).
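For example, the GET health check below is safe to run as-is; the `test-connection` body is hypothetical and simply mirrors the configuration keys above, so verify the real request schema before relying on it:

```bash
# Health check
curl http://localhost:8080/api/health-test

# Hypothetical connection test -- field names mirror the YAML config keys,
# not a documented request schema
curl -X POST http://localhost:8080/api/clickhouse/test-connection \
  -H "Content-Type: application/json" \
  -d '{"host":"localhost","port":8123,"protocol":"http","database":"default","username":"default","password":""}'
```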
The application follows a modern three-tier architecture:
```
+-------------------+          +-------------------+
|                   |          |                   |
|  React Frontend   |<-------->|  Spring Backend   |
|  (TypeScript)     |   API    |  (Java)           |
|                   |          |                   |
+-------------------+          +-------------------+
                                         |
                                         v
                               +-------------------+
                               |                   |
                               |    ClickHouse     |
                               |     Database      |
                               |                   |
                               +-------------------+
```
- **Frontend**: React 18 with TypeScript, Vite, and modern UI components
- **Backend**: Spring Boot 3.4 with Java 24
- **Data Processing**: Custom CSV parsing and ClickHouse integration
- **API Layer**: RESTful endpoints with streaming support for large files
- **Security**: CORS configuration and authentication options
- All REST endpoints are protected against CSRF attacks
- JWT tokens are securely stored and transmitted
- Password authentication uses bcrypt hashing
- File uploads are validated for content type and size
- CORS is properly configured for secure cross-origin requests
**Problem:** Unable to connect to the ClickHouse server.
**Solution:**
- Verify the server URL, port, and credentials
- Check firewall settings
- Ensure the ClickHouse server is running
- Check network connectivity with `ping` or `telnet` (see also the HTTP checks after this list)
- Verify Docker network settings if using a containerized setup
- Check the logs for detailed error messages: `docker-compose logs backend`
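The ClickHouse HTTP interface also has a built-in liveness endpoint that is useful here; replace the host with your own:

```bash
# Returns "Ok." when the server is reachable
curl http://localhost:8123/ping

# Verify credentials as well by running a trivial query
curl -u default: 'http://localhost:8123/?query=SELECT%201'
```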
**Problem:** Frontend not connecting to the backend API, or 405 Method Not Allowed errors.
**Solution:**
- Ensure the backend is running on the expected port (8080)
- Verify that the server context path is set to `/api` in the application properties
- Check that controller mappings don't duplicate the `/api` prefix
- Confirm proper CORS configuration in both frontend and backend
- Ensure the Nginx configuration correctly proxies requests to the backend
- Verify the environment variable for the frontend API URL: `VITE_SPRING_BOOT_URL=http://backend:8080/api`
- Use the built-in server health check: `http://localhost:8080/api/health-test`
**Problem:** `OutOfMemoryError` during large file processing.
**Solution:**
- Increase the JVM heap size: `java -Xmx4g -jar bidirectional-0.0.1-SNAPSHOT.jar`
- Enable streaming mode in the UI
- Process files in smaller batches
- Adjust the `file.upload.chunk-size` setting in application.yml
- Use tab-delimited format for better performance with large files
- For Docker, increase container memory limits in docker-compose.yml (a sketch follows this list)
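One way to raise the container limit, assuming a Compose v3-style file (adjust the key to `mem_limit` for older Compose versions):

```yaml
backend:
  environment:
    # Keep the JVM heap below the container ceiling
    - JAVA_OPTS=-Xmx4g
  deploy:
    resources:
      limits:
        memory: 6g
```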
**Problem:** Data type conversion errors during import.
**Solution:**
- Check source data for invalid values
- Manually specify column types instead of auto-detection
- Adjust type inference settings in the UI
- Pre-process problematic columns in your CSV files
- Check for date format inconsistencies
- Review logs for specific conversion errors
**Problem:** Getting "Download Failed: Network Error" when trying to download data from ClickHouse.
**Solution:**
- This issue is typically caused by improper handling of large file downloads or by CORS misconfiguration
- Ensure Nginx disables buffering and uses generous timeouts for large downloads:
  ```nginx
  # Disable buffering for large responses
  proxy_buffering off;
  proxy_request_buffering off;

  # Increase timeouts for large operations
  proxy_connect_timeout 600s;
  proxy_send_timeout 600s;
  proxy_read_timeout 600s;
  ```
- Expose the Content-Disposition and Content-Length headers to the browser:
  `add_header 'Access-Control-Expose-Headers' 'Content-Disposition,Content-Length,X-Line-Count' always;`
- For more reliable downloads, prefer the Fetch API over Axios in frontend code
- Stream directly from the backend to the client rather than creating temporary files
- If issues persist, check the network monitor in the browser's developer tools for specific error codes
- In Docker environments, ensure sufficient memory is allocated to the containers
- Increase the timeout settings in application.yml:
  ```yaml
  server:
    tomcat:
      connection-timeout: 300000
  clickhouse:
    default:
      socket-timeout: 300000
  ```
To enable debug logging, add the following to your `application.yml`:

```yaml
logging:
  level:
    root: INFO
    org.example.bidirectional: DEBUG
    org.example.bidirectional.service: TRACE
    com.clickhouse.client: DEBUG
```
Log files are located in:
- Linux/macOS: `./logs/application.log`
- Windows: `.\logs\application.log`
- Docker: access logs via `docker-compose logs backend`
The application includes specialized handling for complex CSV formats:
- **Headerless Files**: Automatically generates column names for files without headers
- **Quoted Fields**: Properly handles quoted fields in tab-delimited files
- **Custom Delimiters**: Supports comma, tab, semicolon, and custom delimiters
- **Large File Preview**: Efficiently previews large files by processing only the first portion
For large datasets, the application uses streaming to efficiently transfer data:
- **Chunked Processing**: Processes large files in manageable chunks
- **Memory Optimization**: Minimizes memory usage during large transfers
- **Progress Tracking**: Provides real-time progress updates during transfers
- **Resumable Transfers**: Supports resuming interrupted transfers
The application includes smart data type inference for CSV imports:
- **Automatic Detection**: Analyzes sample data to suggest appropriate types
- **ClickHouse Optimization**: Recommends optimal ClickHouse-specific types
- **Manual Override**: Allows manual type selection when needed
- **Validation**: Validates data against selected types before import
- Use tab-delimited format for large files (faster parsing than CSV)
- Enable streaming mode for files larger than 1GB
- Use appropriate ClickHouse data types for better query performance
- For very large datasets, consider using ClickHouse's native format instead of CSV (see the example after this list)
- Adjust batch size settings for optimal performance on your hardware
- Use SSD storage for temporary files during processing
- Monitor memory usage and adjust JVM settings accordingly
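As a sketch of the native-format tip above, here is an export/import round trip with `clickhouse-client`; the table names are illustrative, and the target table must already exist with a matching schema:

```bash
# Export in ClickHouse's compact native format
clickhouse-client --query "SELECT * FROM my_table FORMAT Native" > my_table.native

# Re-import the dump into another table
clickhouse-client --query "INSERT INTO my_table_copy FORMAT Native" < my_table.native
```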