This project focuses on predicting critical environmental variables for Harveston, a self-sufficient agricultural region experiencing climate shifts. Our models forecast the five climate variables listed below to help farmers make informed decisions about planting cycles, resource allocation, and preparation for weather extremes:
- Average Temperature (°C)
- Radiation (W/m²)
- Rain Amount (mm)
- Wind Speed (km/h)
- Wind Direction (°)
/climate-forecasting
├── data/
│ ├── sample_submission.csv
│ ├── test.csv
│ └── train.csv
│
├── Notebooks_and_Scripts/
│ ├── 01_EDA_and_Analysis.ipynb # Exploratory analysis
│ ├── 02_model_training.ipynb # Model development
│ ├── utils.py
│ └── plots/ # Generated visualizations
│
├── final_submission.csv
├── technical_report.pdf
├── README.md
└── requirements.txt
The required libraries for this project are listed in requirements.txt. The main dependencies include:
- numpy
- pandas
- scikit-learn
- lightgbm
- optuna
- matplotlib
- seaborn
- Clone this repository:
git clone https://github.com/FouetteBytes/climate-forecasting.git
cd climate-forecasting
- Create and activate a virtual environment (optional):
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
- Install required packages:
pip install -r requirements.txt
Our solution to this time series forecasting task combines data preprocessing, feature engineering, and ensemble modeling:
- Temperature unit standardization (Kelvin to Celsius)
- Missing value imputation using hierarchical methods
- Outlier detection and treatment
- Geographic clustering of kingdoms
- Temporal features with cyclical encoding
- Lagged variables (1-30 days)
- Rolling window statistics (multiple window sizes)
- Exponentially weighted moving averages
- Differencing features for trend removal
- Cross-feature interactions
- Primary model: LightGBM with optimized hyperparameters
- Multi-seed ensemble approach (3 seeds)
- Target-specific feature selection
- Time-series cross-validation
- Specialized handling for directional data (Wind Direction)
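As a rough illustration of the feature types listed above, the sketch below builds cyclical, lag, rolling, exponentially weighted, and differencing features with pandas for a single daily series. The function name `add_time_series_features`, the `date` column, and the specific lags, windows, and span are assumptions made for this example, not the project's actual settings; the real pipeline lives in `02_model_training.ipynb`.

```python
import numpy as np
import pandas as pd


def add_time_series_features(df, target, lags=(1, 7, 30), windows=(7, 30)):
    """Add cyclical, lag, rolling, EWMA, and differencing features for one target.

    Assumes `df` has a datetime column named "date" (hypothetical name)
    with one row per day for a single location.
    """
    out = df.sort_values("date").reset_index(drop=True).copy()

    # Cyclical encoding: late December and early January end up close together.
    day = out["date"].dt.dayofyear
    out["doy_sin"] = np.sin(2 * np.pi * day / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * day / 365.25)

    # Shift by one day so every statistic below only uses earlier days
    # and the current day's target value never leaks into its own features.
    past = out[target].shift(1)

    # Lagged values of the target.
    for lag in lags:
        out[f"{target}_lag{lag}"] = out[target].shift(lag)

    # Rolling window statistics over the preceding days.
    for w in windows:
        out[f"{target}_roll{w}_mean"] = past.rolling(w).mean()
        out[f"{target}_roll{w}_std"] = past.rolling(w).std()

    # Exponentially weighted moving average of past values.
    out[f"{target}_ewm7"] = past.ewm(span=7, adjust=False).mean()

    # First difference of past values (yesterday minus the day before).
    out[f"{target}_diff1"] = past.diff(1)
    return out
```

When one table holds several locations, the same function could be applied per group (for example with `groupby`) so that lags never cross from one location into another.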
jupyter notebook Notebooks_and_Scripts/01_EDA_and_Analysis.ipynb
This notebook explores the data, visualizing distributions, temporal patterns, and correlations between variables.
jupyter notebook Notebooks_and_Scripts/02_model_training.ipynb
This notebook implements the complete modeling pipeline:
- Data preprocessing
- Feature engineering
- Model training with hyperparameter optimization
- Ensemble prediction
- Submission file generation
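The training and ensemble steps can be sketched roughly as follows. To keep the example dependent on scikit-learn only, `GradientBoostingRegressor` stands in for LightGBM here; `lightgbm.LGBMRegressor` follows the same `fit`/`predict` interface and could be swapped in. The helper names, seed values, fold count, and the sine/cosine treatment of Wind Direction are illustrative assumptions, not a description of the notebook's exact code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit


def seed_ensemble_predict(X_train, y_train, X_test, seeds=(0, 1, 2)):
    """Average predictions from models that differ only in their random seed."""
    preds = []
    for seed in seeds:
        # subsample < 1 makes training stochastic, so the seed matters.
        model = GradientBoostingRegressor(
            n_estimators=50, subsample=0.8, random_state=seed
        )
        model.fit(X_train, y_train)
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)


def predict_direction(X_train, deg_train, X_test):
    """Predict an angle in degrees by modeling its sine and cosine separately."""
    rad = np.deg2rad(deg_train)
    sin_hat = seed_ensemble_predict(X_train, np.sin(rad), X_test)
    cos_hat = seed_ensemble_predict(X_train, np.cos(rad), X_test)
    # arctan2 recombines the two components; the modulo maps to [0, 360).
    return np.rad2deg(np.arctan2(sin_hat, cos_hat)) % 360


def angular_error(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = np.abs(np.asarray(a_deg) - np.asarray(b_deg)) % 360
    return np.minimum(diff, 360 - diff)


def time_series_cv(X, y_deg, n_splits=3):
    """Score each fold, always training on the past and validating on the future."""
    scores = []
    for train_idx, val_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        pred = predict_direction(X[train_idx], y_deg[train_idx], X[val_idx])
        scores.append(angular_error(pred, y_deg[val_idx]).mean())
    return scores
```

Modeling the sine and cosine avoids treating 359° and 1° as far apart, which a plain regression on degrees would do. For the non-directional targets, `seed_ensemble_predict` can be called directly on the raw values.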
This project is licensed under the MIT License - see the LICENSE file for details.