Apache Spark development environment for macOS, Linux, and Windows (on WSL).
Follow the WSL installation instructions, then run all commands in WSL.
Open WSL by opening a PowerShell terminal and running:

```shell
wsl
```

On macOS:

- Download Temurin 17 from Adoptium: https://adoptium.net/installation/
- Install the `.pkg` file for Java 17
On WSL/Linux, open your terminal and run:

```shell
sudo apt update
sudo apt install openjdk-17-jdk -y
java --version
```

- Log in to GitHub.
- Go to: https://github.com/denisecase/pro-analytics-apache-starter.
- Click "Use this template" to get a copy in your GitHub account.
- In your shell ($ prompt) terminal (Windows users: inside WSL), go to your `Repos` folder, clone your new repository URL, and change directory into it:

```shell
cd ~/Repos
git clone https://github.com/YOUR_ACCOUNT/pro-analytics-apache-starter
cd pro-analytics-apache-starter
```

Run the following commands in your shell ($ prompt) terminal. Windows users: use WSL.
```shell
git pull origin main
uv python pin 3.12
uv venv
source .venv/bin/activate
uv sync --extra dev --extra docs --upgrade
uv run pre-commit install
```

Run the PySpark test script:

```shell
uv run python src/analytics_project/test-pyspark.py
```

Change your commit message to explain what was done, e.g. "add new .py file":

```shell
git add .
git commit -m "your change in quotes"
git push -u origin main
```
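The `uv sync --extra dev --extra docs` step above assumes the starter's `pyproject.toml` defines optional dependency groups named `dev` and `docs`. A hypothetical sketch of what that section might look like (the project name and the packages listed in each group are illustrative, not the starter's actual contents):

```toml
[project]
name = "analytics-project"        # illustrative name, not the starter's actual value
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["pyspark"]

[project.optional-dependencies]
# `uv sync --extra dev --extra docs` installs these groups
# on top of the core dependencies.
dev = ["pre-commit", "ruff", "pytest"]
docs = ["mkdocs"]
```

With this layout, a plain `uv sync` installs only the core dependencies; each `--extra` flag pulls in one additional group.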
---
## Troubleshooting
### JDK Versions
If you have multiple JDK versions installed (either 17 or 21 should work), you can select which one is active using:
```shell
sudo update-alternatives --config java
```

### SPARK_HOME

If you see:

```
FileNotFoundError: [Errno 2] No such file or directory: '/opt/spark/./bin/spark-submit'
FAIL: Exception during test. See logs above.
```
```shell
# Check what's set
echo $SPARK_HOME
# Probably shows /opt/spark

# Temporarily unset it
unset SPARK_HOME

# Test again
uv run python src/analytics_project/test-pyspark.py
```
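Why unsetting helps: when `SPARK_HOME` is set, PySpark launches `spark-submit` from that path; when it is absent, the pip-installed `pyspark` package falls back to the Spark distribution bundled inside the package itself. A simplified, hypothetical sketch of that lookup order (the real logic lives inside PySpark and is more involved):

```python
def resolve_spark_home(environ: dict) -> str:
    """Simplified sketch: an explicit SPARK_HOME always wins;
    otherwise the distribution bundled with the pip-installed
    pyspark package is used."""
    if environ.get("SPARK_HOME"):
        return environ["SPARK_HOME"]
    return "<bundled pyspark distribution>"

# A stale /opt/spark entry shadows the working pip install:
print(resolve_spark_home({"SPARK_HOME": "/opt/spark"}))  # /opt/spark
# After `unset SPARK_HOME`, the bundled copy is found instead:
print(resolve_spark_home({}))  # <bundled pyspark distribution>
```

So the error above usually means `SPARK_HOME` points at a directory that does not contain a working Spark install; unsetting it lets the pip-installed copy take over.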