This repository is a collection of web scraping projects, each built to extract data from a specific source. Each project is an independent Git submodule, which keeps the projects modular and lets you clone or update each one on its own.
AiEstateScraper:
- Scrapes property listings from apartments.com.
- Uses Playwright for browser automation.
- Parses and extracts data using Selectolax.
- Enhances extracted data with LLM integration.
- Outputs structured results in JSON format.
```bash
# Clone the project separately
git clone https://github.com/nazzal5448/AiEstateScraper.git
cd AiEstateScraper
pip install -r requirements.txt
playwright install
python main.py
```
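The feature list above describes a fetch, parse, and serialize pipeline. The following is a minimal sketch of that flow, not AiEstateScraper's actual code: `Listing`, `fetch_html`, `parse_listings`, and `listings_to_json` are hypothetical names, the CSS selectors are placeholders that would need to be replaced after inspecting apartments.com's markup, and the LLM enhancement step is left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Listing:
    # Hypothetical fields; the real project may extract more or different data.
    title: str
    price: str
    address: str


def fetch_html(url: str) -> str:
    """Render the page in headless Chromium and return the resulting HTML."""
    # Imported lazily so the JSON helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
    return html


def _first_text(node, selector: str) -> str:
    """Stripped text of the first element matching selector, or "" if none."""
    match = node.css_first(selector)
    return match.text(strip=True) if match is not None else ""


def parse_listings(html: str) -> list[Listing]:
    """Extract one Listing per card element using Selectolax."""
    from selectolax.parser import HTMLParser

    tree = HTMLParser(html)
    # All selectors are placeholders, not apartments.com's real class names.
    return [
        Listing(
            title=_first_text(card, ".listing-title"),
            price=_first_text(card, ".listing-price"),
            address=_first_text(card, ".listing-address"),
        )
        for card in tree.css("article.listing-card")
    ]


def listings_to_json(listings: list[Listing]) -> str:
    """Serialize listings as a pretty-printed JSON array."""
    return json.dumps([asdict(item) for item in listings], indent=2)
```

Assuming the placeholder selectors were replaced with real ones, `listings_to_json(parse_listings(fetch_html(url)))` would return JSON for a search-results page at `url`. Playwright and Selectolax are imported inside the functions that use them, so the JSON helper can run without either package installed.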
stream-scraper:
- Scrapes game details from an online streaming site.
- Extracts games on sale with relevant details.
- Uses Playwright and Selectolax for scraping.
- Outputs results in JSON format.
```bash
# Clone the project separately
git clone https://github.com/nazzal5448/stream-scraper.git
cd stream-scraper
pip install -r requirements.txt
playwright install
python main.py
```
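One way the "games on sale" step could work is to compare each game's current price with its original price. This is a hypothetical sketch, not stream-scraper's implementation: it assumes the scraped records are dicts with `title`, `price`, and `original_price` string fields, and `parse_price` and `games_on_sale` are illustrative names.

```python
import re
from typing import Optional

# Matches the first number in a price string, e.g. "1,299.99" in "$1,299.99".
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[float]:
    """Convert a price string such as "$19.99" or "Free" to a float (None if unparseable)."""
    cleaned = text.strip().lower()
    if cleaned == "free":
        return 0.0
    match = _PRICE_RE.search(cleaned)
    return float(match.group().replace(",", "")) if match else None


def games_on_sale(games: list[dict]) -> list[dict]:
    """Keep games whose current price is below their original price."""
    on_sale = []
    for game in games:
        price = parse_price(game.get("price", ""))
        original = parse_price(game.get("original_price", ""))
        if price is not None and original is not None and price < original:
            # Copy the record and add a rounded discount percentage.
            discount = round(100 * (1 - price / original))
            on_sale.append({**game, "discount_pct": discount})
    return on_sale
```

Passing the filtered list to `json.dumps(..., indent=2)` would produce JSON output comparable to what the project describes.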
Because each project is stored as a Git submodule, clone with `--recurse-submodules` to fetch every project along with this repository:
```bash
git clone --recurse-submodules https://github.com/nazzal5448/WebScrapingProjects.git
```
If you've already cloned the repository without its submodules, run:
```bash
git submodule update --init --recursive
```
To update every submodule to the latest commit on its tracked remote branch:
```bash
git submodule update --remote --merge
```
Each project in this repository is independently licensed under the MIT License.
This repository is structured to help manage multiple web scraping projects efficiently while keeping each one modular and maintainable.