Important UPDATE: As of February 2023, the overcomingbias blog has moved to substack. As a result, this web scraper no longer works. Apologies for the inconvenience!
obscraper lets you scrape blog posts and associated metadata from the
overcomingbias blog.
It's easy to get a single post:
>>> import obscraper
>>> intro_url = 'https://www.overcomingbias.com/2006/11/introduction.html'
>>> post = obscraper.get_post_by_url(intro_url)
>>> post.title
'How To Join'
>>> post.plaintext
'How can we better believe what is true? ...'
>>> post.internal_links
{'http://www.overcomingbias.com/2007/02/moderate_modera.html': 1,
'http://www.overcomingbias.com/2006/12/contributors_be.html': 1}
>>> post.comments
20
Or a full list of post URLs and edit dates:
>>> import obscraper
>>> edit_dates = obscraper.get_edit_dates()
...
>>> len(edit_dates)
4352
>>> {url: str(edit_dates[url]) for url in list(edit_dates)[:5]}
{'2022/01/much-talk-is-sales-patter':
'2022-01-14 20:46:35+00:00',
'2022/01/old-man-rant':
'2022-01-13 15:21:33+00:00',
'2022/01/my-11-bets-at-10-1-odds-on-10m-covid-deaths-by-2022':
'2022-01-12 19:15:10+00:00',
'2022/01/to-innovate-unify-or-fragment':
'2022-01-11 01:03:44+00:00',
'2022/01/on-what-is-advice-useful':
'2022-01-10 18:46:26+00:00'}
- Get posts by their URLs or edit dates, or get all posts hosted on the overcomingbias site
- Provides detailed post metadata including post URLs, titles, authors, tags, publish dates, and last edit dates
- Provides summary of post content including full post text as HTML or plaintext, and a list of hyperlinks to other overcomingbias posts
- Asynchronous execution and caching for fast downloads
- Use via
import obscraperor the simple command line interface - Comprehensively tested
- Supports python 3.8+
Read the full documentation here, including the Installation and Getting Started Guide and the Public API Reference.
Please use the GitHub issue tracker to submit bugs or request features.
See the Changelog for a list of fixes and enhancements at each version.
Copyright (c) 2022 Christopher McDonald
Distributed under the terms of the MIT license.
All overcomingbias posts are copyright the original authors.