
Commit 13609b2: Port to notebooks

1 parent ee6cec9

15 files changed (+737, -659 lines)

Makefile

Lines changed: 2 additions & 2 deletions
@@ -25,7 +25,7 @@ test:
 	env/bin/pytest .
 
 doc:
-	env/bin/python docs/preprocess.py
+	@pandoc --version >/dev/null 2>&1 || (echo "ERROR: pandoc is required (install via your platform's package manager)"; exit 1)
 	env/bin/sphinx-build -b html docs docs/_build
 	env/bin/python -m webbrowser -t "docs/_build/index.html"
 
@@ -42,7 +42,7 @@ clean_python:
 	find . -name '*.pyo' -exec rm -f {} +
 	find . -name '__pycache__' -exec rm -fr {} +
 
-clean: clean_docs clean_build clean_python
+clean: clean_doc clean_build clean_python
 
 release: clean lint test
 	env/bin/python -m build --sdist --wheel
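The `doc` target now starts with a preflight check that fails fast when `pandoc` is not installed. If the documentation build were ever driven from Python instead of Make, an equivalent check might look like this minimal sketch. `find_pandoc` and `require_pandoc` are hypothetical helpers written for illustration; they are not part of this repository.

```python
import shutil
import subprocess
import sys
from typing import Optional


def find_pandoc(executable: str = "pandoc") -> Optional[str]:
    """Return the path to a working pandoc binary, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Like `pandoc --version >/dev/null 2>&1`, discard all output and
    # only look at the exit status.
    result = subprocess.run(
        [path, "--version"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return path if result.returncode == 0 else None


def require_pandoc() -> str:
    """Exit with status 1 and the Makefile's error message if pandoc is missing."""
    path = find_pandoc()
    if path is None:
        print(
            "ERROR: pandoc is required (install via your platform's package manager)",
            file=sys.stderr,
        )
        sys.exit(1)
    return path
```

Unlike the Makefile, which echoes the message to stdout, this sketch writes it to stderr.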

README.md

Lines changed: 15 additions & 0 deletions
@@ -35,6 +35,21 @@ make release # upload the current version to pypi
 make clean # clean build and docs artifacts
 ```
 
+Building the documentation additionally requires the Pandoc package. These commands can be used
+to install the package on common platforms (see the
+[official documentation](https://pandoc.org/installing.html) for more options):
+
+```sh
+# Linux (Debian/Ubuntu)
+sudo apt-get install pandoc
+
+# macOS / Linux (with Homebrew)
+brew install pandoc
+
+# Windows (with Chocolatey)
+choco install pandoc
+```
+
 ### Release Process
 
 The release consists of multiple steps which can be all done via the GitHub website:

docs/conf.py

Lines changed: 6 additions & 1 deletion
@@ -35,7 +35,7 @@
 "sphinx.ext.autodoc",
 "sphinx_autodoc_typehints",
 # 'matplotlib.sphinxext.plot_directive'
-"sphinx_exec_directive"
+"nbsphinx"
 ]
 
 # Add any paths that contain templates here, relative to this directory.
@@ -85,3 +85,8 @@
 
 # https://pypi.org/project/sphinx-autodoc-typehints/
 always_document_param_types = True
+
+# https://nbsphinx.readthedocs.io/
+nbsphinx_prompt_width = 0
+nbsphinx_input_prompt = "%.0s"
+nbsphinx_output_prompt = "%.0s"
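The `"%.0s"` prompt values rely on printf-style formatting. As we read the nbsphinx documentation, `%s` in a prompt template is replaced by the cell's execution count; a precision of 0 truncates that string to nothing, so the prompt renders empty, and `nbsphinx_prompt_width = 0` collapses the space it would occupy. The snippet below only checks the format strings with Python's `%` operator; `render_prompt` is a hypothetical helper, not nbsphinx's own code.

```python
def render_prompt(template: str, execution_count: int) -> str:
    """Substitute an execution count into a %s-style prompt template."""
    return template % execution_count


for count in (1, 12, 345):
    # A conventional visible prompt next to the empty prompt configured above.
    print(repr(render_prompt("[%s]:", count)), repr(render_prompt("%.0s", count)))
```

Each printed line pairs a visible prompt such as `'[12]:'` with an empty string, which is why the configured prompts disappear from the rendered pages.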

docs/data/endpoints.csv

Lines changed: 0 additions & 29 deletions
This file was deleted.

docs/getting_started.ipynb

Lines changed: 306 additions & 0 deletions
@@ -0,0 +1,306 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Getting started with epidatpy\n",
+"\n",
+"The epidatpy package provides access to all the endpoints of the [Delphi Epidata\n",
+"API](https://cmu-delphi.github.io/delphi-epidata/), and can be used to make\n",
+"requests for specific signals on specific dates and in select geographic\n",
+"regions.\n",
+"\n",
+"## Setup\n",
+"\n",
+"### Installation\n",
+"\n",
+"You can install the stable version of this package from PyPI:\n",
+"\n",
+"```\n",
+"pip install epidatpy\n",
+"```\n",
+"\n",
+"Or, if you want the development version, install from GitHub:\n",
+"\n",
+"```\n",
+"pip install -e \"git+https://github.com/cmu-delphi/epidatpy.git#egg=epidatpy\"\n",
+"```\n",
+"\n",
+"\n",
+"### API keys\n",
+"\n",
+"The Delphi API requires a (free) API key for full functionality. While most\n",
+"endpoints are available without one, there are\n",
+"[limits on API usage for anonymous users](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html),\n",
+"including a rate limit.\n",
+"\n",
+"To generate your key,\n",
+"[register for a pseudo-anonymous account](https://api.delphi.cmu.edu/epidata/admin/registration_form).\n",
+"\n",
+"*Note* that private endpoints (i.e., those prefixed with `pvt_`) require a\n",
+"separate key that needs to be passed as an argument. These endpoints require\n",
+"specific data use agreements to access.\n",
+"\n",
+"## Basic usage\n",
+"\n",
+"Fetching data from the Delphi Epidata API is simple. Suppose we are\n",
+"interested in the [covidcast endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html),\n",
+"which provides access to a [wide range of data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)\n",
+"on COVID-19. Reviewing the endpoint documentation, we see that we\n",
+"[need to specify](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html#constructing-api-queries)\n",
+"a data source name, a signal name, a geographic level, a time resolution, and\n",
+"the location and times of interest.\n",
+"\n",
+"The `pub_covidcast` function lets us access the `covidcast` endpoint:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"from epidatpy import EpiDataContext, EpiRange\n",
+"import pandas as pd\n",
+"\n",
+"# Set common options and context\n",
+"pd.set_option('display.max_columns', None)\n",
+"pd.set_option('display.max_rows', None)\n",
+"pd.set_option('display.width', 1000)\n",
+"\n",
+"epidata = EpiDataContext(use_cache=False)\n",
+"\n",
+"# Obtain the most up-to-date version of the smoothed COVID-like illness (CLI)\n",
+"# signal from the COVID-19 Trends and Impact Survey for the US\n",
+"apicall = epidata.pub_covidcast(\n",
+" data_source = \"fb-survey\",\n",
+" signals = \"smoothed_cli\",\n",
+" geo_type = \"nation\",\n",
+" time_type = \"day\",\n",
+" geo_values = \"us\",\n",
+" time_values = EpiRange(20210405, 20210410))\n",
+"\n",
+"print(apicall)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"`pub_covidcast` returns an `EpiDataCall`, which is a not-yet-executed query that can be inspected. The query can be executed and converted to a DataFrame by using the `.df()` method:\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"data = apicall.df()\n",
+"print(data.head())"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Each row represents one observation in the US on one\n",
+"day. The geographic abbreviation is given in the `geo_value` column and the date in\n",
+"the `time_value` column. Here `value` is the requested signal -- in this\n",
+"case, the smoothed estimate of the percentage of people with COVID-like\n",
+"illness, based on the symptom surveys, and `stderr` is its standard error.\n",
+"\n",
+"The Epidata API makes signals available at different geographic levels,\n",
+"depending on the endpoint. To request signals for all states instead of the\n",
+"entire US, we set `geo_type` to `state` and pass `*` as the\n",
+"`geo_values` argument. (Only some endpoints allow the use of `*` to\n",
+"access data at all locations. Check the documentation for a given endpoint to see if\n",
+"it supports `*`.)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"apicall = epidata.pub_covidcast(\n",
+" data_source = \"fb-survey\",\n",
+" signals = \"smoothed_cli\",\n",
+" geo_type = \"state\",\n",
+" time_type = \"day\",\n",
+" geo_values = \"*\",\n",
+" time_values = EpiRange(20210405, 20210410))\n",
+"\n",
+"print(apicall)\n",
+"print(apicall.df().head())"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Alternatively, we can fetch the full time series for a subset of states by\n",
+"listing the desired locations in the `geo_values` argument and using\n",
+"`*` in the `time_values` argument:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"apicall = epidata.pub_covidcast(\n",
+" data_source = \"fb-survey\",\n",
+" signals = \"smoothed_cli\",\n",
+" geo_type = \"state\",\n",
+" time_type = \"day\",\n",
+" geo_values = \"pa,ca,fl\",\n",
+" time_values = \"*\")\n",
+"\n",
+"print(apicall)\n",
+"print(apicall.df().head())"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Getting versioned data\n",
+"\n",
+"The Epidata API stores a historical record of all data, including corrections\n",
+"and updates, which is particularly useful for accurately backtesting\n",
+"forecasting models. To fetch the data as it existed on a given date, use the\n",
+"`as_of` argument:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"apicall = epidata.pub_covidcast(\n",
+" data_source = \"fb-survey\",\n",
+" signals = \"smoothed_cli\",\n",
+" geo_type = \"state\",\n",
+" time_type = \"day\",\n",
+" geo_values = \"pa\",\n",
+" time_values = EpiRange(20210405, 20210410),\n",
+" as_of = \"2021-06-01\")\n",
+"\n",
+"print(apicall)\n",
+"print(apicall.df().head())"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Plotting\n",
+"\n",
+"Because the output data is a standard pandas DataFrame, we can plot it\n",
+"with any Python plotting library. Here we use matplotlib:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"import matplotlib.pyplot as plt\n",
+"\n",
+"plt.rcParams['figure.dpi'] = 300\n",
+"\n",
+"apicall = epidata.pub_covidcast(\n",
+" data_source = \"fb-survey\",\n",
+" signals = \"smoothed_cli\",\n",
+" geo_type = \"state\",\n",
+" geo_values = \"pa,ca,fl\",\n",
+" time_type = \"day\",\n",
+" time_values = EpiRange(20210405, 20210410))\n",
+"\n",
+"data = apicall.df()\n",
+"\n",
+"fig, ax = plt.subplots(figsize=(6, 5))\n",
+"ax.spines[\"right\"].set_visible(False)\n",
+"ax.spines[\"left\"].set_visible(False)\n",
+"ax.spines[\"top\"].set_visible(False)\n",
+"\n",
+"data.pivot_table(values = \"value\", index = \"time_value\", columns = \"geo_value\").plot(\n",
+" xlabel=\"Date\",\n",
+" ylabel=\"CLI\",\n",
+" ax = ax,\n",
+" linewidth = 1.5\n",
+")\n",
+"\n",
+"plt.title(\"Smoothed CLI from Facebook Survey\", fontsize=16)\n",
+"plt.subplots_adjust(bottom=.2)\n",
+"plt.show()"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Finding locations of interest\n",
+"\n",
+"Most data is only available for the US. Select endpoints report other countries at the national and/or regional levels. Endpoint descriptions explicitly state when they cover non-US locations.\n",
+"\n",
+"For endpoints that report US data, see the\n",
+"[geographic coding documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html)\n",
+"for available geographic levels.\n",
+"\n",
+"## International data\n",
+"\n",
+"International data is available via the following endpoints:\n",
+"\n",
+"- `pub_dengue_nowcast` (North and South America)\n",
+"- `pub_ecdc_ili` (Europe)\n",
+"- `pub_kcdc_ili` (Korea)\n",
+"- `pub_nidss_dengue` (Taiwan)\n",
+"- `pub_nidss_flu` (Taiwan)\n",
+"- `pub_paho_dengue` (North and South America)\n",
+"- `pvt_dengue_sensors` (North and South America)\n",
+"\n",
+"## Finding data sources and signals of interest\n",
+"\n",
+"Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/),\n",
+"but the Epidata API includes numerous data streams: medical claims data, cases\n",
+"and deaths, mobility, and many others. This can make it a challenge to find\n",
+"the data stream that you are most interested in.\n",
+"\n",
+"The Epidata documentation lists all the data sources and signals available\n",
+"through the API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)\n",
+"and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters).\n",
+"\n",
+"## Epiweeks and dates\n",
+"\n",
+"Epiweeks use the U.S. (MMWR) definition: the first epiweek of each year is the\n",
+"Sunday-to-Saturday week that contains January 4. See [this page](https://www.cmmcp.org/mosquito-surveillance-data/pages/epi-week-calendars-2008-2021)\n",
+"for more information.\n",
+"\n",
+"Epiweeks are formatted as YYYYWW and dates as YYYYMMDD.\n",
+"\n",
+"Use individual values, comma-separated lists, or hyphenated ranges to specify one or more epiweeks or dates.\n",
+"An `EpiRange` object can also be used to construct a range of epiweeks or dates. Examples include:\n",
+"\n",
+"- `param = 201530` (A single epiweek)\n",
+"- `param = '201401,201501,201601'` (Several epiweeks)\n",
+"- `param = '200501-200552'` (A range of epiweeks)\n",
+"- `param = '201440,201501-201510'` (Several epiweeks, including a range)\n",
+"- `param = EpiRange(20070101, 20071231)` (A range of dates)"
+]
+}
+],
+"metadata": {
+"language_info": {
+"name": "python"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
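The U.S. epiweek definition in the notebook (the first epiweek of a year is the Sunday-start week containing January 4) can be made concrete with the standard library alone. This is a minimal sketch; `epiweek_start` and `to_epiweek` are hypothetical helpers written for illustration, not functions provided by epidatpy. They return epiweeks in the YYYYWW format the notebook describes.

```python
from datetime import date, timedelta


def epiweek_start(year: int) -> date:
    """Sunday that begins epiweek 1 of `year` (the Sunday-start week containing January 4)."""
    jan4 = date(year, 1, 4)
    # date.weekday() is Monday=0 ... Sunday=6, so this steps back to the
    # Sunday on or before January 4.
    return jan4 - timedelta(days=(jan4.weekday() + 1) % 7)


def to_epiweek(d: date) -> int:
    """Return the epiweek containing `d` as a YYYYWW integer."""
    year = d.year
    # Dates near New Year can belong to the previous or the next epiweek year.
    if d < epiweek_start(year):
        year -= 1
    elif d >= epiweek_start(year + 1):
        year += 1
    week = (d - epiweek_start(year)).days // 7 + 1
    return year * 100 + week
```

For example, `to_epiweek(date(2021, 4, 5))` gives `202114` (the first date queried in the notebook), while 2 January 2021 falls in week 53 of 2020 and 30 December 2019 already belongs to week 1 of 2020.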

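The string forms of `param` listed at the end of the notebook combine two pieces of syntax: commas separate items, and a hyphen joins the two ends of an inclusive range. The hypothetical `parse_time_param` below only splits that syntax into single values and `(start, end)` pairs to show how the examples decompose. It is not epidatpy's implementation, and it deliberately does not expand ranges, because stepping through epiweeks across a year boundary needs calendar logic.

```python
from typing import List, Tuple, Union

# A single YYYYWW/YYYYMMDD value, or an inclusive (start, end) range.
TimeItem = Union[int, Tuple[int, int]]


def parse_time_param(param: Union[int, str]) -> List[TimeItem]:
    """Split a time parameter such as '201440,201501-201510' into its items."""
    if isinstance(param, int):
        return [param]
    items: List[TimeItem] = []
    for part in param.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            # Fixed-width YYYYWW/YYYYMMDD values sort chronologically as integers.
            if start > end:
                raise ValueError(f"range start after end: {part!r}")
            items.append((start, end))
        else:
            items.append(int(part))
    return items
```

Applied to the notebook's examples, `'201440,201501-201510'` yields one single epiweek followed by one range, `[201440, (201501, 201510)]`.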