cmu-delphi
diff --git a/Makefile b/Makefile
Lines changed: 2 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 15 additions & 0 deletions
diff --git a/docs/conf.py b/docs/conf.py
Lines changed: 6 additions & 1 deletion
diff --git a/docs/data/endpoints.csv b/docs/data/endpoints.csv
Lines changed: 0 additions & 29 deletions
diff --git a/docs/getting_started.ipynb b/docs/getting_started.ipynb
Lines changed: 306 additions & 0 deletions
@@ -25,7 +25,7 @@ test:
 	env/bin/pytest .
 
 doc:
-	env/bin/python docs/preprocess.py
+	@pandoc --version >/dev/null 2>&1 || (echo "ERROR: pandoc is required (install via your platform's package manager)"; exit 1)
 	env/bin/sphinx-build -b html docs docs/_build
 	env/bin/python -m webbrowser -t "docs/_build/index.html"
 
@@ -42,7 +42,7 @@ clean_python:
 	find . -name '*.pyo' -exec rm -f {} +
 	find . -name '__pycache__' -exec rm -fr {} +
 
-clean: clean_docs clean_build clean_python
+clean: clean_doc clean_build clean_python
 
 release: clean lint test
 	env/bin/python -m build --sdist --wheel
 
@@ -35,6 +35,21 @@ make release  # upload the current version to pypi
 make clean    # clean build and docs artifacts
 ```
 
+Building the documentation additionally requires the Pandoc package. These commands can be used
+to install the package on common platforms (see the
+[official documentation](https://pandoc.org/installing.html) for more options):
+
+```sh
+# Linux (Debian/Ubuntu)
+sudo apt-get install pandoc
+
+# OS X / Linux (with Homebrew)
+brew install pandoc
+
+# Windows (with Chocolatey)
+choco install pandoc
+```
+
 ### Release Process
 
 The release consists of multiple steps which can be all done via the GitHub website:
 
@@ -35,7 +35,7 @@
     "sphinx.ext.autodoc",
     "sphinx_autodoc_typehints",
     # 'matplotlib.sphinxext.plot_directive'
-    "sphinx_exec_directive"
+    "nbsphinx"
 ]
 
 # Add any paths that contain templates here, relative to this directory.
@@ -85,3 +85,8 @@
 
 # https://pypi.org/project/sphinx-autodoc-typehints/
 always_document_param_types = True
+
+# https://nbsphinx.readthedocs.io/
+nbsphinx_prompt_width = 0
+nbsphinx_input_prompt = "%.0s"
+nbsphinx_output_prompt = "%.0s"
@@ -0,0 +1,306 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Getting started with epidatpy\n",
+    "\n",
+    "The epidatpy package provides access to all the endpoints of the [Delphi Epidata\n",
+    "API](https://cmu-delphi.github.io/delphi-epidata/), and can be used to make\n",
+    "requests for specific signals on specific dates and in select geographic\n",
+    "regions.\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "### Installation\n",
+    "\n",
+    "You can install the stable version of this package from PyPI:\n",
+    "\n",
+    "```\n",
+    "pip install epidatpy\n",
+    "```\n",
+    "\n",
+    "Or if you want the development version, install from GitHub:\n",
+    "\n",
+    "```\n",
+    "pip install -e \"git+https://github.com/cmu-delphi/epidatpy.git#egg=epidatpy\"\n",
+    "```\n",
+    "\n",
+    "\n",
+    "### API keys\n",
+    "\n",
+    "The Delphi API requires a (free) API key for full functionality. While most\n",
+    "endpoints are available without one, there are\n",
+    "[limits on API usage for anonymous users](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html),\n",
+    "including a rate limit.\n",
+    "\n",
+    "To generate your key,\n",
+    "[register for a pseudo-anonymous account](https://api.delphi.cmu.edu/epidata/admin/registration_form).\n",
+    "\n",
+    "*Note* that private endpoints (i.e. those prefixed with `pvt_`) require a\n",
+    "separate key that needs to be passed as an argument. These endpoints require\n",
+    "specific data use agreements to access.\n",
+    "\n",
+    "## Basic usage\n",
+    "\n",
+    "Fetching data from the Delphi Epidata API is simple. Suppose we are\n",
+    "interested in the [covidcast endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html),\n",
+    "which provides access to a [wide range of data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)\n",
+    "on COVID-19. Reviewing the endpoint documentation, we see that we\n",
+    "[need to specify](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html#constructing-api-queries)\n",
+    "a data source name, a signal name, a geographic level, a time resolution, and\n",
+    "the location and times of interest.\n",
+    "\n",
+    "The `pub_covidcast` function lets us access the `covidcast` endpoint:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from epidatpy import EpiDataContext, EpiRange\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Set common options and context\n",
+    "pd.set_option('display.max_columns', None)\n",
+    "pd.set_option('display.max_rows', None)\n",
+    "pd.set_option('display.width', 1000)\n",
+    "\n",
+    "epidata = EpiDataContext(use_cache=False)\n",
+    "\n",
+    "# Obtain the most up-to-date version of the smoothed COVID-like illness (CLI)\n",
+    "# signal from the COVID-19 Trends and Impact survey for the US\n",
+    "apicall = epidata.pub_covidcast(\n",
+    "    data_source = \"fb-survey\",\n",
+    "    signals = \"smoothed_cli\",\n",
+    "    geo_type = \"nation\",\n",
+    "    time_type = \"day\",\n",
+    "    geo_values = \"us\",\n",
+    "    time_values = EpiRange(20210405, 20210410))\n",
+    "\n",
+    "print(apicall)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`pub_covidcast` returns an `EpiDataCall`, which is a not-yet-executed query that can be inspected. The query can be executed and converted to a DataFrame by using the `.df()` method:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = apicall.df()\n",
+    "print(data.head())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each row represents one observation in the US on one\n",
+    "day. The geographic abbreviation is given in the `geo_value` column and the date in\n",
+    "the `time_value` column. Here `value` is the requested signal (in this\n",
+    "case, the smoothed estimate of the percentage of people with COVID-like\n",
+    "illness, based on the symptom surveys), and `stderr` is its standard error.\n",
+    "\n",
+    "The Epidata API makes signals available at different geographic levels,\n",
+    "depending on the endpoint. To request signals for all states instead of the\n",
+    "entire US, we set the `geo_type` argument to `state` and pass `*` as the\n",
+    "`geo_values` argument. (Only some endpoints allow the use of `*` to\n",
+    "access data at all locations. Check the help for a given endpoint to see if\n",
+    "it supports `*`.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "apicall = epidata.pub_covidcast(\n",
+    "    data_source = \"fb-survey\",\n",
+    "    signals = \"smoothed_cli\",\n",
+    "    geo_type = \"state\",\n",
+    "    time_type = \"day\",\n",
+    "    geo_values = \"*\",\n",
+    "    time_values = EpiRange(20210405, 20210410))\n",
+    "\n",
+    "print(apicall)\n",
+    "print(apicall.df().head())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, we can fetch data for a subset of states by listing the\n",
+    "desired locations in the `geo_values` argument (using `*` in the `time_values`\n",
+    "argument instead of a date range would fetch the full time series):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "apicall = epidata.pub_covidcast(\n",
+    "    data_source = \"fb-survey\",\n",
+    "    signals = \"smoothed_cli\",\n",
+    "    geo_type = \"state\",\n",
+    "    time_type = \"day\",\n",
+    "    geo_values = \"pa,ca,fl\",\n",
+    "    time_values = EpiRange(20210405, 20210410))\n",
+    "\n",
+    "print(apicall)\n",
+    "print(apicall.df().head())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Getting versioned data\n",
+    "\n",
+    "The Epidata API stores a historical record of all data, including corrections\n",
+    "and updates, which is particularly useful for accurately backtesting\n",
+    "forecasting models. To fetch the data as it was known on a given date, we\n",
+    "can use the `as_of` argument:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "apicall = epidata.pub_covidcast(\n",
+    "    data_source = \"fb-survey\",\n",
+    "    signals = \"smoothed_cli\",\n",
+    "    geo_type = \"state\",\n",
+    "    time_type = \"day\",\n",
+    "    geo_values = \"pa\",\n",
+    "    time_values = EpiRange(20210405, 20210410),\n",
+    "    as_of = \"2021-06-01\")\n",
+    "\n",
+    "print(apicall)\n",
+    "print(apicall.df().head())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Plotting\n",
+    "\n",
+    "Because the output data is a standard pandas DataFrame, we can easily plot\n",
+    "it using any Python plotting library:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "plt.rcParams['figure.dpi'] = 300\n",
+    "\n",
+    "apicall = epidata.pub_covidcast(\n",
+    "    data_source = \"fb-survey\",\n",
+    "    signals = \"smoothed_cli\",\n",
+    "    geo_type = \"state\",\n",
+    "    geo_values = \"pa,ca,fl\",\n",
+    "    time_type = \"day\",\n",
+    "    time_values = EpiRange(20210405, 20210410))\n",
+    "\n",
+    "data = apicall.df()\n",
+    "\n",
+    "fig, ax = plt.subplots(figsize=(6, 5))\n",
+    "ax.spines[\"right\"].set_visible(False)\n",
+    "ax.spines[\"left\"].set_visible(False)\n",
+    "ax.spines[\"top\"].set_visible(False)\n",
+    "\n",
+    "data.pivot_table(values = \"value\", index = \"time_value\", columns = \"geo_value\").plot(\n",
+    "    xlabel=\"Date\",\n",
+    "    ylabel=\"CLI\",\n",
+    "    ax = ax,\n",
+    "    linewidth = 1.5\n",
+    ")\n",
+    "\n",
+    "plt.title(\"Smoothed CLI from Facebook Survey\", fontsize=16)\n",
+    "plt.subplots_adjust(bottom=.2)\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Finding locations of interest\n",
+    "\n",
+    "Most data is only available for the US. Select endpoints report other countries at the national and/or regional levels. Endpoint descriptions explicitly state when they cover non-US locations.\n",
+    "\n",
+    "For endpoints that report US data, see the\n",
+    "[geographic coding documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html)\n",
+    "for available geographic levels.\n",
+    "\n",
+    "## International data\n",
+    "\n",
+    "International data is available via the following endpoints:\n",
+    "\n",
+    "- `pub_dengue_nowcast` (North and South America)\n",
+    "- `pub_ecdc_ili` (Europe)\n",
+    "- `pub_kcdc_ili` (Korea)\n",
+    "- `pub_nidss_dengue` (Taiwan)\n",
+    "- `pub_nidss_flu` (Taiwan)\n",
+    "- `pub_paho_dengue` (North and South America)\n",
+    "- `pvt_dengue_sensors` (North and South America)\n",
+    "\n",
+    "## Finding data sources and signals of interest\n",
+    "\n",
+    "Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/),\n",
+    "but the Epidata API includes numerous data streams: medical claims data, cases\n",
+    "and deaths, mobility, and many others. This can make it a challenge to find\n",
+    "the data stream that you are most interested in.\n",
+    "\n",
+    "The Epidata documentation lists all the data sources and signals available\n",
+    "through the API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html)\n",
+    "and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters).\n",
+    "\n",
+    "## Epiweeks and dates\n",
+    "\n",
+    "Epiweeks use the U.S. definition. That is, the first epiweek each year is the\n",
+    "week, starting on a Sunday, containing January 4. See [this page](https://www.cmmcp.org/mosquito-surveillance-data/pages/epi-week-calendars-2008-2021)\n",
+    "for more information.\n",
+    "\n",
+    "Formatting for epiweeks is YYYYWW and for dates is YYYYMMDD.\n",
+    "\n",
+    "Use individual values, comma-separated lists, or a hyphenated range of values to specify one or several epiweeks or dates.\n",
+    "An `EpiRange` object can also be used to construct a range of epiweeks or dates. Examples include:\n",
+    "\n",
+    "- `param = 201530` (A single epiweek)\n",
+    "- `param = '201401,201501,201601'` (Several epiweeks)\n",
+    "- `param = '200501-200552'` (A range of epiweeks)\n",
+    "- `param = '201440,201501-201510'` (Several epiweeks, including a range)\n",
+    "- `param = EpiRange(20070101, 20071231)` (A range of dates)"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
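
The notebook's "Epiweeks and dates" section defines the first epiweek of a year as the Sunday-start week containing January 4, formatted as YYYYWW. As a minimal sketch of that rule, the following uses only the Python standard library. The helper names `epiweek_start` and `to_epiweek` are hypothetical and are not part of epidatpy; the package may compute or validate epiweeks differently.

```python
from datetime import date, timedelta


def epiweek_start(year: int) -> date:
    """Return the Sunday that starts epiweek 1 of `year` (the week containing January 4)."""
    jan4 = date(year, 1, 4)
    # date.weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7 counts days since Sunday.
    days_since_sunday = (jan4.weekday() + 1) % 7
    return jan4 - timedelta(days=days_since_sunday)


def to_epiweek(day: date) -> int:
    """Return the epiweek containing `day` as the integer YYYYWW (hypothetical helper)."""
    # Late-December dates can belong to week 1 of the next year, and early-January
    # dates can belong to the last week of the previous year, so try all three years.
    for year in (day.year + 1, day.year, day.year - 1):
        start = epiweek_start(year)
        if day >= start:
            week = (day - start).days // 7 + 1
            return year * 100 + week
    raise ValueError("no epiweek found; unreachable for valid dates")


# The date range used throughout the notebook falls within a single epiweek.
print(to_epiweek(date(2021, 4, 5)), to_epiweek(date(2021, 4, 10)))
```

Trying the following year first keeps the year-boundary handling in one loop, at the cost of not supporting the minimum and maximum years that `datetime.date` allows.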