openedx · bmedx · Jan 24, 2019 · Jan 17, 2019 · Jan 23, 2019 · Jan 24, 2019
diff --git a/docs/configuration.rst b/docs/configuration.rst
@@ -7,13 +7,15 @@ Configuring Code Annotations is a pretty simple affair. Here is an example showi
 
     source_path: /path/to/be/searched/
     report_path: /path/to/write/report/to/
+    safelist_path: .annotation_safe_list.yml
+    coverage_target: 100.0
     annotations:
-        name_of_annotation: ".. annotation_token::"
-        another_annotation: ".. annotation_token2::"
-        choice_annotation:
+        ".. annotation_token::":
+        ".. annotation_token2::":
+        ".. choice_annotation::":
             choices: [choice_1, choice_2, choice_3]
         name_of_annotation_group:
-            - ".. first_group_token::"
+            - ".. first_group_token::":
             - ".. second_group_token::":
                 choices: [choice_4, choice_5]
             - ".. third_group_token::":
@@ -32,12 +34,23 @@ Configuring Code Annotations is a pretty simple affair. Here is an example showi
 ``report_path``
     The directory where the YAML report file will be written. If it does not exist, it will be created.
 
+``safelist_path``
+    The path to a safelist, used by the Django Search tool to find annotations in models that are defined outside of
+    the local source tree. See :doc:`safelist` for more information.
+
+``coverage_target``
+    A number from 0 - 100 that represents the percentage of Django models in the project that should have annotations.
+    The Django Search tool will fail when run with the ``--coverage`` option if the covered percentage is below this
+    number. See :doc:`django_coverage` for more information.
+
 ``annotations``
     The definition of annotations to be searched for. There are two types of annotations.
 
-    - Basic, or comment, annotations such as ``name_of_annotation`` and ``another_annotation`` above, allow for
-      free-form text following the annotation itself. At this time the comment must be all on one line to be included
-      in the report. Multi-line annotation comments are not yet supported.
+    - Basic, or comment, annotations such as ``annotation_token`` and ``first_group_token`` above, allow for
+      free-form text following the annotation itself. Note the colon after the annotation token! In configuration this
+      type is a mapping type, mapping to a null value.
+
+      Note: At this time the comment must be all on one line. Multi-line annotation comments are not yet supported.
 
     - Choice annotations, such as ``choice_annotation``, ``second_group_token`` and ``third_group_token``, limit the
       potential values of the annotation to the ones listed in ``choices``. This can help enforce consistency across the
@@ -46,9 +59,9 @@ Configuring Code Annotations is a pretty simple affair. Here is an example showi
 
     In addition to the two types of annotations, it is also possible to group several annotations together into a fixed
     structure. In our example ``name_of_annotation_group`` is a group consisting of 3 annotations. When grouped, all
-    of the annotations in the group **must** be present, or linting will fail. The order of the grouping does not matter
-    except that the first annotation in the group **must** come first in the comments. Only one of each annotation is
-    allowed in a group. See :doc:`writing_annotations` for more information and examples.
+    of the annotations in the group **must** be present, or linting will fail. The order of the grouping does not
+    matter as long as all of them are found before any other annotations. Only one of each annotation is allowed in a
+    group. See :doc:`writing_annotations` for more information and examples.
 
 ``extensions``
     Code Annotations uses Stevedore extensions to extend the capability of finding new language comments. Language
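
A minimal sketch of how the three ``annotations`` shapes in this configuration diff could be told apart, assuming the YAML has already been parsed into a plain Python dict. The function name ``classify_annotations`` and the labels it returns are hypothetical and are not part of code_annotations.

```python
def classify_annotations(annotations):
    """Label each top-level entry as 'basic', 'choice', or 'group'."""
    kinds = {}
    for name, value in annotations.items():
        if value is None:
            # A token followed by a bare colon parses to a null value.
            kinds[name] = "basic"
        elif isinstance(value, dict) and "choices" in value:
            kinds[name] = "choice"
        elif isinstance(value, list):
            # A group is a list of single-key mappings.
            kinds[name] = "group"
        else:
            raise ValueError("Unrecognized annotation definition: {!r}".format(name))
    return kinds


# Hand-written stand-in for the parsed configuration shown in the diff.
example = {
    ".. annotation_token::": None,
    ".. choice_annotation::": {"choices": ["choice_1", "choice_2", "choice_3"]},
    "name_of_annotation_group": [
        {".. first_group_token::": None},
        {".. second_group_token::": {"choices": ["choice_4", "choice_5"]}},
    ],
}
kinds = classify_annotations(example)
```

The old string form (``name_of_annotation: ".. annotation_token::"``) would raise ``ValueError`` in this sketch, which is one way a tool might flag configuration files that still use the previous layout.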

diff --git a/docs/django_search.rst b/docs/django_search.rst
@@ -0,0 +1,122 @@
+Django Model Search Tool
+------------------------
+
+code_annotations django_find_annotations::
+    Usage: code_annotations django_find_annotations [OPTIONS]
+
+    Subcommand for dealing with annotations in Django models.
+
+    --config_file FILE              Path to the configuration file
+    --seed_safelist
+                                    Generate an initial safelist file based on
+                                    the current Django environment.  [default:
+                                    False]
+
+    --list_local_models
+                                    List all locally defined models (in the
+                                    current repo) that require annotations.
+                                    [default: False]
+
+    --report_path TEXT              Location to write the report
+    -v                              Verbosity level (-v through -vvv)
+    --lint                          Enable or disable linting checks  [default:
+                                    False]
+    --report                        Enable or disable writing the report
+                                    [default: False]
+    --coverage                      Enable or disable coverage checks  [default:
+                                    False]
+    --help                          Show this message and exit.
+
+
+Overview
+========
+The Django Model Search Tool, or Django Tool, is written to provide more structured searching and validation in a place
+where data is often stored. Since all of the models in a package can be enumerated it is possible, though not required,
+to use this tool to positively assert that **all** concrete (non-proxy, non-abstract) models in a project are annotated
+in some way. If you do not need this functionality and simply want to find annotations and create a report, the static
+search tool is much easier to configure and can search all of your code (instead of just model docstrings).
+
+.. important::
+    To use the Django tool you must first set the ``DJANGO_SETTINGS_MODULE`` environment variable to point to
+    a valid settings file. The tool will initialize Django and use its introspection to find models. The settings file
+    should have ``INSTALLED_APPS`` configured for all Django apps that you wish to have annotated. See the
+    `Django Docs`_ for details.
+
+.. _Django Docs: https://docs.djangoproject.com/en/dev/topics/settings/#designating-the-settings
+
+The edX use case which prompted the creation of this tool is evident in many of our tests and code samples. It is to
+be able to track the storage, use, and retirement of personally identifiable information (PII) across our many projects
+and repositories. Since the majority of our information is stored via Django models, this tool helps us make sure that
+at least all of those are annotated to assert whether they contain PII or not.
+
+The tool works by actually running your Django app or project in a development-like environment. It then uses Django's
+introspection tools to find all installed apps and enumerate their models. Each model further enumerates its inheritance
+tree and all model docstrings are checked for annotations. All annotations in all models and their ancestors are
+added to the list.
+
+The Safelist
+============
+In order to assert that **all** concrete models in a project are annotated, it is also necessary to be able to annotate
+models that are otherwise installed in the Python virtual environment and are not part of your source tree. Models in
+your source tree are called "local models", and ones otherwise installed in the Python environment are "non-local"
+models. In order to annotate non-local models, which may come from other repositories or PyPI packages, use the
+"safelist" feature.
+
+"Safe" in safelist doesn't mean that the models themselves do not require annotation, but rather it gives developers a
+place to annotate those models and put them in a known state. When setting up a repository to use the Django tool, you
+should use the ``--seed_safelist`` option to generate an initial safelist template that contains empty entries for all
+non-local models. In order for those models to count as "covered", you must add annotations to them in the safelist.
+
+A freshly created safelist:
+
+.. code-block:: yaml
+
+    social_django.Association: {}
+    social_django.Code: {}
+
+And one that has been annotated:
+
+.. code-block:: yaml
+
+    social_django.Association:
+      ".. no_pii::": "This model has no PII"
+    social_django.Code:
+      ".. pii::": "Email address"
+      ".. pii_types::": other
+      ".. pii_retirement::": local_api
+
+.. note::
+    Each model can only have one annotation for each token type. For example, it would be invalid to add a
+    second ``.. no_pii::`` annotation to ``social_django.Association``.
+
+.. important::
+    Some types of "local" models are procedurally generated and do not have files in code, e.g. models created by
+    django-simple-history. In those unusual circumstances you can choose to annotate them in the safelist to make
+    sure they are covered.
+
+Coverage
+========
+The second unique part of the Django tool is the model coverage report and check. Since we are able to find all models
+in a project with a reasonable degree of accuracy we can target a percentage of them that must be annotated. When you
+run the tool with the ``--coverage`` option it will compare the percentage of annotated models against the configuration
+variable ``coverage_target``. If the ``coverage_target`` is not met the search will fail and a list of the un-annotated
+models will be displayed.
+
+Having annotations at any level of a model's inheritance will result in that model being considered "covered".
+
+Lint and Report
+===============
+This tool supports the same ``--lint`` and ``--report`` options as the :doc:`static_search` tool, and
+they are functionally the same. Linting will fail on malformed annotations found in model docstrings, such as bad
+choices or incomplete groups. Reporting will write out a report file in the same format as the Static Tool, but with
+some additional information in the ``extra`` key, such as the ``model_id``, which is a string in the
+"parentApp.ModelClassName" format Django uses to represent models internally. It also has the full model docstring in
+``full_comment``.
+
+If a model inherits from another model that has annotations, those annotations will be included in the report under the
+child model's name, as well as any annotations in the model itself.
+
+Local Models
+============
+Finally, to help find models in the local source tree that still need to be annotated, the tool has a
+``--list_local_models`` option. This will output the model id of all models that still need to be annotated.
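
A minimal sketch of the coverage comparison described in the Coverage section of this new file, assuming model ids are plain strings. The function name ``check_coverage`` and its return shape are hypothetical, and treating an empty project as fully covered is an assumption of this sketch rather than documented behavior.

```python
def check_coverage(annotated_models, all_models, coverage_target):
    """Return (passed, covered_percent, sorted list of un-annotated model ids)."""
    all_models = set(all_models)
    covered = set(annotated_models) & all_models
    if all_models:
        percent = 100.0 * len(covered) / len(all_models)
    else:
        # Assumption: with no models there is nothing left to annotate.
        percent = 100.0
    missing = sorted(all_models - covered)
    return percent >= coverage_target, percent, missing


# Hypothetical model ids; two of the four are annotated.
models = ["auth.User", "myapp.Profile", "social_django.Association", "social_django.Code"]
annotated = ["social_django.Association", "social_django.Code"]

passed_half, percent, missing = check_coverage(annotated, models, 50.0)
passed_full, _, _ = check_coverage(annotated, models, 100.0)
```

With a ``coverage_target`` of 50.0 this example passes, while a target of 100.0 fails and ``missing`` names the two models that still need annotations.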
diff --git a/docs/extensions.rst b/docs/extensions.rst
@@ -3,7 +3,8 @@ Extensions
 
 Code Annotations uses `Stevedore`_ to allow new languages to be statically searched in an easily extensible fashion. All
 language searches, even the ones that come by default, are implemented as extensions. A language extension is
-responsible for finding all comments in files of the given type.
+responsible for finding all comments in files of the given type. Note that extensions are only used in the Static Search
+and not in Django Search, as Django models are obviously all written in Python.
 
 .. _Stevedore: https://docs.openstack.org/stevedore/latest/
 
@@ -13,20 +14,13 @@ be fully functional. This is how the Javascript and Python extensions work, see
 
 If a language has more than one single-line or multi-line comment type you may need to work at the lower level and
 inherit from ``AnnotationExtension``. ``SimpleRegexAnnotationExtension`` inherits from ``AnnotationExtension`` and
-serve as an example.
+serves as an example.
 
 When inheriting from ``AnnotationExtension`` you must override:
 
 ``extension_name`` - A unique name for your extension, usually the name of the language it supports. This must match the
     name given in ``setup.py`` or ``setup.cfg`` (see below).
 
-``_add_annotation_token`` - On construction this will be called once for each single annotation that is configured,
-    allowing you to do any setup necessary to find these tokens in your search.
-
-``_add_annotation_group`` - On construction this will be called once for each annotation group that is configured,
-    allowing you to do any setup necessary to find these tokens in your search. Note that annotations in a group are
-    *not* also sent to ``_add_annotation_token``, though you can do that yourself.
-
 ``search`` - Called to search for all annotations in a given file. Takes an open file handle, returns a list of dicts.
     Extensions do not need to worry about linting groups or choices, just returning all found annotations in the order
     they were discovered in the file.
@@ -40,7 +34,8 @@ When inheriting from ``AnnotationExtension`` you must override:
         'filename': name of the file passed in (available from file_handle.name),
         'line_number': line number of the beginning of the comment,
         'annotation_token': the annotation token,
-        'annotation_data': the rest of the text after the annotation token (choices do not need to be split out here)
+        'annotation_data': the rest of the text after the annotation token (choices do not need to be split out here),
+        'extra': a dict containing any additional information your extension would like to include in the report
     }
 
 In order to test your extension you will need to install it into your Python environment or virtualenv. First you must
@@ -58,4 +53,3 @@ define it as an entry point in your setup.py (or setup.cfg). The entry point nam
 Then you can simply ``pip install -e .`` from your project directory. If all goes well you should see your extension
 being loaded when you run the static annotation tool with the ``-vv`` or ``-vvv`` option. For your extension to work you
 will also need to add it to the ``extensions`` section of your configuration file.
-
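
A minimal sketch of the list-of-dicts shape that ``search`` is documented to return, written as a standalone function rather than a real ``AnnotationExtension`` subclass. The name ``find_annotations``, the ``tokens`` parameter, and the naive line-by-line matching are assumptions for illustration; the ``found_by`` key is taken from the report format described in the static search docs.

```python
import io


def find_annotations(file_handle, tokens, found_by="python"):
    """Scan an open file line by line and collect annotation dicts in file order."""
    results = []
    # io.StringIO has no .name, so fall back to a placeholder.
    filename = getattr(file_handle, "name", "<unknown>")
    for line_number, line in enumerate(file_handle, start=1):
        for token in tokens:
            position = line.find(token)
            if position == -1:
                continue
            results.append({
                "found_by": found_by,
                "filename": filename,
                "line_number": line_number,
                "annotation_token": token,
                # Everything after the token; choices are not split out here.
                "annotation_data": line[position + len(token):].strip(),
                "extra": {},
            })
    return results


sample = io.StringIO(
    "# .. no_pii:: This model has no PII\n"
    "x = 1\n"
    "# .. pii_types:: other\n"
)
results = find_annotations(sample, [".. no_pii::", ".. pii_types::"])
```

A real extension would restrict matching to comments in the target language; this sketch matches the token anywhere on a line to keep the example short.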
diff --git a/docs/getting_started.rst b/docs/getting_started.rst
@@ -24,23 +24,29 @@ following is an example of a minimal configuration file. See ``.annotations_samp
 
 .. code-block:: yaml
 
-    # Path that you wish to search, can be passed on the command line.
-    # Directories will be searched recursively, can also point to a single file.
-    source_path: ../path/to/be/searched/
+    # Path that you wish to search statically, can be passed on the command line
+    # Directories will be searched recursively, but this can also point to a single file
+    source_path: ../
 
-    # Directory to write the report to, can be passed on the command line.
-    report_path: /path/to/write/report/to/
+    # Directory to write the report to, can be passed on the command line
+    report_path: reports
 
-    # Definitions of the annotations to search for.
+    # Path to the Django annotation safelist file
+    safelist_path: .annotation_safe_list.yml
+
+    # Percentage of Django models which must have annotations in order to pass coverage checking
+    coverage_target: 50.0
+
+    # Definitions of the annotations to search for. Notice the trailing colon: this is a mapping type!
+    # For more information see "Writing Annotations"
     annotations:
-        name-of-annotation: ".. annotation_token::"
+        ".. annotation_token::":
 
     # Code Annotations extensions to load and the file extensions to map them to
     extensions:
         python:
             - py
-        javascript:
-            - js
+
 
 Create some annotations
 -----------------------
@@ -79,10 +85,10 @@ your favorite text editor to make sure all of your annotations were found. Diffe
 this command, try ``-v``, ``-vv``, and ``-vvv`` to assist in debugging. ``--help`` will provide information on all of
 the available options.
 
-By default the annotation search will perform linting, which makes sure that any found annotations match the structure
-listed in configuration. If any issues are found the command will fail with no report written, otherwise a YAML file
-containing the results of the search will be written to your ``report_path``. Both linting and reporting features can be
-turned off via command line flags.
+By default the static annotation search will perform linting, which makes sure that any found annotations match the
+structure listed in configuration. If any issues are found the command will fail with no report written, otherwise a
+YAML file containing the results of the search will be written to your ``report_path``. Both linting and reporting
+features can be turned off via command line flags.
 
 Add more structure to your annotations
 --------------------------------------

diff --git a/docs/index.rst b/docs/index.rst
@@ -15,6 +15,8 @@ Contents:
    readme
    getting_started
    writing_annotations
+   static_search
+   django_search
    configuration
    extensions
    testing

diff --git a/docs/static_search.rst b/docs/static_search.rst
@@ -0,0 +1,57 @@
+Static Search Tool
+------------------
+
+code_annotations static_find_annotations::
+    Usage: code_annotations static_find_annotations [OPTIONS]
+
+      Subcommand to find annotations via static file analysis.
+
+    Options:
+      --config_file FILE      Path to the configuration file
+      --source_path PATH      Location of the source code to search
+      --report_path TEXT      Location to write the report
+      -v, --verbosity         Verbosity level (-v through -vvv)
+      --lint                  Enable or disable linting checks  [default: True]
+      --report                Enable or disable writing the report file  [default: True]
+      --help                  Show this message and exit.
+
+Overview
+========
+The Static Search Tool, or Static Tool, is written as an extensible way to find annotations in code. The tool performs
+static analysis on the files themselves instead of relying on the language's runtime and introspection. It
+will optionally write a report file in YAML, and optionally check for annotation validity (linting).
+
+Linting
+=======
+When passed the ``--lint`` option, each annotation will be checked for the following:
+
+- Choice annotations must have one or more of the configured choices
+- Groups must have their annotations occur consecutively, though their order doesn't matter
+
+If any of these checks fails, all errors will be printed and the return code of the command will be non-zero. If the
+``--report`` option was also provided no report will be written.
+
+Reporting
+=========
+The YAML report is the main output of the Static Tool. It is a simple YAML document that contains a list of found
+annotations, grouped by file. Each annotation entry has the following keys:
+
+.. code-block:: yaml
+
+    {
+        'found_by': 'python',  # The name of the extension which found the annotation
+        'filename': 'foo/bar/file.py',  # The filename where the annotation was found
+        'line_number': 101,  # The line number of the beginning of the comment which contained the annotation
+        'annotation_token': '.. no_pii::',  # The annotation token found
+        'annotation_data': 'This model contains no PII.',  # The comment, or choices, found with the annotation token
+    }
+
+Extensions can also send back some additional data in an ``extra`` key, if desired. The Django Model Search Tool does
+this to return the Django app and model name.
+
+Extensions
+==========
+The Static Tool uses Stevedore named extensions to allow for language-specific functionality. Python and Javascript
+extensions are included, and many others can be made easily as needed. We will gladly accept pull requests for new languages
+or you can release them yourself on PyPI. For more information on extensions, see :doc:`extensions` and
+:doc:`configuration`.
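
A minimal sketch of the choice check listed under Linting in this new file, operating on one annotation dict in the report format. The function name ``lint_choice_annotation`` is hypothetical, and splitting ``annotation_data`` on commas and whitespace is an assumption about how multiple choices are written.

```python
import re


def lint_choice_annotation(annotation, allowed_choices):
    """Return a list of error strings; an empty list means the annotation passes."""
    token = annotation["annotation_token"]
    raw = annotation["annotation_data"].strip()
    # Assumption: choices are separated by commas and/or whitespace.
    found = [choice for choice in re.split(r"[,\s]+", raw) if choice]
    errors = []
    if not found:
        errors.append("{} has no choices".format(token))
    for choice in found:
        if choice not in allowed_choices:
            errors.append("{} has invalid choice {!r}".format(token, choice))
    return errors


allowed = ["choice_4", "choice_5"]
ok_errors = lint_choice_annotation(
    {"annotation_token": ".. second_group_token::", "annotation_data": "choice_4, choice_5"},
    allowed,
)
bad_errors = lint_choice_annotation(
    {"annotation_token": ".. second_group_token::", "annotation_data": "choice_9"},
    allowed,
)
empty_errors = lint_choice_annotation(
    {"annotation_token": ".. second_group_token::", "annotation_data": ""},
    allowed,
)
```

Collecting errors in a list instead of raising on the first one mirrors the documented behavior that all errors are printed before the command exits with a non-zero return code.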