33 changes: 23 additions & 10 deletions docs/configuration.rst
Expand Up @@ -7,13 +7,15 @@ Configuring Code Annotations is a pretty simple affair. Here is an example showi

source_path: /path/to/be/searched/
report_path: /path/to/write/report/to/
safelist_path: .annotation_safe_list.yml
coverage_target: 100.0
annotations:
name_of_annotation: ".. annotation_token::"
another_annotation: ".. annotation_token2::"
choice_annotation:
".. annotation_token::":
".. annotation_token2::":
".. choice_annotation::":
choices: [choice_1, choice_2, choice_3]
name_of_annotation_group:
- ".. first_group_token::"
- ".. first_group_token::":
- ".. second_group_token::":
choices: [choice_4, choice_5]
- ".. third_group_token::":
Expand All @@ -32,12 +34,23 @@ Configuring Code Annotations is a pretty simple affair. Here is an example showi
``report_path``
The directory where the YAML report file will be written. If it does not exist, it will be created.

``safelist_path``
The path to a safelist, used by the Django Search tool to find annotations in models that are defined outside of
the local source tree. See :doc:`safelist` for more information.

``coverage_target``
A number from 0 - 100 that represents the percentage of Django models in the project that should have annotations.
The Django Search tool will fail when run with the ``--coverage`` option if the covered percentage is below this
number. See :doc:`django_coverage` for more information.

``annotations``
The definition of annotations to be searched for. There are two types of annotations.

- Basic, or comment, annotations such as ``name_of_annotation`` and ``another_annotation`` above, allow for
free-form text following the annotation itself. At this time the comment must be all on one line to be included
in the report. Multi-line annotation comments are not yet supported.
- Basic, or comment, annotations such as ``annotation_token`` and ``first_group_token`` above, allow for
free-form text following the annotation itself. Note the colon after the annotation token! In configuration this
type is a mapping type, mapping to a null value.

Note: At this time the comment must be all on one line. Multi-line annotation comments are not yet supported.

- Choice annotations, such as ``choice_annotation``, ``second_group_token`` and ``third_group_token``, limit the
potential values of the annotation to the ones listed in ``choices``. This can help enforce consistency across the
Expand All @@ -46,9 +59,9 @@ Configuring Code Annotations is a pretty simple affair. Here is an example showi

In addition to the two types of annotations, it is also possible to group several annotations together into a fixed
structure. In our example ``name_of_annotation_group`` is a group consisting of 3 annotations. When grouped, all
of the annotations in the group **must** be present, or linting will fail. The order of the grouping does not matter
except that the first annotation in the group **must** come first in the comments. Only one of each annotation is
allowed in a group. See :doc:`writing_annotations` for more information and examples.
of the annotations in the group **must** be present, or linting will fail. The order of the grouping does not
matter as long as all of them are found before any other annotations. Only one of each annotation is allowed in a
group. See :doc:`writing_annotations` for more information and examples.
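
As a rough illustration of the three shapes described above, the following Python sketch classifies each entry of an ``annotations`` mapping after it has been loaded from YAML. The ``ANNOTATIONS`` dict and the ``classify`` helper are hypothetical and are not part of Code Annotations; the dict only mirrors the sample configuration, assuming that a token followed by a bare colon loads as a null value.

```python
# Hypothetical stand-in for the ``annotations`` section after YAML loading.
# Assumption: a token followed by a bare colon loads as None (null).
ANNOTATIONS = {
    ".. annotation_token::": None,
    ".. annotation_token2::": None,
    ".. choice_annotation::": {"choices": ["choice_1", "choice_2", "choice_3"]},
    "name_of_annotation_group": [
        {".. first_group_token::": None},
        {".. second_group_token::": {"choices": ["choice_4", "choice_5"]}},
        {".. third_group_token::": None},
    ],
}


def classify(value):
    """Return 'basic', 'choice', or 'group' for one configured value."""
    if value is None:
        return "basic"  # token mapped to a null value
    if isinstance(value, dict) and "choices" in value:
        return "choice"  # token mapped to a list of allowed choices
    if isinstance(value, list):
        return "group"  # named group holding a list of annotations
    raise ValueError(f"Unrecognized annotation definition: {value!r}")


# Classify the top-level entries, then the members of the group.
kinds = {name: classify(value) for name, value in ANNOTATIONS.items()}
group_kinds = [
    classify(value)
    for member in ANNOTATIONS["name_of_annotation_group"]
    for value in member.values()
]
```

Here ``kinds`` maps each top-level entry to its type, and ``group_kinds`` shows that a group can mix basic and choice annotations.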

``extensions``
Code Annotations uses Stevedore extensions to extend the capability of finding new language comments. Language
Expand Down
122 changes: 122 additions & 0 deletions docs/django_search.rst
@@ -0,0 +1,122 @@
Django Model Search Tool
------------------------

code_annotations django_find_annotations::
Usage: code_annotations django_find_annotations [OPTIONS]

Subcommand for dealing with annotations in Django models.

--config_file FILE Path to the configuration file
--seed_safelist
Generate an initial safelist file based on
the current Django environment. [default:
False]

--list_local_models
List all locally defined models (in the
current repo) that require annotations.
[default: False]

--report_path TEXT Location to write the report
-v Verbosity level (-v through -vvv)
--lint Enable or disable linting checks [default:
False]
--report Enable or disable writing the report
[default: False]
--coverage Enable or disable coverage checks [default:
False]
--help Show this message and exit.


Overview
========
The Django Model Search Tool, or Django Tool, is written to provide more structured searching and validation in a place
where data is often stored. Since all of the models in a package can be enumerated it is possible, though not required,
to use this tool to positively assert that **all** concrete (non-proxy, non-abstract) models in a project are annotated
in some way. If you do not need this functionality and simply want to find annotations and create a report, the static
search tool is much easier to configure and can search all of your code (instead of just model docstrings).

.. important::
Contributor

This important should be bolded/capitalized? It seems strange as it is now.

Contributor Author

This is a Sphinx admonition. In the processed docs it will highlight the text and put a border around it: http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#rst-directives

To use the Django tool you must first set the ``DJANGO_SETTINGS_MODULE`` environment variable to point to
a valid settings file. The tool will initialize Django and use its introspection to find models. The settings file
should have ``INSTALLED_APPS`` configured for all Django apps that you wish to have annotated. See the
`Django Docs`_ for details.

.. _Django Docs: https://docs.djangoproject.com/en/dev/topics/settings/#designating-the-settings

The edX use case which prompted the creation of this tool is evident in many of our tests and code samples. It is to
be able to track the storage, use, and retirement of personally identifiable information (PII) across our many projects
and repositories. Since the majority of our information is stored via Django models, this tool helps us make sure that
at least all of those are annotated to assert whether they contain PII or not.

The tool works by actually running your Django app or project in a development-like environment. It then uses Django's
introspection tools to find all installed apps and enumerate their models. Each model further enumerates its inheritance
tree and all model docstrings are checked for annotations. All annotations in all models and their ancestors are
added to the list.
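
The inheritance walk described above can be sketched without Django by using plain Python classes. This is only an approximation under stated assumptions: the real tool enumerates models through Django's introspection, while ``annotations_for``, ``TOKEN_RE``, and the two example classes below are hypothetical names, and the token format (``.. name::`` followed by text on the same line) is assumed.

```python
import re

# Assumed token format: ".. name::" followed by free text on the same line.
TOKEN_RE = re.compile(r"^[ \t]*(\.\. \w+::)[ \t]*(.*)$", re.MULTILINE)


class BaseRecord:
    """
    .. no_pii:: Base class stores only timestamps.
    """


class UserProfile(BaseRecord):
    """
    .. pii:: Stores the user's email address.
    """


def annotations_for(cls):
    """Collect (class name, token, data) from cls and all of its ancestors."""
    found = []
    for klass in cls.__mro__:
        if klass is object:
            continue
        # Read only the docstring defined on this class itself.
        doc = klass.__dict__.get("__doc__") or ""
        for token, data in TOKEN_RE.findall(doc):
            found.append((klass.__name__, token, data.strip()))
    return found


profile_annotations = annotations_for(UserProfile)
```

``profile_annotations`` contains the ``.. pii::`` entry from ``UserProfile`` followed by the ``.. no_pii::`` entry inherited from ``BaseRecord``.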

The Safelist
============
In order to assert that **all** concrete models in a project are annotated, it is also necessary to be able to annotate
models that are otherwise installed in the Python virtual environment and are not part of your source tree. Models in
your source tree are called "local models", and ones otherwise installed in the Python environment are "non-local"
models. In order to annotate non-local models, which may come from other repositories or PyPI packages, use the
"safelist" feature.

"Safe" in safelist doesn't mean that the models themselves do not require annotation, but rather it gives developers a
place to annotate those models and put them in a known state. When setting up a repository to use the Django tool, you
should use the ``--seed_safelist`` option to generate an initial safelist template that contains empty entries for all
non-local models. In order for those models to count as "covered", you must add annotations to them in the safelist.

A freshly created safelist:

.. code-block:: yaml

social_django.Association: {}
social_django.Code: {}

And one that has been annotated:

.. code-block:: yaml

social_django.Association:
    ".. no_pii::": "This model has no PII"
social_django.Code:
    ".. pii::": "Email address"
    ".. pii_types::": other
    ".. pii_retirement::": local_api

.. note::
Note that each model can only have one annotation for each token type. For example, it would be invalid to add a
second ``.. no_pii::`` annotation to ``social_django.Association``.

.. important::
Some types of "local" models are procedurally generated and do not have files in code, e.g. models created by
django-simple-history. In those unusual circumstances you can choose to annotate them in the safelist to make
sure they are covered.
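
To make the difference between a seeded and an annotated safelist entry concrete, here is a minimal Python sketch. It assumes the safelist has already been loaded from YAML into a dict of model id to annotations; ``covered_safelist_models`` is a hypothetical helper and not the tool's actual implementation.

```python
# Safelist contents as they might look after YAML loading (assumed shape).
safelist = {
    "social_django.Association": {".. no_pii::": "This model has no PII"},
    "social_django.Code": {},  # seeded but not yet annotated
}


def covered_safelist_models(entries):
    """Return model ids whose safelist entry has at least one annotation."""
    return sorted(
        model_id for model_id, annotations in entries.items() if annotations
    )


covered = covered_safelist_models(safelist)
uncovered = sorted(set(safelist) - set(covered))
```

An empty entry such as ``social_django.Code: {}`` ends up in ``uncovered`` until annotations are added to it.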

Coverage
========
The second unique part of the Django tool is the model coverage report and check. Since we are able to find all models
in a project with a reasonable degree of accuracy we can target a percentage of them that must be annotated. When you
run the tool with the ``--coverage`` option it will compare the percentage of annotated models against the configuration
variable ``coverage_target``. If the ``coverage_target`` is not met the search will fail and a list of the un-annotated
models will be displayed.

Having annotations at any level of a model's inheritance will result in that model being considered "covered".
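
The comparison against ``coverage_target`` amounts to simple arithmetic, sketched below. ``check_coverage`` is a hypothetical function written for illustration; the treatment of a project with no models at all is an assumption, not documented behavior.

```python
def check_coverage(all_models, annotated_models, coverage_target):
    """Return (percent covered, passed, sorted un-annotated model ids)."""
    all_models = set(all_models)
    uncovered = sorted(all_models - set(annotated_models))
    if not all_models:
        # Assumption: a project with no models counts as fully covered.
        return 100.0, True, uncovered
    percent = 100.0 * (len(all_models) - len(uncovered)) / len(all_models)
    # The check fails only when the percentage is below the target.
    return percent, percent >= coverage_target, uncovered


models = ["app.A", "app.B", "other.C", "other.D"]
annotated = ["app.A", "app.B", "other.C"]
percent, passed, missing = check_coverage(models, annotated, coverage_target=50.0)
```

With three of four models annotated, the covered percentage is 75.0, which passes a target of 50.0 but would fail a target of 100.0.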

Lint and Report
===============
This tool supports the same ``--lint`` and ``--report`` options as the :doc:`static_search` tool, and
they are functionally the same. Linting will fail on malformed annotations found in model docstrings, such as bad
choices or incomplete groups. Reporting will write out a report file in the same format as the Static Tool, but with
some additional information in the ``extra`` key such as the ``model_id``, which is a string in the format of
"parentApp.ModelClassName", as Django uses to represent models internally. It also has the full model docstring in
``full_comment``.

If a model inherits from another model that has annotations, those annotations will be included in the report under the
child model's name, as well as any annotations in the model itself.

Local Models
============
Finally, to help find models in the local source tree that still need to be annotated, the tool has a
``--list_local_models`` option. This will output the model id of all models that still need to be annotated.
16 changes: 5 additions & 11 deletions docs/extensions.rst
Expand Up @@ -3,7 +3,8 @@ Extensions

Code Annotations uses `Stevedore`_ to allow new languages to be statically searched in an easily extensible fashion. All
language searches, even the ones that come by default, are implemented as extensions. A language extension is
responsible for finding all comments in files of the given type.
responsible for finding all comments in files of the given type. Note that extensions are only used in the Static Search
and not in Django Search, as Django models are obviously all written in Python.

.. _Stevedore: https://docs.openstack.org/stevedore/latest/

Expand All @@ -13,20 +14,13 @@ be fully functional. This is how the Javascript and Python extensions work, see

If a language has more than one single-line or multi-line comment type you may need to work at the lower level and
inherit from ``AnnotationExtension``. ``SimpleRegexAnnotationExtension`` inherits from ``AnnotationExtension`` and
serve as an example.
serves as an example.

When inheriting from ``AnnotationExtension`` you must override:

``extension_name`` - A unique name for your extension, usually the name of the language it supports. This must match the
name given in ``setup.py`` or ``setup.cfg`` (see below).

``_add_annotation_token`` - On construction this will be called once for each single annotation that is configured,
allowing you to do any setup necessary to find these tokens in your search.

``_add_annotation_group`` - On construction this will be called once for each annotation group that is configured,
allowing you to do any setup necessary to find these tokens in your search. Note that annotations in a group are
*not* also sent to ``_add_annotation_token``, though you can do that yourself.

Contributor

👍

``search`` - Called to search for all annotations in a given file. Takes an open file handle, returns a list of dicts.
Extensions do not need to worry about linting groups or choices, just returning all found annotations in the order
they were discovered in the file.
Expand All @@ -40,7 +34,8 @@ When inheriting from ``AnnotationExtension`` you must override:
'filename': name of the file passed in (available from file_handle.name),
'line_number': line number of the beginning of the comment,
'annotation_token': the annotation token,
'annotation_data': the rest of the text after the annotation token (choices do not need to be split out here)
'annotation_data': the rest of the text after the annotation token (choices do not need to be split out here),
'extra': a dict containing any additional information your extension would like to include in the report
}
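
A minimal sketch of a ``search``-style function that returns dicts in this shape is shown below. It is a standalone function rather than a real ``AnnotationExtension`` subclass, because the base class's exact method signatures are not shown here. The regular expression, the ``python_sketch`` name, and the assumption that annotations sit in ``#`` comments on a single line are all illustrative.

```python
import io
import re

# Assumption: annotations appear in "#" comments as ".. token:: data".
ANNOTATION_RE = re.compile(r"#\s*(\.\. \w+::)[ \t]*(.*)$")


def search(file_handle, extension_name="python_sketch"):
    """Return one dict per annotation, in the order found in the file."""
    results = []
    for line_number, line in enumerate(file_handle, start=1):
        match = ANNOTATION_RE.search(line.rstrip("\n"))
        if match:
            results.append({
                "found_by": extension_name,
                # In-memory handles have no name, so fall back to a marker.
                "filename": getattr(file_handle, "name", "<unknown>"),
                "line_number": line_number,
                "annotation_token": match.group(1),
                "annotation_data": match.group(2).strip(),
                "extra": {},
            })
    return results


# Usage with an in-memory file instead of a file on disk.
handle = io.StringIO("x = 1\n# .. no_pii:: Nothing personal here\ny = 2\n")
found = search(handle)
```

Linting of groups and choices is deliberately left out, matching the division of responsibility described above.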

In order to test your extension you will need to install it into your Python environment or virtualenv. First you must
Expand All @@ -58,4 +53,3 @@ define it as an entry point in your setup.py (or setup.cfg). The entry point nam
Then you can simply ``pip install -e .`` from your project directory. If all goes well you should see your extension
being loaded when you run the static annotation tool with the ``-vv`` or ``-vvv`` option. For your extension to work you
will also need to add it to the ``extensions`` section of your configuration file.

32 changes: 19 additions & 13 deletions docs/getting_started.rst
Expand Up @@ -24,23 +24,29 @@ following is an example of a minimal configuration file. See ``.annotations_samp

.. code-block:: yaml

# Path that you wish to search, can be passed on the command line.
# Directories will be searched recursively, can also point to a single file.
source_path: ../path/to/be/searched/
# Path that you wish to static search, can be passed on the command line
# Directories will be searched recursively, but this can also point to a single file
source_path: ../
Contributor

should this example use ./ or ../? Julia once copied this verbatim into the edx-notes-api pii config, and so we had to change it to ./ to only search the local source checkout.


# Directory to write the report to, can be passed on the command line.
report_path: /path/to/write/report/to/
# Directory to write the report to, can be passed on the command line
report_path: reports

# Definitions of the annotations to search for.
# Path to the Django annotation safelist file
safelist_path: .annotation_safe_list.yml

# Percentage of Django models which must have annotations in order to pass coverage checking
coverage_target: 50.0

# Definitions of the annotations to search for. Notice the trailing colon, this is a mapping type!
# For more information see "Writing Annotations"
annotations:
name-of-annotation: ".. annotation_token::"
".. annotation_token::":

# Code Annotations extensions to load and the file extensions to map them to
extensions:
    python:
        - py
    javascript:
        - js


Create some annotations
-----------------------
Expand Down Expand Up @@ -79,10 +85,10 @@ your favorite text editor to make sure all of your annotations were found. Diffe
this command, try ``-v``, ``-vv``, and ``-vvv`` to assist in debugging. ``--help`` will provide information on all of
the available options.

By default the annotation search will perform linting, which makes sure that any found annotations match the structure
listed in configuration. If any issues are found the command will fail with no report written, otherwise a YAML file
containing the results of the search will be written to your ``report_path``. Both linting and reporting features can be
turned off via command line flags.
By default the static annotation search will perform linting, which makes sure that any found annotations match the
structure listed in configuration. If any issues are found the command will fail with no report written, otherwise a
YAML file containing the results of the search will be written to your ``report_path``. Both linting and reporting
features can be turned off via command line flags.

Add more structure to your annotations
--------------------------------------
Expand Down
2 changes: 2 additions & 0 deletions docs/index.rst
Expand Up @@ -15,6 +15,8 @@ Contents:
readme
getting_started
writing_annotations
static_search
django_search
configuration
extensions
testing
Expand Down
57 changes: 57 additions & 0 deletions docs/static_search.rst
@@ -0,0 +1,57 @@
Static Search Tool
------------------

code_annotations static_find_annotations::
Usage: code_annotations static_find_annotations [OPTIONS]

Subcommand to find annotations via static file analysis.

Options:
--config_file FILE Path to the configuration file
--source_path PATH Location of the source code to search
--report_path TEXT Location to write the report
-v, --verbosity Verbosity level (-v through -vvv)
--lint Enable or disable linting checks [default: True]
--report Enable or disable writing the report file [default: True]
--help Show this message and exit.

Overview
========
The Static Search Tool, or Static Tool, is written as an extensible way to find annotations in code. The tool performs
static analysis on the files themselves instead of relying on the language's runtime and introspection. It
will optionally write a report file in YAML, and optionally check for annotation validity (linting).

Linting
=======
When passed the ``--lint`` option, each annotation will be checked for the following:

- Choice annotations must have one or more of the configured choices
- Groups must have their annotations occur consecutively, though their order doesn't matter

If any of these checks fails, all errors will be printed and the return code of the command will be non-zero. If the
``--report`` option was also provided no report will be written.
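
The two checks can be sketched in Python as follows. Both functions are hypothetical simplifications and not the tool's linter: ``lint_choices`` assumes choices are separated by whitespace or commas, and ``lint_group`` assumes it receives the tokens of one comment in the order they were found, starting at the first group member.

```python
def lint_choices(annotation_data, allowed_choices):
    """Return error strings for the data of one choice annotation."""
    # Assumption: choices are separated by whitespace and/or commas.
    chosen = annotation_data.replace(",", " ").split()
    if not chosen:
        return ["no choices given"]
    return [
        f"'{choice}' is not a valid choice"
        for choice in chosen
        if choice not in allowed_choices
    ]


def lint_group(found_tokens, group_tokens):
    """Check that the group members occur consecutively, in any order."""
    # Only the first len(group_tokens) found tokens may belong to the group.
    window = found_tokens[: len(group_tokens)]
    return [
        f"group is missing {token}"
        for token in group_tokens
        if token not in window
    ]


group = [".. pii::", ".. pii_types::", ".. pii_retirement::"]
ok = lint_group([".. pii::", ".. pii_retirement::", ".. pii_types::"], group)
broken = lint_group([".. pii::", ".. other::", ".. pii_types::"], group)
```

A reordered but complete group produces no errors, while a group interrupted by an unrelated annotation is reported as missing a member.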

Reporting
=========
The YAML report is the main output of the Static Tool. It is a simple YAML document that contains a list of found
annotations, grouped by file. Each annotation entry has the following keys:

.. code-block:: yaml

{
    'found_by': 'python', # The name of the extension which found the annotation
    'filename': 'foo/bar/file.py', # The filename where the annotation was found
    'line_number': 101, # The line number of the beginning of the comment which contained the annotation
    'annotation_token': '.. no_pii::', # The annotation token found
    'annotation_data': 'This model contains no PII.', # The comment, or choices, found with the annotation token
}

Extensions can also send back some additional data in an ``extra`` key, if desired. The Django Model Search Tool does
this to return the Django app and model name.
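
To show what "grouped by file" means for entries of this shape, here is a small Python sketch. ``group_by_file`` is a hypothetical helper; it only builds the in-memory structure and does not write YAML, and the sample entries are made up for illustration.

```python
from collections import defaultdict


def group_by_file(annotations):
    """Group annotation dicts by their 'filename' key, preserving order."""
    report = defaultdict(list)
    for entry in annotations:
        report[entry["filename"]].append(entry)
    return dict(report)


# Made-up entries that use the keys described above.
entries = [
    {"found_by": "python", "filename": "foo/bar/file.py", "line_number": 101,
     "annotation_token": ".. no_pii::", "annotation_data": "This model contains no PII."},
    {"found_by": "javascript", "filename": "foo/app.js", "line_number": 7,
     "annotation_token": ".. pii::", "annotation_data": "Email address"},
    {"found_by": "python", "filename": "foo/bar/file.py", "line_number": 140,
     "annotation_token": ".. pii::", "annotation_data": "Username"},
]
report = group_by_file(entries)
```

Each filename maps to the list of annotations found in that file, in the order they were discovered.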

Extensions
==========
The Static Tool uses Stevedore named extensions to allow for language-specific functionality. Python and Javascript
extensions are included, and many others can be made easily as needed. We will gladly accept pull requests for new languages,
or you can release them yourself on PyPI. For more information on extensions, see :doc:`extensions` and
:doc:`configuration`.