diff --git a/docs/source/_static/images/tutorial_osi_toga.png b/docs/source/_static/images/tutorial_osi_toga.png
new file mode 100644
index 000000000..10f6a84a4
Binary files /dev/null and b/docs/source/_static/images/tutorial_osi_toga.png differ
diff --git a/docs/source/_static/images/tutorial_semver_7.6.2_report.png b/docs/source/_static/images/tutorial_semver_7.7.2_report.png
similarity index 100%
rename from docs/source/_static/images/tutorial_semver_7.6.2_report.png
rename to docs/source/_static/images/tutorial_semver_7.7.2_report.png
diff --git a/docs/source/_static/images/tutorial_toga_github.png b/docs/source/_static/images/tutorial_toga_github.png
new file mode 100644
index 000000000..d2c567837
Binary files /dev/null and b/docs/source/_static/images/tutorial_toga_github.png differ
diff --git a/docs/source/_static/images/tutorial_toga_local.png b/docs/source/_static/images/tutorial_toga_local.png
new file mode 100644
index 000000000..9de063eaa
Binary files /dev/null and b/docs/source/_static/images/tutorial_toga_local.png differ
diff --git a/docs/source/_static/images/tutorial_toga_pypi.png b/docs/source/_static/images/tutorial_toga_pypi.png
new file mode 100644
index 000000000..4c8ab7257
Binary files /dev/null and b/docs/source/_static/images/tutorial_toga_pypi.png differ
diff --git a/docs/source/_static/images/tutorial_urllib3_github.png b/docs/source/_static/images/tutorial_urllib3_github.png
new file mode 100644
index 000000000..01065b776
Binary files /dev/null and b/docs/source/_static/images/tutorial_urllib3_github.png differ
diff --git a/docs/source/pages/tutorials/commit_finder.rst b/docs/source/pages/tutorials/commit_finder.rst
index f0338c7d2..23dc4e46a 100644
--- a/docs/source/pages/tutorials/commit_finder.rst
+++ b/docs/source/pages/tutorials/commit_finder.rst
@@ -164,4 +164,4 @@ Future Work
Mapping artifact to commits within repositories is a challenging endeavour. Macron's Commit Finder feature relies on repositories having and using version tags in a sensible way (a tag is considered sensible if it closely matches the version it represents). An alternative, or complimentary, approach would be to make use of the information found within provenance files, where information such as the commit hash used to create the artifact can potentially be found. Additionally, it should be noted that the Commit Finder feature was modelled on the intentions of developers (in terms of tag usage) within a large quantity of Java projects. As tag formatting is "generally" language agnostic in the same way that versioning schemes are, this feature should work well for other languages. However, there may be some improvements to be made by further testing on a large number of non-Java projects.
-.. note:: Macaron now supports extracting repository URLs and commit hashes from provenance files. This is demonstrated in a new tutorial: :doc:`npm_provenance `.
+.. note:: Macaron now supports extracting repository URLs and commit hashes from provenance files. This is demonstrated in a new tutorial: :doc:`provenance `.
diff --git a/docs/source/pages/tutorials/index.rst b/docs/source/pages/tutorials/index.rst
index d16c56f70..bc6bfdb28 100644
--- a/docs/source/pages/tutorials/index.rst
+++ b/docs/source/pages/tutorials/index.rst
@@ -20,7 +20,7 @@ For the full list of supported technologies, such as CI services, registries, an
commit_finder
detect_malicious_package
detect_vulnerable_github_actions
- npm_provenance
+ provenance
detect_malicious_java_dep
generate_verification_summary_attestation
use_verification_summary_attestation
diff --git a/docs/source/pages/tutorials/npm_provenance.rst b/docs/source/pages/tutorials/npm_provenance.rst
deleted file mode 100644
index 6a591e716..000000000
--- a/docs/source/pages/tutorials/npm_provenance.rst
+++ /dev/null
@@ -1,168 +0,0 @@
-.. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
-.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
-
---------------------------------------------------
-Provenance discovery, extraction, and verification
---------------------------------------------------
-
-This tutorial demonstrates how Macaron can automatically retrieve provenance for npm artifacts, validate the contents, and verify the authenticity. Any artifact that can be analyzed and checked for these properties can then be trusted to a greater degree than would be otherwise possible, as provenance files provide verifiable information, such as the commit and build service pipeline that has triggered the release.
-
-For npm artifacts, Macaron makes use of available features provided by `npm `_. Most importantly, npm allows developers to generate provenance files when publishing their artifacts. The `semver `_ package is chosen as an example for this tutorial.
-
-******************************
-Installation and Prerequisites
-******************************
-
-Skip this section if you already know how to install Macaron.
-
-.. toggle::
-
- Please follow the instructions :ref:`here `. In summary, you need:
-
- * Docker
- * the ``run_macaron.sh`` script to run the Macaron image.
-
- .. note:: At the moment, Docker alternatives (e.g. podman) are not supported.
-
-
- You also need to provide Macaron with a GitHub token through the ``GITHUB_TOKEN`` environment variable.
-
- To obtain a GitHub Token:
-
- * Go to ``GitHub settings`` → ``Developer Settings`` (at the bottom of the left side pane) → ``Personal Access Tokens`` → ``Fine-grained personal access tokens`` → ``Generate new token``. Give your token a name and an expiry period.
- * Under ``"Repository access"``, choosing ``"Public Repositories (read-only)"`` should be good enough in most cases.
-
- Now you should be good to run Macaron. For more details, see the documentation :ref:`here `.
-
-********
-Analysis
-********
-
-To perform an analysis on the latest version of semver (when this tutorial was written), Macaron can be run with the following command:
-
-.. code-block:: shell
-
- ./run_macaron.sh analyze -purl pkg:npm/semver@7.6.2 --verify-provenance
-
-The analysis involves Macaron downloading the contents of the target repository to the configured, or default, ``output`` folder. Results from the analysis, including checks, are stored in the database found at ``output/macaron.db`` (See :ref:`Output Files Guide `). Once the analysis is complete, Macaron will also produce a report in the form of a HTML file.
-
-.. note:: If you are unfamiliar with PackageURLs (purl), see this link: `PURLs `_.
-
-During this analysis, Macaron will retrieve two provenance files from the npm registry. One is a :term:`SLSA` v1.0 provenance, while the other is a npm specific publication provenance. The SLSA provenance provides details of the artifact it relates to, the repository it was built from, and the build action used to build it. The npm specific publication provenance exists if the SLSA provenance has been verified before publication.
-
-.. note:: Most of the details from the two provenance files can be found through the links provided on the artifacts page on the npm website. In particular: `Sigstore Rekor `_. The provenance file itself can be found at: `npm registry `_.
-
-Of course to reliably say the above does what is claimed here, proof is needed. For this we can rely on the check results produced from the analysis run. In particular, we want to know the results of three checks: ``mcn_provenance_derived_repo_1``, ``mcn_provenance_derived_commit_1``, and ``mcn_provenance_verified_1``. The first two to ensure that the commit and the repository being analyzed match those found in the provenance file, and the last check to ensure that the provenance file has been verified. For the third check to succeed, you need to enable provenance verification in Macaron by using the ``--verify-provenance`` command-line argument, as demonstrated above. This verification is disabled by default because it can be slow in some cases due to I/O-bound operations.
-
-.. _fig_semver_7.6.2_report:
-
-.. figure:: ../../_static/images/tutorial_semver_7.6.2_report.png
- :alt: HTML report for ``semver 7.6.2``, summary
- :align: center
-
-
-This image shows that the report produced by the previous analysis has pass results for the three checks of interest. This can also be viewed directly by opening the report file:
-
-.. code-block:: shell
-
- open output/reports/npm/semver/semver.html
-
-*****************************
-Run ``verify-policy`` command
-*****************************
-
-Another feature of Macaron is policy verification. This allows Macaron to report on whether an artifact meets the security requirements specified by the user. Policies are written using `Soufflé Datalog `_ , a language similar to SQL. Results collected by the ``analyze`` command can be checked via declarative queries in the created policy, which Macaron can then automatically check.
-
-For this tutorial, we can create a policy that checks whether the three checks (as above) have passed. In this way we can be sure that the requirement is satisfied without having to dive into the reports directly.
-
-.. code-block:: prolog
-
- #include "prelude.dl"
-
- Policy("has-verified-provenance", component_id, "Require a verified provenance file.") :-
- check_passed(component_id, "mcn_provenance_derived_repo_1"),
- check_passed(component_id, "mcn_provenance_derived_commit_1"),
- check_passed(component_id, "mcn_provenance_verified_1").
-
- apply_policy_to("has-verified-provenance", component_id) :-
- is_component(component_id, "pkg:npm/semver@7.6.2").
-
-After including some helper rules, the above policy is defined as requiring all three of the checks to pass through the ``check_passed(, )`` mechanism. The target is then defined by the criteria applied to the policy. In this case, the artifact with a PURL that matches the version of ``semver`` used in this tutorial: ``pkg:npm/semver@7.6.2``. With this check saved to a file, say ``verified.dl``, we can run it against Macaron's local database to confirm that the analysis we performed earlier in this tutorial did indeed pass all three checks.
-
-.. code-block:: shell
-
- ./run_macaron.sh verify-policy -d output/macaron.db -f verified.dl
-
-The result of this command should show that the policy we have written succeeds on the ``semver`` library. As follows:
-
-.. code-block:: javascript
-
- component_satisfies_policy
- ['1', 'pkg:npm/semver@7.6.2', 'has-verified-provenance']
- component_violates_policy
- failed_policies
- passed_policies
- ['has-verified-provenance']
-
-Additionally, if we had happened to run some more analyses on other versions of ``semver``, we could also apply the policy to them with only a small modification:
-
-.. code-block:: prolog
-
- apply_policy_to("has-verified-provenance", component_id) :-
- is_component(component_id, purl),
- match("pkg:npm/semver@.*", purl).
-
-With this modification, all versions of ``semver`` previously analysed will show up when the policy is run again. Like so:
-
-.. code-block:: javascript
-
- component_satisfies_policy
- ['1', 'pkg:npm/semver@7.6.2', 'has-verified-provenance']
- ['2', 'pkg:npm/semver@7.6.0', 'has-verified-provenance']
- component_violates_policy
- ['3', 'pkg:npm/semver@1.0.0', 'has-verified-provenance']
- failed_policies
- ['has-verified-provenance']
-
-Here we can see that the newer versions, 7.6.2 and 7.6.0, passed the checks, meaning they have verified provenance. The much older version, 1.0.0, did not pass the checks, which is not surprising given that it was published 13 years before this tutorial was made.
-
-However, if we wanted to acknowledge that earlier versions of the artifact do not have provenance, and accept that as part of the policy, we can do that too. For this to succeed we need to extend the policy with more complicated modifications.
-
-.. code-block:: prolog
-
- #include "prelude.dl"
-
- Policy("has-verified-provenance-or-is-excluded", component_id, "Require a verified provenance file.") :-
- check_passed(component_id, "mcn_provenance_derived_repo_1"),
- check_passed(component_id, "mcn_provenance_derived_commit_1"),
- check_passed(component_id, "mcn_provenance_verified_1"),
- !exception(component_id).
-
- Policy("has-verified-provenance-or-is-excluded", component_id, "Make exception for older artifacts.") :-
- exception(component_id).
-
- .decl exception(component_id: number)
- exception(component_id) :-
- is_component(component_id, purl),
- match("pkg:npm/semver@[0-6][.].*", purl).
-
- apply_policy_to("has-verified-provenance-or-is-excluded", component_id) :-
- is_component(component_id, purl),
- match("pkg:npm/semver@.*", purl).
-
-In this final policy, we declare (``.decl``) a new rule called ``exception`` that utilises more regular expression in its ``match`` constraint to exclude artifacts that were published before provenance generation was supported. For this tutorial, we have set the exception to accept any versions of ``semver`` that starts with a number between 0 and 6 using the regular expression range component of ``[0-6]``. Then we modify the previous ``Policy`` so that it expects the same three checks to pass, but only if the exception rule is not applicable -- the exclamation mark before the exception negates the requirement. Finally, we add a new ``Policy`` that applies only to those artifacts that match the exception rule.
-
-When run, this updated policy produces the following:
-
-.. code-block:: javascript
-
- component_satisfies_policy
- ['1', 'pkg:npm/semver@7.6.2', 'has-verified-provenance-or-is-excluded']
- ['2', 'pkg:npm/semver@7.6.0', 'has-verified-provenance-or-is-excluded']
- ['3', 'pkg:npm/semver@1.0.0', 'has-verified-provenance-or-is-excluded']
- component_violates_policy
- failed_policies
- passed_policies
- ['has-verified-provenance-or-is-excluded']
-
-Now all versions pass the policy check.
diff --git a/docs/source/pages/tutorials/provenance.rst b/docs/source/pages/tutorials/provenance.rst
new file mode 100644
index 000000000..dc0642d0d
--- /dev/null
+++ b/docs/source/pages/tutorials/provenance.rst
@@ -0,0 +1,307 @@
+.. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
+.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
+
+--------------------------------------------------
+Provenance discovery, extraction, and verification
+--------------------------------------------------
+
+This tutorial demonstrates how Macaron can automatically retrieve provenance for artifacts, validate their contents, and verify their authenticity. Any artifact that can be analyzed and checked for these properties can be trusted to a greater degree than would otherwise be possible, as provenance files provide verifiable information, such as the commit and build service pipeline that triggered the release.
+
+Currently, Macaron supports discovery of attestations for:
+
+ * npm artifacts using features provided by `npm `_
+ * PyPI artifacts using features provided by `Open Source Insights `_
+ * Artifacts whose attestations have been published to, or released as assets on, `GitHub `_
+
+This tutorial uses example packages to demonstrate these discovery methods: the `semver `_ npm package, the `toga `_ PyPI package, and the `urllib3 `_ PyPI package.
+
+.. note:: Macaron imposes a size limit on downloads. For more information, see :ref:`Download Limit`.
+
+.. contents:: :local:
+
+******************************
+Installation and Prerequisites
+******************************
+
+Skip this section if you already know how to install Macaron.
+
+.. toggle::
+
+ Please follow the instructions :ref:`here `. In summary, you need:
+
+ * Docker
+ * the ``run_macaron.sh`` script to run the Macaron image.
+
+ .. note:: At the moment, Docker alternatives (e.g. podman) are not supported.
+
+
+ You also need to provide Macaron with a GitHub token through the ``GITHUB_TOKEN`` environment variable.
+
+ To obtain a GitHub Token:
+
+ * Go to ``GitHub settings`` → ``Developer Settings`` (at the bottom of the left side pane) → ``Personal Access Tokens`` → ``Fine-grained personal access tokens`` → ``Generate new token``. Give your token a name and an expiry period.
+ * Under ``"Repository access"``, choosing ``"Public Repositories (read-only)"`` should be good enough in most cases.
+
+ Now you should be good to run Macaron. For more details, see the documentation :ref:`here `.
+
+The analyses in this tutorial involve downloading the contents of a target repository to the configured, or default, ``output`` folder. Results from the analyses, including checks, are stored in the database found at ``output/macaron.db`` (see :ref:`Output Files Guide `). Once an analysis is complete, Macaron will also produce a report in the form of an HTML file.
+
+.. note:: If you are unfamiliar with PackageURLs (purl), see this link: `PURLs `_.
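As a rough illustration of the PURL structure used throughout this tutorial, the following sketch splits a PURL into its main parts. This is a simplification for illustration only: real PURLs can also carry namespaces, qualifiers, and subpaths, which dedicated libraries such as ``packageurl-python`` handle properly.

```python
def parse_purl(purl: str) -> dict:
    """Split a simple PURL of the form pkg:type/name@version.

    Simplified for illustration: namespaces, qualifiers, and subpaths
    defined by the PURL specification are not handled here.
    """
    if not purl.startswith("pkg:"):
        raise ValueError("not a PURL")
    remainder = purl[len("pkg:"):]
    path, _, version = remainder.partition("@")
    ptype, _, name = path.partition("/")
    return {"type": ptype, "name": name, "version": version or None}

# The semver package analyzed below, split into its parts.
print(parse_purl("pkg:npm/semver@7.7.2"))
```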
+
+**************************************
+Attestation Discovery for semver (npm)
+**************************************
+
+To analyze a specific version of the semver package, Macaron can be run with the following command:
+
+.. code-block:: shell
+
+ ./run_macaron.sh analyze -purl pkg:npm/semver@7.7.2
+
+During this analysis, Macaron will retrieve two provenance files from the npm registry. One is a :term:`SLSA` v1.0 provenance, while the other is an npm-specific publication provenance. The SLSA provenance provides details of the artifact it relates to, the repository it was built from, and the build action used to build it. The npm-specific publication provenance exists if the SLSA provenance has been verified before publication.
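As a sketch of where these files come from, the npm registry exposes attestations at a well-known path. The endpoint path below is an assumption based on npm's public attestations API, not something this tutorial specifies, so treat it as illustrative:

```python
def npm_attestations_url(name: str, version: str) -> str:
    # Attestation endpoint on the public npm registry; the path is an
    # assumption based on npm's public attestations API.
    return f"https://registry.npmjs.org/-/npm/v1/attestations/{name}@{version}"

# URL at which the semver 7.7.2 attestation bundle would be fetched.
print(npm_attestations_url("semver", "7.7.2"))
```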
+
+.. note:: Most of the details from the two provenance files can be found through the links provided on the artifacts page on the npm website. In particular: `Sigstore Rekor `_. The provenance file itself can be found at: `npm registry `_.
+
+To reliably confirm that the above does what is claimed, proof is needed. For this we can rely on the check results produced by the analysis run. In particular, we want to know the results of three checks: ``mcn_provenance_derived_repo_1``, ``mcn_provenance_derived_commit_1``, and ``mcn_provenance_verified_1``. The first two ensure that the commit and the repository being analyzed match those found in the provenance file, while the last ensures that the provenance file has been verified.
+
+.. _fig_semver_7.7.2_report:
+
+.. figure:: ../../_static/images/tutorial_semver_7.7.2_report.png
+ :alt: HTML report for ``semver 7.7.2``, summary
+ :align: center
+
+This image shows that the report produced by the previous analysis has passing results for the three checks of interest. This can also be viewed directly by opening the report file:
+
+.. code-block:: shell
+
+ open output/reports/npm/semver/semver.html
+
+The check results of this example (and others) can be automatically verified. A demonstration of verification for this case is provided later in this tutorial.
+
+*************************************
+Attestation Discovery for toga (PyPI)
+*************************************
+
+To analyze a specific version of the toga package, Macaron can be run with the following command:
+
+.. code-block:: shell
+
+ ./run_macaron.sh analyze -purl pkg:pypi/toga@0.5.1
+
+During this analysis, Macaron will retrieve information from two sources to discover a PyPI attestation file. First, Open Source Insights is queried for an attestation URL. If found, this URL can be followed to its source on the PyPI package registry, which is where the actual attestation file is hosted.
+
+As an example of these internal steps, the attestation information can be seen via the `Open Source Insights API `_. From this information the PyPI attestation URL is extracted, revealing its location: `https://pypi.org/integrity/toga/0.5.1/toga-0.5.1-py3-none-any.whl/provenance `_.
+
+.. _fig_toga_osi_api:
+
+.. figure:: ../../_static/images/tutorial_osi_toga.png
+ :alt: Open Source Insights API result for the toga package
+ :align: center
+
+This image shows the attestation URL found in the Open Source Insights API result.
+
+By using the Open Source Insights API, Macaron can check that the discovered provenance is verified, as well as being a valid match for the user-provided PURL. For this we can rely on the check results produced by the analysis run. In particular, we want to know the results of three checks: ``mcn_provenance_derived_repo_1``, ``mcn_provenance_derived_commit_1``, and ``mcn_provenance_verified_1``. The first two ensure that the commit and the repository being analyzed match those found in the provenance file, while the last ensures that the provenance file has been verified.
+
+.. _fig_toga_pypi_checks:
+
+.. figure:: ../../_static/images/tutorial_toga_pypi.png
+ :alt: HTML report for ``toga 0.5.1``, summary
+ :align: center
+
+All three checks pass, meaning Macaron has discovered the correct provenance for the user-provided PURL and determined that it is verified. To access the full report, use the following command:
+
+.. code-block:: shell
+
+ open output/reports/pypi/toga/toga.html
+
+***************************************
+Attestation Discovery for toga (GitHub)
+***************************************
+
+The toga library is interesting in that it has either a GitHub attestation or a PyPI attestation, depending on which version of it is analyzed. To discover a GitHub attestation, we can analyze version 0.4.8:
+
+.. code-block:: shell
+
+ ./run_macaron.sh analyze -purl pkg:pypi/toga@0.4.8
+
+During this analysis, Macaron will attempt to discover a GitHub attestation by computing the hash of the relevant artifact, which GitHub's API requires in order to look up artifact attestations; see the `GitHub Attestation API `_. The hash is computed by downloading the artifact and calculating its SHA256 digest. With the hash, the GitHub API can be called to find the related attestation.
+
+In this particular case, the SHA256 hash of the toga 0.4.8 artifact is ``0814a72abb0a9a5f22c32cc9479c55041ec30cdf4b12d73a0017aee58f9a1f00``. A GitHub attestation for this artifact can be found `here `_.
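The digest computation described above can be sketched as follows. This is a minimal illustration of computing an artifact's SHA256 digest, not Macaron's actual implementation:

```python
import hashlib

def artifact_sha256(path: str) -> str:
    """Compute the SHA256 digest of a downloaded artifact, reading in
    chunks so large files do not need to fit in memory."""
    sha = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(1 << 16), b""):
            sha.update(chunk)
    return sha.hexdigest()
```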
+
+Attestations discovered through GitHub are signed, and their signatures verified. As long as the repository URL and commit digest associated with the user-provided PURL match what is found within the attestation, Macaron can report it as verified. We can therefore examine the results of three checks: ``mcn_provenance_derived_repo_1``, ``mcn_provenance_derived_commit_1``, and ``mcn_provenance_verified_1``.
+
+.. _fig_toga_github_checks:
+
+.. figure:: ../../_static/images/tutorial_toga_github.png
+ :alt: HTML report for ``toga 0.4.8``, summary
+ :align: center
+
+This image shows that all three checks have passed, confirming that the repository URL and commit digest from the provenance match those associated with the user-provided PURL. To access the full report, use the following command:
+
+.. code-block:: shell
+
+ open output/reports/pypi/toga/toga.html
+
+.. note:: For Maven packages, Macaron can make use of the local artifact cache before downloading occurs. Macaron will check for the existence of the home M2 cache at ``$HOME/.m2``. A different location for this cache can be specified using Macaron's ``--local-maven-repo `` command line argument.
+
+
+******************************************
+Attestation discovery for urllib3 (GitHub)
+******************************************
+
+To demonstrate a GitHub attestation being discovered among assets released on the platform, we use the urllib3 library.
+
+.. code-block:: shell
+
+ ./run_macaron.sh analyze -purl pkg:pypi/urllib3@2.0.0a1
+
+As part of this analysis, Macaron downloads three different asset files: the `attestation asset `_, the artifact's Python wheel file, and the source distribution tarball file. By examining the attestation, Macaron can verify the two other files. The analysis can then report that provenance exists and is verified.
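Conceptually, checking a downloaded file against an attestation amounts to comparing digests with the attestation's subjects. A minimal sketch, assuming the in-toto statement layout (``subject`` entries with a ``digest.sha256`` field) and skipping signature verification entirely:

```python
import hashlib

def subject_matches(statement: dict, artifact_path: str) -> bool:
    """Return True if the artifact's SHA256 digest appears among the
    in-toto statement's subjects. Signature (envelope) verification,
    which real tools also perform, is out of scope here."""
    with open(artifact_path, "rb") as file:
        digest = hashlib.sha256(file.read()).hexdigest()
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )
```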
+
+If we look at the results of three of Macaron's checks we can validate this result: ``mcn_provenance_derived_repo_1``, ``mcn_provenance_derived_commit_1``, and ``mcn_provenance_verified_1``.
+
+.. _fig_urllib3_github_checks:
+
+.. figure:: ../../_static/images/tutorial_urllib3_github.png
+ :alt: HTML report for ``urllib3 2.0.0a1``, summary
+ :align: center
+
+This image shows that all three checks have passed, confirming that the repository URL and commit digest from the provenance match those associated with the user-provided PURL, and that the provenance is verified. To access the full report, use the following command:
+
+.. code-block:: shell
+
+ open output/reports/pypi/urllib3/urllib3.html
+
+***************************
+Supported Attestation Types
+***************************
+
+When an attestation is provided to Macaron as input, it must be of one of the supported types in order to be accepted. Support is defined by the ``predicateType`` and ``buildType`` properties within an attestation.
+
+Predicate Types
+~~~~~~~~~~~~~~~
+
+ * SLSA v0.1
+ * SLSA v0.2
+ * SLSA v1.0
+ * Witness v0.1
+
+Build Types
+~~~~~~~~~~~
+
+.. csv-table::
+ :header: "Name", "Build Type"
+
+ "SLSA GitHub Generic v0.1", "https://github.com/slsa-framework/slsa-github-generator/generic@v1"
+ "SLSA GitHub Actions v1.0", "https://slsa-framework.github.io/github-actions-buildtypes/workflow/v1"
+ "SLSA npm CLI v2.0", "https://github.com/npm/cli/gha/v2"
+ "SLSA Google Cloud Build v1.0", "https://slsa-framework.github.io/gcb-buildtypes/triggered-build/v1"
+ "SLSA Oracle Cloud Infrastructure v1.0", "https://github.com/oracle/macaron/tree/main/src/macaron/resources/provenance-buildtypes/oci/v1"
+ "Witness GitLab v0.1", "https://witness.testifysec.com/attestation-collection/v0.1"
+
+.. _Download Limit:
+
+*******************
+File Download Limit
+*******************
+
+To prevent analyses from taking too long, Macaron imposes a configurable size limit on downloads. This includes files being downloaded for provenance verification. If the limit is reached and you wish to continue the analysis regardless, you can specify a larger download size in the default configuration file. The value can be found under the ``slsa.verifier`` section, listed as ``max_download_size``, with a default limit of 10 megabytes. See :ref:`How to change the default configuration ` for more details on configuring values like these.
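+
+For example, assuming the ``slsa.verifier`` section name described above, a user-provided configuration file raising the limit to 70 megabytes could contain:
+
+.. code-block:: ini
+
+    [slsa.verifier]
+    # The acceptable maximum size (in bytes) to download an asset.
+    max_download_size = 70000000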
+
+**************************************
+Run ``verify-policy`` command (semver)
+**************************************
+
+Another feature of Macaron is policy verification, which allows it to assess whether an artifact meets user-defined security requirements. This feature can also be integrated into CI/CD pipelines to automatically check policy compliance by returning appropriate error codes based on pass or fail status. Policies are written using `Soufflé Datalog `_, a language similar to SQL. Results collected by the ``analyze`` command can be checked via declarative queries in the created policy, which Macaron then evaluates automatically.
+
+For this tutorial, we can create a policy that checks whether the three checks from the semver npm example above have passed: ``mcn_provenance_derived_repo_1``, ``mcn_provenance_derived_commit_1``, and ``mcn_provenance_verified_1``. In this way we can be sure that the requirement is satisfied without having to dive into the reports directly.
+
+.. code-block:: prolog
+
+ #include "prelude.dl"
+
+ Policy("has-verified-provenance", component_id, "Require a verified provenance file.") :-
+ check_passed(component_id, "mcn_provenance_derived_repo_1"),
+ check_passed(component_id, "mcn_provenance_derived_commit_1"),
+ check_passed(component_id, "mcn_provenance_verified_1").
+
+ apply_policy_to("has-verified-provenance", component_id) :-
+ is_component(component_id, "pkg:npm/semver@7.7.2").
+
+After including some helper rules, the above policy requires all three checks to pass through the ``check_passed(, )`` mechanism. The target is then defined by the criteria applied to the policy: in this case, the artifact whose PURL matches the version of ``semver`` used in this tutorial, ``pkg:npm/semver@7.7.2``. With this policy saved to a file, say ``verified.dl``, we can run it against Macaron's local database to confirm that the analysis we performed earlier did indeed pass all three checks.
+
+.. code-block:: shell
+
+ ./run_macaron.sh verify-policy -d output/macaron.db -f verified.dl
+
+The result of this command should show that the policy we have written succeeds on the ``semver`` library. As follows:
+
+.. code-block:: javascript
+
+ component_satisfies_policy
+ ['1', 'pkg:npm/semver@7.7.2', 'has-verified-provenance']
+ component_violates_policy
+ failed_policies
+ passed_policies
+ ['has-verified-provenance']
+
+Additionally, if we had happened to run some more analyses on other versions of ``semver``, we could also apply the policy to them with only a small modification:
+
+.. code-block:: prolog
+
+ apply_policy_to("has-verified-provenance", component_id) :-
+ is_component(component_id, purl),
+ match("pkg:npm/semver@.*", purl).
+
+With this modification, all previously analyzed versions of ``semver`` will show up when the policy is run again. Like so:
+
+.. code-block:: javascript
+
+ component_satisfies_policy
+ ['1', 'pkg:npm/semver@7.7.2', 'has-verified-provenance']
+ ['2', 'pkg:npm/semver@7.6.0', 'has-verified-provenance']
+ component_violates_policy
+ ['3', 'pkg:npm/semver@1.0.0', 'has-verified-provenance']
+ failed_policies
+ ['has-verified-provenance']
+
+Here we can see that the newer versions, 7.7.2 and 7.6.0, passed the checks, meaning they have verified provenance. The much older version, 1.0.0, did not pass the checks, which is not surprising given that it was published 13 years before this tutorial was made.
+
+However, if we wanted to acknowledge that earlier versions of the artifact do not have provenance, and accept that as part of the policy, we can do that too. For this to succeed we need to extend the policy with more complicated modifications.
+
+.. code-block:: prolog
+
+ #include "prelude.dl"
+
+ Policy("has-verified-provenance-or-is-excluded", component_id, "Require a verified provenance file.") :-
+ check_passed(component_id, "mcn_provenance_derived_repo_1"),
+ check_passed(component_id, "mcn_provenance_derived_commit_1"),
+ check_passed(component_id, "mcn_provenance_verified_1"),
+ !exception(component_id).
+
+ Policy("has-verified-provenance-or-is-excluded", component_id, "Make exception for older artifacts.") :-
+ exception(component_id).
+
+ .decl exception(component_id: number)
+ exception(component_id) :-
+ is_component(component_id, purl),
+ match("pkg:npm/semver@[0-6][.].*", purl).
+
+ apply_policy_to("has-verified-provenance-or-is-excluded", component_id) :-
+ is_component(component_id, purl),
+ match("pkg:npm/semver@.*", purl).
+
+In this final policy, we declare (``.decl``) a new rule called ``exception`` that uses a more specific regular expression in its ``match`` constraint to exclude artifacts that were published before provenance generation was supported. For this tutorial, the exception accepts any version of ``semver`` that starts with a number between 0 and 6, via the regular expression range ``[0-6]``. Then we modify the previous ``Policy`` so that it expects the same three checks to pass, but only if the exception rule does not apply; the exclamation mark before ``exception`` negates the requirement. Finally, we add a new ``Policy`` that applies only to those artifacts that match the exception rule.
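The effect of the ``[0-6]`` range can be sanity-checked outside Datalog. A small sketch using Python's ``re`` module (Soufflé's ``match`` requires the whole string to match, so ``re.fullmatch`` is the closest analogue):

```python
import re

# Same pattern as the Datalog exception rule above.
EXCEPTION_PATTERN = r"pkg:npm/semver@[0-6][.].*"

purls = [
    "pkg:npm/semver@1.0.0",   # excepted: major version in the 0-6 range
    "pkg:npm/semver@6.3.1",   # excepted
    "pkg:npm/semver@7.7.2",   # not excepted: must have verified provenance
]
excepted = [p for p in purls if re.fullmatch(EXCEPTION_PATTERN, p)]
print(excepted)
```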
+
+When run, this updated policy produces the following:
+
+.. code-block:: javascript
+
+ component_satisfies_policy
+ ['1', 'pkg:npm/semver@7.7.2', 'has-verified-provenance-or-is-excluded']
+ ['2', 'pkg:npm/semver@7.6.0', 'has-verified-provenance-or-is-excluded']
+ ['3', 'pkg:npm/semver@1.0.0', 'has-verified-provenance-or-is-excluded']
+ component_violates_policy
+ failed_policies
+ passed_policies
+ ['has-verified-provenance-or-is-excluded']
+
+Now all versions pass the policy check.
diff --git a/src/macaron/__main__.py b/src/macaron/__main__.py
index 9b746806e..afaabdbe5 100644
--- a/src/macaron/__main__.py
+++ b/src/macaron/__main__.py
@@ -172,7 +172,6 @@ def analyze_slsa_levels_single(analyzer_single_args: argparse.Namespace) -> None
analyzer_single_args.sbom_path,
deps_depth,
provenance_payload=prov_payload,
- verify_provenance=analyzer_single_args.verify_provenance,
force_analyze_source=analyzer_single_args.force_analyze_source,
)
sys.exit(status_code)
@@ -483,13 +482,6 @@ def main(argv: list[str] | None = None) -> None:
help=("Forces PyPI sourcecode analysis to run regardless of other heuristic results."),
)
- single_analyze_parser.add_argument(
- "--verify-provenance",
- required=False,
- action="store_true",
- help=("Allow the analysis to attempt to verify provenance files as part of its normal operations."),
- )
-
# Dump the default values.
sub_parser.add_parser(name="dump-defaults", description="Dumps the defaults.ini file to the output directory.")
diff --git a/src/macaron/config/defaults.ini b/src/macaron/config/defaults.ini
index 0c31aaca7..caaf1b120 100644
--- a/src/macaron/config/defaults.ini
+++ b/src/macaron/config/defaults.ini
@@ -482,7 +482,7 @@ provenance_extensions =
intoto.jsonl.url
intoto.jsonl.gz.url
# This is the acceptable maximum size (in bytes) to download an asset.
-max_download_size = 70000000
+max_download_size = 10000000
# This is the timeout (in seconds) to run the SLSA verifier.
timeout = 120
# The allowed hostnames for URL file links for provenance download
diff --git a/src/macaron/malware_analyzer/pypi_heuristics/sourcecode/suspicious_setup.py b/src/macaron/malware_analyzer/pypi_heuristics/sourcecode/suspicious_setup.py
index ebde2a21f..537879c21 100644
--- a/src/macaron/malware_analyzer/pypi_heuristics/sourcecode/suspicious_setup.py
+++ b/src/macaron/malware_analyzer/pypi_heuristics/sourcecode/suspicious_setup.py
@@ -11,13 +11,12 @@
import tempfile
import zipfile
-import requests
-from requests import RequestException
-
+from macaron.config.defaults import defaults
from macaron.json_tools import JsonType
from macaron.malware_analyzer.pypi_heuristics.base_analyzer import BaseHeuristicAnalyzer
from macaron.malware_analyzer.pypi_heuristics.heuristics import HeuristicResult, Heuristics
from macaron.slsa_analyzer.package_registry.pypi_registry import PyPIPackageJsonAsset
+from macaron.util import download_file_with_size_limit
logger: logging.Logger = logging.getLogger(__name__)
@@ -55,26 +54,11 @@ def _get_setup_source_code(self, pypi_package_json: PyPIPackageJsonAsset) -> str
# Create a temporary directory to store the downloaded source.
with tempfile.TemporaryDirectory() as temp_dir:
- try:
- response = requests.get(sourcecode_url, stream=True, timeout=40)
- response.raise_for_status()
- except requests.exceptions.HTTPError as http_err:
- logger.debug("HTTP error occurred when trying to download source: %s", http_err)
- return None
-
- if response.status_code != 200:
- return None
-
source_file = os.path.join(temp_dir, file_name)
- with open(source_file, "wb") as file:
- try:
- for chunk in response.iter_content():
- file.write(chunk)
- except RequestException as error:
- # Something went wrong with the request, abort.
- logger.debug("Error while streaming source file: %s", error)
- response.close()
- return None
+ timeout = defaults.getint("downloads", "timeout", fallback=120)
+ size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+ if not download_file_with_size_limit(sourcecode_url, {}, source_file, timeout, size_limit):
+ return None
target_file = "setup.py"
file_dir = file_name.removesuffix(".tar.gz").removesuffix(".zip")
diff --git a/src/macaron/provenance/provenance_extractor.py b/src/macaron/provenance/provenance_extractor.py
index f2c54c607..4366ab299 100644
--- a/src/macaron/provenance/provenance_extractor.py
+++ b/src/macaron/provenance/provenance_extractor.py
@@ -159,7 +159,7 @@ def _extract_from_slsa_v1(payload: InTotoV1Payload) -> tuple[str | None, str | N
repo = json_extract(predicate, ["buildDefinition", "externalParameters", "sourceToBuild", "repository"], str)
if not repo:
repo = json_extract(predicate, ["buildDefinition", "externalParameters", "configSource", "repository"], str)
- elif isinstance(build_def, SLSAGithubActionsBuildDefinitionV1):
+ elif isinstance(build_def, (SLSAGithubActionsBuildDefinitionV1, GitHubActionsBuildDefinition)):
repo = json_extract(predicate, ["buildDefinition", "externalParameters", "workflow", "repository"], str)
elif isinstance(build_def, SLSAOCIBuildDefinitionV1):
repo = json_extract(predicate, ["buildDefinition", "externalParameters", "source"], str)
@@ -504,7 +504,7 @@ def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement
class SLSANPMCLIBuildDefinitionV2(ProvenanceBuildDefinition):
- """Class representing the SLSA NPM CLI Build Definition (v12).
+ """Class representing the SLSA NPM CLI Build Definition (v2).
This class implements the abstract methods from the `ProvenanceBuildDefinition`
to extract build invocation details specific to the GitHub Actions build type.
@@ -684,6 +684,43 @@ def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement
return gha_workflow, invocation_url
+class GitHubActionsBuildDefinition(ProvenanceBuildDefinition):
+ """Class representing the GitHub Actions Build Definition (v1).
+
+ This class implements the abstract methods defined in `ProvenanceBuildDefinition`
+ to extract build invocation details specific to the GitHub Actions attestation build type.
+ """
+
+ #: Determines the expected ``buildType`` field in the provenance predicate.
+ expected_build_type = "https://actions.github.io/buildtypes/workflow/v1"
+
+ def get_build_invocation(self, statement: InTotoV01Statement | InTotoV1Statement) -> tuple[str | None, str | None]:
+ """Retrieve the build invocation information from the given statement.
+
+ Parameters
+ ----------
+ statement : InTotoV1Statement | InTotoV01Statement
+ The provenance statement from which to extract the build invocation
+ details. This statement contains the metadata about the build process
+ and its associated artifacts.
+
+ Returns
+ -------
+ tuple[str | None, str | None]
+ A tuple containing two elements:
+ - The first element is the build invocation entry point (e.g., workflow name), or None if not found.
+ - The second element is the invocation URL or identifier (e.g., job URL), or None if not found.
+ """
+ if statement["predicate"] is None:
+ return None, None
+
+ gha_workflow = json_extract(
+ statement["predicate"], ["buildDefinition", "externalParameters", "workflow", "path"], str
+ )
+ invocation_url = json_extract(statement["predicate"], ["runDetails", "metadata", "invocationId"], str)
+ return gha_workflow, invocation_url
+
+
class ProvenancePredicate:
"""Class providing utility methods for handling provenance predicates.
@@ -750,6 +787,7 @@ def find_build_def(statement: InTotoV01Statement | InTotoV1Statement) -> Provena
SLSAOCIBuildDefinitionV1(),
WitnessGitLabBuildDefinitionV01(),
PyPICertificateDefinition(),
+ GitHubActionsBuildDefinition(),
]
for build_def in build_defs:
diff --git a/src/macaron/provenance/provenance_finder.py b/src/macaron/provenance/provenance_finder.py
index 715204a16..8c552a3bd 100644
--- a/src/macaron/provenance/provenance_finder.py
+++ b/src/macaron/provenance/provenance_finder.py
@@ -12,6 +12,7 @@
from packageurl import PackageURL
from pydriller import Git
+from macaron.artifact.local_artifact import get_local_artifact_hash
from macaron.config.defaults import defaults
from macaron.repo_finder.commit_finder import AbstractPurlType, determine_abstract_purl_type
from macaron.repo_finder.repo_finder_deps_dev import DepsDevRepoFinder
@@ -19,14 +20,22 @@
from macaron.slsa_analyzer.checks.provenance_available_check import ProvenanceAvailableException
from macaron.slsa_analyzer.ci_service import GitHubActions
from macaron.slsa_analyzer.ci_service.base_ci_service import NoneCIService
-from macaron.slsa_analyzer.package_registry import PACKAGE_REGISTRIES, JFrogMavenRegistry, NPMRegistry
+from macaron.slsa_analyzer.package_registry import (
+ PACKAGE_REGISTRIES,
+ JFrogMavenRegistry,
+ MavenCentralRegistry,
+ NPMRegistry,
+ PyPIRegistry,
+)
from macaron.slsa_analyzer.package_registry.npm_registry import NPMAttestationAsset
+from macaron.slsa_analyzer.package_registry.pypi_registry import find_or_create_pypi_asset
from macaron.slsa_analyzer.provenance.intoto import InTotoPayload
from macaron.slsa_analyzer.provenance.intoto.errors import LoadIntotoAttestationError
from macaron.slsa_analyzer.provenance.loader import load_provenance_payload
from macaron.slsa_analyzer.provenance.slsa import SLSAProvenanceData
from macaron.slsa_analyzer.provenance.witness import is_witness_provenance_payload, load_witness_verifier_config
from macaron.slsa_analyzer.specs.ci_spec import CIInfo
+from macaron.slsa_analyzer.specs.package_registry_spec import PackageRegistryInfo
logger: logging.Logger = logging.getLogger(__name__)
@@ -490,3 +499,96 @@ def download_provenances_from_ci_service(ci_info: CIInfo, download_path: str) ->
except OSError as error:
logger.error("Error while storing provenance in the temporary directory: %s", error)
+
+
+def get_artifact_hash(
+ purl: PackageURL,
+ local_artifact_dirs: list[str] | None,
+ package_registries_info: list[PackageRegistryInfo],
+) -> str | None:
+ """Get the hash of the artifact found from the passed PURL using local or remote files.
+
+ Provided local caches will be searched first. Artifacts will be downloaded if nothing is found within local
+ caches, or if no appropriate cache is provided for the target language.
+ Downloaded artifacts will be added to the passed package registry to prevent downloading them again.
+
+ Parameters
+ ----------
+ purl: PackageURL
+ The PURL of the artifact.
+ local_artifact_dirs: list[str] | None
+ The list of directories that may contain the artifact file.
+ package_registries_info: list[PackageRegistryInfo]
+ The list of package registry information.
+
+ Returns
+ -------
+ str | None
+ The hash of the artifact, or None if no artifact can be found locally or remotely.
+ """
+ if local_artifact_dirs:
+ # Try to get the hash from a local file.
+ artifact_hash = get_local_artifact_hash(purl, local_artifact_dirs)
+
+ if artifact_hash:
+ return artifact_hash
+
+ # Download the artifact.
+ if purl.type == "maven":
+ maven_registry = next(
+ (
+ package_registry
+ for package_registry in PACKAGE_REGISTRIES
+ if isinstance(package_registry, MavenCentralRegistry)
+ ),
+ None,
+ )
+ if not maven_registry:
+ return None
+
+ return maven_registry.get_artifact_hash(purl)
+
+ if purl.type == "pypi":
+ pypi_registry = next(
+ (package_registry for package_registry in PACKAGE_REGISTRIES if isinstance(package_registry, PyPIRegistry)),
+ None,
+ )
+ if not pypi_registry:
+ logger.debug("Missing registry for PyPI")
+ return None
+
+ registry_info = next(
+ (
+ info
+ for info in package_registries_info
+ if info.package_registry == pypi_registry and info.build_tool_name in {"pip", "poetry"}
+ ),
+ None,
+ )
+ if not registry_info:
+ logger.debug("Missing registry information for PyPI")
+ return None
+
+ if not purl.version:
+ return None
+
+ pypi_asset = find_or_create_pypi_asset(purl.name, purl.version, registry_info)
+ if not pypi_asset:
+ return None
+
+ pypi_asset.has_repository = True
+ if not pypi_asset.download(""):
+ return None
+
+ artifact_hash = pypi_asset.get_sha256()
+ if artifact_hash:
+ return artifact_hash
+
+ source_url = pypi_asset.get_sourcecode_url("bdist_wheel")
+ if not source_url:
+ return None
+
+ return pypi_registry.get_artifact_hash(source_url)
+
+ logger.debug("Purl type '%s' not yet supported for artifact hashing.", purl.type)
+ return None
diff --git a/src/macaron/provenance/provenance_verifier.py b/src/macaron/provenance/provenance_verifier.py
index f366fe127..87ac81595 100644
--- a/src/macaron/provenance/provenance_verifier.py
+++ b/src/macaron/provenance/provenance_verifier.py
@@ -250,7 +250,6 @@ def _find_subject_asset(
item_path = os.path.join(download_path, item["name"])
# Make sure to download an archive just once.
if not Path(item_path).is_file():
- # TODO: check that it's not too large.
if not ci_service.api_client.download_asset(item["url"], item_path):
logger.info("Could not download artifact %s. Skip verifying...", os.path.basename(item_path))
break
diff --git a/src/macaron/slsa_analyzer/analyzer.py b/src/macaron/slsa_analyzer/analyzer.py
index 8c0faaad8..98d4b9bfa 100644
--- a/src/macaron/slsa_analyzer/analyzer.py
+++ b/src/macaron/slsa_analyzer/analyzer.py
@@ -22,7 +22,6 @@
from macaron import __version__
from macaron.artifact.local_artifact import (
get_local_artifact_dirs,
- get_local_artifact_hash,
)
from macaron.config.global_config import global_config
from macaron.config.target_config import Configuration
@@ -44,7 +43,6 @@
ProvenanceError,
PURLNotFoundError,
)
-from macaron.json_tools import json_extract
from macaron.output_reporter.reporter import FileReporter
from macaron.output_reporter.results import Record, Report, SCMStatus
from macaron.provenance import provenance_verifier
@@ -54,7 +52,7 @@
extract_predicate_version,
extract_repo_and_commit_from_provenance,
)
-from macaron.provenance.provenance_finder import ProvenanceFinder, find_provenance_from_ci
+from macaron.provenance.provenance_finder import ProvenanceFinder, find_provenance_from_ci, get_artifact_hash
from macaron.provenance.provenance_verifier import determine_provenance_slsa_level, verify_ci_provenance
from macaron.repo_finder import repo_finder
from macaron.repo_finder.repo_finder import prepare_repo
@@ -73,16 +71,12 @@
from macaron.slsa_analyzer.git_service import GIT_SERVICES, BaseGitService, GitHub
from macaron.slsa_analyzer.git_service.base_git_service import NoneGitService
from macaron.slsa_analyzer.git_url import GIT_REPOS_DIR
-from macaron.slsa_analyzer.package_registry import PACKAGE_REGISTRIES, MavenCentralRegistry, PyPIRegistry
-from macaron.slsa_analyzer.package_registry.pypi_registry import find_or_create_pypi_asset
+from macaron.slsa_analyzer.package_registry import PACKAGE_REGISTRIES
from macaron.slsa_analyzer.provenance.expectations.expectation_registry import ExpectationRegistry
from macaron.slsa_analyzer.provenance.intoto import (
InTotoPayload,
InTotoV01Payload,
- ValidateInTotoPayloadError,
- validate_intoto_payload,
)
-from macaron.slsa_analyzer.provenance.loader import decode_provenance
from macaron.slsa_analyzer.provenance.slsa import SLSAProvenanceData
from macaron.slsa_analyzer.registry import registry
from macaron.slsa_analyzer.specs.ci_spec import CIInfo
@@ -147,7 +141,6 @@ def run(
sbom_path: str = "",
deps_depth: int = 0,
provenance_payload: InTotoPayload | None = None,
- verify_provenance: bool = False,
force_analyze_source: bool = False,
) -> int:
"""Run the analysis and write results to the output path.
@@ -165,8 +158,6 @@ def run(
The depth of dependency resolution. Default: 0.
provenance_payload : InToToPayload | None
The provenance intoto payload for the main software component.
- verify_provenance: bool
- Enable provenance verification if True.
force_analyze_source : bool
When true, enforces running source code analysis regardless of other heuristic results. Defaults to False.
@@ -201,7 +192,6 @@ def run(
main_config,
analysis,
provenance_payload=provenance_payload,
- verify_provenance=verify_provenance,
force_analyze_source=force_analyze_source,
)
@@ -320,7 +310,6 @@ def run_single(
analysis: Analysis,
existing_records: dict[str, Record] | None = None,
provenance_payload: InTotoPayload | None = None,
- verify_provenance: bool = False,
force_analyze_source: bool = False,
) -> Record:
"""Run the checks for a single repository target.
@@ -338,8 +327,6 @@ def run_single(
The mapping of existing records that the analysis has run successfully.
provenance_payload : InToToPayload | None
The provenance intoto payload for the analyzed software component.
- verify_provenance: bool
- Enable provenance verification if True.
force_analyze_source : bool
When true, enforces running source code analysis regardless of other heuristic results. Defaults to False.
@@ -378,7 +365,7 @@ def run_single(
provenance_payload = provenance_asset.payload
if provenance_payload.verified:
provenance_is_verified = True
- if verify_provenance:
+ else:
provenance_is_verified = provenance_verifier.verify_provenance(parsed_purl, provenances)
# Try to extract the repository URL and commit digest from the Provenance, if it exists.
@@ -510,9 +497,20 @@ def run_single(
# Try to find an attestation from GitHub, if applicable.
if parsed_purl and not provenance_payload and analysis_target.repo_path and isinstance(git_service, GitHub):
# Try to discover GitHub attestation for the target software component.
- artifact_hash = self.get_artifact_hash(parsed_purl, local_artifact_dirs, package_registries_info)
+ artifact_hash = get_artifact_hash(parsed_purl, local_artifact_dirs, package_registries_info)
if artifact_hash:
- provenance_payload = self.get_github_attestation_payload(analyze_ctx, git_service, artifact_hash)
+ provenance_payload = git_service.get_attestation_payload(
+ analyze_ctx.component.repository.full_name, artifact_hash
+ )
+ if provenance_payload:
+ try:
+ provenance_repo_url, provenance_commit_digest = extract_repo_and_commit_from_provenance(
+ provenance_payload
+ )
+ # Attestations found from GitHub are signed and verified.
+ provenance_is_verified = True
+ except ProvenanceError as error:
+ logger.debug("Failed to extract from provenance: %s", error)
if parsed_purl is not None:
self._verify_repository_link(parsed_purl, analyze_ctx)
@@ -546,14 +544,13 @@ def run_single(
)
# Also try to verify CI provenance contents.
- if verify_provenance:
- verified = []
- for ci_info in analyze_ctx.dynamic_data["ci_services"]:
- verified.append(verify_ci_provenance(analyze_ctx, ci_info, temp_dir))
- if not verified:
- break
- if verified and all(verified):
- provenance_l3_verified = True
+ verified = []
+ for ci_info in analyze_ctx.dynamic_data["ci_services"]:
+ verified.append(verify_ci_provenance(analyze_ctx, ci_info, temp_dir))
+ if not verified[-1]:
+ break
+ if verified and all(verified):
+ provenance_l3_verified = True
if provenance_payload:
analyze_ctx.dynamic_data["is_inferred_prov"] = False
@@ -969,144 +966,6 @@ def create_analyze_ctx(self, component: Component) -> AnalyzeContext:
return analyze_ctx
- def get_artifact_hash(
- self,
- purl: PackageURL,
- local_artifact_dirs: list[str] | None,
- package_registries_info: list[PackageRegistryInfo],
- ) -> str | None:
- """Get the hash of the artifact found from the passed PURL using local or remote files.
-
- Provided local caches will be searched first. Artifacts will be downloaded if nothing is found within local
- caches, or if no appropriate cache is provided for the target language.
- Downloaded artifacts will be added to the passed package registry to prevent downloading them again.
-
- Parameters
- ----------
- purl: PackageURL
- The PURL of the artifact.
- local_artifact_dirs: list[str] | None
- The list of directories that may contain the artifact file.
- package_registries_info: list[PackageRegistryInfo]
- The list of package registry information.
-
- Returns
- -------
- str | None
- The hash of the artifact, or None if no artifact can be found locally or remotely.
- """
- if local_artifact_dirs:
- # Try to get the hash from a local file.
- artifact_hash = get_local_artifact_hash(purl, local_artifact_dirs)
-
- if artifact_hash:
- return artifact_hash
-
- # Download the artifact.
- if purl.type == "maven":
- maven_registry = next(
- (
- package_registry
- for package_registry in PACKAGE_REGISTRIES
- if isinstance(package_registry, MavenCentralRegistry)
- ),
- None,
- )
- if not maven_registry:
- return None
-
- return maven_registry.get_artifact_hash(purl)
-
- if purl.type == "pypi":
- pypi_registry = next(
- (
- package_registry
- for package_registry in PACKAGE_REGISTRIES
- if isinstance(package_registry, PyPIRegistry)
- ),
- None,
- )
- if not pypi_registry:
- logger.debug("Missing registry for PyPI")
- return None
-
- registry_info = next(
- (
- info
- for info in package_registries_info
- if info.package_registry == pypi_registry and info.build_tool_name in {"pip", "poetry"}
- ),
- None,
- )
- if not registry_info:
- logger.debug("Missing registry information for PyPI")
- return None
-
- if not purl.version:
- return None
-
- pypi_asset = find_or_create_pypi_asset(purl.name, purl.version, registry_info)
- if not pypi_asset:
- return None
-
- pypi_asset.has_repository = True
- if not pypi_asset.download(""):
- return None
-
- artifact_hash = pypi_asset.get_sha256()
- if artifact_hash:
- return artifact_hash
-
- source_url = pypi_asset.get_sourcecode_url("bdist_wheel")
- if not source_url:
- return None
-
- return pypi_registry.get_artifact_hash(source_url)
-
- logger.debug("Purl type '%s' not yet supported for GitHub attestation discovery.", purl.type)
- return None
-
- def get_github_attestation_payload(
- self, analyze_ctx: AnalyzeContext, git_service: GitHub, artifact_hash: str
- ) -> InTotoPayload | None:
- """Get the GitHub attestation associated with the given PURL, or None if it cannot be found.
-
- The schema of GitHub attestation can be found on the API page:
- https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-attestations
-
- Parameters
- ----------
- analyze_ctx: AnalyzeContext
- The analysis context.
- git_service: GitHub
- The Git service to retrieve the attestation from.
- artifact_hash: str
- The hash of the related artifact.
-
- Returns
- -------
- InTotoPayload | None
- The attestation payload, if found.
- """
- git_attestation_dict = git_service.api_client.get_attestation(
- analyze_ctx.component.repository.full_name, artifact_hash
- )
-
- if not git_attestation_dict:
- return None
-
- git_attestation_list = json_extract(git_attestation_dict, ["attestations"], list)
- if not git_attestation_list:
- return None
-
- payload = decode_provenance(git_attestation_list[0])
-
- try:
- return validate_intoto_payload(payload)
- except ValidateInTotoPayloadError as error:
- logger.debug("Invalid attestation payload: %s", error)
- return None
-
def _determine_git_service(self, analyze_ctx: AnalyzeContext) -> BaseGitService:
"""Determine the Git service used by the software component."""
remote_path = analyze_ctx.component.repository.remote_path if analyze_ctx.component.repository else None
diff --git a/src/macaron/slsa_analyzer/git_service/api_client.py b/src/macaron/slsa_analyzer/git_service/api_client.py
index 681a1f4e0..05ca0858d 100644
--- a/src/macaron/slsa_analyzer/git_service/api_client.py
+++ b/src/macaron/slsa_analyzer/git_service/api_client.py
@@ -12,7 +12,12 @@
from macaron.config.defaults import defaults
from macaron.slsa_analyzer.asset import AssetLocator
-from macaron.util import construct_query, download_github_build_log, send_get_http, send_get_http_raw
+from macaron.util import (
+ construct_query,
+ download_file_with_size_limit,
+ download_github_build_log,
+ send_get_http,
+)
logger: logging.Logger = logging.getLogger(__name__)
@@ -637,27 +642,11 @@ def download_asset(self, url: str, download_path: str) -> bool:
"""
logger.debug("Download assets from %s at %s.", url, download_path)
- # Allow downloading binaries.
- response = send_get_http_raw(
- url,
- {
- "Accept": "application/octet-stream",
- "Authorization": self.headers["Authorization"],
- },
- timeout=defaults.getint("downloads", "timeout", fallback=120),
- )
- if not response:
- logger.error("Could not download the asset.")
- return False
-
- try:
- with open(download_path, "wb") as asset_file:
- asset_file.write(response.content)
- except OSError as error:
- logger.error(error)
- return False
-
- return True
+ timeout = defaults.getint("downloads", "timeout", fallback=120)
+ size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+ headers = {"Accept": "application/octet-stream", "Authorization": self.headers["Authorization"]}
+
+ return download_file_with_size_limit(url, headers, download_path, timeout, size_limit)
def get_attestation(self, full_name: str, artifact_hash: str) -> dict:
"""Download and return the attestation associated with the passed artifact hash, if any.
diff --git a/src/macaron/slsa_analyzer/git_service/github.py b/src/macaron/slsa_analyzer/git_service/github.py
index 2a049e755..ff7ecc593 100644
--- a/src/macaron/slsa_analyzer/git_service/github.py
+++ b/src/macaron/slsa_analyzer/git_service/github.py
@@ -1,15 +1,21 @@
-# Copyright (c) 2022 - 2023, Oracle and/or its affiliates. All rights reserved.
+# Copyright (c) 2022 - 2025, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
"""This module contains the spec for the GitHub service."""
+import logging
from pydriller.git import Git
from macaron.config.global_config import global_config
from macaron.errors import ConfigurationError, RepoCheckOutError
+from macaron.json_tools import json_extract
from macaron.slsa_analyzer import git_url
from macaron.slsa_analyzer.git_service.api_client import GhAPIClient, get_default_gh_client
from macaron.slsa_analyzer.git_service.base_git_service import BaseGitService
+from macaron.slsa_analyzer.provenance.intoto import InTotoPayload, ValidateInTotoPayloadError, validate_intoto_payload
+from macaron.slsa_analyzer.provenance.loader import decode_provenance
+
+logger: logging.Logger = logging.getLogger(__name__)
class GitHub(BaseGitService):
@@ -89,3 +95,38 @@ def check_out_repo(self, git_obj: Git, branch: str, digest: str, offline_mode: b
)
return git_obj
+
+ def get_attestation_payload(self, repository_name: str, artifact_hash: str) -> InTotoPayload | None:
+ """Get the GitHub attestation associated with the given PURL, or None if it cannot be found.
+
+ The schema of GitHub attestation can be found on the API page:
+ https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-attestations
+
+ Parameters
+ ----------
+ repository_name: str
+ The name of the repository to retrieve attestation from.
+ artifact_hash: str
+ The hash of the related artifact.
+
+ Returns
+ -------
+ InTotoPayload | None
+ The attestation payload, if found.
+ """
+ git_attestation_dict = self.api_client.get_attestation(repository_name, artifact_hash)
+
+ if not git_attestation_dict:
+ return None
+
+ git_attestation_list = json_extract(git_attestation_dict, ["attestations"], list)
+ if not git_attestation_list:
+ return None
+
+ payload = decode_provenance(git_attestation_list[0])
+
+ try:
+ return validate_intoto_payload(payload)
+ except ValidateInTotoPayloadError as error:
+ logger.debug("Invalid attestation payload: %s", error)
+ return None
diff --git a/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py b/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
index 2fe3c5cea..5efe4ef18 100644
--- a/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
+++ b/src/macaron/slsa_analyzer/package_registry/maven_central_registry.py
@@ -9,13 +9,12 @@
import requests
from packageurl import PackageURL
-from requests import RequestException
from macaron.artifact.maven import construct_maven_repository_path, construct_primary_jar_file_name
from macaron.config.defaults import defaults
from macaron.errors import ConfigurationError, InvalidHTTPResponseError
from macaron.slsa_analyzer.package_registry.package_registry import PackageRegistry
-from macaron.util import send_get_http_raw
+from macaron.util import send_get_http_raw, stream_file_with_size_limit
logger: logging.Logger = logging.getLogger(__name__)
@@ -285,35 +284,20 @@ def get_artifact_hash(self, purl: PackageURL) -> str | None:
# As Maven hashes are user provided and not verified they serve as a reference only.
logger.debug("Found hash of artifact: %s", retrieved_artifact_hash)
- try:
- response = requests.get(artifact_url, stream=True, timeout=40)
- response.raise_for_status()
- except requests.exceptions.HTTPError as http_err:
- logger.debug("HTTP error occurred when trying to download artifact: %s", http_err)
- return None
-
- if response.status_code != 200:
- return None
-
- # Download file and compute hash as chunks are received.
hash_algorithm = hashlib.sha256()
- try:
- for chunk in response.iter_content():
- hash_algorithm.update(chunk)
- except RequestException as error:
- # Something went wrong with the request, abort.
- logger.debug("Error while streaming target file: %s", error)
- response.close()
+ timeout = defaults.getint("downloads", "timeout", fallback=120)
+ size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+ if not stream_file_with_size_limit(artifact_url, {}, hash_algorithm.update, timeout, size_limit):
return None
computed_artifact_hash: str = hash_algorithm.hexdigest()
+ logger.debug("Computed hash of artifact: %s", computed_artifact_hash)
if retrieved_artifact_hash and computed_artifact_hash != retrieved_artifact_hash:
logger.debug(
- "Artifact hash and discovered hash do not match: %s != %s",
+ "Computed artifact hash and discovered hash do not match: %s != %s",
computed_artifact_hash,
retrieved_artifact_hash,
)
return None
- logger.debug("Computed hash of artifact: %s", computed_artifact_hash)
return computed_artifact_hash
diff --git a/src/macaron/slsa_analyzer/package_registry/pypi_registry.py b/src/macaron/slsa_analyzer/package_registry/pypi_registry.py
index f0cfcfbc3..44480704e 100644
--- a/src/macaron/slsa_analyzer/package_registry/pypi_registry.py
+++ b/src/macaron/slsa_analyzer/package_registry/pypi_registry.py
@@ -20,14 +20,13 @@
import requests
from bs4 import BeautifulSoup, Tag
-from requests import RequestException
from macaron.config.defaults import defaults
from macaron.errors import ConfigurationError, InvalidHTTPResponseError, SourceCodeError
from macaron.json_tools import json_extract
from macaron.malware_analyzer.datetime_parser import parse_datetime
from macaron.slsa_analyzer.package_registry.package_registry import PackageRegistry
-from macaron.util import send_get_http_raw
+from macaron.util import download_file_with_size_limit, send_get_http_raw, stream_file_with_size_limit
if TYPE_CHECKING:
from macaron.slsa_analyzer.specs.package_registry_spec import PackageRegistryInfo
@@ -172,6 +171,44 @@ def download_package_json(self, url: str) -> dict:
return res_obj
+ @staticmethod
+ def cleanup_sourcecode_directory(
+ directory: str, error_message: str | None = None, error: Exception | None = None
+ ) -> None:
+ """Remove the target directory, and report the passed error if present.
+
+ Parameters
+ ----------
+ directory: str
+ The directory to remove.
+ error_message: str | None
+ The error message to report.
+ error: Exception | None
+ The underlying error to chain the raised exception from.
+
+ Raises
+ ------
+ InvalidHTTPResponseError
+ If there was an error during the operation.
+ """
+ if error_message:
+ logger.debug(error_message)
+ try:
+ shutil.rmtree(directory, onerror=_handle_temp_dir_clean)
+ except SourceCodeError as tempdir_exception:
+ tempdir_exception_msg = (
+ f"Unable to cleanup temporary directory {directory} for source code: {tempdir_exception}"
+ )
+ logger.debug(tempdir_exception_msg)
+ raise InvalidHTTPResponseError(error_message) from tempdir_exception
+
+ if not error_message:
+ return
+
+ if error:
+ raise InvalidHTTPResponseError(error_message) from error
+ raise InvalidHTTPResponseError(error_message)
+
def download_package_sourcecode(self, url: str) -> str:
"""Download the package source code from pypi registry.
@@ -194,77 +231,29 @@ def download_package_sourcecode(self, url: str) -> str:
_, _, file_name = url.rpartition("/")
package_name = re.sub(r"\.tar\.gz$", "", file_name)
- # temporary directory to unzip and read all source files
+ # Temporary directory to unzip and read all source files.
temp_dir = tempfile.mkdtemp(prefix=f"{package_name}_")
- response = send_get_http_raw(url, stream=True)
- if response is None:
- error_msg = f"Unable to find package source code using URL: {url}"
- logger.debug(error_msg)
- try:
- shutil.rmtree(temp_dir, onerror=_handle_temp_dir_clean)
- except SourceCodeError as tempdir_exception:
- tempdir_exception_msg = (
- f"Unable to cleanup temporary directory {temp_dir} for source code: {tempdir_exception}"
- )
- logger.debug(tempdir_exception_msg)
- raise InvalidHTTPResponseError(error_msg) from tempdir_exception
+ source_file = os.path.join(temp_dir, file_name)
+ timeout = defaults.getint("downloads", "timeout", fallback=120)
+ size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+ if not download_file_with_size_limit(url, {}, source_file, timeout, size_limit):
+ self.cleanup_sourcecode_directory(temp_dir, "Could not download the file.")
- raise InvalidHTTPResponseError(error_msg)
+ if not tarfile.is_tarfile(source_file):
+ self.cleanup_sourcecode_directory(temp_dir, f"Unable to extract source code from file {file_name}")
- with tempfile.NamedTemporaryFile("+wb", delete=True) as source_file:
- try:
- for chunk in response.iter_content():
- source_file.write(chunk)
- source_file.flush()
- except RequestException as stream_error:
- error_msg = f"Error while streaming source file: {stream_error}"
- logger.debug(error_msg)
- try:
- shutil.rmtree(temp_dir, onerror=_handle_temp_dir_clean)
- except SourceCodeError as tempdir_exception:
- tempdir_exception_msg = (
- f"Unable to cleanup temporary directory {temp_dir} for source code: {tempdir_exception}"
- )
- logger.debug(tempdir_exception_msg)
-
- raise InvalidHTTPResponseError(error_msg) from RequestException
+ try:
+ with tarfile.open(source_file, "r:gz") as sourcecode_tar:
+ sourcecode_tar.extractall(temp_dir, filter="data")
+ except tarfile.ReadError as read_error:
+ self.cleanup_sourcecode_directory(temp_dir, f"Error reading source code tar file: {read_error}", read_error)
- if tarfile.is_tarfile(source_file.name):
- try:
- with tarfile.open(source_file.name, "r:gz") as sourcecode_tar:
- sourcecode_tar.extractall(temp_dir, filter="data")
-
- except tarfile.ReadError as read_error:
- error_msg = f"Error reading source code tar file: {read_error}"
- logger.debug(error_msg)
- try:
- shutil.rmtree(temp_dir, onerror=_handle_temp_dir_clean)
- except SourceCodeError as tempdir_exception:
- tempdir_exception_msg = (
- f"Unable to cleanup temporary directory {temp_dir} for source code: {tempdir_exception}"
- )
- logger.debug(tempdir_exception_msg)
-
- raise InvalidHTTPResponseError(error_msg) from read_error
-
- extracted_dir = os.listdir(temp_dir)
- if len(extracted_dir) == 1 and package_name == extracted_dir[0]:
- # structure used package name and version as top-level directory
- temp_dir = os.path.join(temp_dir, extracted_dir[0])
-
- else:
- error_msg = f"Unable to extract source code from file {file_name}"
- logger.debug(error_msg)
- try:
- shutil.rmtree(temp_dir, onerror=_handle_temp_dir_clean)
- except SourceCodeError as tempdir_exception:
- tempdir_exception_msg = (
- f"Unable to cleanup temporary directory {temp_dir} for source code: {tempdir_exception}"
- )
- logger.debug(tempdir_exception_msg)
- raise InvalidHTTPResponseError(error_msg) from tempdir_exception
+ os.remove(source_file)
- raise InvalidHTTPResponseError(error_msg)
+ extracted_dir = os.listdir(temp_dir)
+ if len(extracted_dir) == 1 and extracted_dir[0] == package_name:
+ # The archive uses the package name and version as its top-level directory.
+ temp_dir = os.path.join(temp_dir, extracted_dir[0])
logger.debug("Temporary download and unzip of %s stored in %s", file_name, temp_dir)
return temp_dir
@@ -282,25 +271,10 @@ def get_artifact_hash(self, artifact_url: str) -> str | None:
str | None
The hash of the artifact, or None if not found.
"""
- try:
- response = requests.get(artifact_url, stream=True, timeout=40)
- response.raise_for_status()
- except requests.exceptions.HTTPError as http_err:
- logger.debug("HTTP error occurred when trying to download artifact: %s", http_err)
- return None
-
- if response.status_code != 200:
- logger.debug("Invalid response: %s", response.status_code)
- return None
-
hash_algorithm = hashlib.sha256()
- try:
- for chunk in response.iter_content():
- hash_algorithm.update(chunk)
- except RequestException as error:
- # Something went wrong with the request, abort.
- logger.debug("Error while streaming source file: %s", error)
- response.close()
+ timeout = defaults.getint("downloads", "timeout", fallback=120)
+ size_limit = defaults.getint("slsa.verifier", "max_download_size", fallback=10000000)
+ if not stream_file_with_size_limit(artifact_url, {}, hash_algorithm.update, timeout, size_limit):
return None
artifact_hash: str = hash_algorithm.hexdigest()
@@ -609,7 +583,8 @@ def sourcecode(self) -> Generator[None]:
if not self.download_sourcecode():
raise SourceCodeError("Unable to download package source code.")
yield
- self.cleanup_sourcecode()
+ if self.package_sourcecode_path:
+ PyPIRegistry.cleanup_sourcecode_directory(self.package_sourcecode_path)
def download_sourcecode(self) -> bool:
"""Get the source code of the package and store it in a temporary directory.
@@ -628,25 +603,6 @@ def download_sourcecode(self) -> bool:
logger.debug(error)
return False
- def cleanup_sourcecode(self) -> None:
- """
- Delete the temporary directory created when downloading the source code.
-
- The package source code is no longer accessible after this, and the package_sourcecode_path
- attribute is set to an empty string.
- """
- if self.package_sourcecode_path:
- try:
- shutil.rmtree(self.package_sourcecode_path, onerror=_handle_temp_dir_clean)
- self.package_sourcecode_path = ""
- except SourceCodeError as tempdir_exception:
- tempdir_exception_msg = (
- f"Unable to cleanup temporary directory {self.package_sourcecode_path}"
- f" for source code: {tempdir_exception}"
- )
- logger.debug(tempdir_exception_msg)
- raise tempdir_exception
-
def get_sourcecode_file_contents(self, path: str) -> bytes:
"""
Get the contents of a single source code file specified by the path.
@@ -696,7 +652,7 @@ def iter_sourcecode(self) -> Iterator[tuple[str, bytes]]:
Returns
-------
tuple[str, bytes]
- The source code file path, and the the raw contents of the source code file.
+ The source code file path, and the raw contents of the source code file.
Raises
------
diff --git a/src/macaron/slsa_analyzer/provenance/loader.py b/src/macaron/slsa_analyzer/provenance/loader.py
index 106cc03b5..3e9d9b1b0 100644
--- a/src/macaron/slsa_analyzer/provenance/loader.py
+++ b/src/macaron/slsa_analyzer/provenance/loader.py
@@ -123,6 +123,9 @@ def decode_provenance(provenance: dict) -> dict[str, JsonType]:
# GitHub Attestation.
# TODO Check if old method (above) actually works.
provenance_payload = json_extract(provenance, ["bundle", "dsseEnvelope", "payload"], str)
+ if not provenance_payload and "attestations" in provenance:
+ # Check for multiple attestations and return the first instance.
+ provenance_payload = json_extract(provenance, ["attestations", 0, "bundle", "dsseEnvelope", "payload"], str)
if not provenance_payload:
raise LoadIntotoAttestationError(
'Cannot find the "payload" field in the decoded provenance.',
diff --git a/src/macaron/util.py b/src/macaron/util.py
index 96af86991..0d01dfc43 100644
--- a/src/macaron/util.py
+++ b/src/macaron/util.py
@@ -8,7 +8,9 @@
import shutil
import time
import urllib.parse
+from collections.abc import Callable
from datetime import datetime
+from typing import BinaryIO
import requests
from requests.models import Response
@@ -272,6 +274,105 @@ def send_post_http_raw(
return response
+class StreamWriteDownloader:
+ """A class to handle writing a streamed download to a file."""
+
+ def __init__(self, file: BinaryIO) -> None:
+ """Initialise the class with the target file object."""
+ self.file = file
+
+ def chunk_function(self, chunk: bytes) -> None:
+ """Write the chunk to the file."""
+ self.file.write(chunk)
+
+
+def download_file_with_size_limit(
+ url: str, headers: dict, file_path: str, timeout: int = 40, size_limit: int = 0
+) -> bool:
+ """Download a file with a size limit that will abort the operation if exceeded.
+
+ Parameters
+ ----------
+ url: str
+ The target of the request.
+ headers: dict
+ The headers to use in the request.
+ file_path: str
+ The path to download the file to.
+ timeout: int
+ The timeout in seconds for the request.
+ size_limit: int
+ The size limit in bytes of the downloaded file.
+ A download will terminate if it reaches beyond this amount.
+
+ Returns
+ -------
+ bool
+ True if the operation succeeded, False otherwise.
+ """
+ try:
+ with open(file_path, "wb") as file:
+ downloader = StreamWriteDownloader(file)
+ return stream_file_with_size_limit(url, headers, downloader.chunk_function, timeout, size_limit)
+ except OSError as error:
+ logger.error(error)
+ return False
+
+
+def stream_file_with_size_limit(
+ url: str, headers: dict, chunk_function: Callable[[bytes], None], timeout: int = 40, size_limit: int = 0
+) -> bool:
+ """Stream a file download and perform the passed function on the chunks of its data.
+
+ If data in excess of the size limit is received, this operation will be aborted.
+
+ Parameters
+ ----------
+ url: str
+ The target of the request.
+ headers: dict
+ The headers to use in the request.
+ chunk_function: Callable[[bytes], None]
+ The function to use with each downloaded chunk.
+ timeout: int
+ The timeout in seconds for the request.
+ size_limit: int
+ The size limit in bytes of the downloaded file.
+ A download will terminate if it reaches beyond this amount.
+ The default value of zero disables the limit.
+
+ Returns
+ -------
+ bool
+ True if the operation succeeded, False otherwise.
+ """
+ try:
+ response = requests.get(url, headers=headers, stream=True, timeout=timeout)
+ response.raise_for_status()
+ except requests.exceptions.HTTPError as http_err:
+ logger.debug("HTTP error occurred when trying to stream source: %s", http_err)
+ return False
+
+ if response.status_code != 200:
+ return False
+
+ data_processed = 0
+ for chunk in response.iter_content(chunk_size=512):
+ if data_processed >= size_limit > 0:
+ response.close()
+ logger.warning(
+ "Aborted the download of file '%s' because it exceeded the configured size limit. "
+ "Increase the size limit and try again to download this file.",
+ url,
+ )
+ return False
+
+ chunk_function(chunk)
+ data_processed += len(chunk)
+
+ return True
+
+
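The size check in `stream_file_with_size_limit` can be exercised without a network by lifting the chunk loop into its own function. This is an illustrative sketch, not part of the diff; `consume_with_size_limit` is a made-up name:

```python
import hashlib
from collections.abc import Callable, Iterable

def consume_with_size_limit(
    chunks: Iterable[bytes], chunk_function: Callable[[bytes], None], size_limit: int = 0
) -> bool:
    """Feed chunks to chunk_function, aborting once size_limit bytes are processed.

    A size_limit of zero disables the check, as in stream_file_with_size_limit.
    """
    data_processed = 0
    for chunk in chunks:
        # Same chained comparison as the diff: only a positive limit is enforced.
        if data_processed >= size_limit > 0:
            return False
        chunk_function(chunk)
        data_processed += len(chunk)
    return True

# Usage: hash a simulated stream, as get_artifact_hash does with real chunks.
digest = hashlib.sha256()
ok = consume_with_size_limit([b"a" * 512, b"b" * 512], digest.update, size_limit=4096)
```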
def check_rate_limit(response: Response) -> None:
"""Check the remaining calls limit to GitHub API and wait accordingly.
diff --git a/tests/integration/cases/django_with_dep_resolution_virtual_env_as_input/config.ini b/tests/integration/cases/django_with_dep_resolution_virtual_env_as_input/config.ini
new file mode 100644
index 000000000..5162fb14e
--- /dev/null
+++ b/tests/integration/cases/django_with_dep_resolution_virtual_env_as_input/config.ini
@@ -0,0 +1,5 @@
+# Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
+
+[slsa.verifier]
+max_download_size = 15000000
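The override above raises `max_download_size` past the 10 MB fallback used by the `defaults.getint` calls in the diff. A sketch of how such an ini override resolves, using plain `configparser` in place of Macaron's `defaults` object (an assumption for illustration):

```python
import configparser

# Mirror of the [slsa.verifier] override above; the [downloads] section is
# deliberately absent to show the fallback path.
config = configparser.ConfigParser()
config.read_string("""
[slsa.verifier]
max_download_size = 15000000
""")

size_limit = config.getint("slsa.verifier", "max_download_size", fallback=10000000)
timeout = config.getint("downloads", "timeout", fallback=120)  # missing section -> fallback
```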
diff --git a/tests/integration/cases/django_with_dep_resolution_virtual_env_as_input/test.yaml b/tests/integration/cases/django_with_dep_resolution_virtual_env_as_input/test.yaml
index 52a821ff4..b871fd880 100644
--- a/tests/integration/cases/django_with_dep_resolution_virtual_env_as_input/test.yaml
+++ b/tests/integration/cases/django_with_dep_resolution_virtual_env_as_input/test.yaml
@@ -25,6 +25,7 @@ steps:
- name: Run macaron analyze
kind: analyze
options:
+ ini: config.ini
command_args:
- -purl
- pkg:pypi/django@5.0.6
@@ -48,6 +49,7 @@ steps:
- name: Run macaron analyze on deps recursively
kind: analyze
options:
+ ini: config.ini
command_args:
- -purl
- pkg:pypi/django@5.0.6
@@ -74,6 +76,7 @@ steps:
- name: Run macaron analyze with forced sourcecode analysis
kind: analyze
options:
+ ini: config.ini
command_args:
- -purl
- pkg:pypi/django@5.0.6
diff --git a/tests/integration/cases/micronaut-projects_micronaut-test/test.yaml b/tests/integration/cases/micronaut-projects_micronaut-test/test.yaml
index 004958361..c7cda9fc2 100644
--- a/tests/integration/cases/micronaut-projects_micronaut-test/test.yaml
+++ b/tests/integration/cases/micronaut-projects_micronaut-test/test.yaml
@@ -15,7 +15,6 @@ steps:
command_args:
- -purl
- pkg:maven/io.micronaut.test/micronaut-test-junit5@4.5.0
- - --verify-provenance
- name: Validate JSON report schema
kind: validate_schema
options:
diff --git a/tests/integration/cases/ossf_scorecard/config.ini b/tests/integration/cases/ossf_scorecard/config.ini
index ec1d945d4..7db45173b 100644
--- a/tests/integration/cases/ossf_scorecard/config.ini
+++ b/tests/integration/cases/ossf_scorecard/config.ini
@@ -7,3 +7,6 @@ include =
mcn_provenance_expectation_1
mcn_provenance_verified_1
mcn_trusted_builder_level_three_1
+
+[slsa.verifier]
+max_download_size = 15000000
diff --git a/tests/integration/cases/ossf_scorecard/test.yaml b/tests/integration/cases/ossf_scorecard/test.yaml
index c3d64f980..653140505 100644
--- a/tests/integration/cases/ossf_scorecard/test.yaml
+++ b/tests/integration/cases/ossf_scorecard/test.yaml
@@ -17,7 +17,6 @@ steps:
command_args:
- --package-url
- pkg:github/ossf/scorecard@v4.13.1
- - --verify-provenance
- name: Run macaron verify-policy to verify passed/failed checks
kind: verify
options:
diff --git a/tests/integration/cases/semver/test.yaml b/tests/integration/cases/semver/test.yaml
index fa6a1b174..bd7fa464e 100644
--- a/tests/integration/cases/semver/test.yaml
+++ b/tests/integration/cases/semver/test.yaml
@@ -17,7 +17,6 @@ steps:
command_args:
- -purl
- pkg:npm/semver@7.6.2
- - --verify-provenance
- name: Run macaron verify-policy to verify passed/failed checks
kind: verify
options:
diff --git a/tests/integration/cases/sigstore_mock/test.yaml b/tests/integration/cases/sigstore_mock/test.yaml
index bd635febe..669224504 100644
--- a/tests/integration/cases/sigstore_mock/test.yaml
+++ b/tests/integration/cases/sigstore_mock/test.yaml
@@ -21,7 +21,6 @@ steps:
- main
- -d
- ebdcfdfbdfeb9c9aeee6df53674ef230613629f5
- - --verify-provenance
- name: Run macaron verify-policy to verify passed/failed checks
kind: verify
options:
diff --git a/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_6_2.dl b/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_7_2.dl
similarity index 80%
rename from tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_6_2.dl
rename to tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_7_2.dl
index 60c4a4df2..5279c569e 100644
--- a/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_6_2.dl
+++ b/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_7_2.dl
@@ -1,4 +1,4 @@
-/* Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved. */
+/* Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved. */
/* Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. */
#include "prelude.dl"
@@ -9,4 +9,4 @@ Policy("has-verified-provenance", component_id, "Require a verified provenance f
check_passed(component_id, "mcn_provenance_verified_1").
apply_policy_to("has-verified-provenance", component_id) :-
- is_component(component_id, "pkg:npm/semver@7.6.2").
+ is_component(component_id, "pkg:npm/semver@7.7.2").
diff --git a/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_6_x.dl b/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_x.dl
similarity index 83%
rename from tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_6_x.dl
rename to tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_x.dl
index bbfdf3db9..0a59f84d4 100644
--- a/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_6_x.dl
+++ b/tests/integration/cases/tutorial_npm_verify_provenance_semver/policy_7_x.dl
@@ -1,4 +1,4 @@
-/* Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved. */
+/* Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved. */
/* Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. */
#include "prelude.dl"
@@ -10,4 +10,4 @@ Policy("has-verified-provenance", component_id, "Require a verified provenance f
apply_policy_to("has-verified-provenance", component_id) :-
is_component(component_id, purl),
- match("pkg:npm/semver@7.6.*", purl).
+ match("pkg:npm/semver@7.*", purl).
diff --git a/tests/integration/cases/tutorial_npm_verify_provenance_semver/test.yaml b/tests/integration/cases/tutorial_npm_verify_provenance_semver/test.yaml
index ac11642f4..a37e2b8bf 100644
--- a/tests/integration/cases/tutorial_npm_verify_provenance_semver/test.yaml
+++ b/tests/integration/cases/tutorial_npm_verify_provenance_semver/test.yaml
@@ -8,35 +8,32 @@ tags:
- tutorial
steps:
-- name: Run macaron analyze semver@7.6.2
+- name: Run macaron analyze semver@7.7.2
kind: analyze
options:
command_args:
- -purl
- - pkg:npm/semver@7.6.2
- - --verify-provenance
-- name: Verify checks for semver@7.6.2
+ - pkg:npm/semver@7.7.2
+- name: Verify checks for semver@7.7.2
kind: verify
options:
- policy: policy_7_6_2.dl
+ policy: policy_7_7_2.dl
- name: Run macaron analyze@semver@7.6.0
kind: analyze
options:
command_args:
- -purl
- pkg:npm/semver@7.6.0
- - --verify-provenance
-- name: Verify checks for all 7.6.x semver runs
+- name: Verify checks for all 7.x semver runs
kind: verify
options:
- policy: policy_7_6_x.dl
+ policy: policy_7_x.dl
- name: Run macaron analyze semver@1.0.0
kind: analyze
options:
command_args:
- -purl
- pkg:npm/semver@1.0.0
- - --verify-provenance
- name: Verify checks for all semver runs
kind: verify
options:
diff --git a/tests/integration/cases/tutorial_toga_provenance/policy.dl b/tests/integration/cases/tutorial_toga_provenance/policy.dl
new file mode 100644
index 000000000..7e18aed36
--- /dev/null
+++ b/tests/integration/cases/tutorial_toga_provenance/policy.dl
@@ -0,0 +1,13 @@
+/* Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved. */
+/* Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/. */
+
+#include "prelude.dl"
+
+Policy("has-verified-provenance", component_id, "Require a verified provenance file.") :-
+ check_passed(component_id, "mcn_provenance_derived_repo_1"),
+ check_passed(component_id, "mcn_provenance_derived_commit_1"),
+ check_passed(component_id, "mcn_provenance_verified_1").
+
+apply_policy_to("has-verified-provenance", component_id) :-
+ is_component(component_id, purl),
+ match("pkg:pypi/toga@*", purl).
diff --git a/tests/integration/cases/tutorial_toga_provenance/test.yaml b/tests/integration/cases/tutorial_toga_provenance/test.yaml
new file mode 100644
index 000000000..fe4e0b42f
--- /dev/null
+++ b/tests/integration/cases/tutorial_toga_provenance/test.yaml
@@ -0,0 +1,26 @@
+# Copyright (c) 2024 - 2025, Oracle and/or its affiliates. All rights reserved.
+# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
+
+description: |
+ Analysing the toga library at different versions to find PyPI and GitHub attestations.
+
+tags:
+- tutorial
+
+steps:
+- name: Run macaron analyze toga with PyPI attestation
+ kind: analyze
+ options:
+ command_args:
+ - -purl
+ - pkg:pypi/toga@0.5.1
+- name: Run macaron analyze toga with GitHub attestation
+ kind: analyze
+ options:
+ command_args:
+ - -purl
+ - pkg:pypi/toga@0.4.8
+- name: Verify provenance exists for both analyses
+ kind: verify
+ options:
+ policy: policy.dl
diff --git a/tests/integration/cases/urllib3_expectation_dir/test.yaml b/tests/integration/cases/urllib3_expectation_dir/test.yaml
index ec9e2739d..8646c8edd 100644
--- a/tests/integration/cases/urllib3_expectation_dir/test.yaml
+++ b/tests/integration/cases/urllib3_expectation_dir/test.yaml
@@ -18,7 +18,6 @@ steps:
- pkg:pypi/urllib3@2.0.0a1
- --provenance-expectation
- expectation
- - --verify-provenance
- name: Run macaron verify-policy to verify passed/failed checks
kind: verify
options:
diff --git a/tests/integration/cases/urllib3_expectation_file/test.yaml b/tests/integration/cases/urllib3_expectation_file/test.yaml
index 21441d0a5..5b204387b 100644
--- a/tests/integration/cases/urllib3_expectation_file/test.yaml
+++ b/tests/integration/cases/urllib3_expectation_file/test.yaml
@@ -8,6 +8,7 @@ description: |
tags:
- macaron-python-package
- macaron-docker-image
+- tutorial
steps:
- name: Run macaron analyze with expectation file
@@ -17,7 +18,6 @@ steps:
command_args:
- -purl
- pkg:pypi/urllib3@2.0.0a1
- - --verify-provenance
- name: Run macaron verify-policy to verify passed/failed checks
kind: verify
options:
diff --git a/tests/integration/cases/urllib3_invalid_expectation/test.yaml b/tests/integration/cases/urllib3_invalid_expectation/test.yaml
index f50aefebf..960e10ebe 100644
--- a/tests/integration/cases/urllib3_invalid_expectation/test.yaml
+++ b/tests/integration/cases/urllib3_invalid_expectation/test.yaml
@@ -17,7 +17,6 @@ steps:
command_args:
- -purl
- pkg:pypi/urllib3@2.0.0a1
- - --verify-provenance
- name: Run macaron verify-policy to verify passed/failed checks
kind: verify
options: