Added Phase 2 processing for GitHub license data (summary by license and totals) #203
    wordcloud = "*"

    [dev-packages]
    black = "*"
This file is not to be included in your PR.
        
          
Pipfile.lock (outdated)
This file too is not to be part of your PR.
Hello @Babi-B, I have made the highlighted changes. Please check them out.

Hi @Ramses-Njasap! I noticed your PR hasn’t been reviewed yet. You could drop a message on Zulip tagging @TimidRobot to share what you’ve done and request feedback.

Hello @Babi-B, thank you for the advice. I'll tag him in the Zulip group. As of now, I can't continue with the other task until this one has been validated (the tasks are connected).
This pull request (PR) is unacceptable due to a failure to follow the PR template instructions. The Checklist instructions include:
<!-- DON'T remove this section or any of the lines. -->
<!-- Leave incomplete or inapplicable lines unchecked. -->
<!-- Replace the [ ] with [x] to check the boxes (there is no space between x and square brackets). -->
The template is located here: creativecommons/.github/blob/main/.github/PULL_REQUEST_TEMPLATE.md
Pull requests without the Developer Certificate of Origin section won't be accepted 🙅🏻
| @TimidRobot , | 
The processing script should be worked on at the same time as the reporting script. There should be a 1:1 relationship between the CSV files and the plots.
This file should be removed from the pull request (PR) and NOT deleted (removed from the project).
This file should be removed from the pull request (PR) and NOT deleted (removed from the project).
    SPDX_TO_CC_LICENSE = {
        "CC0-1.0": "zero_1.0",
        "CC-BY-4.0": "by_4.0",
        "CC-BY-SA-4.0": "by-sa_4.0",
These are not CC license identifiers
    }

    # Licenses outside Creative Commons are kept unchanged
    NON_CC_LICENSES = {"0BSD", "MIT-0", "Unlicense", "N/A"}
'N/A' stands for "not applicable", it is not a license.
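One possible way to act on this comment, sketched under assumptions: treat `N/A` as a missing-value marker and separate it from the data before counting, instead of listing it alongside real license identifiers. The function name `split_license_rows`, the `(identifier, count)` row layout, and the sample values are hypothetical and are not taken from the PR.

```python
# Hypothetical sketch: "N/A" marks a missing value, so it is filtered out
# rather than stored in the set of non-CC license identifiers.
NON_CC_LICENSES = {"0BSD", "MIT-0", "Unlicense"}  # no "N/A" entry
MISSING_MARKERS = {"", "N/A"}


def split_license_rows(rows):
    """Separate rows with a usable identifier from rows without one."""
    usable, missing = [], []
    for identifier, count in rows:
        cleaned = (identifier or "").strip()
        if cleaned in MISSING_MARKERS:
            missing.append((identifier, count))
        else:
            usable.append((cleaned, count))
    return usable, missing


# Made-up sample rows for illustration only
rows = [("MIT-0", 5), ("N/A", 2), ("Unlicense", 3), (None, 1)]
usable, missing = split_license_rows(rows)
```

Keeping the rejected rows in a separate list lets the script log or report how many entries had no license, without counting them as a license.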
        save_summary(summary, args)


    if __name__ == "__main__":
This section should match:
quantifying/scripts/1-fetch/github_fetch.py
Lines 180 to 206 in d95a928
    if __name__ == "__main__":
        try:
            main()
        except shared.QuantifyingException as e:
            if e.exit_code == 0:
                LOGGER.info(e.message)
            else:
                LOGGER.error(e.message)
            sys.exit(e.exit_code)
        except SystemExit as e:
            if e.code != 0:
                LOGGER.error(f"System exit with code: {e.code}")
            sys.exit(e.code)
        except KeyboardInterrupt:
            LOGGER.info("(130) Halted via KeyboardInterrupt.")
            sys.exit(130)
        except Exception:
            traceback_formatted = textwrap.indent(
                highlight(
                    traceback.format_exc(),
                    PythonTracebackLexer(),
                    TerminalFormatter(),
                ),
                " ",
            )
            LOGGER.critical(f"(1) Unhandled exception:\n{traceback_formatted}")
            sys.exit(1)
Fixes
Fixes #166 by @TimidRobot
Description
This pull request implements Phase 2 processing for GitHub data by adding the github_process.py script. It reads the GitHub CC license usage data collected in Phase 1, applies cleaning and transformation, maps each LICENSE identifier to the official Creative Commons legal tool identifier, and generates a summary CSV file for reporting.

This processing step prepares GitHub license statistics for use in Phase 3 reporting and future quarterly comparisons.
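The summary step described above could look roughly like the sketch below. It is not the code from the PR: the column names `LICENSE` and `COUNT`, the function name `summarize_by_license`, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def summarize_by_license(rows):
    """Sum counts per license identifier and append a TOTAL row."""
    totals = Counter()
    for row in rows:
        # Column names are assumed; the real Phase 1 file may differ
        totals[row["LICENSE"]] += int(row["COUNT"])
    summary = sorted(totals.items())
    summary.append(("TOTAL", sum(totals.values())))
    return summary


# Made-up sample data standing in for the Phase 1 CSV file
sample = io.StringIO("LICENSE,COUNT\nCC0-1.0,10\nCC-BY-4.0,7\nCC0-1.0,5\n")
summary = summarize_by_license(csv.DictReader(sample))
```

Appending the TOTAL row last keeps the per-license rows sorted and makes the total easy to locate or strip in a later reporting step.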
Technical details
- scripts/2-process/github_process.py
- data/{year}Q{quarter}/1-fetch/github_1_count.csv
- data/{year}Q{quarter}/2-process/github_summary.csv
- --enable-save: writes the summary file to disk
- --enable-git: optionally commits and pushes the generated file
- QuantifyingException if input data is missing

Tests
Steps to test:
data/{year}Q{quarter}/2-process/github_summary.csv
License identifiers use CC legal tool format (e.g. CC-BY-4.0)
Totals are correct and a TOTAL row is included
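The totals check in the steps above could be automated with a small helper such as the following sketch. The `LICENSE` and `COUNT` column names and the function name `check_summary` are assumptions, not part of the PR.

```python
import csv
import io


def check_summary(csv_text):
    """Return True when exactly one TOTAL row equals the sum of the others."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total_rows = [r for r in rows if r["LICENSE"] == "TOTAL"]
    if len(total_rows) != 1:
        return False
    others = sum(int(r["COUNT"]) for r in rows if r["LICENSE"] != "TOTAL")
    return int(total_rows[0]["COUNT"]) == others


# Made-up summary content for illustration only
good = "LICENSE,COUNT\nCC-BY-4.0,7\nCC0-1.0,15\nTOTAL,22\n"
bad = "LICENSE,COUNT\nCC-BY-4.0,7\nCC0-1.0,15\nTOTAL,20\n"
good_ok = check_summary(good)
bad_ok = check_summary(bad)
```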
Checklist
[…] Update index.md).
[…] main or master).
[…] visible errors.
Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129
Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.