-
-
Notifications
You must be signed in to change notification settings - Fork 287
Changed the logic of fetching leader's data #990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed the logic of fetching leader's data #990
Conversation
Summary by CodeRabbit
WalkthroughThis pull request removes direct assignments to the Changes
Possibly related PRs
Suggested reviewers
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (17)
💤 Files with no reviewable changes (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (5)
⏰ Context from checks skipped due to timeout of 90000ms (5)
🔇 Additional comments (29)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
backend/apps/owasp/scraper.py (2)
70-73: Consider more robust regex pattern and result handling.The current regex pattern works for simple cases but may miss leaders with special characters or alternate Markdown link formats.
Consider a more robust implementation:
for line in lines: logger.debug("Processing line: %s", line) - match = re.findall(r"\* \[([^\]]+)\]", line) + # Match both standard Markdown list items with links and variations + match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line) leaders.extend(match)This pattern also handles optional spaces after the asterisk and optional link URLs in parentheses after the name.
69-69: Consider limiting debug logging for large content.Logging the entire content may flood logs if the file is large.
Consider limiting the logged content:
- logger.debug("Content: %s", content) + logger.debug("Content length: %d characters", len(content)) + if len(content) < 500: # Only log full content if it's reasonably small + logger.debug("Content: %s", content)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/apps/owasp/management/commands/owasp_scrape_chapters.py(1 hunks)backend/apps/owasp/management/commands/owasp_scrape_committees.py(1 hunks)backend/apps/owasp/management/commands/owasp_scrape_projects.py(1 hunks)backend/apps/owasp/scraper.py(2 hunks)backend/tests/owasp/scraper_test.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: CodeQL (python)
- GitHub Check: Run frontend unit tests
- GitHub Check: Run frontend e2e tests
- GitHub Check: Run backend tests
- GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (7)
backend/apps/owasp/management/commands/owasp_scrape_chapters.py (1)
66-67:✅ Verification successful
Updated leader fetching to use repository context - good approach.
The change correctly implements the new pattern of passing the repository context to the
get_leadersmethod. This is consistent with the changes in other scraper commands and ensures that leader data is retrieved within the proper repository context.
🏁 Script executed:
#!/bin/bash # Check if all scraper commands consistently use the repository parameter # Check if other files implement the same pattern for fetching leaders echo "Checking scraper command implementations for consistency:" rg -l "\.owasp_repository" --type py backend/apps/owasp/management/commands/ rg "get_leaders\(.*repository" --type py backend/apps/owasp/management/commands/Length of output: 941
Consistency Verified: Repository Parameter Use in Leader Fetching
The updated implementation in
owasp_scrape_chapters.pyis consistent with similar patterns in other scraper commands. All files now correctly pass the repository context toget_leaders, ensuring uniform behavior.
- File:
backend/apps/owasp/management/commands/owasp_scrape_chapters.py(Lines 66-67)- Verified in:
owasp_scrape_committees.py&owasp_scrape_projects.pybackend/tests/owasp/scraper_test.py (1)
117-122:✅ Verification successful
Test updated correctly with mock repository.
The test has been properly updated to accommodate the signature change in
get_leaders(). The mock repository includes all the essential properties (key,default_branch, andname) that the updated method likely depends on. This ensures that the test accurately validates the new implementation.
🏁 Script executed:
#!/bin/bash # Check if the get_leaders implementation relies on these repository properties # Look at the implementation of get_leaders to verify if it uses these properties rg -A 10 "def get_leaders" --type py backend/apps/owasp/Length of output: 865
Action: Approved – Test correctly reflects new implementation expectations.
The updated test in
backend/tests/owasp/scraper_test.pyuses a mock repository with propertieskey,default_branch, andname. Verification confirms that whileget_leaders()(located inbackend/apps/owasp/scraper.py) relies onrepository.keyandrepository.default_branch(to construct the URL for fetching the leaders file), the inclusion ofrepository.nameis extraneous but harmless. The test accurately validates the method's behavior by asserting an empty list is returned.backend/apps/owasp/management/commands/owasp_scrape_projects.py (1)
78-79:✅ Verification successful
Leader fetching updated to use repository context - ensures consistency.
The change aligns with the updated approach of fetching leaders with repository context, consistent with the implementations in other scraper commands. This provides a more contextual way to retrieve leader information.
🏁 Script executed:
#!/bin/bash # Verify that owasp_repository is properly implemented in the Project model # Check if the Project model has owasp_repository property/method implemented echo "Checking Project model implementation:" rg -A 5 "owasp_repository" --type py backend/apps/owasp/models/project.py # Also check for any error handling around this property access rg -C 3 "owasp_repository" --type py backend/apps/owasp/models/Length of output: 6676
Confirmed: Leader fetching now utilizes the repository context consistently.
The change in
backend/apps/owasp/management/commands/owasp_scrape_projects.py(lines 78-79) correctly fetchesproject.owasp_repositorybefore retrieving the leaders. Verification shows that theowasp_repositoryproperty is properly implemented in the Project model—and similar patterns are applied in other models—ensuring a uniform approach across the codebase.backend/apps/owasp/management/commands/owasp_scrape_committees.py (1)
66-67:✅ Verification successful
Leader fetching updated to use repository context - maintains consistent approach.
The code now correctly retrieves the repository from the committee object before passing it to
get_leaders(). This change is consistent with the implementation in other scraper commands and ensures a uniform approach to fetching leader data across different entity types.
🏁 Script executed:
#!/bin/bash # Check if the Committee model properly implements owasp_repository # and look for any implementation details of get_leaders in the scraper # Check Committee model implementation echo "Checking Committee model implementation:" rg -A 5 "owasp_repository" --type py backend/apps/owasp/models/committee.py # Look at how get_leaders uses the repository parameter echo -e "\nChecking how get_leaders method uses the repository:" rg -A 15 "def get_leaders" --type py backend/apps/owasp/scraper.pyLength of output: 1327
Consistent Repository Context for Leader Fetching
Verified that the committee’s
owasp_repositoryis correctly set (seebackend/apps/owasp/models/committee.py) and thatscraper.get_leaders()properly uses the repository context (seebackend/apps/owasp/scraper.py). The update inbackend/apps/owasp/management/commands/owasp_scrape_committees.py(lines 66-67) now aligns perfectly with similar implementations across the codebase. No further action is needed.backend/apps/owasp/scraper.py (3)
4-4: Appropriate imports for the new implementation.Good job adding the necessary imports for the new leader fetching logic. The regex module for pattern matching and the GitHub utility function are well-placed.
Also applies to: 12-13
61-65: Method signature change with clear implementation.Good refactoring to change from HTML parsing to a more structured approach using the GitHub-hosted Markdown file. The method signature change is appropriate as it now requires repository context.
63-65:❓ Verification inconclusive
Verify GitHub raw content URL format.
The URL construction assumes a specific GitHub raw content URL format. Make sure this follows GitHub's stable URL scheme for accessing raw file content.
🏁 Script executed:
#!/bin/bash # Check if the code already handles repositories with spaces or special characters in names cd $(git rev-parse --show-toplevel) url_pattern="https://raw.githubusercontent.com/OWASP/.*/.*/.*/leaders.md" rg -l "https://raw.githubusercontent.com/OWASP/" | xargs rg -l "leaders.md" | xargs cat | rg "$url_pattern" -I --color=alwaysLength of output: 303
Review the GitHub Raw URL Construction
The URL built in the code:
content = get_repository_file_content( f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md" )correctly follows GitHub’s stable raw content URL scheme. However, the verification shell script’s regex pattern expects one more dynamic segment than actually exists (it uses three ".*" groups after
/OWASP/instead of the two provided by{repository.key}and{repository.default_branch}). This mismatch likely explains why the script produced no output.Action Items:
- Confirm that the URL format remains consistent with GitHub’s spec (i.e.,
https://raw.githubusercontent.com/OWASP/<repository_key>/<default_branch>/leaders.md).- If you intend to validate this URL using a regex, update the pattern (for example, to
https://raw.githubusercontent.com/OWASP/[^/]+/[^/]+/leaders.md) so it accurately reflects the URL’s structure.
backend/apps/owasp/scraper.py
Outdated
| def get_leaders(self, repository): | ||
| """Get leaders from leaders.md file on GitHub.""" | ||
| content = get_repository_file_content( | ||
| f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md" | ||
| ) | ||
| leaders = [] | ||
| try: | ||
| lines = content.split("\n") | ||
| logger.debug("Content: %s", content) | ||
| for line in lines: | ||
| logger.debug("Processing line: %s", line) | ||
| match = re.findall(r"\* \[([^\]]+)\]", line) | ||
| leaders.extend(match) | ||
| except AttributeError: | ||
| logger.exception( | ||
| "Unable to parse leaders.md content", extra={"repository": repository.name} | ||
| ) | ||
| return leaders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Additional error handling needed for network/GitHub API failures.
The current error handling only catches AttributeError from content parsing, but there's no handling for potential network issues or GitHub API failures that could occur in get_repository_file_content().
Consider adding error handling for network-related exceptions:
def get_leaders(self, repository):
"""Get leaders from leaders.md file on GitHub."""
- content = get_repository_file_content(
- f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
- )
+ try:
+ content = get_repository_file_content(
+ f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
+ )
+ except (requests.exceptions.RequestException, ValueError) as e:
+ logger.exception(
+ "Failed to fetch leaders.md file",
+ extra={"repository": repository.name, "error": str(e)}
+ )
+ return []
leaders = []
try:
lines = content.split("\n")
logger.debug("Content: %s", content)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def get_leaders(self, repository): | |
| """Get leaders from leaders.md file on GitHub.""" | |
| content = get_repository_file_content( | |
| f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md" | |
| ) | |
| leaders = [] | |
| try: | |
| lines = content.split("\n") | |
| logger.debug("Content: %s", content) | |
| for line in lines: | |
| logger.debug("Processing line: %s", line) | |
| match = re.findall(r"\* \[([^\]]+)\]", line) | |
| leaders.extend(match) | |
| except AttributeError: | |
| logger.exception( | |
| "Unable to parse leaders.md content", extra={"repository": repository.name} | |
| ) | |
| return leaders | |
| def get_leaders(self, repository): | |
| """Get leaders from leaders.md file on GitHub.""" | |
| try: | |
| content = get_repository_file_content( | |
| f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md" | |
| ) | |
| except (requests.exceptions.RequestException, ValueError) as e: | |
| logger.exception( | |
| "Failed to fetch leaders.md file", | |
| extra={"repository": repository.name, "error": str(e)} | |
| ) | |
| return [] | |
| leaders = [] | |
| try: | |
| lines = content.split("\n") | |
| logger.debug("Content: %s", content) | |
| for line in lines: | |
| logger.debug("Processing line: %s", line) | |
| match = re.findall(r"\* \[([^\]]+)\]", line) | |
| leaders.extend(match) | |
| except AttributeError: | |
| logger.exception( | |
| "Unable to parse leaders.md content", extra={"repository": repository.name} | |
| ) | |
| return leaders |
backend/apps/owasp/scraper.py
Outdated
| try: | ||
| lines = content.split("\n") | ||
| logger.debug("Content: %s", content) | ||
| for line in lines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Check for empty content before processing.
There's no check for empty content before attempting to split and process it, which could lead to errors if the file is empty or not found.
Add a check for empty content:
leaders = []
try:
+ if not content:
+ logger.warning("Empty leaders.md content", extra={"repository": repository.name})
+ return leaders
lines = content.split("\n")
logger.debug("Content: %s", content)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| lines = content.split("\n") | |
| logger.debug("Content: %s", content) | |
| for line in lines: | |
| leaders = [] | |
| try: | |
| if not content: | |
| logger.warning("Empty leaders.md content", extra={"repository": repository.name}) | |
| return leaders | |
| lines = content.split("\n") | |
| logger.debug("Content: %s", content) | |
| for line in lines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move the logic to RepositoryBasedEntityMode::get_leaders
upd: also some tests would be great 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/apps/owasp/models/common.py (1)
152-183: Consider optimizing the logging verbosity and pattern matching.The implementation of
get_leaderslooks solid overall. It properly handles error cases and extracts leader names from the markdown file.Few suggestions for improvement:
- The logging is quite verbose, especially for debugging. Consider using more targeted logging:
- logger.debug("Content length: %d characters", len(content)) - small_size = 500 - if len(content) < small_size: # Only log full content if it's reasonably small - logger.debug("Content: %s", content) - for line in lines: - logger.debug("Processing line: %s", line) + logger.debug("Processing leaders.md with %d characters", len(content))
- The regex pattern could be improved to be more robust against different markdown variations:
- match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line) + match = re.findall(r"(?:[\*\-+]|\d+\.)\s*\[([^\]]+)\](?:\([^)]*\))?", line)This updated pattern would match Markdown list items starting with
*,-,+, or numbered lists like1..
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
backend/apps/owasp/management/commands/owasp_scrape_chapters.py(1 hunks)backend/apps/owasp/management/commands/owasp_scrape_committees.py(1 hunks)backend/apps/owasp/management/commands/owasp_scrape_projects.py(1 hunks)backend/apps/owasp/models/common.py(2 hunks)backend/apps/owasp/scraper.py(0 hunks)backend/tests/owasp/management/commands/owasp_scrape_chapters_test.py(0 hunks)backend/tests/owasp/management/commands/owasp_scrape_committees_test.py(0 hunks)backend/tests/owasp/management/commands/owasp_scrape_projects_test.py(0 hunks)backend/tests/owasp/models/chapter_test.py(1 hunks)backend/tests/owasp/models/committee_test.py(1 hunks)backend/tests/owasp/models/project_test.py(1 hunks)backend/tests/owasp/scraper_test.py(0 hunks)
💤 Files with no reviewable changes (5)
- backend/apps/owasp/scraper.py
- backend/tests/owasp/management/commands/owasp_scrape_committees_test.py
- backend/tests/owasp/management/commands/owasp_scrape_projects_test.py
- backend/tests/owasp/scraper_test.py
- backend/tests/owasp/management/commands/owasp_scrape_chapters_test.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/apps/owasp/management/commands/owasp_scrape_projects.py
- backend/apps/owasp/management/commands/owasp_scrape_chapters.py
- backend/apps/owasp/management/commands/owasp_scrape_committees.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: CodeQL (python)
- GitHub Check: Run backend tests
- GitHub Check: CodeQL (javascript-typescript)
- GitHub Check: Run frontend e2e tests
- GitHub Check: Run frontend unit tests
🔇 Additional comments (4)
backend/tests/owasp/models/committee_test.py (1)
101-116: Good test implementation for the new get_leaders method.The test correctly mocks the
get_repository_file_contentfunction and verifies that theget_leadersmethod properly extracts leader names from the markdown format. This aligns with the PR's objective of changing the logic for fetching leader data.backend/tests/owasp/models/chapter_test.py (1)
197-212: Good test implementation with consistent approach.The test properly validates the extraction of leader names from the markdown format, consistent with the same approach used in the committee_test.py file.
backend/tests/owasp/models/project_test.py (1)
132-147: Good test implementation maintaining consistency across model tests.This test follows the same pattern as the other model tests, ensuring that the
get_leadersmethod correctly extracts leader names from markdown content. The consistency across all three test files demonstrates a systematic approach to testing this new functionality.backend/apps/owasp/models/common.py (1)
7-7: Import addition matches the functionality.Adding the
requestsimport is appropriate for the new functionality that fetches leader data from GitHub and needs to handle request exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/apps/owasp/models/common.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Run backend tests
- GitHub Check: CodeQL (python)
- GitHub Check: CodeQL (javascript-typescript)
- GitHub Check: Run frontend unit tests
- GitHub Check: Run frontend e2e tests
🔇 Additional comments (1)
backend/apps/owasp/models/common.py (1)
7-7: LGTM: Required import additionThe
requestsimport is necessary for catching therequests.exceptions.RequestExceptionin the newget_leadersmethod.
arkid15r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I didn't run it yet. Let's take care of the structural inconsistency first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all entities it's no longer a part of scraping process. It's now within GitHub processing part. We need to structure it properly.
|
|
||
| assert scraper.page_tree is None | ||
|
|
||
| def test_get_leaders_no_leaders(self, mock_session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this test needs to be refactored too based on the new logic and the code location.
backend/apps/owasp/models/common.py
Outdated
| .order_by("-total_contributions")[:TOP_CONTRIBUTORS_LIMIT] | ||
| ] | ||
|
|
||
| def get_leaders(self, repository): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests for the new method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/apps/github/graphql/nodes/release.py (1)
16-16: Add 'url' to the fields tuple in Meta classThe 'url' field is added to the ReleaseNode class, but it's not included in the fields tuple in the Meta class (lines 20-26). For consistency and explicit field declaration, it should be added there as well.
class Meta: model = Release fields = ( "author", "is_pre_release", "name", "published_at", "tag_name", + "url", )frontend/src/components/UserCard.tsx (1)
30-32: Consider enhancing the display logic for user detailsCurrently, the implementation uses logical OR (
||) which means ifcompanyexists, it will never displaylocationor- <p className="mt-1 max-w-[250px] truncate text-sm text-gray-600 dark:text-gray-400 sm:text-base"> - {company || location || email} - </p> + <p className="mt-1 max-w-[250px] text-sm text-gray-600 dark:text-gray-400 sm:text-base"> + {[company, location, email].filter(Boolean).join(' • ')} + </p>Or if you prefer to show just one piece of information but want to prioritize differently:
- <p className="mt-1 max-w-[250px] truncate text-sm text-gray-600 dark:text-gray-400 sm:text-base"> - {company || location || email} - </p> + <p className="mt-1 max-w-[250px] truncate text-sm text-gray-600 dark:text-gray-400 sm:text-base"> + {email || location || company} + </p>
🛑 Comments failed to post (1)
backend/apps/github/models/release.py (1)
53-57:
⚠️ Potential issueAdd null check for repository to prevent NullPointerException
The
urlproperty assumes thatself.repositoryis notNone, but therepositoryfield is defined withnull=True(lines 36-42), which means it could be null. Add a null check to prevent potential null pointer dereference.@property def url(self): """Return release URL.""" + if not self.repository: + return None return f"{self.repository.url}/releases/tag/{self.tag_name}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.@property def url(self): """Return release URL.""" if not self.repository: return None return f"{self.repository.url}/releases/tag/{self.tag_name}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/tests/owasp/models/common_test.py (1)
15-35: Well-structured test for the new leader extraction functionality.The parameterized test provides good coverage for the
get_leadersmethod, testing single leader, multiple leaders, and empty content scenarios. This ensures the reliability of the new leader data extraction approach.You might consider adding one more test case for unusual formatting, such as:
("* [Leader with (parentheses)](https://example.com)", ["Leader with (parentheses)"])to ensure the extraction is robust against different leader name formats.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
backend/apps/owasp/management/commands/owasp_scrape_chapters.py(0 hunks)backend/apps/owasp/management/commands/owasp_scrape_committees.py(0 hunks)backend/apps/owasp/management/commands/owasp_scrape_projects.py(0 hunks)backend/apps/owasp/models/chapter.py(1 hunks)backend/apps/owasp/models/committee.py(1 hunks)backend/apps/owasp/models/common.py(4 hunks)backend/apps/owasp/models/project.py(1 hunks)backend/tests/owasp/models/chapter_test.py(2 hunks)backend/tests/owasp/models/committee_test.py(2 hunks)backend/tests/owasp/models/common_test.py(1 hunks)backend/tests/owasp/models/project_test.py(2 hunks)
💤 Files with no reviewable changes (3)
- backend/apps/owasp/management/commands/owasp_scrape_chapters.py
- backend/apps/owasp/management/commands/owasp_scrape_committees.py
- backend/apps/owasp/management/commands/owasp_scrape_projects.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/tests/owasp/models/project_test.py
- backend/tests/owasp/models/chapter_test.py
- backend/tests/owasp/models/committee_test.py
🔇 Additional comments (11)
backend/apps/owasp/models/chapter.py (1)
81-81: Looks good! Consistent mapping for leader data.Adding the
"leaders_raw": "leaders"mapping ensures that leader data is properly extracted from the GitHub repository files. This approach aligns with the PR objective of changing how leader data is fetched.backend/apps/owasp/models/project.py (1)
190-190: Looks good! Consistent mapping for leader data.The
"leaders_raw": "leaders"mapping follows the same pattern implemented in the Chapter model, ensuring a consistent approach to leader data extraction across different entity types.backend/apps/owasp/models/committee.py (1)
39-39: Looks good! Consistent mapping for leader data.The addition of
"leaders_raw": "leaders"completes the implementation across all three entity models (Chapter, Project, and Committee), providing a unified approach to leader data handling.backend/apps/owasp/models/common.py (8)
7-7: New import request addedGood addition of the
requestslibrary which is necessary for exception handling in the newget_leadersmethod.
99-99: Shifted to new leaders fetching approachThis change correctly integrates the new method of fetching leaders from the
leaders.mdfile instead of relying on the previous approach. This is well-aligned with the PR objective of changing the logic for fetching leader's data.
131-139: Well-structured URL builder methodThe
get_leaders_md_raw_urlmethod follows the same pattern as the existingget_index_md_raw_urlmethod, which maintains code consistency. It correctly handles the optional repository parameter and returns the appropriate URL or None.
161-173: Repository parameter should be optional for consistencyThe
get_leadersmethod should have an optional repository parameter to maintain consistency withget_leaders_md_raw_urland other similar methods.-def get_leaders(self, repository): +def get_leaders(self, repository=None): """Get leaders from leaders.md file on GitHub.""" + owasp_repository = repository or self.owasp_repository + if not owasp_repository: + return [] try: content = get_repository_file_content( - self.get_leaders_md_raw_url(repository=repository) + self.get_leaders_md_raw_url(repository=owasp_repository) ) except (requests.exceptions.RequestException, ValueError) as e: logger.exception( "Failed to fetch leaders.md file", - extra={"repository": repository.name, "error": str(e)}, + extra={"repository": owasp_repository.name if owasp_repository else None, "error": str(e)}, ) return []
175-187: Reduce debug logging verbosityThe current implementation logs every line being processed, which could lead to excessive log output. Consider reducing the verbosity or adding a conditional flag for this detailed logging.
if len(content) < small_size: # Only log full content if it's reasonably small logger.debug("Content: %s", content) for line in lines: - logger.debug("Processing line: %s", line) + # Only log lines that match our pattern or conditionally enable verbose logging + if re.search(r"\*\s*\[", line): + logger.debug("Processing leader line: %s", line) # Match both standard Markdown list items with links and variations match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line) leaders.extend(match)
186-186: Consider enhancing regex pattern robustnessThe current regex pattern (
\*\s*\[([^\]]+)\](?:\([^)]*\))?) handles standard Markdown list items with links, but might miss other variations. Consider enhancing it to handle more Markdown formatting variations.- match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line) + # Enhanced pattern to handle more list styles (*, -, +) and optional link formatting + match = re.findall(r"[*\-+]\s*\[([^\]]+)\](?:\([^)]*\))?|\b(?:Leader|Leaders?)\s*:\s*([^,]+)(?:,|$)", line) + # Extract matched groups and clean up + for m in match: + if isinstance(m, tuple): + leaders.extend([name.strip() for name in m if name.strip()]) + else: + leaders.append(m.strip())
175-192: Add tests for the new methodThere's a request to add tests for this new method in the previous review. Please ensure you have test coverage for both successful and error scenarios of the
get_leadersmethod.Run the following script to check if tests have been added for the new method:
#!/bin/bash # Check if tests for get_leaders method exist echo "Searching for tests for get_leaders method in the test files..." rg -i "test.*get_leaders" --type py
163-167: Improve 404 error handlingThe current implementation relies on
get_repository_file_contentwhich doesn't explicitly check for HTTP status codes. Consider improving error handling for 404 (file not found) and other specific HTTP errors.Since you're already catching exceptions, you could add more specific information about 404 errors in your logging. If possible, consider updating the
get_repository_file_contentfunction to explicitly handle HTTP status codes or return a more specific error for 404s.
|
arkid15r
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Changed the logic of fetching leader's data * Moved logic to RepositoryBasedEntityModel * Moved leader's data logic * Update code --------- Co-authored-by: Kate Golovanova <[email protected]> Co-authored-by: Arkadii Yakovets <[email protected]> Co-authored-by: Arkadii Yakovets <[email protected]>



Fixes: #972