
@samyak003
Contributor

Fixes: #972

samyak003 requested a review from arkid15r as a code owner on March 3, 2025, 20:34
@coderabbitai
Contributor

coderabbitai bot commented Mar 3, 2025

Summary by CodeRabbit

  • New Features

    • The Home page now highlights leader information for chapters and projects. A dedicated icon and refreshed details (such as formatted dates and suggested locations) make it easier to identify key contributors.
  • Refactor

    • Improved and centralized text formatting enhances the overall presentation and consistency of leader data across the application.

Walkthrough

This pull request removes direct assignments to the leaders_raw attribute from several command modules and eliminates the get_leaders method from the OwaspScraper class. In its place, leader data is now processed via a new implementation added to the RepositoryBasedEntityModel in the models layer, with enhanced error handling and logging. Corresponding tests have been updated or removed. On the frontend side, mock data, GraphQL queries, and UI components have been modified to include and display leader information. Additionally, npm commands have been replaced with pnpm commands in the Makefile.

Changes

File(s) and change summary

  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py, backend/apps/owasp/management/commands/owasp_scrape_committees.py, backend/apps/owasp/management/commands/owasp_scrape_projects.py
    • Removed the line(s) assigning leaders_raw (with a variable renaming in committees) in command modules.
  • backend/apps/owasp/scraper.py
    • Removed the get_leaders method, which parsed HTML for leader names.
  • backend/tests/owasp/scraper_test.py
    • Deleted the test method test_get_leaders_no_leaders.
  • backend/tests/owasp/management/commands/owasp_scrape_chapters_test.py, backend/tests/owasp/management/commands/owasp_scrape_committees_test.py, backend/tests/owasp/management/commands/owasp_scrape_projects_test.py
    • Removed mocking and assertions related to the leaders_raw attribute.
  • backend/apps/owasp/models/common.py
    • Added a new JSON field leaders_raw, a property leaders_md_raw_url, and a new get_leaders method with error handling to fetch and parse leader data from a Markdown file.
  • backend/tests/owasp/models/chapter_test.py, backend/tests/owasp/models/committee_test.py, backend/tests/owasp/models/project_test.py
    • Updated tests to add repository_mock.leaders and repository_mock.owner attributes.
  • backend/tests/owasp/models/common_test.py
    • Added a parameterized test for the new get_leaders functionality in RepositoryBasedEntityModel.
  • frontend/Makefile
    • Updated test commands from npm run ... to pnpm run ... for both end-to-end and unit tests.
  • frontend/__tests__/e2e/data/mockHomeData.ts, frontend/__tests__/unit/data/mockHomeData.ts
    • Revised mock data: added leaders, updated timestamps, and removed unused properties in projects and chapters.
  • frontend/__tests__/e2e/pages/Home.spec.ts
    • Updated expected text content for the chapters and projects sections to reflect leader names and new dates.
  • frontend/src/api/queries/homeQueries.ts
    • Modified the GET_MAIN_PAGE_DATA query to include a leaders field and adjusted field ordering for recentProjects and recentChapters.
  • frontend/src/components/CardDetailsPage.tsx, frontend/src/pages/Home.tsx
    • Introduced the use of a centralized capitalize function and updated the UI to display leader information with a FontAwesome icon.
  • frontend/src/types/chapter.ts, frontend/src/types/home.ts
    • Reordered properties in the chapter type and updated the MainPageData type by removing region/topContributors and adding a leaders array.

Possibly related PRs

  • Improved the "/sponsors" command for slack  #630: In this PR, a new Sponsor model is introduced and sponsor data processing is enhanced, which is closely related to the removal of the leaders_raw attribute in this pull request, indicating a coordinated change in how leadership-related data is handled.

Suggested reviewers

  • kasya


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8b47a0 and 4dc1062.

📒 Files selected for processing (17)
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py (0 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py (2 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py (0 hunks)
  • backend/apps/owasp/models/common.py (5 hunks)
  • backend/tests/owasp/models/chapter_test.py (2 hunks)
  • backend/tests/owasp/models/committee_test.py (2 hunks)
  • backend/tests/owasp/models/common_test.py (1 hunks)
  • backend/tests/owasp/models/project_test.py (2 hunks)
  • frontend/Makefile (1 hunks)
  • frontend/__tests__/e2e/data/mockHomeData.ts (1 hunks)
  • frontend/__tests__/e2e/pages/Home.spec.ts (1 hunks)
  • frontend/__tests__/unit/data/mockHomeData.ts (3 hunks)
  • frontend/src/api/queries/homeQueries.ts (1 hunks)
  • frontend/src/components/CardDetailsPage.tsx (4 hunks)
  • frontend/src/pages/Home.tsx (4 hunks)
  • frontend/src/types/chapter.ts (1 hunks)
  • frontend/src/types/home.ts (1 hunks)
💤 Files with no reviewable changes (2)
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py
✅ Files skipped from review due to trivial changes (1)
  • frontend/src/types/chapter.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py
  • backend/tests/owasp/models/project_test.py
  • backend/tests/owasp/models/chapter_test.py
  • backend/tests/owasp/models/common_test.py
  • backend/tests/owasp/models/committee_test.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: CodeQL (python)
  • GitHub Check: CodeQL (javascript-typescript)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run backend tests
🔇 Additional comments (29)
backend/apps/owasp/models/common.py (8)

44-46: New field for storing leaders data

The addition of the leaders_raw JSONField is well-structured with appropriate field options. This supports the PR objective of changing the leader data fetching logic by providing a dedicated storage field.


69-77: Good implementation of leaders_md_raw_url property

This property effectively constructs the raw URL to the leaders.md file, with proper handling for the case when owasp_repository is None. This is a clean implementation that supports the new leader data fetching approach.
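A minimal sketch of the behaviour described here, assuming the property builds the same raw URL that the earlier scraper.py version used (https://raw.githubusercontent.com/OWASP/<key>/<default_branch>/leaders.md). Repository and Entity are hypothetical stand-ins for the real Django models, and www-project-example is an illustrative key, not a real repository:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    """Hypothetical stand-in: only the fields needed to build the URL."""

    key: str
    default_branch: str


@dataclass
class Entity:
    """Hypothetical stand-in for RepositoryBasedEntityModel."""

    owasp_repository: Optional[Repository] = None

    @property
    def leaders_md_raw_url(self) -> Optional[str]:
        """Return the raw leaders.md URL, or None if no repository is linked."""
        repository = self.owasp_repository
        if repository is None:
            return None
        return (
            "https://raw.githubusercontent.com/OWASP/"
            f"{repository.key}/{repository.default_branch}/leaders.md"
        )


print(Entity().leaders_md_raw_url)  # None
print(Entity(Repository("www-project-example", "main")).leaders_md_raw_url)
# https://raw.githubusercontent.com/OWASP/www-project-example/main/leaders.md
```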


96-97: Streamlined data population in from_github method

The update to the from_github method now populates leaders data from the GitHub repository, which aligns with the PR objective to change how leader data is fetched. This is an appropriate place to fetch this data during entity initialization.


146-172: Robust implementation with some improvement opportunities

The get_leaders method effectively retrieves and parses leader information from a leaders.md file with proper error handling and logging.

A few suggestions for improvement:

  1. The debug logging for each line (line 164) is very verbose and could impact performance for large files. Consider reducing it or making it conditional.

  2. The regex handles standard Markdown list items with links but might miss other variations like numbered lists (1. [Name](link)). Consider enhancing it if needed.

-            for line in content.split("\n"):
-                logger.debug("Processing line: %s", line)
-                # Match both standard Markdown list items with links and variations.
-                leaders.extend(re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line))
+            logger.debug("Processing leaders.md content with %d lines", content.count("\n") + 1)
+            for line in content.split("\n"):
+                # Match both standard Markdown list items with links and variations.
+                # This regex matches both bullet points (*) and numbered lists (1.)
+                leaders.extend(re.findall(r"(?:\*|\d+\.)\s*\[([^\]]+)\](?:\([^)]*\))?", line))
+            logger.debug("Extracted %d leaders from content", len(leaders))
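To see what the suggested pattern accepts, here is a small standalone sketch that applies it line by line, as the diff does. extract_leaders and the sample content are illustrative only:

```python
import re

# Pattern from the suggested diff: a "*" bullet or a numbered item such as "2.",
# optional whitespace, a [Name], and an optional (link) that is consumed but not captured.
LEADER_PATTERN = re.compile(r"(?:\*|\d+\.)\s*\[([^\]]+)\](?:\([^)]*\))?")


def extract_leaders(content: str) -> list[str]:
    """Collect every bracketed name found on bullet or numbered list lines."""
    leaders: list[str] = []
    for line in content.split("\n"):
        leaders.extend(LEADER_PATTERN.findall(line))
    return leaders


sample = "\n".join(
    [
        "### Leaders",
        "* [Alice Example](mailto:alice@example.org)",
        "*[Bob Example]",
        "2. [Carol Example](https://example.org/carol)",
        "- [Dana Example]",
    ]
)
print(extract_leaders(sample))  # ['Alice Example', 'Bob Example', 'Carol Example']
```

Note that hyphen bullets such as "- [Dana Example]" are still skipped by this pattern; if any leaders.md files use them, the bullet alternation would need a "-" branch as well.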

146-172: Add unit tests for the new method

Please add unit tests for the get_leaders method to ensure it correctly handles various scenarios:

  1. Successfully parsing leaders from a valid file
  2. Handling empty or malformed content
  3. Handling exceptions when fetching the file

This will help maintain the reliability of this feature as the codebase evolves.

#!/bin/bash
# Check if tests exist for the get_leaders method

echo "Searching for tests that might cover the get_leaders method..."
rg -i "test.*get_leaders" --type py

echo "Checking for potential test files for RepositoryBasedEntityModel..."
fd ".*test.*" --type f --extension py | grep -i "repository.*model\|common"
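The three scenarios listed in this comment could be exercised roughly as follows. This is a hedged sketch, not the project's test suite: get_leaders here is a hypothetical, dependency-injected stand-in that receives the fetch function as an argument, and urllib.error.URLError stands in for requests.exceptions.RequestException so the example runs with only the standard library. The regex is the one from the original (removed) line of the diff above:

```python
import logging
import re
import unittest
import urllib.error
from typing import Callable, Optional
from unittest import mock

logger = logging.getLogger("leaders_sketch")

# Regex from the original (removed) line of the diff above.
LEADER_PATTERN = re.compile(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?")
URL = "https://raw.githubusercontent.com/OWASP/www-project-example/main/leaders.md"


def get_leaders(url: str, fetch: Callable[[str], Optional[str]]) -> list[str]:
    """Hypothetical stand-in: fetch leaders.md, parse list items, sort names."""
    try:
        content = fetch(url)
    except (urllib.error.URLError, ValueError):
        logger.exception("Failed to fetch leaders.md from %s", url)
        return []
    if not content:
        return []
    leaders: list[str] = []
    for line in content.split("\n"):
        leaders.extend(LEADER_PATTERN.findall(line))
    return sorted(leaders)


# 1. A valid file is parsed and the names come back sorted.
valid = mock.Mock(return_value="* [Bob Example]\n* [Alice Example](https://example.org)")
assert get_leaders(URL, valid) == ["Alice Example", "Bob Example"]
valid.assert_called_once_with(URL)

# 2. Empty or malformed content yields an empty list.
assert get_leaders(URL, mock.Mock(return_value="")) == []
assert get_leaders(URL, mock.Mock(return_value="no list items here")) == []

# 3. A fetch failure is logged (including the URL) and yields an empty list.
failing = mock.Mock(side_effect=urllib.error.URLError("offline"))
with unittest.TestCase().assertLogs("leaders_sketch", level="ERROR") as captured:
    assert get_leaders(URL, failing) == []
assert URL in captured.output[0]
print("all scenarios passed")
```

Injecting the fetch callable keeps the tests free of network access; in the real model the equivalent would be patching the HTTP call with unittest.mock.patch.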

150-157: Good error handling for network requests

The implementation properly catches both RequestException and ValueError with appropriate logging, which is essential for robust operation when fetching remote content. The logged information includes both the URL and the specific error message, which will be valuable for debugging.


159-161: Proper empty content check

The code correctly handles the case when the content is empty, returning an empty list. This prevents potential issues from trying to process empty content and maintains consistent behavior.


172-172: Consistent ordering of results

Sorting the leaders list before returning it ensures consistent ordering regardless of how the data appears in the source file. This is good practice for stability in the API.

frontend/Makefile (2)

26-26: Good move to standardize on pnpm!

Updating the test-frontend-e2e command to use pnpm instead of npm ensures consistency with the rest of the Makefile, which already uses pnpm for other commands. This standardization improves maintainability and avoids potential issues that might arise from using different package managers.


30-30: Consistent package manager standardization.

Successfully updated the test-frontend-unit command to use pnpm, maintaining consistency with the other commands in the Makefile and completing the standardization to a single package manager.

frontend/__tests__/e2e/pages/Home.spec.ts (2)

35-36: Updated test to verify project leader information.

Similar to the chapter test update, this test has been properly updated to verify the project leader's name and creation date display. The test changes maintain consistency and ensure that the UI displays the expected leader information.

The same note about the trailing comma applies here: the test expects "Project Leader1,", so verify that this matches the intended UI display format.


27-28:

✅ Verification successful

Updated test to verify leader information display.

The test has been appropriately updated to verify that leader information is now displayed on the home page. The test now checks for both the leader name and the date, which aligns with the PR's purpose of changing how leader data is fetched and displayed.

Note: The test expects "Chapter Leader1," with a comma, suggesting that the UI displays leader names with trailing commas. Ensure this matches the intended UI display format.


🏁 Script executed:

#!/bin/bash
# Check if the UI component in Home page displays leader names with commas
rg -A 5 -B 5 "leaders.+map|leaders.+join" frontend/src/pages/Home.tsx

Length of output: 1022


Leader Information Display Verified

The UI code now correctly formats leader names by joining them with a comma and a space (join(', ')), as seen in frontend/src/pages/Home.tsx. This confirms that when multiple leader names are displayed, the expected substring (e.g., "Chapter Leader1,") will be present. Please verify that this behavior is intended for scenarios with single-leader arrays as well, since standard join won’t append a trailing comma if there’s only one item.

  • Verified the Home page displays leader names using chapter.leaders.join(', ').
  • Confirmed that the substring check in the test (Chapter Leader1,) will match when there are multiple leaders.
  • The date visibility check (Feb 20, 2025) remains unaffected.
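The trailing-comma reasoning can be checked with a quick sketch. Python's str.join is used because it has the same separator semantics as the frontend's leaders.join(', ') (Array.prototype.join); leaders_line is a hypothetical helper, and "Chapter Leader2" is an illustrative second name:

```python
def leaders_line(leaders: list[str]) -> str:
    """Hypothetical helper mirroring the frontend's leaders.join(', ')."""
    return ", ".join(leaders)


multiple = leaders_line(["Chapter Leader1", "Chapter Leader2"])
single = leaders_line(["Chapter Leader1"])

print(multiple)                        # Chapter Leader1, Chapter Leader2
print("Chapter Leader1," in multiple)  # True: the separator follows the first name
print("Chapter Leader1," in single)    # False: join adds no trailing separator
```

So the e2e assertion on "Chapter Leader1," only holds while the mock data gives that chapter at least two leaders.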
frontend/__tests__/unit/data/mockHomeData.ts (3)

6-8: Added leader information to project mock data.

The mock data has been updated to include leader information, which aligns with the PR's purpose of changing how leader data is fetched and displayed. The structure looks good and provides appropriate test data for the updated UI components.


19-22: Added leader information to chapter mock data.

The chapter mock data has been appropriately updated to include leader information, consistent with the project data updates. The addition of createdAt and key fields also ensures the mock data structure is complete for testing purposes.


68-70: Added additional fields to event mock data.

The addition of summary and suggestedLocation fields to the event mock data enhances the completeness of the test data. While not directly related to the leader data changes, these additions ensure that the mock data better mirrors the actual API response structure.

frontend/src/components/CardDetailsPage.tsx (4)

4-4: Good use of utility function for capitalization.

Adding a dedicated utility function for capitalization is a good practice for code maintainability. This centralizes the capitalization logic, making it easier to update if needed and ensuring consistency across the application.


35-35: Applied capitalize utility function.

Good implementation of the imported capitalize utility function to standardize text formatting for the title. This improves code readability and maintainability by using a centralized utility rather than inline string manipulation.


45-46: Applied capitalize utility function for type.

Consistently using the capitalize utility function for the type text as well. This ensures a uniform approach to string capitalization throughout the component.


71-73: Improved style organization.

The style properties have been nicely reorganized for better readability. The changes to borderRadius and boxShadow maintain the same visual appearance while improving code structure.

frontend/src/types/home.ts (2)

13-14: Type definition change looks good

The addition of the leaders: string[] property to the recentChapters type is a clean implementation that improves the data structure. This change aligns well with the PR objective of changing the logic for fetching leaders' data.


20-20: Type definition enhancement approved

Adding the leaders: string[] property to recentProjects creates consistency between chapter and project data structures, making the codebase more maintainable.

frontend/src/api/queries/homeQueries.ts (2)

8-9: GraphQL query fields properly updated

The addition of the leaders field and reordering of fields in the recentProjects query is appropriately implemented to match the updated type definition.

Also applies to: 12-12


17-19: GraphQL query structure properly modified

The updated fields in the recentChapters query correctly fetch the necessary data including the new leaders field, which aligns with the PR objective of changing leader data fetching logic.

frontend/__tests__/e2e/data/mockHomeData.ts (2)

5-7: Mock project data properly updated with leaders

The mock data for recentProjects has been correctly updated to include the new leaders property, with appropriate test values for each project. This ensures consistent test coverage for the new data structure.

Also applies to: 12-14, 19-21, 26-28, 35-37


44-46: Mock chapter data properly updated with leaders

The mock data for recentChapters has been correctly updated to include the new leaders property and other required fields. The test data now properly reflects the updated data structure from the GraphQL query.

Also applies to: 51-53, 58-60, 65-67, 72-74

frontend/src/pages/Home.tsx (4)

10-10: Appropriate imports added

The imports of the faUsers icon and the capitalize utility function are necessary and appropriate for the new feature implementation.

Also applies to: 21-21


193-198: Well-implemented UI for chapter leaders

Good implementation of conditional rendering for chapter leaders. The code checks if leaders exist before displaying them and uses a consistent UI pattern with other metadata. The FontAwesome icon provides a clear visual indication of the leaders information.


222-222: Good use of utility function

Replacing direct string manipulation with the capitalize utility function improves code consistency and maintainability.


225-230: Well-implemented UI for project leaders

The conditional rendering for project leaders matches the pattern used for chapter leaders, maintaining UI consistency throughout the application. The implementation correctly handles cases where there might be no leaders.


Contributor

coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
backend/apps/owasp/scraper.py (2)

70-73: Consider more robust regex pattern and result handling.

The current regex pattern works for simple cases but may miss leaders with special characters or alternate Markdown link formats.

Consider a more robust implementation:

        for line in lines:
            logger.debug("Processing line: %s", line)
-           match = re.findall(r"\* \[([^\]]+)\]", line)
+           # Match both standard Markdown list items with links and variations
+           match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line)
            leaders.extend(match)

This pattern also handles optional spaces after the asterisk and optional link URLs in parentheses after the name.
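A quick comparison of the current and suggested patterns on a few illustrative lines (the sample names are made up):

```python
import re

CURRENT = re.compile(r"\* \[([^\]]+)\]")  # exactly one space after "*"
SUGGESTED = re.compile(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?")  # any whitespace, optional (link)

lines = [
    "* [Alice Example](https://example.org/alice)",
    "*[Bob Example]",
    "*   [Carol Example]",
]


def names(pattern: re.Pattern, lines: list[str]) -> list[str]:
    """Apply a pattern line by line, as the scraper loop does."""
    return [name for line in lines for name in pattern.findall(line)]


print(names(CURRENT, lines))    # ['Alice Example']
print(names(SUGGESTED, lines))  # ['Alice Example', 'Bob Example', 'Carol Example']
```

On these lines the gain comes from \s*; the optional (?:\([^)]*\))? group consumes the link but does not change which names are captured.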


69-69: Consider limiting debug logging for large content.

Logging the entire content may flood logs if the file is large.

Consider limiting the logged content:

-           logger.debug("Content: %s", content)
+           logger.debug("Content length: %d characters", len(content))
+           if len(content) < 500:  # Only log full content if it's reasonably small
+               logger.debug("Content: %s", content)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c55e7ee and 198f9e3.

📒 Files selected for processing (5)
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py (1 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py (1 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py (1 hunks)
  • backend/apps/owasp/scraper.py (2 hunks)
  • backend/tests/owasp/scraper_test.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: CodeQL (python)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (7)
backend/apps/owasp/management/commands/owasp_scrape_chapters.py (1)

66-67:

✅ Verification successful

Updated leader fetching to use repository context - good approach.

The change correctly implements the new pattern of passing the repository context to the get_leaders method. This is consistent with the changes in other scraper commands and ensures that leader data is retrieved within the proper repository context.


🏁 Script executed:

#!/bin/bash
# Check if all scraper commands consistently use the repository parameter

# Check if other files implement the same pattern for fetching leaders
echo "Checking scraper command implementations for consistency:"
rg -l "\.owasp_repository" --type py backend/apps/owasp/management/commands/
rg "get_leaders\(.*repository" --type py backend/apps/owasp/management/commands/

Length of output: 941


Consistency Verified: Repository Parameter Use in Leader Fetching

The updated implementation in owasp_scrape_chapters.py is consistent with similar patterns in other scraper commands. All files now correctly pass the repository context to get_leaders, ensuring uniform behavior.

  • File: backend/apps/owasp/management/commands/owasp_scrape_chapters.py (Lines 66-67)
  • Verified in: owasp_scrape_committees.py & owasp_scrape_projects.py
backend/tests/owasp/scraper_test.py (1)

117-122:

✅ Verification successful

Test updated correctly with mock repository.

The test has been properly updated to accommodate the signature change in get_leaders(). The mock repository includes all the essential properties (key, default_branch, and name) that the updated method likely depends on. This ensures that the test accurately validates the new implementation.


🏁 Script executed:

#!/bin/bash
# Check if the get_leaders implementation relies on these repository properties

# Look at the implementation of get_leaders to verify if it uses these properties
rg -A 10 "def get_leaders" --type py backend/apps/owasp/

Length of output: 865


Action: Approved – Test correctly reflects new implementation expectations.

The updated test in backend/tests/owasp/scraper_test.py uses a mock repository with properties key, default_branch, and name. Verification confirms that while get_leaders() (located in backend/apps/owasp/scraper.py) relies on repository.key and repository.default_branch (to construct the URL for fetching the leaders file), the inclusion of repository.name is extraneous but harmless. The test accurately validates the method's behavior by asserting an empty list is returned.

backend/apps/owasp/management/commands/owasp_scrape_projects.py (1)

78-79:

✅ Verification successful

Leader fetching updated to use repository context - ensures consistency.

The change aligns with the updated approach of fetching leaders with repository context, consistent with the implementations in other scraper commands. This provides a more contextual way to retrieve leader information.


🏁 Script executed:

#!/bin/bash
# Verify that owasp_repository is properly implemented in the Project model

# Check if the Project model has owasp_repository property/method implemented
echo "Checking Project model implementation:"
rg -A 5 "owasp_repository" --type py backend/apps/owasp/models/project.py

# Also check for any error handling around this property access
rg -C 3 "owasp_repository" --type py backend/apps/owasp/models/

Length of output: 6676


Confirmed: Leader fetching now utilizes the repository context consistently.

The change in backend/apps/owasp/management/commands/owasp_scrape_projects.py (lines 78-79) correctly fetches project.owasp_repository before retrieving the leaders. Verification shows that the owasp_repository property is properly implemented in the Project model—and similar patterns are applied in other models—ensuring a uniform approach across the codebase.

backend/apps/owasp/management/commands/owasp_scrape_committees.py (1)

66-67:

✅ Verification successful

Leader fetching updated to use repository context - maintains consistent approach.

The code now correctly retrieves the repository from the committee object before passing it to get_leaders(). This change is consistent with the implementation in other scraper commands and ensures a uniform approach to fetching leader data across different entity types.


🏁 Script executed:

#!/bin/bash
# Check if the Committee model properly implements owasp_repository
# and look for any implementation details of get_leaders in the scraper

# Check Committee model implementation
echo "Checking Committee model implementation:"
rg -A 5 "owasp_repository" --type py backend/apps/owasp/models/committee.py

# Look at how get_leaders uses the repository parameter
echo -e "\nChecking how get_leaders method uses the repository:"
rg -A 15 "def get_leaders" --type py backend/apps/owasp/scraper.py

Length of output: 1327


Consistent Repository Context for Leader Fetching

Verified that the committee’s owasp_repository is correctly set (see backend/apps/owasp/models/committee.py) and that scraper.get_leaders() properly uses the repository context (see backend/apps/owasp/scraper.py). The update in backend/apps/owasp/management/commands/owasp_scrape_committees.py (lines 66-67) now aligns perfectly with similar implementations across the codebase. No further action is needed.

backend/apps/owasp/scraper.py (3)

4-4: Appropriate imports for the new implementation.

Good job adding the necessary imports for the new leader fetching logic. The regex module for pattern matching and the GitHub utility function are well-placed.

Also applies to: 12-13


61-65: Method signature change with clear implementation.

Good refactoring to change from HTML parsing to a more structured approach using the GitHub-hosted Markdown file. The method signature change is appropriate as it now requires repository context.


63-65:

❓ Verification inconclusive

Verify GitHub raw content URL format.

The URL construction assumes a specific GitHub raw content URL format. Make sure this follows GitHub's stable URL scheme for accessing raw file content.


🏁 Script executed:

#!/bin/bash
# Check if the code already handles repositories with spaces or special characters in names
cd $(git rev-parse --show-toplevel)
url_pattern="https://raw.githubusercontent.com/OWASP/.*/.*/.*/leaders.md"
rg -l "https://raw.githubusercontent.com/OWASP/" | xargs rg -l "leaders.md" | xargs cat | rg "$url_pattern" -I --color=always

Length of output: 303


Review the GitHub Raw URL Construction

The URL built in the code:

content = get_repository_file_content(
    f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
)

correctly follows GitHub’s stable raw content URL scheme. However, the verification shell script’s regex pattern expects one more dynamic segment than actually exists (it uses three ".*" groups after /OWASP/ instead of the two provided by {repository.key} and {repository.default_branch}). This mismatch likely explains why the script produced no output.

Action Items:

  • Confirm that the URL format remains consistent with GitHub’s spec (i.e., https://raw.githubusercontent.com/OWASP/<repository_key>/<default_branch>/leaders.md).
  • If you intend to validate this URL using a regex, update the pattern (for example, to https://raw.githubusercontent.com/OWASP/[^/]+/[^/]+/leaders.md) so it accurately reflects the URL’s structure.
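The segment-count mismatch can be demonstrated with Python's re module, which, for simple patterns like these, behaves the same way ripgrep does. The source line is copied from the snippet above; the rendered URL uses a hypothetical repository key:

```python
import re

# The f-string line as it appears in scraper.py (placeholders are literal text here).
SOURCE_LINE = (
    'f"https://raw.githubusercontent.com/OWASP/'
    '{repository.key}/{repository.default_branch}/leaders.md"'
)
# A rendered URL; "www-project-example" is a made-up key.
RENDERED = "https://raw.githubusercontent.com/OWASP/www-project-example/main/leaders.md"

# Script pattern: three ".*" segments after /OWASP/.
SCRIPT_PATTERN = r"https://raw.githubusercontent.com/OWASP/.*/.*/.*/leaders.md"
# Suggested pattern: exactly two path segments before leaders.md.
SUGGESTED_PATTERN = r"https://raw.githubusercontent.com/OWASP/[^/]+/[^/]+/leaders.md"

for text in (SOURCE_LINE, RENDERED):
    print(bool(re.search(SCRIPT_PATTERN, text)), bool(re.search(SUGGESTED_PATTERN, text)))
# False True
# False True
```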

Comment on lines 61 to 78
def get_leaders(self, repository):
    """Get leaders from leaders.md file on GitHub."""
    content = get_repository_file_content(
        f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
    )
    leaders = []
    try:
        lines = content.split("\n")
        logger.debug("Content: %s", content)
        for line in lines:
            logger.debug("Processing line: %s", line)
            match = re.findall(r"\* \[([^\]]+)\]", line)
            leaders.extend(match)
    except AttributeError:
        logger.exception(
            "Unable to parse leaders.md content", extra={"repository": repository.name}
        )
    return leaders
Contributor


🛠️ Refactor suggestion

Additional error handling needed for network/GitHub API failures.

The current error handling only catches AttributeError from content parsing, but there's no handling for potential network issues or GitHub API failures that could occur in get_repository_file_content().

Consider adding error handling for network-related exceptions:

    def get_leaders(self, repository):
        """Get leaders from leaders.md file on GitHub."""
-       content = get_repository_file_content(
-           f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
-       )
+       try:
+           content = get_repository_file_content(
+               f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
+           )
+       except (requests.exceptions.RequestException, ValueError) as e:
+           logger.exception(
+               "Failed to fetch leaders.md file", 
+               extra={"repository": repository.name, "error": str(e)}
+           )
+           return []
        leaders = []
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    def get_leaders(self, repository):
        """Get leaders from leaders.md file on GitHub."""
        content = get_repository_file_content(
            f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
        )
        leaders = []
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)
            for line in lines:
                logger.debug("Processing line: %s", line)
                match = re.findall(r"\* \[([^\]]+)\]", line)
                leaders.extend(match)
        except AttributeError:
            logger.exception(
                "Unable to parse leaders.md content", extra={"repository": repository.name}
            )
        return leaders

    def get_leaders(self, repository):
        """Get leaders from leaders.md file on GitHub."""
        try:
            content = get_repository_file_content(
                f"https://raw.githubusercontent.com/OWASP/{repository.key}/{repository.default_branch}/leaders.md"
            )
        except (requests.exceptions.RequestException, ValueError) as e:
            logger.exception(
                "Failed to fetch leaders.md file",
                extra={"repository": repository.name, "error": str(e)}
            )
            return []
        leaders = []
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)
            for line in lines:
                logger.debug("Processing line: %s", line)
                match = re.findall(r"\* \[([^\]]+)\]", line)
                leaders.extend(match)
        except AttributeError:
            logger.exception(
                "Unable to parse leaders.md content", extra={"repository": repository.name}
            )
        return leaders
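A minimal, self-contained sketch of the fetch-then-parse flow in the suggestion above, using only the standard library. The fetch callable is a hypothetical stand-in for get_repository_file_content, and OSError (which includes urllib.error.URLError) stands in for requests.exceptions.RequestException so the example runs without requests installed:

```python
import logging
import re
from typing import Callable, List

logger = logging.getLogger(__name__)

# Same pattern as the original snippet: "* [" followed by the link text.
LEADER_PATTERN = re.compile(r"\* \[([^\]]+)\]")


def get_leaders(fetch: Callable[[str], str], url: str) -> List[str]:
    """Return leader names listed in the leaders.md file at url.

    fetch is a hypothetical stand-in for get_repository_file_content.
    Network failures (OSError, which includes urllib.error.URLError) and
    ValueError are logged and mapped to an empty list.
    """
    try:
        content = fetch(url)
    except (OSError, ValueError):
        logger.exception("Failed to fetch leaders.md file", extra={"url": url})
        return []

    leaders: List[str] = []
    for line in content.split("\n"):
        leaders.extend(LEADER_PATTERN.findall(line))
    return leaders
```

A fetcher that raises OSError or ValueError yields [] rather than an unhandled exception, which matches the intent of the suggested change.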

Comment on lines 67 to 70
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)
            for line in lines:

Contributor

🛠️ Refactor suggestion

Check for empty content before processing.

There's no check for empty content before attempting to split and process it, which could lead to errors if the file is empty or not found.

Add a check for empty content:

        leaders = []
        try:
+           if not content:
+               logger.warning("Empty leaders.md content", extra={"repository": repository.name})
+               return leaders
            lines = content.split("\n")
            logger.debug("Content: %s", content)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        try:
            lines = content.split("\n")
            logger.debug("Content: %s", content)
            for line in lines:

        leaders = []
        try:
            if not content:
                logger.warning("Empty leaders.md content", extra={"repository": repository.name})
                return leaders
            lines = content.split("\n")
            logger.debug("Content: %s", content)
            for line in lines:
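Note that an empty string already yields an empty list in the original code ("".split("\n") is [""], and findall on "" finds nothing), so the guard mainly adds a warning and handles None explicitly instead of relying on the AttributeError that None.split raises. A sketch, with parse_leaders as a hypothetical helper name:

```python
import logging
import re
from typing import List, Optional

logger = logging.getLogger(__name__)

LEADER_PATTERN = re.compile(r"\* \[([^\]]+)\]")


def parse_leaders(content: Optional[str]) -> List[str]:
    """Extract leader names, returning [] early for None or empty content."""
    if not content:
        # Covers both a missing file (None) and an empty file ("").
        logger.warning("Empty leaders.md content")
        return []
    leaders: List[str] = []
    for line in content.split("\n"):
        leaders.extend(LEADER_PATTERN.findall(line))
    return leaders
```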

Collaborator

@arkid15r arkid15r left a comment

Could you move the logic to RepositoryBasedEntityModel::get_leaders?

upd: also some tests would be great 👍

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/apps/owasp/models/common.py (1)

152-183: Consider optimizing the logging verbosity and pattern matching.

The implementation of get_leaders looks solid overall. It properly handles error cases and extracts leader names from the markdown file.

A few suggestions for improvement:

  1. The logging is quite verbose, especially for debugging. Consider using more targeted logging:
-            logger.debug("Content length: %d characters", len(content))
-            small_size = 500
-            if len(content) < small_size:  # Only log full content if it's reasonably small
-                logger.debug("Content: %s", content)
-            for line in lines:
-                logger.debug("Processing line: %s", line)
+            logger.debug("Processing leaders.md with %d characters", len(content))
  2. The regex pattern could be improved to be more robust against different markdown variations:
-                match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line)
+                match = re.findall(r"(?:[\*\-+]|\d+\.)\s*\[([^\]]+)\](?:\([^)]*\))?", line)

This updated pattern would match Markdown list items starting with *, -, or +, as well as numbered list items (1., 2., and so on).
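A quick sketch comparing the current pattern with the suggested one on a hypothetical leaders.md excerpt; the sample content is made up for illustration:

```python
import re

# Pattern currently in get_leaders: only "*" bullets.
STAR_ONLY = re.compile(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?")
# Suggested pattern: "*", "-", "+" bullets or numbered items such as "1.".
ANY_LIST = re.compile(r"(?:[\*\-+]|\d+\.)\s*\[([^\]]+)\](?:\([^)]*\))?")

SAMPLE = """\
* [Alice](https://example.com/alice)
- [Bob](https://example.com/bob)
+ [Carol]
1. [Dave](https://example.com/dave)
"""


def extract(pattern, text):
    """Collect the captured link text for every match, line by line."""
    names = []
    for line in text.splitlines():
        names.extend(pattern.findall(line))
    return names


print(extract(STAR_ONLY, SAMPLE))  # ['Alice']
print(extract(ANY_LIST, SAMPLE))  # ['Alice', 'Bob', 'Carol', 'Dave']
```

Neither pattern is anchored to the start of the line, so the broader one also matches a "-" or "+" that appears before a link in the middle of a line; prefixing it with ^\s* would restrict matches to actual list items.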

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 198f9e3 and 584c1d1.

📒 Files selected for processing (12)
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py (1 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py (1 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py (1 hunks)
  • backend/apps/owasp/models/common.py (2 hunks)
  • backend/apps/owasp/scraper.py (0 hunks)
  • backend/tests/owasp/management/commands/owasp_scrape_chapters_test.py (0 hunks)
  • backend/tests/owasp/management/commands/owasp_scrape_committees_test.py (0 hunks)
  • backend/tests/owasp/management/commands/owasp_scrape_projects_test.py (0 hunks)
  • backend/tests/owasp/models/chapter_test.py (1 hunks)
  • backend/tests/owasp/models/committee_test.py (1 hunks)
  • backend/tests/owasp/models/project_test.py (1 hunks)
  • backend/tests/owasp/scraper_test.py (0 hunks)
💤 Files with no reviewable changes (5)
  • backend/apps/owasp/scraper.py
  • backend/tests/owasp/management/commands/owasp_scrape_committees_test.py
  • backend/tests/owasp/management/commands/owasp_scrape_projects_test.py
  • backend/tests/owasp/scraper_test.py
  • backend/tests/owasp/management/commands/owasp_scrape_chapters_test.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: CodeQL (python)
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (javascript-typescript)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
🔇 Additional comments (4)
backend/tests/owasp/models/committee_test.py (1)

101-116: Good test implementation for the new get_leaders method.

The test correctly mocks the get_repository_file_content function and verifies that the get_leaders method properly extracts leader names from the markdown format. This aligns with the PR's objective of changing the logic for fetching leader data.

backend/tests/owasp/models/chapter_test.py (1)

197-212: Good test implementation with consistent approach.

The test properly validates the extraction of leader names from the markdown format, consistent with the same approach used in the committee_test.py file.

backend/tests/owasp/models/project_test.py (1)

132-147: Good test implementation maintaining consistency across model tests.

This test follows the same pattern as the other model tests, ensuring that the get_leaders method correctly extracts leader names from markdown content. The consistency across all three test files demonstrates a systematic approach to testing this new functionality.

backend/apps/owasp/models/common.py (1)

7-7: Import addition matches the functionality.

Adding the requests import is appropriate for the new functionality that fetches leader data from GitHub and needs to handle request exceptions.

@samyak003 samyak003 requested a review from arkid15r March 5, 2025 21:04
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 584c1d1 and 3c0adbe.

📒 Files selected for processing (1)
  • backend/apps/owasp/models/common.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Run backend tests
  • GitHub Check: CodeQL (python)
  • GitHub Check: CodeQL (javascript-typescript)
  • GitHub Check: Run frontend unit tests
  • GitHub Check: Run frontend e2e tests
🔇 Additional comments (1)
backend/apps/owasp/models/common.py (1)

7-7: LGTM: Required import addition

The requests import is necessary for catching the requests.exceptions.RequestException in the new get_leaders method.

Collaborator

@arkid15r arkid15r left a comment

Looks good, though I haven't run it yet. Let's take care of the structural inconsistency first.

Collaborator

For all entities this is no longer part of the scraping process; it now belongs to the GitHub processing part. We need to structure it accordingly.


        assert scraper.page_tree is None

    def test_get_leaders_no_leaders(self, mock_session):

Collaborator

I believe this test needs to be refactored too based on the new logic and the code location.

            .order_by("-total_contributions")[:TOP_CONTRIBUTORS_LIMIT]
        ]

    def get_leaders(self, repository):

Collaborator

Please add tests for the new method.

Contributor

@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/apps/github/graphql/nodes/release.py (1)

16-16: Add 'url' to the fields tuple in Meta class

The 'url' field is added to the ReleaseNode class, but it's not included in the fields tuple in the Meta class (lines 20-26). For consistency and explicit field declaration, it should be added there as well.

class Meta:
    model = Release
    fields = (
        "author",
        "is_pre_release",
        "name",
        "published_at",
        "tag_name",
+       "url",
    )
frontend/src/components/UserCard.tsx (1)

30-32: Consider enhancing the display logic for user details

Currently, the implementation uses a logical OR (||), so if company is set, location and email are never displayed, even when they might be more relevant to the user. Consider either joining these fields with separators or prioritizing them based on which information is most valuable to users.

-          <p className="mt-1 max-w-[250px] truncate text-sm text-gray-600 dark:text-gray-400 sm:text-base">
-            {company || location || email}
-          </p>
+          <p className="mt-1 max-w-[250px] text-sm text-gray-600 dark:text-gray-400 sm:text-base">
+            {[company, location, email].filter(Boolean).join(' • ')}
+          </p>

Or if you prefer to show just one piece of information but want to prioritize differently:

-          <p className="mt-1 max-w-[250px] truncate text-sm text-gray-600 dark:text-gray-400 sm:text-base">
-            {company || location || email}
-          </p>
+          <p className="mt-1 max-w-[250px] truncate text-sm text-gray-600 dark:text-gray-400 sm:text-base">
+            {email || location || company}
+          </p>
🛑 Comments failed to post (1)
backend/apps/github/models/release.py (1)

53-57: ⚠️ Potential issue

Add a None check for repository to prevent an AttributeError

The url property assumes that self.repository is not None, but the repository field is defined with null=True (lines 36-42), so it can be None. In that case, self.repository.url raises AttributeError. Add a check that returns None when no repository is set.

@property
def url(self):
    """Return release URL."""
+   if not self.repository:
+       return None
    return f"{self.repository.url}/releases/tag/{self.tag_name}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    @property
    def url(self):
        """Return release URL."""
        if not self.repository:
            return None
        return f"{self.repository.url}/releases/tag/{self.tag_name}"
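Without the guard, reading self.repository.url when repository is None raises AttributeError ('NoneType' object has no attribute 'url'). A stand-alone sketch using dataclasses instead of Django models; Repository and Release here are simplified, hypothetical stand-ins, not the real model classes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    url: str


@dataclass
class Release:
    """Plain-Python stand-in for the Django model; not the real Release class."""

    tag_name: str
    repository: Optional[Repository] = None

    @property
    def url(self) -> Optional[str]:
        """Return the release URL, or None when no repository is linked."""
        if not self.repository:
            return None
        return f"{self.repository.url}/releases/tag/{self.tag_name}"
```

Callers such as the GraphQL node then receive None instead of an exception for releases without a repository.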

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/tests/owasp/models/common_test.py (1)

15-35: Well-structured test for the new leader extraction functionality.

The parameterized test provides good coverage for the get_leaders method, testing single leader, multiple leaders, and empty content scenarios. This ensures the reliability of the new leader data extraction approach.

You might consider adding one more test case for unusual formatting, such as:

("* [Leader with (parentheses)](https://example.com)", ["Leader with (parentheses)"])

to ensure the extraction is robust against different leader name formats.
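A sketch of the suggested extra case alongside the scenarios the review mentions (single leader, multiple leaders, empty content), written as a plain loop so it runs without pytest. extract_leaders is a hypothetical helper that mirrors the pattern quoted for get_leaders:

```python
import re
from typing import List

# Pattern quoted in the review of common.py.
LEADER_PATTERN = re.compile(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?")


def extract_leaders(content: str) -> List[str]:
    """Return the link text of every "* [name](url)" entry in content."""
    leaders: List[str] = []
    for line in content.split("\n"):
        leaders.extend(LEADER_PATTERN.findall(line))
    return leaders


CASES = [
    ("* [Alice](https://example.com)", ["Alice"]),
    ("* [Alice](https://a.example)\n* [Bob](https://b.example)", ["Alice", "Bob"]),
    ("", []),
    # Parentheses inside the link text are fine: [^\]]+ only stops at "]".
    ("* [Leader with (parentheses)](https://example.com)", ["Leader with (parentheses)"]),
]

for content, expected in CASES:
    assert extract_leaders(content) == expected, (content, expected)
```

In the actual test module, the same list could feed @pytest.mark.parametrize(("content", "expected"), CASES).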

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1f23fe5 and b8b47a0.

📒 Files selected for processing (11)
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py (0 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py (0 hunks)
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py (0 hunks)
  • backend/apps/owasp/models/chapter.py (1 hunks)
  • backend/apps/owasp/models/committee.py (1 hunks)
  • backend/apps/owasp/models/common.py (4 hunks)
  • backend/apps/owasp/models/project.py (1 hunks)
  • backend/tests/owasp/models/chapter_test.py (2 hunks)
  • backend/tests/owasp/models/committee_test.py (2 hunks)
  • backend/tests/owasp/models/common_test.py (1 hunks)
  • backend/tests/owasp/models/project_test.py (2 hunks)
💤 Files with no reviewable changes (3)
  • backend/apps/owasp/management/commands/owasp_scrape_chapters.py
  • backend/apps/owasp/management/commands/owasp_scrape_committees.py
  • backend/apps/owasp/management/commands/owasp_scrape_projects.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/tests/owasp/models/project_test.py
  • backend/tests/owasp/models/chapter_test.py
  • backend/tests/owasp/models/committee_test.py
🔇 Additional comments (11)
backend/apps/owasp/models/chapter.py (1)

81-81: Looks good! Consistent mapping for leader data.

Adding the "leaders_raw": "leaders" mapping ensures that leader data is properly extracted from the GitHub repository files. This approach aligns with the PR objective of changing how leader data is fetched.

backend/apps/owasp/models/project.py (1)

190-190: Looks good! Consistent mapping for leader data.

The "leaders_raw": "leaders" mapping follows the same pattern implemented in the Chapter model, ensuring a consistent approach to leader data extraction across different entity types.

backend/apps/owasp/models/committee.py (1)

39-39: Looks good! Consistent mapping for leader data.

The addition of "leaders_raw": "leaders" completes the implementation across all three entity models (Chapter, Project, and Committee), providing a unified approach to leader data handling.

backend/apps/owasp/models/common.py (8)

7-7: New requests import added

The requests import is needed to catch requests.exceptions.RequestException in the new get_leaders method.


99-99: Shifted to new leaders fetching approach

This change correctly switches to fetching leaders from the leaders.md file instead of the previous approach, in line with the PR objective of changing how leaders' data is fetched.


131-139: Well-structured URL builder method

The get_leaders_md_raw_url method follows the same pattern as the existing get_index_md_raw_url method, which maintains code consistency. It correctly handles the optional repository parameter and returns the appropriate URL or None.
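A simplified sketch of such a URL builder, using the raw.githubusercontent.com URL format from the original snippet. In the PR it is an instance method on RepositoryBasedEntityModel; the free function and the Repository dataclass here are hypothetical simplifications:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    # Hypothetical minimal stand-in for the GitHub repository model.
    key: str
    default_branch: str


def get_leaders_md_raw_url(repository: Optional[Repository]) -> Optional[str]:
    """Build the raw leaders.md URL, or return None without a repository."""
    if not repository:
        return None
    return (
        "https://raw.githubusercontent.com/OWASP/"
        f"{repository.key}/{repository.default_branch}/leaders.md"
    )
```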


161-173: Repository parameter should be optional for consistency

The get_leaders method should have an optional repository parameter to maintain consistency with get_leaders_md_raw_url and other similar methods.

-def get_leaders(self, repository):
+def get_leaders(self, repository=None):
     """Get leaders from leaders.md file on GitHub."""
+    owasp_repository = repository or self.owasp_repository
+    if not owasp_repository:
+        return []
     try:
         content = get_repository_file_content(
-            self.get_leaders_md_raw_url(repository=repository)
+            self.get_leaders_md_raw_url(repository=owasp_repository)
         )
     except (requests.exceptions.RequestException, ValueError) as e:
         logger.exception(
             "Failed to fetch leaders.md file",
-            extra={"repository": repository.name, "error": str(e)},
+            extra={"repository": owasp_repository.name if owasp_repository else None, "error": str(e)},
         )
         return []
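A plain-Python sketch of the optional-parameter pattern; Entity, Repository and the injected fetch callable are hypothetical stand-ins for the Django models and the fetch/parse logic:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Repository:
    name: str


class Entity:
    """Hypothetical stand-in for a RepositoryBasedEntityModel subclass."""

    def __init__(
        self,
        owasp_repository: Optional[Repository],
        fetch: Callable[[Repository], List[str]],
    ):
        self.owasp_repository = owasp_repository
        self._fetch = fetch  # stands in for fetching and parsing leaders.md

    def get_leaders(self, repository: Optional[Repository] = None) -> List[str]:
        # Prefer an explicitly passed repository, else fall back to the entity's own.
        owasp_repository = repository or self.owasp_repository
        if not owasp_repository:
            # Nothing to fetch from; past this point owasp_repository is never None.
            return []
        return self._fetch(owasp_repository)
```

Because of the early return, owasp_repository can no longer be None inside the except block, so the conditional expression in the suggested extra= argument is redundant; owasp_repository.name would be enough.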

175-187: Reduce debug logging verbosity

The current implementation logs every line being processed, which could lead to excessive log output. Consider reducing the verbosity or adding a conditional flag for this detailed logging.

    if len(content) < small_size:  # Only log full content if it's reasonably small
        logger.debug("Content: %s", content)
    for line in lines:
-        logger.debug("Processing line: %s", line)
+        # Only log lines that match our pattern or conditionally enable verbose logging
+        if re.search(r"\*\s*\[", line):
+            logger.debug("Processing leader line: %s", line)
        # Match both standard Markdown list items with links and variations
        match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line)
        leaders.extend(match)
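A sketch of the reduced-verbosity loop, assuming it is acceptable to pre-filter lines with the cheap r"\*\s*\[" check from the suggestion. It also calls logger.isEnabledFor(logging.DEBUG) once, so the extra regex search is skipped entirely when debug logging is off. parse_leaders and the logger name are hypothetical:

```python
import logging
import re
from typing import List

logger = logging.getLogger("leaders_sketch")

LEADER_LINE = re.compile(r"\*\s*\[")  # cheap pre-check from the suggestion
LEADER_PATTERN = re.compile(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?")


def parse_leaders(content: str) -> List[str]:
    """Extract leaders, logging only lines that look like leader entries."""
    # Evaluate once: avoids the per-line search when DEBUG is disabled.
    debug = logger.isEnabledFor(logging.DEBUG)
    leaders: List[str] = []
    for line in content.split("\n"):
        if debug and LEADER_LINE.search(line):
            logger.debug("Processing leader line: %s", line)
        leaders.extend(LEADER_PATTERN.findall(line))
    return leaders
```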

186-186: Consider enhancing regex pattern robustness

The current regex pattern (\*\s*\[([^\]]+)\](?:\([^)]*\))?) handles "*" list items with optional links, but misses other list styles such as "-" and "+" bullets. Consider extending it to handle more Markdown list variations.

-        match = re.findall(r"\*\s*\[([^\]]+)\](?:\([^)]*\))?", line)
+        # Enhanced pattern to handle more list styles (*, -, +) and optional link formatting
+        match = re.findall(r"[*\-+]\s*\[([^\]]+)\](?:\([^)]*\))?|\b(?:Leader|Leaders?)\s*:\s*([^,]+)(?:,|$)", line)
+        # Extract matched groups and clean up
+        for m in match:
+            if isinstance(m, tuple):
+                leaders.extend([name.strip() for name in m if name.strip()])
+            else:
+                leaders.append(m.strip())

175-192: Add tests for the new method

A previous review requested tests for this new method. Please ensure there is test coverage for both the successful and error scenarios of the get_leaders method.

Run the following script to check if tests have been added for the new method:

#!/bin/bash
# Check if tests for get_leaders method exist
echo "Searching for tests for get_leaders method in the test files..."
rg -i "test.*get_leaders" --type py

163-167: Improve 404 error handling

The current implementation relies on get_repository_file_content which doesn't explicitly check for HTTP status codes. Consider improving error handling for 404 (file not found) and other specific HTTP errors.

Since you're already catching exceptions, you could add more specific information about 404 errors in your logging. If possible, consider updating the get_repository_file_content function to explicitly handle HTTP status codes or return a more specific error for 404s.
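A stand-alone sketch, assuming the fetch were done with urllib.request rather than the project's get_repository_file_content (whose internals are not shown here). urllib.error.HTTPError carries the status in its code attribute, so a missing leaders.md (404) can be logged at a lower severity than other failures. fetch_leaders_md and the opener parameter are hypothetical:

```python
import logging
import urllib.error
import urllib.request
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def fetch_leaders_md(
    url: str, opener: Callable = urllib.request.urlopen
) -> Optional[str]:
    """Return the decoded body, or None if the file is missing or unreachable."""
    try:
        with opener(url, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it must be caught first.
        if e.code == 404:
            # A repository without leaders.md is expected, not an error.
            logger.info("leaders.md not found", extra={"url": url})
        else:
            logger.warning("HTTP %s while fetching leaders.md", e.code, extra={"url": url})
        return None
    except urllib.error.URLError:
        # DNS failures, refused connections and other errors urllib wraps in URLError.
        logger.exception("Failed to fetch leaders.md", extra={"url": url})
        return None
```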

@samyak003 samyak003 requested a review from arkid15r March 12, 2025 18:24
@sonarqubecloud

Collaborator

@arkid15r arkid15r left a comment

LGTM

@arkid15r arkid15r added this pull request to the merge queue Mar 13, 2025
Merged via the queue into OWASP:main with commit 1e27b6a Mar 13, 2025
18 checks passed
shdwcodr pushed a commit to shdwcodr/Nest that referenced this pull request Jun 5, 2025
* Changed the logic of fetching leader's data

* Moved logic to RepositoryBasedEntityModel

* Moved leader's data logic

* Update code

---------

Co-authored-by: Kate Golovanova <[email protected]>
Co-authored-by: Arkadii Yakovets <[email protected]>
Co-authored-by: Arkadii Yakovets <[email protected]>

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move leaders data fetch logic from scraping to github files parsing

3 participants