Skip to content

Conversation

@elzik
Copy link
Owner

@elzik elzik commented May 25, 2025

Summary by CodeRabbit

  • New Features

    • Reddit integration: fetch and present Reddit posts and subreddit content with automatic image selection and fallback branding.
    • HTTP client settings: configurable user-agent and request timeout; improved downloader behavior.
    • Content extractor strategy: composite extractor picks the appropriate handler for a URL.
  • Tests

    • Extensive new unit and integration tests covering Reddit flows, transformers, converters and extractor strategy.
  • Documentation

    • README updated with Reddit and HTTP client examples.
  • Chores

    • Added test data, updated .gitignore, and added analyzer tooling.

elzik added 30 commits May 25, 2025 16:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)

18-22: URI parsing optimization implemented.

The previous review comment about double parsing has been addressed. The current implementation parses each URI once using TryCreate with an out parameter, then filters and extracts hosts.

tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)

12-12: Class name vs filename mismatch.

This issue was previously flagged. If the filename is RedditPostContentExtractorIntegrationTests.cs, the class name should match.

🧹 Nitpick comments (2)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)

160-179: LGTM! Timeout handling is well-implemented.

The timeout mechanism using CancellationTokenSource with proper exception handling is solid. The descriptive TimeoutException message will aid debugging container startup issues.

Optionally, consider making the timeout configurable via an environment variable for flexibility in different CI/CD environments:

-    private const int ContainerStartTimeoutSeconds = 30;
+    private static readonly int ContainerStartTimeoutSeconds = 
+        int.TryParse(Environment.GetEnvironmentVariable("BREEF_CONTAINER_TIMEOUT_SECONDS"), out var timeout) 
+            ? timeout 
+            : 30;
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostClient.cs (1)

7-9: Consider externalizing the version string.

The User-Agent header contains a hardcoded version "1.0.0" which could become stale as the project evolves. Consider extracting this to a configuration option or deriving it from assembly metadata.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9673297 and 2a1c6d5.

📒 Files selected for processing (18)
  • README.md (3 hunks)
  • src/Elzik.Breef.Application/BreefGenerator.cs (0 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRedditPostClient.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostClient.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/Wallabag/TokenResponse.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/Wallabag/WallabagEntry.cs (0 hunks)
  • src/Elzik.Breef.Infrastructure/Wallabag/WallabagEntryCreateRequest.cs (1 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/BreefTestsBase.cs (0 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (3 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/Wallabag/WallabagClientTests.cs (0 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/Wallabag/WallabagOptionsTests.cs (0 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditOptionsTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1 hunks)
💤 Files with no reviewable changes (5)
  • tests/Elzik.Breef.Api.Tests.Functional/BreefTestsBase.cs
  • src/Elzik.Breef.Application/BreefGenerator.cs
  • src/Elzik.Breef.Infrastructure/Wallabag/WallabagEntry.cs
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/Wallabag/WallabagOptionsTests.cs
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/Wallabag/WallabagClientTests.cs
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs
  • README.md
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawRedditPostTransformerTests.cs
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs
🧰 Additional context used
🧬 Code graph analysis (6)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (7)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
tests/Elzik.Breef.Infrastructure.Tests.Integration/TestOutputFakeLogger.cs (1)
  • TestOutputFakeLogger (7-43)
src/Elzik.Breef.Infrastructure/HttpDownloaderOptions.cs (1)
  • HttpDownloaderOptions (5-11)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
  • SubRedditContentExtractor (7-75)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubRedditContentExtractor (7-75)
  • Task (30-40)
  • Task (42-46)
  • Task (48-74)
  • CanHandle (12-28)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditOptionsTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (3)
  • List (107-130)
  • List (132-163)
  • List (165-177)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawRedditPostClient.cs (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRedditPostClient.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • Task (7-11)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPost.cs (1)
  • RawRedditPost (5-7)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRedditPostClient.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RedditPost (8-41)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (1)
  • RedditPost (3-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test report
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
🔇 Additional comments (9)
src/Elzik.Breef.Infrastructure/Wallabag/TokenResponse.cs (1)

1-1: LGTM! Good cleanup.

Removing the unused Refit import is appropriate cleanup. The TokenResponse DTO remains correct.

src/Elzik.Breef.Infrastructure/Wallabag/WallabagEntryCreateRequest.cs (1)

1-1: LGTM! Good cleanup.

Removing the unused Refit import is appropriate cleanup. The WallabagEntryCreateRequest DTO remains correct.

tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (3)

11-11: LGTM! Timeout constant improves clarity.

Adding a named constant for the container startup timeout enhances code readability and maintainability.


83-83: LGTM! Wait strategy update is more precise.

Using UntilInternalTcpPortIsAvailable(8080) provides a clearer and more direct readiness check for the containerized service.


162-166: LGTM! Null checks are appropriate defensive programming.

The null checks for _testContainer are correct defensive programming for a nullable field. While the current logic ensures the container is initialized when !_dockerIsUnavailable, these checks satisfy null safety requirements and guard against future code modifications.

Also applies to: 186-189

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRedditPostClient.cs (1)

1-7: LGTM!

Clean interface definition following async conventions and maintaining single responsibility.

tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)

32-207: LGTM!

Comprehensive integration test coverage with proper skip logic for GitHub Actions, varied URL format testing, and thorough validation of the Reddit post extraction pipeline.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/RedditOptionsTests.cs (1)

1-180: LGTM!

Well-structured unit tests with comprehensive coverage of RedditOptions behavior, including default values, configuration binding, domain extraction, and edge cases. Tests follow AAA pattern and use appropriate assertions.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1)

1-509: LGTM!

Excellent test coverage with comprehensive validation of SubRedditContentExtractor behavior, including URL validation, image extraction with fallbacks, edge cases for invalid/inaccessible images, and proper exception handling. The mocking strategy is clean and the helper method improves test maintainability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/SubredditClientTests.cs (2)

44-44: Hard-coded base URL should use configuration.

Same issue as in the first test - the base URL is hard-coded. Apply the same refactor suggested for line 17.


12-12: Typo in method name: 'SubReddit' should be 'Subreddit'.

The method name contains inconsistent casing: GetNewInSubReddit_ValidSubreddit_ReturnsNewInSubreddit. The first instance uses "SubReddit" (capital R) while the second uses "Subreddit" (lowercase r). The correct casing should be "Subreddit" throughout to match C# naming conventions.

Apply this diff:

-    public async Task GetNewInSubReddit_ValidSubreddit_ReturnsNewInSubreddit()
+    public async Task GetNewInSubreddit_ValidSubreddit_ReturnsNewInSubreddit()
🧹 Nitpick comments (14)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (2)

62-111: Consider documenting brittleness of hardcoded URLs and expected image URLs.

These tests validate the image resolution fallback chain correctly. However, the hardcoded Reddit post URLs (lines 68, 85, 102) and exact image URL assertions (lines 76, 93, 110) are brittle and may break if:

  • Reddit changes image CDN URLs or query parameters
  • Posts or subreddit images are modified or deleted

This is acceptable for integration tests, but consider adding comments explaining why these specific posts were chosen and documenting that these tests may require updates if the external data changes.


141-169: Good error handling coverage.

The tests properly validate error scenarios for invalid URLs and non-existent posts.

Note: Line 168 asserts the exact Refit error message format, which could be brittle if Refit updates its error messages. Consider asserting only the status code or exception type if this becomes problematic in the future.

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRawNewInSubredditTransformer.cs (1)

3-6: Async naming + cancellation on public API.

Rename to TransformAsync and accept CancellationToken to allow upstream cancellation. Non-breaking within this PR scope if propagated to implementer and call sites.

Apply:

-public interface IRawNewInSubredditTransformer
+public interface IRawNewInSubredditTransformer
 {
-    Task<NewInSubreddit> Transform(RawNewInSubreddit rawNewInSubreddit);
+    Task<NewInSubreddit> TransformAsync(RawNewInSubreddit rawNewInSubreddit, CancellationToken cancellationToken = default);
 }
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubredditTransformer.cs (1)

16-26: Harden transform: dedupe IDs, skip nulls, tolerate per‑item failures.

Avoid failing the whole batch if a single post fetch throws; also filter nulls and remove duplicates. Keeps order stable.

Apply:

-        var postIds = rawNewInSubreddit.Data.Children
+        var postIds = rawNewInSubreddit.Data.Children
             .Where(child => child.Data?.Id != null)
-            .Select(child => child.Data!.Id!)
-            .ToList();
+            .Select(child => child.Data!.Id!)
+            .Distinct()
+            .ToList();
 
-        var postTasks = postIds.Select(id => redditPostClient.GetPost(id));
-        var posts = await Task.WhenAll(postTasks);
+        var postTasks = postIds.Select(async id =>
+        {
+            try
+            {
+                return await redditPostClient.GetPost(id);
+            }
+            catch
+            {
+                return null;
+            }
+        });
+        var posts = (await Task.WhenAll(postTasks)).Where(p => p is not null)!;
 
-        newInSubreddit.Posts.AddRange(posts);
+        newInSubreddit.Posts.AddRange(posts!);

If you adopt CancellationToken on the interface, thread it through with redditPostClient support.

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/ISubredditClient.cs (1)

3-6: Follow async naming and support cancellation.

Use GetNewInSubredditAsync and add CancellationToken.

Apply:

-public interface ISubredditClient
+public interface ISubredditClient
 {
-    Task<NewInSubreddit> GetNewInSubreddit(string subRedditName);
+    Task<NewInSubreddit> GetNewInSubredditAsync(string subredditName, CancellationToken cancellationToken = default);
 }
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawSubredditClient.cs (1)

7-13: Expose query params and cancellation on Refit API.

Add raw_json=1 to avoid HTML escaping issues; surface limit for paging; include CancellationToken for resilience.

Apply:

 public interface IRawSubredditClient
 {
-    [Get("/r/{subRedditName}/new.json")]
+    [Get("/r/{subredditName}/new.json?raw_json=1&limit={limit}")]
     [Headers("User-Agent: breef/1.0.0 (https://github.com/elzik/breef)")]
-    Task<RawNewInSubreddit> GetNewInSubreddit(string subRedditName);
+    Task<RawNewInSubreddit> GetNewInSubreddit(string subredditName, int limit = 25, CancellationToken cancellationToken = default);
 
-    [Get("/r/{subRedditName}/about.json")]
+    [Get("/r/{subredditName}/about.json?raw_json=1")]
     [Headers("User-Agent: breef/1.0.0 (https://github.com/elzik/breef)")]
-    Task<AboutSubreddit> GetAboutSubreddit(string subRedditName);
+    Task<AboutSubreddit> GetAboutSubreddit(string subredditName, CancellationToken cancellationToken = default);
 }
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (1)

7-12: Validate input and thread cancellation.

Guard empty subreddit names, and plumb CancellationToken (if you adopt it on interfaces).

Apply:

-    public async Task<NewInSubreddit> GetNewInSubreddit(string subRedditName)
+    public async Task<NewInSubreddit> GetNewInSubreddit(string subRedditName)
     {
-        var rawNewInSubreddit = await rawSubredditClient.GetNewInSubreddit(subRedditName);
+        ArgumentException.ThrowIfNullOrWhiteSpace(subRedditName);
+        var rawNewInSubreddit = await rawSubredditClient.GetNewInSubreddit(subRedditName);
 
         return await transformer.Transform(rawNewInSubreddit);
     }

If you rename to GetNewInSubredditAsync/TransformAsync and add CancellationToken, update calls accordingly.

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubreddit.cs (1)

5-39: Limit raw DTO visibility.

These are transport-layer shapes; consider making them internal to reduce public API surface.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/SubredditClientTests.cs (1)

62-74: Verify interactions with mocks.

Add Received assertions to ensure the pipeline is exercised once and with expected args.

Apply:

         var result = await _client.GetNewInSubreddit(subRedditName);
 
         // Assert
         result.ShouldNotBeNull();
         result.Posts.ShouldNotBeNull();
         result.Posts.Count.ShouldBe(1);
         result.Posts[0].Post.Id.ShouldBe("post1");
         result.Posts[0].Post.Title.ShouldBe("Test Post");
+        await _mockRawClient.Received(1).GetNewInSubreddit(subRedditName);
+        await _mockTransformer.Received(1).Transform(Arg.Is<RawNewInSubreddit>(x =>
+            x.Data?.Children?.Count == 1 &&
+            x.Data.Children[0].Data?.Id == "post1"));
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)

53-79: Make image extraction resilient (network/JSON errors) and prefer Uri constants.

Guard DownloadAsync/JSON parsing, use TryGetProperty for data, keep fallback on any failure, and use Uri.UriScheme* constants.

Apply:

-    private async Task<string> ExtractImageUrlAsync(Uri subRedditBaseUri)
+    private async Task<string> ExtractImageUrlAsync(Uri subRedditBaseUri)
     {
-        Uri subRedditAboutUri = new(subRedditBaseUri, "about.json");
-        var jsonContent = await _httpDownloader.DownloadAsync(subRedditAboutUri.AbsoluteUri);
-
-        string[] imageKeys = ["banner_background_image", "banner_img", "mobile_banner_image", "icon_img", "community_icon"];
-
-        using var doc = JsonDocument.Parse(jsonContent);
-        var data = doc.RootElement.GetProperty("data");
-
-        foreach (var imageKey in imageKeys)
-        {
-            if (data.TryGetProperty(imageKey, out var prop))
-            {
-                var imageUrl = prop.GetString();
-                if (!string.IsNullOrWhiteSpace(imageUrl) && 
-                    Uri.TryCreate(imageUrl, UriKind.Absolute, out var uri) &&
-                    (uri.Scheme == "http" || uri.Scheme == "https") &&
-                    await _httpDownloader.TryGet(imageUrl))
-                {
-                    return imageUrl;
-                }
-            }
-        }
-
-        return _redditOptions.FallbackImageUrl;
+        try
+        {
+            Uri subRedditAboutUri = new(subRedditBaseUri, "about.json");
+            var jsonContent = await _httpDownloader.DownloadAsync(subRedditAboutUri.AbsoluteUri);
+
+            string[] imageKeys = ["banner_background_image", "banner_img", "mobile_banner_image", "icon_img", "community_icon"];
+
+            using var doc = JsonDocument.Parse(jsonContent);
+            if (!doc.RootElement.TryGetProperty("data", out var data) || data.ValueKind != JsonValueKind.Object)
+            {
+                return _redditOptions.FallbackImageUrl;
+            }
+
+            foreach (var imageKey in imageKeys)
+            {
+                if (data.TryGetProperty(imageKey, out var prop))
+                {
+                    var imageUrl = prop.GetString();
+                    if (!string.IsNullOrWhiteSpace(imageUrl) &&
+                        Uri.TryCreate(imageUrl, UriKind.Absolute, out var uri) &&
+                        (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps) &&
+                        await _httpDownloader.TryGet(imageUrl))
+                    {
+                        return imageUrl;
+                    }
+                }
+            }
+        }
+        catch (Exception) // Http/JSON/cancellation—fallback below
+        {
+        }
+        return _redditOptions.FallbackImageUrl;
     }
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1)

11-11: Consider renaming for clarity.

The class name SubRedditExtractorTests could be more explicit as SubRedditContentExtractorTests to clearly indicate the class under test.

-    public class SubRedditExtractorTests
+    public class SubRedditContentExtractorTests
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/SubredditClientTests.cs (3)

17-17: Hard-coded base URL should use configuration.

The test directly instantiates the Refit client with a hard-coded URL. For consistency with the production setup (see Program.cs which uses RedditOptions.DefaultBaseAddress), consider extracting the base URL to a test configuration or constant.

Example refactor:

+    private const string RedditBaseUrl = "https://www.reddit.com/";
+
     [SkippableFact]
     public async Task GetNewInSubreddit_ValidSubreddit_ReturnsNewInSubreddit()
     {
         // Arrange
         Skip.If(IsRunningInGitHubWorkflow, "Skipped because requests to reddit.com from GitHub workflows are " +
             "always blocked meaning this test case always fails. This must be run locally instead.");
-        var client = RestService.For<IRawSubredditClient>("https://www.reddit.com/");
+        var client = RestService.For<IRawSubredditClient>(RedditBaseUrl);

11-36: Consider adding error case tests.

The test only validates the happy path with a valid subreddit name. Consider adding test cases for error scenarios such as:

  • Invalid/non-existent subreddit names
  • Network failures or timeouts
  • Rate limiting responses from Reddit's API

53-57: Image URL assertions may be overly strict.

The test asserts that all image URLs (IconImg, BannerImg, BannerBackgroundImage, MobileBannerImage, CommunityIcon) are not null. However, not all subreddits have all these images configured. This could cause the test to fail for subreddits without complete image metadata.

Consider changing to:

         aboutSubreddit.ShouldNotBeNull();
         aboutSubreddit.Data.ShouldNotBeNull();
         aboutSubreddit.Data.PublicDescription.ShouldNotBeNull();
-        aboutSubreddit.Data.IconImg.ShouldNotBeNull();
-        aboutSubreddit.Data.BannerImg.ShouldNotBeNull();
-        aboutSubreddit.Data.BannerBackgroundImage.ShouldNotBeNull();
-        aboutSubreddit.Data.MobileBannerImage.ShouldNotBeNull();
-        aboutSubreddit.Data.CommunityIcon.ShouldNotBeNull();
+        // Image URLs are optional and may be null or empty for some subreddits
+        // Just verify the properties exist (no assertion on values)

Or verify that at least one image URL is populated if the test specifically requires it.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a1c6d5 and 7615d11.

📒 Files selected for processing (14)
  • src/Elzik.Breef.Api/Program.cs (2 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRawNewInSubredditTransformer.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/ISubredditClient.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawSubredditClient.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubreddit.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubredditTransformer.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/SubredditClientTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawNewInSubredditTransformerTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/SubredditClientTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs
🧰 Additional context used
🧬 Code graph analysis (12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRawNewInSubredditTransformer.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubreddit.cs (1)
  • RawNewInSubreddit (5-9)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubredditTransformer.cs (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRawNewInSubredditTransformer.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubreddit.cs (1)
  • RawNewInSubreddit (5-9)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/ISubredditClient.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/SubredditClientTests.cs (2)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/RedditPostClientTests.cs (1)
  • SkippableFact (12-72)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-134)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (6)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/HttpDownloaderOptions.cs (1)
  • HttpDownloaderOptions (5-11)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubRedditContentExtractor (8-80)
  • Task (34-45)
  • Task (47-51)
  • Task (53-79)
  • CanHandle (16-32)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (2)
  • CanHandle (15-32)
  • Task (34-73)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-134)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/ISubredditImageExtractor.cs (1)
  • Task (5-5)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawNewInSubredditTransformerTests.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubredditTransformer.cs (2)
  • RawNewInSubredditTransformer (3-28)
  • Task (5-27)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/IRawNewInSubredditTransformer.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubreddit.cs (4)
  • RawNewInSubreddit (5-9)
  • RawListingData (11-15)
  • RawChild (17-21)
  • RawPostData (23-39)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (2)
  • RedditPostContent (9-19)
  • RedditComment (21-29)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (2)
  • RedditOptions (5-33)
  • List (24-32)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubRedditContentExtractor (8-80)
  • Task (34-45)
  • Task (47-51)
  • Task (53-79)
  • CanHandle (16-32)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (3)
  • RedditPost (3-7)
  • RedditPostContent (9-19)
  • RedditComment (21-29)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/ISubredditImageExtractor.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/IRawSubredditClient.cs (5)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-134)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/ISubredditClient.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (1)
  • Task (7-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubreddit.cs (1)
  • RawNewInSubreddit (5-9)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/AboutSubreddit.cs (1)
  • AboutSubreddit (5-9)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (3)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-134)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/ISubredditClient.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
src/Elzik.Breef.Api/Program.cs (11)
src/Elzik.Breef.Infrastructure/HttpDownloaderOptions.cs (1)
  • HttpDownloaderOptions (5-11)
src/Elzik.Breef.Infrastructure/HttpDownloader.cs (2)
  • HttpDownloader (7-39)
  • HttpDownloader (11-19)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubredditTransformer.cs (1)
  • RawNewInSubredditTransformer (3-28)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (1)
  • SubredditClient (5-13)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
  • HtmlContentExtractor (6-83)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
  • SubRedditContentExtractor (8-80)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
  • ContentExtractorStrategy (6-36)
  • ContentExtractorStrategy (11-24)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/SubredditClientTests.cs (5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (2)
  • SubredditClient (5-13)
  • Task (7-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/ISubredditClient.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubreddit.cs (4)
  • RawNewInSubreddit (5-9)
  • RawListingData (11-15)
  • RawChild (17-21)
  • RawPostData (23-39)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (1)
  • RedditPostContent (9-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (17)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (5)

1-17: LGTM! Clean test setup.

The imports are appropriate, and the GitHub Actions detection pattern is a good approach for conditionally skipping integration tests in CI environments.


36-60: Excellent test coverage for URL format variations.

The test properly validates the complete extraction pipeline, from URL parsing through to deserialized content structure. Good use of multiple URL formats in the theory data.


113-139: LGTM! Comprehensive structure validation.

The test thoroughly validates that the complete Reddit data structure is properly extracted and serialized, including defensive handling for posts that may have no comments.


171-204: Excellent URL validation coverage!

These tests comprehensively validate the CanHandle method logic, covering:

  • Case insensitivity
  • Trailing slash variations
  • Optional title segment
  • Invalid domains (including old.reddit.com and non-Reddit domains)
  • Invalid path structures

Correctly uses non-skippable [Theory] since these tests don't require network calls.


206-210: LGTM! Clean skip helper.

The helper method provides a clear, reusable way to skip tests in CI with good documentation explaining why Reddit requests fail in GitHub workflows.

src/Elzik.Breef.Api/Program.cs (5)

7-10: LGTM!

The new using statements properly support the Reddit content extractor components registered in the DI container.


64-68: LGTM!

HttpDownloader configuration and registration are properly set up with validation.


70-87: LGTM!

Reddit options configuration and Refit client registrations are properly configured with consistent base address setup.


89-92: LGTM!

Transformers and typed Reddit clients are properly registered with appropriate dependencies.


94-107: LGTM!

Content extractor registrations properly implement the strategy pattern with specific and default extractors. The dual registration of SubRedditContentExtractor (as concrete type and as ISubredditImageExtractor) is intentional and correct for supporting different injection scenarios.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (3)

41-113: Excellent test coverage!

The CanHandle tests comprehensively validate URL parsing, domain matching, and configurable base address support across various scenarios.


115-254: LGTM!

The ExtractAsync tests thoroughly validate image resolution, content serialization, title formatting, and URL parsing with clear test scenarios.


256-487: Excellent comprehensive test coverage!

The GetSubredditImageUrlAsync tests are exceptionally thorough, covering image priority, fallback logic, URL validation, and importantly, security concerns by explicitly rejecting non-HTTP schemes (javascript:, file://, data: URIs). The test coverage includes edge cases and error propagation.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/Client/RawNewInSubredditTransformerTests.cs (3)

18-162: LGTM!

The transformation tests thoroughly validate both successful transformations with complete data and graceful handling of null/empty edge cases.


164-357: LGTM!

The filtering tests properly validate that invalid children (null data or null IDs) are skipped, and the single post test comprehensively validates all fields including nested comments.


359-406: Excellent testing of critical scenarios!

The null argument validation is properly tested with ParamName verification. The concurrent fetching test is particularly well-designed, using TaskCompletionSource to coordinate and verify that posts are fetched concurrently via Task.WhenAll, ensuring both correctness and performance.

tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/SubredditClientTests.cs (1)

7-59: AI summary describes different content than what exists in the code.

The AI summary mentions a test class SubredditClientTests with a test method GetNewInSubreddit_ValidSubredditName_ReturnsTransformedResult(), but the actual code contains a class RawSubredditClientTests with test methods GetNewInSubReddit_ValidSubreddit_ReturnsNewInSubreddit() and GetAboutSubreddit_ValidSubreddit_ReturnsAboutSubreddit(). The summary appears to be describing a different file or an earlier version.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (7)
src/Elzik.Breef.Api/Program.cs (1)

75-87: Refit clients: add User-Agent and a timeout; dedupe config.

Reddit expects a UA, and lack of a timeout can hang request threads. Reuse HttpDownloaderOptions.UserAgent and set a reasonable timeout. Also, both clients share identical setup; factor to a local helper to avoid drift.

Apply this diff:

 builder.Services.AddRefitClient<IRawRedditPostClient>()
     .ConfigureHttpClient((provider, client) =>
     {
         var redditOptions = provider.GetRequiredService<IOptions<RedditOptions>>().Value;
-        client.BaseAddress = new Uri(redditOptions.DefaultBaseAddress);
+        var dlOpts = provider.GetRequiredService<IOptions<HttpDownloaderOptions>>().Value;
+        client.BaseAddress = new Uri(redditOptions.DefaultBaseAddress);
+        client.DefaultRequestHeaders.UserAgent.ParseAdd(dlOpts.UserAgent);
+        client.Timeout = TimeSpan.FromSeconds(30);
     });
 
 builder.Services.AddRefitClient<IRawSubredditClient>()
     .ConfigureHttpClient((provider, client) =>
     {
         var redditOptions = provider.GetRequiredService<IOptions<RedditOptions>>().Value;
-        client.BaseAddress = new Uri(redditOptions.DefaultBaseAddress);
+        var dlOpts = provider.GetRequiredService<IOptions<HttpDownloaderOptions>>().Value;
+        client.BaseAddress = new Uri(redditOptions.DefaultBaseAddress);
+        client.DefaultRequestHeaders.UserAgent.ParseAdd(dlOpts.UserAgent);
+        client.Timeout = TimeSpan.FromSeconds(30);
     });

Optional: extract the lambda into a local method:

static void ConfigureRedditHttpClient(IServiceProvider provider, HttpClient client)
{
    var reddit = provider.GetRequiredService<IOptions<RedditOptions>>().Value;
    var dl = provider.GetRequiredService<IOptions<HttpDownloaderOptions>>().Value;
    client.BaseAddress = new Uri(reddit.DefaultBaseAddress);
    client.DefaultRequestHeaders.UserAgent.ParseAdd(dl.UserAgent);
    client.Timeout = TimeSpan.FromSeconds(30);
}

Then reference with .ConfigureHttpClient(ConfigureRedditHttpClient).

tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (6)

1-9: Add Http namespaces and drop unused using.

You add HttpStatusCode and HttpClient; also remove the unused Xunit.Abstractions import.

 using Elzik.Breef.Infrastructure.ContentExtractors.Reddit;
 using Elzik.Breef.Infrastructure.ContentExtractors.Reddit.Client;
 using Elzik.Breef.Infrastructure.ContentExtractors.Reddit.Client.Raw;
 using Microsoft.Extensions.Options;
+using System.Net;
+using System.Net.Http;
 using Refit;
 using Shouldly;
 using System.Text.Json;
-using Xunit.Abstractions;

16-17: Keep clients as fields for assertions in tests.

Having these as fields lets you compute expected images without hard-coding URLs.

-        private readonly RedditPostContentExtractor _extractor;
+        private readonly RedditPostContentExtractor _extractor;
+        private readonly IRedditPostClient _redditPostClient;
+        private readonly ISubredditImageExtractor _subredditImageExtractor;

18-34: Set explicit User‑Agent and timeout for Refit clients (reduces flakiness/429s).

Reddit often requires a UA; add UA and sane timeouts to Refit HttpClients. Also assign fields for later assertions.

         public RedditPostContentExtractorTests()
         {
-            var rawRedditClient = RestService.For<IRawRedditPostClient>("https://www.reddit.com/");
+            var redditHttpClient = new HttpClient
+            {
+                BaseAddress = new Uri("https://www.reddit.com/"),
+                Timeout = TimeSpan.FromSeconds(15)
+            };
+            redditHttpClient.DefaultRequestHeaders.UserAgent.ParseAdd("breef-integration-tests/1.0 (+https://github.com/elzik/breef)");
+            var rawRedditClient = RestService.For<IRawRedditPostClient>(redditHttpClient);
             var transformer = new RawRedditPostTransformer();
             var redditPostClient = new RedditPostClient(rawRedditClient, transformer);
             
-            var rawSubredditClient = RestService.For<IRawSubredditClient>("https://www.reddit.com/");
+            var subredditHttpClient = new HttpClient
+            {
+                BaseAddress = new Uri("https://www.reddit.com/"),
+                Timeout = TimeSpan.FromSeconds(15)
+            };
+            subredditHttpClient.DefaultRequestHeaders.UserAgent.ParseAdd("breef-integration-tests/1.0 (+https://github.com/elzik/breef)");
+            var rawSubredditClient = RestService.For<IRawSubredditClient>(subredditHttpClient);
             var rawNewInSubredditTransformer = new RawNewInSubredditTransformer(redditPostClient);
             var subredditClient = new SubredditClient(rawSubredditClient, rawNewInSubredditTransformer);
             
             var redditOptions = Options.Create(new RedditOptions());
             var httpDownloaderOptions = Options.Create(new HttpDownloaderOptions { UserAgent = "breef-integration-tests" });
             var httpDownloader = new HttpDownloader(new Microsoft.Extensions.Logging.Abstractions.NullLogger<HttpDownloader>(), httpDownloaderOptions);
             var subredditImageExtractor = new SubredditContentExtractor(subredditClient, httpDownloader, redditOptions);
             
+            _redditPostClient = redditPostClient;
+            _subredditImageExtractor = subredditImageExtractor;
             _extractor = new RedditPostContentExtractor(redditPostClient, subredditImageExtractor, redditOptions);
         }

96-111: Use the configured fallback constant instead of duplicating the URL literal.

Reduces brittleness if the fallback URL changes in options.

             // Act
             var result = await _extractor.ExtractAsync(urlWithNoImage);
 
             // Assert
-            result.ShouldNotBeNull();
-            result.PreviewImageUrl.ShouldNotBeNull();
-            result.PreviewImageUrl.ShouldBe("https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Lockup_Logo.svg");
+            var expected = new RedditOptions().FallbackImageUrl;
+
+            result.ShouldNotBeNull();
+            result.PreviewImageUrl.ShouldNotBeNull();
+            result.PreviewImageUrl.ShouldBe(expected);

156-169: Assert ApiException.StatusCode, not Message.

Error messages vary by Refit/version; StatusCode is stable.

             // Assert
-            ex.Message.ShouldBe("Response status code does not indicate success: 404 (Not Found).");
+            ex.StatusCode.ShouldBe(HttpStatusCode.NotFound);

14-15: Generalize CI skip detection beyond GitHub Actions.

Also honor CI=true to avoid failures on other CI providers.

-        private static bool IsRunningInGitHubWorkflow => Environment.GetEnvironmentVariable("GITHUB_ACTIONS") == "true";
+        private static bool IsRunningInCi =>
+            string.Equals(Environment.GetEnvironmentVariable("GITHUB_ACTIONS"), "true", StringComparison.OrdinalIgnoreCase) ||
+            string.Equals(Environment.GetEnvironmentVariable("CI"), "true", StringComparison.OrdinalIgnoreCase);
@@
-            Skip.If(IsRunningInGitHubWorkflow, reason);
+            Skip.If(IsRunningInCi, reason);

Also applies to: 206-210

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7615d11 and 66cf3de.

📒 Files selected for processing (5)
  • src/Elzik.Breef.Api/Program.cs (2 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/RawSubredditClientTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs
🧰 Additional context used
🧬 Code graph analysis (4)
src/Elzik.Breef.Api/Program.cs (11)
src/Elzik.Breef.Infrastructure/HttpDownloaderOptions.cs (1)
  • HttpDownloaderOptions (5-11)
src/Elzik.Breef.Infrastructure/HttpDownloader.cs (2)
  • HttpDownloader (7-39)
  • HttpDownloader (11-19)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubredditTransformer.cs (1)
  • RawNewInSubredditTransformer (3-28)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (1)
  • SubredditClient (5-13)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
  • HtmlContentExtractor (6-83)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
  • SubredditContentExtractor (8-80)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
  • ContentExtractorStrategy (6-36)
  • ContentExtractorStrategy (11-24)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/Client/RawSubredditClientTests.cs (2)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-134)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (3)
  • Task (34-45)
  • Task (47-51)
  • Task (53-79)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (2)
  • RedditOptions (5-33)
  • List (24-32)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubredditContentExtractor (8-80)
  • Task (34-45)
  • Task (47-51)
  • Task (53-79)
  • CanHandle (16-32)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubredditContentExtractor (8-80)
  • Task (34-45)
  • Task (47-51)
  • Task (53-79)
  • CanHandle (16-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (3)
src/Elzik.Breef.Api/Program.cs (3)

7-10: Imports look good and scoped to the new Reddit pipeline.


70-74: RedditOptions binding/validation is solid.


94-107: Strategy wiring is correct; default extractor last. Add light tests to lock behavior.

Order places specific extractors before the catch‑all HtmlContentExtractor; good. Consider a couple of unit tests asserting selection for:

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (2)

69-84: Hard-coded image URL makes test brittle.

Line 83 asserts against a specific Reddit image URL that may change over time (e.g., CDN rotation, re-encoding, query parameter changes). This makes the test fragile.

The previous review suggested dynamically fetching the expected URL from the API instead of hard-coding it. This approach would make the test resilient to URL changes while still verifying that the extractor correctly retrieves the post image.


86-101: Hard-coded subreddit fallback URL makes test brittle.

Line 100 asserts against a specific subreddit image URL. Subreddit images can change when moderators update banners, icons, or community graphics.

The previous review suggested fetching the expected fallback URL dynamically via ISubredditImageExtractor.GetSubredditImageUrlAsync("bristol") to make the test resilient to subreddit image changes.

🧹 Nitpick comments (2)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (2)

103-118: Consider reading fallback URL from configuration.

Line 117 hard-codes the Reddit fallback URL, which duplicates the default value from RedditOptions.FallbackImageUrl. If the configuration default changes, this test will need updating.

While this assertion is less brittle than the previous two (since it's a configuration value rather than live Reddit data), reading from the configured options would eliminate duplication:

         // Assert
         result.ShouldNotBeNull();
         result.PreviewImageUrl.ShouldNotBeNull();
-        result.PreviewImageUrl.ShouldBe("https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Lockup_Logo.svg");
+        var expectedFallback = new RedditOptions().FallbackImageUrl;
+        result.PreviewImageUrl.ShouldBe(expectedFallback);

213-217: Consider extracting the skip reason to a constant.

The default reason string spans lines 213-214. Extracting it to a class-level constant would improve readability.

     public class RedditPostContentExtractorTests
     {
         private static bool IsRunningInGitHubWorkflow => Environment.GetEnvironmentVariable("GITHUB_ACTIONS") == "true";
+        private const string GitHubWorkflowSkipReason = "Skipped because requests to reddit.com from GitHub workflows " +
+            "are always blocked meaning this test case always fails. This must be run locally instead.";
 
         private readonly RedditPostContentExtractor _extractor;
         
         // ... rest of class ...
         
-        private static void SkipIfInGitHubWorkflow(string reason = "Skipped because requests to reddit.com from GitHub workflows " +
-            "are always blocked meaning this test case always fails. This must be run locally instead.")
+        private static void SkipIfInGitHubWorkflow(string reason = GitHubWorkflowSkipReason)
         {
             Skip.If(IsRunningInGitHubWorkflow, reason);
         }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36edc81 and 0c613f9.

📒 Files selected for processing (8)
  • README.md (3 hunks)
  • src/Elzik.Breef.Api/Program.cs (2 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (2 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/HttpClientOptions.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (3 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
README.md (1)
src/Elzik.Breef.Infrastructure/WebPageDownLoaderOptions.cs (1)
  • WebPageDownLoaderOptions (5-11)
src/Elzik.Breef.Infrastructure/HttpClientOptions.cs (2)
src/Elzik.Breef.Infrastructure/WebPageDownLoaderOptions.cs (1)
  • WebPageDownLoaderOptions (5-11)
tests/Elzik.Breef.Infrastructure.Tests.Integration/WebPageDownLoaderOptionsTests.cs (1)
  • WebPageDownLoaderOptionsTests (7-44)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (2)
  • CanHandle (15-32)
  • Task (34-73)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-141)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/ISubredditImageExtractor.cs (1)
  • Task (5-5)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (2)
  • RedditOptions (5-33)
  • List (24-32)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubredditContentExtractor (8-85)
  • CanHandle (16-32)
  • Task (34-45)
  • Task (47-51)
  • Task (53-84)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPost.cs (3)
  • RedditPost (3-7)
  • RedditPostContent (9-19)
  • RedditComment (21-29)
src/Elzik.Breef.Api/Program.cs (10)
src/Elzik.Breef.Infrastructure/HttpClientOptions.cs (1)
  • HttpClientOptions (5-14)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RawNewInSubredditTransformer.cs (1)
  • RawNewInSubredditTransformer (3-28)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (1)
  • SubredditClient (5-13)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
  • HtmlContentExtractor (6-84)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
  • SubredditContentExtractor (8-85)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
  • ContentExtractorStrategy (6-36)
  • ContentExtractorStrategy (11-24)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (3)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (12)
  • MockHttpMessageHandler (549-581)
  • MockHttpMessageHandler (556-562)
  • Fact (163-179)
  • Fact (181-197)
  • Fact (199-242)
  • Fact (308-324)
  • Fact (326-343)
  • Fact (345-371)
  • Fact (373-399)
  • Fact (401-416)
  • Task (564-580)
  • Task (592-595)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (3)
  • HtmlContentExtractor (6-84)
  • CanHandle (83-83)
  • Task (8-21)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-141)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (4)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-141)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (4)
  • Task (34-45)
  • Task (47-51)
  • Task (53-84)
  • CanHandle (16-32)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (1)
  • Task (67-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
  • Task (28-35)
  • CanHandle (26-26)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (7)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/SubredditClient.cs (1)
  • SubredditClient (5-13)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/HttpClientOptions.cs (1)
  • HttpClientOptions (5-14)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubredditContentExtractor (8-85)
  • Task (34-45)
  • Task (47-51)
  • Task (53-84)
  • CanHandle (16-32)
🪛 markdownlint-cli2 (0.18.1)
README.md

144-144: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (23)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (6)

43-67: LGTM!

Good test coverage of URL format variations (with/without trailing slash, with/without title). The assertions properly validate both the extract structure and the deserialized Reddit post content.


120-146: LGTM!

Thorough validation of the complete Reddit structure. The conditional check for comments (lines 140-145) is appropriate since not all posts have comments.


148-161: LGTM!

Good coverage of invalid URL formats. The test cases appropriately cover malformed URLs, wrong paths, wrong domains, and incorrect segment structures.


163-176: LGTM!

Proper validation of 404 handling for non-existent posts. The assertion on the exception message ensures the correct error is thrown.


178-194: LGTM!

Comprehensive coverage of valid URL patterns, including case-insensitivity (line 183-184) and trailing slash variations.


196-211: LGTM!

Good coverage of invalid URL patterns. The test on line 202 correctly validates that old.reddit.com is rejected, which aligns with the current RedditOptions domain allowlist (www.reddit.com and reddit.com only).

src/Elzik.Breef.Infrastructure/HttpClientOptions.cs (1)

1-14: LGTM! Clean configuration class.

The HttpClientOptions class is well-structured with appropriate data annotations and sensible defaults. The timeout validation ensures positive values, and the browser-like User-Agent string is a standard practice for HTTP clients.

README.md (1)

83-109: Excellent Reddit configuration documentation.

The Reddit configuration section is comprehensive and clearly explains the exact domain matching behavior, which is crucial for users to understand. The examples and warnings about domain variants (reddit.com vs www.reddit.com) will prevent common configuration mistakes.

tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (3)

17-44: Good test migration to IHttpClientFactory.

The test has been properly updated to use IHttpClientFactory with a MockHttpMessageHandler, which is the correct pattern for testing HTTP interactions without actual network calls.


46-58: Appropriate test for fallback extractor behavior.

The CanHandle_AnyString_CanHandle test correctly verifies that HtmlContentExtractor accepts any string, which aligns with its role as the default/fallback extractor in the ContentExtractorStrategy.


65-75: Clean mock implementation.

The MockHttpMessageHandler implementation is simple and effective, using modern C# primary constructor syntax for clean test setup.

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (3)

8-32: Solid URL validation logic.

The CanHandle method correctly validates Reddit subreddit URLs by checking:

  • URI validity
  • Domain matching against configured Reddit domains
  • Path structure (/r/{subreddit})

The domain validation using RedditOptions.AllDomains provides flexible configuration support.


34-45: Clear extraction logic.

The ExtractAsync method properly handles URL normalization (trailing slash), extracts the subreddit name, fetches content, and resolves the image URL. The flow is clean and easy to follow.


47-51: LGTM!

The GetSubredditImageUrlAsync method provides a clean public interface for getting subreddit images, properly constructing the URI and delegating to the shared extraction logic.

src/Elzik.Breef.Api/Program.cs (3)

64-75: Excellent IHttpClientFactory configuration.

The migration from manual HttpClient management to IHttpClientFactory with a named client is the correct approach. The configuration properly sets timeout and user agent from HttpClientOptions, following best practices for HttpClient lifetime management.


77-94: Well-structured Reddit client configuration.

The Reddit options and Refit client registrations are properly configured with validation and consistent base address setup. The pattern is clean and maintainable.


96-114: Clean strategy pattern implementation.

The content extractor registration uses a proper factory pattern to compose the ContentExtractorStrategy with specific extractors (Reddit) and a fallback (HTML). The ordering ensures Reddit URLs are handled by specialized extractors while other URLs fall back to generic HTML extraction.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (4)

19-113: Comprehensive CanHandle test coverage.

The test suite thoroughly validates URL handling including:

  • Various URL formats (with/without trailing slash, case-insensitive)
  • Invalid URLs and paths
  • Custom Reddit instances
  • Domain validation

This coverage ensures the extractor correctly identifies Reddit subreddit URLs.


115-259: Thorough ExtractAsync test coverage.

The tests validate all key aspects of content extraction:

  • Image URL resolution and fallback behavior
  • Title formatting
  • JSON content serialization
  • Subreddit name extraction from various URL formats

261-496: Excellent edge case and security testing.

The GetSubredditImageUrlAsync tests cover critical security scenarios:

  • URL validation (data URIs, FTP, file://, javascript:, mailto:)
  • Invalid URI formats
  • Mixed valid/invalid URL prioritization
  • Exception propagation

This thorough testing helps prevent security issues and ensures robust fallback behavior.


534-596: Clean test infrastructure.

The helper methods and mock handlers are well-designed:

  • CreateJsonWithImageKey simplifies test data creation
  • MockHttpMessageHandler supports both success and failure paths
  • ThrowingMockHttpMessageHandler enables exception testing

This infrastructure makes tests readable and maintainable.

src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (2)

4-21: Excellent refactor to IHttpClientFactory.

The migration from direct IWebPageDownloader dependency to IHttpClientFactory follows .NET best practices:

  • Uses named client ("BreefDownloader") configured centrally
  • Avoids HttpClient lifetime management issues
  • Maintains clean separation of concerns

83-83: Correct implementation for default extractor.

The CanHandle method returning true unconditionally is appropriate for HtmlContentExtractor's role as the default/fallback handler in the ContentExtractorStrategy. This ensures any URL not matched by specialized extractors will be handled.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (3)

69-84: Consider deriving expected image URL from API response.

The hard-coded image URL on line 83 makes this test brittle. If Reddit modifies its CDN URLs, changes image processing parameters, or the image is removed, the test will fail.

For improved resilience, consider fetching the post via the client and asserting against the actual ImageUrl property rather than a fixed string:

         // Act
         var result = await _extractor.ExtractAsync(urlWithKnownGoodImage);

         // Assert
+        var uri = new Uri(urlWithKnownGoodImage);
+        var postId = uri.AbsolutePath.Trim('/').Split('/')[3];
+        var expectedPost = await redditPostClient.GetPost(postId);
+        
         result.ShouldNotBeNull();
         result.PreviewImageUrl.ShouldNotBeNull();
-        result.PreviewImageUrl.ShouldBe("https://preview.redd.it/olmpl5vmp3tf1.jpeg?auto=webp&s=1cb106a6fab1ddd48bcf8e9afdd2a06ca22d46ba");
+        result.PreviewImageUrl.ShouldBe(expectedPost.Post.ImageUrl);

Note: You'll need access to redditPostClient (currently local in the constructor). Consider promoting it to a field.


86-101: Consider deriving expected subreddit image from extractor.

The hard-coded subreddit image URL on line 100 creates the same brittleness concern. Subreddit icons and banners frequently change, which would break this assertion.

A previous review suggested resolving the expected URL via subredditImageExtractor at runtime:

         // Act
         var result = await _extractor.ExtractAsync(urlWithNoImage);

         // Assert
+        var subredditName = new Uri(urlWithNoImage).AbsolutePath.Trim('/').Split('/')[1];
+        var expectedImageUrl = await subredditImageExtractor.GetSubredditImageUrlAsync(subredditName);
+        
         result.ShouldNotBeNull();
         result.PreviewImageUrl.ShouldNotBeNull();
-        result.PreviewImageUrl.ShouldBe("https://b.thumbs.redditmedia.com/fMCtUDLMEEt1SrDtRyg1v1xiXVoXmP_3dxScj1kgzoE.png");
+        result.PreviewImageUrl.ShouldBe(expectedImageUrl);

Note: You'll need access to subredditImageExtractor (currently local in the constructor). Consider promoting it to a field.


163-176: Consider asserting on exception properties rather than message text.

The hard-coded exception message on line 175 couples this test to Refit's exact message format. If Refit changes its error message template, this test will break.

For improved resilience, consider asserting on the status code or exception type instead:

         // Act
-        var ex = await Should.ThrowAsync<ApiException>(() => _extractor.ExtractAsync(url));
+        var ex = await Should.ThrowAsync<ApiException>(() => _extractor.ExtractAsync(url));

         // Assert
-        ex.Message.ShouldBe("Response status code does not indicate success: 404 (Not Found).");
+        ex.StatusCode.ShouldBe(System.Net.HttpStatusCode.NotFound);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10b12c3 and d9bc281.

📒 Files selected for processing (1)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (6)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
  • RedditPostContentExtractor (8-74)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/Raw/RawRedditPostTransformer.cs (1)
  • RawRedditPostTransformer (6-178)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • RedditPostClient (5-12)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/HttpClientOptions.cs (1)
  • HttpClientOptions (5-14)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubredditContentExtractor (8-85)
  • Task (34-45)
  • Task (47-51)
  • Task (53-84)
  • CanHandle (16-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (10)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (10)

1-17: LGTM!

The class structure is well-organized with appropriate usings, the sealed keyword prevents unintended inheritance, and IDisposable is properly implemented for resource cleanup.


19-41: LGTM!

The constructor properly wires up real Reddit clients with a mocked IHttpClientFactory. The HttpClient is now stored in a field and disposed in the Dispose() method, correctly addressing the resource leak concern from previous reviews.


43-67: LGTM!

Good coverage of URL format variations with comprehensive assertions validating both the Extract result and the deserialized RedditPost structure.


103-118: LGTM!

This test correctly validates the global fallback scenario by asserting against the default RedditOptions.FallbackImageUrl. This is appropriate as it's testing configuration-driven behavior.


120-146: LGTM!

Comprehensive validation of the complete Reddit data structure, with appropriate conditional checks for optional elements like comments.


148-161: LGTM!

Solid negative test coverage with various invalid URL patterns, ensuring proper exception handling.


178-194: LGTM!

Excellent coverage of valid URL patterns, including case-insensitivity, optional title segments, and trailing slashes.


196-211: LGTM!

Comprehensive negative test coverage including missing segments, wrong path structures, extra segments, and unsupported domains.


213-217: LGTM!

Clean helper method with an informative default message explaining why these tests must run locally.


219-222: LGTM!

Proper implementation of IDisposable with safe disposal of the HttpClient resource.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (3)

34-45: Consider adding input validation for consistency.

Unlike RedditPostContentExtractor.ExtractAsync (which validates the URL format and domain before processing), this method assumes CanHandle has already been called. While the strategy pattern likely ensures this, adding validation would make the method more robust and provide clearer error messages if called directly.

Consider adding validation similar to RedditPostContentExtractor:

 public async Task<Extract> ExtractAsync(string webPageUrl)
 {
+    if (!Uri.TryCreate(webPageUrl, UriKind.Absolute, out Uri? webPageUri))
+        throw new InvalidOperationException($"Invalid URL format: '{webPageUrl}'. " +
+            $"URL must be a valid absolute URI.");
+
+    var requestDomain = webPageUri.Host;
+    
+    if (!_redditOptions.AllDomains.Any(allowedDomain => 
+        requestDomain.Equals(allowedDomain, StringComparison.OrdinalIgnoreCase)))
+    {
+        var supportedDomains = string.Join(", ", _redditOptions.AllDomains);
+        throw new InvalidOperationException($"Unsupported domain: '{requestDomain}'. " +
+            $"Supported domains: {supportedDomains}");
+    }
+
     var webPageUri = new Uri(webPageUrl.EndsWith(UrlPathSeparator) ? webPageUrl : webPageUrl + UrlPathSeparator, UriKind.Absolute);
     var webPageParts = webPageUri.AbsolutePath.Trim(UrlPathSeparator).Split(UrlPathSeparator);
+    
+    if (!(webPageParts.Length == 2 && 
+        webPageParts[0].Equals("r", StringComparison.OrdinalIgnoreCase)))
+    {
+        throw new InvalidOperationException($"Unsupported Reddit URL format: '{webPageUrl}'. " +
+            $"Expected format: 'https://[reddit-domain]/r/[subreddit]'.");
+    }
+    
     var subredditName = webPageParts[^1];

47-51: Add input validation for the subredditName parameter.

The method doesn't validate the subredditName parameter before using it to construct a URI. Invalid input (null, empty, or containing invalid characters) will cause unclear exceptions.

 public async Task<string> GetSubredditImageUrlAsync(string subredditName)
 {
+    if (string.IsNullOrWhiteSpace(subredditName))
+        throw new ArgumentException("Subreddit name cannot be null or whitespace.", nameof(subredditName));
+    
     var subRedditBaseUri = new Uri($"{_redditOptions.DefaultBaseAddress}/r/{subredditName}/");
     return await ExtractImageUrlAsync(subRedditBaseUri);
 }

53-84: Consider reusing the HttpClient instance.

The method creates an HttpClient instance twice (lines 56 and 73) with the same name. While IHttpClientFactory manages the underlying handlers efficiently, creating the client once and reusing it would be cleaner and avoid unnecessary allocations.

 private async Task<string> ExtractImageUrlAsync(Uri subRedditBaseUri)
 {
     Uri subRedditAboutUri = new(subRedditBaseUri, "about.json");
     var httpClient = _httpClientFactory.CreateClient("BreefDownloader");
     var jsonContent = await httpClient.GetStringAsync(subRedditAboutUri.AbsoluteUri);

     string[] imageKeys = ["banner_background_image", "banner_img", "mobile_banner_image", "icon_img", "community_icon"];

     using var doc = JsonDocument.Parse(jsonContent);
     var data = doc.RootElement.GetProperty("data");

     foreach (var imageKey in imageKeys)
     {
         if (data.TryGetProperty(imageKey, out var prop))
         {
             var imageUrl = prop.GetString();
             if (!string.IsNullOrWhiteSpace(imageUrl) && 
                 Uri.TryCreate(imageUrl, UriKind.Absolute, out var uri) &&
                 (uri.Scheme == "http" || uri.Scheme == "https"))
             {
-                var client = _httpClientFactory.CreateClient("BreefDownloader");
-                using var response = await client.GetAsync(imageUrl);
+                using var response = await httpClient.GetAsync(imageUrl);
                 if (response.IsSuccessStatusCode)
                 {
                     return imageUrl;
                 }
             }
         }
     }

     return _redditOptions.FallbackImageUrl;
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9bc281 and 10702f7.

📒 Files selected for processing (1)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (2)
  • CanHandle (15-32)
  • Task (34-73)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-141)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/RedditPostClient.cs (1)
  • Task (7-11)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/ISubredditImageExtractor.cs (1)
  • Task (5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (2)

8-14: LGTM - Clean primary constructor usage.

The primary constructor syntax is well-applied, and the selective field storage is appropriate since subredditClient is only referenced once.


16-32: LGTM - Robust URL validation.

The validation logic correctly identifies subreddit URLs with appropriate domain and path checks.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10702f7 and 49ae929.

📒 Files selected for processing (1)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (2)
  • CanHandle (15-32)
  • Task (34-73)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-141)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/ISubredditImageExtractor.cs (1)
  • Task (5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)

@sonarqubecloud
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (2)

537-538: Minor: Redundant mock setup.

This mock configuration duplicates the setup in the constructor (line 22-23). While harmless, it's unnecessary since the constructor already configures this default behavior.

You can remove these lines:

-            _mockSubredditClient.GetNewInSubreddit("subreddit")
-               .Returns(new NewInSubreddit { Posts = new List<RedditPost>() });
-

285-306: Weak test assertion noted.

The comment at lines 304-305 acknowledges that this test cannot verify the exact URL called due to MockHttpMessageHandler limitations. While the test is valid (it ensures no exception is thrown), it provides limited verification of the actual behavior.

Consider enhancing MockHttpMessageHandler to capture the last requested URL for verification, or use a more sophisticated HTTP mocking library that provides request inspection capabilities.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49ae929 and 245cc26.

📒 Files selected for processing (1)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (2)
  • RedditOptions (5-33)
  • List (24-32)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
  • SubredditContentExtractor (8-85)
  • CanHandle (16-32)
  • Task (34-45)
  • Task (47-51)
  • Task (53-84)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/Client/NewInSubreddit.cs (1)
  • NewInSubreddit (3-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (5)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/Reddit/SubRedditExtractorTests.cs (5)

19-39: Excellent test setup!

The constructor properly initializes all mocks with sensible defaults. The use of MockHttpMessageHandler to simulate HTTP responses is a clean approach that avoids external dependencies.


41-113: Comprehensive URL validation coverage!

The CanHandle tests thoroughly validate URL parsing across various scenarios including case-insensitivity, trailing slashes, custom Reddit instances, and invalid formats. The parameterized tests using Theory and InlineData are well-chosen.


115-259: Thorough ExtractAsync test coverage!

These tests comprehensively validate the extraction workflow including image URL resolution, fallback behavior, content serialization, and subreddit name parsing. The test at lines 199-242 that verifies the serialize/deserialize roundtrip is particularly good practice.


261-496: Outstanding security-aware test coverage!

The GetSubredditImageUrlAsync tests are exceptionally thorough, particularly the security-focused scenarios testing unsuitable URL schemes (lines 418-428: javascript:, data:, file:, ftp:, mailto:). This demonstrates defense-in-depth thinking. The tests also properly validate image priority order, fallback chains, and error propagation.


551-613: Well-designed test infrastructure!

The helper methods and custom HTTP message handlers provide clean, reusable test infrastructure. The MockHttpMessageHandler with conditional response support (lines 566-598) elegantly handles both success and failure scenarios for specific URLs, and the ThrowingMockHttpMessageHandler (lines 600-613) enables clean exception propagation testing.

@elzik elzik merged commit b2e9ecf into main Oct 24, 2025
14 checks passed
@elzik elzik deleted the summarise-sub-reddit branch October 24, 2025 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants