Skip to content

Conversation

@elzik
Copy link
Owner

@elzik elzik commented Oct 29, 2025

Summary by CodeRabbit

  • New Features

    • Summaries now follow content-type-specific instruction templates for clearer, more consistent results.
    • Extracted content is tagged with a content type to drive tailored summarisation.
  • Changes

    • Summary configuration moved from numeric options to file-based instruction guidelines per content type.
    • Sample request examples added for common content types and health checks.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 29, 2025

Walkthrough

The PR refactors extraction to derive ExtractType from extractor class names, introduces file-backed, type-specific summarisation instructions, updates the summariser to accept runtime instructions, and threads those instructions through BreefGenerator to produce summaries; multiple extractors and tests are adapted to the new contracts.

Changes

Cohort / File(s) Summary
Domain: Extract & Instruction API
src/Elzik.Breef.Domain/Extract.cs, src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs
Adds ExtractType to Extract and a new IContentSummarisationInstructionProvider interface for retrieving instructions by extract type.
Extractor Base & Intermediate Type
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs, src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs
New abstract ContentExtractorBase enforces naming convention and converts UntypedExtractExtract with derived type; adds UntypedExtract record and WithType helper.
Extractor Implementations & Strategy
src/Elzik.Breef.Infrastructure/ContentExtractors/*.cs, src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/*, src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs
Extractors now inherit ContentExtractorBase, implement CreateUntypedExtractAsync, return UntypedExtract; strategy captures extract into local to log ExtractType.
Instruction Provider & Files
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs, src/Elzik.Breef.Infrastructure/SummarisationInstructions/*.md
New file-based provider loads/validates per-type .md instruction files (HtmlContent, RedditPostContent, SubredditContent).
Summariser API & Implementation
src/Elzik.Breef.Domain/IContentSummariser.cs, src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs
SummariseAsync signature extended to accept instructions; AiContentSummariser removes options dependency and initializes chat context from provided instructions.
Application Integration
src/Elzik.Breef.Application/BreefGenerator.cs, src/Elzik.Breef.Api/Program.cs
BreefGenerator now depends on IContentSummarisationInstructionProvider and fetches instructions by ExtractType; Program registers FileBasedContentSummarisationInstructionProvider and removes options binding.
Project / Build Files
src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj, tests/*/*.csproj
Adds copy/publish rules to include SummarisationInstructions/*.md in build/test outputs; test csproj entries updated accordingly.
Tests: Extractors & Provider
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/*, tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs, tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs
New unit tests for ContentExtractorBase naming and ExtractType derivation; integration tests for instruction provider file loading/validation; extractor tests assert ExtractType.
Tests: Summariser Updates
tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs, tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs
Tests updated to construct AiContentSummariser without options and pass instructions to SummariseAsync.
Health Tests Refactor
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs, tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs, tests/Elzik.Breef.Api.Tests.Functional/HealthTestsNative.cs
Replaces previous health test with a base class and separate Docker/native implementations; Docker test adjusts startup/disposal behavior.
Removed / Cleaned
src/Elzik.Breef.Infrastructure/AI/AiContentSummariserOptions.cs, tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserOptionsTests.cs, src/Elzik.Breef.Api/Elzik.Breef.Api.http, tests/Elzik.Breef.Api.Tests.Functional/HealthEndpointTests.cs
Deletes options class and related tests; removes outdated HTTP contract file and an earlier health test.
Samples & Solution
tests/SampleRequests/Elzik.Breef.Api.http, Elzik.Breef.sln, README.md
Adds sample requests file and solution item; updates README to remove AiContentSummariserOptions docs.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BreefGenerator
    participant Strategy as ContentExtractorStrategy
    participant Extractor as ContentExtractorBase
    participant InstructionProvider
    participant Summariser

    Client->>BreefGenerator: GenerateAsync(url)
    BreefGenerator->>Strategy: ExtractAsync(url)
    Strategy->>Extractor: ExtractAsync(url)
    Extractor->>Extractor: CreateUntypedExtractAsync(url)
    Extractor->>Extractor: GetExtractType() (class name -> type)
    Extractor->>Extractor: WithType(extractType)
    Extractor-->>Strategy: Extract{Title, Content, ExtractType}
    Strategy-->>BreefGenerator: Extract

    BreefGenerator->>InstructionProvider: GetInstructions(extract.ExtractType)
    InstructionProvider-->>BreefGenerator: instructions (loaded from .md)

    BreefGenerator->>Summariser: SummariseAsync(extract.Content, instructions)
    Summariser-->>BreefGenerator: summary
    BreefGenerator-->>Client: Breef{summary,...}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • ContentExtractorBase naming-suffix logic and null/error messages in extraction flow
    • FileBasedContentSummarisationInstructionProvider file I/O, validation, and case-insensitivity behavior
    • All call sites updated for the new Extract constructor (ExtractType) and IContentSummariser.SummariseAsync signature
    • AiContentSummariser change to ensure instructions fully replace previous options-driven prompt semantics

Possibly related PRs

  • Summarise Subreddits #71 — Overlaps extractor subsystem refactor touching IContentExtractor, ContentExtractorStrategy, and extractor implementations; likely directly related to the extractor-type changes in this PR.

Poem

🐰 From burrowed files I fetched the rhyme,
Class names whisper types in tidy time.
Markdown instructions, crisp and clear,
Guide the summariser’s listening ear.
Hoppity hop—content trimmed, concise, sublime! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "Add Per-extractor Instructions" directly and clearly summarizes the main change in the changeset. The core objective is to introduce a new mechanism for loading and providing summarization instructions on a per-extractor-type basis, which is evidenced by: the addition of IContentSummarisationInstructionProvider and FileBasedContentSummarisationInstructionProvider to load instructions from markdown files keyed by extract type; the creation of markdown instruction files for each supported content type (HtmlContent, RedditPostContent, SubredditContent); the refactoring of ContentExtractorBase to automatically determine extract type from class name; and the integration of these instructions into the BreefGenerator and IContentSummariser pipeline. The title accurately captures this primary objective with concise and specific language.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch per-extractor-instructions

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ceb78c and caebaef.

📒 Files selected for processing (1)
  • tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs (1)

34-43: Fix the mocked method name to match the implementation.

The mock is configured for GetChatMessageContentsAsync (plural), but the implementation calls GetChatMessageContentAsync (singular). This mismatch will cause all tests to fail because the mock won't intercept the actual call.

Apply this diff to fix the method name:

-        _ = _chatCompletionService.GetChatMessageContentsAsync(
+        _ = _chatCompletionService.GetChatMessageContentAsync(
             Arg.Is<ChatHistory>(ch => ch.Any(
                 m => m.Content != null && 
                 m.Content.EndsWith(_testContent) && 
                 m.Role == AuthorRole.User)),
             Arg.Any<PromptExecutionSettings>(),
             Arg.Any<Kernel>(),
             Arg.Any<CancellationToken>())
-            .Returns(Task.FromResult<IReadOnlyList<ChatMessageContent>>(
-                [_testSummaryResult]));
+            .Returns(_testSummaryResult);
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)

3-3: Remove unused import.

The Microsoft.Extensions.Options namespace is no longer used after removing the options-based configuration.

Apply this diff:

 using Elzik.Breef.Domain;
 using Microsoft.Extensions.Logging;
-using Microsoft.Extensions.Options;
 using Microsoft.SemanticKernel.ChatCompletion;
🧹 Nitpick comments (8)
tests/Elzik.Breef.Api.Tests.Integration/Elzik.Breef.Api.Tests.Integration.csproj (1)

42-47: Include instructions on publish as well

Add CopyToPublishDirectory to keep instructions in published test artifacts.

   <ItemGroup>
     <None Include="..\..\src\Elzik.Breef.Infrastructure\SummarisationInstructions\**\*.md" 
 		  Link="SummarisationInstructions\%(RecursiveDir)%(Filename)%(Extension)">
       <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
+      <CopyToPublishDirectory>PreserveNewest</CopyToPublishDirectory>
   </None>
   </ItemGroup>
tests/Elzik.Breef.Infrastructure.Tests.Integration/Elzik.Breef.Infrastructure.Tests.Integration.csproj (1)

56-60: Publish instructions with test artifacts

Mirror the output behavior during publish.

   <ItemGroup>
     <None Include="..\..\src\Elzik.Breef.Infrastructure\SummarisationInstructions\**\*.md" Link="SummarisationInstructions\%(RecursiveDir)%(Filename)%(Extension)">
       <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
+      <CopyToPublishDirectory>PreserveNewest</CopyToPublishDirectory>
     </None>
   </ItemGroup>
tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs (3)

28-32: Environment variable prefix: use a conventional uppercase prefix

Environment variables are typically uppercase on Linux/macOS. Using "breef_" can cause misses.

-          .AddEnvironmentVariables("breef_")
+          .AddEnvironmentVariables("BREEF_")

49-55: Assert instruction compliance, not just length

Strengthen the test to validate max 200 words and no code blocks per HtmlContent.md.

-      _testOutputHelper.WriteLine(summary);
-      summary.Length.ShouldBeGreaterThan(100, "because this is the minimum acceptable summary length");
+      _testOutputHelper.WriteLine(summary);
+      summary.ShouldNotBeNullOrWhiteSpace();
+      var wordCount = summary.Split(new[]{' ', '\n', '\t'}, StringSplitOptions.RemoveEmptyEntries).Length;
+      wordCount.ShouldBeLessThanOrEqualTo(200, "instructions cap summaries at 200 words");
+      summary.ShouldNotContain("```", "instructions disallow code blocks");

21-25: Make the integration test skippable when AI config is absent

Avoid hard failures on developer machines/CI without credentials by using SkippableTheory.

-    [Theory]
+    [SkippableTheory]
     [InlineData("TestHtmlPage-ExpectedContent.txt")]
     [InlineData("BbcNewsPage-ExpectedContent.txt")]
     public async Task Summarise_WithValidContent_ReturnsSummary(string testExtractedContentFile)
     {
-        // Arrange
+        // Arrange
+        Skip.If(string.IsNullOrWhiteSpace(Environment.GetEnvironmentVariable("BREEF_AISERVICE__APIKEY")),
+            "AI credentials not configured: BREEF_AISERVICE__*");
src/Elzik.Breef.Domain/IContentSummariser.cs (1)

5-5: Add CancellationToken to the async contract

Expose cancellation in the public API to propagate request/HTTP timeouts and avoid hung calls. Updates required at 7 locations:

  • Interface: src/Elzik.Breef.Domain/IContentSummariser.cs (line 5)
  • Implementation: src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (line 12)
  • Production call: src/Elzik.Breef.Application/BreefGenerator.cs (line 17)
  • Unit test calls: tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs (lines 50, 63, 78)
  • Integration test call: tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs (line 50)
-        Task<string> SummariseAsync(string content, string instructions);
+        Task<string> SummariseAsync(string content, string instructions, CancellationToken cancellationToken = default);
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (2)

15-19: Reconsider the hardcoded formatting instructions.

With the introduction of per-extractor instructions from external markdown files, the hardcoded formattingInstructions creates a dual source of truth for summarization behavior. This reduces the flexibility that external instructions are meant to provide.

Consider either:

  1. Moving the HTML formatting requirement into the external instruction files (e.g., HtmlContent.md)
  2. Removing the hardcoded formatting if it should be controlled by the external instructions

Are the hardcoded formatting instructions intentionally meant to supplement the external instructions, or should all formatting guidance be controlled by the external instruction files?


12-14: Consider validating the content parameter as well.

While instructions is properly validated, the content parameter is not. Empty or whitespace content would result in a meaningless summarization request.

Apply this diff to add content validation:

 public async Task<string> SummariseAsync(string content, string instructions)
 {
+    ArgumentNullException.ThrowIfNullOrWhiteSpace(content);
     ArgumentNullException.ThrowIfNullOrWhiteSpace(instructions);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1aeabe9 and aa57029.

📒 Files selected for processing (9)
  • src/Elzik.Breef.Application/BreefGenerator.cs (1 hunks)
  • src/Elzik.Breef.Domain/IContentSummariser.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj (1 hunks)
  • src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1 hunks)
  • tests/Elzik.Breef.Api.Tests.Integration/Elzik.Breef.Api.Tests.Integration.csproj (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs (2 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/Elzik.Breef.Infrastructure.Tests.Integration.csproj (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariserOptions.cs (1)
  • AiContentSummariserOptions (5-12)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs (1)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)
  • AiContentSummariser (8-40)
tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs (3)
src/Elzik.Breef.Infrastructure/AI/AiServiceOptions.cs (1)
  • AiServiceOptions (5-21)
tests/Elzik.Breef.Infrastructure.Tests.Integration/TestOutputLoggerProvider.cs (1)
  • TestOutputLoggerProvider (6-19)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)
  • AiContentSummariser (8-40)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (3)
src/Elzik.Breef.Application/BreefGenerator.cs (1)
  • Task (10-24)
src/Elzik.Breef.Domain/IContentSummariser.cs (1)
  • Task (5-5)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-144)
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md

2-2: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


3-3: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


4-4: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


5-5: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


6-6: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


7-7: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


8-8: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
  • GitHub Check: build-ubuntu

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1)

6-8: Optional: Consider varying sentence structure to reduce repetition.

Lines 6–8 consecutively begin with "Do not", which LanguageTool flags as repetitive. While this consistency is intentional and aids clarity, you could reword some items for stylistic variety if preferred:

-  - Do not include a link to the original article.
-  - Do not include the title in the response.
-  - Do not include any metadata in the response.
-  - Do not include a code block in the response.
+  - Exclude links to the original article.
+  - Omit the title from the response.
+  - Avoid including any metadata.
+  - Do not include a code block in the response.

This is a minor style suggestion and not a blocker.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa57029 and bd25ca5.

📒 Files selected for processing (1)
  • src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ot include the title in the response. - Do not include any metadata in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...include any metadata in the response. - Do not include a code block in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1)

1-8: Good work addressing the typos and correcting the grammar.

The changes from the previous review have been applied: "then200" is now correctly "than 200" (line 3) and the phrasing is improved. The instructions are clear and prescriptive.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)

13-26: Make preview image URL absolute and consider srcset/data- sources*

Largest image selection ignores srcset and data-* attributes and may return a relative URL. Resolve to absolute against webPageUrl and consider common lazy-load attributes.

Apply this focused refactor:

-        var largestImageUrl = GetLargestImageUrl(htmlDocument);
+        var largestImageUrl = GetPreviewImageUrl(htmlDocument, webPageUrl);
@@
-    private static string? GetLargestImageUrl(HtmlDocument htmlDocument)
+    private static string? GetPreviewImageUrl(HtmlDocument htmlDocument, string baseUrl)
     {
         var imageNodes = htmlDocument.DocumentNode.SelectNodes("//img");
         if (imageNodes == null || imageNodes.Count == 0)
         {
             return null;
         }
-        var imageNodesSortedBySize = imageNodes
-            .Select(node => new
+        static string PickUrl(HtmlAgilityPack.HtmlNode n)
+        {
+            // Prefer explicit src; fall back to common lazy-load attributes; then srcset first candidate.
+            var src = n.GetAttributeValue("src", null)
+                   ?? n.GetAttributeValue("data-src", null)
+                   ?? n.GetAttributeValue("data-lazy-src", null)
+                   ?? n.GetAttributeValue("data-original", null);
+            if (string.IsNullOrWhiteSpace(src))
+            {
+                var srcset = n.GetAttributeValue("srcset", null);
+                if (!string.IsNullOrWhiteSpace(srcset))
+                {
+                    // take the first candidate
+                    src = srcset.Split(',')[0].Trim().Split(' ')[0];
+                }
+            }
+            return src ?? string.Empty;
+        }
+
+        static bool TryMakeAbsolute(string baseUrl, string imageUrl, out string absolute)
+        {
+            absolute = string.Empty;
+            if (string.IsNullOrWhiteSpace(imageUrl)) return false;
+            if (imageUrl.StartsWith("data:", StringComparison.OrdinalIgnoreCase) ||
+                imageUrl.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
+                return false;
+            if (Uri.TryCreate(imageUrl, UriKind.Absolute, out var abs))
+            {
+                absolute = abs.AbsoluteUri; return true;
+            }
+            if (Uri.TryCreate(baseUrl, UriKind.Absolute, out var baseUri) &&
+                Uri.TryCreate(baseUri, imageUrl, out var resolved))
+            {
+                absolute = resolved.AbsoluteUri; return true;
+            }
+            return false;
+        }
+
+        var imageNodesSortedBySize = imageNodes
+            .Select(node =>
             {
-                Node = node,
-                Width = int.TryParse(node.GetAttributeValue("width", "0"), out var width) ? width : 0,
-                Height = int.TryParse(node.GetAttributeValue("height", "0"), out var height) ? height : 0,
-                ImageUrl = node.GetAttributeValue("src", string.Empty)
-            })
-            .Where(n => !string.IsNullOrWhiteSpace(n.ImageUrl))
+                var width = int.TryParse(node.GetAttributeValue("width", "0"), out var w) ? w : 0;
+                var height = int.TryParse(node.GetAttributeValue("height", "0"), out var h) ? h : 0;
+                var candidate = PickUrl(node);
+                return new { Width = width, Height = height, CandidateUrl = candidate };
+            })
+            .Select(n =>
+            {
+                var ok = TryMakeAbsolute(baseUrl, n.CandidateUrl, out var abs);
+                return new { n.Width, n.Height, ImageUrl = ok ? abs : string.Empty };
+            })
+            .Where(n => !string.IsNullOrWhiteSpace(n.ImageUrl))
             .OrderByDescending(img => img.Width * img.Height);
 
         return imageNodesSortedBySize.FirstOrDefault()?.ImageUrl;
     }

Also applies to: 66-86

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (2)

36-43: Normalize subreddit URL using Uri APIs; avoid string-based trailing slash

String concatenation may mis-handle URLs with query/fragment. Use GetLeftPart + TrimEnd.

-        var webPageUri = new Uri(webPageUrl.EndsWith(UrlPathSeparator) ? webPageUrl : webPageUrl + UrlPathSeparator, UriKind.Absolute);
+        var originalUri = new Uri(webPageUrl, UriKind.Absolute);
+        var normalized = originalUri.GetLeftPart(UriPartial.Path).TrimEnd(UrlPathSeparator) + UrlPathSeparator;
+        var webPageUri = new Uri(normalized, UriKind.Absolute);

75-79: Avoid downloading full images when verifying; use HEAD or headers-only GET

Current GetAsync downloads the body, which is expensive and slow. Prefer HEAD, and if unsupported, fall back to GET with ResponseHeadersRead.

-                    var client = _httpClientFactory.CreateClient("BreefDownloader");
-                    using var response = await client.GetAsync(imageUrl);
-                    if (response.IsSuccessStatusCode)
-                    {
-                        return imageUrl;
-                    }
+                    var client = _httpClientFactory.CreateClient("BreefDownloader");
+                    // Try HEAD first
+                    using var head = new HttpRequestMessage(HttpMethod.Head, imageUrl);
+                    using var headResp = await client.SendAsync(head, HttpCompletionOption.ResponseHeadersRead);
+                    if (headResp.IsSuccessStatusCode) return imageUrl;
+                    // Fallback to GET without buffering content
+                    using var getResp = await client.GetAsync(imageUrl, HttpCompletionOption.ResponseHeadersRead);
+                    if (getResp.IsSuccessStatusCode) return imageUrl;
🧹 Nitpick comments (6)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (1)

28-39: Logging addition looks good; consider guarding selector.

If the default extractor’s CanHandle ever stops returning true, .First throws. Prefer explicit guard.

-            var extractor = _extractors.First(e => e.CanHandle(webPageUrl));
+            var extractor = _extractors.FirstOrDefault(e => e.CanHandle(webPageUrl))
+                ?? throw new InvalidOperationException("No extractor could handle the URL and no default extractor was available.");
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)

7-8: Harden WithType against blank extractType.

Defensive check avoids bad data if called directly.

-    public Extract WithType(string extractType) 
-        => new(Title, Content, PreviewImageUrl, extractType);
+    public Extract WithType(string extractType)
+    {
+        if (string.IsNullOrWhiteSpace(extractType))
+            throw new ArgumentException("extractType cannot be null or whitespace.", nameof(extractType));
+        return new(Title, Content, PreviewImageUrl, extractType);
+    }
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)

11-14: Harden constructor/options and API response handling; add stable fallbacks.

Prevents NREs and ensures Extract always has a title/image.

-    private readonly RedditOptions _redditOptions = redditOptions.Value;
+    private readonly RedditOptions _redditOptions =
+        (redditOptions ?? throw new ArgumentNullException(nameof(redditOptions))).Value;
@@
-        var postId = segments[3];
-        var post = await redditPostClient.GetPost(postId);
-
-        if (string.IsNullOrWhiteSpace(post.Post.ImageUrl))
-        {
-            var subredditName = segments[1];
-            post.Post.ImageUrl = await subredditImageExtractor.GetSubredditImageUrlAsync(subredditName);
-        }
-
-        var postJson = JsonSerializer.Serialize(post);
-
-        return new UntypedExtract(post.Post.Title, postJson, post.Post.ImageUrl);
+        var postId = segments[3];
+        var post = await redditPostClient.GetPost(postId);
+        if (post?.Post is null)
+            throw new InvalidOperationException($"Post '{postId}' not found or malformed response.");
+
+        var subredditName = segments[1];
+        var imageUrl = !string.IsNullOrWhiteSpace(post.Post.ImageUrl)
+            ? post.Post.ImageUrl
+            : await subredditImageExtractor.GetSubredditImageUrlAsync(subredditName);
+
+        var title = string.IsNullOrWhiteSpace(post.Post.Title)
+            ? $"Reddit post {postId}"
+            : post.Post.Title;
+
+        var postJson = JsonSerializer.Serialize(post);
+        return new UntypedExtract(title, postJson, imageUrl);

Also applies to: 34-73

src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)

28-39: Optional: normalize and de-entitize extracted content

To reduce boilerplate and entities in content, consider HtmlEntity.DeEntitize and basic whitespace normalization.

-        var content = mainContentNode != null
-            ? mainContentNode.InnerText.Trim() : "Content not found.";
+        var content = mainContentNode != null
+            ? HtmlEntity.DeEntitize(System.Text.RegularExpressions.Regex.Replace(
+                mainContentNode.InnerText, @"\s+", " ")).Trim()
+            : "Content not found."
+            ;
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (1)

9-18: Optional: cache derived extract type once

Compute extract type in ctor and reuse to avoid repeated slicing and reflection.

-    private const string RequiredSuffix = "Extractor";
+    private const string RequiredSuffix = "Extractor";
+    private readonly string _extractType;
@@
-        var typeName = GetType().Name;
+        var typeName = GetType().Name;
         if (!typeName.EndsWith(RequiredSuffix, StringComparison.Ordinal))
         {
             throw new InvalidOperationException(
                 $"Content extractor class '{typeName}' must end with '{RequiredSuffix}' suffix. " +
                 $"This convention is required to derive the ExtractType for domain objects.");
         }
+        _extractType = typeName[..^RequiredSuffix.Length];
@@
-        var extractType = GetExtractType();
+        var extractType = _extractType;
@@
-    private string GetExtractType()
-    {
-        var typeName = GetType().Name;
-
-        return typeName[..^RequiredSuffix.Length];
-    }
+    private string GetExtractType() => _extractType;

Also applies to: 32-37

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)

49-53: Optional: broaden supported Reddit hostnames via options

If you intend to support old.reddit.com or other mirrors, add them to RedditOptions.AdditionalBaseAddresses, or match subdomains by suffix.

Also applies to: 85-86

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ab917f and 408c1ce.

📒 Files selected for processing (10)
  • src/Elzik.Breef.Domain/Extract.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (2 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (4 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (3 hunks)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
  • RedditOptions (5-33)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
  • ContentExtractorBase (5-38)
  • ContentExtractorBase (9-18)
  • CanHandle (20-20)
  • Task (22-28)
  • Task (30-30)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (2)
  • CanHandle (8-11)
  • Task (13-26)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (2)
  • CanHandle (15-32)
  • Task (34-73)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-144)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
  • CanHandle (26-26)
  • Task (28-39)
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
  • Extract (7-8)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
  • ContentExtractorBase (5-38)
  • ContentExtractorBase (9-18)
  • CanHandle (20-20)
  • Task (22-28)
  • Task (30-30)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (2)
  • CanHandle (8-11)
  • Task (13-26)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (4)
  • CanHandle (18-34)
  • Task (36-47)
  • Task (49-53)
  • Task (55-86)
src/Elzik.Breef.Domain/IContentExtractor.cs (2)
  • CanHandle (5-5)
  • Task (7-7)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
  • Extract (7-8)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (3)
  • Task (22-28)
  • Task (30-30)
  • CanHandle (20-20)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
  • Task (28-39)
  • CanHandle (26-26)
src/Elzik.Breef.Domain/IContentExtractor.cs (1)
  • CanHandle (5-5)
src/Elzik.Breef.Domain/Extract.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
  • Extract (7-8)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
  • Task (22-28)
  • Task (30-30)
  • ContentExtractorBase (5-38)
  • ContentExtractorBase (9-18)
  • CanHandle (20-20)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
  • ContentExtractorBase (5-38)
  • ContentExtractorBase (9-18)
  • CanHandle (20-20)
  • Task (22-28)
  • Task (30-30)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
  • CanHandle (26-26)
  • Task (28-39)
src/Elzik.Breef.Domain/IContentExtractor.cs (2)
  • CanHandle (5-5)
  • Task (7-7)
🔇 Additional comments (8)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (1)

44-44: Assertion aligns with ExtractType convention.

Matches base class derivation; good signal in integration coverage.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)

1-124: Solid coverage of naming guard and ExtractType derivation.

Tests exercise success/failure paths and data integrity; tight and readable.

src/Elzik.Breef.Domain/Extract.cs (1)

3-3: Domain shape change acknowledged.

Looks good; verify all creators/serializers are updated to include ExtractType.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs (2)

55-61: Log assertions are precise and resilient.

Checking levels and message starts is a good balance vs. full template matching.

Also applies to: 78-84, 103-109, 129-135


170-172: Null-arg guard tests read well.

Exception messages asserted explicitly; helpful for regressions.

Also applies to: 186-188

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)

15-32: CanHandle logic looks correct for Reddit post URLs.

Domain allowlist + path segments check is straightforward.

src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)

8-12: Confirm fallback-only usage of always-true CanHandle

HtmlContentExtractor.CanHandle returns true, so strategy order must ensure specialized extractors run first and this acts as the fallback. Looks fine given Program.cs wiring; just confirming intent.

src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)

18-34: CanHandle logic looks solid for subreddit roots

Domain allow-list + “/r/{name}” path check reads well. No changes needed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)

103-151: LGTM!

The test helper classes are well-designed and serve clear purposes:

  • Each class tests a specific scenario (valid naming, multiple types, compound names, invalid naming, null handling)
  • Minimal implementations appropriate for unit tests
  • The use of null! on line 149 correctly suppresses nullable warnings while explicitly testing null-return handling

Optional: Consider edge case testing.

The test coverage is comprehensive for the core functionality. You might consider adding tests for edge cases like:

  • An extractor named exactly "Extractor" (though this would be unusual in practice)
  • Verifying behavior with various URL formats (though this may be better suited for integration tests of concrete extractors)

These are nice-to-have additions rather than critical gaps.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 408c1ce and 5076170.

📒 Files selected for processing (3)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
  • Task (22-35)
  • Task (37-37)
  • ContentExtractorBase (5-45)
  • ContentExtractorBase (9-18)
  • CanHandle (20-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (9)
src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj (1)

30-36: ✅ SummarisationInstructions ItemGroup properly configured.

The new ItemGroup correctly includes markdown resources with both CopyToOutputDirectory and CopyToPublishDirectory set to PreserveNewest. This ensures the markdown instruction files are copied to both build output and publish directories, addressing the critical issue flagged in previous review comments about missing publish-time inclusion.

The path resolution is correct: relative to the project directory, the pattern SummarisationInstructions\**\*.md will match the HtmlContent.md file mentioned in the AI summary, and the Link attribute preserves the folder structure at runtime.

tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (8)

1-6: LGTM!

The using statements and namespace declaration are appropriate and follow standard conventions.


9-14: LGTM!

The test correctly validates that a properly-named extractor (ending with "Extractor" suffix) can be instantiated without throwing.


16-24: LGTM!

The test comprehensively validates the naming convention enforcement, checking both that the exception is thrown and that the message contains all relevant details (class name, suffix requirement, and rationale).


26-38: LGTM!

The test correctly validates that ExtractType is derived by removing the "Extractor" suffix from the class name.


40-54: LGTM!

The test thoroughly validates that all UntypedExtract properties (Title, Content, PreviewImageUrl) are preserved during the conversion to Extract.


56-70: LGTM!

The test validates that different extractor classes produce distinct ExtractType values, ensuring the type derivation is instance-specific and works correctly for multiple extractor types.


72-83: LGTM!

The test validates suffix removal with a realistic extractor name pattern, ensuring the logic works correctly for compound names like those used in production code.


85-101: LGTM!

The test comprehensively validates the null-handling error path, checking both that the appropriate exception is thrown and that the message includes all relevant context (operation, URL, extractor type, and requirement).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)

190-190: Optional: Null-conditional operator is unnecessary here.

Since _client is a non-nullable field that's always initialized in the constructor (line 28), the null-conditional operator is technically unnecessary. However, this is defensive programming and the current code is safe.

If you prefer directness:

-        _client?.Dispose();
+        _client.Dispose();
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5076170 and e3a40b0.

📒 Files selected for processing (1)
  • tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
  • Dispose (219-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (2)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (2)

28-31: LGTM: HttpClient timeout improves test reliability.

Adding a 30-second timeout to the HttpClient is appropriate for functional tests involving Docker containers and prevents hanging requests. The timeout value is consistent with ContainerStartTimeoutSeconds.


192-196: LGTM: Defensive null check prevents disposal errors.

The conditional disposal correctly handles the case where _testContainer may be null (when Docker is unavailable). This prevents potential null reference exceptions during cleanup.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/Elzik.Breef.Api/Program.cs (1)

126-133: Consider deriving extract types dynamically for better maintainability.

The registration correctly wires up the instruction provider. However, the hardcoded list of extract type names may require manual updates when new extractors are added.

Consider deriving extract types dynamically from registered extractors or configuration to reduce maintenance burden:

builder.Services.AddSingleton<IContentSummarisationInstructionProvider>(sp =>
{
    var logger = sp.GetRequiredService<ILogger<FileBasedContentSummarisationInstructionProvider>>();
    
    // Option 1: Derive from assembly scanning of IContentExtractor implementations
    // Option 2: Load from configuration
    var extractTypes = new[] { "HtmlContent", "RedditPostContent", "SubredditContent" };
    
    return new FileBasedContentSummarisationInstructionProvider(
        logger,
        Path.Combine(AppContext.BaseDirectory, "SummarisationInstructions"),
        extractTypes);
});
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3a40b0 and 6b89fec.

📒 Files selected for processing (7)
  • src/Elzik.Breef.Api/Program.cs (1 hunks)
  • src/Elzik.Breef.Application/BreefGenerator.cs (1 hunks)
  • src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md (1 hunks)
  • src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (1 hunks)
  • tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
  • GetInstructions (49-64)
src/Elzik.Breef.Api/Program.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
  • FileBasedContentSummarisationInstructionProvider (6-65)
  • FileBasedContentSummarisationInstructionProvider (10-47)
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
  • FileBasedContentSummarisationInstructionProvider (6-65)
  • FileBasedContentSummarisationInstructionProvider (10-47)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
  • GetInstructions (5-5)
src/Elzik.Breef.Application/BreefGenerator.cs (5)
src/Elzik.Breef.Api/Program.cs (1)
  • Task (26-153)
src/Elzik.Breef.Domain/IContentSummariser.cs (1)
  • Task (5-5)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)
  • Task (12-39)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
  • GetInstructions (5-5)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
  • GetInstructions (49-64)
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md

2-2: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


3-3: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


4-4: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


5-5: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


6-6: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


7-7: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


8-8: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md

2-2: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


3-3: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


4-4: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


5-5: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


6-6: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


7-7: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


8-8: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (4)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)

3-6: LGTM!

The interface provides a clear contract for retrieving type-specific summarisation instructions. The design is simple and appropriate for the use case.

tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)

7-74: LGTM!

The integration tests provide comprehensive coverage of the FileBasedContentSummarisationInstructionProvider:

  • Validates successful instantiation with all required files present
  • Validates failure scenarios for missing files, missing instructions, and missing directory
  • Includes proper cleanup of temporary test directories
src/Elzik.Breef.Application/BreefGenerator.cs (1)

5-24: LGTM! Clean abstraction resolves previous review concerns.

The refactoring cleanly abstracts instruction retrieval through IContentSummarisationInstructionProvider, eliminating direct file access from BreefGenerator. The provider handles all validation and error scenarios, making this implementation robust and testable.

The past review comment suggesting file existence checks is now resolved by the abstraction layer.

src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)

6-65: LGTM! Excellent validation and error handling.

The implementation provides robust validation with clear error messages:

  • Validates directory existence on construction
  • Validates required extract type names are provided
  • Tracks and reports all missing instruction files in a single error
  • Validates that loaded instructions are not empty
  • Uses case-insensitive lookup for extract type names
  • Logs successful template loads for observability

The use of synchronous File.ReadAllText in the constructor is appropriate since constructors cannot be async.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (2)

2-8: Fix markdown list indentation.

The list items (lines 2–8) are indented with a leading space, but markdown linters expect unordered lists to start at column 0. Remove the leading space from each bullet point (same as RedditPostContent.md).

Apply this diff to fix indentation:

-You are an expert summarizer. Your task is to summarize the provided text:
- - Summarise text, including HTML entities.
- - Limit summaries to 10% of the original length but never more than 200 words.
- - Ensure accurate attribution of information to the correct entities.
- - Do not include a link to the original articles.
- - Do not include the title in the response.
- - Do not include any metadata in the response.
- - Do not include a code block in the response.
+You are an expert summarizer. Your task is to summarize the provided text:
+- Summarise text, including HTML entities.
+- Limit summaries to 10% of the original length but never more than 200 words.
+- Ensure accurate attribution of information to the correct entities.
+- Do not include a link to the original articles.
+- Do not include the title in the response.
+- Do not include any metadata in the response.
+- Do not include a code block in the response.

3-3: Fix typo: "then" should be "than".

Line 3 contains the same spelling error as RedditPostContent.md. The phrase should read "but never more than 200 words" (not "then").

Apply this diff:

- - Limit summaries to 10% of the original length but never more then 200 words.
+ - Limit summaries to 10% of the original length but never more than 200 words.
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md (1)

3-3: Fix typo: "then" should be "than".

Line 3 contains a spelling error that affects instruction clarity. The phrase should read "but never more than 200 words" (not "then").

Apply this diff:

- - Limit summaries to 10% of the original length but never more then 200 words.
+ - Limit summaries to 10% of the original length but never more than 200 words.
🧹 Nitpick comments (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md (1)

2-8: Fix markdown list indentation.

The list items (lines 2–8) are indented with a leading space, but markdown linters expect unordered lists to start at column 0. Remove the leading space from each bullet point.

Apply this diff to fix indentation:

-You are an expert summarizer. Your task is to summarize the provided text:
- - Summarise text, including HTML entities.
- - Limit summaries to 10% of the original length but never more than 200 words.
- - Ensure accurate attribution of information to the correct entities.
- - Do not include a link to the original articles.
- - Do not include the title in the response.
- - Do not include any metadata in the response.
- - Do not include a code block in the response.
+You are an expert summarizer. Your task is to summarize the provided text:
+- Summarise text, including HTML entities.
+- Limit summaries to 10% of the original length but never more than 200 words.
+- Ensure accurate attribution of information to the correct entities.
+- Do not include a link to the original articles.
+- Do not include the title in the response.
+- Do not include any metadata in the response.
+- Do not include a code block in the response.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b89fec and 8dbfd16.

📒 Files selected for processing (5)
  • Elzik.Breef.Api.http (1 hunks)
  • Elzik.Breef.sln (2 hunks)
  • src/Elzik.Breef.Api/Elzik.Breef.Api.http (0 hunks)
  • src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md (1 hunks)
  • src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (1 hunks)
💤 Files with no reviewable changes (1)
  • src/Elzik.Breef.Api/Elzik.Breef.Api.http
✅ Files skipped from review due to trivial changes (1)
  • Elzik.Breef.sln
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md

[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md

2-2: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


3-3: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


4-4: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


5-5: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


6-6: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


7-7: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


8-8: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md

2-2: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


3-3: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


4-4: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


5-5: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


6-6: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


7-7: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


8-8: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
src/Elzik.Breef.Api/Program.cs (1)

122-129: Consider making extract type names configurable or discoverable.

The hardcoded list of extract type names ["HtmlContent", "RedditPostContent", "SubredditContent"] creates a maintenance burden. When new content extractors are added, this list must be manually updated, which is error-prone.

Consider one of the following approaches:

  1. Move the list to configuration (appsettings.json) so it can be updated without recompilation
  2. Discover extract type names dynamically by reflecting on registered extractors
  3. Add a comment explaining that this list must be kept in sync with registered extractors

</review_comment_end>

src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)

25-43: Consider validating for duplicate extract type names.

If requiredExtractTypeNames contains duplicate entries (case-insensitive), the later entry will silently overwrite the earlier one in the dictionary. While this may not be a practical issue, adding a check would make the validation more explicit and prevent confusion.

Consider adding after line 25:

 foreach (var extractTypeName in requiredExtractTypeNames)
 {
+    if (_templatesByKey.ContainsKey(extractTypeName))
+    {
+        throw new ArgumentException($"Duplicate extract type name '{extractTypeName}' in requiredExtractTypeNames.", nameof(requiredExtractTypeNames));
+    }
+
     var filePath = Path.Combine(instructionFileDirectoryPath, $"{extractTypeName}.md");

</review_comment_end>


46-55: Add parameter validation for extractTypeName.

The method doesn't validate the extractTypeName parameter. If null is passed, the dictionary will throw ArgumentNullException, and if an empty string is passed, the error message may be less helpful.

Add validation at the start of the method:

 public string GetInstructions(string extractTypeName)
 {
+    if (string.IsNullOrWhiteSpace(extractTypeName))
+    {
+        throw new ArgumentException("Extract type name cannot be null or whitespace.", nameof(extractTypeName));
+    }
+
     if (_templatesByKey.TryGetValue(extractTypeName, out var instructions))
     {
         return instructions;
     }

</review_comment_end>

tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)

10-73: Add cleanup for temporary test directories.

The tests create temporary directories but don't clean them up after execution. While they delete existing directories at the start, test failures or exceptions could leave directories behind, and using fixed directory names may cause issues with parallel test execution.

Consider using IDisposable test fixtures or try-finally blocks:

public void Instantiated_AllRequiredFilesPresent_Succeeds()
{
    // Arrange
    var dir = Path.Combine(Path.GetTempPath(), $"SummarisationInstructions_{Guid.NewGuid()}");
    try
    {
        Directory.CreateDirectory(dir);
        File.WriteAllText(Path.Combine(dir, "HtmlContent.md"), "dummy");
        // ... rest of test
    }
    finally
    {
        if (Directory.Exists(dir)) Directory.Delete(dir, true);
    }
}

Alternatively, use a test fixture with IDisposable to manage cleanup automatically.
</review_comment_end>

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dbfd16 and 10b6231.

📒 Files selected for processing (10)
  • README.md (0 hunks)
  • src/Elzik.Breef.Api/Program.cs (1 hunks)
  • src/Elzik.Breef.Infrastructure/AI/AiContentSummariserOptions.cs (0 hunks)
  • src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/HealthEndpointTests.cs (0 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (1 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/HealthTestsNative.cs (1 hunks)
  • tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1 hunks)
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserOptionsTests.cs (0 hunks)
💤 Files with no reviewable changes (4)
  • tests/Elzik.Breef.Api.Tests.Functional/HealthEndpointTests.cs
  • src/Elzik.Breef.Infrastructure/AI/AiContentSummariserOptions.cs
  • README.md
  • tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserOptionsTests.cs
🧰 Additional context used
🧬 Code graph analysis (6)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (2)
  • Task (130-153)
  • Task (155-164)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsNative.cs (1)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
  • HealthTestsBase (7-34)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)
  • IEnumerable (99-102)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
  • GetInstructions (5-5)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (2)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
  • HealthTestsBase (7-34)
tests/Elzik.Breef.Api.Tests.Functional/TestOutputHelperStream.cs (1)
  • TestOutputHelperStream (5-25)
src/Elzik.Breef.Api/Program.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
  • FileBasedContentSummarisationInstructionProvider (6-56)
  • FileBasedContentSummarisationInstructionProvider (10-44)
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (2)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (3)
  • FileBasedContentSummarisationInstructionProvider (6-56)
  • FileBasedContentSummarisationInstructionProvider (10-44)
  • GetInstructions (46-55)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
  • GetInstructions (5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (9)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)

8-8: LGTM!

The case-insensitive dictionary is appropriate for extract type name lookups, and using readonly prevents reassignment.

</review_comment_end>

tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (4)

75-97: LGTM with minor note!

Good use of Theory with multiple test cases for invalid input validation. The test properly validates the exception type, parameter name, and message content. The use of Guid.NewGuid() for the directory name is also a good practice to avoid conflicts.

Note: The cleanup concern mentioned in the previous comment applies here as well.

</review_comment_end>


99-102: LGTM!

Simple helper method for providing empty array test data. This works well with xUnit's MemberData attribute.

</review_comment_end>


105-125: LGTM with minor note!

Good test coverage for the GetInstructions method's error path. The test properly validates the exception message includes the type name that wasn't found.

Note: The cleanup concern mentioned in the earlier comment applies here as well.

</review_comment_end>


127-151: LGTM!

Excellent test coverage for empty/whitespace instruction files using Theory with multiple test cases. The assertions properly validate that the exception message contains both the error description and the filename.

Note: The cleanup concern mentioned in the earlier comment applies here as well.

</review_comment_end>

tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)

7-34: LGTM! Clean test abstraction.

The base class provides a well-structured template for health endpoint testing with appropriate customization points (HostPort, Client, skip support). The test follows the AAA pattern and handles response deserialization correctly.

tests/Elzik.Breef.Api.Tests.Functional/HealthTestsNative.cs (1)

5-17: LGTM! Standard WebApplicationFactory pattern.

The implementation correctly uses WebApplicationFactory for in-process testing. The stored factory reference ensures proper lifetime management aligned with IClassFixture semantics.

tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (2)

23-61: LGTM! Container configuration is sound.

The constructor properly detects Docker availability, initializes resources, and configures the test container with appropriate environment variables and health check wait strategy.


130-164: LGTM! Lifecycle management is correct.

The async initialization properly handles container startup with timeout, and disposal correctly cleans up resources with appropriate null checks. The timeout exception message is helpful for diagnosing container startup issues.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (2)

95-124: Potential deadlock when reading process output after WaitForExit.

Reading StandardOutput and StandardError after WaitForExit() can cause a deadlock if the process writes more data than the buffer can hold. The process will block waiting for the buffer to be read, but WaitForExit() won't return until the process completes—creating a circular dependency. Docker build output can be verbose enough to trigger this.

Apply this diff to read output asynchronously:

 using var process = new Process { StartInfo = processStartInfo };

 process.Start();
+string standardOutputTask = process.StandardOutput.ReadToEndAsync().GetAwaiter().GetResult();
+string standardErrorTask = process.StandardError.ReadToEndAsync().GetAwaiter().GetResult();
 process.WaitForExit();

-string standardOutput = process.StandardOutput.ReadToEnd();
-string standardError = process.StandardError.ReadToEnd();

 if (process.ExitCode != 0)
 {
     throw new InvalidOperationException(
-        standardError + Environment.NewLine + standardOutput);
+        standardErrorTask + Environment.NewLine + standardOutputTask);
 }

-_testOutputHelper.WriteLine(standardOutput);
+_testOutputHelper.WriteLine(standardOutputTask);

Alternatively, for a cleaner async approach, start async reads before WaitForExit:

 using var process = new Process { StartInfo = processStartInfo };

 process.Start();
+var outputTask = process.StandardOutput.ReadToEndAsync();
+var errorTask = process.StandardError.ReadToEndAsync();
 process.WaitForExit();

-string standardOutput = process.StandardOutput.ReadToEnd();
-string standardError = process.StandardError.ReadToEnd();
+string standardOutput = outputTask.GetAwaiter().GetResult();
+string standardError = errorTask.GetAwaiter().GetResult();

126-154: Same deadlock risk as BuildDockerImage.

Although docker --version produces minimal output, the same deadlock pattern exists here. For consistency and safety, apply the same async read pattern.

Apply this diff:

 using var process = new Process { StartInfo = processStartInfo };
 process.Start();
+var outputTask = process.StandardOutput.ReadToEndAsync();
+var errorTask = process.StandardError.ReadToEndAsync();
 process.WaitForExit();

-string standardOutput = process.StandardOutput.ReadToEnd();
-string standardError = process.StandardError.ReadToEnd();
+string standardOutput = outputTask.GetAwaiter().GetResult();
+string standardError = errorTask.GetAwaiter().GetResult();
♻️ Duplicate comments (1)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (1)

65-65: Fragile relative path dependency.

The relative path ./../../../../../build/api/build-docker.ps1 remains hardcoded and depends on test execution directory structure. This pattern appears in BreefTestsDocker.cs as well, suggesting it may be an accepted pattern in the codebase.

🧹 Nitpick comments (1)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)

157-162: Improved error detection using NativeErrorCode.

Checking NativeErrorCode against ERROR_FILE_NOT_FOUND is more reliable than string-based detection. Good improvement.

Consider moving the constant to class level if it might be reused:

 public class BreefTestsDocker : BreefTestsBase, IAsyncLifetime
 {
     private const string DockerImageName = "ghcr.io/elzik/elzik-breef-api:latest";
     private const int ContainerStartTimeoutSeconds = 30;
+    private const int ERROR_FILE_NOT_FOUND = 2;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10b6231 and 6ceb78c.

📒 Files selected for processing (4)
  • Elzik.Breef.sln (2 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (4 hunks)
  • tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (1 hunks)
  • tests/SampleRequests/Elzik.Breef.Api.http (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/SampleRequests/Elzik.Breef.Api.http
🧰 Additional context used
🧬 Code graph analysis (2)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (3)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
  • HealthTestsBase (7-34)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (4)
  • DockerIsUnavailable (126-166)
  • BuildDockerImage (95-124)
  • Task (168-192)
  • Task (194-203)
tests/Elzik.Breef.Api.Tests.Functional/TestOutputHelperStream.cs (1)
  • TestOutputHelperStream (5-25)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
  • Dispose (219-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-ubuntu
  • GitHub Check: build-ubuntu
  • GitHub Check: Analyze (csharp)
🔇 Additional comments (9)
Elzik.Breef.sln (2)

43-47: LGTM! Well-structured addition of sample requests folder.

The new SampleRequests solution folder is properly formatted and appropriately located under the tests directory. This organizational structure will help keep sample HTTP requests accessible for testing the new per-extractor instruction functionality.


96-96: Correct nesting configuration.

The SampleRequests folder is properly nested under the Tests solution folder, maintaining consistency with the solution's organizational structure.

tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (2)

28-31: LGTM! Explicit timeout prevents hanging tests.

The 30-second timeout aligns well with the container startup timeout and is appropriate for functional tests.


194-203: LGTM! Disposal pattern improved.

The updated logic correctly:

  • Always disposes the HttpClient
  • Only attempts container cleanup when Docker is available and the container was successfully initialized
  • Properly awaits async disposal operations

This prevents exceptions when the container is null and ensures proper resource cleanup.

tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (5)

8-61: LGTM! Well-structured test class.

The class properly inherits from HealthTestsBase and implements IAsyncLifetime for test lifecycle management. The conditional setup based on Docker availability is a good pattern that prevents test failures in environments without Docker. The container configuration with dummy environment variables is appropriate for health check validation.


79-91: Great fix! Process output handling is now correct.

The previous issue where StandardOutput.ReadToEnd() was being called multiple times has been properly resolved. The code now reads both streams into variables after WaitForExit() and reuses those variables, ensuring output is captured correctly in both success and failure paths.


94-134: Excellent improvements! Both previous issues resolved.

The method now correctly:

  1. Calls WaitForExit() before reading the output streams (lines 110-113), preventing incomplete reads
  2. Uses NativeErrorCode comparison instead of message string matching (lines 125-127), making the detection robust across locales and Windows versions

These changes align with the pattern established in BreefTestsDocker.cs and follow best practices.


136-159: LGTM! Proper async lifecycle with timeout handling.

The initialization logic correctly implements IAsyncLifetime with appropriate timeout handling and clear error messages. The null check prevents invalid state transitions, and the timeout exception provides helpful diagnostic information.


161-170: LGTM! Clean resource disposal.

The disposal logic properly cleans up resources with appropriate null checks and conditional logic. The async disposal pattern for the test container is correctly implemented.

@sonarqubecloud
Copy link

@elzik elzik merged commit a69e6ae into main Oct 30, 2025
14 checks passed
@elzik elzik deleted the per-extractor-instructions branch October 30, 2025 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants