-
Notifications
You must be signed in to change notification settings - Fork 0
Add Per-extractor Instructions #168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe PR refactors extraction to derive ExtractType from extractor class names, introduces file-backed, type-specific summarisation instructions, updates the summariser to accept runtime instructions, and threads those instructions through BreefGenerator to produce summaries; multiple extractors and tests are adapted to the new contracts. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant BreefGenerator
participant Strategy as ContentExtractorStrategy
participant Extractor as ContentExtractorBase
participant InstructionProvider
participant Summariser
Client->>BreefGenerator: GenerateAsync(url)
BreefGenerator->>Strategy: ExtractAsync(url)
Strategy->>Extractor: ExtractAsync(url)
Extractor->>Extractor: CreateUntypedExtractAsync(url)
Extractor->>Extractor: GetExtractType() (class name -> type)
Extractor->>Extractor: WithType(extractType)
Extractor-->>Strategy: Extract{Title, Content, ExtractType}
Strategy-->>BreefGenerator: Extract
BreefGenerator->>InstructionProvider: GetInstructions(extract.ExtractType)
InstructionProvider-->>BreefGenerator: instructions (loaded from .md)
BreefGenerator->>Summariser: SummariseAsync(extract.Content, instructions)
Summariser-->>BreefGenerator: summary
BreefGenerator-->>Client: Breef{summary,...}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs (1)
34-43: Fix the mocked method name to match the implementation.The mock is configured for
GetChatMessageContentsAsync(plural), but the implementation callsGetChatMessageContentAsync(singular). This mismatch will cause all tests to fail because the mock won't intercept the actual call.Apply this diff to fix the method name:
- _ = _chatCompletionService.GetChatMessageContentsAsync( + _ = _chatCompletionService.GetChatMessageContentAsync( Arg.Is<ChatHistory>(ch => ch.Any( m => m.Content != null && m.Content.EndsWith(_testContent) && m.Role == AuthorRole.User)), Arg.Any<PromptExecutionSettings>(), Arg.Any<Kernel>(), Arg.Any<CancellationToken>()) - .Returns(Task.FromResult<IReadOnlyList<ChatMessageContent>>( - [_testSummaryResult])); + .Returns(_testSummaryResult);src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)
3-3: Remove unused import.The
Microsoft.Extensions.Optionsnamespace is no longer used after removing the options-based configuration.Apply this diff:
using Elzik.Breef.Domain; using Microsoft.Extensions.Logging; -using Microsoft.Extensions.Options; using Microsoft.SemanticKernel.ChatCompletion;
🧹 Nitpick comments (8)
tests/Elzik.Breef.Api.Tests.Integration/Elzik.Breef.Api.Tests.Integration.csproj (1)
42-47: Include instructions on publish as wellAdd CopyToPublishDirectory to keep instructions in published test artifacts.
<ItemGroup> <None Include="..\..\src\Elzik.Breef.Infrastructure\SummarisationInstructions\**\*.md" Link="SummarisationInstructions\%(RecursiveDir)%(Filename)%(Extension)"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> + <CopyToPublishDirectory>PreserveNewest</CopyToPublishDirectory> </None> </ItemGroup>tests/Elzik.Breef.Infrastructure.Tests.Integration/Elzik.Breef.Infrastructure.Tests.Integration.csproj (1)
56-60: Publish instructions with test artifactsMirror the output behavior during publish.
<ItemGroup> <None Include="..\..\src\Elzik.Breef.Infrastructure\SummarisationInstructions\**\*.md" Link="SummarisationInstructions\%(RecursiveDir)%(Filename)%(Extension)"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> + <CopyToPublishDirectory>PreserveNewest</CopyToPublishDirectory> </None> </ItemGroup>tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs (3)
28-32: Environment variable prefix: use a conventional uppercase prefixEnvironment variables are typically uppercase on Linux/macOS. Using "breef_" can cause misses.
- .AddEnvironmentVariables("breef_") + .AddEnvironmentVariables("BREEF_")
49-55: Assert instruction compliance, not just lengthStrengthen the test to validate max 200 words and no code blocks per HtmlContent.md.
- _testOutputHelper.WriteLine(summary); - summary.Length.ShouldBeGreaterThan(100, "because this is the minimum acceptable summary length"); + _testOutputHelper.WriteLine(summary); + summary.ShouldNotBeNullOrWhiteSpace(); + var wordCount = summary.Split(new[]{' ', '\n', '\t'}, StringSplitOptions.RemoveEmptyEntries).Length; + wordCount.ShouldBeLessThanOrEqualTo(200, "instructions cap summaries at 200 words"); + summary.ShouldNotContain("```", "instructions disallow code blocks");
21-25: Make the integration test skippable when AI config is absentAvoid hard failures on developer machines/CI without credentials by using SkippableTheory.
- [Theory] + [SkippableTheory] [InlineData("TestHtmlPage-ExpectedContent.txt")] [InlineData("BbcNewsPage-ExpectedContent.txt")] public async Task Summarise_WithValidContent_ReturnsSummary(string testExtractedContentFile) { - // Arrange + // Arrange + Skip.If(string.IsNullOrWhiteSpace(Environment.GetEnvironmentVariable("BREEF_AISERVICE__APIKEY")), + "AI credentials not configured: BREEF_AISERVICE__*");src/Elzik.Breef.Domain/IContentSummariser.cs (1)
5-5: Add CancellationToken to the async contractExpose cancellation in the public API to propagate request/HTTP timeouts and avoid hung calls. Updates required at 7 locations:
- Interface:
src/Elzik.Breef.Domain/IContentSummariser.cs(line 5)- Implementation:
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs(line 12)- Production call:
src/Elzik.Breef.Application/BreefGenerator.cs(line 17)- Unit test calls:
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs(lines 50, 63, 78)- Integration test call:
tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs(line 50)- Task<string> SummariseAsync(string content, string instructions); + Task<string> SummariseAsync(string content, string instructions, CancellationToken cancellationToken = default);src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (2)
15-19: Reconsider the hardcoded formatting instructions.With the introduction of per-extractor instructions from external markdown files, the hardcoded
formattingInstructionscreates a dual source of truth for summarization behavior. This reduces the flexibility that external instructions are meant to provide.Consider either:
- Moving the HTML formatting requirement into the external instruction files (e.g.,
HtmlContent.md)- Removing the hardcoded formatting if it should be controlled by the external instructions
Are the hardcoded formatting instructions intentionally meant to supplement the external instructions, or should all formatting guidance be controlled by the external instruction files?
12-14: Consider validating the content parameter as well.While
instructionsis properly validated, thecontentparameter is not. Empty or whitespace content would result in a meaningless summarization request.Apply this diff to add content validation:
public async Task<string> SummariseAsync(string content, string instructions) { + ArgumentNullException.ThrowIfNullOrWhiteSpace(content); ArgumentNullException.ThrowIfNullOrWhiteSpace(instructions);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/Elzik.Breef.Application/BreefGenerator.cs(1 hunks)src/Elzik.Breef.Domain/IContentSummariser.cs(1 hunks)src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs(1 hunks)src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md(1 hunks)tests/Elzik.Breef.Api.Tests.Integration/Elzik.Breef.Api.Tests.Integration.csproj(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs(2 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/Elzik.Breef.Infrastructure.Tests.Integration.csproj(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariserOptions.cs (1)
AiContentSummariserOptions(5-12)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentSummariserTests.cs (1)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)
AiContentSummariser(8-40)
tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserTests.cs (3)
src/Elzik.Breef.Infrastructure/AI/AiServiceOptions.cs (1)
AiServiceOptions(5-21)tests/Elzik.Breef.Infrastructure.Tests.Integration/TestOutputLoggerProvider.cs (1)
TestOutputLoggerProvider(6-19)src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)
AiContentSummariser(8-40)
src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (3)
src/Elzik.Breef.Application/BreefGenerator.cs (1)
Task(10-24)src/Elzik.Breef.Domain/IContentSummariser.cs (1)
Task(5-5)src/Elzik.Breef.Api/Program.cs (1)
Task(26-144)
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.18.1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md
2-2: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
3-3: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
4-4: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
5-5: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
6-6: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
7-7: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
- GitHub Check: build-ubuntu
src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj
Outdated
Show resolved
Hide resolved
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md
Outdated
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1)
6-8: Optional: Consider varying sentence structure to reduce repetition.Lines 6–8 consecutively begin with "Do not", which LanguageTool flags as repetitive. While this consistency is intentional and aids clarity, you could reword some items for stylistic variety if preferred:
- - Do not include a link to the original article. - - Do not include the title in the response. - - Do not include any metadata in the response. - - Do not include a code block in the response. + - Exclude links to the original article. + - Omit the title from the response. + - Avoid including any metadata. + - Do not include a code block in the response.This is a minor style suggestion and not a blocker.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ot include the title in the response. - Do not include any metadata in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...include any metadata in the response. - Do not include a code block in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/HtmlContent.md (1)
1-8: Good work addressing the typos and correcting the grammar.The changes from the previous review have been applied: "then200" is now correctly "than 200" (line 3) and the phrasing is improved. The instructions are clear and prescriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
13-26: Make preview image URL absolute and consider srcset/data- sources*Largest image selection ignores srcset and data-* attributes and may return a relative URL. Resolve to absolute against webPageUrl and consider common lazy-load attributes.
Apply this focused refactor:
- var largestImageUrl = GetLargestImageUrl(htmlDocument); + var largestImageUrl = GetPreviewImageUrl(htmlDocument, webPageUrl); @@ - private static string? GetLargestImageUrl(HtmlDocument htmlDocument) + private static string? GetPreviewImageUrl(HtmlDocument htmlDocument, string baseUrl) { var imageNodes = htmlDocument.DocumentNode.SelectNodes("//img"); if (imageNodes == null || imageNodes.Count == 0) { return null; } - var imageNodesSortedBySize = imageNodes - .Select(node => new + static string PickUrl(HtmlAgilityPack.HtmlNode n) + { + // Prefer explicit src; fall back to common lazy-load attributes; then srcset first candidate. + var src = n.GetAttributeValue("src", null) + ?? n.GetAttributeValue("data-src", null) + ?? n.GetAttributeValue("data-lazy-src", null) + ?? n.GetAttributeValue("data-original", null); + if (string.IsNullOrWhiteSpace(src)) + { + var srcset = n.GetAttributeValue("srcset", null); + if (!string.IsNullOrWhiteSpace(srcset)) + { + // take the first candidate + src = srcset.Split(',')[0].Trim().Split(' ')[0]; + } + } + return src ?? string.Empty; + } + + static bool TryMakeAbsolute(string baseUrl, string imageUrl, out string absolute) + { + absolute = string.Empty; + if (string.IsNullOrWhiteSpace(imageUrl)) return false; + if (imageUrl.StartsWith("data:", StringComparison.OrdinalIgnoreCase) || + imageUrl.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase)) + return false; + if (Uri.TryCreate(imageUrl, UriKind.Absolute, out var abs)) + { + absolute = abs.AbsoluteUri; return true; + } + if (Uri.TryCreate(baseUrl, UriKind.Absolute, out var baseUri) && + Uri.TryCreate(baseUri, imageUrl, out var resolved)) + { + absolute = resolved.AbsoluteUri; return true; + } + return false; + } + + var imageNodesSortedBySize = imageNodes + .Select(node => { - Node = node, - Width = int.TryParse(node.GetAttributeValue("width", "0"), out var width) ? width : 0, - Height = int.TryParse(node.GetAttributeValue("height", "0"), out var height) ? height : 0, - ImageUrl = node.GetAttributeValue("src", string.Empty) - }) - .Where(n => !string.IsNullOrWhiteSpace(n.ImageUrl)) + var width = int.TryParse(node.GetAttributeValue("width", "0"), out var w) ? w : 0; + var height = int.TryParse(node.GetAttributeValue("height", "0"), out var h) ? h : 0; + var candidate = PickUrl(node); + return new { Width = width, Height = height, CandidateUrl = candidate }; + }) + .Select(n => + { + var ok = TryMakeAbsolute(baseUrl, n.CandidateUrl, out var abs); + return new { n.Width, n.Height, ImageUrl = ok ? abs : string.Empty }; + }) + .Where(n => !string.IsNullOrWhiteSpace(n.ImageUrl)) .OrderByDescending(img => img.Width * img.Height); return imageNodesSortedBySize.FirstOrDefault()?.ImageUrl; }Also applies to: 66-86
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (2)
36-43: Normalize subreddit URL using Uri APIs; avoid string-based trailing slashString concatenation may mis-handle URLs with query/fragment. Use GetLeftPart + TrimEnd.
- var webPageUri = new Uri(webPageUrl.EndsWith(UrlPathSeparator) ? webPageUrl : webPageUrl + UrlPathSeparator, UriKind.Absolute); + var originalUri = new Uri(webPageUrl, UriKind.Absolute); + var normalized = originalUri.GetLeftPart(UriPartial.Path).TrimEnd(UrlPathSeparator) + UrlPathSeparator; + var webPageUri = new Uri(normalized, UriKind.Absolute);
75-79: Avoid downloading full images when verifying; use HEAD or headers-only GETCurrent GetAsync downloads the body, which is expensive and slow. Prefer HEAD, and if unsupported, fall back to GET with ResponseHeadersRead.
- var client = _httpClientFactory.CreateClient("BreefDownloader"); - using var response = await client.GetAsync(imageUrl); - if (response.IsSuccessStatusCode) - { - return imageUrl; - } + var client = _httpClientFactory.CreateClient("BreefDownloader"); + // Try HEAD first + using var head = new HttpRequestMessage(HttpMethod.Head, imageUrl); + using var headResp = await client.SendAsync(head, HttpCompletionOption.ResponseHeadersRead); + if (headResp.IsSuccessStatusCode) return imageUrl; + // Fallback to GET without buffering content + using var getResp = await client.GetAsync(imageUrl, HttpCompletionOption.ResponseHeadersRead); + if (getResp.IsSuccessStatusCode) return imageUrl;
🧹 Nitpick comments (6)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (1)
28-39: Logging addition looks good; consider guarding selector.If the default extractor’s CanHandle ever stops returning true, .First throws. Prefer explicit guard.
- var extractor = _extractors.First(e => e.CanHandle(webPageUrl)); + var extractor = _extractors.FirstOrDefault(e => e.CanHandle(webPageUrl)) + ?? throw new InvalidOperationException("No extractor could handle the URL and no default extractor was available.");src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
7-8: Harden WithType against blank extractType.Defensive check avoids bad data if called directly.
- public Extract WithType(string extractType) - => new(Title, Content, PreviewImageUrl, extractType); + public Extract WithType(string extractType) + { + if (string.IsNullOrWhiteSpace(extractType)) + throw new ArgumentException("extractType cannot be null or whitespace.", nameof(extractType)); + return new(Title, Content, PreviewImageUrl, extractType); + }src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
11-14: Harden constructor/options and API response handling; add stable fallbacks.Prevents NREs and ensures Extract always has a title/image.
- private readonly RedditOptions _redditOptions = redditOptions.Value; + private readonly RedditOptions _redditOptions = + (redditOptions ?? throw new ArgumentNullException(nameof(redditOptions))).Value; @@ - var postId = segments[3]; - var post = await redditPostClient.GetPost(postId); - - if (string.IsNullOrWhiteSpace(post.Post.ImageUrl)) - { - var subredditName = segments[1]; - post.Post.ImageUrl = await subredditImageExtractor.GetSubredditImageUrlAsync(subredditName); - } - - var postJson = JsonSerializer.Serialize(post); - - return new UntypedExtract(post.Post.Title, postJson, post.Post.ImageUrl); + var postId = segments[3]; + var post = await redditPostClient.GetPost(postId); + if (post?.Post is null) + throw new InvalidOperationException($"Post '{postId}' not found or malformed response."); + + var subredditName = segments[1]; + var imageUrl = !string.IsNullOrWhiteSpace(post.Post.ImageUrl) + ? post.Post.ImageUrl + : await subredditImageExtractor.GetSubredditImageUrlAsync(subredditName); + + var title = string.IsNullOrWhiteSpace(post.Post.Title) + ? $"Reddit post {postId}" + : post.Post.Title; + + var postJson = JsonSerializer.Serialize(post); + return new UntypedExtract(title, postJson, imageUrl);Also applies to: 34-73
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
28-39: Optional: normalize and de-entitize extracted contentTo reduce boilerplate and entities in content, consider HtmlEntity.DeEntitize and basic whitespace normalization.
- var content = mainContentNode != null - ? mainContentNode.InnerText.Trim() : "Content not found."; + var content = mainContentNode != null + ? HtmlEntity.DeEntitize(System.Text.RegularExpressions.Regex.Replace( + mainContentNode.InnerText, @"\s+", " ")).Trim() + : "Content not found." + ;src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (1)
9-18: Optional: cache derived extract type onceCompute extract type in ctor and reuse to avoid repeated slicing and reflection.
- private const string RequiredSuffix = "Extractor"; + private const string RequiredSuffix = "Extractor"; + private readonly string _extractType; @@ - var typeName = GetType().Name; + var typeName = GetType().Name; if (!typeName.EndsWith(RequiredSuffix, StringComparison.Ordinal)) { throw new InvalidOperationException( $"Content extractor class '{typeName}' must end with '{RequiredSuffix}' suffix. " + $"This convention is required to derive the ExtractType for domain objects."); } + _extractType = typeName[..^RequiredSuffix.Length]; @@ - var extractType = GetExtractType(); + var extractType = _extractType; @@ - private string GetExtractType() - { - var typeName = GetType().Name; - - return typeName[..^RequiredSuffix.Length]; - } + private string GetExtractType() => _extractType;Also applies to: 32-37
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
49-53: Optional: broaden supported Reddit hostnames via optionsIf you intend to support old.reddit.com or other mirrors, add them to RedditOptions.AdditionalBaseAddresses, or match subdomains by suffix.
Also applies to: 85-86
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
src/Elzik.Breef.Domain/Extract.cs(1 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs(1 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs(1 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs(2 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs(4 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs(3 hunks)src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (5)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditOptions.cs (1)
RedditOptions(5-33)src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
ContentExtractorBase(5-38)ContentExtractorBase(9-18)CanHandle(20-20)Task(22-28)Task(30-30)src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (2)
CanHandle(8-11)Task(13-26)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (2)
CanHandle(15-32)Task(34-73)src/Elzik.Breef.Api/Program.cs (1)
Task(26-144)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (2)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
CanHandle(26-26)Task(28-39)src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
Extract(7-8)
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
ContentExtractorBase(5-38)ContentExtractorBase(9-18)CanHandle(20-20)Task(22-28)Task(30-30)src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (2)
CanHandle(8-11)Task(13-26)src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (4)
CanHandle(18-34)Task(36-47)Task(49-53)Task(55-86)src/Elzik.Breef.Domain/IContentExtractor.cs (2)
CanHandle(5-5)Task(7-7)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs (4)
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
Extract(7-8)src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (3)
Task(22-28)Task(30-30)CanHandle(20-20)src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
Task(28-39)CanHandle(26-26)src/Elzik.Breef.Domain/IContentExtractor.cs (1)
CanHandle(5-5)
src/Elzik.Breef.Domain/Extract.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/UntypedExtract.cs (1)
Extract(7-8)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
Task(22-28)Task(30-30)ContentExtractorBase(5-38)ContentExtractorBase(9-18)CanHandle(20-20)
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
ContentExtractorBase(5-38)ContentExtractorBase(9-18)CanHandle(20-20)Task(22-28)Task(30-30)src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorStrategy.cs (2)
CanHandle(26-26)Task(28-39)src/Elzik.Breef.Domain/IContentExtractor.cs (2)
CanHandle(5-5)Task(7-7)
🔇 Additional comments (8)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/HtmlContentExtractorTests.cs (1)
44-44: Assertion aligns with ExtractType convention.Matches base class derivation; good signal in integration coverage.
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)
1-124: Solid coverage of naming guard and ExtractType derivation.Tests exercise success/failure paths and data integrity; tight and readable.
src/Elzik.Breef.Domain/Extract.cs (1)
3-3: Domain shape change acknowledged.Looks good; verify all creators/serializers are updated to include ExtractType.
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorStrategyTests.cs (2)
55-61: Log assertions are precise and resilient.Checking levels and message starts is a good balance vs. full template matching.
Also applies to: 78-84, 103-109, 129-135
170-172: Null-arg guard tests read well.Exception messages asserted explicitly; helpful for regressions.
Also applies to: 186-188
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/RedditPostContentExtractor.cs (1)
15-32: CanHandle logic looks correct for Reddit post URLs.Domain allowlist + path segments check is straightforward.
src/Elzik.Breef.Infrastructure/ContentExtractors/HtmlContentExtractor.cs (1)
8-12: Confirm fallback-only usage of always-true CanHandleHtmlContentExtractor.CanHandle returns true, so strategy order must ensure specialized extractors run first and this acts as the fallback. Looks fine given Program.cs wiring; just confirming intent.
src/Elzik.Breef.Infrastructure/ContentExtractors/Reddit/SubRedditContentExtractor.cs (1)
18-34: CanHandle logic looks solid for subreddit rootsDomain allow-list + “/r/{name}” path check reads well. No changes needed.
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)
103-151: LGTM!The test helper classes are well-designed and serve clear purposes:
- Each class tests a specific scenario (valid naming, multiple types, compound names, invalid naming, null handling)
- Minimal implementations appropriate for unit tests
- The use of
null!on line 149 correctly suppresses nullable warnings while explicitly testing null-return handlingOptional: Consider edge case testing.
The test coverage is comprehensive for the core functionality. You might consider adding tests for edge cases like:
- An extractor named exactly "Extractor" (though this would be unusual in practice)
- Verifying behavior with various URL formats (though this may be better suited for integration tests of concrete extractors)
These are nice-to-have additions rather than critical gaps.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs(1 hunks)src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (1)
src/Elzik.Breef.Infrastructure/ContentExtractors/ContentExtractorBase.cs (5)
Task(22-35)Task(37-37)ContentExtractorBase(5-45)ContentExtractorBase(9-18)CanHandle(20-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (9)
src/Elzik.Breef.Infrastructure/Elzik.Breef.Infrastructure.csproj (1)
30-36: ✅ SummarisationInstructions ItemGroup properly configured.The new ItemGroup correctly includes markdown resources with both
CopyToOutputDirectoryandCopyToPublishDirectoryset toPreserveNewest. This ensures the markdown instruction files are copied to both build output and publish directories, addressing the critical issue flagged in previous review comments about missing publish-time inclusion.The path resolution is correct: relative to the project directory, the pattern
SummarisationInstructions\**\*.mdwill match the HtmlContent.md file mentioned in the AI summary, and the Link attribute preserves the folder structure at runtime.tests/Elzik.Breef.Infrastructure.Tests.Unit/ContentExtractors/ContentExtractorBaseTests.cs (8)
1-6: LGTM!The using statements and namespace declaration are appropriate and follow standard conventions.
9-14: LGTM!The test correctly validates that a properly-named extractor (ending with "Extractor" suffix) can be instantiated without throwing.
16-24: LGTM!The test comprehensively validates the naming convention enforcement, checking both that the exception is thrown and that the message contains all relevant details (class name, suffix requirement, and rationale).
26-38: LGTM!The test correctly validates that ExtractType is derived by removing the "Extractor" suffix from the class name.
40-54: LGTM!The test thoroughly validates that all UntypedExtract properties (Title, Content, PreviewImageUrl) are preserved during the conversion to Extract.
56-70: LGTM!The test validates that different extractor classes produce distinct ExtractType values, ensuring the type derivation is instance-specific and works correctly for multiple extractor types.
72-83: LGTM!The test validates suffix removal with a realistic extractor name pattern, ensuring the logic works correctly for compound names like those used in production code.
85-101: LGTM!The test comprehensively validates the null-handling error path, checking both that the appropriate exception is thrown and that the message includes all relevant context (operation, URL, extractor type, and requirement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)
190-190: Optional: Null-conditional operator is unnecessary here.Since
_clientis a non-nullable field that's always initialized in the constructor (line 28), the null-conditional operator is technically unnecessary. However, this is defensive programming and the current code is safe.If you prefer directness:
- _client?.Dispose(); + _client.Dispose();
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
Dispose(219-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (2)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (2)
28-31: LGTM: HttpClient timeout improves test reliability.Adding a 30-second timeout to the HttpClient is appropriate for functional tests involving Docker containers and prevents hanging requests. The timeout value is consistent with
ContainerStartTimeoutSeconds.
192-196: LGTM: Defensive null check prevents disposal errors.The conditional disposal correctly handles the case where
_testContainermay be null (when Docker is unavailable). This prevents potential null reference exceptions during cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/Elzik.Breef.Api/Program.cs (1)
126-133: Consider deriving extract types dynamically for better maintainability.The registration correctly wires up the instruction provider. However, the hardcoded list of extract type names may require manual updates when new extractors are added.
Consider deriving extract types dynamically from registered extractors or configuration to reduce maintenance burden:
builder.Services.AddSingleton<IContentSummarisationInstructionProvider>(sp => { var logger = sp.GetRequiredService<ILogger<FileBasedContentSummarisationInstructionProvider>>(); // Option 1: Derive from assembly scanning of IContentExtractor implementations // Option 2: Load from configuration var extractTypes = new[] { "HtmlContent", "RedditPostContent", "SubredditContent" }; return new FileBasedContentSummarisationInstructionProvider( logger, Path.Combine(AppContext.BaseDirectory, "SummarisationInstructions"), extractTypes); });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/Elzik.Breef.Api/Program.cs(1 hunks)src/Elzik.Breef.Application/BreefGenerator.cs(1 hunks)src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs(1 hunks)src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
GetInstructions(49-64)
src/Elzik.Breef.Api/Program.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
FileBasedContentSummarisationInstructionProvider(6-65)FileBasedContentSummarisationInstructionProvider(10-47)
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
FileBasedContentSummarisationInstructionProvider(6-65)FileBasedContentSummarisationInstructionProvider(10-47)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
GetInstructions(5-5)
src/Elzik.Breef.Application/BreefGenerator.cs (5)
src/Elzik.Breef.Api/Program.cs (1)
Task(26-153)src/Elzik.Breef.Domain/IContentSummariser.cs (1)
Task(5-5)src/Elzik.Breef.Infrastructure/AI/AiContentSummariser.cs (1)
Task(12-39)src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
GetInstructions(5-5)src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
GetInstructions(49-64)
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.18.1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
2-2: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
3-3: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
4-4: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
5-5: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
6-6: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
7-7: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
2-2: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
3-3: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
4-4: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
5-5: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
6-6: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
7-7: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (4)
src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
3-6: LGTM!The interface provides a clear contract for retrieving type-specific summarisation instructions. The design is simple and appropriate for the use case.
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)
7-74: LGTM!The integration tests provide comprehensive coverage of the FileBasedContentSummarisationInstructionProvider:
- Validates successful instantiation with all required files present
- Validates failure scenarios for missing files, missing instructions, and missing directory
- Includes proper cleanup of temporary test directories
src/Elzik.Breef.Application/BreefGenerator.cs (1)
5-24: LGTM! Clean abstraction resolves previous review concerns.The refactoring cleanly abstracts instruction retrieval through
IContentSummarisationInstructionProvider, eliminating direct file access fromBreefGenerator. The provider handles all validation and error scenarios, making this implementation robust and testable.The past review comment suggesting file existence checks is now resolved by the abstraction layer.
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
6-65: LGTM! Excellent validation and error handling.The implementation provides robust validation with clear error messages:
- Validates directory existence on construction
- Validates required extract type names are provided
- Tracks and reports all missing instruction files in a single error
- Validates that loaded instructions are not empty
- Uses case-insensitive lookup for extract type names
- Logs successful template loads for observability
The use of synchronous
File.ReadAllTextin the constructor is appropriate since constructors cannot be async.
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
Outdated
Show resolved
Hide resolved
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md (2)
2-8: Fix markdown list indentation.The list items (lines 2–8) are indented with a leading space, but markdown linters expect unordered lists to start at column 0. Remove the leading space from each bullet point (same as RedditPostContent.md).
Apply this diff to fix indentation:
-You are an expert summarizer. Your task is to summarize the provided text: - - Summarise text, including HTML entities. - - Limit summaries to 10% of the original length but never more than 200 words. - - Ensure accurate attribution of information to the correct entities. - - Do not include a link to the original articles. - - Do not include the title in the response. - - Do not include any metadata in the response. - - Do not include a code block in the response. +You are an expert summarizer. Your task is to summarize the provided text: +- Summarise text, including HTML entities. +- Limit summaries to 10% of the original length but never more than 200 words. +- Ensure accurate attribution of information to the correct entities. +- Do not include a link to the original articles. +- Do not include the title in the response. +- Do not include any metadata in the response. +- Do not include a code block in the response.
3-3: Fix typo: "then" should be "than".Line 3 contains the same spelling error as RedditPostContent.md. The phrase should read "but never more than 200 words" (not "then").
Apply this diff:
- - Limit summaries to 10% of the original length but never more then 200 words. + - Limit summaries to 10% of the original length but never more than 200 words.src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md (1)
3-3: Fix typo: "then" should be "than".Line 3 contains a spelling error that affects instruction clarity. The phrase should read "but never more than 200 words" (not "then").
Apply this diff:
- - Limit summaries to 10% of the original length but never more then 200 words. + - Limit summaries to 10% of the original length but never more than 200 words.
🧹 Nitpick comments (1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md (1)
2-8: Fix markdown list indentation.The list items (lines 2–8) are indented with a leading space, but markdown linters expect unordered lists to start at column 0. Remove the leading space from each bullet point.
Apply this diff to fix indentation:
-You are an expert summarizer. Your task is to summarize the provided text: - - Summarise text, including HTML entities. - - Limit summaries to 10% of the original length but never more than 200 words. - - Ensure accurate attribution of information to the correct entities. - - Do not include a link to the original articles. - - Do not include the title in the response. - - Do not include any metadata in the response. - - Do not include a code block in the response. +You are an expert summarizer. Your task is to summarize the provided text: +- Summarise text, including HTML entities. +- Limit summaries to 10% of the original length but never more than 200 words. +- Ensure accurate attribution of information to the correct entities. +- Do not include a link to the original articles. +- Do not include the title in the response. +- Do not include any metadata in the response. +- Do not include a code block in the response.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Elzik.Breef.Api.http(1 hunks)Elzik.Breef.sln(2 hunks)src/Elzik.Breef.Api/Elzik.Breef.Api.http(0 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md(1 hunks)src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md(1 hunks)
💤 Files with no reviewable changes (1)
- src/Elzik.Breef.Api/Elzik.Breef.Api.http
✅ Files skipped from review due to trivial changes (1)
- Elzik.Breef.sln
🧰 Additional context used
🪛 LanguageTool
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...t include the title in the response. - Do not include any metadata in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~8-~8: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...nclude any metadata in the response. - Do not include a code block in the respons...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.18.1)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/RedditPostContent.md
2-2: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
3-3: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
4-4: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
5-5: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
6-6: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
7-7: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
src/Elzik.Breef.Infrastructure/SummarisationInstructions/SubredditContent.md
2-2: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
3-3: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
4-4: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
5-5: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
6-6: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
7-7: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
src/Elzik.Breef.Api/Program.cs (1)
122-129: Consider making extract type names configurable or discoverable.The hardcoded list of extract type names
["HtmlContent", "RedditPostContent", "SubredditContent"]creates a maintenance burden. When new content extractors are added, this list must be manually updated, which is error-prone.Consider one of the following approaches:
- Move the list to configuration (appsettings.json) so it can be updated without recompilation
- Discover extract type names dynamically by reflecting on registered extractors
- Add a comment explaining that this list must be kept in sync with registered extractors
</review_comment_end>
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
25-43: Consider validating for duplicate extract type names.If
requiredExtractTypeNamescontains duplicate entries (case-insensitive), the later entry will silently overwrite the earlier one in the dictionary. While this may not be a practical issue, adding a check would make the validation more explicit and prevent confusion.Consider adding after line 25:
foreach (var extractTypeName in requiredExtractTypeNames) { + if (_templatesByKey.ContainsKey(extractTypeName)) + { + throw new ArgumentException($"Duplicate extract type name '{extractTypeName}' in requiredExtractTypeNames.", nameof(requiredExtractTypeNames)); + } + var filePath = Path.Combine(instructionFileDirectoryPath, $"{extractTypeName}.md");</review_comment_end>
46-55: Add parameter validation for extractTypeName.The method doesn't validate the
extractTypeNameparameter. Ifnullis passed, the dictionary will throwArgumentNullException, and if an empty string is passed, the error message may be less helpful.Add validation at the start of the method:
public string GetInstructions(string extractTypeName) { + if (string.IsNullOrWhiteSpace(extractTypeName)) + { + throw new ArgumentException("Extract type name cannot be null or whitespace.", nameof(extractTypeName)); + } + if (_templatesByKey.TryGetValue(extractTypeName, out var instructions)) { return instructions; }</review_comment_end>
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)
10-73: Add cleanup for temporary test directories.The tests create temporary directories but don't clean them up after execution. While they delete existing directories at the start, test failures or exceptions could leave directories behind, and using fixed directory names may cause issues with parallel test execution.
Consider using
IDisposabletest fixtures or try-finally blocks:public void Instantiated_AllRequiredFilesPresent_Succeeds() { // Arrange var dir = Path.Combine(Path.GetTempPath(), $"SummarisationInstructions_{Guid.NewGuid()}"); try { Directory.CreateDirectory(dir); File.WriteAllText(Path.Combine(dir, "HtmlContent.md"), "dummy"); // ... rest of test } finally { if (Directory.Exists(dir)) Directory.Delete(dir, true); } }Alternatively, use a test fixture with
IDisposableto manage cleanup automatically.
</review_comment_end>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
README.md(0 hunks)src/Elzik.Breef.Api/Program.cs(1 hunks)src/Elzik.Breef.Infrastructure/AI/AiContentSummariserOptions.cs(0 hunks)src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Functional/HealthEndpointTests.cs(0 hunks)tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Functional/HealthTestsNative.cs(1 hunks)tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs(1 hunks)tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserOptionsTests.cs(0 hunks)
💤 Files with no reviewable changes (4)
- tests/Elzik.Breef.Api.Tests.Functional/HealthEndpointTests.cs
- src/Elzik.Breef.Infrastructure/AI/AiContentSummariserOptions.cs
- README.md
- tests/Elzik.Breef.Infrastructure.Tests.Integration/AiContentSummariserOptionsTests.cs
🧰 Additional context used
🧬 Code graph analysis (6)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (2)
Task(130-153)Task(155-164)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsNative.cs (1)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
HealthTestsBase(7-34)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (1)
IEnumerable(99-102)src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
GetInstructions(5-5)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (2)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
HealthTestsBase(7-34)tests/Elzik.Breef.Api.Tests.Functional/TestOutputHelperStream.cs (1)
TestOutputHelperStream(5-25)
src/Elzik.Breef.Api/Program.cs (1)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (2)
FileBasedContentSummarisationInstructionProvider(6-56)FileBasedContentSummarisationInstructionProvider(10-44)
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (2)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (3)
FileBasedContentSummarisationInstructionProvider(6-56)FileBasedContentSummarisationInstructionProvider(10-44)GetInstructions(46-55)src/Elzik.Breef.Domain/IContentSummarisationInstructionProvider.cs (1)
GetInstructions(5-5)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (9)
src/Elzik.Breef.Infrastructure/AI/FileBasedContentSummarisationInstructionProvider.cs (1)
8-8: LGTM!The case-insensitive dictionary is appropriate for extract type name lookups, and using
readonlyprevents reassignment.</review_comment_end>
tests/Elzik.Breef.Api.Tests.Integration/FileBasedContentSummarisationInstructionProviderTests.cs (4)
75-97: LGTM with minor note!Good use of Theory with multiple test cases for invalid input validation. The test properly validates the exception type, parameter name, and message content. The use of
Guid.NewGuid()for the directory name is also a good practice to avoid conflicts.Note: The cleanup concern mentioned in the previous comment applies here as well.
</review_comment_end>
99-102: LGTM!Simple helper method for providing empty array test data. This works well with xUnit's
MemberDataattribute.</review_comment_end>
105-125: LGTM with minor note!Good test coverage for the GetInstructions method's error path. The test properly validates the exception message includes the type name that wasn't found.
Note: The cleanup concern mentioned in the earlier comment applies here as well.
</review_comment_end>
127-151: LGTM!Excellent test coverage for empty/whitespace instruction files using Theory with multiple test cases. The assertions properly validate that the exception message contains both the error description and the filename.
Note: The cleanup concern mentioned in the earlier comment applies here as well.
</review_comment_end>
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
7-34: LGTM! Clean test abstraction.The base class provides a well-structured template for health endpoint testing with appropriate customization points (HostPort, Client, skip support). The test follows the AAA pattern and handles response deserialization correctly.
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsNative.cs (1)
5-17: LGTM! Standard WebApplicationFactory pattern.The implementation correctly uses WebApplicationFactory for in-process testing. The stored factory reference ensures proper lifetime management aligned with IClassFixture semantics.
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (2)
23-61: LGTM! Container configuration is sound.The constructor properly detects Docker availability, initializes resources, and configures the test container with appropriate environment variables and health check wait strategy.
130-164: LGTM! Lifecycle management is correct.The async initialization properly handles container startup with timeout, and disposal correctly cleans up resources with appropriate null checks. The timeout exception message is helpful for diagnosing container startup issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (2)
95-124: Potential deadlock when reading process output after WaitForExit.Reading
StandardOutputandStandardErrorafterWaitForExit()can cause a deadlock if the process writes more data than the buffer can hold. The process will block waiting for the buffer to be read, butWaitForExit()won't return until the process completes—creating a circular dependency. Docker build output can be verbose enough to trigger this.Apply this diff to read output asynchronously:
using var process = new Process { StartInfo = processStartInfo }; process.Start(); +string standardOutputTask = process.StandardOutput.ReadToEndAsync().GetAwaiter().GetResult(); +string standardErrorTask = process.StandardError.ReadToEndAsync().GetAwaiter().GetResult(); process.WaitForExit(); -string standardOutput = process.StandardOutput.ReadToEnd(); -string standardError = process.StandardError.ReadToEnd(); if (process.ExitCode != 0) { throw new InvalidOperationException( - standardError + Environment.NewLine + standardOutput); + standardErrorTask + Environment.NewLine + standardOutputTask); } -_testOutputHelper.WriteLine(standardOutput); +_testOutputHelper.WriteLine(standardOutputTask);Alternatively, for a cleaner async approach, start async reads before
WaitForExit:using var process = new Process { StartInfo = processStartInfo }; process.Start(); +var outputTask = process.StandardOutput.ReadToEndAsync(); +var errorTask = process.StandardError.ReadToEndAsync(); process.WaitForExit(); -string standardOutput = process.StandardOutput.ReadToEnd(); -string standardError = process.StandardError.ReadToEnd(); +string standardOutput = outputTask.GetAwaiter().GetResult(); +string standardError = errorTask.GetAwaiter().GetResult();
126-154: Same deadlock risk as BuildDockerImage.Although
docker --versionproduces minimal output, the same deadlock pattern exists here. For consistency and safety, apply the same async read pattern.Apply this diff:
using var process = new Process { StartInfo = processStartInfo }; process.Start(); +var outputTask = process.StandardOutput.ReadToEndAsync(); +var errorTask = process.StandardError.ReadToEndAsync(); process.WaitForExit(); -string standardOutput = process.StandardOutput.ReadToEnd(); -string standardError = process.StandardError.ReadToEnd(); +string standardOutput = outputTask.GetAwaiter().GetResult(); +string standardError = errorTask.GetAwaiter().GetResult();
♻️ Duplicate comments (1)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (1)
65-65: Fragile relative path dependency.The relative path
./../../../../../build/api/build-docker.ps1remains hardcoded and depends on test execution directory structure. This pattern appears inBreefTestsDocker.csas well, suggesting it may be an accepted pattern in the codebase.
🧹 Nitpick comments (1)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)
157-162: Improved error detection using NativeErrorCode.Checking
NativeErrorCodeagainstERROR_FILE_NOT_FOUNDis more reliable than string-based detection. Good improvement.Consider moving the constant to class level if it might be reused:
public class BreefTestsDocker : BreefTestsBase, IAsyncLifetime { private const string DockerImageName = "ghcr.io/elzik/elzik-breef-api:latest"; private const int ContainerStartTimeoutSeconds = 30; + private const int ERROR_FILE_NOT_FOUND = 2;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
Elzik.Breef.sln(2 hunks)tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs(4 hunks)tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs(1 hunks)tests/SampleRequests/Elzik.Breef.Api.http(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/SampleRequests/Elzik.Breef.Api.http
🧰 Additional context used
🧬 Code graph analysis (2)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (3)
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsBase.cs (1)
HealthTestsBase(7-34)tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (4)
DockerIsUnavailable(126-166)BuildDockerImage(95-124)Task(168-192)Task(194-203)tests/Elzik.Breef.Api.Tests.Functional/TestOutputHelperStream.cs (1)
TestOutputHelperStream(5-25)
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (1)
tests/Elzik.Breef.Infrastructure.Tests.Integration/ContentExtractors/Reddit/RedditPostContentExtractorTests.cs (1)
Dispose(219-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-ubuntu
- GitHub Check: build-ubuntu
- GitHub Check: Analyze (csharp)
🔇 Additional comments (9)
Elzik.Breef.sln (2)
43-47: LGTM! Well-structured addition of sample requests folder.The new SampleRequests solution folder is properly formatted and appropriately located under the tests directory. This organizational structure will help keep sample HTTP requests accessible for testing the new per-extractor instruction functionality.
96-96: Correct nesting configuration.The SampleRequests folder is properly nested under the Tests solution folder, maintaining consistency with the solution's organizational structure.
tests/Elzik.Breef.Api.Tests.Functional/BreefTestsDocker.cs (2)
28-31: LGTM! Explicit timeout prevents hanging tests.The 30-second timeout aligns well with the container startup timeout and is appropriate for functional tests.
194-203: LGTM! Disposal pattern improved.The updated logic correctly:
- Always disposes the
HttpClient- Only attempts container cleanup when Docker is available and the container was successfully initialized
- Properly awaits async disposal operations
This prevents exceptions when the container is null and ensures proper resource cleanup.
tests/Elzik.Breef.Api.Tests.Functional/HealthTestsDocker.cs (5)
8-61: LGTM! Well-structured test class.The class properly inherits from
HealthTestsBaseand implementsIAsyncLifetimefor test lifecycle management. The conditional setup based on Docker availability is a good pattern that prevents test failures in environments without Docker. The container configuration with dummy environment variables is appropriate for health check validation.
79-91: Great fix! Process output handling is now correct.The previous issue where
StandardOutput.ReadToEnd()was being called multiple times has been properly resolved. The code now reads both streams into variables afterWaitForExit()and reuses those variables, ensuring output is captured correctly in both success and failure paths.
94-134: Excellent improvements! Both previous issues resolved.The method now correctly:
- Calls
WaitForExit()before reading the output streams (lines 110-113), preventing incomplete reads- Uses
NativeErrorCodecomparison instead of message string matching (lines 125-127), making the detection robust across locales and Windows versionsThese changes align with the pattern established in
BreefTestsDocker.csand follow best practices.
136-159: LGTM! Proper async lifecycle with timeout handling.The initialization logic correctly implements
IAsyncLifetimewith appropriate timeout handling and clear error messages. The null check prevents invalid state transitions, and the timeout exception provides helpful diagnostic information.
161-170: LGTM! Clean resource disposal.The disposal logic properly cleans up resources with appropriate null checks and conditional logic. The async disposal pattern for the test container is correctly implemented.
|



Summary by CodeRabbit
New Features
Changes