diff --git a/content/arch.md b/content/arch.md
new file mode 100644
index 0000000..293a2a9
--- /dev/null
+++ b/content/arch.md
@@ -0,0 +1,204 @@
++++
+date = '2025-09-14T15:45:21-07:00'
+draft = true
+title = 'Why Do LLMs Design Mediocre Architecture?'
++++
+
+LLMs create mediocre architecture that compounds technical debt.
+By "mediocre architecture" I mean code that's functionally correct but hard to maintain and extend.
+In our example, that means duplicated code with no opinionated abstraction.
+
+Where should we expect them to fail?
+How can tech leads benefit from increases in velocity without eventually crumbling under tech debt?
+How should teams change their working style as a result?
+
+## Worked Example: Implementing Email Notifications
+
+I ran Claude Code in the [FastAPI Example App](http://github.com/fastapi/fastapi-example-app) with the following prompt:
+
+> Add a notification system to this FastAPI app that sends email alerts when users perform certain actions (user registration, password reset, important data changes).
+> The system should support multiple notification channels in the future (SMS, push notifications) and handle failures gracefully with retry logic.
+
+### Claude Code's Implementation
+
+Claude found an existing function for email notifications, `generate_new_account_email`, and followed the same pattern when defining the new functions: `generate_admin_account_status_change_email`, `generate_admin_profile_update_email`, `generate_email_change_notification`, `generate_new_account_email`, and `generate_profile_update_email`.
+
+Here's an example for one of them:
+
+```python
+def generate_admin_account_status_change_email(
+    email_to: str,
+    full_name: str | None,
+    status_change: str,
+    admin_email: str,
+    timestamp: str,
+    reason: str | None = None
+) -> EmailData:
+    project_name = settings.PROJECT_NAME
+    subject = f"{project_name} - Account Status Update"
+    html_content = render_email_template(
+        template_name="admin_account_status_change.html",
+        context={
+            "project_name": project_name,
+            "full_name": full_name or "User",
+            "status_change": status_change,
+            "admin_email": admin_email,
+            "timestamp": timestamp,
+            "reason": reason,
+            "login_link": settings.FRONTEND_HOST,
+            "support_link": f"{settings.FRONTEND_HOST}/support",
+        },
+    )
+    return EmailData(html_content=html_content, subject=subject)
+```
+
+And here's how it's used:
+
+```python
+def update_user(
+    *,
+    session: SessionDep,
+    user_id: uuid.UUID,
+    user_in: UserUpdate,
+    current_user: CurrentUser,
+) -> Any:
+    # 20 lines of endpoint handling code
+
+    # All code until `return` is new notification logic
+    if settings.emails_enabled:
+        user_data = user_in.model_dump(exclude_unset=True)
+        changed_fields = get_changed_fields_display(old_data, user_data)
+        timestamp = format_timestamp()
+
+        # Check if account status was changed
+        if "is_active" in user_data and user_data["is_active"] != old_data["is_active"]:
+            try:
+                status_change = "Activated" if user_data["is_active"] else "Deactivated"
+                email_data = generate_admin_account_status_change_email(
+                    email_to=db_user.email,
+                    full_name=db_user.full_name,
+                    status_change=status_change,
+                    admin_email=current_user.email,
+                    timestamp=timestamp,
+                )
+                send_email(
+                    email_to=db_user.email,
+                    subject=email_data.subject,
+                    html_content=email_data.html_content,
+                )
+            except Exception as e:
+                logger.info(f"Failed to send status change email to {db_user.email}: {e}")
+
+        elif changed_fields:
+            try:
+                email_data = generate_admin_profile_update_email(
+                    email_to=db_user.email,
+                    full_name=db_user.full_name,
+                    changed_fields=changed_fields,
+                    admin_email=current_user.email,
+                    timestamp=timestamp,
+                )
+                send_email(
+                    email_to=db_user.email,
+                    subject=email_data.subject,
+                    html_content=email_data.html_content,
+                )
+            except Exception as e:
+                logger.info(f"Failed to send admin update email to {db_user.email}: {e}")
+
+    return db_user
+```
+
+Here Claude has understood the existing code structure and created new functions that follow the same pattern.
+Both the email generators and their call sites are nearly identical to the existing code.
+
+But here's the thing: if this were a PR, I'd reject it.
+
+### Why This Doesn't Work
+
+The existing approach to sending notification emails has reached its breaking point.
+We now have 6 duplicate functions doing essentially the same thing.
+About 60% of the endpoint code is just handling notification emails.
+
+There are also two dangerous gotchas in the code:
+
+1. The code catches bare `Exception`s and logs them at `info` level - a surefire way to leave logic bugs undiscovered forever.
+2. The HTML templates used by the generator functions don't actually exist.
+
+
+### Conflicting Optimization Objectives
+
+A better design would require removing the existing email function and replacing it with an abstraction.
+But that means making opinionated decisions about the codebase.
+
+This is RLHF-induced sycophancy at work.
+Not only does the model respond to you with "you're perfectly right!", it treats your code the same way.
+Well, unless you're terribly off course.
+Claude saw one way of doing emails and assumed that was the way.
+
+From a product perspective, this makes sense. I rarely want an LLM coding agent making radical changes to my codebase.
+
+Rather than a problem, this is an opportunity to delineate responsibilities.
+
+## Reclaim Your Role as an Architect
+
+When LLMs handle the implementation details, our role as reviewers fundamentally changes.
+We're no longer line-by-line code inspectors catching syntax errors or style violations.
+Instead, we become architectural guardians, focusing on the bigger questions:
+Are we solving the right problem?
+Is this the right abstraction?
+Will this scale?
+
+This shift isn't just about adapting to AI tools; it's about reclaiming the most intellectually rewarding part of software development: the design decisions that shape how our systems grow and evolve.
+
+The first line of defence is a good ol' whiteboard meeting.
+I've noticed myself pulling a colleague aside for a design discussion more frequently.
+That's especially true now that I know that once I have a solid high-level design, the (LLM-assisted) implementation will be easy.
+
+Sometimes, just implementing a feature is faster than having a long discussion around it.
+Normally, I'd be hesitant to suggest drastic architectural changes in PRs.
+Trading implementation time for cleaner software architecture is rarely worth it in a startup.
+That's especially so if there's an existing solution.
+However, now that code can be generated cheaply, that's no longer a valid concern.
+That's why, when reviewing PRs, I now pay more attention to high-level design decisions.
+
+As a reviewer, it is difficult to think about both the big-picture design and the line-by-line implementation at once.
+That's why I outsource most of the low-level thinking to automated tooling.
+Tests and classic static analyzers are helpful here.
+Naturally, we use Recurse ML as our safety net.
+
+However, one issue I find myself battling is that LLMs tend to repeat the same mistakes over and over again.
+This is where [custom rules](https://docs.recurse.ml/gh/configs/rules/) are particularly helpful.
+Think of them as code review comments that automatically check every PR.
+When the tool spots code that violates these patterns, it leaves a comment explaining the issue.
+
+I learned the following rule of thumb from [Nate](https://natesesti.com) at Continue.dev:
+
+> Whenever I leave a comment in a PR, I ask myself, should this be an automatic rule?
+> Most of the time the answer is "yes" and I create it.
+
+For example, I created [`bare_exceptions.md`](https://github.com/continuedev/awesome-rules/blob/main/rules/recurse-ml/bare_exceptions.md) to prevent the bare `Exception` case from making it into our production codebase.
+
+
+## Conclusion
+
+LLMs create mediocre architecture because they optimize for consistency with existing patterns, not optimal design.
+This isn't a bug; it's how they're trained to be helpful rather than opinionated.
+
+This limitation is an opportunity.
+When LLMs handle implementation, we get to focus on the hard architectural decisions that actually matter.
+
+The practical takeaways:
+1. Design first, implement second. Pull a colleague aside for that whiteboard session.
+2. Review PRs for architecture, not syntax. Let tests and linters catch the small stuff.
+3. Turn repeated review comments into automated rules. If you're saying it twice, automate it.
+4. Don't hesitate to suggest refactoring. When implementation is cheap, good design becomes affordable.
+
+Embrace the promise of LLMs: freeing humans from rote work for creative, high-level tasks.
+Designing elegant software is hard.
+It requires understanding the existing code and having a mental model of the people who will maintain it and the environment it will run in.
+However, it's also fun.
+It's creative in the same way that writing or visual art is creative.
+The space of viable solutions is vast.
+Greatness is highly subjective and context-dependent.
+But creating something that other people appreciate and build upon is one of the most rewarding parts of the human experience.