What a month for AI coding.
Anthropic got 16 Claude instances to build a C compiler (with some caveats). OpenAI shipped GPT-5.3-Codex, and it's good! And then OpenClaw — a vibe-coded AI agent with 150K+ GitHub stars — became a full-blown security crisis, with malicious packages showing up on pub.dev too.
There's a new Flutter release as well, so let's start there. 👇
Flutter 3.41 + Dart 3.11
Flutter 3.41 landed earlier this month with 868 commits from 145 contributors. The ongoing work to decouple Material and Cupertino into standalone packages continues, and this release brings several other improvements:
- Fragment shader improvements: Synchronous image decoding (`decodeImageFromPixelsSync`) eliminates frame lag, and 128-bit float texture support unlocks GPU-accelerated photo filters.
- Impeller bounded blur: Fixes color bleeding on translucent widgets using `BackdropFilter`.
- Widget Previewer: Better VS Code and IntelliJ integration, plus support for dependencies on platform-specific libraries like `dart:ffi` and `dart:io`.
- New Getting Started Experience: The redesigned Learn section on the Flutter website, powered by Jaspr.
- 2026 Roadmap: Four stable releases planned. WebAssembly is on track to become the default for Flutter Web (40% faster load times, 30% less runtime memory).
For the full details, read the official blog post:
As for Dart 3.11, this release focuses on tooling improvements rather than new language features. The highlights include faster analyzer plugins (~10s saved on startup), a new dart pub cache gc command to reclaim disk space, enhanced dot shorthand tooling, and pub workspace glob support. The Dart & Flutter MCP Server was also improved. 🤖
For all the details:
For more context about Flutter's position in 2026 (including a full timeline of significant milestones over the last year), check out this State of Flutter 2026 article.
AI News
This was a big month for agentic AI coding. Two stories in particular stood out to me.
📝 Building a C Compiler with a Team of Parallel Claudes
Anthropic researcher Nicholas Carlini orchestrated 16 Claude instances to build a C compiler in Rust over nearly 2,000 Claude Code sessions and ~$20K in API costs. The result: a 100,000-line compiler with a 99% pass rate on standard compiler test suites.
The multi-agent engineering is genuinely impressive:
- Agents claimed work from a shared task queue via lock files, with git handling merges and conflict resolution.
- Each agent had specialized roles: core compilation, code deduplication, performance optimization, documentation, and architecture.
- Tests served as the primary feedback mechanism, with comprehensive suites using SQLite, Redis, libjpeg, and the GCC torture tests.
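The write-up doesn't publish the orchestration code, but the lock-file claiming pattern from the first bullet is a classic technique that can be sketched in a few lines. This is my own illustration, not Anthropic's implementation; the function names are hypothetical:

```python
import os

def try_claim(task_id: str, agent_id: str, lock_dir: str = "locks") -> bool:
    """Attempt to claim a task by atomically creating its lock file.

    os.open with O_CREAT | O_EXCL fails if the file already exists,
    so exactly one agent wins the race for each task.
    """
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, f"{task_id}.lock")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another agent already claimed this task
    with os.fdopen(fd, "w") as f:
        f.write(agent_id)  # record the owner for debugging
    return True
```

The nice property: the filesystem does the coordination, so agents need no shared server — which pairs naturally with git handling the merges.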
Here's the full write-up:
However, Anthropic's marketing of the project attracted legitimate criticism. ThePrimeagen published a detailed response pointing out that the framing of "from scratch, no human intervention" was misleading:
- Not "from scratch": Claude was trained on GCC (open source), and used its 37-year-old torture test suite for validation, plus GCC itself as an "online Oracle" to verify output. Starting "from scratch" with the perfect test suite and a reference implementation to test against is not quite the same thing as starting from a blank slate.
- Not "no human intervention": agents crashed and required restarting during the two weeks.
- Can't actually boot Linux: the compiler lacks a 16-bit x86 code generator small enough to meet Linux's 32KB real mode limit.
- Hello World didn't compile: the README example failed because the project lacks a linker.
So what's the real takeaway? As ThePrimeagen himself acknowledges: getting 16 AI agents to run mostly autonomously for two weeks and produce a substantial, functional piece of software is genuinely cool. It shows that models are improving and can handle projects at this scale with the right orchestration. That's the real story — no hype needed.
Chris Lattner (creator of LLVM, Swift, and Mojo) also published a thoughtful analysis, calling it "real progress, a milestone for the industry" while noting that the compiler reproduces decades of established engineering rather than inventing novel abstractions. His core thesis: as coding becomes cheaper, the real challenge becomes choosing the right problems and managing the resulting complexity.
📝 GPT-5.3-Codex: Full Autonomy Has Arrived?
Earlier this month, OpenAI released GPT-5.3-Codex, and the early reviews are eye-opening.
Matt Shumer's hands-on review describes it as the first model where "full autonomy starts feeling operationally real." In practice, this means you can specify an outcome, set up validation criteria, press go, and come back hours later to find the work done — including code changes, GitHub pushes, deployments, and log monitoring.
Key highlights:
- Can run tasks for 8+ hours without degradation.
- 25% faster than GPT-5.2-Codex, and tops SWE-Bench Pro and Terminal-Bench 2.0.
- Self-improvement capabilities: the model debugged its own training run and scaled its own GPU clusters.
I've been testing GPT-5.3-Codex for my own Flutter work over the past few weeks, and I have to say: I'm finding it faster, cheaper, and sharper than Opus 4.6 — often surfacing edge cases and insights that Opus misses. With that said, while "8+ hours without degradation" may be possible, my workflows require frequent human oversight and I care more about output quality than "how long it can run".
The OpenClaw Security Crisis
Now for the other side of the coin. If the Claude compiler pushes the boundaries of AI autonomy, the OpenClaw saga shows what happens when the worst practices collide with rapid adoption.
OpenClaw (formerly Clawdbot/Moltbot) is an open-source AI agent that exploded to 150K+ GitHub stars. It connects to LLMs and can autonomously execute tasks through messaging platforms like WhatsApp, Telegram, and Slack. Sounds cool, right?
The problem: OpenClaw requires broad permissions to function (email, calendars, messaging platforms, file system), and misconfigured or exposed instances quickly became a magnet for attackers. One of OpenClaw's own maintainers warned on Discord: "if you can't understand how to run a command line, this is far too dangerous of a project for you to use safely." Within weeks, the security issues piled up:
- 40,000+ instances exposed on the public internet because the gateway binds to `0.0.0.0` by default.
- API keys stored in plaintext markdown and JSON files.
- 12-20% of ClawHub marketplace skills were malicious, with the ClawHavoc campaign distributing Atomic Stealer to harvest crypto keys, SSH credentials, and browser passwords.
- CVE-2026-25253 (CVSS 8.8): a one-click remote code execution exploit where visiting a single malicious webpage is enough.
- Prompt injection attacks already seen in the wild, including crypto wallet drain attempts.
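The first two misconfigurations are entirely avoidable. Here's a minimal sketch (my own illustration, not OpenClaw's actual code) of the safer defaults: bind to loopback unless explicitly overridden, and read secrets from the environment instead of plaintext files:

```python
import os
import socket

def make_gateway_socket(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind to loopback by default so the gateway is unreachable from
    the public internet. Binding to 0.0.0.0 (all interfaces) is what
    exposed tens of thousands of instances."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    return sock

def load_api_key(name: str = "LLM_API_KEY") -> str:
    """Read secrets from the environment instead of plaintext
    markdown/JSON files sitting in the agent's workspace."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; refusing to start")
    return key
```

Neither line of defense is exotic — which is exactly why "vibe-coded" infrastructure that skips them is so dangerous at 150K-star scale.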
For the full details, here's CrowdStrike's breakdown:
And it gets worse. In a case of meta-irony, Cline CLI was also compromised via a supply chain attack that silently installed OpenClaw on ~4,000 developer machines. The root cause? Prompt injection exploiting AI-assisted GitHub workflows to steal npm publish credentials. An AI coding tool, compromised via an AI-specific attack vector. 😬
For entertainment value, here's a video of OpenClaw deleting an entire inbox:
Why Flutter Devs Should Care
This isn't just a general security story. The r/FlutterDev community has been flagging OpenClaw-generated packages appearing on pub.dev — vibe-coded packages that lack proper testing, security review, and sometimes contain hidden dependencies or malicious code.
The broader lesson: AI agents can ship code at unprecedented speed, but that speed makes proper security practices more important, not less. Treat community-generated skills and packages with the same skepticism you'd give a random npm package from a stranger.
AI Articles
Beyond the headline stories, I bookmarked some excellent articles this month that are good food for thought.
📝 The Software Development Lifecycle Is Dead
Boris Tane argues that AI agents haven't just accelerated the SDLC — they've dismantled it. The traditional sequential stages (requirements → design → implementation → testing → review → deployment → monitoring) didn't get faster. They merged into a single, tight feedback loop.

His key insights:
- Requirements are now fluid and iterative rather than frozen specifications.
- Code review via PRs becomes a bottleneck when agents generate hundreds of changes daily. Self-verification and second-agent reviews are replacing human code review for routine changes.
- Observability becomes the primary safety mechanism, with monitoring feeding back to agents for automatic fixes.
- "Context engineering" replaces process management as the new critical skill.
Read the full article:
I've been using a traditional "spec → plan → implement → review → ship" cycle in my own work. But I'm starting to notice that if the spec is solid and the agent has its own verification loop (TDD helps greatly here), manual code reviews become less important. The discipline shifts upstream — getting the requirements right matters more than ever.
📝 The Importance of Artifacts in AI-Assisted Programming
Nicholas Zakas makes a compelling case for why documentation isn't optional when coding with AI. His core point: AI has no memory beyond its context window. It can't tell you why it made a decision six months ago that brought down your server today.
His recommended artifacts:
- Product Requirements Documents (PRDs): Capture the what and why.
- Architectural Decision Records (ADRs): Immutable records of technical choices and their rationale.
- Technical Design Documents (TDDs): The how of implementation.
- Task Lists: Granular work items with dependencies and acceptance criteria.
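To make the ADR idea concrete, here's a minimal skeleton (the exact fields vary by team, and the example content is hypothetical):

```markdown
# ADR-0007: Use Riverpod for state management

- Status: Accepted
- Date: 2026-02-14

## Context
We need app-wide state that survives widget rebuilds and is testable
without a widget tree.

## Decision
Adopt Riverpod providers for all shared state; no new code uses
InheritedWidget directly.

## Consequences
New contributors (and AI agents) must follow the provider pattern;
migration of legacy screens is tracked in the task list.
```

Because ADRs are immutable, an agent reading this six months later gets the rationale, not just the code — exactly the memory the context window lacks.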
If you're using AI coding tools and not maintaining these artifacts, this article will change how you think about documentation:
These two articles pair well together: as the SDLC collapses into tight feedback loops, documentation artifacts become the new source of truth that compensates for what both AI and humans forget over time.
I also recommend Code Is Cheap Now. Software Isn't — a thoughtful piece arguing that while LLMs have made code generation nearly free, the barrier to building meaningful software remains unchanged. Engineering value is shifting from syntax mastery toward architectural thinking, taste, and knowing where not to cut corners. This echoes Chris Lattner's thesis perfectly.
Latest from Code with Andrea
I've been quiet on the content front lately, and for good reason: I've been heads-down building an agentic coding toolkit for Flutter — and dogfooding it heavily.
Here are a few Flutter web apps I built entirely with my spec-driven workflow (no manual coding):
If you're curious what the generated code looks like, I've open sourced the Currency Converter on GitHub:
Andrej Karpathy pointed out that since December, "the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks". Just like him, I've now had multiple occasions where agents "ran into multiple issues, researched solutions online, resolved them one by one". For me, this is truly unlocking new possibilities.
Until Next Time
The toolkit is shaping up well and I'm hoping to launch it soon. I want to get it right — something you can actually use to write quality Flutter code, faster.
Thanks for reading, and happy coding! 🎉