AI Architect News

Vol. MMXXVI·No. 1·How intelligence gets built
11 reading now
2026
5
May
News·Models

OpenAI makes GPT-5.5 Instant the new ChatGPT default

GPT-5.5 Instant replaces GPT-5.3 Instant as ChatGPT's default model, with better benchmarks and fewer hallucinations in law, medicine, and finance — plus richer context management and memory-source controls.

ChatGPT showing a dinner recommendation with a Sources panel for saved memories.
MAI logo over a blurred green and orange background.
News·Models

Introducing MAI-Thinking-1

Microsoft AI introduced MAI-Thinking-1, a medium-sized reasoning model for complex problems with strong coding and math performance, trained on clean, traceable enterprise-grade data.

News·Agents

Introducing FrontierCode

A coding eval built from 40+ hour tasks by open-source maintainers to test whether model-written code is actually mergeable.

A hand-drawn image where a black square overlaps a white circle, revealing highlighted nodes and connections inside it.
Analysis·Research

Tracing the thoughts of a language model

Anthropic's circuit tracing maps the internal pathways of Claude — surfacing advance planning, shared cross-language concepts, and moments where a model's stated reasoning diverges from what it actually computes.

Liquid-cooled multi-GPU server with MI355X GPUs alongside MLPerf 6.0 results.
News·Hardware

AMD's MI355X clears 1M tokens/sec on MLPerf 6.0

AMD's MLPerf Inference 6.0 run puts MI355X GPUs past 1 million tokens/sec at multinode scale, with single-node results competitive against NVIDIA's B200 and B300.

Repost·Agents

Serverless agents and the durable execution problem

Managed agents scale better when their loop runs as stateless serverless functions, with durable state in the data layer and tool execution handled by backend systems. This architecture avoids sandbox overhead, fragmented artifacts, and brittle coordination while letting agents integrate into business workflows.

Featured
Repost·Industry Articles

The Rise of the Agent Runtime

Every major vendor is quietly rebuilding fragments of the same missing runtime layer, and none ships the whole thing.

AI Agent
Golem social preview image
Repost·Machine Learning

How LLMs Actually Work

A from-the-ground-up walkthrough of how modern LLMs work, from tokens to transformer blocks to the next-token loop

LLM
Transformer pipeline from tokenization to next-token prediction
Repost·Agents

Durable Agent Workflows Are Just Data Pipelines

Orchestrating agents at scale isn't a new problem — it's the same problem every system has: define a unit of work, persist it, and resume from the log. Durability is a logging discipline; scale and governance are different problem classes, and knowing the difference is the whole game.

Architecture
Repost·engineering

Modern Engineering Values

Coding agents make software execution dramatically faster, moving engineering work toward ownership, taste, guardrails, repo-local context, stack control, and option value. Teams that keep feedback loops tight and technical judgment close to the code can ship more without treating AI-generated code as a substitute for engineering direction.

principles
Ollamav0.30.8
ollama/ollama
patch
  • Decoupled prompt caching from context shift for better KV-cache reuse
  • More stable MLX inference with hardened linear and embedding layers
  • MLX runner snapshots during prompt processing and speculative decoding
Show 2 moreShow less
  • Improved recurrent-model support via gated-delta kernels
  • Fixed ollama launch selecting the wrong provider
174k · Go · 2026-06-12
vLLMv0.23.0
vllm-project/vllm
minor
  • DeepSeek-V4 hardened across backends with new attention kernels and EPLB
  • Model Runner V2 now default for Llama and Mistral dense models
  • Encoder-free Gemma 4 Unified support, plus Gemma 4 MTP
Show 2 moreShow less
  • Multi-tier KV-cache offloading adds an object-store secondary tier
  • Experimental Rust frontend gains streaming generate and dynamic LoRA
82.8k · Python · 2026-06-12
LangGraph1.2.5
langchain-ai/langgraph
patch
  • Released langgraph 1.2.5
  • Fixed merging lc_versions config metadata
  • Fixed an updateState bug for deltaChannel on empty threads
Show 2 moreShow less
  • Migrated Python type checking to ty
  • Shipped langgraph-cli 0.4.28
34.7k · Python · 2026-06-12
Cloudflare Agentsagents@0.16.0
cloudflare/agents
minor
  • Rebuilt agents/browser on the codemode runtime as one durable browser_execute tool
  • Sandboxed CDP code runs with abort-and-replay, pausing mid-run for approval
  • Browser session modes: one-shot, reuse, and dynamic shared sessions
Show 2 moreShow less
  • Fixed RPC calls hanging forever during connection churn
  • agents/chat adds pausedExecutionUpdate for human-in-the-loop durable runs
5.1k · TypeScript · 2026-06-12
Claude Agent SDKv0.2.101
anthropics/claude-agent-sdk-python
patch
  • Typed system/task_updated events as TaskUpdatedMessage
  • Active-task tracking no longer hangs on a terminal task_updated
  • Added TaskUpdatedStatus and TERMINAL_TASK_STATUSES helpers
Show 1 moreShow less
  • Bundled Claude CLI updated to 2.1.177
7.3k · Python · 2026-06-13
In this edition

Welcome to the debut release of AI Architect News. We open on a frontier-model wave — Claude Fable 5 and Mythos 5, OpenAI's GPT-5.5 Instant, MAI-Thinking-1, the Fusion API, and Moonshot's Kimi K2.7-Code — set against the quieter shifts that decide how intelligence gets built: agent tooling and durable runtimes, the scramble for inference capacity from Maia 200 to AMD's MLPerf run, and new tools for seeing inside the models. We close in the gallery, with what those models now make — image, video, and music generated from a prompt.

Editor's note

A debut is a thesis. Ours: the story is no longer just bigger models, but the systems around them — runtimes, silicon, harnesses, and the tools that let us watch a model think. The interesting work is between the layers, not only inside them — and that is the beat we will keep.