If you blinked this week, you missed a model war. OpenAI dropped GPT-5.5 on Wednesday; DeepSeek shipped V4 – open-source, 1.6T parameters, MIT – within hours. Meanwhile, Kubernetes 1.36 landed with gang scheduling for AI batch workloads, GrafanaCON in Barcelona delivered Grafana 13 and a ground-up Loki rearchitecture, and Vercel disclosed an OAuth supply-chain breach that should make every platform team audit their third-party integrations today. This was one of those weeks where multiple tectonic plates moved at once.

AI and Machine Learning

OpenAI releases GPT-5.5 – “a new class of intelligence for real work” – OpenAI shipped GPT-5.5 on April 23, calling it their smartest model yet. The headline numbers: 82.7% on Terminal-Bench 2.0, 58.6% on SWE-Bench Pro, and strong gains on internal Expert-SWE evaluations for long-horizon coding. The model matches GPT-5.4 per-token latency while using fewer tokens on equivalent Codex tasks, meaning it’s both smarter and cheaper to run. It’s rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with API access coming soon. Source

DeepSeek ships V4 – open-source, 1.6T parameters, same-day counter-launch – Hours after GPT-5.5, DeepSeek released V4 in preview: a 1.6T-parameter MoE model (49B activated) with a 1M-token context window, MIT license, and weights on Hugging Face. The Flash variant (284B total, 13B activated) is separately trained, not a trimmed Pro. API pricing lands at $1.74/$3.48 per million tokens for Pro – significantly cheaper than Opus 4.7, GPT-5.5, or competitors. V4 introduces Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to cut KV cache requirements by roughly 10x. The model was pre-trained on 32T+ tokens using FP4+FP8 mixed precision. Huawei also announced its Ascend 950-based supernode will fully support V4. Source

Anthropic launches Claude Opus 4.7 with stronger coding and high-res vision – Anthropic released Claude Opus 4.7 on April 16, shipping notable improvements over Opus 4.6 in advanced software engineering – particularly on the hardest tasks. The model can now accept images up to 2,576 pixels on the long edge (3x the prior limit), making it significantly better at interpreting charts, dense documents, and UIs. A new “xhigh” effort level gives developers control over the reasoning-depth-vs-latency tradeoff. Pricing stays at $5/$25 per million tokens. Alongside the launch, Anthropic introduced a Cyber Verification Program that gives vetted security researchers full model access without standard restrictions – a direct response to the Project Glasswing findings about AI cybersecurity risk. Source

Site Reliability Engineering

Kubernetes 1.36 “Haru” ships with 68 enhancements, including gang scheduling for AI workloads – Kubernetes 1.36 dropped on April 22 with 18 stable, 25 beta, and 25 alpha features. The standout for AI teams: gang scheduling enters alpha, letting the scheduler guarantee all-or-nothing pod placement for distributed training and batch inference jobs instead of processing pods individually (which wastes resources when only half a gang gets scheduled). Fine-grained kubelet API authorization goes GA, applying least-privilege access to individual kubelet endpoints – a big win for observability setups that previously required broad node proxy permissions. Resource Health Status moves to beta with DRA support, so you can now determine whether pod crashes correlate with unhealthy or unknown device states on GPUs and other dynamic resources. User namespaces also graduate to stable. Source

Vercel discloses OAuth supply-chain breach originating from third-party AI tool – On April 19, Vercel disclosed a security incident that’s become this week’s must-read postmortem. The attack didn’t start at Vercel – it started at Context.ai, a third-party AI SaaS, where an employee was infected with Lumma Stealer. The attacker used stolen OAuth tokens to pivot through Google Workspace into a Vercel employee’s corporate account, then accessed internal environments and read a subset of customer environment variables not marked “sensitive.” The detection gap: roughly 2 months, from February 2026 to April 2026. Mandiant and law enforcement are involved. The lesson is uncomfortable: your supply chain isn’t just your code and npm packages anymore – it’s every SaaS app your employees authenticate with via OAuth. Source

Gateway API v1.5 ships biggest release yet – six features graduate to stable – Gateway API v1.5, released in March but getting widespread adoption this month, is the project’s largest release to date. Six widely-requested features move to the Standard (stable) channel: ListenerSet for multi-tenant listener management, TLSRoute, HTTPRoute CORS Filter, Client Certificate Validation, Certificate Selection for Gateway TLS Origination, and ReferenceGrant. The project has also moved to a release-train model, which should produce a more predictable cadence going forward. For teams still on Ingress: this is the release that closes most of the remaining feature gaps. Source

Observability

GrafanaCON 2026: Grafana 13 launches with suggested dashboards, Git Sync GA, and AI Assistant everywhere – Grafana Labs held GrafanaCON in Barcelona this week and shipped a stack of updates. Grafana 13 introduces suggested dashboards (auto-surfacing pre-built dashboards for connected data sources), dynamic dashboards GA, and a Graphviz panel. Git Sync is now generally available, enabling native GitOps for dashboard management. The AI-powered Grafana Assistant is expanding beyond Grafana Cloud to OSS and Enterprise users via a one-click Cloud connection. The new Grafana Advisor tool runs health checks and provides optimization recommendations. Grafana now counts 35 million users worldwide. Source

Grafana rearchitects Loki with Kafka – eliminates 2.3x storage overhead – The most technically significant GrafanaCON announcement: Loki is getting a new Kafka-backed ingestion architecture. The current design replicates every log line to three ingesters for HA, but drift between ingesters means deduplication fails – Grafana’s internal metrics show they store 2.3x the data they ingest on average. The new architecture replaces replication-at-ingestion with Kafka as the durability layer, eliminating this overhead entirely. The result: 10x faster aggregated queries and dramatically lower storage costs. This is the kind of foundational change that makes self-hosted Loki significantly more viable at scale. Source

Grafana Labs targets the “AI blind spot” with LLM observability and agent-aware CLI – Also at GrafanaCON: Grafana launched AI Observability in Grafana Cloud (public preview) for monitoring LLM-powered applications in production – tracking token usage, latency, error rates, and evaluation metrics in real time. They also shipped GCX, a new agent-aware CLI designed to surface Grafana Cloud data inside agentic development environments. And o11y-bench, an open-source benchmark for evaluating AI agents running observability workflows, is now available. Source

OpenTelemetry formally deprecates OpenTracing compatibility requirements – The OpenTelemetry Specification project merged the deprecation of OpenTracing compatibility requirements, effective as of March 2026 with earliest removal no sooner than March 2027. Implementing new OpenTracing compatibility is no longer required for new SDKs. Existing shims can be maintained for backwards compatibility during the deprecation period, but all new work should target native OTel APIs and OTLP-based workflows. If you’re still depending on an OpenTracing shim, now is the time to plan your migration. Source

  • Kong launches Agent Gateway for multi-agent AI traffic – AI Gateway 3.14 adds governance for agent-to-agent communication alongside LLM and MCP traffic, with unified observability in Konnect. Kong
  • OpenTelemetry eBPF Instrumentation (OBI) hits beta – The successor to Grafana Beyla reaches v0.8.0 with zero-code observability for Kubernetes, targeting 1.0 GA in late 2026. DEV Community
  • Auto-diagnosing Kubernetes alerts with HolmesGPT – A two-person SRE team at STCLab shares how they built an AI investigation pipeline using HolmesGPT (CNCF Sandbox) with runbook-driven investigation. CNCF Blog
  • Skyscanner shares OTel scaling story – Managing OpenTelemetry collectors across 24 production clusters with 1,000+ microservices. OpenTelemetry Blog
  • Kubernetes AI Gateway Working Group + Agent Sandbox – The community is building two complementary primitives: AI Gateway for inference traffic routing (KV-cache-aware scheduling) and Agent Sandbox for isolated, stateful agent runtimes. Kubernetes Blog
  • Grafana Cloud observability survey: 77% of orgs now lean on open source/open standards – Yet 38% still cite complexity as their top challenge. Grafana Labs

My Take

The same-day GPT-5.5 / DeepSeek V4 launches are the clearest signal yet that frontier AI is a commodity race. DeepSeek shipping 1.6T parameters under Apache 2.0 at a fraction of proprietary pricing means the moat isn’t model intelligence – it’s the ecosystem you build around it. OpenAI is leaning hard into agentic coding and tool use; DeepSeek is betting that open weights and aggressive pricing will win the long game. Anthropic, meanwhile, is carving out a niche with safety-first features like the Cyber Verification Program.

What interests me more is what’s happening in the infrastructure layer underneath. Kubernetes 1.36’s gang scheduling is a direct response to AI workloads that don’t fit the one-pod-at-a-time scheduling model. The AI Gateway Working Group is tackling inference-specific routing (KV-cache pressure, LoRA adapter selection) that standard Gateway API can’t handle. Grafana’s Loki rearchitecture and AI Observability launch show that the observability stack is simultaneously being rebuilt for efficiency and extended to cover LLM workloads. And the Vercel breach is a sobering reminder that as we rush to integrate AI SaaS tools into our workflows, every OAuth grant is a potential lateral movement path.

The convergence is real: AI is driving infrastructure changes, infrastructure is being built to serve AI, and observability is racing to cover both. The teams that will navigate this well are the ones treating AI not as a separate domain but as another class of distributed system that needs the same rigor around scheduling, security, and observability.


Thanks for reading this week’s roundup. If something here caught your eye or I missed a story you think deserves attention, I’d love to hear about it – reach out on LinkedIn. See you next week.