<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ping to Production]]></title><description><![CDATA[Ping to Production]]></description><link>https://blogs.akshatsinha.dev</link><generator>RSS for Node</generator><lastBuildDate>Fri, 01 May 2026 11:38:46 GMT</lastBuildDate><atom:link href="https://blogs.akshatsinha.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[LiteLLM Supply Chain Attack: The AI Package That Turned Into a Fork Bomb (But Stole Your AWS Keys First)]]></title><description><![CDATA[Why I run my AI agents in isolated infra , and why March 24th proved me right

I get a lot of raised eyebrows when I tell people I run all my AI agent experiments in isolated infrastructure. Separate ]]></description><link>https://blogs.akshatsinha.dev/litellm-supply-chain-attack</link><guid isPermaLink="true">https://blogs.akshatsinha.dev/litellm-supply-chain-attack</guid><category><![CDATA[Devops]]></category><category><![CDATA[cybersecurity]]></category><category><![CDATA[ai agents]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Python]]></category><category><![CDATA[Security]]></category><category><![CDATA[litellm]]></category><dc:creator><![CDATA[Akshat Sinha]]></dc:creator><pubDate>Wed, 25 Mar 2026 11:39:25 GMT</pubDate><content:encoded><![CDATA[<p><em>Why I run my AI agents in isolated infra , and why March 24th proved me right</em></p>
<hr />
<p>I get a lot of raised eyebrows when I tell people I run all my AI agent experiments in isolated infrastructure. Separate VMs. No cloud credentials mounted. Network egress restricted. "Isn't that overkill for hobby stuff?" Maybe. But then March 24, 2026 happened, and I spent the day watching my timeline fill with developers discovering their SSH keys, AWS credentials, Kubernetes configs, and crypto wallet seeds had just been quietly exfiltrated, courtesy of a Python package they trusted.</p>
<p>The package was <code>litellm</code>. The attack was elegant, the cleanup was brutal, and the lessons are directly relevant to anyone who uses AI tooling on their dev machine. Let me walk you through it.</p>
<hr />
<h2>What Even Is LiteLLM?</h2>
<p>Quick context if you're not deep in the AI stack: LiteLLM is a Python library that gives you one unified API to talk to 100+ LLM providers: OpenAI, Anthropic, Gemini, Bedrock, you name it. Instead of writing provider-specific code for each model, you call LiteLLM and it handles the translation layer.</p>
<p>It's extremely popular. About <strong>3.4 million downloads per day</strong> popular.</p>
<p>It gets used in two ways: as a Python SDK in your code, and as a standalone proxy server that your entire org routes model calls through. That second use case is important. When you run LiteLLM as a proxy, that one machine holds API keys for <em>every</em> AI provider your team uses. From an attacker's perspective, that's a very interesting machine to compromise.</p>
<hr />
<h2>How the Attack Got In (This Part Is Wild)</h2>
<p>The attackers didn't phish a developer. They didn't brute-force anything. They played the long game through the toolchain.</p>
<p><strong>Five days before the attack</strong>, they compromised <code>trivy-action</code>, the GitHub Action for Trivy, an open-source container security scanner made by Aqua Security. They rewrote the Git tags in the repo to point to a malicious release. LiteLLM used Trivy in its CI/CD pipeline, pulling it from <code>apt</code> without a pinned version.</p>
<p>Think about that for a second. A security scanner was the entry point.</p>
<p>When LiteLLM's CI next ran, it pulled the poisoned Trivy action, which silently exfiltrated the <code>PYPI_PUBLISH</code> token from the GitHub Actions runner environment. The attackers now had direct publish rights to the <code>litellm</code> package on PyPI.</p>
<p>The day before the attack, they registered <code>models.litellm.cloud</code>, a domain crafted to look official, ready to serve as the exfiltration endpoint.</p>
<p>On March 24, they published two malicious versions in 13 minutes.</p>
<p>Here's the detail that should give any DevOps person pause: <strong>neither version appears anywhere in the LiteLLM GitHub release history.</strong> The repo only goes up to <code>v1.82.6.dev1</code>. Versions 1.82.7 and 1.82.8 were uploaded directly to PyPI using the stolen token, bypassing every CI/CD workflow, every review, every safeguard the team had in place. The package registry was updated. The repository never was.</p>
<hr />
<h2>The Two Payloads</h2>
<p>The attackers published two versions, each with a different delivery mechanism, suggesting they were iterating in real time.</p>
<p><strong>v1.82.7</strong> embedded the malicious payload inside <code>litellm/proxy/proxy_server.py</code>. It fires when anything imports <code>litellm.proxy</code>, which is the standard import path for running LiteLLM's proxy server.</p>
<p><strong>v1.82.8</strong> went further. It added a file called <code>litellm_init.pth</code> to <code>site-packages</code>. If you're not familiar with <code>.pth</code> files: Python automatically executes them on <em>every</em> interpreter startup. Not on import. Not on first use. On every startup, including when you run <code>pip</code>, when your IDE's language server initializes, when a subprocess spawns. No <code>import litellm</code> required, ever.</p>
<p>The <code>.pth</code> payload looks like this:</p>
<pre><code class="language-python">import os, subprocess, sys
subprocess.Popen([sys.executable, "-c", "import base64; exec(base64.b64decode('...'))"])
</code></pre>
<p>Double base64-encoded, so it survives naive grep. And here's the kicker: the file is correctly declared in the wheel's <code>RECORD</code> with a valid checksum:</p>
<pre><code class="language-plaintext">litellm_init.pth,sha256=ceNa7wMJnNHy1kRnNCcwJaFjWX3pORLfMh7xGL8TUjg,34628
</code></pre>
<p><code>pip install --require-hashes</code> would pass. You're verifying you received exactly what the attacker published, and you did. The integrity guarantees of the package ecosystem assume the signing credentials are trustworthy. Once those are stolen, that assumption is gone.</p>
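<p>If you want to see this for yourself, you can recompute the RECORD-style digest of any installed file. A minimal sketch (the <code>site-packages</code> path is a placeholder for wherever your environment actually lives):</p>
<pre><code class="language-bash">python3 - &lt;&lt;'PY'
import base64, hashlib, pathlib
# hypothetical path, point this at the file you want to check
p = pathlib.Path(".venv/lib/python3.12/site-packages/litellm_init.pth")
digest = hashlib.sha256(p.read_bytes()).digest()
# RECORD entries are sha256= followed by the urlsafe-base64 digest without padding
print("sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode())
PY
</code></pre>
<p>The digest will match the RECORD entry, because the attacker generated both. Hash checking verifies transport integrity, not publisher intent.</p>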
<hr />
<h2>What the Malware Actually Did</h2>
<p>The community reverse-engineered the decoded payload within hours of the disclosure. Here's what it harvested from affected machines:</p>
<p><strong>Credentials and keys</strong>: SSH private keys (<code>id_rsa</code>, <code>id_ed25519</code>, <code>id_ecdsa</code>), <code>.env</code> files, shell history across bash/zsh/sh, <code>.netrc</code>, <code>.npmrc</code>, <code>.vault-token</code>, <code>.git-credentials</code>, <code>.gitconfig</code>.</p>
<p><strong>Cloud credentials</strong>: AWS credentials file + config, live IMDS queries with full IMDSv2 signing to pull from Secrets Manager and SSM Parameter Store, GCP application default credentials, Azure token files.</p>
<p><strong>Kubernetes</strong>: <code>~/.kube/config</code>, all of <code>/etc/kubernetes/*.conf</code>, in-cluster service account tokens and certificates.</p>
<p><strong>Containers</strong>: Docker <code>config.json</code> from every known path, including Kaniko.</p>
<p><strong>CI/CD configs</strong>: <code>terraform.tfvars</code>, <code>.gitlab-ci.yml</code>, <code>.travis.yml</code>, <code>Jenkinsfile</code>, <code>.drone.yml</code>. Anything that might contain secrets that ended up in version-controlled configs.</p>
<p><strong>Databases and services</strong>: <code>.my.cnf</code>, <code>.pgpass</code>, <code>.mongorc.js</code>, Redis CLI history.</p>
<p><strong>Crypto wallets</strong>: Bitcoin, Ethereum keystore, Solana, Cardano, Monero, Ledger hardware wallet configs.</p>
<p>Everything was encrypted (AES-256-CBC, session key wrapped with a hardcoded 4096-bit RSA public key) and POST'd to <code>https://models.litellm.cloud/</code>. Temporary files (<code>session.key</code>, <code>payload.enc</code>, <code>tpcp.tar.gz</code>) were created in <code>/tmp</code> during exfiltration.</p>
<p>If the malware found a Kubernetes service account token, it went further. It read all secrets across every namespace in the cluster, then attempted to deploy privileged <code>alpine:latest</code> pods to every node in <code>kube-system</code>, mounting the host filesystem and installing a persistent backdoor. Pods were named <code>node-setup-{node_name}</code>.</p>
<p>The backdoor itself, written to <code>~/.config/sysmon/sysmon.py</code> and registered as a systemd user service, polls <code>https://checkmarx.zone/raw</code> every five minutes for a URL and executes whatever it finds. The attacker can push live payloads to compromised machines at will.</p>
<hr />
<h2>The Bug That Saved People (Accidentally)</h2>
<p>Here's the most darkly ironic part of this whole story.</p>
<p>The <code>.pth</code> mechanism fires on every Python startup. The first thing the payload does is spawn a new Python subprocess. That subprocess also triggers <code>.pth</code> execution since <code>litellm_init.pth</code> is still in <code>site-packages</code>. Which spawns another. Which spawns another.</p>
<p>An unintended fork bomb, a bug in the malware itself.</p>
<p>This is why Callum McMahon at FutureSearch noticed anything was wrong in the first place. His 48GB Mac ground to a halt. <code>htop</code> took tens of seconds to open. 11,000 processes running. Without that mistake, the payload would have exfiltrated credentials silently in the background, planted its backdoor, cleaned up temp files, and disappeared. Nobody would have known until someone tried to use a rotated key and found it already being used.</p>
<p>As Andrej Karpathy put it on X: the malware's own poor quality is what made it visible.</p>
<hr />
<h2>The Disclosure: Community 1, Attackers 0</h2>
<p>Once Callum's team identified the malicious package, they posted a detailed technical disclosure in <a href="https://github.com/BerriAI/litellm/issues/24512">GitHub issue #24512</a> at 11:48 UTC. It hit Hacker News about 45 minutes later and reached 324 points.</p>
<p>The attackers responded by flooding the issue with <strong>88 bot comments from 73 previously-compromised developer accounts</strong> in a 102-second window. Then they used the stolen <code>krrishdholakia</code> maintainer account, the actual LiteLLM CEO's account, to close issue #24512 as "not planned."</p>
<p>The community opened <a href="https://github.com/BerriAI/litellm/issues/24518">a new tracking issue (#24518)</a>, noted what had happened, and kept the discussion alive on Hacker News. PyPI quarantined both versions at ~13:38 UTC. Total exposure window: about three hours.</p>
<p>By 15:09 UTC, the LiteLLM maintainers confirmed all GitHub, Docker, and PyPI credentials had been rotated and maintainer accounts moved to new identities. Google's Mandiant team was brought in for forensic analysis of the build pipeline.</p>
<p>Major downstream projects (DSPy, MLflow, CrewAI, OpenHands, Arize Phoenix) filed emergency PRs to pin away from the compromised versions the same day.</p>
<hr />
<h2>This Wasn't a One-Off. It Was Phase 09.</h2>
<p>The group behind this, tracked as <strong>TeamPCP</strong>, has been running an ongoing campaign since at least December 2025. LiteLLM was Phase 09.</p>
<p>The same RSA public key appears in the Trivy, KICS (a Checkmarx IaC scanner), and LiteLLM payloads. Same <code>tpcp.tar.gz</code> naming. Same infrastructure registrar. The target selection across all three is deliberate: each is a tool that requires elevated, broad access to the systems it operates on. A container scanner, an IaC scanner, an LLM gateway: all of them sit deep inside CI/CD pipelines and developer machines, with legitimate reasons to read credentials.</p>
<p>TeamPCP also deployed something called CanisterWorm, which uses the Internet Computer Protocol (ICP) as a C2 channel. ICP canisters can't be taken down by domain registrars or hosting providers. They're also apparently using an AI agent for automated attack targeting. Supply chain attacks are now getting automated. Fun times.</p>
<hr />
<h2>What This Means If You Run AI Tooling (Read: Probably You)</h2>
<p>Here's the thing that makes this incident different from a typical npm leftpad situation. The AI developer ecosystem has converged on patterns that are genuinely great for productivity and genuinely terrible for security:</p>
<p><code>uvx</code> <strong>and</strong> <code>npx</code> <strong>auto-pull the latest version of everything.</strong> When Cursor loads an MCP server, it runs it via <code>uvx</code>, which automatically resolves and downloads dependencies. Unpinned, from the internet, on your dev machine, which has your AWS credentials, SSH keys, and Kubernetes config sitting in well-known default locations that haven't changed in twenty years.</p>
<p><strong>Transitive dependencies are invisible.</strong> Callum didn't install litellm. His MCP server had an unpinned litellm dependency. <code>uvx</code> pulled the latest version, which happened to have been maliciously published 13 minutes earlier. The attack surface was a dependency of a plugin of an IDE.</p>
<p><strong>LLM gateways are credential aggregators by design.</strong> If you're running LiteLLM as a proxy, which is the recommended production pattern, that machine holds API keys for every model provider you use. Compromising it is a one-stop shop.</p>
<p>For what it's worth: this is exactly why I run AI experiments in isolated infra. Not because I'm paranoid, but because the ergonomics of the AI tooling ecosystem (auto-pulling dependencies, local execution, broad filesystem access) are a different threat model than running a web server. A compromised nginx config doesn't exfiltrate your AWS credentials. A compromised Python package that fires on every interpreter startup might.</p>
<hr />
<h2>What You Should Actually Do</h2>
<p><strong>If you installed litellm between 10:39 and ~13:38 UTC on March 24, 2026</strong>, assume the machine is compromised regardless of whether you ran any application code. The <code>.pth</code> mechanism fires during <code>pip install</code> itself.</p>
<p>Check for the persistence backdoor:</p>
<pre><code class="language-bash">ls ~/.config/sysmon/sysmon.py
systemctl --user status sysmon.service
</code></pre>
<p>Check for the <code>.pth</code> file:</p>
<pre><code class="language-bash">find $(python3 -c "import site; print(' '.join(site.getsitepackages()))") \
  -name "*.pth" -exec grep -l "base64\|subprocess\|exec" {} \;
</code></pre>
<p>Check Kubernetes:</p>
<pre><code class="language-bash">kubectl get pods -A | grep node-setup-
</code></pre>
<p>Then rotate everything: SSH keys, cloud credentials, API keys, database passwords, Kubernetes tokens. Audit AWS Secrets Manager and SSM Parameter Store if instance metadata was accessible. It's a brutal checklist but a necessary one.</p>
<p><strong>Going forward</strong>, regardless of whether you were affected:</p>
<ul>
<li><p><strong>Pin your dependencies.</strong> Use lock files with checksums. Unpinned transitive dependencies are your attack surface (see the sketch after this list).</p>
</li>
<li><p><strong>Audit</strong> <code>.pth</code> <strong>files in your environments.</strong> Most legitimate packages don't install them. If you see one you don't recognize: that's a red flag.</p>
</li>
<li><p><strong>Treat your dev machine like it has prod credentials.</strong> Because it probably does.</p>
</li>
<li><p><strong>If you run MCP servers locally</strong>, check their dependency manifests. Anything pulling in unpinned versions of large, popular libraries is an exposure.</p>
</li>
<li><p><strong>Consider isolated infra for AI agent experiments.</strong> A VM with no cloud credentials mounted, egress restricted to what it actually needs. Yes, it's friction. It's also a lot less friction than rotating all your credentials and auditing your Kubernetes cluster.</p>
</li>
</ul>
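<p>As a concrete starting point for the first item, here's one way to get a fully pinned, hash-locked environment with pip-tools. A sketch, assuming a <code>requirements.in</code> that lists only your direct dependencies:</p>
<pre><code class="language-bash"># pin every direct and transitive dependency, with sha256 hashes
pip install pip-tools
pip-compile --generate-hashes requirements.in -o requirements.txt

# refuse to install anything that isn't in the lock file
pip install --require-hashes -r requirements.txt
</code></pre>
<p>Keep the caveat from earlier in mind: hashes only prove you got what was published. The real protection is pinning to versions that predate any attack window and bumping them deliberately, not automatically.</p>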
<hr />
<h2>The Thing That Sticks With Me</h2>
<p>The AI tooling security conversation usually centers on prompt injection: tricking LLMs into doing bad things, the "lethal trifecta" of tool use, memory, and exfiltration. That's a real and evolving threat.</p>
<p>But the attack that actually hit people on March 24th required no AI manipulation whatsoever. No jailbreaking. No clever prompt. Just stolen CI/CD credentials, a malicious PyPI upload, and Python's decades-old <code>.pth</code> mechanism doing exactly what it was designed to do. The most sophisticated-looking threat in the AI ecosystem was beaten by the oldest trick in the supply chain book.</p>
<p>The irony is that LiteLLM, a tool purpose-built to manage access to AI systems, became the delivery vehicle for an attack that had nothing to do with AI at all. It was just a package. With dependencies. In a pipeline. Like everything else.</p>
<p>Pin your dependencies. Isolate your infra. And maybe double-check which security scanners your CI/CD is pulling.</p>
<hr />
<p><em>References:</em> <a href="https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/"><em>FutureSearch: the original technical disclosure</em></a> <em>|</em> <a href="https://futuresearch.ai/blog/no-prompt-injection-required/"><em>FutureSearch: first-person account</em></a> <em>|</em> <a href="https://docs.litellm.ai/blog/security-update-march-2026"><em>LiteLLM official security update</em></a> <em>|</em> <a href="https://snyk.io/articles/poisoned-security-scanner-backdooring-litellm/"><em>Snyk deep-dive</em></a> <em>|</em> <a href="https://github.com/BerriAI/litellm/issues/24512"><em>GitHub #24512: original disclosure</em></a> <em>|</em> <a href="https://github.com/BerriAI/litellm/issues/24518"><em>GitHub #24518: clean tracking issue</em></a></p>
]]></content:encoded></item><item><title><![CDATA[Terraform vs Crossplane: The Ultimate DevOps Infrastructure Showdown]]></title><description><![CDATA[The Infrastructure Management Odyssey
Imagine it's 2 AM, and you are in a digital wrestling match with cloud configurations that seem to have a mind of their own. As a DevOps engineer, I've been there, drowning in a sea of manual deployments, battlin...]]></description><link>https://blogs.akshatsinha.dev/terraform-vs-crossplane-iac-guide</link><guid isPermaLink="true">https://blogs.akshatsinha.dev/terraform-vs-crossplane-iac-guide</guid><category><![CDATA[Infrastructure as code]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[CloudProvisioning]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[gitops]]></category><category><![CDATA[AWS]]></category><category><![CDATA[GCP]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Akshat Sinha]]></dc:creator><pubDate>Tue, 20 Jan 2026 03:30:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768660557902/9b2f7a0e-ea44-4af8-b436-9b7bc170a7b4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-infrastructure-management-odyssey">The Infrastructure Management Odyssey</h2>
<p>Imagine it's 2 AM, and you are in a digital wrestling match with cloud configurations that seem to have a mind of their own. As a DevOps engineer, I've been there, drowning in a sea of manual deployments, battling configuration drift, and desperately seeking a way to bring order to infrastructure chaos.</p>
<p>Multiple cloud providers, endless configuration files, and the constant fear of inconsistent deployments have haunted me for days. Enter the game-changer: Infrastructure as Code (IaC).</p>
<h2 id="heading-meet-the-infrastructure-provisioning-titans">Meet the Infrastructure Provisioning Titans</h2>
<h3 id="heading-terraform-the-established-veteran">Terraform: The Established Veteran</h3>
<p>Developed by HashiCorp, Terraform has been the backbone of infrastructure provisioning for years. With its declarative HashiCorp Configuration Language (HCL), it's essentially the Swiss Army knife of cloud infrastructure. Describe your entire infrastructure as code, version control it, and deploy across multiple cloud providers with surgical precision.</p>
<h3 id="heading-crossplane-the-cloud-native-disruptor">Crossplane: The Cloud-Native Disruptor</h3>
<p>If Terraform is the seasoned veteran, Crossplane is the innovative newcomer challenging the status quo. Built with a Kubernetes-native approach, Crossplane reimagines infrastructure management by leveraging Kubernetes Custom Resource Definitions (CRDs). Applying a YAML manifest to create a K8s cluster has its own sense of satisfaction.</p>
<hr />
<h2 id="heading-deep-dive-technical-comparison">Deep Dive: Technical Comparison</h2>
<h3 id="heading-flexibility-and-reach">Flexibility and Reach</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Dimension</td><td>Terraform</td><td>Crossplane</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Provider Support</strong></td><td>100+ cloud providers</td><td>Multi-cloud with Kubernetes-native approach</td></tr>
<tr>
<td><strong>Configuration Language</strong></td><td>Custom HCL</td><td>Kubernetes YAML</td></tr>
<tr>
<td><strong>State Management</strong></td><td>Explicit state files</td><td>Stateless, Kubernetes reconciliation</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-detailed-configuration-examples">Detailed Configuration Examples</h2>
<h3 id="heading-terraform-aws-ec2-instance-deployment">Terraform: AWS EC2 Instance Deployment</h3>
<p>Provision a basic web server:</p>
<pre><code class="lang-hcl">resource "aws_instance" "web_server" {
  # Specific Amazon Machine Image (AMI)
  ami           = "ami-0c55b159cbfafe1f0"

  # Instance type selection
  instance_type = "t2.micro"

  # Resource tagging for management
  tags = {
    Name = "WebServer"
    Environment = "Production"
    ManagedBy = "Terraform"
  }
}
</code></pre>
<h3 id="heading-crossplane-kubernetes-native-resource-provisioning">Crossplane: Kubernetes-Native Resource Provisioning</h3>
<p>Crossplane resource definition for AWS EC2 instance:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">ec2.aws.upbound.io/v1beta1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Instance</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">web-server-crossplane</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">forProvider:</span>
    <span class="hljs-comment"># Identical AMI and instance type</span>
    <span class="hljs-attr">imageId:</span> <span class="hljs-string">ami-0c55b159cbfafe1f0</span>
    <span class="hljs-attr">instanceType:</span> <span class="hljs-string">t2.micro</span>

    <span class="hljs-comment"># Enhanced metadata and region specification</span>
    <span class="hljs-attr">region:</span> <span class="hljs-string">us-east-1</span>
    <span class="hljs-attr">tags:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">Name</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">WebServer</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">key:</span> <span class="hljs-string">Environment</span>
        <span class="hljs-attr">value:</span> <span class="hljs-string">Production</span>
</code></pre>
<hr />
<h2 id="heading-performance-and-architectural-considerations">Performance and Architectural Considerations</h2>
<h3 id="heading-1-terraforms-approach">1. Terraform's Approach</h3>
<p><strong>State Management:</strong> Maintains explicit state files.</p>
<h3 id="heading-pros"><strong>Pros:</strong></h3>
<ul>
<li>Predictable infrastructure tracking</li>
<li>Detailed change planning</li>
</ul>
<h3 id="heading-cons"><strong>Cons:</strong></h3>
<ul>
<li>Potential state drift (see the quick drift check after this list)</li>
<li>Requires careful state file management</li>
</ul>
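<p>For the drift concern specifically, Terraform can report out-of-band changes without modifying anything. A minimal sketch, assuming your backend and providers are already initialized:</p>
<pre><code class="lang-bash"># compare real infrastructure against the recorded state, change nothing
terraform plan -refresh-only

# optionally accept the detected differences into the state file
terraform apply -refresh-only
</code></pre>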
<h3 id="heading-2-crossplanes-strategy">2. Crossplane's Strategy</h3>
<p><strong>Kubernetes Native Reconciliation:</strong> Stateless resource management.</p>
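<p>Because every provisioned resource is a Kubernetes object, you can watch that reconciliation directly. A quick sketch against the EC2 instance from the earlier example, assuming the provider CRDs expose the standard <code>managed</code> category:</p>
<pre><code class="lang-bash"># list everything Crossplane is currently reconciling
kubectl get managed

# inspect the Synced / Ready conditions on a single resource
kubectl describe instance.ec2.aws.upbound.io web-server-crossplane
</code></pre>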
<h3 id="heading-pros-1"><strong>Pros:</strong></h3>
<ul>
<li>Dynamic resource composition</li>
<li>Seamless GitOps workflows</li>
</ul>
<h3 id="heading-cons-1"><strong>Cons:</strong></h3>
<ul>
<li>Steeper learning curve</li>
<li>Kubernetes dependency</li>
</ul>
<hr />
<h2 id="heading-when-to-choose-what">When to Choose What</h2>
<h3 id="heading-terraform-is-your-best-bet-if">Terraform is Your Best Bet If:</h3>
<ul>
<li>You require extensive multi-cloud support.</li>
<li>Your team is comfortable with the HashiCorp ecosystem.</li>
<li>You need complex, stateful infrastructure management.</li>
<li>Detailed change planning is crucial.</li>
</ul>
<h3 id="heading-crossplane-shines-when">Crossplane Shines When:</h3>
<ul>
<li>Kubernetes is central to your infrastructure strategy.</li>
<li>You embrace GitOps principles.</li>
<li>Dynamic, composable infrastructure is a priority.</li>
<li>You want tighter integration with cloud-native tools.</li>
</ul>
<h2 id="heading-hybrid-approach-the-best-of-both-worlds">Hybrid Approach: The Best of Both Worlds</h2>
<p>In the world of infrastructure management, adopting a hybrid approach can be a game-changer. Instead of rigidly choosing between Terraform and Crossplane, consider them as complementary tools.</p>
<p>Use Terraform for initial, comprehensive infrastructure setup across cloud providers, and then leverage Crossplane's dynamic Kubernetes-native capabilities for ongoing, flexible management. This strategy allows you to implement each tool's unique strengths precisely where they provide the most value, creating a more adaptive and powerful infrastructure provisioning ecosystem.</p>
<hr />
<h2 id="heading-the-human-element-in-infrastructure-as-code">The Human Element in Infrastructure as Code</h2>
<p>Remember, no tool is universally perfect. The right choice depends on:</p>
<ol>
<li><strong>Infrastructure Needs:</strong> Your specific technical requirements serve as the primary navigation compass. Understanding the unique architectural demands of your project is essential.</li>
<li><strong>Team Expertise:</strong> The skill set and comfort level of your team influence tool selection. A tool that aligns with your team's existing knowledge can speed up implementation and reduce the learning curve.</li>
<li><strong>Cloud Environment Complexity:</strong> Whether you're managing a simple single-cloud deployment or a complex multi-cloud ecosystem, your chosen tool must provide the flexibility and robustness to handle your current and future infrastructure landscape.</li>
<li><strong>Long-term Vision:</strong> Look beyond immediate requirements. Select a tool that can scale, adapt, and support your architectural roadmap, ensuring your infrastructure can evolve seamlessly with your organizational growth and technological ambitions.</li>
</ol>
<blockquote>
<p><strong>P.S.</strong> If you are on AWS, do check out my colleague's article on Karpenter and how it helped us move from Reactive Scaling to Developer-Aware scaling: <a target="_blank" href="https://www.linkedin.com/pulse/autoscaling-evolved-our-journey-karpenter-kamal-acharya-ml9dc">Autoscaling Evolved: Our Journey with Karpenter</a></p>
</blockquote>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p><em>Infrastructure as Code isn't just about selecting the right provisioning tool. It's about creating predictable, manageable, and scalable environments that adapt to your organization's evolving needs.</em></p>
]]></content:encoded></item><item><title><![CDATA[Exploring Kubernetes v1.35 'Timbernetes': All About the World Tree Version]]></title><description><![CDATA[Author's Note: This piece was inspired by Nicolas Vermandé comprehensive analysis at ScaleOps. His work shaped the structure and focus of my coverage. Check out his article for additional insights.

The Release That Makes You Choose: Upgrade or Archa...]]></description><link>https://blogs.akshatsinha.dev/kubernetes-1-35</link><guid isPermaLink="true">https://blogs.akshatsinha.dev/kubernetes-1-35</guid><category><![CDATA[Devops]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Developer]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Google Cloud Platform]]></category><category><![CDATA[cloud native]]></category><category><![CDATA[SRE]]></category><category><![CDATA[Platform Engineering ]]></category><dc:creator><![CDATA[Akshat Sinha]]></dc:creator><pubDate>Fri, 19 Dec 2025 03:00:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766089802181/4b98e0c5-d029-49dd-b7b7-a3c72c0d56c5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong><em>Author's Note:</em></strong> <em>This piece was inspired by</em> <a target="_blank" href="https://www.linkedin.com/in/vnicolas/"><strong><em>Nicolas Vermandé</em></strong></a> <em>comprehensive analysis at ScaleOps. His work shaped the structure and focus of my coverage. Check out</em> <a target="_blank" href="https://scaleops.com/blog/kubernetes-1-35-release-overview/"><em>his article</em></a> <em>for additional insights.</em></p>
<hr />
<h2 id="heading-the-release-that-makes-you-choose-upgrade-or-archaeology"><strong>The Release That Makes You Choose: Upgrade or Archaeology</strong></h2>
<p>Picture this: You're sipping your morning coffee, scrolling through your Slack, and someone drops the Kubernetes v1.35 release notes. "Another quarterly release," you think. "Probably just some minor tweaks and the usual beta promotions."</p>
<p><strong>Wrong.</strong> So very, very wrong.</p>
<p>Kubernetes v1.35 is the release equivalent of your apartment landlord saying "we're doing renovations" and then showing up with a wrecking ball. This isn't just another feature release; it's an infrastructure intervention. And if you're still running CentOS 7 nodes... well, I'm not saying you should panic, but maybe start stress-testing your resume template.</p>
<p>Let me explain.</p>
<hr />
<h2 id="heading-the-theme-yggdrasil-squirrels-and-existential-questions-about-operating-systems"><strong>The Theme: Yggdrasil, Squirrels, and Existential Questions About Operating Systems</strong></h2>
<p>First, can we just appreciate the theme? <strong>Timbernetes</strong>. The World Tree. Inspired by Yggdrasil from Norse mythology, the cosmic tree that connects all realms. The logo features three adorable squirrels: a wizard holding an LGTM scroll (for reviewers), a warrior with an axe and Kubernetes shield (for release crews), and a rogue with a lantern (for triagers who bring light to dark issue queues).</p>
<p>It's wholesome. It's nerdy. It's the kind of branding that makes you want to print stickers and put them on your laptop next to that one from KubeCon 2019 that's starting to peel.</p>
<p>But here's what the cheerful squirrels don't tell you: Kubernetes v1.35 represents a philosophical shift in how the project sees itself. This isn't just about features; it's about <em>focus</em>.</p>
<p><strong>Kubernetes is doubling down on being infrastructure.</strong></p>
<p>What does that mean? Think about the role of foundational systems. They provide powerful, reliable building blocks: mechanisms for resource management, workload placement, and lifecycle control. But they don't prescribe <em>how</em> you should use them. They give you the tools; you bring the strategy.</p>
<p>v1.35 delivers exactly that: in-place resource mutation, coordinated gang scheduling, structured device allocation, and enhanced observability. These are sophisticated capabilities that unlock new possibilities. But they're <em>capabilities</em>, not complete solutions.</p>
<p>The native controllers (HPA, VPA, the default scheduler) provide baseline functionality. They work. They're reliable. But they're increasingly designed as reference implementations rather than production-optimized systems for every use case.</p>
<p>This creates an interesting dynamic: the primitives are maturing rapidly, but the intelligence layer (the part that decides <em>when</em> to resize a pod, <em>where</em> to place an AI workload, or <em>how</em> to optimize for cost) is increasingly left as an exercise for the platform team.</p>
<p>It's not a bug. It's a design choice. And it has implications for how you approach Kubernetes in production.</p>
<hr />
<h2 id="heading-breaking-changes-the-modernization-mandate"><strong>Breaking Changes: The Modernization Mandate</strong></h2>
<p>Let's start with the uncomfortable stuff. You know how doctors say "this won't hurt" right before it definitely hurts? Yeah, this is like that.</p>
<h3 id="heading-cgroup-v1-is-dead-no-really-actually-dead"><strong>cgroup v1 Is Dead. No, Really, Actually Dead.</strong></h3>
<p>Remember cgroup v1? That venerable Linux resource management system that's been around since... forever? It's gone. Not deprecated with a gentle "please consider migrating" message. <strong>Removed.</strong> Deleted. Sent to the great <code>/dev/null</code> in the sky.</p>
<p>If your kubelet detects cgroup v1 on startup, it will fail. Hard. No negotiation. No "just this once." It's like trying to run Windows 95 programs on Windows 11: technically there's compatibility mode, but do you <em>really</em> want to be that person?</p>
<p>Here's how to check if you're about to have a very bad day:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">stat</span> -<span class="hljs-built_in">fc</span> %T /sys/fs/cgroup
</code></pre>
<p>If you see <strong><em>cgroup2fs</em></strong>, congratulations! You're living in 2025 (soon to be 2026). If you see tmpfs... I have bad news. You're running cgroup v1, and your Friday afternoon just got a lot more interesting.</p>
<p>The impact on legacy fleets is, shall we say, <em>spicy</em>. CentOS 7 (which hit EOL in June 2024, by the way, you really should have migrated by now), RHEL 7, Ubuntu 18.04... they all default to cgroup v1. Even if you're on a modern distro, if your kubelet is explicitly set to cgroupDriver: cgroupfs instead of systemd, you're going to hit this wall like a cartoon character hitting a pane of glass.</p>
<p><strong>There <em>is</em> an escape hatch:</strong> you can set failCgroupV1: false in your KubeletConfiguration. But using it is like continuing to smoke after your doctor shows you the lung X-rays. Sure, technically you <em>can</em>, but it locks you out of all the cool v2-only features: memory QoS, certain swap configurations, and, here's the kicker, <strong>Pressure Stall Information (PSI) metrics</strong>.</p>
<p>PSI is the metric that tells you not just that CPU usage is high, but that <em>processes are actively stalling waiting for CPU</em>. It's the difference between "we're busy" and "we're drowning." It's a game-changer for autoscaling intelligence. And you can't have it on cgroup v1.</p>
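<p>If you genuinely can't migrate a node yet, the escape hatch is a single field in the kubelet config. A sketch, assuming the usual kubeadm file layout:</p>
<pre><code class="lang-yaml"># /var/lib/kubelet/config.yaml (excerpt)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failCgroupV1: false   # kubelet starts on cgroup v1, but memory QoS, swap support, and PSI stay off the table
</code></pre>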
<p>So yeah, time to upgrade those nodes.</p>
<h3 id="heading-containerd-1x-the-final-season"><strong>containerd 1.x: The Final Season</strong></h3>
<p>Here's another fun one: Kubernetes v1.35 is the <em>last</em> version that supports containerd 1.x. In v1.36, it's gone. This is your final warning, like when Netflix sends you three emails saying a show is about to leave the platform.</p>
<p>Why does this matter? Because containerd 2.0 removes support for Docker Schema 1 images. You know, those ancient container images that were pushed five years ago and have been lurking in your registry like digital archaeology? They won't pull anymore.</p>
<p>Before you upgrade, you need to:</p>
<ol>
<li>Check your container runtime versions:</li>
</ol>
<pre><code class="lang-bash">kubectl get nodes -o jsonpath=<span class="hljs-string">'{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'</span>
</code></pre>
<ol start="2">
<li>Scan for Schema 1 images (yes, this is tedious):</li>
</ol>
<pre><code class="lang-bash">skopeo inspect docker://your-registry/old-image:tag | jq <span class="hljs-string">'.schemaVersion'</span>
</code></pre>
<ol start="3">
<li>Update your containerd configs. And I mean <em>really</em> update them: containerd 2.0 removes the deprecated registry.configs and registry.auths structures. If your automated node upgrade scripts inject old configs, your runtime will crash. Don't discover this at 2 AM on a Saturday.</li>
</ol>
<h3 id="heading-ipvs-mode-gets-the-deprecation-talk"><strong>IPVS Mode Gets the Deprecation Talk</strong></h3>
<p>For years, IPVS mode in kube-proxy was <em>the</em> recommendation for large clusters because iptables couldn't scale. It was faster, more efficient, and made you feel like a sophisticated network engineer.</p>
<p>But here's the thing: maintaining IPVS behavior that perfectly matches iptables semantics while also supporting every new Service feature turned out to be... complicated. Like, "we're spending more time on compatibility than innovation" complicated.</p>
<p>So Kubernetes v1.35 deprecates IPVS mode. It still works! You'll just get a warning on startup. The future is <strong>nftables</strong>, a more modern, programmable backend that fixes iptables' scaling issues without IPVS's maintenance burden.</p>
<p>Check your mode:</p>
<pre><code class="lang-bash">kubectl get configmap kube-proxy -n kube-system -o yaml | grep -i <span class="hljs-string">"mode"</span>
</code></pre>
<p>If you see mode: ipvs, you've got until v1.38 to migrate. That's probably about a year, give or take. Start testing nftables in staging. Take your time. But do take it seriously.</p>
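<p>When you do start testing, the switch itself is small. A sketch for a kubeadm-style cluster where kube-proxy runs as a DaemonSet:</p>
<pre><code class="lang-bash"># set mode: "nftables" in the kube-proxy ConfigMap, then roll the DaemonSet
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system rollout restart daemonset kube-proxy

# confirm the new proxier came up cleanly
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=20
</code></pre>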
<hr />
<h2 id="heading-the-flagship-feature-in-place-pod-resizing-goes-ga"><strong>The Flagship Feature: In-Place Pod Resizing Goes GA 🎉</strong></h2>
<p>Alright, enough doom and gloom. Let's talk about the feature that's going to make stateful workload operators weep with joy: <strong>In-Place Pod Resizing is now Generally Available</strong>.</p>
<p>This is huge. Like, "finally, VPA might actually be usable in production" huge.</p>
<h3 id="heading-the-historical-inefficiency-aka-the-restart-tax"><strong>The Historical Inefficiency (aka "The Restart Tax")</strong></h3>
<p>Let me paint you a picture. You've got a production database. It's been running smoothly, serving queries, living its best life. Then you realize: "Hey, this needs more memory. Let's change the limit from 4GB to 8GB."</p>
<p>In Kubernetes v1.32 and earlier, here's what happens:</p>
<ol>
<li><p>Pod gets terminated</p>
</li>
<li><p>All accumulated state evaporates (JIT compilation cache, warm database connections, Redis data if it's not persisted)</p>
</li>
<li><p>New pod gets created</p>
</li>
<li><p>New pod <em>might</em> fail to schedule (oops, no capacity)</p>
</li>
<li><p>New pod <em>might</em> fail readiness probes (cold start blues)</p>
</li>
<li><p>New pod <em>might</em> land on a worse node</p>
</li>
<li><p>Your pager goes off</p>
</li>
<li><p>You question your career choices</p>
</li>
</ol>
<p>This is why Vertical Pod Autoscaler (VPA) was relegated to "recommendation mode" in most organizations. Sure, it could <em>tell</em> you what resources you needed, but actually applying those recommendations meant disruption. So teams would use it once at deploy time, like a sizing calculator, and then overprovision everything "just in case."</p>
<p>Result? Clusters running at 30-40% utilization. The tool that was supposed to optimize resource usage became a one-time measurement device.</p>
<h3 id="heading-the-v135-magic"><strong>The v1.35 Magic</strong></h3>
<p>With KEP-1287 graduating to GA, the resources field in a Pod spec is now <em>mutable</em>. You can patch it via a new /resize subresource, the kubelet evaluates feasibility, and, get this, <strong>the container keeps running</strong>.</p>
<p>No restart. No new container ID. No reset restartCount. The memory cgroup limit just... changes. From inside the container, it's seamless.</p>
<p>Here's what it looks like:</p>
<pre><code class="lang-bash">kubectl patch pod my-database --subresource resize --<span class="hljs-built_in">type</span>=<span class="hljs-string">'merge'</span> -p <span class="hljs-string">'{
  "spec": {
    "containers": [{
      "name": "postgres",
      "resources": {
        "requests": {"memory": "8Gi"},
        "limits": {"memory": "8Gi"}
      }
    }]
  }
}'</span>
</code></pre>
<p>And just like that, your database has more memory. No downtime. No data loss. No 3 AM incident.</p>
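<p>You can confirm the resize actually landed by comparing the spec against what the kubelet reports back in the pod status. A quick sketch:</p>
<pre><code class="lang-bash"># desired resources (spec) vs. what is actually applied (status)
kubectl get pod my-database -o jsonpath='{.spec.containers[0].resources}{"\n"}'
kubectl get pod my-database -o jsonpath='{.status.containerStatuses[0].resources}{"\n"}'

# an in-flight or stuck resize shows up as a pod condition
kubectl get pod my-database -o jsonpath='{.status.conditions}' | grep -i resize
</code></pre>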
<h3 id="heading-the-gotchas-because-of-course-there-are-gotchas"><strong>The Gotchas (Because Of Course There Are Gotchas) :)</strong></h3>
<p><strong>QoS Class Is Still Immutable</strong></p>
<p>Kubernetes has three QoS classes: Guaranteed (requests == limits), Burstable (requests &lt; limits), and BestEffort (no resources specified). These determine scheduling priority and eviction behavior. And they're <em>immutable</em>.</p>
<p>So if you try to resize a Guaranteed pod by changing only the limits (which would make it Burstable), the API server will reject it with a very polite "Pod QOS Class may not change as a result of resizing."</p>
<p>The fix? For Guaranteed pods, always resize requests and limits together. Keep them equal.</p>
<p><strong>The Memory Shrink Hazard</strong></p>
<p>Increasing memory is safe. Decreasing memory is... interesting.</p>
<p>Let's say you have a container with a 4GB limit, currently using 3GB, and you decide to resize down to 2GB. What happens?</p>
<p>In v1.35.0-rc.1, the kubelet is smart enough to say "nope." The resize enters <strong><em>PodResizeInProgress</em></strong> with an error message like "attempting to set pod memory limit below current usage." The cgroup limit doesn't decrease. The container keeps running at 4GB.</p>
<p>Your spec says 2GB. Reality is 4GB. The resize is stuck in limbo. And somewhere, a platform engineer is staring at this state wondering if Kubernetes is broken (it's not; it's protecting you).</p>
<p>The solution? Use resizePolicy to specify that memory changes should trigger a container restart:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Pod</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">safe-resize-app</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">containers:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">app</span>
    <span class="hljs-attr">resources:</span>
      <span class="hljs-attr">requests:</span>
        <span class="hljs-attr">cpu:</span> <span class="hljs-string">"500m"</span>
        <span class="hljs-attr">memory:</span> <span class="hljs-string">"256Mi"</span>
      <span class="hljs-attr">limits:</span>
        <span class="hljs-attr">cpu:</span> <span class="hljs-string">"1"</span>
        <span class="hljs-attr">memory:</span> <span class="hljs-string">"512Mi"</span>
    <span class="hljs-attr">resizePolicy:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">resourceName:</span> <span class="hljs-string">cpu</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">NotRequired</span>      <span class="hljs-comment"># Hot resize for CPU</span>
    <span class="hljs-bullet">-</span> <span class="hljs-attr">resourceName:</span> <span class="hljs-string">memory</span>
      <span class="hljs-attr">restartPolicy:</span> <span class="hljs-string">RestartContainer</span>  <span class="hljs-comment"># Restart for memory,clean slate</span>
</code></pre>
<p>Now CPU changes are instant, and memory changes trigger a controlled restart. Best of both worlds.</p>
<h3 id="heading-native-vpa-the-mechanism-works-the-intelligence-doesnt"><strong>Native VPA: The Mechanism Works, The Intelligence... Doesn't</strong></h3>
<p>Here's the awkward part. VPA now supports updateMode: InPlaceOrRecreate. The mechanism works beautifully: pods resize without eviction, container IDs stay the same, everything is smooth.</p>
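<p>Wiring that up looks like any other VPA object, just with the new update mode. A sketch, assuming the VPA components are installed in your cluster:</p>
<pre><code class="lang-yaml">apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-database-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-database
  updatePolicy:
    updateMode: InPlaceOrRecreate   # resize in place when possible, fall back to eviction otherwise
</code></pre>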
<p>But the VPA <em>recommender</em> is still... how do I put this delicately... not great?</p>
<p>VPA relies on Metrics Server, which polls every 15-60 seconds and analyzes historical averages. By the time it detects a memory spike and issues a patch, the OOM might have already occurred. It's reactive, not predictive. It sees "high memory usage" but doesn't know if that's a leak, valid cache expansion, or normal JVM heap behavior. So it scales up blindly (hello, cost waste) or hesitates to scale down (hello, permanently oversized pods).</p>
<p>The API is production-ready. The intelligence layer isn't.</p>
<p>And that's kind of the theme of this whole release, isn't it? Kubernetes gives you the primitives. You provide the smarts.</p>
<hr />
<h2 id="heading-gang-scheduling-finally-native-all-or-nothing"><strong>Gang Scheduling: Finally, Native "All-or-Nothing"</strong></h2>
<p>If you're in the AI/ML space, this is your headline feature: <strong>Gang Scheduling</strong> has landed as an alpha feature.</p>
<h3 id="heading-the-problem"><strong>The Problem</strong></h3>
<p>You're training a distributed model. It needs 100 GPUs. The scheduler places 95 pods successfully, but then hits capacity. Now you've got 95 pods sitting there, holding 95 expensive GPUs, waiting for 5 more that might never come.</p>
<p>Meanwhile, other jobs are starving because those 95 GPUs are locked. You've created a deadlock. The cluster is effectively stuck. Someone's burning budget. Someone else is burning out.</p>
<p>Previously, solving this required external schedulers like Volcano or Kueue. They work great! But they're external dependencies with their own learning curves, deployment complexities, and operational overhead.</p>
<h3 id="heading-the-v135-solution"><strong>The v1.35 Solution</strong></h3>
<p>Kubernetes v1.35 introduces native gang scheduling via the new <strong>Workload API</strong>. Here's how it works:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">scheduling.k8s.io/v1alpha1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Workload</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">distributed-training</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">podGroups:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">workers</span>
    <span class="hljs-attr">policy:</span>
      <span class="hljs-attr">gang:</span>
        <span class="hljs-attr">minCount:</span> <span class="hljs-number">10</span>  <span class="hljs-comment"># All-or-nothing: need all 10 to start</span>
<span class="hljs-meta">---</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">batch/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Job</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">pytorch-training</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">parallelism:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">completions:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">workloadRef:</span>           <span class="hljs-comment"># Real Pod field, not an annotation!</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">distributed-training</span>
        <span class="hljs-attr">podGroup:</span> <span class="hljs-string">workers</span>
      <span class="hljs-attr">containers:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">trainer</span>
        <span class="hljs-attr">image:</span> <span class="hljs-string">pytorch/pytorch:latest</span>
        <span class="hljs-attr">resources:</span>
          <span class="hljs-attr">limits:</span>
            <span class="hljs-attr">nvidia.com/gpu:</span> <span class="hljs-number">1</span>
</code></pre>
<p>The scheduler sees workloadRef and holds <em>all</em> pods until it can place the entire gang. No partial allocation. No deadlock. No wasted GPU-hours.</p>
<p>It's elegant. It's native. It's...<strong>alpha</strong>.</p>
<h3 id="heading-the-native-scheduler-gap"><strong>The Native Scheduler Gap</strong></h3>
<p><strong><em>Here's the catch:</em></strong> the native scheduler's gang scheduling implementation is <em>basic</em>. It handles the mechanics (don't schedule anything until you can schedule everything), but it doesn't handle the economics.</p>
<p>There's no queue management. No fair-share policies. No sophisticated backfill. No preemption intelligence for gang workloads.</p>
<p>For serious AI supercomputing, you'll still want Volcano or Kueue managing the <em>queue strategy</em>. The difference is now Kubernetes handles the <em>gang semantics</em> natively, and your external orchestrator handles the <em>scheduling policy</em>.</p>
<p>It's a division of labor. Kubernetes is the kernel. Your orchestrator is the user space. <em>Sounds familiar</em>, right? :)</p>
<hr />
<h2 id="heading-opportunistic-batching-not-what-you-think"><strong>Opportunistic Batching: Not What You Think</strong></h2>
<p><strong>Opportunistic Batching</strong> (KEP-5598) graduated to beta and is enabled by default. The name makes it sound like Kubernetes will now schedule 1,000 identical pods in one massive batch operation.</p>
<p>That's not what this is.</p>
<h3 id="heading-what-it-actually-does"><strong>What It Actually Does</strong></h3>
<p>When the scheduler places a pod, it might keep the ranked node list in a small cache. For the <em>very next</em> pod with an identical "scheduling signature," it can return a hint: "try node X first." It's not "schedule 1,000 pods at once." It's "maybe skip some work for pod #2 if it's identical to pod #1."</p>
<p>It's opportunistic. It's a micro-optimization. And it comes with two non-obvious requirements.</p>
<p><strong>Requirement 1: Pods Must Be "Signable"</strong></p>
<p>The scheduler computes a signature from all the fields that affect placement. If <em>any</em> scheduler plugin can't produce a signature fragment, the whole pod becomes "unbatchable."</p>
<p>On day one of testing v1.35.0-rc.1, we hit this immediately. PodTopologySpread with system-default constraints blocked signatures for every pod. Zero batching happened, not because our pods were different, but because a default plugin refused to sign them.</p>
<p>The fix is to disable system-default topology constraints:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># /etc/kubernetes/kube-scheduler-config.yaml</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">kubescheduler.config.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">KubeSchedulerConfiguration</span>
<span class="hljs-attr">profiles:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">pluginConfig:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">PodTopologySpread</span>
    <span class="hljs-attr">args:</span>
      <span class="hljs-attr">defaultingType:</span> <span class="hljs-string">List</span>
      <span class="hljs-attr">defaultConstraints:</span> []  <span class="hljs-comment"># Disable defaults for batching</span>
</code></pre>
<p><strong>Requirement 2: Pods Must "Fill" Nodes</strong></p>
<p>The scheduler only reuses the cached ranking if the previously chosen node becomes <em>infeasible</em> for the next pod. If the node still has capacity, it flushes the cache (node_not_full) because reusing might cause suboptimal packing.</p>
<p>Translation: Batching works great for "fat" pods that fill entire nodes (think GPU workers consuming 8 cores, 64GB RAM, 1-8 GPUs). For "tiny" microservice pods that pack 50-to-a-node, batching can't safely reuse hints.</p>
<h3 id="heading-who-benefits"><strong>Who Benefits?</strong></h3>
<p>Distributed training jobs with node-filling workers. The pods are identical, they're large, and they benefit from both gang scheduling (all-or-nothing placement) and batching (faster scheduling decisions).</p>
<p>For dense microservice workloads? Don't expect miracles. And that's by design: the scheduler is protecting against suboptimal bin-packing.</p>
<p>Diagnostic metrics to watch (a quick scraping sketch follows the list):</p>
<ul>
<li><p>scheduler_batch_attempts_total{result="hint_used"} → Batching is working</p>
</li>
<li><p>scheduler_batch_cache_flushed_total{reason="node_not_full"} → Pods too small</p>
</li>
<li><p>scheduler_batch_cache_flushed_total{reason="pod_not_batchable"} → Signature problems</p>
</li>
</ul>
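<p>If you just want to eyeball those counters, here's a rough sketch, assuming you can reach the scheduler's secure metrics port and your token is authorized to read <code>/metrics</code>:</p>
<pre><code class="lang-bash"># pod name is a placeholder for your control-plane scheduler pod
kubectl -n kube-system port-forward pod/kube-scheduler-cp-1 10259:10259 &amp;

# scrape and filter (RBAC must allow this service account to read /metrics)
TOKEN=$(kubectl create token default -n kube-system)
curl -sk -H "Authorization: Bearer ${TOKEN}" https://localhost:10259/metrics | grep scheduler_batch
</code></pre>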
<hr />
<h2 id="heading-hpa-gets-precision-tuning"><strong>HPA Gets Precision Tuning 🎯</strong></h2>
<p>The Horizontal Pod Autoscaler's fixed 10% tolerance has been a pain point forever. For a 1,000-replica deployment, that's a 100-pod dead zone where HPA just... won't react.</p>
<p>Kubernetes v1.35 promotes <strong>Configurable Tolerance</strong> to beta (enabled by default). You can now set tolerance per-HPA, and, even better, set it <em>differently</em> for scale-up versus scale-down.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">autoscaling/v2</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">HorizontalPodAutoscaler</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">web-frontend</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">scaleTargetRef:</span>
    <span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/1</span>
    <span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">frontend</span>
  <span class="hljs-attr">minReplicas:</span> <span class="hljs-number">10</span>
  <span class="hljs-attr">maxReplicas:</span> <span class="hljs-number">500</span>
  <span class="hljs-attr">metrics:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">type:</span> <span class="hljs-string">Resource</span>
    <span class="hljs-attr">resource:</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">cpu</span>
      <span class="hljs-attr">target:</span>
        <span class="hljs-attr">type:</span> <span class="hljs-string">Utilization</span>
        <span class="hljs-attr">averageUtilization:</span> <span class="hljs-number">70</span>
  <span class="hljs-attr">behavior:</span>
    <span class="hljs-attr">scaleUp:</span>
      <span class="hljs-attr">tolerance:</span> <span class="hljs-number">0.02</span>  <span class="hljs-comment"># 2% , respond faster to traffic spikes</span>
      <span class="hljs-attr">stabilizationWindowSeconds:</span> <span class="hljs-number">60</span>
    <span class="hljs-attr">scaleDown:</span>
      <span class="hljs-attr">tolerance:</span> <span class="hljs-number">0.15</span>  <span class="hljs-comment"># 15% , conservative, avoid thrashing</span>
      <span class="hljs-attr">stabilizationWindowSeconds:</span> <span class="hljs-number">300</span>
</code></pre>
<p>This asymmetric pattern (tight scale-up, loose scale-down) maps perfectly to how humans handle incidents: scale up on smoke, scale down on proof.</p>
<p>Quick gotchas:</p>
<ul>
<li><p>Tolerance is stored as Quantity, so 0.02 becomes 20m in the API (don't be confused by the format)</p>
</li>
<li><p>Two HPAs targeting the same workload = silent failure with AmbiguousSelector</p>
</li>
<li><p>There's a warm-up period after HPA creation where you might see "did not receive metrics"</p>
</li>
</ul>
<p>But overall? This is a clean, practical improvement that makes HPA more usable for high-scale workloads.</p>
<hr />
<h2 id="heading-security-the-year-of-not-leaking-credentials"><strong>Security: The Year of Not Leaking Credentials</strong></h2>
<h3 id="heading-image-pull-credential-verification-beta-default-on"><strong>Image Pull Credential Verification (Beta, Default On)</strong></h3>
<p>Here's a fun multi-tenant security gap that's finally closed: Previously, if Tenant A pulled a private image with valid credentials, Tenant B could use that <em>cached</em> image without any credentials at all. The kubelet only verified on first pull.</p>
<p>In v1.35, the <strong>KubeletEnsureSecretPulledImages</strong> feature is <strong>enabled by default</strong>. The kubelet now re-validates credentials for every pod, even if the image is already cached locally.</p>
<p>This means:</p>
<ul>
<li><p>The image cache is no longer a "free pass" in multi-tenant clusters</p>
</li>
<li><p>If a pull secret expires or rotates, pods that previously started fine (due to caching) will now fail with ImagePullBackOff</p>
</li>
<li><p>You need to monitor pull secret expiry and treat cache-dependent startups as a bug</p>
</li>
</ul>
<p>The feature is configurable via imagePullCredentialsVerificationPolicy in KubeletConfiguration:</p>
<ul>
<li><p>AlwaysVerify , Default in v1.35, check credentials for every pod</p>
</li>
<li><p>NeverVerify , Old behavior (insecure)</p>
</li>
<li><p>NeverVerifyAllowlistedImages , Skip verification only for specific image patterns</p>
</li>
</ul>
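<p>For reference, here's roughly what that looks like in the kubelet config. Treat this as a sketch: the policy field is the one named above, but the allowlist field name is my reading of the KEP, so verify it against the docs for your kubelet version.</p>
<pre><code class="lang-yaml"># /var/lib/kubelet/config.yaml (snippet, illustrative)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Re-check registry credentials for every pod, even if the image is already cached
imagePullCredentialsVerificationPolicy: AlwaysVerify
# With NeverVerifyAllowlistedImages you'd also list the exempt image patterns, e.g.:
# imagePullCredentialsVerificationPolicy: NeverVerifyAllowlistedImages
# preloadedImagesVerificationAllowlist:
# - "registry.example.com/shared/base"
</code></pre>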
<h3 id="heading-structured-authentication-config-ga"><strong>Structured Authentication Config (GA)</strong></h3>
<p>Multiple OIDC providers without restarting the API server? Yes, please.</p>
<p>The old way involved juggling <em>--oidc-*</em> flags, restarting the API server every time you wanted to add a provider, and generally feeling like you were living in 2015.</p>
<p>Kubernetes v1.35 graduates <strong>Structured Authentication Configuration</strong> to GA. You now use a dedicated config file:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># /etc/kubernetes/auth-config.yaml</span>

<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apiserver.config.k8s.io/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">AuthenticationConfiguration</span>
<span class="hljs-attr">jwt:</span>
<span class="hljs-comment"># Production IdP for humans</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">issuer:</span>
    <span class="hljs-attr">url:</span> <span class="hljs-string">https://okta.example.com</span>
    <span class="hljs-attr">audiences:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">production-cluster</span>
  <span class="hljs-attr">claimMappings:</span>
    <span class="hljs-attr">username:</span>
      <span class="hljs-attr">expression:</span> <span class="hljs-string">'claims.email.split("@")[0]'</span>
    <span class="hljs-attr">groups:</span>
      <span class="hljs-attr">expression:</span> <span class="hljs-string">'claims.groups.map(g, "okta:" + g)'</span>
  <span class="hljs-attr">claimValidationRules:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">expression:</span> <span class="hljs-string">'claims.exp - claims.iat &lt;= 3600'</span>
    <span class="hljs-attr">message:</span> <span class="hljs-string">"Token lifetime cannot exceed 1 hour"</span>

<span class="hljs-comment"># CI/CD IdP for pipelines</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">issuer:</span>
    <span class="hljs-attr">url:</span> <span class="hljs-string">https://gitlab.example.com</span>
    <span class="hljs-attr">audiences:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">ci-cluster</span>
  <span class="hljs-attr">claimMappings:</span>
    <span class="hljs-attr">username:</span>
      <span class="hljs-attr">claim:</span> <span class="hljs-string">preferred_username</span>
    <span class="hljs-attr">groups:</span>
      <span class="hljs-attr">claim:</span> <span class="hljs-string">roles</span>
</code></pre>
<p>Enable it with <strong><em>--authentication-config=/etc/kubernetes/auth-config.yaml</em></strong> on the API server, and you're done. Multiple providers, dynamic reloads, CEL expressions for custom logic. It's beautiful.</p>
<h3 id="heading-pod-certificates-beta"><strong>Pod Certificates (Beta)</strong></h3>
<p>Native workload identity without external controllers, CRDs, or sidecars. The kubelet generates keys, requests certificates via PodCertificateRequest, writes credential bundles directly to the Pod's filesystem, and auto-rotates.</p>
<p>It's still beta and disabled by default (you need to enable <a target="_blank" href="http://certificates.k8s.io/v1beta1">certificates.k8s.io/v1beta1</a> and the PodCertificateRequest feature gate), but it's the future of service mesh and zero-trust architectures.</p>
<p>Pure mTLS flows with no bearer tokens in the issuance path. The kube-apiserver enforces node restriction at admission time. It's elegant, it's secure, and it's coming.</p>
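<p>To give you a feel for the shape of the API , and only that, since the field names below follow my reading of the KEP and may differ by version , a pod would mount its certificate through a projected volume along these lines:</p>
<pre><code class="lang-yaml"># Illustrative sketch, not copy-paste config
apiVersion: v1
kind: Pod
metadata:
  name: mtls-workload
spec:
  containers:
  - name: app
    image: example/app:latest
    volumeMounts:
    - name: workload-identity
      mountPath: /var/run/identity
      readOnly: true
  volumes:
  - name: workload-identity
    projected:
      sources:
      - podCertificate:                     # kubelet generates the key and rotates the cert
          signerName: example.com/mesh-ca   # hypothetical signer name
          keyType: ECDSAP256
          credentialBundlePath: credentials.pem
</code></pre>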
<hr />
<h2 id="heading-quick-hits-other-cool-stuff"><strong>Quick Hits: Other Cool Stuff</strong></h2>
<p>Because this post is already longer than a CVS receipt, here are some other notable features in rapid-fire mode:</p>
<ol>
<li><p><strong>Deployment Terminating Replicas (Beta)</strong> , Ever seen a rolling update trigger quota errors despite having capacity? The controller was ignoring terminating pods when counting replicas. v1.35 adds .status.terminatingReplicas so you can finally see the overlap. Mystery solved.</p>
</li>
<li><p><strong>Storage Version Migration (Beta)</strong> , Native support for migrating stored data to new schema versions, no external tools needed. Historically this required manual "read/write loops" piping kubectl commands together like it's 1999.</p>
</li>
<li><p><strong>StatefulSet MaxUnavailable (Beta)</strong> , Parallel updates for StatefulSets! Set maxUnavailable and watch multiple pods update simultaneously instead of one-at-a-time. Perfect for stateful apps that can tolerate some downtime.</p>
</li>
<li><p><strong>KYAML (Beta)</strong> , A safer, less ambiguous subset of YAML designed specifically for Kubernetes. Addresses the infamous "Norway Bug" and other YAML footguns. Enabled by default (disable with KUBECTL_KYAML=false).</p>
</li>
<li><p><strong>User Namespaces (Beta)</strong> , Containers can run as root internally while being mapped to unprivileged users on the host. Reduces privilege escalation risk if a container gets compromised.</p>
</li>
<li><p><strong>Node Declared Features (Alpha)</strong> , Nodes can advertise their supported feature gates via .status.declaredFeatures. The scheduler uses this to avoid placing pods on incompatible nodes during mixed-version upgrades. Finally, a real answer for heterogeneous clusters.</p>
</li>
<li><p><strong>Extended Toleration Operators (Alpha)</strong> , Tolerations can now use numeric comparison: "only schedule on nodes with SLA &gt; 95%." Auto-evict if it drops below threshold. Numeric intent for placement!</p>
</li>
</ol>
<hr />
<h2 id="heading-the-philosophical-shift-or-the-part-where-i-get-existential"><strong>The Philosophical Shift (Or: The Part Where I Get Existential)</strong></h2>
<p>Here's the thing about Kubernetes v1.35 that nobody's saying out loud: it's clarifying the project's identity in a way that might make some people uncomfortable.</p>
<p>A few years ago, the expectation was that upstream Kubernetes would deliver production-grade autoscaling, intelligent scheduling, and operational maturity out of the box. Native VPA would get smarter. HPA would understand seasonality. The scheduler would learn topology economics and cost optimization.</p>
<p>That hasn't happened. And based on the trajectory of v1.35, it's probably not going to.</p>
<p><strong>Kubernetes is choosing to be a kernel.</strong></p>
<p>It's focusing on:</p>
<ul>
<li><p>✅ Robust, low-level primitives (in-place resize, DRA, gang scheduling)</p>
</li>
<li><p>✅ Safe, performant APIs (structured auth, pod certificates, storage migration)</p>
</li>
<li><p>✅ Well-defined extension points (Workload API, DRA device classes)</p>
</li>
</ul>
<p>It's leaving to users:</p>
<ul>
<li><p>❌ When to resize (intelligence, not mechanism)</p>
</li>
<li><p>❌ Where to place AI workloads (economics, not mechanics)</p>
</li>
<li><p>❌ How to optimize bin-packing (strategy, not structure)</p>
</li>
</ul>
<p>The native controllers (HPA, VPA, the default scheduler) are becoming <em>reference implementations</em>, not production-grade optimization engines.</p>
<p>And you know what? That's probably the right call for an open-source project at this scale. You can't be everything to everyone. Focus on the primitives, nail the APIs, and let the ecosystem build the intelligence layer.</p>
<p>But it does mean the burden shifts. Platform teams need to either:</p>
<ol>
<li><p>Build intelligence layers themselves (hard, but flexible)</p>
</li>
<li><p>Adopt ecosystem tools that provide intelligence (easier, but adds dependencies)</p>
</li>
<li><p>Accept the limitations of native controllers (simplest, but leaves optimization on the table)</p>
</li>
</ol>
<p>There's no wrong answer. But there is a choice to make.</p>
<hr />
<h2 id="heading-the-pre-upgrade-checklist-aka-how-to-not-ruin-your-weekend"><strong>The Pre-Upgrade Checklist (aka "How to Not Ruin Your Weekend")</strong></h2>
<p>Before you upgrade to v1.35, here's what you absolutely, positively need to check:</p>
<h3 id="heading-blockers-fix-these-or-dont-upgrade"><strong>🔴 BLOCKERS (Fix These Or Don't Upgrade)</strong></h3>
<p><strong>cgroup v2 on all nodes</strong></p>
<pre><code class="lang-bash"><span class="hljs-built_in">stat</span> -<span class="hljs-built_in">fc</span> %T /sys/fs/cgroup  <span class="hljs-comment"># Must show "cgroup2fs"</span>
</code></pre>
<p><strong>containerd 2.0 or later</strong></p>
<pre><code class="lang-bash">kubectl get nodes -o jsonpath=<span class="hljs-string">'{.items[*].status.nodeInfo.containerRuntimeVersion}'</span>
</code></pre>
<p>If either of these fails, stop. Do not pass Go. Do not collect $200. Fix your infrastructure first.</p>
<h3 id="heading-high-priority-fix-soon-after-upgrade"><strong>🟠 HIGH PRIORITY (Fix Soon After Upgrade)</strong></h3>
<ul>
<li><p><strong>Scan for Docker Schema 1 images</strong> , skopeo inspect every image in your registries (sample command after this list)</p>
</li>
<li><p><strong>Verify kube-proxy mode</strong> , Make sure you're not on IPVS (or have a migration plan)</p>
</li>
<li><p><strong>Update containerd configs</strong> , Remove deprecated registry structures</p>
</li>
<li><p><strong>Check image pull secrets</strong> , Credential verification is mandatory now</p>
</li>
<li><p><strong>RBAC for exec/attach</strong> , Add create verb for pod subresources</p>
</li>
</ul>
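<p>For the Schema 1 scan, something along these lines works (looping over your image list is up to you; the interesting bit is that legacy Schema 1 manifests report <code>schemaVersion: 1</code>):</p>
<pre><code class="lang-bash"># Inspect the raw manifest and flag legacy Docker Schema 1 images
IMAGE="registry.example.com/team/app:1.4.2"   # placeholder
skopeo inspect --raw "docker://${IMAGE}" | jq -r '.schemaVersion'
# 2 (or an OCI manifest) is fine; 1 means the image needs to be rebuilt and re-pushed
</code></pre>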
<h3 id="heading-medium-priority-plan-for-it"><strong>🟡 MEDIUM PRIORITY (Plan For It)</strong></h3>
<ul>
<li><p>Run kubepug or pluto to find deprecated APIs in your manifests (example after this list)</p>
</li>
<li><p>Consider switching VPA to InPlaceOrRecreate mode</p>
</li>
<li><p>Evaluate maxUnavailable for StatefulSets that can tolerate parallel updates</p>
</li>
<li><p>Test HPA configurable tolerance for high-scale workloads</p>
</li>
</ul>
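<p>For the deprecated-API scan, pluto is the quickest route , the commands below reflect its usual workflow, so double-check the flags against the version you install:</p>
<pre><code class="lang-bash"># Scan rendered manifests on disk for deprecated or removed APIs
pluto detect-files -d ./manifests
# Or scan what's actually deployed through Helm releases
pluto detect-helm
</code></pre>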
<h3 id="heading-post-upgrade-validation"><strong>Post-Upgrade Validation</strong></h3>
<p><strong>Verify PSI is available</strong></p>
<pre><code class="lang-bash">cat /proc/pressure/memory
</code></pre>
<p><strong>Test in-place resize</strong></p>
<pre><code class="lang-bash">kubectl patch pod test-pod --subresource resize --<span class="hljs-built_in">type</span>=<span class="hljs-string">'merge'</span> -p <span class="hljs-string">'{"spec":{"containers":[{"name":"app","resources":{"requests":{"memory":"512Mi"}}}]}}'</span>
</code></pre>
<p><strong>Check scheduler batching metrics (Prometheus)</strong></p>
<pre><code class="lang-bash">scheduler_batch_attempts_total{result=<span class="hljs-string">"hint_used"</span>}
</code></pre>
<p><strong>Verify feature gates</strong></p>
<pre><code class="lang-bash">kubectl get --raw /metrics | grep -i <span class="hljs-string">"kubernetes_feature_enabled"</span>
</code></pre>
<hr />
<h2 id="heading-the-bottom-line-what-should-you-do"><strong>The Bottom Line: What Should You Do?</strong></h2>
<p><strong>If you're a platform engineer:</strong> Treat this as a checkpoint release. Your technical debt is due. Schedule time for infrastructure upgrades, not just manifest changes. Test cgroup v2, plan containerd 2.0 migration, audit your RBAC policies. This isn't optional.</p>
<p><strong>If you're running AI/ML workloads:</strong> Explore gang scheduling and batching, but understand they're primitives. You still need orchestration intelligence for production-grade queue management. The scheduler handles mechanics; you provide strategy.</p>
<p><strong>If you operate stateful workloads:</strong> In-place resize is production-ready. Test it. Love it. Deploy it. But if you're using native VPA, keep expectations realistic: the mechanism is GA, the intelligence isn't.</p>
<p><strong>If you're a developer:</strong> Most of this won't affect you directly. Enjoy the more precise HPA, safer YAML with KYAML, and better Pod lifecycle tracking. Your platform team is handling the hard stuff.</p>
<hr />
<h2 id="heading-a-note-for-managed-kubernetes-users-eks-aks-gke-etc"><strong>A Note for Managed Kubernetes Users (EKS, AKS, GKE, etc.)</strong></h2>
<p>If you're running on a managed Kubernetes service, your upgrade experience will be different , and in many ways easier.</p>
<p><strong>The Good News:</strong></p>
<ul>
<li><p><strong>No cgroup v2 migration pain</strong> , Cloud providers handle node OS upgrades. By the time EKS/AKS/GKE offer v1.35, their node images will already be running cgroup v2-compatible systems.</p>
</li>
<li><p><strong>Automatic containerd updates</strong> , Managed services bundle compatible container runtime versions. You won't manually upgrade containerd.</p>
</li>
<li><p><strong>Control plane upgrades managed</strong> , API server, scheduler, controller-manager upgrades happen with a button click (or API call).</p>
</li>
</ul>
<p><strong>What You Still Need to Handle:</strong></p>
<ul>
<li><p><strong>Application compatibility</strong> , Test your workloads with v1.35 API changes, especially if you use newer beta features.</p>
</li>
<li><p><strong>RBAC updates</strong> , The WebSocket permission changes for exec/attach/portforward still affect your roles and service accounts.</p>
</li>
<li><p><strong>Image pull secrets</strong> , The stricter credential verification affects all clusters. Audit your pull secret lifetimes and rotation policies.</p>
</li>
<li><p><strong>Cost implications</strong> , New features like in-place resize and better autoscaling can improve efficiency, but you need to configure them. Managed Kubernetes doesn't automatically optimize your resource usage.</p>
</li>
</ul>
<p><strong>Timeline Expectations:</strong></p>
<ul>
<li><p><strong>EKS</strong> typically lags 2-4 weeks behind upstream releases</p>
</li>
<li><p><strong>GKE</strong> usually within 2-3 weeks for Rapid channel, longer for Regular/Stable</p>
</li>
<li><p><strong>AKS</strong> generally 2-4 weeks post-release</p>
</li>
</ul>
<p>Check your provider's release notes , they often add their own enhancements or defer certain alpha features.</p>
<p><strong>Bottom Line:</strong> Managed Kubernetes handles infrastructure concerns, but you're still responsible for application architecture, resource optimization, and operational intelligence. v1.35's primitives are available to you; making them useful is still your job.</p>
<hr />
<h2 id="heading-final-thoughts"><strong>Final Thoughts</strong></h2>
<p>Kubernetes v1.35 is a fascinating release. It's not the most feature-packed release we've ever seen (v1.16's "The One With Everything" still holds that crown). But it's clarifying. It's opinionated about what Kubernetes <em>is</em> and what it <em>isn't</em>.</p>
<p>It's a kernel. It provides primitives. It's your job to make them smart.</p>
<p>And honestly? That's liberating. Because now we know where we stand. The expectations are clear. The division of labor is explicit.</p>
<p>Build on the primitives. Fill the intelligence gaps. Make something amazing.</p>
<p>The World Tree has another growth ring. What will you build in its branches?</p>
<hr />
<p><em>Written with chai , mild existential dread, and genuine excitement for the future of Kubernetes. May your upgrades be smooth and your cgroups be v2.</em> ☕️</p>
<p>Happy upgrading! 🐿️</p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes v1.34: The Smooth Operator Release]]></title><description><![CDATA[The Kubernetes ecosystem continues to evolve at a remarkable pace, and the latest v1.34 release, planned for Wednesday, August 27th, 2025, represents one of the most significant updates in recent memory. Unlike previous releases that focused heavily ...]]></description><link>https://blogs.akshatsinha.dev/kubernetes-v1-34</link><guid isPermaLink="true">https://blogs.akshatsinha.dev/kubernetes-v1-34</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[cloud native]]></category><dc:creator><![CDATA[Akshat Sinha]]></dc:creator><pubDate>Wed, 20 Aug 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766092571136/5edc8ed2-3690-4a1b-9def-d469b31c5152.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Kubernetes ecosystem continues to evolve at a remarkable pace, and the latest <strong>v1.34</strong> release, planned for Wednesday, August 27th, 2025, represents one of the most significant updates in recent memory. Unlike previous releases that focused heavily on deprecations and removals, <strong>Kubernetes v1.34</strong> takes a different approach, it’s entirely focused on enhancements and new capabilities that will reshape how we manage containerized workloads at scale.</p>
<h3 id="heading-what-makes-v134-special">What Makes v1.34 Special?</h3>
<p>What makes v1.34 particularly exciting is its focus on maturity and stability. This release showcases significant feature graduations, with several major capabilities moving from beta to stable, and important enhancements reaching beta status. Most remarkably, this release contains <strong>no deprecations or removals,</strong> a refreshing change that allows teams to upgrade with confidence, knowing their existing configurations and workflows will continue to work seamlessly.</p>
<h3 id="heading-major-features-graduating-to-stable-ga">Major Features Graduating to Stable (GA)</h3>
<h4 id="heading-dynamic-resource-allocation-dra-core-reaches-production-readiness">Dynamic Resource Allocation (DRA) Core Reaches Production Readiness</h4>
<p>The headline feature of v1.34 is undoubtedly the graduation of <strong>Dynamic Resource Allocation (DRA)</strong> core to stable status. DRA was originally introduced as an alpha feature in v1.26, went through a significant redesign for v1.31, reached beta in v1.32, and now achieves general availability in v1.34.</p>
<p><strong>Why DRA Matters</strong></p>
<p>If you’ve ever struggled with GPU allocation, custom hardware integration, or complex device scheduling in Kubernetes, DRA is about to become your best friend. Traditional device plugins have served us well, but they come with significant limitations:</p>
<ul>
<li><p><strong>Static allocation</strong>: Once a device is assigned, it can’t be dynamically reallocated</p>
</li>
<li><p><strong>Limited flexibility</strong>: Device requests are binary , you either get the device or you don’t</p>
</li>
<li><p><strong>Poor observability</strong>: Limited insight into device utilization and allocation failures</p>
</li>
</ul>
<p>DRA changes this paradigm entirely. It provides a flexible framework for categorizing, requesting, and utilizing specialized hardware like GPUs, FPGAs, network accelerators, and custom silicon.</p>
<p><strong>How DRA Works in Practice</strong></p>
<p>DRA introduces several new API types under <code>resource.k8s.io/v1</code>:</p>
<ul>
<li><p><strong>ResourceClaim</strong>: Represents a request for specific resources</p>
</li>
<li><p><strong>DeviceClass</strong>: Defines categories of available devices</p>
</li>
<li><p><strong>ResourceClaimTemplate</strong>: Templates for dynamic claim creation</p>
</li>
<li><p><strong>ResourceSlice</strong>: Contains information about available resources</p>
</li>
</ul>
<p>Here’s a practical example of how you might request GPU resources with DRA:</p>
<pre><code class="lang-bash">apiVersion: v1
kind: Pod
metadata:
  name: ml-training-pod
spec:
  resourceClaims:
  - name: gpu-claim
    resourceClaimTemplateName: gpu-template
  containers:
  - name: trainer
    image: ml-training:latest
    resources:
      claims:
      - name: gpu-claim
        request: gpu
</code></pre>
<p>The magic happens through <strong>CEL (Common Expression Language)</strong> expressions that allow fine-grained device filtering. You can now specify requirements like “give me a GPU with at least 16GB memory and CUDA compute capability &gt; 7.0” directly in your resource claims.</p>
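<p>To make that concrete, a DeviceClass can pre-filter devices with CEL selectors. This is illustrative only , the driver name and capacity keys depend entirely on which DRA driver you run:</p>
<pre><code class="lang-yaml">apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: large-memory-gpu
spec:
  selectors:
  - cel:
      # Only devices exposed by this (hypothetical) driver
      expression: 'device.driver == "gpu.example.com"'
  - cel:
      # ...with at least 16Gi of device memory (capacity keys are driver-specific)
      expression: 'device.capacity["gpu.example.com"].memory.compareTo(quantity("16Gi")) &gt;= 0'
</code></pre>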
<p>With DRA graduating to stable, the <code>resource.k8s.io/v1</code> APIs will be available by default, making this a production-ready solution for complex device management scenarios.</p>
<h3 id="heading-production-ready-tracing-for-kubelet-and-api-server">Production-Ready Tracing for Kubelet and API Server</h3>
<p>Two major tracing enhancements are graduating to stable in v1.34, transforming Kubernetes observability:</p>
<p><strong>API Server Tracing (</strong><a target="_blank" href="https://github.com/kubernetes/enhancements/issues/647"><strong>KEP-647</strong></a><strong>)</strong> and <strong>Kubelet Tracing (</strong><a target="_blank" href="https://github.com/kubernetes/enhancements/issues/2831"><strong>KEP-2831</strong></a><strong>)</strong> both reach general availability after their journey from alpha (v1.22 and v1.25 respectively) to beta (v1.27) and now to stable.</p>
<p><strong>Deep Observability Comes to Kubernetes Core</strong></p>
<p>The tracing implementation uses <strong>OpenTelemetry</strong> standards to instrument critical operations:</p>
<ul>
<li><p><strong>Kubelet operations</strong>: Complete visibility into CRI calls, pod lifecycle events, and node-level operations</p>
</li>
<li><p><strong>API server operations</strong>: End-to-end request tracing from admission controllers to etcd</p>
</li>
<li><p><strong>Context propagation</strong>: Trace IDs flow through the entire system, enabling correlation across components</p>
</li>
</ul>
<p><strong>Real-World Impact</strong></p>
<p>Imagine debugging a pod that’s stuck in <code>ContainerCreating</code> state. Instead of grepping through disconnected logs across multiple components, you now get:</p>
<ol>
<li><p><strong>Unified trace view</strong>: See the entire pod creation flow in one timeline</p>
</li>
<li><p><strong>Precise bottleneck identification</strong>: Pinpoint exactly where delays occur</p>
</li>
<li><p><strong>Cross-component correlation</strong>: Connect kubelet operations with container runtime behaviors</p>
</li>
<li><p><strong>Performance insights</strong>: Quantify the impact of configuration changes</p>
</li>
</ol>
<p>This level of observability transforms Kubernetes from a “black box” into a transparent, debuggable system, and with stable graduation, you can confidently build production monitoring solutions around these capabilities.</p>
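<p>Turning this on for the kubelet is a small config change. The snippet below follows the shape documented for system-component traces (an OTLP endpoint plus a sampling rate); verify the exact fields for your kubelet version:</p>
<pre><code class="lang-yaml">apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Export spans to a local OpenTelemetry collector over OTLP/gRPC
tracing:
  endpoint: localhost:4317
  samplingRatePerMillion: 100000   # sample roughly 10% of operations
</code></pre>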
<h3 id="heading-key-features-graduating-to-beta">Key Features Graduating to Beta</h3>
<h4 id="heading-serviceaccount-tokens-for-image-pull-authentication">ServiceAccount Tokens for Image Pull Authentication</h4>
<p><strong>Moving to Beta and Enabled by Default</strong></p>
<p>One of the most significant security improvements in v1.34 is the beta graduation of <strong>ServiceAccount token integration for kubelet credential providers (</strong><a target="_blank" href="https://github.com/kubernetes/enhancements/issues/4412"><strong>KEP-4412</strong></a><strong>)</strong>. This feature addresses a longstanding security concern: the use of long-lived image pull secrets.</p>
<p><strong>The Security Problem</strong></p>
<p>Traditional image pull secrets suffer from several security issues:</p>
<ul>
<li><p><strong>Long-lived credentials</strong>: Secrets don’t rotate automatically</p>
</li>
<li><p><strong>Broad access</strong>: One secret often provides access to multiple registries</p>
</li>
<li><p><strong>Operational overhead</strong>: Manual credential management and rotation</p>
</li>
</ul>
<p><strong>The Modern Solution</strong></p>
<p>The new approach leverages short-lived, automatically rotated ServiceAccount tokens that follow <strong>OIDC-compliant semantics</strong>. Each token is scoped to a specific Pod, dramatically reducing the blast radius of credential compromise.</p>
<p>Benefits include:</p>
<ul>
<li><p><strong>Automatic rotation</strong>: Tokens refresh without manual intervention</p>
</li>
<li><p><strong>Workload-level identity</strong>: Each workload gets its own scoped credentials</p>
</li>
<li><p><strong>Reduced attack surface</strong>: No more long-lived secrets sitting in etcd</p>
</li>
<li><p><strong>Better compliance</strong>: Aligns with modern identity-aware security practices</p>
</li>
</ul>
<p>This change represents a fundamental shift toward a <strong>zero-trust model</strong> for container image access.</p>
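<p>Mechanically, this flows through the kubelet's credential provider config. The sketch below shows the general shape , the <code>tokenAttributes</code> block is the KEP-4412 addition, and the field names are my reading of the KEP rather than something to paste verbatim:</p>
<pre><code class="lang-yaml">apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: registry-credential-provider        # your provider plugin (placeholder)
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  matchImages:
  - "registry.example.com"
  defaultCacheDuration: "0s"                # short-lived, per-pod tokens shouldn't be cached broadly
  tokenAttributes:
    serviceAccountTokenAudience: registry.example.com
    requireServiceAccount: true
</code></pre>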
<h3 id="heading-enhanced-pod-level-resource-management">Enhanced Pod-Level Resource Management</h3>
<p><strong>PodLevelResources Graduates to Beta</strong></p>
<p>The <code>PodLevelResources</code> feature is now beta and enabled by default. This enhancement allows defining CPU and memory resources for an entire pod using <code>pod.spec.resources</code>, providing more intuitive resource management for multi-container pods.</p>
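<p>A minimal sketch of what that looks like , the pod-level numbers act as a shared budget across all containers, so sidecars don't need per-container guesswork (values are placeholders):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: sidecar-heavy-pod
spec:
  # Budget for the pod as a whole
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
    limits:
      cpu: "1"
      memory: "1Gi"
  containers:
  - name: app
    image: example/app:latest
  - name: log-shipper
    image: example/log-shipper:latest
</code></pre>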
<h3 id="heading-better-pod-lifecycle-tracking">Better Pod Lifecycle Tracking</h3>
<p><strong>PodObservedGenerationTracking Reaches Beta</strong></p>
<p>This feature, now beta and enabled by default, populates <code>status.observedGeneration</code> fields in pods and their conditions, enabling a better understanding of when pod status reflects the current specification.</p>
<h3 id="heading-enhanced-traffic-distribution-policies">Enhanced Traffic Distribution Policies</h3>
<p><strong>PreferSameZone and PreferSameNode Graduate to Beta</strong></p>
<p>Building on <a target="_blank" href="https://github.com/kubernetes/enhancements/issues/3015">KEP-3015</a>, the enhanced traffic distribution capabilities are graduating to beta with the feature gate enabled by default in v1.34.</p>
<p>Network topology awareness gets a significant upgrade with the evolution of Service traffic distribution policies. The <code>spec.trafficDistribution</code> field now supports more granular preferences.</p>
<p><strong>Beyond PreferClose</strong></p>
<p>The original <code>PreferClose</code> policy is being deprecated in favor of two more specific options:</p>
<ul>
<li><p><strong>PreferSameZone</strong>: Equivalent to the current PreferClose behavior, prioritising endpoints in the same availability zone</p>
</li>
<li><p><strong>PreferSameNode</strong>: Takes locality to the extreme, preferring endpoints on the same physical node as the client</p>
</li>
</ul>
<p><strong>Practical Applications</strong></p>
<p><code>PreferSameNode</code> is particularly valuable for:</p>
<ul>
<li><p><strong>Edge computing</strong>: Minimizing latency for IoT and edge workloads</p>
</li>
<li><p><strong>Data-intensive applications</strong>: Reducing network traversal for high-bandwidth communications</p>
</li>
<li><p><strong>Co-located microservices</strong>: Optimizing performance for tightly coupled services</p>
</li>
</ul>
<pre><code class="lang-bash">apiVersion: v1
kind: Service
spec:
  trafficDistribution: PreferSameNode
  <span class="hljs-comment"># ... rest of service spec</span>
</code></pre>
<h3 id="heading-fine-grained-hpa-control-with-configurable-tolerance">Fine-Grained HPA Control with Configurable Tolerance</h3>
<p><strong>Graduating to Beta from Alpha</strong></p>
<p>The <strong>HPA configurable tolerance</strong> feature (<a target="_blank" href="https://github.com/kubernetes/enhancements/issues/4951">KEP-4951</a>) is expected to graduate to beta in v1.34. This enhancement addresses one of the most common complaints about autoscaling behavior.</p>
<p><strong>The Problem with One-Size-Fits-All</strong></p>
<p>The default cluster-wide 10% tolerance for HPA scaling decisions often proves inadequate:</p>
<ul>
<li><p><strong>Large deployments</strong>: 10% might mean hundreds of unnecessary pods remain during scale-down</p>
</li>
<li><p><strong>Sensitive workloads</strong>: Some applications need more responsive scaling</p>
</li>
<li><p><strong>Cost optimization</strong>: Different workloads have different cost sensitivity profiles</p>
</li>
</ul>
<p><strong>The Solution: Workload-Specific Tolerance</strong></p>
<pre><code class="lang-bash">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  behavior:
    scaleUp:
      tolerance: 0.05  <span class="hljs-comment"># 5% , more aggressive scale-up</span>
    scaleDown:
      tolerance: 0.15 <span class="hljs-comment"># 15% , more conservative scale-down</span>
</code></pre>
<p>This granular control enables:</p>
<ul>
<li><p><strong>Optimized resource utilization</strong>: Right-size tolerance for each workload</p>
</li>
<li><p><strong>Cost management</strong>: More aggressive scale-down for cost-sensitive applications</p>
</li>
<li><p><strong>Performance optimization</strong>: Responsive scale-up for latency-critical services</p>
</li>
</ul>
<h3 id="heading-exciting-new-alpha-features">Exciting New Alpha Features</h3>
<h3 id="heading-pod-replacement-policy-for-deployments">Pod Replacement Policy for Deployments</h3>
<p><strong>Introducing Alpha Feature</strong></p>
<p>The new <code>podReplacementPolicy</code> field (<a target="_blank" href="https://github.com/kubernetes/enhancements/issues/3973">KEP-3973)</a> gives you explicit control over the trade-off between deployment speed and resource consumption. This alpha feature can be enabled using the <code>DeploymentPodReplacementPolicy</code> and <code>DeploymentReplicaSetTerminatingReplicas</code> feature gates.</p>
<p>Resource management during deployments has always been a balancing act between speed and resource consumption. This new feature provides two distinct policies:</p>
<p><strong>TerminationStarted</strong>: Creates new pods immediately when old ones begin terminating</p>
<ul>
<li><p>Faster rollouts and reduced downtime</p>
</li>
<li><p>Higher temporary resource consumption</p>
</li>
</ul>
<p><strong>TerminationComplete</strong>: Waits for complete termination before creating new pods</p>
<ul>
<li><p>Controlled resource usage and predictable capacity planning</p>
</li>
<li><p>Slower rollouts</p>
</li>
</ul>
<pre><code class="lang-bash">apiVersion: apps/v1
kind: Deployment
spec:
  podReplacementPolicy: TerminationStarted
  <span class="hljs-comment"># ... rest of deployment spec</span>
</code></pre>
<p>This feature is particularly valuable for:</p>
<ul>
<li><p><strong>Resource-constrained environments</strong>: Where every CPU core and GB of RAM matters</p>
</li>
<li><p><strong>Long-terminating workloads</strong>: Applications with extended graceful shutdown periods</p>
</li>
<li><p><strong>Cost-sensitive deployments</strong>: Where temporary resource spikes impact billing</p>
</li>
</ul>
<h3 id="heading-kyaml-kubernetes-optimized-configuration-format">KYAML: Kubernetes-Optimized Configuration Format</h3>
<p><strong>Alpha Support for kubectl Output</strong></p>
<p><a target="_blank" href="https://github.com/kubernetes/enhancements/issues/5295">KEP-5295</a> introduces <strong>KYAML</strong> as a new output format for kubectl v1.34, addressing common YAML pitfalls while maintaining full compatibility.</p>
<p><strong>Solving YAML’s Pain Points</strong></p>
<p>YAML’s flexibility comes with notorious drawbacks:</p>
<ul>
<li><p><a target="_blank" href="https://hitchdev.com/strictyaml/why/implicit-typing-removed/"><strong>The Norway Bug</strong></a>: Unquoted country codes like <code>NO</code> being interpreted as boolean <code>false</code></p>
</li>
<li><p><strong>Indentation sensitivity</strong>: Subtle whitespace errors causing deployment failures</p>
</li>
<li><p><strong>Type coercion surprises</strong>: Strings sometimes becoming numbers or booleans unexpectedly</p>
</li>
</ul>
<p><strong>KYAML’s Principled Approach</strong></p>
<p>KYAML addresses these issues through consistent rules:</p>
<ul>
<li><p><strong>Always double-quote strings</strong>: Eliminates type coercion surprises</p>
</li>
<li><p><strong>Unquoted keys</strong>: Unless potentially ambiguous</p>
</li>
<li><p><strong>Consistent syntax</strong>: Always use <code>{}</code> for objects, <code>[]</code> for arrays</p>
</li>
<li><p><strong>Comment support</strong>: Unlike JSON, KYAML supports comments</p>
</li>
<li><p><strong>Trailing commas allowed</strong>: Reduces diff noise and syntax errors</p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-comment"># Traditional YAML (problematic)</span>
apiVersion: v1
kind: ConfigMap
data:
  country: NO  <span class="hljs-comment"># Oops! This becomes boolean false</span>
  version: 1.0  <span class="hljs-comment"># This might become a float</span>
</code></pre>
<pre><code class="lang-bash"><span class="hljs-comment"># KYAML (safe)</span>
apiVersion: <span class="hljs-string">"v1"</span>
kind: <span class="hljs-string">"ConfigMap"</span>
data: {
  country: <span class="hljs-string">"NO"</span>,  <span class="hljs-comment"># Explicitly a string</span>
  version: <span class="hljs-string">"1.0"</span>, <span class="hljs-comment"># Explicitly a string</span>
}
</code></pre>
<p>KYAML remains a strict subset of YAML, ensuring compatibility with existing tooling while providing safety guarantees. You’ll be able to request KYAML output using <code>kubectl get -o kyaml</code>, while all existing YAML and JSON output formats remain available.</p>
<h3 id="heading-additional-operational-improvements">Additional Operational Improvements:-</h3>
<p><strong>Enhanced Memory Management:</strong> Memory limits can now be decreased with a <code>NotRequired</code> resize restart policy, with intelligent checks to prevent OOM-kill scenarios during the adjustment. This improvement provides more flexibility in resource management without compromising pod stability.</p>
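<p>The knob that controls restart behavior during a resize is the per-container <code>resizePolicy</code>. A minimal sketch (values are placeholders):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
  - name: cache
    image: example/cache:latest
    resizePolicy:
    - resourceName: memory
      restartPolicy: NotRequired   # allow in-place memory changes without restarting the container
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"
</code></pre>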
<p><strong>Better CSI Volume Handling:</strong> The kubelet now detects terminal CSI volume mount failures due to exceeded attachment limits and marks stateful pods as Failed, allowing controllers to recreate them. This prevents pods from getting stuck indefinitely in the <code>ContainerCreating</code> state.</p>
<p><strong>Improved Metrics and Observability:</strong> New metrics provide better insight into:</p>
<ul>
<li><p>User namespace pod creation success/failure rates with <code>started_user_namespaced_pods_total</code> and <code>started_user_namespaced_pods_errors_total</code></p>
</li>
<li><p>ResourceClaim controller operations with <code>resourceclaim_controller_creates_total</code> and <code>resourceclaim_controller_resource_claims</code></p>
</li>
</ul>
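<p>If you already scrape kubelet and controller metrics, a rough PromQL expression for alerting on user-namespace pod failures could look like this (the threshold is arbitrary):</p>
<pre><code class="lang-bash"># Error ratio for user-namespaced pod startups over the last 5 minutes
sum(rate(started_user_namespaced_pods_errors_total[5m]))
  / sum(rate(started_user_namespaced_pods_total[5m])) &gt; 0.05
</code></pre>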
<h3 id="heading-what-this-means-for-your-operations">What This Means for Your Operations</h3>
<p><strong><em>For Platform Engineers:</em></strong> v1.34 represents a maturation of Kubernetes’ enterprise capabilities. The stability graduation of <strong>DRA</strong>, tracing, and several beta features means you can confidently build these into your platform abstractions without fear of API churn.</p>
<p><strong><em>For Security Teams:</em></strong> The <strong>ServiceAccount token integration</strong> for image pulls represents a significant step toward zero-trust container registries. With this feature moving to beta and enabled by default, it’s time to start planning migration away from long-lived pull secrets.</p>
<p><strong><em>For FinOps Teams:</em></strong> The combination of beta-level <strong>HPA configurable tolerance</strong> and alpha-level <strong>pod replacement policies</strong> provides new levers for balancing performance and cost. These features enable more sophisticated cost optimization strategies.</p>
<p><strong><em>For Developers:</em></strong> Alpha <strong>KYAML</strong> support means safer, more maintainable configuration files on the horizon. The stable <strong>tracing capabilities</strong> will dramatically improve debugging experiences across the development lifecycle.</p>
<h3 id="heading-wrapping-up">Wrapping Up</h3>
<p>Kubernetes v1.34 is looking pretty solid, nothing too flashy, but it’s packed with the kind of practical improvements that actually make a difference in day-to-day work. With plenty of enhancements and zero deprecations, it’s one of those rare releases where you don’t have to worry about things breaking when you upgrade. The GPU allocation improvements with DRA are finally ready for prime time, and there are some nice observability upgrades baked right in. When it drops on August 27th, it should be a pretty smooth transition, no hunting down deprecated APIs or scrambling to fix broken deployments. It’s not revolutionary, but sometimes the best releases are the ones that just work without giving you a headache.</p>
]]></content:encoded></item><item><title><![CDATA[Argo CD 3.0: Navigating the Next Frontier of GitOps Deployment]]></title><description><![CDATA[In the rapidly evolving landscape of Kubernetes deployments, Argo CD 3.0 emerges as a pivotal milestone that promises to redefine how organizations approach continuous delivery. This version represents a carefully orchestrated evolution of the platfo...]]></description><link>https://blogs.akshatsinha.dev/argocd-3-0</link><guid isPermaLink="true">https://blogs.akshatsinha.dev/argocd-3-0</guid><category><![CDATA[ArgoCD]]></category><category><![CDATA[gitops]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Platform Engineering ]]></category><category><![CDATA[SRE]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Akshat Sinha]]></dc:creator><pubDate>Tue, 04 Mar 2025 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766092100675/d8734eec-0e68-4f07-9094-f79a05f295c1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the rapidly evolving landscape of Kubernetes deployments, Argo CD 3.0 emerges as a pivotal milestone that promises to redefine how organizations approach continuous delivery. This version represents a carefully orchestrated evolution of the platform, balancing innovation with practical considerations for enterprise deployments.</p>
<h3 id="heading-version-support-strategy">Version Support Strategy</h3>
<p>Starting with 3.0, Argo CD will:</p>
<ul>
<li><p>Stop releasing new 2.x minor versions</p>
</li>
<li><p>Continue cutting patch releases for the two most recent minor versions (2.14 until 3.2 is released, and 2.13 until 3.1 is released)</p>
</li>
</ul>
<p>The versioning strategy reflects a mature approach to software maintenance, ensuring stability while pushing the boundaries of continuous delivery technologies.</p>
<p>The v3 RC is planned for March 17, 2025 and v3 GA for May 6, 2025.</p>
<h3 id="heading-critical-breaking-changes-and-deprecations">Critical Breaking Changes and Deprecations</h3>
<h3 id="heading-1-fine-grained-rbac-transformation">1. Fine-Grained RBAC Transformation</h3>
<p>The role-based access control (RBAC) mechanism in Argo CD has undergone a significant transformation, addressing long-standing challenges in permission management. Previously, the system operated with broad, catch-all permissions that often introduced potential security risks.</p>
<p><strong>Before v3:</strong></p>
<ul>
<li><p>Update or delete actions on an application automatically applied to sub-resources</p>
</li>
<li><p>Broad permissions were the default, potentially exposing systems to unintended modifications</p>
</li>
</ul>
<p><strong>In v3:</strong></p>
<ul>
<li><p>Update and delete actions now only apply to the application itself</p>
</li>
<li><p>Explicit policies must be defined for sub-resource permissions</p>
</li>
<li><p>Administrators can create highly specific access rules with fine-grained control</p>
</li>
</ul>
<p>The new permission model introduces a more complex but powerful approach to access management. Example scenarios:</p>
<p><strong>Granular Resource-Level Permissions</strong> To grant a user permission to delete only Pods within a specific application, you can now use a precisely crafted policy:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Allows deleting Pods in the 'prod-app' Application</span>
p, example-user, applications, delete/*/Pod/*/*, default/prod-app, allow
</code></pre>
<p><strong>Nuanced Access Control</strong> The system now supports intricate permission combinations. For instance, you can:</p>
<ul>
<li><p>Allow updates to an application while denying updates to its sub-resources</p>
</li>
<li><p>Explicitly deny application deletion while permitting specific resource deletions</p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-comment"># Explicitly deny application deletion</span>
p, example-user, applications, delete, default/prod-app, deny
</code></pre>
<pre><code class="lang-bash"><span class="hljs-comment"># Allow deleting Pods within the application</span>
p, example-user, applications, delete/*/Pod/*/*, default/prod-app, allow
</code></pre>
<p><strong>Glob Pattern Considerations</strong> Argo CD’s RBAC uses a unique glob pattern evaluation that requires careful configuration. The matching can be complex due to how slashes are processed. Best practices include:</p>
<ul>
<li><p>Always include all resource parts in the pattern</p>
</li>
<li><p>Use four slashes for most precise matching</p>
</li>
<li><p>Be aware that resource kinds and namespaces can interact in unexpected ways</p>
</li>
</ul>
<p><strong>Migration Strategy</strong> Organizations can preserve the previous broad permission model by setting <code>server.rbac.disableApplicationFineGrainedRBACInheritance</code> to <code>false</code> in the Argo CD ConfigMap. However, this is recommended only as a temporary measure during migration.</p>
<p><strong>Example Migration Path</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Legacy Approach (No Longer Default)</span>
- p, some-user, applications, *, *, allow  <span class="hljs-comment"># Gave broad permissions</span>
</code></pre>
<pre><code class="lang-bash"><span class="hljs-comment"># New Approach</span>
- p, some-user, applications, *, *, allow  <span class="hljs-comment"># Requires explicit sub-resource permissions</span>
- p, some-user, applications, update/*/Deployment/*/*, specific-app, allow
</code></pre>
<h3 id="heading-2-logs-rbac-enforcement">2. Logs RBAC Enforcement</h3>
<p>The approach to logging access has been dramatically refined, treating logs as a first-class security resource. This change represents a more nuanced and secure method of managing application visibility and access.</p>
<p><strong>Changes:</strong></p>
<ul>
<li><p>Logs are now a first-class RBAC resource</p>
</li>
<li><p>Automatic logs access for application users has been removed</p>
</li>
<li><p>Explicit logs access must now be granted</p>
</li>
</ul>
<p><strong>Configuration:</strong></p>
<ul>
<li><p>Remove <code>server.rbac.log.enforce.enable</code> from argocd-cm ConfigMap</p>
</li>
<li><p>Manually grant logs access at project or global scope (example policy after this list)</p>
</li>
</ul>
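<p>For example, a couple of policy lines like these grant log access explicitly (adjust the role names and projects to your setup):</p>
<pre><code class="lang-bash"># Read access to logs for all apps in the 'prod' project
p, role:sre, logs, get, prod/*, allow
# Or globally, for a platform-admin style role
p, role:platform-admin, logs, get, */*, allow
</code></pre>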
<h3 id="heading-3-metrics-consolidation">3. Metrics Consolidation</h3>
<p>Metric management has been streamlined to provide a more focused and efficient monitoring experience. The removal of certain legacy metrics demonstrates Argo CD’s commitment to maintaining a clean and modern observability approach.</p>
<p><strong>Removed Metrics:</strong></p>
<ul>
<li><p><code>argocd_app_sync_status</code></p>
</li>
<li><p><code>argocd_app_health_status</code></p>
</li>
<li><p><code>argocd_app_created_time</code></p>
</li>
</ul>
<p><strong>Migration:</strong></p>
<ul>
<li><p>These metrics’ information is now available as labels on <code>argocd_app_info</code></p>
</li>
<li><p>Update monitoring dashboards and alerts accordingly</p>
</li>
</ul>
<h3 id="heading-4-dex-sso-authentication-changes">4. Dex SSO Authentication Changes</h3>
<p>Authentication mechanisms have been refined to provide more stable and predictable user identification. This change addresses the inherent challenges of using internally generated claims for authentication and authorization.</p>
<p><strong>Before:</strong></p>
<ul>
<li><p>Used <code>sub</code> claim for RBAC subject</p>
</li>
<li><p>Subject based on Dex internal implementation</p>
</li>
</ul>
<p><strong>In v3:</strong></p>
<ul>
<li><p>Now uses <code>federated_claims.user_id</code> claim</p>
</li>
<li><p>Requests <code>federated:id</code> scope from Dex</p>
</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-comment"># Old Policy (Incorrect)</span>
- g, ChdleGFtcGxlQGFyZ29wcm9qLmlvEgJkZXhfY29ubl9pZA, role:example
</code></pre>
<pre><code class="lang-bash"><span class="hljs-comment"># New Policy</span>
- g, example@argoproj.io, role:example
</code></pre>
<h3 id="heading-5-repository-configuration">5. Repository Configuration</h3>
<p>The approach to repository management has been simplified and standardized, pushing organizations towards more declarative and Kubernetes-native configuration methods.</p>
<p><strong>Deprecation:</strong></p>
<ul>
<li><p>Removed support for repository configuration in <code>argocd-cm</code> ConfigMap</p>
</li>
<li><p>All repositories must now be managed as Kubernetes Secrets</p>
</li>
</ul>
<p><strong>Verification:</strong></p>
<pre><code class="lang-bash">kubectl get cm argocd-cm -o=jsonpath=<span class="hljs-string">"[{.data.repositories}, {.data['repository.credentials']}, {.data['helm.repositories']}]"</span>
</code></pre>
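<p>If that command turns anything up, migrate those entries to declarative repository Secrets. The standard shape looks like this (all values are placeholders):</p>
<pre><code class="lang-yaml">apiVersion: v1
kind: Secret
metadata:
  name: team-app-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository   # this label is what Argo CD watches for
stringData:
  type: git
  url: https://github.com/example/team-app.git
  username: git-user
  password: example-token
</code></pre>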
<h3 id="heading-6-applicationset-nested-selectors">6. ApplicationSet Nested Selectors</h3>
<p>The ApplicationSet configuration has been simplified to provide more predictable and consistent behavior across different deployment scenarios.</p>
<p><strong>Change:</strong></p>
<ul>
<li><p><code>applyNestedSelectors</code> field is now ignored</p>
</li>
<li><p>Nested selectors are always applied</p>
</li>
<li><p>Remove explicit selectors in existing ApplicationSets</p>
</li>
</ul>
<h3 id="heading-7-cluster-configuration">7. Cluster Configuration</h3>
<p>Cluster management has been refined to provide more explicit and controlled interaction with in-cluster resources, reducing ambiguity in deployment configurations.</p>
<p><strong>When</strong> <code>cluster.inClusterEnabled</code> <strong>is set to "false":</strong></p>
<ul>
<li><p>Existing in-cluster Applications will be in an Unknown state</p>
</li>
<li><p>Cannot create new in-cluster Applications</p>
</li>
<li><p>Deleting Applications will not delete previously managed resources</p>
</li>
</ul>
<h3 id="heading-8-health-status-tracking">8. Health Status Tracking</h3>
<p>Performance optimization has been a key focus, with changes designed to reduce unnecessary load on the application controller while maintaining comprehensive resource tracking.</p>
<p><strong>Before:</strong></p>
<ul>
<li><p>Health status persisted under <code>/status</code> in Application CR</p>
</li>
<li><p>Caused load on application controller</p>
</li>
</ul>
<p><strong>In v3:</strong></p>
<ul>
<li><p>Health status stored externally</p>
</li>
<li><p>Can revert by setting <code>controller.resource.health.persist</code> to <code>true</code></p>
</li>
</ul>
<h3 id="heading-9-plugin-environment-variables">9. Plugin Environment Variables</h3>
<p>Plugin management has been enhanced to provide more flexibility and consistency in configuration handling.</p>
<p><strong>New Behavior:</strong></p>
<ul>
<li>Empty environment variables are now passed to config management plugins</li>
</ul>
<pre><code class="lang-bash">spec:
  <span class="hljs-built_in">source</span>:
    plugin:
      name: example-plugin
      env:
        - name: VERSION
          value: <span class="hljs-string">"1.2.3"</span>
        - name: DATA  <span class="hljs-comment"># Now passed as an empty string</span>
          value: <span class="hljs-string">""</span>
</code></pre>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Argo CD 3.0 represents a significant step in refining GitOps practices. While the changes require careful migration, they ultimately provide more granular control, improved security, and cleaner configuration management. For detailed changes and explanations, check out the official Argo CD documentation <a target="_blank" href="https://argo-cd.readthedocs.io/en/latest/operator-manual/upgrading/2.14-3.0/">here</a>.</p>
<blockquote>
<p>PS:- I still do kubectl edit on production. ;)</p>
</blockquote>
]]></content:encoded></item></channel></rss>