Post-Quantum Readiness Is a Refactoring Problem, Not a Cryptography One

In April 2026, AWS shipped hybrid post-quantum TLS for the clients that talk to Secrets Manager, negotiating a lattice key exchange on a connection most engineers will never think about. That is the quiet tell. The new primitives are no longer a research artifact: NIST finalized the standards as FIPS 203, 204, and 205 in August 2024, and a year ago Java 24 added ML-KEM and ML-DSA to the platform itself through JEP 496 and JEP 497, exposing them through the standard JCA entry points JVM developers already know: KeyPairGenerator, KEM, and Signature. The algorithms are downloadable, supported, and in some cases already negotiated on your behalf. They are no longer the hard part.

That is exactly why the framing most teams have adopted is wrong. Post-quantum is being treated as a cryptography decision, a question of which algorithm to pick and when. For the overwhelming majority of engineering organizations, it is not a cryptography decision at all. It is a refactoring decision. The binding constraint is not which primitive wins. It is whether your codebase can change primitives without a multi-quarter rewrite. Most cannot, and that property has a name and a fix that you can start on this week, regardless of what the standards bodies do next.

The question that actually predicts readiness

There is one question that separates teams who will manage the transition calmly from teams who will manage it as an incident. It is not “have we picked ML-KEM.” It is: how many places in our code instantiate a cipher, a key generator, or a signature directly, and could we change all of them from one place?

A team that cannot answer that has no post-quantum readiness, no matter how many vendor PQC features they have switched on. Enabling hybrid TLS at your load balancer is good, and it protects data in transit through that one hop. It does nothing for the API token you encrypted with a hard-coded AES/GCM call three years ago, the documents sitting in object storage under a key wrapped with RSA, or the signatures on the firmware you ship. Those live in your code, scattered across call sites, and every one of them is a place you will eventually have to find and change.
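
A rough first answer to that question can come from a source scan. The sketch below is an assumption-laden starting point, not a complete inventory: the class name CryptoCallSiteScanner is hypothetical, it only recognizes JCA-style getInstance calls with a string literal on a single line, and it will miss reflection, wrappers, and anything outside Java source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds direct JCA instantiations in Java source text. */
final class CryptoCallSiteScanner {

    /** One direct instantiation: the line, the JCA class, and the algorithm string. */
    record Finding(int line, String primitive, String algorithm) {}

    // Matches e.g. Cipher.getInstance("AES/GCM/NoPadding"); only string literals are captured.
    private static final Pattern DIRECT_USE = Pattern.compile(
        "\\b(Cipher|KeyGenerator|KeyPairGenerator|KeyAgreement|KEM|Signature|Mac|MessageDigest)"
        + "\\s*\\.\\s*getInstance\\s*\\(\\s*\"([^\"]+)\"");

    /** Scans source text line by line and returns every match with its 1-based line number. */
    static List<Finding> scan(String source) {
        List<Finding> findings = new ArrayList<>();
        String[] lines = source.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = DIRECT_USE.matcher(lines[i]);
            while (m.find()) {
                findings.add(new Finding(i + 1, m.group(1), m.group(2)));
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "Cipher c = Cipher.getInstance(\"AES/GCM/NoPadding\");",
            "Signature s = Signature.getInstance(\"SHA256withRSA\");");
        scan(sample).forEach(System.out::println);
    }
}
```

The useful number is not the total but how many findings sit outside the package you intend to be the boundary.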

The deadlines are real even if they are not imminent. NIST’s draft transition guidance, IR 8547, proposes deprecating today’s public-key algorithms around 2030 and disallowing them after 2035, and the NSA’s CNSA 2.0 timeline pushes national-security systems toward post-quantum signing well before the decade is out. None of that helps a team whose crypto is welded to its call sites. The work that pays off across every one of those deadlines is the same work: making the codebase able to swap. The industry has a term for it, crypto-agility, and it has been quietly recommended for years. The post-quantum deadline is simply what turns a latent piece of technical debt into a dated, board-visible liability.

A maturity model for crypto-agility

Crypto-agility is not binary, and treating it as a checkbox is how teams convince themselves they are further along than they are. It is a ladder. Knowing which rung you are on tells you what the next concrete piece of work is.

Figure: The crypto-agility maturity model. The jump that matters is Level 2 to Level 4: externalized configuration buys you a swap, but only inventory plus an enforced boundary keeps you there.

At Level 0, primitives are instantiated wherever they are needed. A cipher string here, a key size there, a signature algorithm baked into a token library. There is no single place to change anything, so any transition is a search-and-replace across the whole codebase, with no guarantee you found every site.

Level 1 centralizes crypto into a shared utility. This feels like progress, and it is the rung most teams believe is the finish line. It usually is not, for reasons the next section makes uncomfortable.

Level 2 externalizes the algorithm, key size, and parameters out of code and into configuration. Now you can change the algorithm without a deploy of new logic. This is the first rung where a swap is genuinely possible rather than theoretical.

Level 3 is a pluggable provider whose outputs are self-describing. Every ciphertext and signature carries an identifier for the algorithm and key that produced it, so you can decrypt old data with the old scheme while writing new data with the new one, and run both at once during a rolling migration. Hybrid modes, classical plus post-quantum, live here.

Level 4 adds the two things that keep you from sliding back down: a maintained cryptographic inventory, sometimes called a CBOM (cryptography bill of materials), and a fitness function in the build that fails when someone reaches around the boundary. Without enforcement, every new feature is an opportunity to recreate Level 0 one call site at a time.
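
To make the Level 2 rung concrete, here is a minimal sketch of a helper that reads its algorithm, key size, and transformation from configuration instead of hard-coding them. The class name ConfiguredCrypto and the crypto.data.* property keys are hypothetical, chosen only for illustration.

```java
import java.security.GeneralSecurityException;
import java.util.Properties;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical Level 2 helper: algorithm, key size, and mode come from configuration. */
final class ConfiguredCrypto {
    private final String keyAlgorithm;
    private final int keyBits;
    private final String transformation;

    ConfiguredCrypto(Properties config) {
        // Property names are illustrative; the defaults apply when a key is absent.
        this.keyAlgorithm = config.getProperty("crypto.data.keyAlgorithm", "AES");
        this.keyBits = Integer.parseInt(config.getProperty("crypto.data.keyBits", "256"));
        this.transformation = config.getProperty("crypto.data.transformation", "AES/GCM/NoPadding");
    }

    /** Generates a data key of the configured type and size. */
    SecretKey newDataKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance(keyAlgorithm);
        generator.init(keyBits);
        return generator.generateKey();
    }

    /** Returns an uninitialized cipher for the configured transformation. */
    Cipher newCipher() throws GeneralSecurityException {
        return Cipher.getInstance(transformation);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        Properties config = new Properties();
        config.setProperty("crypto.data.keyBits", "128");   // the swap is a config edit
        ConfiguredCrypto crypto = new ConfiguredCrypto(config);
        System.out.println(crypto.newDataKey().getEncoded().length * 8 + "-bit key, "
            + crypto.newCipher().getAlgorithm());
    }
}
```

A configuration-only swap works cleanly only between algorithms with the same shape, such as one key size for another. Moving to a scheme with different parameters or a different output format needs the versioned outputs of Level 3.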

Why a shared crypto library is not agility

Here is the claim that tends to start an argument: most teams who say “we route all crypto through a shared library” are at Level 1, and Level 1 is not agility. The shared library is necessary, but on its own it produces the appearance of readiness rather than the substance, and the gap shows up in three recurring anti-patterns.

The first is the crypto util that isn’t. The helper exists, but it still hard-codes the algorithm internally, or worse, it accepts a cipher string as an argument and passes it straight through. The symptom is easy to detect: grep the codebase for cipher suite strings, key sizes, or algorithm names and see how many land outside the utility. If callers are choosing algorithms, the boundary is a suggestion, not a boundary.

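// L0 / L1 in disguise: the "shared util" still leaks the algorithm.
// The scheme is hard-coded here, and the RSA key type and untagged output
// leak it to every caller, so swapping to a post-quantum KEM means
// changing the helper and every call site that feeds it.
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import javax.crypto.Cipher;

public byte[] wrapDataKey(PublicKey pub, byte[] dataKey) throws GeneralSecurityException {
    Cipher c = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding"); // hard-coded scheme
    c.init(Cipher.ENCRYPT_MODE, pub);   // caller must supply an RSA key
    return c.doFinal(dataKey);          // raw bytes: nothing records what produced them
}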

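The problem in that snippet is not that RSA-OAEP with SHA-256 is weak; it is a perfectly respectable scheme. The problem is location. The scheme is fixed inside the helper while its shape leaks out: callers hand in an RSA public key and get back untagged bytes. A post-quantum key-encapsulation mechanism does not encrypt a key you hand it; it produces a fresh shared secret plus an encapsulation. The swap therefore changes the helper's signature and the stored format, and with them every caller, rather than one line behind a boundary.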

The second anti-pattern is algorithm in the schema. The algorithm’s identity gets baked into stored data formats, database columns, or serialized payloads with no version tag. You end up with a table full of ciphertext and no reliable way to know what encrypted any given row, which makes migration a forensic exercise. The fix is to make outputs self-describing, with the algorithm and key version bound into the authenticated payload rather than left as rewritable metadata, which is the whole point of Level 3.

// L3: the boundary owns the choice, and the output declares itself.
// A versioned key id lets you decrypt old records and write new ones
// under a different (e.g. post-quantum hybrid) scheme during migration.
public interface CryptoProvider {
    Envelope wrap(byte[] dataKey, KeyRef keyRef);   // keyRef = configured key version; its metadata names the algorithm
    byte[]   unwrap(Envelope envelope);             // dispatches via an allowlisted registry, never trusts envelope.algId() blindly
}

// Envelope: { version, keyId, algId, nonce, ciphertext }, serialized together.
// algId/keyId/nonce must be authenticated (bound as AEAD associated data,
// or covered by the signature), not just adjacent fields an attacker can rewrite.
// Callers never see "RSA" or "ML-KEM". They see wrap() and unwrap().
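
One way the binding described in those comments could look is sketched below. It is deliberately simplified and rests on assumptions: it wraps with a symmetric AES-GCM key-encryption key rather than a public-key or hybrid scheme, the class name SelfDescribingWrapper and the identifier "AES-256-GCM" are hypothetical, an in-memory map stands in for a key store, and the header encoding assumes ids never contain the '|' character.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: an envelope whose header fields are bound as AEAD associated data. */
final class SelfDescribingWrapper {

    record Envelope(int version, String keyId, String algId, byte[] nonce, byte[] ciphertext) {}

    private static final String AES_GCM = "AES-256-GCM";   // illustrative algorithm identifier
    private final Map<String, SecretKey> keysById;         // stand-in for a real key store or KMS
    private final String currentKeyId;
    private final SecureRandom random = new SecureRandom();

    SelfDescribingWrapper(Map<String, SecretKey> keysById, String currentKeyId) {
        this.keysById = keysById;
        this.currentKeyId = currentKeyId;
    }

    /** Wraps a data key under the current key and records what produced the output. */
    Envelope wrap(byte[] dataKey) throws GeneralSecurityException {
        byte[] nonce = new byte[12];
        random.nextBytes(nonce);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, keysById.get(currentKeyId), new GCMParameterSpec(128, nonce));
        c.updateAAD(header(1, currentKeyId, AES_GCM));      // header is authenticated, not encrypted
        return new Envelope(1, currentKeyId, AES_GCM, nonce, c.doFinal(dataKey));
    }

    /** Unwraps only envelopes whose version and algorithm are on the allowlist. */
    byte[] unwrap(Envelope e) throws GeneralSecurityException {
        if (e.version() != 1 || !AES_GCM.equals(e.algId())) {
            throw new GeneralSecurityException("unsupported envelope: " + e.algId());
        }
        SecretKey key = keysById.get(e.keyId());
        if (key == null) {
            throw new GeneralSecurityException("unknown key id: " + e.keyId());
        }
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, e.nonce()));
        c.updateAAD(header(e.version(), e.keyId(), e.algId()));
        return c.doFinal(e.ciphertext());   // fails if the header, nonce, or ciphertext was altered
    }

    // Assumes ids contain no '|'; a length-prefixed encoding would remove that assumption.
    private static byte[] header(int version, String keyId, String algId) {
        return (version + "|" + keyId + "|" + algId).getBytes(StandardCharsets.UTF_8);
    }
}
```

Adding a second scheme would mean a second allowlisted identifier and a branch in unwrap, while wrap keeps writing under whichever key version is current. That is what lets old and new records coexist during a migration.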

The third is the pinned dependency that can’t move. The component that terminates TLS or signs your artifacts is a vendored library or appliance locked to a version where hybrid is not an option, so enabling post-quantum becomes a major upgrade owned by someone else. The symptom is that your crypto choices are not yours to make. This one is a procurement and architecture problem as much as a code one, and it is worth surfacing early because its lead time is the longest.

Make the boundary enforceable

A boundary that depends on everyone remembering to use it will erode. The difference between Level 3 and Level 4 is a test that fails the build when code reaches around the boundary, which turns crypto-agility from a documented intention into an invariant. In a JVM codebase, an ArchUnit rule does this in a few lines.

// A fitness function: no class outside the crypto boundary may touch
// JCA primitives directly. New violations fail CI; that is the point.
import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;

@AnalyzeClasses(packages = "com.acme")
class CryptoAgilityRules {

    // java.security.. is deliberately broad; it also holds non-crypto types
    // (Principal, AccessController). Exclude those by passing a combined
    // predicate to dependOnClassesThat(...), or narrow to the concrete
    // primitive classes once the baseline is clean.
    @ArchTest
    static final ArchRule primitives_live_only_behind_the_boundary =
        noClasses()
            .that().resideOutsideOfPackage("com.acme.crypto..")
            .should().dependOnClassesThat()
            .resideInAnyPackage(
                "javax.crypto..",            // Cipher, KeyGenerator, Mac
                "java.security..")           // KeyPairGenerator, Signature, MessageDigest
            .because("all cryptographic primitives must go through com.acme.crypto");
}

The objection is immediate and fair: a real codebase has hundreds of existing violations, and a rule that fails the build on day one just gets disabled. The answer is to ratchet, not to boil the ocean. Capture the current violations as a frozen baseline, fail the build only on net-new ones, and burn the baseline down deliberately as teams touch the code. ArchUnit supports a FreezingArchRule for exactly this, and the same pattern exists for other ecosystems through architecture-lint tools or a custom check over a static-analysis pass.

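// Freeze today's reality; gate tomorrow's additions.
// Existing violations are recorded once and ignored until fixed;
// any new direct primitive use fails the build immediately.
// Needs: import com.tngtech.archunit.library.freeze.FreezingArchRule;
// Keep @ArchTest on this frozen rule only: if the unfrozen rule is still
// annotated, it keeps failing on the existing baseline.
// The first run may need freeze.store.default.allowStoreCreation=true in
// archunit.properties so the violation store can be created.
@ArchTest
static final ArchRule gated =
    FreezingArchRule.freeze(primitives_live_only_behind_the_boundary);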

That single move changes the economics. You stop owing an all-at-once refactor and start owing a slow, bounded cleanup that can never regress. It is the cheapest high-leverage step on the whole ladder, and it is available to a team sitting at Level 0 today.

The rule is not the whole story, and it should not pretend to be. It sees static dependencies inside the JVM, which means crypto reached through reflection, a sidecar, an external CLI, a TLS-terminating appliance, or a key-management policy slips straight past it. That is precisely why Level 4 pairs the fitness function with a maintained inventory: the gate stops the code you control from drifting, and the inventory accounts for the crypto you do not.

Where to start, and where to stop

Crypto-agility is worth real money, which means it competes with features and has to be scoped like any other investment. Two principles keep it from sprawling.

First, prioritize by data lifetime, not by call-site count. The harvest-now-decrypt-later threat only bites data that must stay confidential for years, so long-lived stored secrets, archived records, and anything signed for long-term verification come first. Ephemeral session traffic and short-lived tokens can ride the slower path of TLS-layer and library upgrades, because by the time a quantum computer is decrypting yesterday’s TLS handshake, that session has long since stopped mattering. A team that re-encrypts its cache before its ten-year document store has its priorities backwards.
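
As a sketch of that ordering, assume each inventory entry records how long its data must stay confidential. The record fields, class name, and sample entries below are hypothetical; the point is only that the sort key is lifetime, and call-site count is carried along without influencing the order.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: order migration work by data lifetime, not by call-site count. */
final class MigrationQueue {

    /** One inventory entry; confidentialityYears is how long the data must stay secret. */
    record CryptoAsset(String name, String algorithm, int confidentialityYears, int callSites) {}

    /** Longest-lived data first; name breaks ties so the order is stable and reviewable. */
    static List<CryptoAsset> prioritize(List<CryptoAsset> inventory) {
        return inventory.stream()
            .sorted(Comparator.comparingInt(CryptoAsset::confidentialityYears).reversed()
                .thenComparing(CryptoAsset::name))
            .toList();
    }

    public static void main(String[] args) {
        List<CryptoAsset> inventory = List.of(
            new CryptoAsset("session-cache", "AES/GCM", 0, 40),
            new CryptoAsset("document-store", "RSA-OAEP key wrap", 10, 2),
            new CryptoAsset("api-tokens", "AES/GCM", 1, 12));
        prioritize(inventory).forEach(a ->
            System.out.println(a.name() + " (" + a.confidentialityYears() + " years)"));
    }
}
```

With these sample entries the document store comes first even though it has the fewest call sites, and the cache comes last despite having the most.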

Second, match the rung to the blast radius. A single service with one crypto call site does not need ArchUnit and a CBOM; getting to Level 2 is plenty. A platform organization with dozens of services sharing a security reputation needs Level 4, because there the failure mode is not one team’s mistake but the slow drift of the whole estate back to hard-coded primitives. Push to the rung that matches your exposure and stop there.

The unglamorous constraints are the ones that decide whether this happens at all. Post-quantum modules still lag on FIPS validation, so regulated workloads may have to run hybrid or wait. Some libraries do not yet expose a pluggable provider, which caps you below Level 3 until they do, and the honest move is to record that as a known ceiling rather than pretend. And there is the political constraint that sinks more of this work than any technical one: a team that recently passed a security audit feels finished, and crypto-agility reads as refactoring with no visible feature behind it. It is exactly the kind of debt that is invisible until a deadline makes it a fire drill.
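
One small input to that record of known ceilings is what the runtime itself can do. The sketch below probes whether the installed providers offer a key-pair generator for a given algorithm name. The class name RuntimeCeilingProbe is hypothetical, and the ML-KEM-768 and ML-DSA-65 names are the parameter-set names introduced with JEP 496 and JEP 497, so on an older JDK without a third-party provider they would simply report false.

```java
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: record which key-pair algorithms this runtime can actually provide. */
final class RuntimeCeilingProbe {

    /** True if some installed provider offers a KeyPairGenerator for the given name. */
    static boolean supportsKeyPairs(String algorithm) {
        try {
            KeyPairGenerator.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    /** Probes each name in order and returns the results in that same order. */
    static Map<String, Boolean> probe(List<String> algorithms) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String algorithm : algorithms) {
            result.put(algorithm, supportsKeyPairs(algorithm));
        }
        return result;
    }

    public static void main(String[] args) {
        // The result depends on the JDK version and installed providers.
        System.out.println(probe(List.of("RSA", "EC", "ML-KEM-768", "ML-DSA-65")));
    }
}
```

Run in each deployment environment, the output gives the inventory a factual column for what the platform supports today, separate from what the code currently uses.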

Start with the inventory, because you cannot manage what you have not counted.

Post-quantum readiness will be reported up the chain as an algorithm milestone, because that is the part with a press release. The teams that come through it without drama will be the ones who understood it as an architecture milestone first: a single boundary every primitive passes through, outputs that say what made them, and a test that refuses to let the codebase forget. The algorithm behind that boundary is replaceable, and that is the entire point. Build the boundary, and the next transition, post-quantum or whatever follows it, becomes a configuration change instead of a project.

If you are being asked for a post-quantum plan this year and the honest first answer is “we are not sure how many places our code touches crypto,” that is the conversation worth having now rather than against a deadline. We have helped teams run the inventory, place the boundary, and put a fitness function behind it, and we are happy to talk it through.