Current proof, not a broad claim

Catch AI release behavior changes before green CI ships them.

We ran GPT-5.4 on a small set of fixed release-it coding tasks. release-it is an open-source release automation tool for versioning, tagging, and publishing npm packages. On two publish-behavior operations, some model-written patches kept the relevant release tests green but still changed approved release behavior, and DriftFence would have blocked those changes before merge. The current public proof is specific: DriftFence can catch AI-written release-it changes that silently alter custom npm registry handling or private-package publish rules even when the relevant release tests still pass.

Best for release teams: Teams using AI to edit release scripts and publish flows.
Best for npm policy risk: Teams where the wrong registry or wrong publish behavior is expensive.
Best for Git-based review: Teams that want a readable approval surface instead of another opaque test failure.
Best current proof surface: npm run demo:release-wedge and npm run demo:release-wedge:check
Method note: the model prompt did not mention DriftFence or include DriftFence artifacts.

The proof loop is simple.

The model gets a normal coding task and is not told that DriftFence is part of the evaluation. The relevant release tests still pass. DriftFence blocks the patch if the approved release behavior changed.

1. Model run

AI patch

A model gets a constrained release-it task and only the code context it needs to make the requested change.

2. Conventional gate

CI green

The targeted release tests still pass, so this is not just a case where the normal test suite already failed.

3. Saved expected behavior

DriftFence red

DriftFence compares runtime behavior to the approved expected behavior for that operation and blocks the patch when that behavior changed.
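
The comparison in step 3 can be sketched as follows. This is a minimal illustration, assuming a flat key/value behavior record; the record shape, field names, and function names are hypothetical and are not DriftFence's actual format or API.

```typescript
// Hypothetical record shape; DriftFence's real format is not shown on this page.
type BehaviorRecord = Record<string, string | boolean | null>;

interface Drift {
  field: string;
  approved: string | boolean | null | undefined;
  observed: string | boolean | null | undefined;
}

// Return every field whose observed value differs from the approved value.
function findDrift(approved: BehaviorRecord, observed: BehaviorRecord): Drift[] {
  const fields = Object.keys({ ...approved, ...observed }).sort();
  const drifts: Drift[] = [];
  for (const field of fields) {
    if (approved[field] !== observed[field]) {
      drifts.push({ field, approved: approved[field], observed: observed[field] });
    }
  }
  return drifts;
}

// The gate blocks when any approved behavior changed, regardless of test status.
function gate(approved: BehaviorRecord, observed: BehaviorRecord): { blocked: boolean; drifts: Drift[] } {
  const drifts = findDrift(approved, observed);
  return { blocked: drifts.length > 0, drifts };
}

// Example values are made up for illustration.
const approvedRecord: BehaviorRecord = {
  publishRegistry: "https://npm.example.internal/",
  publishesPrivatePackage: false,
};
const observedRecord: BehaviorRecord = {
  publishRegistry: "https://registry.npmjs.org/",
  publishesPrivatePackage: false,
};

const gateResult = gate(approvedRecord, observedRecord);
console.log(gateResult.blocked ? "BLOCK" : "PASS", gateResult.drifts);
```

In this sketch the gate blocks on the registry field alone, and the drift list names the field and both values, which is the kind of detail a reviewer needs to approve or reject the change.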

Why this case fits.

This release-it case matches the original DriftFence motivation better than the broad OSS replay studies: it is pre-merge, AI-written, repeatable, and easy to explain to an operator.

Operational risk

Wrong registry. Wrong publish behavior. Real blast radius.

  • A mistake in how release-it carries a custom npm registry through a release can send commands to the wrong registry.
  • A mistake in release-it's private-package publish rules can change what gets versioned or published.
  • These are behavior changes, not just wording changes.
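
To make the registry risk concrete, here is a minimal sketch of how a custom registry can silently drop out of one command. The helper names and the command list are hypothetical and do not reproduce release-it's code; `publishConfig.registry` and the `--registry` option are standard npm features.

```typescript
// Hypothetical helpers for illustration; this is not release-it's implementation.
const DEFAULT_REGISTRY = "https://registry.npmjs.org/";

interface PackageManifest {
  name: string;
  publishConfig?: { registry?: string };
}

// Resolve the registry a release should target, falling back to npm's default.
function resolveRegistry(pkg: PackageManifest): string {
  return pkg.publishConfig?.registry ?? DEFAULT_REGISTRY;
}

// Intended behavior: every npm command carries the resolved registry.
function buildCommands(pkg: PackageManifest): string[] {
  const registry = resolveRegistry(pkg);
  return ["ping", "whoami", "publish"].map((cmd) => `npm ${cmd} --registry ${registry}`);
}

// Drifted behavior: a refactor drops the flag from publish only.
function buildCommandsDrifted(pkg: PackageManifest): string[] {
  const registry = resolveRegistry(pkg);
  return ["ping", "whoami"]
    .map((cmd) => `npm ${cmd} --registry ${registry}`)
    .concat("npm publish");
}

// Example manifest with a made-up internal registry URL.
const examplePkg: PackageManifest = {
  name: "@acme/tool",
  publishConfig: { registry: "https://npm.example.internal/" },
};

// Commands that no longer target the custom registry.
const offRegistry = buildCommandsDrifted(examplePkg).filter(
  (cmd) => cmd.indexOf(`--registry ${resolveRegistry(examplePkg)}`) === -1
);
console.log(offRegistry); // only "npm publish" is listed
```

A test that checks only the `ping` and `whoami` commands would stay green against the drifted variant, while the publish step would fall back to whatever registry npm is otherwise configured to use.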

Product fit

Behavior approval, not generic bug hunting.

  • The operation is stable enough to capture as a compact behavior record and approve in Git.
  • The approval surface lives in Git, where reviewers already work.
  • The report explains what changed instead of only saying “tests failed.”
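
One way a compact behavior record could be made Git-friendly is to serialize it with a stable key order, so that only real behavior changes show up in a diff. The sketch below assumes a flat record; the type and function names and the field names are hypothetical, not DriftFence's on-disk format.

```typescript
// Hypothetical flat record; DriftFence's actual on-disk format is not shown here.
type OperationRecord = { [field: string]: string | boolean | null };

// Serialize with sorted keys so the committed file diffs cleanly in Git.
// An array replacer makes JSON.stringify emit keys in the given order.
// This only suits flat records, because the same key list applies at every depth.
function toStableJson(record: OperationRecord): string {
  return JSON.stringify(record, Object.keys(record).sort(), 2) + "\n";
}

// Same behavior captured with different key order: identical file contents.
const firstCapture = toStableJson({ registry: "https://npm.example.internal/", private: false });
const secondCapture = toStableJson({ private: false, registry: "https://npm.example.internal/" });
console.log(firstCapture === secondCapture); // true

// A real behavior change produces different file contents, so Git shows a diff.
const changedCapture = toStableJson({ private: false, registry: "https://registry.npmjs.org/" });
console.log(firstCapture === changedCapture); // false
```

With a stable serialization, approving a behavior change amounts to reviewing and committing a small, readable diff of the record file.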

Inspect the exact evidence.

The evidence is grouped by operation: the clearest registry example, the supporting private-package operation, and the two other tested operations. That keeps the proof easy to inspect and keeps the claim boundary visible.

Registry operation

Custom npm registry handling is the clearest single demo.

In the headline evidence set, DriftFence would have blocked 2 of 6 model-written patches even though the relevant release tests passed 6 of 6 times. Independent review judged one blocked patch worth rejecting and the other likely noise.

  • Release tests: 6 / 6 passing
  • DriftFence: blocked 2 / 6
  • Independent review: 1 reject, 1 likely noise
  • Representative blocked patch: run-api-10-gpt54
  • Task brief file: ../.tmp/agent-eval/release-it-registry-publishconfig-propagation/.driftfence-agent/AGENT_PROMPT.md
  • Patch file: ../.tmp/agent-eval/release-it-registry-publishconfig-propagation/.driftfence-agent/runs/run-api-10-gpt54/patch.diff
  • DriftFence report file: ../.tmp/agent-eval/release-it-registry-publishconfig-propagation/.driftfence-agent/runs/run-api-10-gpt54/check-report.md

What DriftFence is and is not proving here.

Trust goes up when the non-claim is as clear as the claim. This page is intentionally narrower than a generic product homepage.

What this page supports

Specific pre-merge release operation evidence.

This page supports one specific use case: AI-written release-it changes that alter publish behavior while the relevant release tests still pass.

What this page does not support

Not broad efficacy across every repo or operation.

This page does not yet show broad efficacy across all repos, all operations, or every release-it task. The two comparison operations shown above stay in the story to keep that boundary explicit.

cd /Users/austinyoung/Phase2/DriftFence && npm run demo:release-wedge:check && npm run demo:release-wedge