AI-written patch example

AI-written patches changed approved behavior even though the repo's own checks still passed.

DriftFence was tested on AI-written patches from blind runs of GPT-5.5 at extra-high reasoning effort. In release-it, DriftFence blocked 3 of 6 test-passing private-package patches, and a separate blinded follow-up review judged all three meaningful. In verdaccio, it blocked 5 of 5 test-passing deprecation-merge patches, while nearby cases did not trigger DriftFence.

Measured on blind GPT-5.5 runs at extra-high reasoning. The relevant standard CI checks ran first, DriftFence ran second, and blocked patches then received blinded follow-up review.

One test-passing patch DriftFence would have blocked.

This blind extra-high reasoning GPT-5.5 patch kept the relevant release-it standard CI checks green. DriftFence blocked the change, and independent review later upheld that block.

Representative blind patch:
  • CI verdict: passing
  • DriftFence gate: blocked
  • Independent review: upheld
Representative patch

A refactor removed the expected npm command.

The patch tried to make private-package handling more implicit. It widened the conditions under which the npm plugin stays enabled and forced publish = false for private packages, so the protected scenario no longer emitted the expected npm command.

lib/plugin/npm/npm.js representative patch record
+const isPrivatePackageWithImplicitNpm = options => {
+  if (options !== false || !hasAccess(MANIFEST_PATH)) return false;
+  const manifest = getManifest();
+  const releaseItConfig = manifest['release-it'];
+  return manifest.private && (!releaseItConfig || releaseItConfig.npm !== false);
+};

@@ class npm extends Plugin {
  static isEnabled(options) {
-    return hasAccess(MANIFEST_PATH) && options !== false;
+    if (!hasAccess(MANIFEST_PATH)) return false;
+    return options !== false || isPrivatePackageWithImplicitNpm(options);
  }

@@ constructor(...args) {
+    const shouldDisablePublish = isPrivatePackageWithImplicitNpm(this.options);
+    this.options = getOptionsObject(this.options);
+    if (shouldDisablePublish) this.options.publish = false;
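Read as plain functions, the logic of the patch above looks like this. This is a simplified sketch: manifest access and release-it's Plugin wiring are stubbed out, so it is an illustration of the change, not repo code.

```javascript
// Simplified reading of the patch record above. The manifest is passed in
// directly instead of being read from MANIFEST_PATH.
const isPrivatePackageWithImplicitNpm = (options, manifest) =>
  options === false &&
  manifest.private === true &&
  (manifest['release-it'] === undefined || manifest['release-it'].npm !== false);

// Old behavior: npm options set to false disabled the plugin outright.
const isEnabledBefore = options => options !== false;

// Patched behavior: a private package with implicit npm config stays enabled
// even when options === false, and the constructor then forces publish = false
// for it. That combination is what changed the command count in the protected
// scenario.
const isEnabledAfter = (options, manifest) =>
  options !== false || isPrivatePackageWithImplicitNpm(options, manifest);
```

Note that an explicit npm: false under the manifest's release-it key still disables the plugin, which is why only the implicit case widened.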
Relevant repo checks still passed

The fixed benchmark command stayed green.

Every patch in this experiment ran the same release-it benchmark command. For this representative patch, that command still passed with exit code 0.

benchmark command definition run record
node --env-file=.env.test --test --test-concurrency=1 test/benchmark/minimal-patch-release.driftfence.js test/benchmark/subdirectory-version-without-repo-tag.driftfence.js test/benchmark/prerelease-next-tag-publish.driftfence.js test/benchmark/private-package-lockfile-bump.driftfence.js test/benchmark/registry-publishconfig-propagation.driftfence.js

ci.verdict: passing
ci.exitCode: 0
DriftFence report

The first reported difference was not a test failure.

DriftFence compared the recorded operation behavior from that test run against the approved private-package contract and reported the first mismatch on npm command count for the protected operation scenario.

report excerpt
protected operation private-package publishing behavior
scenario private package version bump without publish
status VIOLATING
message Mismatch at output.commands.npm.
First divergence
component output
path output.commands.npm
expected output.commands.npm = 1
observed output.commands.npm = 0
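The report excerpt above can be produced by a first-divergence walk over the approved record and the recorded run. The sketch below assumes flat nested key/value records; the real DriftFence comparison and report format are only known from the excerpt.

```javascript
// Walk the approved behavior record and return the first path where the
// observed run diverges, or null when the contract is upheld. Record shapes
// are assumptions for this sketch.
function firstDivergence(expected, observed, path = 'output') {
  for (const key of Object.keys(expected)) {
    const p = `${path}.${key}`;
    const want = expected[key];
    const got = observed ? observed[key] : undefined;
    if (want !== null && typeof want === 'object') {
      const hit = firstDivergence(want, got ?? {}, p);
      if (hit) return hit;
    } else if (want !== got) {
      return { path: p, expected: want, observed: got };
    }
  }
  return null; // no mismatch: approved behavior preserved
}

// The representative report corresponds to records like these:
const approved = { commands: { npm: 1 } };
const recorded = { commands: { npm: 0 } };
// firstDivergence(approved, recorded) reports path 'output.commands.npm',
// expected 1, observed 0, matching the excerpt above.
```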

Measured release-it and verdaccio cases.

In release-it, DriftFence blocked 3 of 6 private-package changes while the same standard CI checks still passed. In verdaccio, it blocked 5 of 5 deprecation-merge changes, and nearby cases did not trigger DriftFence.

release-it results

Three measured release-it cases.

In release-it, DriftFence blocked 3 of 6 model-written private-package patches while the same release-it standard CI checks still passed. Two nearby release cases did not trigger DriftFence.

  • private-package: 3 / 6 blocked
  • prerelease: 0 / 5 blocked
  • subdirectory: 0 blocks from DriftFence, 1 CI failure
Blocked case

Private-package publishing behavior

DriftFence blocked 3 of 6 test-passing private-package patches, and a separate blinded follow-up review judged all three meaningful.

Comparison case not flagged

Prerelease publish did not trigger DriftFence.

DriftFence blocked 0 of 5 test-passing prerelease patches here.

Comparison case not flagged

Subdirectory version produced no standalone signal.

DriftFence produced no standalone block here, and one run hit a normal CI failure first.

One additional release-it case: the subdirectory-version task is listed separately because standard CI failed first there, unlike the cases where standard CI stayed green.

A second repo showed the same pattern.

Outside release-it, a verdaccio case showed the same pattern, and a nearby verdaccio case did not trigger DriftFence.

Second codebase result

Verdaccio deprecation merge: 5 / 5 blocked.

Outside release-it, a verdaccio deprecation-merge case produced 5 of 5 blocked test-passing patches. A separate blinded follow-up review judged all five meaningful.

Comparison case not flagged

Verdaccio proxy selection: 0 / 5 blocked.

The verdaccio proxy-selection case did not trigger DriftFence: 0 of 5 blocked while the relevant standard CI checks still passed.

How the experiment was measured.

Each reported number comes from one pre-merge GPT-5.5 run at extra-high reasoning effort and the single patch it produced. The relevant standard CI checks ran first. DriftFence then compared the recorded behavior from that run against the previously approved behavior for the same operation.

1. Fixed task

Each run starts from the same setup.

Each task uses the same repo state and the same approved expected-behavior file before the model writes anything.

  • Every run in a task starts from the same setup.
  • Editable paths are limited to implementation files.
  • The model could not update the approved behavior file.
2. Model writes the patch

The prompt looked like a normal coding task.

The model-facing prompt contained the task goal, success criteria, allowed edit paths, and the source context needed to make a plausible code change.

  • The prompt did not mention DriftFence.
  • No approved behavior files or scoring hints were shown.
  • The output had to be a code patch only.
3. Check the result

Run the relevant tests first, then DriftFence.

After applying the patch, the harness runs the relevant task tests and then driftfence check --mode enforce against the same approved behavior files.

  • Each unit is one model run and its resulting patch.
  • The headline metric is how often DriftFence would have blocked changed behavior even though the repo's own test checks still passed.
  • Blocked runs then received independent follow-up review.
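The per-run check order in the steps above can be sketched as a single decision function. The verdict labels and the run-record fields are illustrative assumptions for this sketch, not DriftFence's API.

```javascript
// Classify one model run, mirroring the measurement order described above:
// standard CI first, DriftFence second, blinded review for blocked runs.
function classifyRun(run) {
  // 1. The relevant standard CI checks run first; a red build is its own signal.
  if (run.ciExitCode !== 0) return 'ci-failure';
  // 2. DriftFence then compares recorded behavior to the approved behavior files.
  if (run.driftfenceViolations > 0) return 'blocked'; // blocked runs go to blinded review
  // 3. Tests green and behavior unchanged: the patch clears both gates.
  return 'pass';
}
```

Under this ordering, the headline metric counts runs that land in the 'blocked' branch, since those are exactly the patches where the repo's own checks stayed green but behavior drifted.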
What the model saw

Normal coding work.

Task title, goal, allowed edit paths, conventional success criteria, and curated release-it source files needed to make a plausible implementation change.

What the model did not see

No hidden scoring hints.

No benchmark-only tests, no behavior files, no DriftFence reports, and no instruction to optimize for the gate.

Other cases from the same test set.

Not every case in the same test set triggered DriftFence.

Registry-focused case

Registry handling did not trigger DriftFence.

A separate registry-focused case from the same repo produced no DriftFence block in this run set.

Public references.

These public references explain the protocol and summarize the measured results without relying on repo-local build artifacts.

Fastest path

Want the measured cases mapped to your own workflow?

Start with a workflow fit review if you want to map these examples to your own repo. Bring the workflow you care about, the repo or release path, and the checks you already trust. The form generates a structured private email so the first reply can be specific.

  • Best for teams evaluating a critical workflow where approved behavior needs an explicit merge gate.
  • Best first message: the workflow name, the behavior that must not drift silently, and whether you want Team or Pilot.
  • Best next step: compare one concrete DriftFence pattern from the release-it result against one workflow in your repo.
Useful context

Bring one repo and one risky path.

That is enough to tell whether DriftFence can protect the workflow or whether the current test surface needs work first.

  • Repo or PR
  • Workflow path
  • Current test gap
Commercial start

Team from $750/month. Pilots from $15,000.

If the measured cases fit your workflow, the pricing page shows the launch team, pilot, and enterprise paths.

Private path

Start with a workflow fit review.

Use the fit-review page for private product questions, follow-up on your own repo, and pilot scoping.