Deep Review: 20260406-103045-pr-2640 ¶

Date	2026-04-06 10:30
Repo	mikefarah/yq
Round	1
Mode	Thorough (4 passes)
Author	@copilot-swe-agent
PR	#2640 — Add system(command; args) operator (disabled by default)
Branch	`copilot/add-system-command-operator`
Commits	`6f94991c` Apply suggestions from code review `b3b44788` Remove deprecated --enable-system-operator alias `e10e8127` Apply suggestions from code review `53abbbae` Validate command node type and handle multiple results `5ea069a5` Update system-operators.md `2c8605f6` Update headers/system-operators.md `884c2d8b` Add yqFlags to expressionScenario; fix system op docs `da611f7a` Evaluate system command/args per matched node `6315cfe9` Update operator_system.go `9a72c0f7` Add system(command; args) operator `d181cf87` Initial plan
Reviewers	Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
Verdict	Merge with fixes — address I1 (silent null on disabled) and I4 (misleading docs) before merge; remaining items can follow
Wall-clock time	`96 min 32 s`

Executive Summary ¶

This PR adds a system operator that executes external commands from yq expressions, gated behind --security-enable-system-operator. The implementation follows established yq patterns: simpleOp registration, blockOpType for optional arguments, SingleReadonlyChildContext for per-node evaluation, and exec.Command (no shell) to avoid injection. The main concerns are the silent-null behaviour when disabled (inconsistent with how env/file operators fail hard) and misleading documentation about semicolon-separated arguments.

Critical Issues ¶

None.

Important Issues ¶

I1. Disabled mode silently injects nulls instead of erroring (important, regression) Claude Gemini ¶

When the system operator is disabled (the default), it returns null per matched node with only a warning on stderr. The existing env and file operators return errors when their security gates are active (operator_env.go:20-22, operator_load.go:66-68). In a pipeline like yq '.field = system("cmd")' file.yml | kubectl apply -f -, a user who forgets the flag gets null values silently embedded in output with exit code 0. The warning goes to stderr, which pipelines routinely discard.

pkg/yqlib/operator_system.go

func systemOperator(d *dataTreeNavigator, context Context, expressionNode *ExpressionNode) (Context, error) {
	if !ConfiguredSecurityPreferences.EnableSystemOps {
		log.Warning("system operator is disabled, use --security-enable-system-operator flag to enable")
		results := list.New()
		for el := context.MatchingNodes.Front(); el != nil; el = el.Next() {
			candidate := el.Value.(*CandidateNode)
			results.PushBack(candidate.CreateReplacement(ScalarNode, "!!null", "null"))
		}
		return context.ChildContext(results), nil
	}

Fix: Return an error, matching the codebase convention:

 if !ConfiguredSecurityPreferences.EnableSystemOps {
-    log.Warning("system operator is disabled, use --security-enable-system-operator flag to enable")
-    results := list.New()
-    for el := context.MatchingNodes.Front(); el != nil; el = el.Next() {
-        candidate := el.Value.(*CandidateNode)
-        results.PushBack(candidate.CreateReplacement(ScalarNode, "!!null", "null"))
-    }
-    return context.ChildContext(results), nil
+    return Context{}, fmt.Errorf("system operations are disabled, use --security-enable-system-operator to enable")
 }

If the null-return behaviour is intentional, document the divergence from the env/file pattern in a code comment.

I2. Multi-result arg expressions silently truncated to first match (important, regression) Claude Codex Gemini ¶

When the args expression produces multiple results (e.g. system("echo"; .args[]) on {"args":["a","b"]}), only the first result is used; the rest are silently dropped. The command side (resolveCommandNode at line 50-51) logs a debug message for multiple results, but the args side does not. A user writing system("cmd"; .items[]) gets only the first item with no indication that the rest were dropped.

pkg/yqlib/operator_system.go

			if argsNodes.MatchingNodes.Front() != nil {
				args = resolveSystemArgs(argsNodes.MatchingNodes.Front().Value.(*CandidateNode))
			}

Fix: Add a debug log matching the command-resolution pattern, or flatten all matching arg nodes into argv:

+if argsNodes.MatchingNodes.Len() > 1 {
+    log.Debugf("system operator: args expression returned %d results, using first", argsNodes.MatchingNodes.Len())
+}
 if argsNodes.MatchingNodes.Front() != nil {
     args = resolveSystemArgs(argsNodes.MatchingNodes.Front().Value.(*CandidateNode))
 }

I3. Command validation accepts non-string scalars (important, regression) Codex ¶

The check rejects !!null and non-scalars but allows !!int, !!bool, and other scalar tags through. system(123) tries to exec "123" rather than failing validation, contradicting both the error text and the documented contract. In practice the OS returns a "not found" error, but the validation should catch this before reaching exec.

pkg/yqlib/operator_system.go

	cmdNode := commandNodes.MatchingNodes.Front().Value.(*CandidateNode)
	if cmdNode.Kind != ScalarNode || cmdNode.Tag == "!!null" {
		return "", fmt.Errorf("system operator: command must be a string scalar")
	}

Fix: Check the tag explicitly:

-if cmdNode.Kind != ScalarNode || cmdNode.Tag == "!!null" {
+if cmdNode.Kind != ScalarNode || cmdNode.guessTagFromCustomType() != "!!str" {
    return "", fmt.Errorf("system operator: command must be a string scalar")
 }

I4. Misleading documentation about semicolons for multiple args (important, regression) Gemini ¶

This suggests multiple arguments can be semicolon-separated: system("cmd"; "arg1"; "arg2"). In yq, ; is the block operator, which evaluates expressions sequentially and yields the right-hand result. Writing system("cmd"; "arg1"; "arg2") silently drops "arg1" and passes only "arg2". Multiple arguments require an array: system("cmd"; ["arg1", "arg2"]).

pkg/yqlib/doc/operators/headers/system-operators.md

- A command string (required)
- An argument or array of arguments separated by `;` (optional)

Fix:

-- An argument or array of arguments separated by `;` (optional)
+- An argument (or an array of arguments), separated from the command by `;` (optional)

I5. Argument resolution silently ignores invalid node types (important, regression) Gemini ¶

When a sequence argument contains a non-scalar or null element, resolveSystemArgs logs a warning and skips it. Silently dropping command arguments alters execution: system("rm"; ["-rf", {bad: node}]) resolves to rm -rf, losing the invalid node without halting.

pkg/yqlib/operator_system.go

			if child.Kind != ScalarNode || child.Tag == "!!null" {
				log.Warningf("system operator: argument must be a non-null scalar; got kind=%v tag=%v - ignoring", child.Kind, child.Tag)
				continue
			}

Fix: Return an error instead of continuing:

 if child.Kind != ScalarNode || child.Tag == "!!null" {
-    log.Warningf("system operator: argument must be a non-null scalar; got kind=%v tag=%v - ignoring", child.Kind, child.Tag)
-    continue
+    return nil, fmt.Errorf("system operator: argument must be a non-null scalar; got kind=%v tag=%v", child.Kind, child.Tag)
 }

This requires changing resolveSystemArgs to return ([]string, error) and updating its callers.

I6. system operator subsumes env/file security restrictions — undocumented (important, gap) Claude (pass 2) ¶

When --security-enable-system-operator is active, --security-disable-env-ops and --security-disable-file-ops become ineffective: system("printenv"; "SECRET") reads environment variables and system("cat"; "/path/to/file") reads arbitrary files. The documentation does not mention this.

cmd/root.go

	rootCmd.PersistentFlags().BoolVarP(&yqlib.ConfiguredSecurityPreferences.DisableEnvOps, "security-disable-env-ops", "", false, "Disable env related operations.")
	rootCmd.PersistentFlags().BoolVarP(&yqlib.ConfiguredSecurityPreferences.DisableFileOps, "security-disable-file-ops", "", false, "Disable file related operations (e.g. load)")
	rootCmd.PersistentFlags().BoolVarP(&yqlib.ConfiguredSecurityPreferences.EnableSystemOps, "security-enable-system-operator", "", false, "Enable system operator to allow execution of external commands.")

Fix: Add a note to headers/system-operators.md:

 **Security warning**: The system operator is disabled by default. You must explicitly pass `--security-enable-system-operator` to use it.
+
+**Note:** When enabled, the system operator can replicate the functionality of `env` and `load`
+operators via external commands. Enabling it effectively overrides `--security-disable-env-ops`
+and `--security-disable-file-ops`.

I7. Multi-node execution path untested (important, gap) Claude (pass 2) ¶

All 7 test scenarios exercise the per-node loop (operator_system.go:79-145) with exactly one matched node. Result accumulation across iterations, per-node context isolation, and failure-on-Nth-node behaviour are untested.

Fix: Add a multi-document or multi-element test:

{
    description: "System operator processes multiple documents",
    skipDoc:     true,
    document:    "a: first",
    document2:   "a: second",
    expression:  `.a = system("` + echoPath + `"; "replaced")`,
    expected: []string{
        "D0, P[], (!!map)::a: replaced\n",
        "D1, P[], (!!map)::a: replaced\n",
    },
},

I8. os/exec import has no build-tag exclusion for tinygo (important, gap) Claude (pass 3) ¶

The system operator is the only file in pkg/yqlib that imports os/exec. Every other heavy dependency (JSON, XML, TOML, Lua, HCL, etc.) has a corresponding no_*.go stub and build tag for tinygo builds. The system operator lacks this mechanism.

pkg/yqlib/operator_system.go

package yqlib

import (
	"bytes"
	"container/list"
	"fmt"
	"os/exec"
	"strings"
)

Fix: Add a yq_nosystem build tag. Create no_system.go with a stub that returns an error, gate operator_system.go with //go:build !yq_nosystem, and update scripts/build-tinygo-yq.sh to include yq_nosystem.

Suggestions ¶

S1. No test verifying stdin piping (suggestion, gap) Claude ¶

The documentation states "The current matched node's value is serialised and piped to the command via stdin" but no test verifies this. A test using cat (which echoes stdin) would confirm the YAML encoding is correctly piped.

S2. No test for --null-input usage (suggestion, gap) Claude ¶

The documentation header shows --null-input as a primary usage pattern, but all test scenarios provide a document. A null-input scenario would verify the operator handles a null scalar node on stdin.

S3. Misleading "parse time" comment (suggestion, regression) Claude ¶

pkg/yqlib/operator_system.go

	// determine at parse time whether we have (command; args) or just (command)
	hasArgs := expressionNode.RHS.Operation.OperationType == blockOpType

This runs during expression evaluation, not at parse time. Fix: remove "at parse time".

S4. CLI security flag not exercised by tests (suggestion, gap) Codex ¶

Tests toggle ConfiguredSecurityPreferences.EnableSystemOps directly instead of testing the Cobra flag wiring at cmd/root.go:215. The test suite would still pass if the flag were renamed or unregistered.

pkg/yqlib/operator_system_test.go

func TestSystemOperatorDisabledScenarios(t *testing.T) {
	originalEnableSystemOps := ConfiguredSecurityPreferences.EnableSystemOps
	defer func() {
		ConfiguredSecurityPreferences.EnableSystemOps = originalEnableSystemOps
	}()

	ConfiguredSecurityPreferences.EnableSystemOps = false

	for _, tt := range systemOperatorDisabledScenarios {
		testScenario(t, &tt)
	}
	documentOperatorScenarios(t, "system-operators", systemOperatorDisabledScenarios)
}

S5. Test doc generation uses host-specific absolute paths (suggestion, regression) Gemini ¶

findExec embeds machine-specific paths (e.g. /usr/bin/echo vs /bin/echo) into generated docs. Running make generate-docs on a different host produces a diff.

pkg/yqlib/operator_system_test.go

func TestSystemOperatorEnabledScenarios(t *testing.T) {
	echoPath := findExec(t, "echo")
	falsePath := findExec(t, "false")

	originalEnableSystemOps := ConfiguredSecurityPreferences.EnableSystemOps
	defer func() {
		ConfiguredSecurityPreferences.EnableSystemOps = originalEnableSystemOps
	}()

	ConfiguredSecurityPreferences.EnableSystemOps = true

	scenarios := []expressionScenario{
		{
			description:    "Run a command with an argument",
			subdescription: "Use `--security-enable-system-operator` to enable the system operator.",
			yqFlags:        "--security-enable-system-operator",
			document:       "country: Australia",

S6. Command stderr discarded on success (suggestion, gap) Gemini ¶

The stderr buffer is checked only when cmd.Output() returns an error. Commands that succeed with warnings on stderr have those warnings silently discarded.

pkg/yqlib/operator_system.go

		cmd.Stderr = &stderr

		output, err := cmd.Output()
		if err != nil {
			stderrStr := strings.TrimSpace(stderr.String())
			if stderrStr != "" {
				return Context{}, fmt.Errorf("system command '%v' failed: %w\nstderr: %v", command, err, stderrStr)
			}
			return Context{}, fmt.Errorf("system command '%v' failed: %w", command, err)
		}

S7. Embedded newline in error value (suggestion, regression) Claude (pass 2) ¶

pkg/yqlib/operator_system.go

			if stderrStr != "" {
				return Context{}, fmt.Errorf("system command '%v' failed: %w\nstderr: %v", command, err, stderrStr)
			}

Go errors are conventionally single-line. No other fmt.Errorf in pkg/yqlib/ embeds a newline. Fix: use a semicolon or parenthetical.

S8. No subprocess timeout (suggestion, enhancement) Claude ¶

A hanging command blocks yq indefinitely. Using exec.CommandContext with a timeout would prevent this. Low priority — matches standard CLI tool conventions.

pkg/yqlib/operator_system.go

		// #nosec G204 - intentional: user must explicitly enable this operator
		cmd := exec.Command(command, args...)
		cmd.Stdin = &stdin
		var stderr bytes.Buffer
		cmd.Stderr = &stderr

		output, err := cmd.Output()

S9. Unnecessary double memory allocation for stdin (suggestion, enhancement) Gemini (pass 2) ¶

encodeToYamlString allocates a string; stdin.WriteString(encoded) copies it into a bytes.Buffer. Use strings.NewReader(encoded) instead to avoid the extra allocation.

pkg/yqlib/operator_system.go

		var stdin bytes.Buffer
		encoded, err := encodeToYamlString(candidate)
		if err != nil {
			return Context{}, err
		}
		stdin.WriteString(encoded)

S10. Doc generation depends on test execution order (suggestion, gap) Claude (pass 2) ¶

TestSystemOperatorDisabledScenarios must run before TestSystemOperatorEnabledScenarios because the first creates the doc file (via os.Create) and the second only appends. This implicit ordering dependency is fragile but matches existing patterns in operator_load_test.go.

pkg/yqlib/operator_system_test.go

	documentOperatorScenarios(t, "system-operators", systemOperatorDisabledScenarios)

pkg/yqlib/operator_system_test.go

	appendOperatorDocumentScenario(t, "system-operators", scenarios)

S11. Multiple semicolons produce a confusing error (suggestion, regression) Claude (pass 3) ¶

system("cmd"; "arg1"; "arg2") creates nested BLOCK nodes. The operator evaluates the inner BLOCK, gets an empty context, and reports "command expression returned no results" — a confusing error for what is really a syntax mistake. A check for nested blocks could produce a targeted error message.

pkg/yqlib/operator_system.go

		if hasArgs {
			block := expressionNode.RHS
			commandNodes, err := d.GetMatchingNodes(nodeContext, block.LHS)
			if err != nil {
				return Context{}, err
			}
			command, err = resolveCommandNode(commandNodes)
			if err != nil {
				return Context{}, err
			}

			argsNodes, err := d.GetMatchingNodes(nodeContext, block.RHS)

S12. flagsPrefix logic duplicated in documentInput (suggestion, regression) Claude (pass 3) ¶

The same 3-line pattern for building flagsPrefix appears in both branches of documentInput. Hoist above the if/else.

pkg/yqlib/operators_test.go

		writeOrPanic(w, "then\n")

		flagsPrefix := ""
		if s.yqFlags != "" {
			flagsPrefix = s.yqFlags + " "
		}
		if s.expression != "" {
			writeOrPanic(w, fmt.Sprintf("```bash\n%vyq %v%v'%v' %v\n```\n", envCommand, flagsPrefix, command, strings.ReplaceAll(s.expression, "'", `'\''`), files))
		} else {
			writeOrPanic(w, fmt.Sprintf("```bash\n%vyq %v%v%v\n```\n", envCommand, flagsPrefix, command, files))
		}
	} else {
		writeOrPanic(w, "Running\n")
		flagsPrefix := ""
		if s.yqFlags != "" {
			flagsPrefix = s.yqFlags + " "
		}

S13. Flag suffix inconsistent with existing security flags (suggestion, regression) Claude (pass 4) ¶

Existing flags use plural abbreviated suffixes: --security-disable-env-ops, --security-disable-file-ops. The new flag uses singular spelled-out --security-enable-system-operator. The Go field EnableSystemOps already follows the *Ops convention. Consider renaming to --security-enable-system-ops for consistency.

cmd/root.go

	rootCmd.PersistentFlags().BoolVarP(&yqlib.ConfiguredSecurityPreferences.DisableEnvOps, "security-disable-env-ops", "", false, "Disable env related operations.")
	rootCmd.PersistentFlags().BoolVarP(&yqlib.ConfiguredSecurityPreferences.DisableFileOps, "security-disable-file-ops", "", false, "Disable file related operations (e.g. load)")
	rootCmd.PersistentFlags().BoolVarP(&yqlib.ConfiguredSecurityPreferences.EnableSystemOps, "security-enable-system-operator", "", false, "Enable system operator to allow execution of external commands.")

S14. YAML encoding error returned without operator context (suggestion, regression) Claude (pass 4) ¶

Every other error path in systemOperator prefixes with "system operator:" or "system command '...' failed:". An encoding error here propagates without context. Fix: wrap with fmt.Errorf("system operator: failed to encode input: %w", err).

pkg/yqlib/operator_system.go

		encoded, err := encodeToYamlString(candidate)
		if err != nil {
			return Context{}, err
		}

Design Observations ¶

Concerns ¶

(in-scope) The inverted security model (enable vs. disable) is sound, but the inverted failure mode (null vs. error) breaks the principle of least surprise. Users who learn that --security-disable-env-ops causes errors will expect that the absence of --security-enable-system-operator also causes errors. Claude
(in-scope) The security model has an implicit hierarchy: system >= env + load. The flags are presented as independent toggles, but enabling system ops effectively grants everything the other two provide. This should be explicit to users. Claude (pass 2)
(future) The operator returns !!str for all output. A future enhancement could support system_yaml/system_json variants that parse command output, similar to load vs load_str. Claude (pass 2)

Strengths ¶

(in-scope) Using exec.Command (no shell) avoids shell injection entirely. Arguments pass directly to the exec syscall. Claude Gemini
(in-scope) Per-node evaluation via SingleReadonlyChildContext correctly scopes expression evaluation, matching the pattern used by loadStringOperator and other operators. Claude Codex
(in-scope) The inverted default (EnableSystemOps: false) correctly makes the more dangerous capability opt-in rather than opt-out. Claude Gemini
(in-scope) The #nosec G204 annotation at line 122 with explanation properly acknowledges the gosec warning and documents the mitigation. Claude
(in-scope) The yqFlags field added to expressionScenario is a clean, reusable solution for doc generation — any operator needing CLI flags in examples can use it. Claude Codex
(in-scope) The blockOpType check for optional arguments follows the pattern used by match, datetime, and with operators. Claude (pass 3)
(in-scope) .system (dot-prefixed) is correctly tokenized as a PathElement, not as the SYSTEM operator, so users with YAML keys named "system" are unaffected. Claude (pass 3)

Testing Assessment ¶

Test coverage is adequate for the core paths: disabled mode, single-arg, no-arg, array args, per-node expression evaluation, command failure, and null command. The highest-risk untested scenarios:

Multi-node execution (I7) — the core for-loop is never tested with >1 node
Stdin piping (S1) — the core feature of passing node YAML to the command's stdin
--null-input mode (S2) — documented primary usage pattern
CLI flag wiring (S4) — Cobra flag binding is untested
Non-string command scalars (I3) — system(123) reaches exec.Command unchecked

Documentation Assessment ¶

The header documentation uses correct UK English ("serialised") and clearly warns about the security gate. The generated examples correctly show the flag for enabled scenarios and omit it for disabled. Two gaps:

The semicolon description at headers/system-operators.md:15 is misleading (I4)
The security subsumption relationship with env/file ops is undocumented (I6)

Commit Structure ¶

The history shows iterative Copilot agent refinement: an initial plan commit (d181cf87, empty), the main implementation, then multiple fixup commits responding to review feedback. Several commits are patches on earlier ones:

6315cfe9, 5ea069a5, 2c8605f6 — filename-only messages with no indication of what changed
e10e8127, 6f94991c — "Apply suggestions from code review" without specifics
d181cf87 — empty plan commit

This is expected for a Copilot-authored PR iterated via review comments. A squash merge would produce a clean single commit.

Acknowledged Limitations ¶

#nosec G204 at operator_system.go:122 — intentional use of variable command paths for exec.Command, gated by explicit opt-in via --security-enable-system-operator.

Unresolved Feedback ¶

All four substantive review comments from @mikefarah have been addressed by subsequent commits:

Per-node evaluation with SingleReadonlyChildContext — commit da611f7a
Missing --security-enable-system-operator in doc examples — commit 884c2d8b
Validate command node type, debug log for multiple results — commit 53abbbae
Remove deprecated --enable-system-operator alias — commit b3b44788

However, the arg side of review comment 3 remains partially unaddressed: the request was to "use the first (and log a debug, not a warning, of all the matches)." The command side logs extra results at debug level (operator_system.go:50-52), but the arg side still truncates silently (operator_system.go:101-103). See I2.

URL: https://github.com/mikefarah/yq/pull/2640#discussion_r3004049491

Agent Performance Retro ¶

Claude Opus 4.6 ¶

Pass 1 (604s, 43 tool calls): Produced 1 important and 5 suggestions. Read 19 files and ran 15 grep searches. Used git blame (2 calls) to verify findings. The strongest unique contribution was I1 (disabled mode returns null), which Claude traced through operator_env.go and operator_load.go to demonstrate the inconsistency. Also identified 4 testing gaps (S1-S4) that no other agent found.

Pass 2 (648s, 38 tool calls): Found the security subsumption issue (I6) — a cross-cutting concern requiring understanding of how exec.Command subsumes env/file operations. Also identified the multi-node test gap (I7) and the embedded-newline convention issue (S7). Used git blame 4 times to verify findings.

Pass 3 (804s, 53 tool calls): Identified the tinygo build-tag gap (I8) by checking all no_*.go stubs and build-tinygo-yq.sh. This required 30 grep searches and 20 file reads — the most thorough exploration of any pass. Also found the multiple-semicolons UX issue (S11), the duplicated flagsPrefix code (S12), and the eval+system composition gap (S20). Used git blame 10 times.

Pass 4 (757s, 54 tool calls): Found the flag naming inconsistency (S13) by comparing all --security-* flags, and two error-handling suggestions (S14, S22). 30 grep searches, 0 important findings — diminishing returns as expected for the final pass.

Coverage misses: None. Claude reviewed all 9 files in every pass.

Codex GPT 5.4 ¶

Pass 1 (249s, 45 tool calls): Produced 2 important and 1 suggestion. Codex was the only agent to identify the non-string scalar validation gap (I3), tracing through YAML tag handling. It also independently found the arg truncation issue (I2) and the CLI flag test gap (S4). Used git blame 6 times — the most of any agent in pass 1. Ran go test (6 calls) to verify the test suite passes.

Pass 2 (200s, 46 tool calls): Found no new issues beyond the 13 already known. Thorough investigation (23 sed -n for targeted line reads, 11 rg searches, 3 git blame) but the review was exhausted. Dropped after this pass.

Coverage misses: None. Codex reviewed all 9 files.

Gemini 3.1 Pro ¶

Pass 1 (394s, 26 tool calls): Most productive initial pass with 5 important and 2 suggestions. Unique contributions: the misleading docs finding (I4) — an insight requiring understanding of yq's block operator semantics — and the invalid arg type issue (I5). Also independently found I1 (disabled returns null) and S6 (stderr discarded on success). Notably ran ./yq 10 times for live verification, the only agent to do so.

Pass 2 (350s, 27 tool calls): Found 3 new issues (double allocation S9, stdout discarded on failure, missing command abort handling). Ran go test 13 times for verification. However, Gemini's I3 (missing command aborts evaluation) is a design choice, not a bug — erroring on a missing command is the safer default.

Pass 3 (301s, 0 useful output): Hit Gemini Pro quota exhaustion (TerminalQuotaError: QUOTA_EXHAUSTED). Dropped.

Coverage misses: In pass 1, did not run git blame (expected — known Gemini quota limitation). This means Gemini cannot distinguish regressions from pre-existing issues. In this PR all changed files are new, so the distinction is moot.

Tool call highlights:

git blame usage: Claude used blame in every pass (2, 4, 10, 2 = 18 total). Codex used it in passes 1-2 (6, 3 = 9 total). Gemini never used blame — a known quota trade-off.
Live verification: Gemini was the only agent to run ./yq directly (10 times in pass 1), testing operator behaviour against the actual binary rather than relying solely on code reading.
Test execution: Codex ran go test 6 times in pass 1; Gemini ran it 13 times in pass 2. Claude never ran tests, relying on code analysis.

Summary ¶

	Claude Opus 4.6	Codex GPT 5.4	Gemini 3.1 Pro
Duration (pass 1)	604s	249s	394s
Findings (pass 1)	1I 5S	2I 1S	5I 2S
Tool calls (pass 1)	43 (Read 19, Grep 15)	45 (exec 43)	26 (shell 15, grep 9)
Duration (pass 2)	648s	200s	350s
Findings (pass 2)	2I 3S	0	3I 2S
Tool calls (pass 2)	38 (Read 15, Grep 10)	46 (exec 44)	27 (shell 19, grep 6)
Duration (pass 3)	804s	—	301s (empty)
Findings (pass 3)	1I 4S	—	0 (quota)
Tool calls (pass 3)	53 (Grep 30, Read 20)	—	—
Duration (pass 4)	757s	—	—
Findings (pass 4)	0I 3S	—	—
Tool calls (pass 4)	54 (Grep 30, Read 22)	—	—
Totals	4I 15S	2I 1S	5I 2S + empty
False positives	0	0	1 (I3 pass 2: design choice)
Downgraded	0	0	3 (I1→S9, I2→informational, I3→informational)
Unique insights	I1, I6, I7, I8, S3, S7-S14	I3, I2	I4, I5, S5, S6
Coverage misses	0	0	0

Reconciliation:

Gemini P1 I5 (stderr on success) → S6 (suggestion; standard CLI behaviour)
Gemini P2 I1 (double allocation) → S9 (suggestion; marginal performance impact)
Gemini P2 I2 (stdout on failure) → informational (merged into existing S7 context)
Gemini P2 I3 (missing command aborts) → informational (erroring is the correct defensive behaviour)
Claude P1 S1 (arg truncation) → promoted to I2 (important; Codex's severity assessment was more accurate)

All three agents contributed distinct value. Gemini excelled in pass 1 with the highest finding count and unique documentation insight (I4). Codex was the fastest and most precise, finding the non-string validation gap (I3) that required deep YAML tag knowledge. Claude was the most persistent across all 4 passes and found the cross-cutting issues (I6 security subsumption, I8 tinygo build tag) that required broad codebase exploration.

Review Process Notes ¶

No suggestions for this review — the prompt guided agents well, the findings were high quality, and no prompt gaps were identified.