Deep Review: 20260405-213746-fix-repeat-overflow

Date2026-04-05 21:37
Repomikefarah/yq
Round1
Branchfix-repeat-overflow
Commits2c37aa9d Fix panic and OOM in repeatString for large repeat counts
ReviewersClaude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
VerdictMerge with fixes — integer overflow in the new size guard defeats the protection this commit introduces
Wall-clock time11 min 51 s

Executive Summary

This commit replaces a fixed repeat-count ceiling (10 million) in repeatString with a result-size check (10 MiB), addressing two OSS-Fuzz panic reports. The size-based approach is the right design, but the new guard multiplies len(stringNode.Value) * count as native int values. On both 64-bit and 32-bit platforms, a sufficiently large count overflows the product to a negative or small positive number, bypassing the guard and reaching strings.Repeat — the exact panic path this commit exists to eliminate.


Critical Issues

C1. Integer overflow in size guard bypasses the check Claude Codex Gemini

len(stringNode.Value) and count are both int. The parseInt function at lib.go:176 accepts values up to math.MaxInt, so count can reach 263−1 on 64-bit platforms. Multiplying a 2-byte string length by 262 produces 263, which overflows int64 to math.MinInt64 (negative). Since negative values are always less than maxResultLen, the guard evaluates to false, and strings.Repeat panics with "strings: Repeat output length overflow" — the exact failure this commit aims to prevent.

	count, err := parseInt(intNode.Value)
	if err != nil {
		return nil, err
	} else if count < 0 {
		return nil, fmt.Errorf("cannot repeat string by a negative number (%v)", count)
	}
	maxResultLen := 10 * 1024 * 1024 // 10 MiB
	if count > 0 && len(stringNode.Value)*count > maxResultLen {
		return nil, fmt.Errorf("result of repeating string (%v bytes) by %v would exceed %v bytes", len(stringNode.Value), count, maxResultLen)
	}
	target.Value = strings.Repeat(stringNode.Value, count)

	return target, nil

Claude verified this experimentally: strLen=2, count=4611686018427387904 → product=-9223372036854775808 → guard bypassed → panic.

The same overflow occurs on 32-bit platforms (the project ships yq_windows_386.exe via scripts/xcompile.sh) at much smaller values. (critical, regression)

Fix: divide instead of multiply — division by a positive count cannot overflow:

 maxResultLen := 10 * 1024 * 1024 // 10 MiB
-if count > 0 && len(stringNode.Value)*count > maxResultLen {
+if count > 0 && len(stringNode.Value) > maxResultLen/count {
     return nil, fmt.Errorf("result of repeating string (%v bytes) by %v would exceed %v bytes", len(stringNode.Value), count, maxResultLen)
 }

When count is enormous, maxResultLen/count truncates to 0, so any non-empty string triggers the guard. When count is 0, the left side of && short-circuits. This pattern is standard for overflow-safe size checks.


Important Issues

I1. No test for the integer overflow path Claude Codex Gemini

All three regression tests use counts (99,999,999 and 100,000,001) far too small to trigger integer overflow on 64-bit. After fixing C1, add a test with a count large enough to confirm the division-based guard catches it:

{
    // Verify the size guard is overflow-safe.
    skipDoc:       true,
    expression:    `"ab" * 4611686018427387904`,
    expectedError: "result of repeating string (2 bytes) by 4611686018427387904 would exceed 10485760 bytes",
},

This test would have caught C1 on the current code. (important, gap)


Suggestions

S1. No boundary test at exactly 10 MiB Codex

No test verifies that a repeated string whose output is exactly 10 * 1024 * 1024 bytes succeeds while one byte over is rejected. This boundary test would confirm the guard uses > (not >=) and that the result is produced. (suggestion, gap)

S2. Grammar: "1 bytes" Claude

The format string "(%v bytes)" at line 161 produces "(1 bytes)" for single-byte strings. Since this message is primarily diagnostic, this is cosmetic. One fix: use "%v-byte" which reads correctly for all values. (suggestion, regression)


Design Observations

Strengths


Testing Assessment

Existing tests cover: negative count, count exceeding the limit for both short and long strings, and the two specific OSS-Fuzz inputs. Missing scenarios, ranked by risk:

  1. Integer overflow bypass — a count near math.MaxInt64 with a multi-byte string (see I1). This is the exact attack vector the fix targets.
  2. Boundary at the 10 MiB limit — product exactly at the limit succeeds; product one byte over is rejected (see S1).

Documentation Assessment

No documentation updates needed. The change is internal, and the test descriptions document the behaviour.


Commit Structure

Single commit for a single fix — clean and appropriate. The commit message accurately describes the change.


Agent Performance Retro

Claude

Claude identified the integer overflow as critical and, uniquely, verified it experimentally by writing and running a Go program in .scratch/ that demonstrated the overflow and resulting panic. This experimental verification elevated the finding from theoretical to proven. Claude also spotted the "1 bytes" grammar issue (S2), which no other agent raised. Traced parseInt in lib.go to confirm the input range, and searched for other strings.Repeat call sites and similar size limits in the codebase. Ran git blame 3 times to confirm regression classification.

Tool call highlights: 14 tool calls. 3 git blame invocations confirming the changed lines. 2 scratch files written and executed to reproduce the overflow experimentally — the most rigorous verification approach of the three agents.

Codex

Codex found the same overflow issue but rated it important and framed it as 32-bit-only, missing that the same overflow occurs on 64-bit platforms at larger values. Codex ran the existing test suite (go test -run TestMultiplyOperatorScenarios) and attempted a 32-bit cross-compile (GOARCH=386), which failed on the macOS host. Checked scripts/xcompile.sh to confirm the project ships a 32-bit binary. Produced a clean, well-structured report in the shortest time. Ran git blame 3 times.

Tool call highlights: 20 tool calls (highest count). Distinctive: tried running tests with GOARCH=386 to reproduce the 32-bit case, and checked xcompile.sh for build targets. Heavy use of sed -n and nl for targeted line reading.

Gemini

Gemini found the overflow and rated it critical, matching Claude's assessment. Gemini also wrote scratch programs to verify the overflow, tested parseInt behaviour, and explored a pre-existing design concern about custom tag handling in repeatString. Raised the constant-extraction suggestion (not included in consolidated findings as YAGNI for a single-use local variable). Did not run git blame, so its classification of C1 as "gap" rather than "regression" was incorrect — the code was introduced by this commit. Modified the actual test file with sed -i to test overflow values, which is unusual in a review worktree.

Tool call highlights: 18 tool calls. No git blame usage (known Gemini limitation). 4 scratch programs written and executed. Used sed -i to mutate the test file in the worktree to test overflow scenarios — creative but atypical for a read-only review.

Summary

Claude Opus 4.6 Codex GPT 5.4 Gemini 3.1 Pro
Duration167 s81 s218 s
Findings1C 0I 2S0C 1I 0S1C 1I 1S
Tool calls14 (Bash 7, Read 4, Grep 3)20 (exec 19, write_stdin 1)18 (shell 9, read 4, write 3, grep 2)
Design observations3 strengths2 strengths1 concern
False positives000
Unique insightsS2 (grammar)boundary testcustom tag concern
Files reviewed2/22/22/2
Coverage misses000
Downgraded1 (S→I)1 (I→C)0
Dropped001 (S1 constant)

Reconciliation: Codex rated the overflow as important (I1); consolidated as critical (C1) because Claude demonstrated the overflow on 64-bit, not just 32-bit. Claude rated the missing overflow test as S1 (suggestion); consolidated as I1 (important) because this test gap directly corresponds to the critical finding. Gemini classified C1 as "gap"; corrected to "regression" — the code was introduced by this commit per git blame. Gemini's S1 (extract constant) dropped — single-use local variable with an inline comment is sufficient.

All three agents independently identified the same core bug. Claude provided the strongest evidence through experimental verification. Codex was fastest and most thorough in tool usage but underrated the severity. Gemini explored the widest surface area, including pre-existing design concerns, but lacked git blame data for accurate classification.


Review Process Notes

No skill improvements or repo context updates warranted. All three agents independently found the same critical issue from the inlined diff alone — the prompt guided them effectively for this change.