Deep Review: 20260405-213746-fix-repeat-overflow
| Date | 2026-04-05 21:37 |
| Repo | mikefarah/yq |
| Round | 1 |
| Branch | fix-repeat-overflow |
| Commits | 2c37aa9d Fix panic and OOM in repeatString for large repeat counts |
| Reviewers | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| Verdict | Merge with fixes — integer overflow in the new size guard defeats the protection this commit introduces |
| Wall-clock time | 11 min 51 s |
Executive Summary ¶
This commit replaces a fixed repeat-count ceiling (10 million) in repeatString with a result-size check (10 MiB), addressing two OSS-Fuzz panic reports. The size-based approach is the right design, but the new guard multiplies len(stringNode.Value) * count as native int values. On both 64-bit and 32-bit platforms, a sufficiently large count overflows the product to a negative or small positive number, bypassing the guard and reaching strings.Repeat — the exact panic path this commit exists to eliminate.
Critical Issues ¶
len(stringNode.Value) and count are both int. The parseInt function at lib.go:176 accepts values up to math.MaxInt, so count can reach 263−1 on 64-bit platforms. Multiplying a 2-byte string length by 262 produces 263, which overflows int64 to math.MinInt64 (negative). Since negative values are always less than maxResultLen, the guard evaluates to false, and strings.Repeat panics with "strings: Repeat output length overflow" — the exact failure this commit aims to prevent.
count, err := parseInt(intNode.Value)
if err != nil {
return nil, err
} else if count < 0 {
return nil, fmt.Errorf("cannot repeat string by a negative number (%v)", count)
}
maxResultLen := 10 * 1024 * 1024 // 10 MiB
if count > 0 && len(stringNode.Value)*count > maxResultLen {
return nil, fmt.Errorf("result of repeating string (%v bytes) by %v would exceed %v bytes", len(stringNode.Value), count, maxResultLen)
}
target.Value = strings.Repeat(stringNode.Value, count)
return target, nil
Claude verified this experimentally: strLen=2, count=4611686018427387904 → product=-9223372036854775808 → guard bypassed → panic.
The same overflow occurs on 32-bit platforms (the project ships yq_windows_386.exe via scripts/xcompile.sh) at much smaller values. (critical, regression)
Fix: divide instead of multiply — division by a positive count cannot overflow:
maxResultLen := 10 * 1024 * 1024 // 10 MiB
-if count > 0 && len(stringNode.Value)*count > maxResultLen {
+if count > 0 && len(stringNode.Value) > maxResultLen/count {
return nil, fmt.Errorf("result of repeating string (%v bytes) by %v would exceed %v bytes", len(stringNode.Value), count, maxResultLen)
}
When count is enormous, maxResultLen/count truncates to 0, so any non-empty string triggers the guard. When count is 0, the left side of && short-circuits. This pattern is standard for overflow-safe size checks.
Important Issues ¶
All three regression tests use counts (99,999,999 and 100,000,001) far too small to trigger integer overflow on 64-bit. After fixing C1, add a test with a count large enough to confirm the division-based guard catches it:
{
// Verify the size guard is overflow-safe.
skipDoc: true,
expression: `"ab" * 4611686018427387904`,
expectedError: "result of repeating string (2 bytes) by 4611686018427387904 would exceed 10485760 bytes",
},
This test would have caught C1 on the current code. (important, gap)
Suggestions ¶
No test verifies that a repeated string whose output is exactly 10 * 1024 * 1024 bytes succeeds while one byte over is rejected. This boundary test would confirm the guard uses > (not >=) and that the result is produced. (suggestion, gap)
The format string "(%v bytes)" at line 161 produces "(1 bytes)" for single-byte strings. Since this message is primarily diagnostic, this is cosmetic. One fix: use "%v-byte" which reads correctly for all values. (suggestion, regression)
Design Observations ¶
Strengths
- Replacing the count-based limit with a result-size limit aligns the guard with the actual allocation risk. A 1-byte string repeated 10 million times (10 MB) and a 10-byte string repeated 1 million times (10 MB) now receive identical treatment. Claude Codex Gemini
- The regression tests link to specific OSS-Fuzz issue URLs, making future failures easy to trace back to the original reports. Claude Codex
- The old error message claimed a limit of "100 million" but the code checked
> 10000000(10 million) — a 10x discrepancy. This change eliminates that confusion. Claude
Testing Assessment ¶
Existing tests cover: negative count, count exceeding the limit for both short and long strings, and the two specific OSS-Fuzz inputs. Missing scenarios, ranked by risk:
Documentation Assessment ¶
No documentation updates needed. The change is internal, and the test descriptions document the behaviour.
Commit Structure ¶
Single commit for a single fix — clean and appropriate. The commit message accurately describes the change.
Agent Performance Retro ¶
Claude
Claude identified the integer overflow as critical and, uniquely, verified it experimentally by writing and running a Go program in .scratch/ that demonstrated the overflow and resulting panic. This experimental verification elevated the finding from theoretical to proven. Claude also spotted the "1 bytes" grammar issue (S2), which no other agent raised. Traced parseInt in lib.go to confirm the input range, and searched for other strings.Repeat call sites and similar size limits in the codebase. Ran git blame 3 times to confirm regression classification.
Tool call highlights: 14 tool calls. 3 git blame invocations confirming the changed lines. 2 scratch files written and executed to reproduce the overflow experimentally — the most rigorous verification approach of the three agents.
Codex
Codex found the same overflow issue but rated it important and framed it as 32-bit-only, missing that the same overflow occurs on 64-bit platforms at larger values. Codex ran the existing test suite (go test -run TestMultiplyOperatorScenarios) and attempted a 32-bit cross-compile (GOARCH=386), which failed on the macOS host. Checked scripts/xcompile.sh to confirm the project ships a 32-bit binary. Produced a clean, well-structured report in the shortest time. Ran git blame 3 times.
Tool call highlights: 20 tool calls (highest count). Distinctive: tried running tests with GOARCH=386 to reproduce the 32-bit case, and checked xcompile.sh for build targets. Heavy use of sed -n and nl for targeted line reading.
Gemini
Gemini found the overflow and rated it critical, matching Claude's assessment. Gemini also wrote scratch programs to verify the overflow, tested parseInt behaviour, and explored a pre-existing design concern about custom tag handling in repeatString. Raised the constant-extraction suggestion (not included in consolidated findings as YAGNI for a single-use local variable). Did not run git blame, so its classification of C1 as "gap" rather than "regression" was incorrect — the code was introduced by this commit. Modified the actual test file with sed -i to test overflow values, which is unusual in a review worktree.
Tool call highlights: 18 tool calls. No git blame usage (known Gemini limitation). 4 scratch programs written and executed. Used sed -i to mutate the test file in the worktree to test overflow scenarios — creative but atypical for a read-only review.
Summary
| Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro | |
|---|---|---|---|
| Duration | 167 s | 81 s | 218 s |
| Findings | 1C 0I 2S | 0C 1I 0S | 1C 1I 1S |
| Tool calls | 14 (Bash 7, Read 4, Grep 3) | 20 (exec 19, write_stdin 1) | 18 (shell 9, read 4, write 3, grep 2) |
| Design observations | 3 strengths | 2 strengths | 1 concern |
| False positives | 0 | 0 | 0 |
| Unique insights | S2 (grammar) | boundary test | custom tag concern |
| Files reviewed | 2/2 | 2/2 | 2/2 |
| Coverage misses | 0 | 0 | 0 |
| Downgraded | 1 (S→I) | 1 (I→C) | 0 |
| Dropped | 0 | 0 | 1 (S1 constant) |
Reconciliation: Codex rated the overflow as important (I1); consolidated as critical (C1) because Claude demonstrated the overflow on 64-bit, not just 32-bit. Claude rated the missing overflow test as S1 (suggestion); consolidated as I1 (important) because this test gap directly corresponds to the critical finding. Gemini classified C1 as "gap"; corrected to "regression" — the code was introduced by this commit per git blame. Gemini's S1 (extract constant) dropped — single-use local variable with an inline comment is sufficient.
All three agents independently identified the same core bug. Claude provided the strongest evidence through experimental verification. Codex was fastest and most thorough in tool usage but underrated the severity. Gemini explored the widest surface area, including pre-existing design concerns, but lacked git blame data for accurate classification.
Review Process Notes ¶
No skill improvements or repo context updates warranted. All three agents independently found the same critical issue from the inlined diff alone — the prompt guided them effectively for this change.