Deep Review: 20260405-223050-fix-alias-cycle
| Date | 2026-04-05 22:30 |
| Repo | mikefarah/yq |
| Round | 1 |
| Author | Jan Dubois <jan@jandubois.com> |
| Branch | fix-alias-cycle |
| Commits | 9b7ddcf3 Fix stack overflow from circular alias in traverse |
| Reviewers | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| Verdict | Merge with fixes — the same recursive alias pattern in traverseArrayIndices remains unfixed |
| Wall-clock time | 10 min 44 s |
Executive Summary ¶
This change replaces the recursive AliasNode case in traverse() with an iterative loop that detects alias cycles via a visited set, preventing a stack overflow from circular aliases. The fix is correct for traverse(), but a sibling function traverseArrayIndices() in the same file contains an identical recursive alias pattern that remains vulnerable — reachable through .[], .[0], and every operator built on splat().
Critical Issues ¶
None.
Important Issues ¶
traverseArrayIndices() follows aliases recursively at lines 148-150 with no cycle detection, identical to the pre-fix traverse() code. A circular alias node reached through array indexing (.[], .[0]) or any splat() caller will still crash with a stack overflow.
switch matchingNode.Kind {
case AliasNode:
matchingNode = matchingNode.Alias
return traverseArrayIndices(context, matchingNode, indicesToTraverse, prefs)
case SequenceNode:
return traverseArrayWithIndices(matchingNode, indicesToTraverse, prefs)
case MappingNode:
return traverseMapWithIndices(context, matchingNode, indicesToTraverse, prefs)
}
log.Debugf("OperatorArrayTraverse skipping %v as its a %v", matchingNode, matchingNode.Tag)
return list.New(), nil
}
Codex traced the exposure through: traverseNodesWithArrayIndices (line 121), TRAVERSE_ARRAY dispatch in operation.go:176, lexer injection for array syntax in lexer.go:175-180, and splat() callers in operator_recursive_descent.go:34, operator_collect_object.go:83, operator_compare.go:140, operator_entries.go:140, operator_first.go:19, operator_map.go:13, operator_map.go:40, and operator_filter.go:14. git blame confirms these lines predate this change.
Fix: extract a shared helper that both functions call, eliminating the duplication:
+func resolveAliasChain(node *CandidateNode) (*CandidateNode, error) {
+ if node.Kind != AliasNode {
+ return node, nil
+ }
+ visited := map[*CandidateNode]bool{}
+ for node.Kind == AliasNode {
+ if visited[node] {
+ return nil, fmt.Errorf("alias cycle detected")
+ }
+ visited[node] = true
+ node = node.Alias
+ }
+ return node, nil
+}
Use resolveAliasChain at the top of both traverse() and traverseArrayIndices(), and remove the case AliasNode branch from traverseArrayIndices.
Suggestions ¶
traverse() runs for every node traversal. The map at line 43 allocates even when matchingNode.Kind is not AliasNode and the loop body never executes.
func traverse(context Context, matchingNode *CandidateNode, operation *Operation) (*list.List, error) {
log.Debugf("Traversing %v", NodeToString(matchingNode))
// Follow alias chains iteratively with cycle detection.
visited := map[*CandidateNode]bool{}
for matchingNode.Kind == AliasNode {
if visited[matchingNode] {
return nil, fmt.Errorf("alias cycle detected")
}
visited[matchingNode] = true
log.Debug("its an alias!")
matchingNode = matchingNode.Alias
}
Fix: guard the allocation with an if check:
- visited := map[*CandidateNode]bool{}
- for matchingNode.Kind == AliasNode {
+ if matchingNode.Kind == AliasNode {
+ visited := map[*CandidateNode]bool{}
+ for matchingNode.Kind == AliasNode {
+ if visited[matchingNode] {
+ return nil, fmt.Errorf("alias cycle detected")
+ }
+ visited[matchingNode] = true
+ log.Debug("its an alias!")
+ matchingNode = matchingNode.Alias
+ }
}
This is moot if the shared resolveAliasChain helper from I1 is adopted, since the guard belongs naturally in that function.
The test constructs a single-node cycle (A -> A) at line 693. A two-node cycle (A -> B -> A) would exercise the visited-set tracking across multiple distinct nodes, guarding against future regressions in the map logic.
// Regression test for https://issues.oss-fuzz.com/issues/390467412
// A circular alias (alias pointing back to itself) must not cause a
// stack overflow. traverse() should detect the cycle and return an error.
func TestTraverseAliasCycle(t *testing.T) {
aliasNode := &CandidateNode{
Kind: AliasNode,
}
aliasNode.Alias = aliasNode // circular
op := &Operation{
OperationType: traversePathOpType,
Value: "key",
StringValue: "key",
Preferences: traversePreferences{},
}
_, err := traverse(Context{}, aliasNode, op)
if err == nil {
t.Fatal("expected error for alias cycle, got nil")
}
if err.Error() != "alias cycle detected" {
t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
}
}
Fix: add a second test case with a two-node cycle:
func TestTraverseAliasCycleChain(t *testing.T) {
nodeA := &CandidateNode{Kind: AliasNode}
nodeB := &CandidateNode{Kind: AliasNode}
nodeA.Alias = nodeB
nodeB.Alias = nodeA // A -> B -> A
op := &Operation{
OperationType: traversePathOpType,
Value: "key",
StringValue: "key",
Preferences: traversePreferences{},
}
_, err := traverse(Context{}, nodeA, op)
if err == nil {
t.Fatal("expected error for alias cycle chain, got nil")
}
if err.Error() != "alias cycle detected" {
t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
}
}
Design Observations ¶
Concerns
- (in-scope) Alias dereferencing is duplicated across traversal helpers —
traverse()(lines 42-50),traverseArrayIndices()(lines 148-150), andtraverseMergeAnchor()(lines 340-355). A sharedresolveAliasChainhelper would centralise cycle detection so future alias-handling changes need not audit every dereferencing site. This would have prevented the partial fix. Claude Codex
Strengths
- Replacing recursion with an iterative loop and visited set is the right approach: bounded stack usage, O(N) memory for N-node chains, and a clean error return instead of a crash. Claude Codex
- Placing alias resolution before the kind-based switch cleanly separates concerns: resolve first, then dispatch on the resolved kind. Claude
Testing Assessment ¶
The regression test directly constructs a circular alias and verifies the error — the right approach for crash prevention. Untested scenarios, ranked by risk:
- Circular alias through
traverseArrayIndices(the gap in I1) — same crash class, different code path, no test coverage. - Operators built on
splat()with circular aliases —map,filter,first,to_entries,collect_object,compare, and recursive descent are all exposed. - Multi-node alias chain through
traverse()(S2) — code handles it correctly, but no test confirms this. - The test calls
traverse()directly, bypassing the parser and dispatcher, so it does not prove the public expression surface is safe.
Agent Performance Retro ¶
Claude
Claude read both changed files, grepped for AliasNode and .Alias across the codebase, ran git blame on the flagged lines, and executed the test suite to verify the tests pass. It identified the traverseArrayIndices gap (I1), the hot-path map allocation (S1), and the missing multi-node cycle test (S2). All three findings survived consolidation. Claude also read encoder_properties.go and encoder_shellvariables.go to check alias handling in encoders — thorough context exploration.
Codex
Codex performed the most thorough call-site tracing, following traverseArrayIndices through traverseNodesWithArrayIndices, the TRAVERSE_ARRAY operator dispatch, the lexer, and all splat() callers across 10+ operator files. It ran git blame three times and used rg extensively (7 searches). Its I1 finding was broader than Claude's, enumerating every exposed operator. It produced no suggestions beyond the shared finding.
Gemini
Gemini hit a TerminalQuotaError (429, QUOTA_EXHAUSTED) on first API call and produced no output. Dropped from the review.
Tool call highlights
- Claude ran
git blametwice and executedgo test— the only agent to verify tests pass. It used 3 grep searches to find all alias-related patterns. - Codex ran 36 tool calls (all
exec_command), including 7rgsearches and 3git blamecalls. The high call count reflects thorough call-site tracing rather than thrashing — each search targeted a different pattern. - Gemini made no tool calls due to quota exhaustion.
Summary
| Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro | |
|---|---|---|---|
| Duration | 165 s | 195 s | 115 s (failed) |
| Findings | 0C 1I 2S | 0C 1I 0S | — |
| Tool calls | 21 (Bash 10, Read 8, Grep 3) | 36 (exec_command 36) | 0 |
| Design observations | 2 | 2 | — |
| False positives | 0 | 0 | — |
| Unique insights | S1, S2 | splat() caller enumeration | — |
| Files reviewed | 2/2 | 2/2 | — |
| Coverage misses | 0 | 0 | — |
Reconciliation: No severity changes during consolidation. Both agents rated the traverseArrayIndices gap as important; the consolidated I1 merges their findings (Claude identified the gap, Codex added the full call-site trace). Claude's S1 and S2 passed through as-is. No false positives, no downgrades, no drops.
Review Process Notes ¶
Repo context updates
- yq's alias dereferencing occurs in multiple functions across
operator_traverse_path.go(at leasttraverse,traverseArrayIndices, and merge handling). Future reviews of alias-related changes should verify that all dereferencing sites are handled consistently.