Deep Review: 20260405-223050-fix-alias-cycle

Date	2026-04-05 22:30
Repo	mikefarah/yq
Round	1
Author	Jan Dubois <jan@jandubois.com>
Branch	`fix-alias-cycle`
Commits	`9b7ddcf3` Fix stack overflow from circular alias in traverse
Reviewers	Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
Verdict	Merge with fixes — the same recursive alias pattern in `traverseArrayIndices` remains unfixed
Wall-clock time	10 min 44 s

Executive Summary ¶

This change replaces the recursive AliasNode case in traverse() with an iterative loop that detects alias cycles via a visited set, preventing a stack overflow from circular aliases. The fix is correct for traverse(), but a sibling function traverseArrayIndices() in the same file contains an identical recursive alias pattern that remains vulnerable — reachable through .[], .[0], and every operator built on splat().

Critical Issues ¶

None.

Important Issues ¶

I1. traverseArrayIndices retains the same recursive alias pattern Claude Codex (important, gap) ¶

traverseArrayIndices() follows aliases recursively at lines 148-150 with no cycle detection, identical to the pre-fix traverse() code. A circular alias node reached through array indexing (.[], .[0]) or any splat() caller will still crash with a stack overflow.

pkg/yqlib/operator_traverse_path.go

	switch matchingNode.Kind {
	case AliasNode:
		matchingNode = matchingNode.Alias
		return traverseArrayIndices(context, matchingNode, indicesToTraverse, prefs)
	case SequenceNode:
		return traverseArrayWithIndices(matchingNode, indicesToTraverse, prefs)
	case MappingNode:
		return traverseMapWithIndices(context, matchingNode, indicesToTraverse, prefs)
	}
	log.Debugf("OperatorArrayTraverse skipping %v as its a %v", matchingNode, matchingNode.Tag)
	return list.New(), nil
}

Codex traced the exposure through: traverseNodesWithArrayIndices (line 121), TRAVERSE_ARRAY dispatch in operation.go:176, lexer injection for array syntax in lexer.go:175-180, and splat() callers in operator_recursive_descent.go:34, operator_collect_object.go:83, operator_compare.go:140, operator_entries.go:140, operator_first.go:19, operator_map.go:13, operator_map.go:40, and operator_filter.go:14. git blame confirms these lines predate this change.

Fix: extract a shared helper that both functions call, eliminating the duplication:

+func resolveAliasChain(node *CandidateNode) (*CandidateNode, error) {
+	if node.Kind != AliasNode {
+		return node, nil
+	}
+	visited := map[*CandidateNode]bool{}
+	for node.Kind == AliasNode {
+		if visited[node] {
+			return nil, fmt.Errorf("alias cycle detected")
+		}
+		visited[node] = true
+		node = node.Alias
+	}
+	return node, nil
+}

Use resolveAliasChain at the top of both traverse() and traverseArrayIndices(), and remove the case AliasNode branch from traverseArrayIndices.

Suggestions ¶

S1. Unnecessary map allocation on the hot path Claude (suggestion, regression) ¶

traverse() runs for every node traversal. The map at line 43 allocates even when matchingNode.Kind is not AliasNode and the loop body never executes.

pkg/yqlib/operator_traverse_path.go

func traverse(context Context, matchingNode *CandidateNode, operation *Operation) (*list.List, error) {
	log.Debugf("Traversing %v", NodeToString(matchingNode))

	// Follow alias chains iteratively with cycle detection.
	visited := map[*CandidateNode]bool{}
	for matchingNode.Kind == AliasNode {
		if visited[matchingNode] {
			return nil, fmt.Errorf("alias cycle detected")
		}
		visited[matchingNode] = true
		log.Debug("its an alias!")
		matchingNode = matchingNode.Alias
	}

Fix: guard the allocation with an if check:

-	visited := map[*CandidateNode]bool{}
-	for matchingNode.Kind == AliasNode {
+	if matchingNode.Kind == AliasNode {
+		visited := map[*CandidateNode]bool{}
+		for matchingNode.Kind == AliasNode {
+			if visited[matchingNode] {
+				return nil, fmt.Errorf("alias cycle detected")
+			}
+			visited[matchingNode] = true
+			log.Debug("its an alias!")
+			matchingNode = matchingNode.Alias
+		}
 	}

This is moot if the shared resolveAliasChain helper from I1 is adopted, since the guard belongs naturally in that function.

S2. Test covers only a self-referencing cycle Claude (suggestion, gap) ¶

The test constructs a single-node cycle (A -> A) at line 693. A two-node cycle (A -> B -> A) would exercise the visited-set tracking across multiple distinct nodes, guarding against future regressions in the map logic.

pkg/yqlib/operator_traverse_path_test.go

// Regression test for https://issues.oss-fuzz.com/issues/390467412
// A circular alias (alias pointing back to itself) must not cause a
// stack overflow. traverse() should detect the cycle and return an error.
func TestTraverseAliasCycle(t *testing.T) {
	aliasNode := &CandidateNode{
		Kind: AliasNode,
	}
	aliasNode.Alias = aliasNode // circular

	op := &Operation{
		OperationType: traversePathOpType,
		Value:         "key",
		StringValue:   "key",
		Preferences:   traversePreferences{},
	}
	_, err := traverse(Context{}, aliasNode, op)
	if err == nil {
		t.Fatal("expected error for alias cycle, got nil")
	}
	if err.Error() != "alias cycle detected" {
		t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
	}
}

Fix: add a second test case with a two-node cycle:

func TestTraverseAliasCycleChain(t *testing.T) {
	nodeA := &CandidateNode{Kind: AliasNode}
	nodeB := &CandidateNode{Kind: AliasNode}
	nodeA.Alias = nodeB
	nodeB.Alias = nodeA // A -> B -> A

	op := &Operation{
		OperationType: traversePathOpType,
		Value:         "key",
		StringValue:   "key",
		Preferences:   traversePreferences{},
	}
	_, err := traverse(Context{}, nodeA, op)
	if err == nil {
		t.Fatal("expected error for alias cycle chain, got nil")
	}
	if err.Error() != "alias cycle detected" {
		t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
	}
}

Design Observations ¶

Concerns

(in-scope) Alias dereferencing is duplicated across traversal helpers — traverse() (lines 42-50), traverseArrayIndices() (lines 148-150), and traverseMergeAnchor() (lines 340-355). A shared resolveAliasChain helper would centralise cycle detection so future alias-handling changes need not audit every dereferencing site. This would have prevented the partial fix. Claude Codex

Strengths

Replacing recursion with an iterative loop and visited set is the right approach: bounded stack usage, O(N) memory for N-node chains, and a clean error return instead of a crash. Claude Codex
Placing alias resolution before the kind-based switch cleanly separates concerns: resolve first, then dispatch on the resolved kind. Claude

Testing Assessment ¶

The regression test directly constructs a circular alias and verifies the error — the right approach for crash prevention. Untested scenarios, ranked by risk:

Circular alias through traverseArrayIndices (the gap in I1) — same crash class, different code path, no test coverage.
Operators built on splat() with circular aliases — map, filter, first, to_entries, collect_object, compare, and recursive descent are all exposed.
Multi-node alias chain through traverse() (S2) — code handles it correctly, but no test confirms this.
The test calls traverse() directly, bypassing the parser and dispatcher, so it does not prove the public expression surface is safe.

Agent Performance Retro ¶

Claude

Claude read both changed files, grepped for AliasNode and .Alias across the codebase, ran git blame on the flagged lines, and executed the test suite to verify the tests pass. It identified the traverseArrayIndices gap (I1), the hot-path map allocation (S1), and the missing multi-node cycle test (S2). All three findings survived consolidation. Claude also read encoder_properties.go and encoder_shellvariables.go to check alias handling in encoders — thorough context exploration.

Codex

Codex performed the most thorough call-site tracing, following traverseArrayIndices through traverseNodesWithArrayIndices, the TRAVERSE_ARRAY operator dispatch, the lexer, and all splat() callers across 10+ operator files. It ran git blame three times and used rg extensively (7 searches). Its I1 finding was broader than Claude's, enumerating every exposed operator. It produced no suggestions beyond the shared finding.

Gemini

Gemini hit a TerminalQuotaError (429, QUOTA_EXHAUSTED) on first API call and produced no output. Dropped from the review.

Tool call highlights

Claude ran git blame twice and executed go test — the only agent to verify tests pass. It used 3 grep searches to find all alias-related patterns.
Codex ran 36 tool calls (all exec_command), including 7 rg searches and 3 git blame calls. The high call count reflects thorough call-site tracing rather than thrashing — each search targeted a different pattern.
Gemini made no tool calls due to quota exhaustion.

Summary

	Claude Opus 4.6	Codex GPT 5.4	Gemini 3.1 Pro
Duration	165 s	195 s	115 s (failed)
Findings	0C 1I 2S	0C 1I 0S	—
Tool calls	21 (Bash 10, Read 8, Grep 3)	36 (exec_command 36)	0
Design observations	2	2	—
False positives	0	0	—
Unique insights	S1, S2	splat() caller enumeration	—
Files reviewed	2/2	2/2	—
Coverage misses	0	0	—

Reconciliation: No severity changes during consolidation. Both agents rated the traverseArrayIndices gap as important; the consolidated I1 merges their findings (Claude identified the gap, Codex added the full call-site trace). Claude's S1 and S2 passed through as-is. No false positives, no downgrades, no drops.

Review Process Notes ¶

Repo context updates

yq's alias dereferencing occurs in multiple functions across operator_traverse_path.go (at least traverse, traverseArrayIndices, and merge handling). Future reviews of alias-related changes should verify that all dereferencing sites are handled consistently.