Deep Review: 20260405-223050-fix-alias-cycle

Date2026-04-05 22:30
Repomikefarah/yq
Round1
AuthorJan Dubois <jan@jandubois.com>
Branchfix-alias-cycle
Commits9b7ddcf3 Fix stack overflow from circular alias in traverse
ReviewersClaude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro
VerdictMerge with fixes — the same recursive alias pattern in traverseArrayIndices remains unfixed
Wall-clock time10 min 44 s

Executive Summary

This change replaces the recursive AliasNode case in traverse() with an iterative loop that detects alias cycles via a visited set, preventing a stack overflow from circular aliases. The fix is correct for traverse(), but a sibling function traverseArrayIndices() in the same file contains an identical recursive alias pattern that remains vulnerable — reachable through .[], .[0], and every operator built on splat().


Critical Issues

None.


Important Issues

I1. traverseArrayIndices retains the same recursive alias pattern Claude Codex (important, gap)

traverseArrayIndices() follows aliases recursively at lines 148-150 with no cycle detection, identical to the pre-fix traverse() code. A circular alias node reached through array indexing (.[], .[0]) or any splat() caller will still crash with a stack overflow.

	switch matchingNode.Kind {
	case AliasNode:
		matchingNode = matchingNode.Alias
		return traverseArrayIndices(context, matchingNode, indicesToTraverse, prefs)
	case SequenceNode:
		return traverseArrayWithIndices(matchingNode, indicesToTraverse, prefs)
	case MappingNode:
		return traverseMapWithIndices(context, matchingNode, indicesToTraverse, prefs)
	}
	log.Debugf("OperatorArrayTraverse skipping %v as its a %v", matchingNode, matchingNode.Tag)
	return list.New(), nil
}

Codex traced the exposure through: traverseNodesWithArrayIndices (line 121), TRAVERSE_ARRAY dispatch in operation.go:176, lexer injection for array syntax in lexer.go:175-180, and splat() callers in operator_recursive_descent.go:34, operator_collect_object.go:83, operator_compare.go:140, operator_entries.go:140, operator_first.go:19, operator_map.go:13, operator_map.go:40, and operator_filter.go:14. git blame confirms these lines predate this change.

Fix: extract a shared helper that both functions call, eliminating the duplication:

+func resolveAliasChain(node *CandidateNode) (*CandidateNode, error) {
+	if node.Kind != AliasNode {
+		return node, nil
+	}
+	visited := map[*CandidateNode]bool{}
+	for node.Kind == AliasNode {
+		if visited[node] {
+			return nil, fmt.Errorf("alias cycle detected")
+		}
+		visited[node] = true
+		node = node.Alias
+	}
+	return node, nil
+}

Use resolveAliasChain at the top of both traverse() and traverseArrayIndices(), and remove the case AliasNode branch from traverseArrayIndices.


Suggestions

S1. Unnecessary map allocation on the hot path Claude (suggestion, regression)

traverse() runs for every node traversal. The map at line 43 allocates even when matchingNode.Kind is not AliasNode and the loop body never executes.

func traverse(context Context, matchingNode *CandidateNode, operation *Operation) (*list.List, error) {
	log.Debugf("Traversing %v", NodeToString(matchingNode))

	// Follow alias chains iteratively with cycle detection.
	visited := map[*CandidateNode]bool{}
	for matchingNode.Kind == AliasNode {
		if visited[matchingNode] {
			return nil, fmt.Errorf("alias cycle detected")
		}
		visited[matchingNode] = true
		log.Debug("its an alias!")
		matchingNode = matchingNode.Alias
	}

Fix: guard the allocation with an if check:

-	visited := map[*CandidateNode]bool{}
-	for matchingNode.Kind == AliasNode {
+	if matchingNode.Kind == AliasNode {
+		visited := map[*CandidateNode]bool{}
+		for matchingNode.Kind == AliasNode {
+			if visited[matchingNode] {
+				return nil, fmt.Errorf("alias cycle detected")
+			}
+			visited[matchingNode] = true
+			log.Debug("its an alias!")
+			matchingNode = matchingNode.Alias
+		}
 	}

This is moot if the shared resolveAliasChain helper from I1 is adopted, since the guard belongs naturally in that function.

S2. Test covers only a self-referencing cycle Claude (suggestion, gap)

The test constructs a single-node cycle (A -> A) at line 693. A two-node cycle (A -> B -> A) would exercise the visited-set tracking across multiple distinct nodes, guarding against future regressions in the map logic.

// Regression test for https://issues.oss-fuzz.com/issues/390467412
// A circular alias (alias pointing back to itself) must not cause a
// stack overflow. traverse() should detect the cycle and return an error.
func TestTraverseAliasCycle(t *testing.T) {
	aliasNode := &CandidateNode{
		Kind: AliasNode,
	}
	aliasNode.Alias = aliasNode // circular

	op := &Operation{
		OperationType: traversePathOpType,
		Value:         "key",
		StringValue:   "key",
		Preferences:   traversePreferences{},
	}
	_, err := traverse(Context{}, aliasNode, op)
	if err == nil {
		t.Fatal("expected error for alias cycle, got nil")
	}
	if err.Error() != "alias cycle detected" {
		t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
	}
}

Fix: add a second test case with a two-node cycle:

func TestTraverseAliasCycleChain(t *testing.T) {
	nodeA := &CandidateNode{Kind: AliasNode}
	nodeB := &CandidateNode{Kind: AliasNode}
	nodeA.Alias = nodeB
	nodeB.Alias = nodeA // A -> B -> A

	op := &Operation{
		OperationType: traversePathOpType,
		Value:         "key",
		StringValue:   "key",
		Preferences:   traversePreferences{},
	}
	_, err := traverse(Context{}, nodeA, op)
	if err == nil {
		t.Fatal("expected error for alias cycle chain, got nil")
	}
	if err.Error() != "alias cycle detected" {
		t.Fatalf("expected 'alias cycle detected', got %q", err.Error())
	}
}

Design Observations

Concerns

  1. (in-scope) Alias dereferencing is duplicated across traversal helpers — traverse() (lines 42-50), traverseArrayIndices() (lines 148-150), and traverseMergeAnchor() (lines 340-355). A shared resolveAliasChain helper would centralise cycle detection so future alias-handling changes need not audit every dereferencing site. This would have prevented the partial fix. Claude Codex

Strengths

  1. Replacing recursion with an iterative loop and visited set is the right approach: bounded stack usage, O(N) memory for N-node chains, and a clean error return instead of a crash. Claude Codex
  2. Placing alias resolution before the kind-based switch cleanly separates concerns: resolve first, then dispatch on the resolved kind. Claude

Testing Assessment

The regression test directly constructs a circular alias and verifies the error — the right approach for crash prevention. Untested scenarios, ranked by risk:

  1. Circular alias through traverseArrayIndices (the gap in I1) — same crash class, different code path, no test coverage.
  2. Operators built on splat() with circular aliases — map, filter, first, to_entries, collect_object, compare, and recursive descent are all exposed.
  3. Multi-node alias chain through traverse() (S2) — code handles it correctly, but no test confirms this.
  4. The test calls traverse() directly, bypassing the parser and dispatcher, so it does not prove the public expression surface is safe.

Agent Performance Retro

Claude

Claude read both changed files, grepped for AliasNode and .Alias across the codebase, ran git blame on the flagged lines, and executed the test suite to verify the tests pass. It identified the traverseArrayIndices gap (I1), the hot-path map allocation (S1), and the missing multi-node cycle test (S2). All three findings survived consolidation. Claude also read encoder_properties.go and encoder_shellvariables.go to check alias handling in encoders — thorough context exploration.

Codex

Codex performed the most thorough call-site tracing, following traverseArrayIndices through traverseNodesWithArrayIndices, the TRAVERSE_ARRAY operator dispatch, the lexer, and all splat() callers across 10+ operator files. It ran git blame three times and used rg extensively (7 searches). Its I1 finding was broader than Claude's, enumerating every exposed operator. It produced no suggestions beyond the shared finding.

Gemini

Gemini hit a TerminalQuotaError (429, QUOTA_EXHAUSTED) on first API call and produced no output. Dropped from the review.

Tool call highlights

Summary

Claude Opus 4.6 Codex GPT 5.4 Gemini 3.1 Pro
Duration165 s195 s115 s (failed)
Findings0C 1I 2S0C 1I 0S
Tool calls21 (Bash 10, Read 8, Grep 3)36 (exec_command 36)0
Design observations22
False positives00
Unique insightsS1, S2splat() caller enumeration
Files reviewed2/22/2
Coverage misses00

Reconciliation: No severity changes during consolidation. Both agents rated the traverseArrayIndices gap as important; the consolidated I1 merges their findings (Claude identified the gap, Codex added the full call-site trace). Claude's S1 and S2 passed through as-is. No false positives, no downgrades, no drops.


Review Process Notes

Repo context updates