# Deep Review: 20260405-232458-pr-2639

| | |
|---|---|
| **Date** | 2026-04-05 23:24 |
| **Repo** | [mikefarah/yq](https://github.com/mikefarah/yq) |
| **Round** | 1 |
| **Mode** | Thorough (3 passes) |
| **Author** | @copilot-swe-agent (bot) |
| **PR** | [#2639](https://github.com/mikefarah/yq/pull/2639) — Add string slicing support |
| **Branch** | `copilot/add-string-slice-support` |
| **Commits** | `341e2524` Fix array slice out-of-bounds panic with very negative indices<br>`778088d7` Update pkg/yqlib/operator_slice.go<br>`9a9399ad` Fix sliceStringNode signature and fix test descriptions/expressions<br>`6d345ac7` Add string slicing support to yq<br>`26929879` Initial plan |
| **Reviewers** | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| **Verdict** | **Merge with fixes** — fix byte-vs-rune length inconsistency and add missing string edge-case tests |
| **Wall-clock time** | `54 min 32 s` |

---

## Executive Summary

This PR adds rune-based string slicing to yq (`.[0:5]`, `.[4:]`, `.[-5:]` on strings), fixes `traverseArrayOperator` to evaluate the LHS before slicing, and hardens array slicing against very negative indices. The string slicing implementation is correct for the common case, but it exposes a byte-vs-rune inconsistency with existing operators (`length`, `match`) that produces silently wrong results for multibyte Unicode strings. The PR also needs stronger edge-case test coverage and minor structural fixes.

---

## Critical Issues

None.

---

## Important Issues

I1. **Byte-vs-rune length inconsistency produces wrong results for multibyte strings** — `pkg/yqlib/operator_slice.go:20`, `pkg/yqlib/operator_length.go:20` [Claude Opus 4.6, Gemini 3.1 Pro]

```go
// operator_length.go:20 — returns byte count
length = len(candidate.Value)    // "café" → 5

// operator_slice.go:20 — uses rune count
runes := []rune(lhsNode.Value)   // "café" → 4 runes
length := len(runes)
```

The `length` operator returns byte count, but `sliceStringNode` uses rune count. For multibyte strings, `length`-based arithmetic produces wrong slice results. This is not merely masked by clamping — it actively produces incorrect output:

```
$ echo '"café"' | yq '(. | length) as $n | .[($n - 2):]'
é           # WRONG — expected "fé" (last 2 chars)

$ echo '"café"' | yq '.[-2:]'
fé          # correct with negative index
```

The `match` operator's byte offsets at `operator_strings.go:292` have the same issue: feeding `match("latte").offset` into a slice on a multibyte string yields wrong boundaries.

jq is internally consistent: `"café" | length` returns 4 (characters), and slicing is character-based. This PR correctly chose rune-based slicing but did not update the pre-existing byte-based `length` operator. (important, gap)

Fix: Update `lengthOperator` to use `len([]rune(candidate.Value))` for non-null scalars, and update `addMatch` to convert byte offsets to rune offsets via `utf8.RuneCountInString`. Note: changing `length` is a behavioural change for multibyte strings and warrants a changelog entry.

```diff
 // operator_length.go
 case ScalarNode:
     if candidate.Tag == "!!null" {
         length = 0
     } else {
-        length = len(candidate.Value)
+        length = len([]rune(candidate.Value))
     }
```

---

I2. **Inconsistent index-clamping structure between string and array paths** — `pkg/yqlib/operator_slice.go:31-36` vs `89-97` [Claude Opus 4.6, Gemini 3.1 Pro]

```go
// String path (line 31-36) — no explicit clamp to 0
if relativeSecondNumber < 0 {
    relativeSecondNumber = length + secondNumber
} else if relativeSecondNumber > length {
    relativeSecondNumber = length
}
```

The string path uses `else if`, so a very negative `secondNumber` that remains negative after adjustment is never explicitly clamped to 0. It works only because the later `relativeSecondNumber < relativeFirstNumber` guard at line 43 catches it. The array path (lines 89-97) uses separate `if` blocks that handle this directly. (important, regression)

Fix: Align the string path with the array path:

```diff
 relativeSecondNumber := secondNumber
 if relativeSecondNumber < 0 {
     relativeSecondNumber = length + secondNumber
- } else if relativeSecondNumber > length {
+ }
+ if relativeSecondNumber < 0 {
+     relativeSecondNumber = 0
+ } else if relativeSecondNumber > length {
     relativeSecondNumber = length
 }
```

---

I3. **Missing test coverage for string edge cases tested for arrays** — `pkg/yqlib/operator_slice_test.go` [Claude Opus 4.6, Codex GPT 5.4]

The PR added array tests for very negative indices (`.[-100:]`, `.[:-100]`) but no corresponding string tests. Also missing: Unicode strings with implicit end index, empty string slicing, and non-string scalar slicing.

Untested scenarios ranked by risk:
1. Unicode string with implicit end index (`.greeting[2:]` on `"héllo"`) — exercises the byte-vs-rune mismatch
2. Very negative string start (`.country[-100:]`) — tests clamping parity with arrays
3. Very negative string end (`.country[:-100]`) — tests clamping parity with arrays
4. Empty string slicing (`.val[0:5]` on `""`)
5. Non-string scalar slicing (`.value[0:2]` on `!!int`)

(important, gap)

Fix: Add test scenarios with `skipDoc: true` for each of these cases.

---

I4. **Slicing non-string scalars produces invalid tagged output** — `pkg/yqlib/operator_slice.go:74-78` [Claude Opus 4.6] (pass 2)

```go
if lhsNode.Kind == ScalarNode && lhsNode.guessTagFromCustomType() == "!!str" {
    results.PushBack(sliceStringNode(lhsNode, firstNumber, secondNumber))
    continue
}
// ... falls through to array path for !!int, !!bool, !!float ...
```

When `.value[0:2]` targets a `!!int` scalar, the string guard correctly rejects it, but the code falls through to the array path. Since `Content` is nil for scalar nodes, the function returns an empty sequence tagged with the scalar's type (e.g., `!!int []`). (important, gap)

Fix: Add an explicit guard before the array fallthrough:

```diff
 if lhsNode.Kind == ScalarNode && lhsNode.guessTagFromCustomType() == "!!str" {
     results.PushBack(sliceStringNode(lhsNode, firstNumber, secondNumber))
     continue
 }
+if lhsNode.Kind == ScalarNode {
+    return Context{}, fmt.Errorf("cannot slice scalar of type %v, only arrays and strings can be sliced", lhsNode.Tag)
+}
```

---

I5. **Slice index expressions evaluated against LHS instead of original context** — `pkg/yqlib/operator_traverse_path.go:82-86`, `pkg/yqlib/operator_slice.go:64-69` [Gemini 3.1 Pro] (pass 2)

```go
// operator_traverse_path.go — passes lhsContext to sliceArrayOperator
lhsContext, err := d.GetMatchingNodes(context, expressionNode.LHS)
return sliceArrayOperator(d, lhsContext, expressionNode.RHS.RHS)
```

`sliceArrayOperator` uses `context` to evaluate index expressions via `getSliceNumber`. After the PR, `context` is `lhsContext` (the sliced node), not the original document context. For expressions like `.a[0:.max]`, `.max` is now evaluated against `.a` instead of the root — producing `"cannot index array with 'max'"`.

Before the PR, the same expression sliced the *wrong target* (root document instead of `.a`) — so this was never correct either. The PR fixed one half (correct target) without fixing the other (correct index context). (important, gap)

Fix: Pass both the original context (for evaluating indices) and the LHS context (for providing nodes) to `sliceArrayOperator`. One approach: evaluate indices in `traverseArrayOperator` before calling `sliceArrayOperator`, then pass the resolved integers.

---

## Suggestions

S1. **Commit structure needs cleanup before merge** — [Claude Opus 4.6]

The 5 commits include an empty "Initial plan" commit that should be dropped, plus a lint-fix commit and a panic-fix commit that should be squashed into the main feature commit before merging. `778088d7` (Mike Farah's direct edit) can remain separate.

S2. **Lexer `openCollect` change affects standalone `[: expr]`** — `pkg/yqlib/lexer.go:130` [Claude Opus 4.6, Gemini 3.1 Pro] (pass 2)

```go
if index > 0 && (tokens[index-1].TokenType == traverseArrayCollect || tokens[index-1].TokenType == openCollect) {
```

The added `openCollect` condition correctly enables `.field[:N]` but also fires on standalone `[` (array literal start). `[: "b"]` now silently produces `[{0: b}]` instead of an error. Unlikely to affect real users, but broadens accepted syntax unintentionally. (suggestion, gap)

Fix: Tighten the condition to require that `openCollect` follows a traversal token.

S3. **Doc header "Splice" misleading for strings** — `pkg/yqlib/doc/operators/headers/slice-array.md:1` [Claude Opus 4.6] (pass 2)

The header was updated from "Slice/Splice Array" to "Slice/Splice Array or String". "Splice" implies in-place mutation, which is not supported for strings. Consider "Slice Array or String" instead.

S4. **Documentation uses only ASCII examples** — `pkg/yqlib/doc/operators/slice-array.md` [Claude Opus 4.6] (pass 3)

All four string slicing doc sections use `country: Australia` (ASCII-only). Since the feature's distinguishing value is rune-based Unicode handling, one documented example with a multibyte string would demonstrate this.

---

## Design Observations

**Concerns**

- The string/array type dispatch at `operator_slice.go:74` uses a "check-and-continue" pattern that leaves all non-`!!str` scalar types falling silently into array logic. Every new scalar type yq supports in the future will produce invalid output. A guard-and-error before the array path (I4) would make the function fail-safe by default. [Claude Opus 4.6] (in-scope)

- The `length` operator returns byte count for strings while the new slicing uses rune count. This semantic split means users cannot predict which convention a given string operator uses. Fixing `length` and `match` offsets to use runes as part of this PR would make string operations coherent. Deferring the fix means every new string operator must decide "bytes or runes?" independently. [Claude Opus 4.6, Gemini 3.1 Pro] (in-scope)

- Duplicated index-normalization logic between `sliceStringNode` and the array path in `sliceArrayOperator` invites divergence. A shared `normalizeSliceIndices(first, second, length int) (int, int)` helper would eliminate the inconsistency. [Claude Opus 4.6] (in-scope)
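A minimal sketch of such a helper (the name and signature are this review's suggestion, not code from the PR), mirroring the array path's clamp-to-bounds structure:

```go
package main

import "fmt"

// normalizeSliceIndices resolves negative indices against length,
// clamps both indices to [0, length], and guarantees first <= second,
// so callers can slice runes[first:second] or Content[first:second] safely.
func normalizeSliceIndices(first, second, length int) (int, int) {
	if first < 0 {
		first = length + first
	}
	if first < 0 {
		first = 0
	} else if first > length {
		first = length
	}
	if second < 0 {
		second = length + second
	}
	if second < 0 {
		second = 0
	} else if second > length {
		second = length
	}
	if second < first {
		second = first
	}
	return first, second
}

func main() {
	fmt.Println(normalizeSliceIndices(-100, 100, 9)) // 0 9: both ends clamped
	fmt.Println(normalizeSliceIndices(0, -100, 9))   // 0 0: very negative end clamps to 0
	fmt.Println(normalizeSliceIndices(-2, 4, 4))     // 2 4: negative start resolved against length
}
```

Both `sliceStringNode` and the array path would then reduce to a call to this helper plus the type-specific slicing.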

**Strengths**

- Using `[]rune` for string slicing correctly handles Unicode. This aligns with jq's character-based string indexing.
- The `traverseArrayOperator` fix (evaluating LHS before calling `sliceArrayOperator`) solves the root cause of `.field[N:M]` operating on the wrong node.
- The lexer change (adding `openCollect` to the implicit-0 check) is well-scoped and necessary for `.field[:N]` syntax.
- `CreateReplacement` preserves the original scalar's tag and style, avoiding quoting/formatting surprises.

---

## Testing Assessment

Test coverage for core string slicing (both ends, negative indices, out-of-bounds, Unicode, full-string `[:]`) is solid. Gaps ranked by risk:

1. Unicode string with implicit end index (`.greeting[2:]`) — exercises byte-vs-rune mismatch path
2. Non-string scalar slicing (`.value[0:2]` on `!!int`) — produces invalid output (I4)
3. Very negative string indices (`.country[-100:]`, `.country[:-100]`) — tests clamping parity
4. Empty string slicing — boundary condition for zero-length rune slices
5. Expression-based slice bounds (`.a[0:.max]`) — exercises the context evaluation issue (I5)
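The highest-risk gap (1) can be sanity-checked outside yq. A standalone Go sketch of how a byte-based implicit end index interacts with the rune clamp, assuming (per I1) that the implicit end comes from the current byte-based `length` operator:

```go
package main

import "fmt"

func main() {
	s := "héllo" // 5 runes, 6 bytes ('é' is 2 bytes in UTF-8)
	runes := []rune(s)

	end := len(s) // implicit end index as produced by the byte-based length operator: 6
	if end > len(runes) {
		end = len(runes) // the clamp in sliceStringNode is load-bearing here
	}
	fmt.Println(string(runes[2:end])) // "llo": correct only because of the clamp
}
```

A test for `.greeting[2:]` on `héllo` would pin down this behaviour and document that the clamp cannot be removed without first fixing I1.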

---

## Documentation Assessment

The generated `slice-array.md` documentation correctly demonstrates string slicing examples. The updated header "Slice/Splice Array or String" is accurate for slicing, but the "Splice" part is misleading for strings (S3). All examples use ASCII-only strings (S4).

---

## Commit Structure

- `26929879 Initial plan` — empty commit with no file changes; should be dropped.
- `9a9399ad` — fixes lint error and test mistakes from the main feature commit; should be squashed into `6d345ac7`.
- `341e2524` — fixes panic from the main feature commit; should be squashed into `6d345ac7`.
- `778088d7` — Mike Farah's direct edit (tag/style preservation); reasonable as a separate commit.

---

## Agent Performance Retro

### [Claude]

Claude produced the most findings across all three passes and demonstrated the deepest analysis. In pass 1, it identified the byte-vs-rune length mismatch (I1), the inconsistent clamping logic (I2), and the test coverage gaps (I3). In pass 2, it found the non-string scalar fallthrough (I4) and the lexer side-effect (S2). In pass 3, it deepened I1 with concrete reproduction cases showing that the mismatch produces wrong output (not just a theoretical concern), and added documentation suggestions (S4).

Claude ran `git blame` 4 times in pass 1 to verify regression classification. It read 10 files and used Grep 7 times to trace call sites, demonstrating thorough cross-file analysis.

**Follow-up pass performance:** Pass 2 produced 1 important + 2 suggestions (all new). Pass 3 produced 1 important (restated with stronger evidence) + 2 suggestions (new). Pass 3's key contribution was proving I1 causes real data corruption, not just a theoretical concern.

### [Codex]

Codex found no issues at any severity level. It reviewed all 7 files and concluded "no correctness regressions in the shipped changes." Its testing assessment correctly noted the Unicode implicit-end gap and expression-valued bounds gap, but it did not promote these to findings. Codex ran 40 tool calls but relied heavily on `nl` (12 calls) and `printf` (10) for file reading rather than targeted analysis. It ran `git blame` only once.

**Coverage miss:** Codex marked `operator_slice.go` as "Reviewed, no issues" — but Claude and Gemini both found important issues there (I1, I2, I4). Dropped after pass 1 (0 findings).

### [Gemini]

Gemini found the byte-vs-rune length mismatch (I1, shared with Claude) and the clamping inconsistency (I2, shared with Claude at different severity) in pass 1. In pass 2, it identified the context evaluation issue (I5) and independently found the lexer side-effect (S2, shared with Claude). Pass 3 explored operator interactions (assignment, deletion, maps, null indices) but these were pre-existing gaps outside the PR scope.

Gemini never ran `git blame` (known limitation due to quota constraints). It made 37 tool calls in pass 1, focusing on `grep_search` (9) and shell commands (23).

**Follow-up pass performance:** Pass 2 produced 1 critical (downgraded to I5) + 1 important (deduplicated to S2). Pass 3 produced 4 importants, but all were dropped during consolidation: one as a duplicate and three as pre-existing gaps outside the PR scope.

### Tool call highlights

- **Claude** ran `git blame` consistently (4 times P1, 3 times P2, 5 times P3) — the only agent to systematically verify regression classification.
- **Codex** never searched code with grep/rg — it relied entirely on sequential file reading, which may explain why it missed cross-file issues like the `length` operator inconsistency.
- **Gemini** never ran `git blame` (known quota limitation). In pass 3, it used 72 tool calls — the highest of any agent/pass — but most findings were pre-existing gaps.

### Summary

| | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration (pass 1) | 462s | 217s | 470s |
| Findings (pass 1) | 0C 3I 1S | 0C 0I 0S | 0C 1I 1S |
| Tool calls (pass 1) | 24 (Read 10, Grep 7) | 40 (exec 33, write 7) | 37 (shell 23, grep 9) |
| Duration (pass 2) | 781s | — | 484s |
| Findings (pass 2) | 0C 1I 2S | — | 0C 1I 0S + 1 dup |
| Tool calls (pass 2) | 58 | — | 49 |
| Duration (pass 3) | 895s | — | 623s |
| Findings (pass 3) | 0C 0I 2S + 1 restated | — | 0C 0I 0S + 3 dropped + 1 dup |
| Tool calls (pass 3) | 69 | — | 72 |
| **Totals** | 0C 4I 5S | 0C 0I 0S | 0C 2I 1S |
| Design observations | 3 concerns, 4 strengths | 0 concerns, 3 strengths | 1 concern, 2 strengths |
| False positives | 0 | 0 | 0 |
| Unique insights | I3, I4, S1, S3, S4 | 0 | I5 |
| Files reviewed | 7 | 7 | 7 |
| Coverage misses | 0 | 1 (operator_slice.go) | 0 |
| Downgraded | 0 | 0 | 1 (C→I: context evaluation) |
| Dropped | 0 | 0 | 4 (pass 3 pre-existing gaps) |
| Severity inflation | 0 | 0 | 1 (C1 pass 2) |

**Reconciliation:** Gemini P2 C1 (context evaluation): critical → important I5 (pre-existing limitation, not a regression — old code also didn't handle this case correctly). Gemini P3 I1 (splice assignment): dropped (pre-existing "Splice" feature never existed, already covered by S3). Gemini P3 I2 (del on slice): dropped (pre-existing behaviour unchanged by this PR). Gemini P3 I3 (slicing maps): dropped (duplicate of I4 non-string scalar handling). Gemini P3 I4 (null indices): dropped (pre-existing gap, not related to this change).

Claude provided the most value overall, identifying 4 of 5 important findings and all 4 suggestions. Gemini's unique contribution was I5 (context evaluation). Codex provided no actionable findings.

---

## Review Process Notes

### Repo context updates

- yq's `length` operator returns byte count for strings (`len(candidate.Value)`), not rune/character count. String slicing uses rune count. Future reviews of string-related operators should check for byte-vs-rune consistency.
- The `sliceArrayOperator` function handles both arrays and strings. When reviewing changes to it, check that all scalar types (not just `!!str`) are accounted for — non-string scalars fall through to array logic and produce invalid output.

---

## Appendix: Original Reviews

### [Claude] — Pass 1

````markdown
---

## Deep Code Review: PR #2639 — Add string slicing support

| Attribute | Value |
|-----------|-------|
| **PR** | mikefarah/yq#2639 |
| **Author** | copilot-swe-agent (bot) |
| **Verdict** | Approve with suggestions — no correctness bugs that surface today, but one latent semantic mismatch and test coverage gaps should be addressed |

### Executive Summary

This PR adds rune-based string slicing to yq (e.g., `.country[4:]`), fixes a bug where `.field[N:M]` slicing operated on the document root instead of the field value, and hardens array slicing against very negative indices. The implementation is functionally correct. The key concern is a byte-vs-rune length mismatch for implicit end indices on Unicode strings, currently masked by clamping but semantically wrong.

---

### Findings

#### Critical Issues

None.

#### Important Issues

I1. **Implicit end index uses byte length, not rune length, for string slices** — `operator_slice.go:69` + `operator_length.go:20` *(important, gap)*

When the end index is omitted (e.g., `.greeting[2:]`), the lexer inserts a `length` operation (`lexer.go:154`). The `lengthOperator` at `operator_length.go:20` computes `len(candidate.Value)` — byte length, not rune length. But `sliceStringNode` at `operator_slice.go:21` uses `len([]rune(...))` — rune length.

For multi-byte strings like `"héllo"` (5 runes, 6 bytes), the implicit end index is 6 while the rune array has only 5 elements. The clamping at line 34 (`relativeSecondNumber > length`) corrects this, so no incorrect output is produced today. However:

- The intermediate value is semantically wrong
- If clamping were weakened in a refactor, this would panic with an out-of-bounds slice
- No test exercises a Unicode string with an implicit end index (e.g., `.greeting[2:]` on `héllo`)

```go
// operator_length.go:20 — returns byte length
length = len(candidate.Value)    // "héllo" → 6

// operator_slice.go:21 — uses rune length
runes := []rune(lhsNode.Value)   // "héllo" → 5 runes
length := len(runes)             // 5
```

Fix: The cleanest fix within the PR's scope is to add a test that explicitly exercises this path (Unicode string with omitted end index), documenting that clamping is load-bearing. A more thorough fix would be to make `sliceStringNode` independent of the `length` operator's value — for example, by always computing and using the rune length internally regardless of what `secondNumber` is. The existing clamping already does this, but a comment explaining *why* the clamp is necessary (byte-vs-rune mismatch from the implicit end index) would prevent accidental removal.

---

I2. **Duplicated and structurally inconsistent index-clamping logic** — `operator_slice.go:19-51` vs `operator_slice.go:79-100` *(important, regression)*

The string path (`sliceStringNode`, lines 19-51) and the array path (lines 79-100) each implement their own index normalization with subtly different structures. The array path (introduced in commit `341e2524`) uses separate `if` blocks for post-adjustment clamping:

```go
// Array path (line 89-97) — explicit clamp to 0 after adjustment
if relativeSecondNumber < 0 {
    relativeSecondNumber = len(lhsNode.Content) + secondNumber
}
if relativeSecondNumber < 0 {      // separate check: clamp to 0
    relativeSecondNumber = 0
} else if relativeSecondNumber > len(lhsNode.Content) {
    relativeSecondNumber = len(lhsNode.Content)
}
```

The string path (line 31-36) uses `else if`, meaning a very negative `secondNumber` that's still negative after adjustment is never explicitly clamped to 0:

```go
// String path (line 31-36) — no explicit clamp to 0
if relativeSecondNumber < 0 {
    relativeSecondNumber = length + secondNumber  // could still be negative
} else if relativeSecondNumber > length {          // skipped: we took the if branch
    relativeSecondNumber = length
}
```

The string path works only because the later `if relativeSecondNumber < relativeFirstNumber` guard at line 43-44 catches the negative value. This implicit dependency is fragile.

Fix: Extract a shared helper for index normalization, or align the string path's structure with the array path:

```diff
 relativeSecondNumber := secondNumber
 if relativeSecondNumber < 0 {
     relativeSecondNumber = length + secondNumber
- } else if relativeSecondNumber > length {
+ }
+ if relativeSecondNumber < 0 {
+     relativeSecondNumber = 0
+ } else if relativeSecondNumber > length {
     relativeSecondNumber = length
 }
```

---

I3. **Missing test coverage for string edge cases that are tested for arrays** — `operator_slice_test.go` *(important, gap)*

The PR added array tests for very negative indices (`.[-100:]` at line 103, `.[:-100]` at line 110) but no corresponding string tests. Also missing: a Unicode string test with an implicit end index (which would exercise the byte-vs-rune mismatch in I1).

Untested string scenarios:

| Expression | Input | Expected |
|---|---|---|
| `.country[-100:]` | `Australia` | `Australia` (clamp start to 0) |
| `.country[:-100]` | `Australia` | empty string (clamp end to 0) |
| `.greeting[2:]` | `héllo` | `llo` (implicit end, Unicode) |
| `.greeting[:3]` | `héllo` | `hél` (implicit start, Unicode) |

Fix: Add these test scenarios with `skipDoc: true`.

#### Suggestions

S1. **Commit structure needs cleanup before merge** *(suggestion, regression)*

The 5 commits include:
- `26929879 Initial plan` — empty commit, no file changes
- `9a9399ad` — fixes a lint error and test descriptions from the main feature commit `6d345ac7`
- `341e2524` — fixes a panic introduced by the same feature commit

These are fixup commits that should be squashed into the main feature commit before merging. The "Initial plan" commit should be dropped entirely.

---

### Design Observations

**Concerns**

- The duplicated index-normalization logic (I2) is a natural candidate for a shared `normalizeSliceIndices(firstNumber, secondNumber, length int) (int, int)` helper. Both the string and array paths perform identical steps: resolve negative indices, clamp to bounds, ensure `start <= end`. A single helper would eliminate the structural inconsistency and make it impossible for the two paths to diverge. *(in-scope)*

- The `length` operator (`operator_length.go:20`) returns byte length for strings (`len(candidate.Value)`) rather than rune/character count. This diverges from jq where `"héllo" | length` returns 5. While this pre-dates the PR, the new string slicing feature makes the inconsistency more visible — users might expect `length` and string slicing to agree on what "length" means. *(future)*

**Strengths**

- Using `[]rune` for string slicing is the correct approach for Unicode handling. This aligns with jq's character-based (not byte-based) string indexing semantics.
- The `traverseArrayOperator` fix (evaluating LHS before calling `sliceArrayOperator`) is clean and correct — it solves the root cause of `.field[N:M]` operating on the wrong node without disrupting the existing `.[N:M]` path, since evaluating `SELF` returns the current context unchanged.
- The lexer change (adding `openCollect` to the implicit-0 check) is well-scoped and necessary for `.field[:N]` syntax.

---

### Testing Assessment

Existing tests pass and cover the core string slicing scenarios well: explicit indices, omitted start, omitted end, negative indices, both indices, full slice `[:]`, out-of-bounds clamping, and basic Unicode.

Gaps ranked by risk:
1. Unicode string with implicit end index (`.greeting[2:]`) — exercises the byte-vs-rune mismatch (I1)
2. Very negative string start (`.country[-100:]`) — tests clamping parity with arrays
3. Very negative string end (`.country[:-100]`) — tests clamping parity with arrays
4. Unicode string with implicit start and end (`.greeting[:]`) — exercises both clamping paths on multi-byte input

### Documentation Assessment

The generated documentation in `slice-array.md` correctly matches the test scenarios. The header update to "Slice/Splice Array or String" is accurate for slicing; the "Splice" part refers to the pre-existing array insertion example and is unchanged.

### Commit Structure

- `26929879 Initial plan` — empty commit with no file changes; should be dropped.
- `9a9399ad Fix sliceStringNode signature and fix test descriptions` — fixes lint error and test mistakes from `6d345ac7`; should be squashed into it.
- `341e2524 Fix array slice out-of-bounds panic with very negative indices` — fixes a panic from `6d345ac7`; should be squashed into it.
- `778088d7 Update pkg/yqlib/operator_slice.go` — Mike Farah's direct edit (using `CreateReplacement`); reasonable as a separate commit since it's from the maintainer.

### Unresolved Feedback

All PR review comments have been addressed by subsequent commits:
- Test description and expression fixed (`9a9399ad`)
- `unparam` lint error fixed (`9a9399ad`)
- Array panic with very negative indices fixed (`341e2524`)

### Coverage Summary

| File | Status |
|------|--------|
| `pkg/yqlib/operator_slice.go` | **I1, I2** |
| `pkg/yqlib/operator_slice_test.go` | **I3** |
| `pkg/yqlib/operator_traverse_path.go` | Reviewed, no issues |
| `pkg/yqlib/lexer.go` | Reviewed, no issues |
| `pkg/yqlib/doc/operators/slice-array.md` | Reviewed, no issues |
| `pkg/yqlib/doc/operators/headers/slice-array.md` | Reviewed, no issues |
| `project-words.txt` | Trivial |
````

### [Claude] — Pass 2

````markdown
Now I have enough context. Let me write the review.

---

## Executive Summary

This PR adds string slicing support to yq by routing `!!str` scalar nodes to a new rune-based `sliceStringNode` function, fixing `traverseArrayOperator` to evaluate the LHS before slicing, and extending the lexer's implicit-`0` insertion to work with `.field[:N]` syntax. The prior review pass found the byte-vs-rune length mismatch, structural inconsistency in clamping logic, and missing string edge-case tests. This pass identifies a silent wrong-result bug for non-string scalars, a subtle lexer side effect, and a documentation inaccuracy.

## Findings

### Critical Issues

None.

### Important Issues

I1. **Slicing non-string scalars (`!!int`, `!!bool`, `!!float`) silently produces invalid tagged output** — `operator_slice.go:74-109` *(important, gap)*

```go
if lhsNode.Kind == ScalarNode && lhsNode.guessTagFromCustomType() == "!!str" {
    results.PushBack(sliceStringNode(lhsNode, firstNumber, secondNumber))
    continue
}
// ... falls through to array path ...
sliceArrayNode := lhsNode.CreateReplacement(SequenceNode, lhsNode.Tag, "")
```

When `.value[0:2]` targets a `!!int` scalar (or `!!bool`, `!!float`), the guard at line 74 correctly identifies it's not a string, but the code falls through to the array path. Since `Content` is nil for scalar nodes, `len(lhsNode.Content)` is 0, and the function returns an empty sequence tagged with the scalar's type — e.g. `!!int []`. This is semantically invalid YAML: a sequence node carrying a scalar type tag.

Before the PR, the same expression operated on the *wrong node* (the root document map via the un-evaluated context), which was also broken. The PR's `traverseArrayOperator` fix at `operator_traverse_path.go:82-86` correctly routes to the field value now, but this new code path was not accounted for.

```
$ echo 'value: 42' | yq '.value[0:2]'
!!int []

$ echo 'value: true' | yq '.value[0:2]'
!!bool []
```

Fix: add explicit handling before the array fallthrough. Either return an error or, to match jq semantics (which coerce scalars to their string representation for slicing), treat non-string scalars as strings:

```diff
 		if lhsNode.Kind == ScalarNode && lhsNode.guessTagFromCustomType() == "!!str" {
 			results.PushBack(sliceStringNode(lhsNode, firstNumber, secondNumber))
 			continue
 		}
+		if lhsNode.Kind == ScalarNode {
+			return Context{}, fmt.Errorf("cannot slice scalar of type %v, only arrays and strings can be sliced", lhsNode.Tag)
+		}
```

### Suggestions

S1. **Lexer `openCollect` change alters standalone `[: expr]` from error to silent acceptance** — `lexer.go:130` *(suggestion, gap)*

```go
if index > 0 && (tokens[index-1].TokenType == traverseArrayCollect || tokens[index-1].TokenType == openCollect) {
```

The added `openCollect` condition correctly enables `.field[:N]` syntax but also affects standalone `[: expr]` collect expressions. Before the PR, `[: "b"]` returned an error (`':' expects 2 args but there is 1`). After the PR, it silently produces `[{0: b}]` because `0` is inserted as an implicit first operand:

```
# before PR
$ echo null | yq '[: "b"]'
Error: ':' expects 2 args but there is 1

# after PR
$ echo null | yq '[: "b"]'
- - [{0: b}]
```

This is unlikely to affect real users, but it broadens accepted syntax in an unintended context where `[` is a collect bracket, not a slice bracket.

Fix: tighten the condition to only apply when `openCollect` was preceded by a `CheckForPostTraverse` token (meaning the `[` follows a traversal like `.field[`). This could be done by checking `postProcessedTokens` for a recently-inserted `traverseArrayOpType`:

```diff
-if index > 0 && (tokens[index-1].TokenType == traverseArrayCollect || tokens[index-1].TokenType == openCollect) {
+if index > 0 && (tokens[index-1].TokenType == traverseArrayCollect ||
+    (tokens[index-1].TokenType == openCollect && len(postProcessedTokens) >= 2 &&
+     tokenIsOpType(postProcessedTokens[len(postProcessedTokens)-2], traverseArrayOpType))) {
```

S2. **Doc header still references "Splice" but string slicing is not splicing** — `doc/operators/headers/slice-array.md:1` *(suggestion, gap)*

```markdown
# Slice/Splice Array or String
```

"Splice" refers to in-place mutation of arrays (inserting/removing elements), which doesn't apply to strings. The header was updated from "Slice/Splice Array" to "Slice/Splice Array or String", but "Splice" is misleading for the string case. Consider:

```diff
-# Slice/Splice Array or String
+# Slice Array or String
```

Or keep the current title if splice operations on arrays are still documented under this same page.

## Design Observations

**Concerns**

- The fallthrough from string detection to array slicing at `operator_slice.go:74-78` uses a "check-and-continue" pattern that leaves all non-`!!str` scalar types (int, bool, float, null) falling silently into array logic designed for `SequenceNode`. This structure means *every new scalar type* that yq might support in the future will silently produce invalid output. A guard-and-error before the array path (as suggested in I1) would make the function fail-safe by default. *(in-scope)*

**Strengths**

- The rune-based string slicing in `sliceStringNode` correctly handles Unicode by converting to `[]rune` before indexing, which is the right approach for user-visible character boundaries.
- The `traverseArrayOperator` fix at `operator_traverse_path.go:82-86` is clean — evaluating the LHS before dispatching to `sliceArrayOperator` directly addresses the root cause of `.field[m:n]` operating on the wrong node. The exclusive branching avoids any double-evaluation concern.

## Testing Assessment

Test coverage for the new string slicing feature is reasonable for ASCII strings. Untested scenarios (ranked by risk):

1. **Slicing non-string scalars** — `.value[0:2]` on `!!int`, `!!bool`, `!!float` has no test and produces invalid output (see I1).
2. (Already noted in prior pass I3: missing very-negative and Unicode implicit-index edge cases.)

## Documentation Assessment

The generated `slice-array.md` documentation correctly demonstrates string slicing examples. The header doc's use of "Splice" for strings is addressed in S2.

## Commit Structure

Already addressed in prior pass S1 (fixup commits should be squashed, plan commit dropped).

## Unresolved Feedback

All three of mikefarah's review comments have been addressed by subsequent commits:
- Test expression fix → 9a9399ad
- Linter error fix → 9a9399ad (signature change)
- Very-negative array panic → 341e2524

No unresolved substantive feedback remains.

## Coverage Summary

| File | Status |
|------|--------|
| `pkg/yqlib/operator_slice.go` | **I1** (non-string scalar fallthrough) |
| `pkg/yqlib/operator_slice_test.go` | Reviewed, no *new* issues (prior I3 covers gaps) |
| `pkg/yqlib/operator_traverse_path.go` | Reviewed, no issues |
| `pkg/yqlib/lexer.go` | **S1** (openCollect side-effect) |
| `pkg/yqlib/doc/operators/headers/slice-array.md` | **S2** (splice terminology) |
| `pkg/yqlib/doc/operators/slice-array.md` | Reviewed, no issues |
| `project-words.txt` | Trivial |
````

### [Claude] — Pass 3

````markdown

## Executive Summary

This PR adds string slicing to yq by routing `!!str` scalars to a new `sliceStringNode` function, fixing LHS evaluation in `traverseArrayOperator`, and extending the lexer for `.field[:N]` syntax. The core implementation is correct for ASCII strings and passes all tests. However, introducing rune-based string slicing creates a coherence gap with existing byte-based operators (`length`, `match`), producing silently wrong results for multibyte strings in practical usage patterns.

**Verdict:** Merge after addressing I1 (important — silent data corruption for multibyte strings).

## Findings

### Critical Issues

None.

### Important Issues

I1. **Byte-vs-rune inconsistency is NOT masked by clamping and produces silently wrong results** — `operator_slice.go:19-51`, `operator_length.go:20`, `operator_strings.go:292` (important, gap)

```go
// operator_slice.go:20 — slicing uses rune count
runes := []rune(lhsNode.Value)
length := len(runes)
```

```go
// operator_length.go:20 — length uses byte count
length = len(candidate.Value)
```

Prior pass I1 identified this mismatch but assessed it as "masked by clamping." That assessment is only correct for the implicit end index (where byte count >= rune count guarantees the clamp fires). For any arithmetic involving `length` or `match` offsets as start indices, the mismatch produces **silently wrong results**:

Verified reproduction with `"café"` (4 runes, 5 bytes):
```
$ echo '"café"' | yq '(. | length) as $n | .[($n - 2):]'
é           # WRONG — expected "fé" (last 2 chars)

$ echo '"café"' | yq '.[-2:]'
fé          # correct with negative index
```

And with `match` byte offsets on `"café latte"` (10 runes, 11 bytes):
```
$ echo 'text: "café latte"' | yq '.text | match("latte").offset'
6           # byte offset (rune offset is 5)

$ echo 'text: "café latte"' | yq '.text[0:6]'
café l      # WRONG — includes first char of "latte"

$ echo 'text: "café latte"' | yq '.text[0:5]'
café        # correct rune offset
```

jq is internally consistent: `"café" | length` returns 4 (characters), and slicing is character-based. This PR correctly chose rune-based slicing (matching jq), but the pre-existing `length` operator at `operator_length.go:20` and `match` offset at `operator_strings.go:292` still use byte-based counting.

Fix: Update `lengthOperator` for `ScalarNode` (non-null) to use `len([]rune(candidate.Value))` instead of `len(candidate.Value)`. Similarly, update `addMatch` at `operator_strings.go:292-294` to convert byte offsets to rune offsets using `utf8.RuneCountInString(value[:offset])`. This aligns yq with jq behavior. Note: changing `length` is a behavioral change for multibyte strings, so it warrants a changelog entry.

```diff
 // operator_length.go
 case ScalarNode:
     if candidate.Tag == "!!null" {
         length = 0
     } else {
-        length = len(candidate.Value)
+        length = len([]rune(candidate.Value))
     }
```

### Suggestions

S1. **Missing test coverage for empty string and Unicode implicit-end slicing** — `operator_slice_test.go` (suggestion, gap)

No tests exercise string slicing on:
- Empty string values (`""[0:2]`, `""[:]`, `""[0:0]`)
- Unicode strings with implicit end index (`"héllo"[2:]`), which is where the byte-vs-rune mismatch would be observable once I1 is fixed
- Spread over mixed-type collections (`.[] | .[0:3]` on a document containing both strings and arrays)

Fix: Add test scenarios:
```go
{
    skipDoc:    true,
    document:   `val: ""`,
    expression: `.val[0:5]`,
    expected:   []string{"D0, P[val], (!!str)::\n"},
},
{
    skipDoc:    true,
    document:   `val: ""`,
    expression: `.val[:]`,
    expected:   []string{"D0, P[val], (!!str)::\n"},
},
{
    skipDoc:    true,
    document:   `greeting: héllo`,
    expression: `.greeting[2:]`,
    expected:   []string{"D0, P[greeting], (!!str)::llo\n"},
},
```

S2. **Documentation uses only ASCII examples** — `doc/operators/slice-array.md` (suggestion, gap)

All four string slicing doc sections use `country: Australia` (ASCII-only). Since the feature's key differentiator from a naive byte-slice is Unicode correctness, one documented example with a multibyte string would demonstrate the rune-based semantics.

Fix: Change the negative-index example or add a new doc test with a Unicode string:
```go
{
    description:    "Slicing Unicode strings",
    subdescription: "Slicing is character-based, not byte-based",
    document:       `greeting: café latte`,
    expression:     `.greeting[0:4]`,
    expected:       []string{"D0, P[greeting], (!!str)::café\n"},
},
```

## Design Observations

**Concerns**

- **(in-scope)** The rune-based slicing is the correct choice (matching jq), but it creates a semantic split with the rest of yq's string operators. `length`, `match` offsets, and `match` lengths all use byte-based counting. Fixing `length` and `match` for rune-based semantics as part of this PR would make string operations coherent. Deferring the fix means every new string operator must decide "bytes or runes?" independently, and users cannot predict which convention a given operator uses.

**Strengths**

- The `sliceStringNode` function correctly handles all clamping edge cases (very negative indices, out-of-bounds, reversed ranges) and the test suite covers these well.
- The fix to `traverseArrayOperator` (evaluating LHS before passing to `sliceArrayOperator`) is the right approach — it fixes the root cause of `.field[N:M]` operating on the wrong node.
- The `[]rune` conversion ensures correct Unicode handling, which is the non-obvious design decision that makes or breaks this feature.

## Testing Assessment

Test coverage is solid for the core string slicing behavior (both ends, negative indices, out-of-bounds, Unicode, full-string `[:]`). Untested scenarios ranked by risk:

1. Unicode strings with implicit end index (`"héllo"[2:]`) — exercises the byte-length-to-rune-count path
2. Empty string slicing — boundary condition for zero-length rune slices
3. Mixed-type spread + slicing — exercises the string/array routing in a multi-node context
4. Very negative second index on strings (`"hello"[:-100]`) — exercises the indirect clamp-via-firstNumber path
5. String slice with variable-based index (`$var as $i | .field[$i:]`) — exercises variable preservation in lhsContext

## Documentation Assessment

The generated documentation correctly reflects the test scenarios. The header update from "Slice/Splice Array" to "Slice/Splice Array or String" is accurate. As noted in S2, a Unicode example would strengthen the docs since rune-based behavior is the feature's core value proposition.

## Commit Structure

As noted in prior pass S1: the "Initial plan" commit (26929879) should be dropped, and the fixup commits (9a9399ad, 778088d7, 341e2524) should be squashed into the main feature commit before merge.

## Unresolved Feedback

All substantive review comments from mikefarah have been addressed by subsequent commits:
- Test expression mismatch → fixed in 9a9399ad
- `unparam` lint error → fixed in 9a9399ad (removed error return from `sliceStringNode`)
- Array panic with very negative indices → fixed in 341e2524

## Coverage Summary

| File | Status |
|---|---|
| `pkg/yqlib/operator_slice.go` | **I1** |
| `pkg/yqlib/operator_slice_test.go` | **S1** |
| `pkg/yqlib/operator_traverse_path.go` | Reviewed, no new issues |
| `pkg/yqlib/lexer.go` | Reviewed, no new issues |
| `pkg/yqlib/doc/operators/slice-array.md` | **S2** |
| `pkg/yqlib/doc/operators/headers/slice-array.md` | Reviewed, no new issues |
| `project-words.txt` | Trivial |
````

### [Codex] — Pass 1

````markdown
### Executive Summary

This PR extends slice semantics from arrays to strings, fixes traversal so `.field[lo:hi]` slices the selected scalar instead of the enclosing document, and broadens lexer support for omitted lower bounds in `.field[:N]`. I reviewed all seven changed files, traced the parser and evaluator paths through `lexer.go`, `operator_traverse_path.go`, `operator_slice.go`, and related helpers, and did not identify any correctness regressions in the shipped changes.

### Findings

**Critical Issues**

None.

**Important Issues**

None.

**Suggestions**

None.

### Design Observations

**Strengths**

- The change keeps the feature in the existing slice pipeline instead of adding a separate string-only operator. Routing through [`pkg/yqlib/operator_traverse_path.go`](/private/tmp/review-codex-kzutvn/pkg/yqlib/operator_traverse_path.go) and [`pkg/yqlib/operator_slice.go`](/private/tmp/review-codex-kzutvn/pkg/yqlib/operator_slice.go) preserves the established AST shape and minimises parser/evaluator churn.
- [`sliceStringNode`](/private/tmp/review-codex-kzutvn/pkg/yqlib/operator_slice.go#L19) uses rune indexing rather than byte indexing, which is the right trade-off for user-visible string slicing in a YAML/JSON processor.
- The follow-up array clamp fix in [`pkg/yqlib/operator_slice.go`](/private/tmp/review-codex-kzutvn/pkg/yqlib/operator_slice.go#L79) closes the panic path for very negative indices without changing normal slice semantics.

### Testing Assessment

1. Unicode strings with an omitted upper bound, such as `.greeting[1:]`, are not covered. This is the one path that combines rune-based slicing with the lexer’s implicit `length` insertion.
2. Expression-valued bounds on strings, such as `.country[(1):(length-1)]`, are not covered. The implementation reuses `getSliceNumber`, so this is likely fine, but it is a distinct execution path from the new literal-bound tests.
3. Optional traversal combined with slicing, such as `.missing[:2]?`, is not covered. The slice fast-path in [`pkg/yqlib/operator_traverse_path.go`](/private/tmp/review-codex-kzutvn/pkg/yqlib/operator_traverse_path.go#L81) is separate from the normal indexed traversal path and is worth pinning down with an explicit scenario.

### Documentation Assessment

The operator header and generated operator page are consistent with the implementation and examples. No additional documentation gaps identified in the reviewed files.

### Commit Structure

The history is acceptable. The later commits are review/fixup follow-ups to the initial feature commit, and they stay within the same review scope.

### Unresolved Feedback

None. The substantive review comments called out in the PR context appear addressed by `9a9399ad` and `341e2524`.

### Coverage Summary

- `pkg/yqlib/doc/operators/headers/slice-array.md` — Reviewed, no issues
- `pkg/yqlib/doc/operators/slice-array.md` — Reviewed, no issues
- `pkg/yqlib/lexer.go` — Reviewed, no issues
- `pkg/yqlib/operator_slice.go` — Reviewed, no issues
- `pkg/yqlib/operator_slice_test.go` — Reviewed, no issues
- `pkg/yqlib/operator_traverse_path.go` — Reviewed, no issues
- `project-words.txt` — Trivial
````

### [Gemini] — Pass 1

````markdown
### Executive Summary

This PR introduces string slicing functionality mirroring jq's semantics, correctly preserving the scalar node's tags and styles. It processes indices using `[]rune` to safely handle multibyte Unicode characters, and it correctly patches `operator_traverse_path.go` so that chained array lookups like `.country[1:3]` evaluate against the child context (`country`) instead of mistakenly slicing the root document.

### Findings

#### Critical Issues

None.

#### Important Issues

I1. **String length operator returns bytes, causing Unicode string slices to fail** — `pkg/yqlib/operator_slice.go:19-50` (important, gap)

```go
func sliceStringNode(lhsNode *CandidateNode, firstNumber int, secondNumber int) *CandidateNode {
	runes := []rune(lhsNode.Value)
	length := len(runes)
    // ...
```

While `sliceStringNode` correctly converts strings to runes and indexes against the rune length, the pre-existing `length` operator (in `pkg/yqlib/operator_length.go`) calculates the string length in **bytes** instead of runes. This mismatch breaks explicit mathematical operations when a string contains multibyte Unicode characters. 

For example, `héllo` contains 5 runes but 6 bytes. A user executing `yq '.country | .[length - 1:]'` expects to return `"o"` (the last character). However, `length` resolves to 6, `6 - 1 = 5`, and `sliceStringNode` evaluates `runes[5:5]` which returns an empty string `""`.

Fix: Update `lengthOperator` in `pkg/yqlib/operator_length.go` to return the rune count for strings:
```diff
		case ScalarNode:
			if candidate.Tag == "!!null" {
				length = 0
			} else {
-				length = len(candidate.Value)
+				length = len([]rune(candidate.Value))
			}
```

#### Suggestions

S1. **Explicitly clamp negative end index to zero** — `pkg/yqlib/operator_slice.go:32-36` (suggestion, enhancement)

```go
	relativeSecondNumber := secondNumber
	if relativeSecondNumber < 0 {
		relativeSecondNumber = length + secondNumber
	} else if relativeSecondNumber > length {
		relativeSecondNumber = length
	}
```

If `secondNumber` is extremely negative (e.g., `-100`), `length + secondNumber` will still be negative, so `relativeSecondNumber` remains negative here. It safely avoids panicking later on only because the subsequent check `if relativeSecondNumber < relativeFirstNumber` will bump it up to `relativeFirstNumber` (which is clamped to `0`). However, for readability and symmetry with the bounds-clamping you fixed in `sliceArrayOperator`, it is better to explicitly clamp it to `0`.

Fix:
```diff
 	relativeSecondNumber := secondNumber
 	if relativeSecondNumber < 0 {
 		relativeSecondNumber = length + secondNumber
+		if relativeSecondNumber < 0 {
+			relativeSecondNumber = 0
+		}
 	} else if relativeSecondNumber > length {
 		relativeSecondNumber = length
 	}
```

### Design Observations

**Strengths**
- Utilizing `CreateReplacement` ensures that the newly created string slice correctly preserves the YAML style (single quotes, double quotes, block literals) and tag metadata of the original scalar.
- The `operator_traverse_path.go` fix correctly isolates the LHS context before passing it into `sliceArrayOperator`, addressing an underlying bug where `.path[1:]` incorrectly fell back to evaluating against the root document.

### Testing Assessment

1. **Length interactions**: While standard bounds and omitted indices (e.g., `.[1:]`) are extensively tested and implicitly pass a `lengthOp` bound without issues, consider adding an explicit test that uses the `length` operator computationally (e.g., `.[0 : length - 1]`) on a multibyte Unicode string to ensure the two length calculations stay aligned.

### Documentation Assessment

The documentation additions accurately capture the new behavior, including omitted index semantics and negative indices counting backwards.

### Commit Structure

Clean and well isolated. The `unparam` linter issue and the earlier code review comments were addressed cleanly in later commits.

### Acknowledged Limitations

None.

### Unresolved Feedback

None.

### Coverage Summary

- `pkg/yqlib/doc/operators/headers/slice-array.md` — Trivial
- `pkg/yqlib/doc/operators/slice-array.md` — Trivial
- `pkg/yqlib/lexer.go` — Reviewed, no issues
- `pkg/yqlib/operator_slice.go` — I1, S1
- `pkg/yqlib/operator_slice_test.go` — Reviewed, no issues
- `pkg/yqlib/operator_traverse_path.go` — Reviewed, no issues
- `project-words.txt` — Trivial
````

### [Gemini] — Pass 2

````markdown
### Executive Summary

This PR introduces string slicing support to the slice array operator and corrects a bug where `.path[1:3]` would slice the root document instead of the target array/string. However, the evaluation context for slice indices was inadvertently changed during the fix, breaking indexing operations that reference other parts of the document. Additionally, a lexer rule intended to support omitted indices in slicing introduces silent syntax transformations for all array literals.

### Findings

#### Critical Issues

C1. **`sliceArrayOperator` evaluates indices against the sliced node instead of the original context** — `pkg/yqlib/operator_traverse_path.go:81-87`, `pkg/yqlib/operator_slice.go:64-69` (critical, regression)

```go
// pkg/yqlib/operator_traverse_path.go
	if expressionNode.RHS != nil && expressionNode.RHS.RHS != nil && expressionNode.RHS.RHS.Operation.OperationType == createMapOpType {
		lhsContext, err := d.GetMatchingNodes(context, expressionNode.LHS)
		if err != nil {
			return Context{}, err
		}
		return sliceArrayOperator(d, lhsContext, expressionNode.RHS.RHS)
	}
```

The PR fixes a bug where `.[1:3]` improperly sliced the root document by evaluating `expressionNode.LHS` first and passing the resolved `lhsContext` to `sliceArrayOperator`. However, `sliceArrayOperator` then reuses this `lhsContext` as the context for evaluating the slice indices (`expressionNode.LHS` and `RHS` via `getSliceNumber`). 

As a result, an expression like `.a[0:.max]` now evaluates `.max` against `.a` (the array) rather than the root context `.`. This causes a runtime error (`cannot index array with 'max'`) or silent incorrect results if the array happens to contain a matching child. This breaks consistency with array indexing (`.a[.max]` correctly evaluates `.max` against `.`) and diverges from `jq` semantics where slice bounds are evaluated against the parent context.

Fix: Modify `sliceArrayOperator` (which is only called by `traverseArrayOperator`) to accept two contexts: the original evaluation context (for evaluating index expressions) and the sliced context (`lhsContext`) for providing the candidate nodes. Alternatively, evaluate the `LHS` and `RHS` index expressions inside `traverseArrayOperator` against the original context and pass the resolved integers to `sliceArrayOperator`.

#### Important Issues

I1. **`openCollect` lexer match causes non-slice array literals to silently convert `:` to `0:`** — `pkg/yqlib/lexer.go:127-134` (important, regression)

```go
	if tokenIsOpType(currentToken, createMapOpType) {
		log.Debugf("tokenIsOpType: createMapOpType")
		// check the previous token is '[', means we are slice, but dont have a first number
		if index > 0 && (tokens[index-1].TokenType == traverseArrayCollect || tokens[index-1].TokenType == openCollect) {
			log.Debugf("previous token is : traverseArrayOpType")
			// need to put the number 0 before this token, as that is implied
			postProcessedTokens = append(postProcessedTokens, &token{TokenType: operationToken, Operation: createValueOperation(0, "0")})
		}
	}
```

To support `.country[:5]` (where `[` parses as `openCollect`), the lexer now blindly inserts `0` before any `:` token that immediately follows an `openCollect` token. Because `[` is the start of *any* array literal, this rule applies globally, silently reinterpreting what were previously syntax errors as map constructions. For example, `[:"val"]` is no longer a syntax error; it parses as `[0:"val"]` and evaluates to an array containing a map `[{"0": "val"}]`. Similarly, `[:]` silently evaluates to `[{"0": length}]`. This tightly couples slice syntax sugar to unrelated array literals.

Fix: Do not blindly insert `0` based on `openCollect`. Instead, check if the `[` is actually part of a traversal. Since a slice `[` is preceded by a path token (`CheckForPostTraverse`), you can inspect the previous token in the original stream:
```go
		isSlice := tokens[index-1].TokenType == traverseArrayCollect
		if !isSlice && tokens[index-1].TokenType == openCollect && index > 1 && tokens[index-2].CheckForPostTraverse {
			isSlice = true
		}
		if index > 0 && isSlice {
			// insert 0
		}
```

#### Suggestions

None.

### Design Observations

#### Concerns

None.

#### Strengths

None.

### Testing Assessment

1. **Expressions referencing the parent context inside slice bounds**: The test suite lacks coverage for slice indices that reference the document context, such as `.a[0:.max]`. Adding tests for this would have caught the context evaluation regression (C1).
2. **Array literal edge cases**: Missing test coverage for syntax edge cases like `[:5]` or `[:]` outside the context of a slice operation (e.g., `.country | [:5]`) which would have caught the lexer pollution (I1).

### Documentation Assessment

None.

### Commit Structure

None.

### Acknowledged Limitations

None.

### Unresolved Feedback

None.

### Coverage Summary

- `pkg/yqlib/doc/operators/headers/slice-array.md` — Reviewed, no issues
- `pkg/yqlib/doc/operators/slice-array.md` — Reviewed, no issues
- `pkg/yqlib/lexer.go` — I1
- `pkg/yqlib/operator_slice.go` — C1
- `pkg/yqlib/operator_slice_test.go` — Reviewed, no issues
- `pkg/yqlib/operator_traverse_path.go` — C1
- `project-words.txt` — Trivial
````

### [Gemini] — Pass 3

````markdown
### Executive Summary

This PR introduces string slicing support to yq, expanding the array slicing syntax to handle string values safely using rune-based calculations. While the string slicing logic correctly matches `jq` for basic operations, the underlying `sliceArrayOperator` architecture exhibits critical gaps when interacting with assignment (`=`) and deletion (`del()`) operators, silently failing to splice or incorrectly deleting the entire parent node.

### Findings

#### Critical Issues
None.

#### Important Issues

I1. **Splice assignment silently fails on strings and arrays** — `pkg/yqlib/operator_slice.go:53-108`

```go
		if lhsNode.Kind == ScalarNode && lhsNode.guessTagFromCustomType() == "!!str" {
			results.PushBack(sliceStringNode(lhsNode, firstNumber, secondNumber))
			continue
		}
```

The documentation asserts this operator can "Slice/Splice Array or String," but splice assignments (e.g., `.country[0:3] = "XXX"`) silently fail to modify the original document. Both `sliceStringNode` and the array logic return entirely new, disconnected `Replacement` nodes (`lhsNode.CreateReplacement`). `operator_assign` applies the update to this detached node, leaving the original string or array untouched. `jq` correctly errors out with "Cannot update string slices", while `yq` silently ignores the mutation.

Fix: Either implement true splice assignment by retaining back-references and merging the updated sub-slice, or explicitly detect and throw an error when `operator_assign` attempts to mutate a slice result. If splicing is unsupported, remove "Splice" from the documentation headers.

I2. **Deleting a slice deletes the entire parent string or array** — `pkg/yqlib/operator_slice.go:106`

```go
	return context.ChildContext(results), nil
```

When `del(.country[0:3])` or `del(.array[0:2])` is executed, `yq` deletes the ENTIRE `country` or `array` property from the document instead of erroring. The `context.ChildContext(results)` call propagates the original parent's path to the new slice replacement node. `deleteChildOperator` receives this node, extracts its inherited path (which points to the parent string/array), and deletes the whole parent. `jq` prevents this by erroring ("Cannot delete fields from string").

Fix: Ensure the `context.ChildContext` returned by `sliceArrayOperator` clears or isolates the path for replacement nodes so they cannot be targeted by `del`, or have `deleteChildOperator` explicitly error when attempting to delete a slice.

I3. **Slicing mapping nodes produces corrupted interleaved sequences** — `pkg/yqlib/operator_slice.go:94-101`

```go
		var newResults []*CandidateNode
		for i := relativeFirstNumber; i < relativeSecondNumber; i++ {
			newResults = append(newResults, lhsNode.Content[i])
		}

		sliceArrayNode := lhsNode.CreateReplacement(SequenceNode, lhsNode.Tag, "")
```

While Pass 2 noted non-string scalars yield `!!int []` (I4), applying the slice operator to a `MappingNode` (`!!map`) produces a far worse outcome. For mappings, `lhsNode.Content` contains an interleaved array of keys and values. Slicing grabs arbitrary chunks of this raw key/value array and places them into a new `SequenceNode` that incorrectly retains the `!!map` tag (e.g., `!!map [key, value]`). This creates an invalid YAML structure that corrupts output formatting. `jq` correctly throws "Cannot index object with object" or similar type mismatch errors.

Fix: Add an explicit guard in `sliceArrayOperator` to verify `lhsNode.Kind == SequenceNode` (after the string check) before proceeding with array slicing. Return a clear error if the user attempts to slice a `MappingNode`.

I4. **Explicit `null` indices cause parsing errors instead of omitting bounds** — `pkg/yqlib/operator_slice.go:11-13`

```go
func getSliceNumber(d *dataTreeNavigator, context Context, node *CandidateNode, expressionNode *ExpressionNode) (int, error) {
...
	return parseInt(result.MatchingNodes.Front().Value.(*CandidateNode).Value)
}
```

In `jq`, indexing with `null` represents an omitted boundary (e.g., `.[null:2]` is equivalent to `.[0:2]`, and `.[1:null]` is `.[1:]`). In `yq`, if an index explicitly evaluates to `null`, `parseInt` attempts to parse the string `"null"`, which fails with `strconv.ParseInt: parsing "null": invalid syntax`. This breaks behavioral parity with `jq` for both strings and arrays.

Fix: Update `getSliceNumber` to inspect the evaluated node's tag. If `result.MatchingNodes.Front().Value.(*CandidateNode).Tag == "!!null"`, treat it as an omitted index rather than delegating to `parseInt`.

#### Suggestions
None.

### Design Observations

**Concerns**
- **Operator Lineage and Identity** (in-scope): The architecture's reliance on `context.ChildContext` allows detached, newly-allocated nodes (like slices) to inherit the exact path identity of their origin. This decoupling of node instance and path identity is what causes catastrophic bugs like `del()` deleting the entire parent array. The framework should strictly distinguish between path references (which point to original document locations) and computed values (which have no mutable path).

**Strengths**
- The Unicode handling in `sliceStringNode` using `[]rune` correctly accounts for multi-byte characters and perfectly matches `jq`'s string slicing mechanics based on codepoints.

### Testing Assessment

The test coverage in `operator_slice_test.go` exercises the basic string slicing bounds, but misses several edge cases:
1. **Interaction with assignment**: Splicing a string (`.country[0:3] = "X"`) or an array is untested.
2. **Interaction with deletion**: Deleting a slice (`del(.country[0:3])`) is untested.
3. **Explicit null boundaries**: Slicing with `null` indices (`.[null:2]`) is untested.
4. **Invalid types**: Slicing mapping nodes (`.[0:2]` on a map) is untested.

### Documentation Assessment

The documentation explicitly claims support for "Slice/Splice Array or String", but splicing operations silently fail. The documentation should either be updated to remove "Splice" or the functionality must be implemented.

### Commit Structure

No new commit structure issues found beyond what was reported in Pass 1.

### Acknowledged Limitations
None.

### Unresolved Feedback
None.

### Coverage Summary

- `pkg/yqlib/doc/operators/headers/slice-array.md` — I1
- `pkg/yqlib/doc/operators/slice-array.md` — I1
- `pkg/yqlib/lexer.go` — Reviewed, no issues
- `pkg/yqlib/operator_slice.go` — I1, I2, I3, I4
- `pkg/yqlib/operator_slice_test.go` — Reviewed, no issues
- `pkg/yqlib/operator_traverse_path.go` — Reviewed, no issues
- `project-words.txt` — Trivial
````
