Deep Review: 20260427-144006-pr-37

Date2026-04-27 14:40
Reporancher-sandbox/rancher-desktop-opensuse
Round1 (of target)
Author@Nino-K
PR#37 — Add rdd-guest socket bridge for Windows Docker socket forwarding
Commits88abad6e4 Add rdd-guest socket bridge for Windows Docker socket forwarding
ReviewersClaude Opus 4.7 (effort: xhigh), Codex GPT 5.5 (effort: xhigh), Gemini 3.1 Pro (effort: default)
VerdictMerge with fixes — drop the committed binary, restrict the vsock listener to host CID, and guard the type assertions before merging.
Wall-clock time22 min 3 s


Executive Summary

PR #37 introduces rdd-guest, a small Go daemon that runs in the WSL2 utility VM, listens on Hyper-V vsock port 6660, and proxies bytes to /var/run/docker.sock. The Dockerfile builds it from src/rdd-guest, the kiwi build copies it into /usr/local/bin, and config.sh enables a systemd unit gated by ConditionVirtualization=wsl so only WSL profiles activate the bridge.

The core proxy logic is sound — half-close propagation via CloseWrite() matches the Docker API's hijacked-stream contract, and the signal.NotifyContext shutdown path closes the listener cleanly. Three issues warrant fixes before merge: the listener accepts connections from any vsock CID (Critical, security), a 3.5 MB statically-linked x86-64 ELF binary slipped in alongside the source (Important, hygiene), and the halfCloser type assertions are unguarded (Important, robustness). Five smaller suggestions cover documentation drift, module hygiene, and a graceful-shutdown gap.

Structure: 1 critical, 2 important, 5 suggestions, plus design observations on the upstream-import TODO and unit ordering.


Critical Issues

C1. Vsock listener accepts connections from any CID Gemini

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	l, err := vsock.Listen(vsockPort, nil)
	if err != nil {
		log.Fatalf("vsock listen: %v", err)
	}
	defer l.Close()

vsock.Listen with a nil config binds to VMADDR_CID_ANY, so the listener accepts connections from CID 2 (the Windows host, the intended caller) and from any process inside the same lightweight utility VM. WSL2 runs all distros — the rancher-desktop distro plus user distros such as Ubuntu — in a shared LxVM with a single kernel and a single vsock namespace. An unprivileged process in any of those distros can dial vsock.Local (CID 1) on port 6660 and reach this listener; rdd-guest runs as root and unconditionally proxies to /var/run/docker.sock, granting that process root-equivalent Docker API access and bypassing the socket's root:docker 0660 permissions.

The bridge's only legitimate caller is the Windows-side daemon over Hyper-V vsock (peer CID = VMADDR_CID_HOST). Filter on the peer CID at accept time:

 		conn, err := l.Accept()
 		if err != nil {
 			if ctx.Err() != nil {
 				return
 			}
 			log.Printf("rdd-guest: accept: %v", err)
 			continue
 		}
+		addr, ok := conn.RemoteAddr().(*vsock.Addr)
+		if !ok || addr.ContextID != vsock.Host {
+			log.Printf("rdd-guest: rejected connection from %v", conn.RemoteAddr())
+			conn.Close()
+			continue
+		}
 		go handleConn(conn)

Severity rests on Rancher Desktop's threat model. If multi-distro isolation is not a security boundary in this product (the user already has root in their own distro and Rancher Desktop intentionally shares Docker with sibling distros via other channels), the finding drops to defense-in-depth. Worth confirming with the daemon-side design before merging — but the check is one line and fails closed, so the safer default is to land it.


Important Issues

I1. Pre-built ELF binary committed to the source tree Claude Codex Gemini
ELF>mH@@8@@@@@@PPxx@x@dd@@iippOpOвв0$0d0d  Qtdx@xd@$%@Y@+pOp· 3'U'B> d $ Hd$ R0d0$`4d4$x m 5d 5$P x7d7$M d$h  d $
  f &V $``$p())@))ڸ0V0u3c4>
4p-`p6	i.8$SGoH5oPQsjBx78KJe0gAIfY/26Q_HFfyKXLI83bh9iH8/wKaDG8eR3ejsleCMPMhM/Uwh4J7YwtxTkU9bWOQbJGNU?J1kyV;$oLI;fUHHHH?HH88HHH<HH)HH)HHsHHIIIIH1IIAs
MAIMAIMuAH؉DHMH]HxHWHD$/HD$"UHH$H5ۄ$H9HvHH^]HHH\]LHD$H\$HD$̐PtPPHHw<H
$HPH7HP@1HP8+HPP%HPpHP8HP8HPPHP01@Htrfu11zHHHH11HUHHt111]IHHIHYtH|6IHHH@HH!H@ǀuHu	x_1ɉ]ÐbI;f|UHHt11
11]HKH4IH4q|H|KAHHHH@HH!HAuȐHH@H|HHH9rH]{aHD$-HD$fI;fUHt11
11]HKH4IH4q|HAHHHH@HH!HAuH11IJH4
H<IH<yDDH|LEAIHIH@HI!LAuHH@H|HH@H9r]``HD$,HD$Ld$M;fUHHH$H$H fH ,HD$EfD$MHD$;fD$CHE1AʀFTEIIHAHtI

A 3.5 MB statically-linked Linux x86-64 ELF binary was committed alongside the source. The Dockerfile builds the same binary from source in the gobuild stage at Dockerfile:20-22 (go build -ldflags '-s -w' -o /go/bin/rdd-guest .) and copies the freshly-built artifact into the kiwi description at Dockerfile:37; the checked-in copy is never consumed. The sibling src/rd-init/ directory contains no equivalent binary, so this is unintentional. Three downsides: the binary permanently bloats the repo (git cannot re-deduplicate binary blobs the way it does text), it ships in every clone and release tarball, and it risks shipping a stale or differently-built artifact than the Dockerfile produces if anyone reaches for it directly. The committed binary is also x86-64 only, so an aarch64 user who runs it locally would get an exec error.

Fix: remove the binary and add an ignore rule covering both per-module binaries.

--- a/.gitignore
+++ b/.gitignore
 /root/build/
 *.qcow2.xz
 *.raw.xz
 *.tar.xz
+/src/rdd-guest/rdd-guest
+/src/rd-init/rd-init
git rm src/rdd-guest/rdd-guest

(important, regression)

I2. Unchecked type assertions to halfCloser will panic on upstream type changes Claude Gemini
		log.Printf("rdd-guest: dial docker: %v", err)
		return
	}
	defer dockerConn.Close()

	pipe(vsockConn.(halfCloser), dockerConn.(halfCloser))
}

// pipe bidirectionally proxies between a and b until both directions are done.
func pipe(a, b halfCloser) {
	done := make(chan error, 2)

Both assertions are unguarded. Today they succeed: *vsock.Conn from mdlayher/vsock v1.2.1 and *net.UnixConn from net.Dialer.DialContext("unix", …) both expose CloseWrite() error. But the static contracts do not require it — net.Listener.Accept() returns net.Conn and net.Dialer.DialContext returns net.Conn, neither of which guarantees CloseWrite. A future minor-version bump in mdlayher/vsock that wraps the conn (e.g., a context-cancellable adapter) crashes the agent on the first incoming connection. Systemd's Restart=on-failure then flaps the service, and every retry hits the same panic.

Fix: use the comma-ok idiom and drop the connection on mismatch.

-	pipe(vsockConn.(halfCloser), dockerConn.(halfCloser))
+	vsockHC, ok := vsockConn.(halfCloser)
+	if !ok {
+		log.Printf("rdd-guest: vsock conn does not support CloseWrite")
+		return
+	}
+	dockerHC, ok := dockerConn.(halfCloser)
+	if !ok {
+		log.Printf("rdd-guest: docker conn does not support CloseWrite")
+		return
+	}
+	pipe(vsockHC, dockerHC)

(important, gap)


Suggestions

S1. Package comment says "Lima/WSL2" but the service is WSL-only Codex
// Package main is the rdd-guest agent that runs inside the Lima/WSL2 VM.
// It listens on a vsock port and forwards connections to the Docker socket,
// enabling the Windows host to reach /var/run/docker.sock via Hyper-V vsock.
//
package main

The systemd unit at root/usr/local/lib/systemd/system/rdd-guest.service:4 is gated by ConditionVirtualization=wsl, and config.sh:120 enables it only inside the wsl kiwi profile. The package doc oversells the supported runtime path.

[Unit]
Description=Rancher Desktop Guest Socket Bridge
Documentation=https://github.com/rancher-sandbox/rancher-desktop-daemon
ConditionVirtualization=wsl

[Service]
ExecStart=/usr/local/bin/rdd-guest
Restart=on-failure
RestartSec=5s
if [[ ${kiwi_profiles:-} =~ wsl ]]; then
    # Enable network namespace
    systemctl enable network-setup
    systemctl enable rancher-desktop-guest-agent.service
    systemctl enable wsl-proxy.service
    systemctl enable rdd-guest.service
    # Do not manage /tmp; that is managed by WSL.
    mkdir -p /usr/local/lib/tmpfiles.d
    touch /usr/local/lib/tmpfiles.d/fs-tmp.conf
fi
-// Package main is the rdd-guest agent that runs inside the Lima/WSL2 VM.
+// Package main is the rdd-guest agent that runs inside the WSL2 VM.

(suggestion, gap)

S2. TODO comment swallows the halfCloser doc comment Claude
		}
		go handleConn(conn)
	}
}

// TODO: once rancher-desktop-daemon is public, replace the inlined halfCloser,
// pipe(), and handleConn() here with a direct import of pkg/socketbridge, which
// contains the implementations (HalfCloser interface, Pipe function).
// halfCloser is a net.Conn that can independently close the write side.
type halfCloser interface {
	net.Conn
	CloseWrite() error
}

A Go doc comment is the contiguous comment block immediately preceding a declaration, so the entire four-line block becomes the godoc for halfCloser. The TODO also covers pipe and handleConn, not just the type, so it belongs at file scope. Separate them with a blank line.

 // TODO: once rancher-desktop-daemon is public, replace the inlined halfCloser,
 // pipe(), and handleConn() here with a direct import of pkg/socketbridge, which
 // contains the implementations (HalfCloser interface, Pipe function).
+
 // halfCloser is a net.Conn that can independently close the write side.
 type halfCloser interface {

(suggestion, gap)

S3. Documentation= URL points to a non-public repository Claude
[Unit]
Description=Rancher Desktop Guest Socket Bridge
Documentation=https://github.com/rancher-sandbox/rancher-desktop-daemon
ConditionVirtualization=wsl

[Service]
ExecStart=/usr/local/bin/rdd-guest
Restart=on-failure

gh repo view rancher-sandbox/rancher-desktop-daemon returns "Could not resolve to a Repository" — the daemon repo is still private (consistent with the TODO at main.go:55). Operators clicking the link from systemctl status rdd-guest get a 404 until the daemon repo is opened. Drop the line until the repo is public, or point at a tracking issue in this repo instead.


	// Trigger udevadm to cause the network interface to change name.
	slog.InfoContext(ctx, "triggering udevadm to rename interfaces")
	if err := runCommand(ctx, "/usr/bin/udevadm", "control", "--reload"); err != nil {
		return fmt.Errorf("failed to reload udev: %w", err)
	}
	err = runCommand(ctx, "/usr/bin/udevadm", "trigger", "--type=devices", "--subsystem-match=net")
	if err != nil {
		return fmt.Errorf("failed to rename network devices: %w", err)
	}

(suggestion, gap)

S4. New module diverges from src/rd-init on Go toolchain and golang.org/x/sys Claude
module github.com/rancher-sandbox/rancher-desktop-opensuse/src/rdd-guest

go 1.23.0

require github.com/mdlayher/vsock v1.2.1

require (
	github.com/mdlayher/socket v0.5.1 // indirect

src/rd-init/go.mod declares go 1.25.0 and golang.org/x/sys v0.43.0 (recently bumped in 69c42c26d and 7bc2b6722). The new module is below both. Same image, same builder (opensuse/bci/golang:stable), but two different toolchain floors: Dependabot will raise PRs against this module independently, and the next x/sys advisory needs two separate bumps.

-go 1.23.0
+go 1.25.0
…
-	golang.org/x/sys v0.21.0 // indirect
+	golang.org/x/sys v0.43.0 // indirect

After bumping, run go mod tidy to refresh go.sum.

(suggestion, gap)

S5. Active connections are severed at shutdown Gemini
	}
	defer l.Close()

	log.Printf("rdd-guest: listening on vsock port %d", vsockPort)

	go func() {
		<-ctx.Done()
		l.Close()
	}()

	for {
		conn, err := l.Accept()
		if err != nil {
			if ctx.Err() != nil {

On SIGTERM, l.Close() unblocks Accept and main returns. Active proxy connections — long-running docker logs -f, docker events, hijacked docker exec -it streams — are abruptly severed when the program exits. Track in-flight handleConn goroutines with a sync.WaitGroup and wait for them (with a deadline) before returning, so the bridge drains gracefully on systemd reload/restart.

(suggestion, gap)


Design Observations

Concerns

Strengths


Testing Assessment

No tests added. The repo has no Go test suite for src/rd-init either, so this is a continuation of existing convention rather than a regression. Codex ran go test ./... in src/rdd-guest (passes, no test files) and smoke-built for linux/amd64 and linux/arm64. Verification today is by booting a WSL image. Scenarios that would matter most if a regression slipped in:

  1. Vsock peer-CID rejection (highest risk after C1): dial from CID 1 (loopback inside the VM) and verify the connection is rejected before rdd-guest dials docker.sock.
  2. Half-close on hijacked streams: docker exec -it from the host, exercising both CloseWrite paths in pipe().
  3. Dial failure on missing socket: with the containerd engine selected and /var/run/docker.sock absent, the bridge should log per-connection dial errors and not crash.
  4. Listener restart: systemctl restart rdd-guest should rebind port 6660 cleanly with no stale-socket error.
  5. Concurrent connections: many parallel host-side Docker calls.
  6. Graceful shutdown: in-flight long-poll connection survives a systemctl reload (depends on S5).

Documentation Assessment


Commit Structure

Single commit, message accurate, scope coherent. The only blemish is the unintended binary in I1 — once that is dropped, the commit is clean.


Acknowledged Limitations


Unresolved Feedback

None — PR has no prior review comments.


Agent Performance Retro

Claude

Surfaced the broadest set of issues — caught the binary commit, the unguarded type assertions, the godoc-swallowing TODO, the dead Documentation= URL, and the cross-module Go/x/sys drift between rdd-guest and rd-init (a finding that required reading two go.mod files and the recent dependabot history). Calibration was a half-step hot: the binary commit was rated Critical, which oversells a hygiene problem that doesn't break runtime — Important fits better. Missed the vsock peer-CID security concern that Gemini caught.

Codex

Found the binary commit and the Lima/WSL2 doc-comment mismatch and verified locally with go test ./... and cross-arch smoke builds, which is more verification than the other two agents performed. Calibration ran cold the other way: the binary commit landed at Suggestion when it warrants Important, and Codex reported zero Critical or Important findings overall, missing both the vsock authorization gap (security) and the type-assertion panic risk (robustness). Useful but underweight for this PR.

Gemini

Sole agent to surface the vsock peer-CID issue (C1) and articulate the WSL2 multi-distro architecture that makes it exploitable — the highest-impact finding of this review. Also caught the binary commit and the type-assertion risk, and proposed a graceful-shutdown improvement neither other agent flagged. Skipped git blame per its known quota behavior, but that did not affect this review since every changed line is new code. Strongest pass on the security/robustness axis, lighter on documentation and module hygiene.

Summary

Claude Opus 4.7Codex GPT 5.5Gemini 3.1 Pro
Duration5m 21s5m 07s3m 32s
Findings2I 3S1I 1S1C 2I 1S
Tool calls36 (Read 17, Bash 16, Grep 3)60 (shell 59, stdin 1)8 (runshellcommand 7, read_file 1)
Design observations622
False positives000
Unique insights412
Files reviewed777
Coverage misses000
Totals2I 3S1I 1S1C 2I 1S
Downgraded1 (C→I)00
Dropped100

Reconciliation: Claude raised the binary commit at Critical; consolidated to Important (I1) since it is a hygiene/repo-bloat issue with no runtime impact. Codex raised the same finding at Suggestion; promoted to Important to match the consolidated severity. No findings were dropped as false positives.

Gemini was the most valuable agent on this PR — the vsock authorization finding is the only one that genuinely changes whether the PR ships safely, and it required domain knowledge about WSL2's shared utility VM that the other two agents did not bring. Claude provided the broadest coverage and caught the cross-module hygiene issues. Codex's verification work (running the test suite, cross-arch builds) was useful but its severity calibration was too soft for the actual issues present.


Review Process Notes

Skill improvements

Repo context updates