Deep Review: 20260327-223819-pr-4637
| Date | 2026-03-27 22:38 |
| Repo | lima-vm/lima |
| Round | 2 (of PR #4637) |
| Author | @unsuman |
| PR | #4637 — Ensure only RAW images in cache(imgconv/raw) while keeping the original image |
| Branch | feature/disk-cache |
| Commits | e03a4c24 Ensure only RAW images in cache while keeping orginal image |
| Reviewers | Claude Opus 4.6, Codex GPT 5.4, Gemini 3.1 Pro |
| Verdict | Merge with fixes — Sound design, but the conversion logic overwrites digest state and breaks cache freshness for digest-less images |
| Wall-clock time | 20 min 50 s |
Executive Summary ¶
This PR adds on-demand conversion of non-raw disk images to raw format in the download cache. Drivers declare which formats they support via a new SupportedImageFormats field, and the downloader converts only when necessary. The design preserves the original cached image for drivers that support it (e.g., QEMU with qcow2). The main issues center on the conversion logic overwriting expectedDigest and shadDigest, which breaks both user-provided digest validation and Last-Modified freshness checks for digest-less images. A crash between writing the raw file and its digest file leaves the cache in an unverifiable state.
Critical Issues ¶
None.
Important Issues ¶
pkg/downloader/downloader.go, important, regression
} else {
origStat, _ := os.Stat(shadData)
if origStat.ModTime().After(rawStat.ModTime()) {
needConvert = true
}
}
if needConvert {
logrus.Infof("Converted raw image is missing or stale; (re)converting now.")
converted, rawDigest, err := ensureRawInCache(ctx, shadData, imageFormat, o.expectedDigest)
if err != nil {
return nil, err
}
shadData = converted
o.expectedDigest = rawDigest
shadDigest = rawImgConvDigestPath
} else {
shadData = rawImgConvPath
if currentDigestData, err := os.ReadFile(rawImgConvDigestPath); err == nil {
currentDigest := strings.TrimSpace(string(currentDigestData))
if d, err := digest.Parse(currentDigest); err == nil {
o.expectedDigest = d
shadDigest = rawImgConvDigestPath
}
}
}
}
}
When an image requires conversion, lines 311–313 replace o.expectedDigest with the raw image's digest and redirect shadDigest to rawImgConvDigestPath. The original user-provided digest (from the template) is never validated against the cached source image. Later, validateCachedDigest at line 329 compares the raw digest against its own file — a self-confirming check.
ext := path.Ext(remote)
logrus.Debugf("file %q is cached as %q", localPath, shadData)
if _, err := os.Stat(shadDigest); err == nil {
logrus.Debugf("Comparing digest %q with the cached digest file %q, not computing the actual digest of %q",
o.expectedDigest, shadDigest, shadData)
if err := validateCachedDigest(shadDigest, o.expectedDigest); err != nil {
return nil, err
}
if err := copyLocal(ctx, localPath, shadData, ext, o.decompress, "", ""); err != nil {
return nil, err
}
} else {
if match, lmCached, lmRemote, err := matchLastModified(ctx, shadTime, remote); err != nil {
logrus.WithError(err).Info("Failed to retrieve last-modified for cached digest-less image; using cached image.")
} else if match {
if err := copyLocal(ctx, localPath, shadData, ext, o.decompress, o.description, o.expectedDigest); err != nil {
return nil, err
}
} else {
logrus.Infof("Re-downloading digest-less image: last-modified mismatch (cached: %q, remote: %q)", lmCached, lmRemote)
return nil, nil
}
}
res := &Result{
If an upstream image at the same URL is updated and the user changes the expected digest in their template, the stale cached image is served without detection.
Fix: Validate the original cached file against the caller's expectedDigest before running conversion logic. Move the image-conversion block after the existing digest validation (lines 329–340), or validate explicitly within the conversion block:
if needConvert {
+ if o.expectedDigest != "" && shadDigest != "" {
+ if err := validateCachedDigest(shadDigest, o.expectedDigest); err != nil {
+ return nil, nil // force re-download
+ }
+ }
converted, rawDigest, err := ensureRawInCache(ctx, shadData, imageFormat, o.expectedDigest)
pkg/downloader/downloader.go, important, regression
} else {
shadData = rawImgConvPath
if currentDigestData, err := os.ReadFile(rawImgConvDigestPath); err == nil {
currentDigest := strings.TrimSpace(string(currentDigestData))
if d, err := digest.Parse(currentDigest); err == nil {
o.expectedDigest = d
shadDigest = rawImgConvDigestPath
}
}
}
}
}
ensureRawInCache renames the raw file into place (line 453) before writing raw.digest (line 467). If the process crashes between these two operations, raw.digest is missing on restart. The else block at line 314 sets shadData to the raw path but silently falls through when ReadFile fails, leaving shadDigest and o.expectedDigest unchanged. The subsequent validation at line 329 then validates the original digest file against the original expected digest — which passes — but copies the unverified raw file (via shadData).
return "", "", fmt.Errorf("failed to close raw tmp file %q: %w", rawPathTmp, err)
}
if err := os.Rename(rawPathTmp, rawImgConvPath); err != nil {
return "", "", fmt.Errorf("failed to replace original with raw image: %w", err)
}
algo := digest.Canonical
if originalDigest != "" {
algo = originalDigest.Algorithm()
}
rawDigest, err := calculateFileDigest(rawImgConvPath, algo)
if err != nil {
return "", "", fmt.Errorf("failed to calculate digest of raw image: %w", err)
}
rawDigestPath := filepath.Join(imgConvPath, "raw.digest")
if err := os.WriteFile(rawDigestPath, []byte(rawDigest.String()), 0o644); err != nil {
return "", "", fmt.Errorf("failed to write raw digest file: %w", err)
}
return rawImgConvPath, rawDigest, nil
}
Fix: Treat a raw file without a valid raw.digest as incomplete. Force reconversion:
} else {
shadData = rawImgConvPath
if currentDigestData, err := os.ReadFile(rawImgConvDigestPath); err == nil {
currentDigest := strings.TrimSpace(string(currentDigestData))
if d, err := digest.Parse(currentDigest); err == nil {
o.expectedDigest = d
shadDigest = rawImgConvDigestPath
+ } else {
+ needConvert = true
}
+ } else {
+ needConvert = true
}
}
Then re-execute conversion if needConvert was set in the else block.
pkg/downloader/downloader.go, important, regression
} else {
shadData = rawImgConvPath
if currentDigestData, err := os.ReadFile(rawImgConvDigestPath); err == nil {
currentDigest := strings.TrimSpace(string(currentDigestData))
if d, err := digest.Parse(currentDigest); err == nil {
o.expectedDigest = d
shadDigest = rawImgConvDigestPath
}
}
}
}
}
When a cached raw image exists with a valid raw.digest, lines 318–320 populate o.expectedDigest and shadDigest. The subsequent check at line 329 (os.Stat(shadDigest) succeeds) routes into the digest-validation branch and skips the matchLastModified check at line 339. Images that previously revalidated via Last-Modified become permanently cached after conversion, never detecting upstream updates.
ext := path.Ext(remote)
logrus.Debugf("file %q is cached as %q", localPath, shadData)
if _, err := os.Stat(shadDigest); err == nil {
logrus.Debugf("Comparing digest %q with the cached digest file %q, not computing the actual digest of %q",
o.expectedDigest, shadDigest, shadData)
if err := validateCachedDigest(shadDigest, o.expectedDigest); err != nil {
return nil, err
}
if err := copyLocal(ctx, localPath, shadData, ext, o.decompress, "", ""); err != nil {
return nil, err
}
} else {
if match, lmCached, lmRemote, err := matchLastModified(ctx, shadTime, remote); err != nil {
logrus.WithError(err).Info("Failed to retrieve last-modified for cached digest-less image; using cached image.")
} else if match {
if err := copyLocal(ctx, localPath, shadData, ext, o.decompress, o.description, o.expectedDigest); err != nil {
return nil, err
}
} else {
logrus.Infof("Re-downloading digest-less image: last-modified mismatch (cached: %q, remote: %q)", lmCached, lmRemote)
return nil, nil
}
}
res := &Result{
Fix: Preserve remote freshness and local integrity as separate checks. For callers without a user-provided digest, run matchLastModified before serving imgconv/raw; use raw.digest only for local integrity verification:
+ originalExpectedDigest := o.expectedDigest
// ... conversion block ...
+ // After conversion, restore original digest state for freshness logic
+ if originalExpectedDigest == "" {
+ o.expectedDigest = ""
+ shadDigest = ""
+ }
pkg/downloader/downloader.go and pkg/fileutils/download.go, important, regression
func isNonISOImage(description, remote string, supportedImageFormats []string) bool {
return strings.Contains(description, "image") && supportedImageFormats != nil && path.Ext(remote) != ".iso"
}
downloader.WithCache(),
downloader.WithDecompress(decompress),
downloader.WithDescription(fmt.Sprintf("%s (%s)", description, path.Base(f.Location))),
downloader.WithExpectedDigest(f.Digest),
}
if strings.Contains(description, "image") && supportedImageFormats != nil {
opts = append(opts, downloader.WithImageFormats(supportedImageFormats))
}
Both locations parse the human-readable description string to decide whether to trigger image conversion. A future caller that uses "image" in a description for a non-disk artifact (e.g., "container image manifest") would trigger unwanted conversion. This matches AkihiroSuda's unresolved review comment: "description is not expected to be parsed."
Fix: Remove description parsing. Guard by supportedImageFormats alone, as AkihiroSuda suggested:
// fileutils/download.go
-if strings.Contains(description, "image") && supportedImageFormats != nil {
+if len(supportedImageFormats) > 0 {
// downloader.go
func isNonISOImage(description, remote string, supportedImageFormats []string) bool {
- return strings.Contains(description, "image") && supportedImageFormats != nil && path.Ext(remote) != ".iso"
+ return supportedImageFormats != nil && path.Ext(remote) != ".iso"
}
pkg/driverutil/disk.go, important, regression
}
} else {
format, err := nativeimgutil.DetectFormat(imagePath)
if err != nil {
return err
}
if format == string(diskImageFormat) {
if err = os.Rename(imagePath, diskPath); err != nil {
return err
}
} else {
if err := diskUtil.Convert(ctx, diskImageFormat, imagePath, diskPath, &diskSizeInBytes, false); err != nil {
return fmt.Errorf("failed to convert %q to %q: %w", imagePath, diskPath, err)
}
os.Remove(imagePath)
}
}
The old code always called diskUtil.Convert, which both converted and resized to diskSizeInBytes. The new rename path at line 105 skips the resize. This works today because prepareDisk at pkg/instance/start.go:135 handles resizing after CreateDisk. But EnsureDisk was previously self-contained; it now silently depends on its caller to resize, an implicit contract change.
Fix: Either add the resize call or document the dependency:
+ // Resize handled by prepareDisk() after CreateDisk().
if format == string(diskImageFormat) {
if err = os.Rename(imagePath, diskPath); err != nil {
return err
}
Suggestions ¶
pkg/downloader/downloader.go, suggestion, regression
return nil, err
}
} else {
origStat, _ := os.Stat(shadData)
if origStat.ModTime().After(rawStat.ModTime()) {
needConvert = true
}
}
If os.Stat fails (e.g., file deleted between the existence check at line 278 and line 299), origStat is nil and origStat.ModTime() panics. Within the dir lock this is extremely unlikely, but the fix is trivial.
Fix:
-origStat, _ := os.Stat(shadData)
-if origStat.ModTime().After(rawStat.ModTime()) {
+origStat, err := os.Stat(shadData)
+if err != nil {
+ return nil, err
+}
+if origStat.ModTime().After(rawStat.ModTime()) {
pkg/downloader/downloader.go, suggestion, regression
// Ensure the image is sparse to save cache space.
rawTmpF, err := os.OpenFile(rawPathTmp, os.O_RDWR, 0o644)
if err != nil {
return "", "", fmt.Errorf("failed to open raw tmp file %q: %w", rawPathTmp, err)
}
fi, err := rawTmpF.Stat()
if err != nil {
_ = rawTmpF.Close()
return "", "", fmt.Errorf("failed to stat raw tmp file %q: %w", rawPathTmp, err)
}
if err := diskUtil.MakeSparse(ctx, rawTmpF, fi.Size()); err != nil {
logrus.WithError(err).Warnf("Failed to make %q sparse (non-fatal)", rawPathTmp)
}
The native conversion path (nativeimgutil.convertTo) already creates a sparse file via makeSparse before copying data. The explicit MakeSparse call is redundant for the native path. It provides a safety net for the qemu-img fallback, but deserves a comment explaining why.
Fix: Add a comment explaining the redundancy rationale, or remove and rely on qemu-img convert producing sparse output by default.
pkg/downloader/downloader.go, suggestion, regression
// WithDecompress decompress the download from the cache.
func WithDecompress(decompress bool) Opt {
return func(o *options) error {
o.decompress = decompress
return nil
}
}
func WithImageFormats(supportedImageFormats []string) Opt {
return func(o *options) error {
o.supportedImageFormats = supportedImageFormats
return nil
}
}
All other With* option functions have docstrings (lines 81, 93, 102, 110, 125). This one lacks one. Add a Godoc comment explaining the option's purpose.
pkg/driver/qemu/qemu_driver.go, suggestion, enhancement
info.Features = driver.DriverFeatures{
DynamicSSHAddress: false,
SkipSocketForwarding: false,
CanRunGUI: false,
SupportedImageFormats: []string{string(qcow2.Type), string(rawImage.Type), string(vmdk.Type), string(vhdx.Type)}, // Not a comprehensive list of supported formats
}
return info
The QEMU format list omits natively supported formats like vdi, vpc, and parallels. A user downloading a vdi image would trigger an unnecessary raw conversion. The existing comment acknowledges the limitation, so this is low priority.
Design Observations ¶
Concerns ¶
- (in-scope) The conversion code overwrites
o.expectedDigestandshadDigestingetCached, mixing two concerns: "what format should the cached file be?" and "is the cached file fresh and authentic?" A small refactoring — validate the original cached file first, then convert — would eliminate findings 1, 2, and 3 as a class. Claude Codex - (in-scope) The
Cachedfunction (downloader.go:345-399) returnsCachePath: shadDatawithout awareness of image conversions. Any future caller usingCachedfor images would receive the original qcow2 path instead of the converted raw. Currently safe becauseCachedis used only for nerdctl archives. A comment noting this limitation would help. Claude - (future) Cache conversion always targets
rawregardless of driver preference. For VZ withdiskImageFormat: asif, the flow is: download qcow2 → convert to raw in cache → copy to instance dir → convert raw to asif inEnsureDisk. This double conversion wastes I/O. A more general cache keyed by target format (e.g.,imgconv/raw,imgconv/asif) would avoid this, but is acceptable for a first iteration since asif is new. Claude
Strengths ¶
- Drivers declare supported formats via
SupportedImageFormats, the downloader decides when to convert, andEnsureDiskhandles the final step. This separation respects the existing layering. Claude Gemini - Preserving the original cached artifact alongside
imgconv/rawsolves the qemu-vs-vz determinism problem raised in the PR discussion. Codex - The write-tmp-then-rename pattern in
ensureRawInCacheprevents serving partial conversions to concurrent readers. Claude
Testing Assessment ¶
The new tests cover the key scenarios: fresh download with conversion (with and without digest), cache hit returning pre-converted raw, and pre-populated cache requiring on-demand conversion. The tests are meaningful and do not test mocked behavior.
Untested scenarios ranked by risk:
- Cache corruption/crash recovery: raw file exists but
raw.digestis missing — the integrity gap from Finding 2 - Changed expected digest for cached URL: updating
expectedDigestin the template while the URL remains unchanged — the validation bypass from Finding 1 - Digest-less image freshness: verifying that
matchLastModifiedstill triggers re-download after conversion — the staleness issue from Finding 3 - Driver that supports the original format: e.g., QEMU requesting a qcow2 — should return the original without conversion
- EnsureDisk rename path with resize: verifying disk reaches configured size when the image format already matches
Documentation Assessment ¶
The internals.md change correctly documents the new imgconv/ cache subdirectory. The placement and formatting match the surrounding documentation style. No additional documentation gaps.
Commit Structure ¶
Single commit bundles the entire feature: driver interface changes, downloader conversion logic, EnsureDisk optimization, tests, and documentation. Acceptable for a feature of this size. The commit message has a typo ("orginal" should be "original").
Acknowledged Limitations ¶
pkg/driver/qemu/qemu_driver.go:697—// Not a comprehensive list of supported formats
The QEMU format list intentionally omits uncommon formats. Now that format support drives conversion behavior, under-reporting can cause unnecessary conversions, but the comment acknowledges the tradeoff.
Unresolved Feedback ¶
- Description parsing — @AkihiroSuda: "description is not expected to be parsed" and suggested
if len(supportedImageFormats) > 0 {- https://github.com/lima-vm/lima/pull/4637#discussion_r2922336259
- https://github.com/lima-vm/lima/pull/4637#discussion_r2922337464
Actionable. Same issue as Important Finding #4.
- Pass non-nil only for images — @AkihiroSuda: "Just make sure to pass non-nil
supportedImageFormatsonly for images"Actionable. Closely related to #1 — fixing Finding #4 by removing the description check and guarding on
len(supportedImageFormats) > 0naturally addresses this. - Slice ordering — @AkihiroSuda: "Does the order of the slice matter?"
Low priority. The slice is only used with
slices.Contains; ordering has no effect on current behavior. Worth a brief reply in the PR but requires no code change. - VZ format preference — @AkihiroSuda: "Maybe put asif before raw, assuming that asif is more preferable"
Low priority. Same reason as #3 —
slices.Containsignores order. Good hygiene for future-proofing if the slice ever drives format selection, but no current impact.
Agent Performance Retro ¶
Claude Opus 4.6 ¶
Claude Opus 4.6 produced the most thorough review. It identified the description-string parsing issue (Finding 4), the digest integrity gap (Finding 2), and the EnsureDisk resize concern (Finding 5). It also contributed the most suggestions (4). It traced into callers (prepareDisk) to correctly assess that the resize skip is mitigated, avoiding a false critical. It missed the digest-overwrite issue (Finding 1) and the Last-Modified staleness regression (Finding 3).
Duration: 10 min 17 s. Coverage: all 14 files reviewed. No false positives.
Codex GPT 5.4 ¶
Codex GPT 5.4 delivered a focused, high-signal review. Its unique contribution was Finding 3 (digest-less images become permanently stale) — a subtle interaction between the conversion logic and the existing freshness check that neither Claude nor Gemini caught. It also independently identified the digest integrity gap (Finding 2) and the description-parsing issue (as a suggestion). It was the most concise of the three agents.
Duration: 5 min 46 s. Coverage: all 14 files reviewed. No false positives.
Gemini 3.1 Pro ¶
Gemini 3.1 Pro made the strongest claim — that the digest overwrite (Finding 1) and the EnsureDisk resize skip are critical. The resize claim was overrated: Gemini stated "the VM will boot with a disk size equal to the base image's original size," which is incorrect because prepareDisk handles resizing. The digest overwrite claim was valid but better rated as important (requires same-URL image update with changed digest). Gemini also flagged compressed image format detection as important, but no Lima templates use compressed disk images, making it a theoretical gap. It missed the Last-Modified staleness issue entirely.
Duration: 7 min 33 s. Coverage: marked 10 of 14 files as "Trivial" without reviewing their diffs in detail. No false positives in the findings raised, but severity inflation on two findings.
Summary ¶
| Metric | Claude Opus 4.6 | Codex GPT 5.4 | Gemini 3.1 Pro |
|---|---|---|---|
| Duration | 10:17 | 5:46 | 7:33 |
| Critical | 0 | 0 | 2 (downgraded) |
| Important | 3 | 2 | 2 |
| Suggestion | 4 | 1 | 1 |
| Design observations | 3 | 2 | 1 |
| False positives | 0 | 0 | 0 |
| Unique insights | 0 | 1 | 1 |
| Files reviewed | 14 | 14 | 14 |
| Coverage misses | 0 | 0 | 0 |
Codex provided the best signal-to-noise ratio: the fastest completion time with the most impactful unique finding (the Last-Modified staleness regression). Claude provided the deepest review overall, correctly assessing severity and tracing mitigating code paths. Gemini contributed the only finding about digest validation bypass but overrated the severity of two findings.
The multi-agent approach proved its value: the three most important findings (digest overwrite, integrity gap, staleness regression) were each strongest in a different agent's review.
Review Process Notes ¶
No suggestions for skill improvements or repo context updates from this review.