ICML 2026  /  Spatial–Spectral Fusion

Solving Spatial–Spectral Fusion with Latent Spectral Operators

Wei Li, Jieyuan Pei, Junnan Xu, Xuanfeng Ding, Junwei Zhu, Wanjun Chen, Jianwei Zheng*

College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou  ·  Zhejiang Key Laboratory of Visual Information Intelligent Processing
*Corresponding author: zjw@zjut.edu.cn

LSO learns spatial–spectral fusion as a mapping between spectral functions in a compact latent space: observations are projected into that space via cross-attention with spectral prompts, and the mapping is solved there by a trigonometric basis solver.

2 benchmarks | SOTA on CAVE & Harvard | 1.94 M params | Cross-scale ×4 → ×32 transfer
What We Learn

From coordinate-bound convolutions to spectral-function operators.

Existing deep spatial–spectral fusion methods learn the fusion mapping directly in the coordinate domain with convolution and attention. They are tied to a single spatial resolution and offer limited control over the frequency content of reconstructions, which often leads to severe spectral distortion. LSO instead treats fusion as an operator between spectral functions, learned in a compact latent space.

01

Coordinate-tied mapping

Convolution and attention bake spatial coordinates into the operator, hurting scale transfer across ×4, ×8, ×16, and ×32.

02

Uncontrolled frequency leakage

Without an explicit frequency parameterization, networks cannot expose a clean capacity–stability trade-off for multi-frequency content.

03

Spectral distortion

Band-wise reconstruction error rises in narrow spectral regions, undermining downstream HSI analysis tasks.

Method

Latent Spectral Operators: project, solve, project back.

LSO compresses high-dimensional spatial–spectral observations into a compact latent representation through a cross-attention projection, where learned latent tokens act as spectral prompts. A hierarchical, patch-based architecture aggregates rich multi-scale cues before a structured operator solves the latent fusion mapping.
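The projection step described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the paper's implementation: the function name `coord_to_latent`, the weight matrices, and all sizes (`C`, `D`, `M`) are assumptions chosen for the example, and the weights are random rather than learned. The point it shows is that when the prompts act as queries, the latent has a fixed size regardless of how many observation tokens come in.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def coord_to_latent(obs, prompts, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical sketch).

    obs:     (N, C) observation tokens, N depends on the spatial resolution
    prompts: (M, D) latent tokens acting as spectral prompts (queries)
    returns: (M, D) latent code whose size does not depend on N
    """
    q = prompts @ w_q                                   # (M, D)
    k = obs @ w_k                                       # (N, D)
    v = obs @ w_v                                       # (N, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))      # (M, N), rows sum to 1
    return attn @ v                                     # (M, D)

rng = np.random.default_rng(0)
C, D, M = 31, 16, 8            # assumed: 31 bands, latent width 16, 8 prompts
prompts = rng.standard_normal((M, D))
w_q = rng.standard_normal((D, D)) / np.sqrt(D)
w_k = rng.standard_normal((C, D)) / np.sqrt(C)
w_v = rng.standard_normal((C, D)) / np.sqrt(C)

# Two inputs at different spatial resolutions map to the same latent shape.
latent_small = coord_to_latent(rng.standard_normal((16 * 16, C)), prompts, w_q, w_k, w_v)
latent_large = coord_to_latent(rng.standard_normal((64 * 64, C)), prompts, w_q, w_k, w_v)
print(latent_small.shape, latent_large.shape)
```

Both calls return an `(M, D)` array, which is what lets a downstream solver operate independently of the input resolution.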

Overall design of the Latent Spectral Operator: a cross-attention projection with spectral prompts feeds a hierarchical patch-based backbone and a trigonometric basis solver.
Cross-Attention Projection: spectral prompts compress observations
Hierarchical Patch-Based: multi-scale cues at varying resolutions
Trigonometric Basis Solver: controllable multi-frequency operator
Hierarchical Projection Network

CoordToLatent → Solve → LatentToCoord.

The spatial domain and the latent spectral domain are bridged by two complementary projections. CoordToLatent uses cross-attention against learned spectral prompts to compress high-resolution observations into a fixed-size latent. The Solver applies a trigonometric basis expansion to the latent code, which naturally supports multi-frequency modeling with a capacity–stability trade-off governed by the number of basis functions. Finally, LatentToCoord projects the solved latent back to the coordinate domain to produce the fused HSI.

Overall design of the Hierarchical Projection Network, detailing the CoordToLatent, Solve, and LatentToCoord blocks.
Solve(z) = z + γ + Σ_q α_q·sin(q·z) + Σ_q β_q·cos(q·z)

The number of basis terms (the range of the summation index q) controls capacity: more terms increase expressivity, fewer terms improve stability. The paper's theoretical analysis supports this trade-off.
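The solver formula can be evaluated directly, as in the sketch below. It assumes that q runs over 1..Q, that each α_q and β_q is a scalar applied elementwise, and that γ is a scalar bias; the paper may parameterize these differently (for example as learned vectors or matrices), so treat the shapes here as illustrative only.

```python
import numpy as np

def solve(z, alpha, beta, gamma):
    """Trigonometric basis expansion applied elementwise to a latent code.

    z:           latent array of any shape
    alpha, beta: (Q,) coefficients for frequencies q = 1..Q (assumed scalars)
    gamma:       scalar bias (assumed)
    """
    n_terms = alpha.shape[0]
    # Shape (Q, 1, ..., 1) so that q broadcasts against z.
    q = np.arange(1, n_terms + 1).reshape((-1,) + (1,) * z.ndim)
    a = alpha.reshape(q.shape)
    b = beta.reshape(q.shape)
    sin_part = (a * np.sin(q * z)).sum(axis=0)
    cos_part = (b * np.cos(q * z)).sum(axis=0)
    return z + gamma + sin_part + cos_part

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
Q = 4                                   # number of basis terms (assumed value)
alpha = 0.1 * rng.standard_normal(Q)
beta = 0.1 * rng.standard_normal(Q)

out = solve(z, alpha, beta, gamma=0.0)
# With all coefficients at zero the solver reduces to the identity (residual path).
identity = solve(z, np.zeros(Q), np.zeros(Q), gamma=0.0)
print(out.shape, np.allclose(identity, z))  # (8, 16) True
```

Raising `Q` adds higher-frequency sine and cosine terms, which is the capacity knob the text refers to.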

Results

State-of-the-art on CAVE and Harvard, across scales.

LSO is evaluated against eight competitive baselines on the CAVE and Harvard benchmarks at ×4 and ×8 magnification. Mean values from the paper are shown. Higher is better for PSNR; lower is better for SAM and ERGAS.
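For readers unfamiliar with the three metrics, the sketch below gives their commonly used definitions for an image cube of shape (H, W, bands). Conventions differ between papers (for example per-band versus global PSNR, or the assumed data range), so this is one standard formulation and not necessarily the exact evaluation code behind the reported numbers. The function names and the synthetic test data are assumptions for illustration.

```python
import numpy as np

def psnr(ref, est, data_range=1.0):
    """Mean per-band PSNR in dB."""
    mse = np.mean((ref - est) ** 2, axis=(0, 1))          # one MSE per band
    return float(np.mean(10.0 * np.log10(data_range ** 2 / mse)))

def sam_degrees(ref, est, eps=1e-8):
    """Spectral Angle Mapper: mean angle between per-pixel spectra, in degrees."""
    dot = np.sum(ref * est, axis=-1)
    norm = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))

def ergas(ref, est, scale):
    """ERGAS with `scale` the magnification factor (e.g. 4 for x4)."""
    rmse = np.sqrt(np.mean((ref - est) ** 2, axis=(0, 1)))  # per band
    mean = np.mean(ref, axis=(0, 1))                        # per band
    return float(100.0 / scale * np.sqrt(np.mean((rmse / mean) ** 2)))

# Synthetic check: a less noisy estimate should score better on all three.
rng = np.random.default_rng(0)
ref = rng.uniform(0.1, 1.0, (32, 32, 31))
noise = rng.standard_normal(ref.shape)
good = ref + 0.01 * noise
bad = ref + 0.05 * noise

for name, est in (("good", good), ("bad", bad)):
    print(name, psnr(ref, est), sam_degrees(ref, est), ergas(ref, est, scale=4))
```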

PSNR 50.728 (best) | SAM 2.381 (best) | ERGAS 1.170 (best)
Method     | Params (M) | FLOPs (G) | PSNR ↑ | SAM ↓ | ERGAS ↓
Bicubic    | -          | -         | 31.175 | 4.683 | 9.629
DCT        | 8.15       | 2457.43   | 50.196 | 2.486 | 1.198
LFormer    | 2.28       | 306.94    | 49.847 | 2.584 | 1.313
MIMO       | 4.98       | 49.08     | 50.101 | 2.767 | 1.237
FeINFN     | 3.17       | 382.72    | 50.358 | 2.537 | 1.210
KNLConv    | 1.73       | 114.04    | 48.570 | 2.790 | 1.557
SpecSolver | 3.10       | 364.75    | 50.568 | 2.489 | 1.212
Otias      | 2.99       | 278.35    | 50.213 | 2.538 | 1.211
Ours (LSO) | 1.94       | 222.85    | 50.728 | 2.381 | 1.170
CAVE qualitative results at ×4 and ×8 with band-wise error maps. LSO yields the cleanest residuals.
Harvard qualitative results with error maps.
Per-band spectral error on CAVE (left) and Harvard (right). LSO suppresses error across the full spectrum.
Efficiency

Compact operators without giving up accuracy.

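At 1.94 M parameters and 222.85 G FLOPs, LSO is smaller than every learned baseline in the results table except KNLConv (1.73 M) and cheaper in FLOPs than all but MIMO and KNLConv, while reporting the best value on every metric. The latent operator design keeps the parameter budget below 2 M without sacrificing reconstruction quality.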
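The relative cost of each baseline can be read off the results table with a little arithmetic. The snippet below copies the parameter and FLOP columns from that table and expresses each method as a multiple of LSO; the dictionary layout and variable names are just for this example.

```python
# (params in M, FLOPs in G), copied from the results table above.
costs = {
    "DCT": (8.15, 2457.43),
    "LFormer": (2.28, 306.94),
    "MIMO": (4.98, 49.08),
    "FeINFN": (3.17, 382.72),
    "KNLConv": (1.73, 114.04),
    "SpecSolver": (3.10, 364.75),
    "Otias": (2.99, 278.35),
    "LSO": (1.94, 222.85),
}

lso_params, lso_flops = costs["LSO"]

# Express every method's cost as a multiple of LSO's.
for name, (params, flops) in costs.items():
    print(f"{name:<10} params x{params / lso_params:.2f}  FLOPs x{flops / lso_flops:.2f}")

# Baselines that are cheaper than LSO on each axis.
fewer_params = [n for n, (p, _) in costs.items() if p < lso_params]
fewer_flops = [n for n, (_, f) in costs.items() if f < lso_flops]
print("fewer params:", fewer_params)
print("fewer FLOPs: ", fewer_flops)
```

Only KNLConv has fewer parameters, and only MIMO and KNLConv need fewer FLOPs; both trail LSO on all three accuracy metrics in the table.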

Parameters: 1.94 M (compact latent design)
FLOPs: 222.85 G (default configuration in the paper)
Best metrics: 12 / 12 (across CAVE & Harvard at ×4 and ×8)
Generalization

Trained at ×4, stable at ×8 / ×16 / ×32.

Decoupling features from the coordinate space lets the operator transfer across spatial resolutions. Trained on CAVE at ×4, LSO maintains stable PSNR when tested at ×8, ×16, and ×32 — substantially reducing the resolution-induced degradation observed in coordinate-domain baselines.

Cross-scale generalization: PSNR as a function of magnification factor. LSO degrades more gracefully than coordinate-domain baselines.
Real Data

Blind real HSI recovery.

On real hyperspectral acquisitions, LSO produces the sharpest reconstructions with the fewest spectral artifacts among the compared methods, indicating that the latent operator generalizes beyond simulated degradations.

Blind real HSI recovery comparisons. LSO produces the sharpest reconstructions.
Additional real-data comparisons across diverse scenes.
Citation

Reference

Wei Li, Jieyuan Pei, Junnan Xu, Xuanfeng Ding, Junwei Zhu, Wanjun Chen, and Jianwei Zheng. Zhejiang University of Technology. Correspondence: Jianwei Zheng (zjw@zjut.edu.cn).

@inproceedings{lso2026,
  title     = {Solving Spatial-Spectral Fusion with Latent Spectral Operators},
  author    = {Li, Wei and Pei, Jieyuan and Xu, Junnan and Ding, Xuanfeng and Zhu, Junwei and Chen, Wanjun and Zheng, Jianwei},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  year      = {2026}
}