Latent Spectral Operators

What We Learn

From coordinate-bound convolutions to spectral-function operators.

Existing deep spatial–spectral fusion methods learn the fusion mapping directly in the coordinate domain with convolution and attention layers. As a result, they are tied to a single spatial resolution and offer limited control over the frequency content of reconstructions, which often leads to severe spectral distortion. LSO instead treats fusion as an operator between spectral functions, learned in a compact latent space.

Coordinate-tied mapping

Convolution and attention bake spatial coordinates into the operator, which hurts transfer across magnification factors (×4, ×8, ×16, and ×32).

Uncontrolled frequency leakage

Without an explicit frequency parameterization, networks cannot expose a clean capacity–stability trade-off for multi-frequency content.

Spectral distortion

Band-wise reconstruction error rises in narrow spectral regions, undermining downstream HSI analysis tasks.

Method

Latent Spectral Operators: project, solve, project back.

LSO compresses high-dimensional spatial–spectral observations into a compact latent representation through a cross-attention projection, where learned latent tokens act as spectral prompts. A hierarchical, patch-based architecture aggregates rich multi-scale cues before a structured operator solves the latent fusion mapping.

Overall design of Latent Spectral Operator.

Cross-Attention Projection: spectral prompts compress observations.

Hierarchical Patch-Based Architecture: multi-scale cues at varying resolutions.

Trigonometric Basis Solver: controllable multi-frequency operator.

Hierarchical Projection Network

CoordToLatent → Solve → LatentToCoord.

The spatial domain and the latent spectral domain are bridged by two complementary projections. CoordToLatent uses cross-attention against learned spectral prompts to compress high-resolution observations into a fixed-size latent. The Solver applies a trigonometric basis expansion to the latent code, which naturally supports multi-frequency modeling with a capacity–stability trade-off governed by the number of basis functions. Finally, LatentToCoord projects the solved latent back to the coordinate domain to produce the fused HSI.

Overall design of the Hierarchical Projection Network, detailing the CoordToLatent, Solve, and LatentToCoord blocks.
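The CoordToLatent step described above can be illustrated with a minimal NumPy sketch of cross-attention in which learned latent tokens act as queries. This is not the authors' implementation: the function name `coord_to_latent`, the parameters `w_k` and `w_v`, the single attention head, and all sizes are assumptions chosen for illustration. The point it shows is that the latent has a fixed size regardless of how many coordinate tokens are fed in.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def coord_to_latent(tokens, prompts, w_k, w_v):
    """Compress N coordinate tokens into M latent tokens via cross-attention.

    tokens  : (N, d) features at N spatial positions (N varies with resolution)
    prompts : (M, d) learned latent tokens used as queries ("spectral prompts")
    w_k, w_v: (d, d) key and value projections (hypothetical parameter names)
    """
    d = prompts.shape[-1]
    keys, values = tokens @ w_k, tokens @ w_v
    attn = softmax(prompts @ keys.T / np.sqrt(d), axis=-1)  # (M, N), rows sum to 1
    return attn @ values                                     # (M, d)

rng = np.random.default_rng(0)
d, m = 16, 8                                  # illustrative sizes only
prompts = rng.standard_normal((m, d))
w_k = rng.standard_normal((d, d)) / np.sqrt(d)
w_v = rng.standard_normal((d, d)) / np.sqrt(d)

for n in (64, 256):                           # e.g. an 8x8 and a 16x16 grid of features
    z = coord_to_latent(rng.standard_normal((n, d)), prompts, w_k, w_v)
    print(n, "->", z.shape)                   # the latent shape stays (m, d)
```

Because the queries come from the prompts rather than from the input grid, the number of input positions only affects the width of the attention matrix, not the shape of the output.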


      Solve(z) = z + γ + Σ_q α_q·sin(q·z) + Σ_q β_q·cos(q·z)

The number of basis terms, that is, how many frequencies q the sums run over, controls capacity: more terms increase expressivity, while fewer terms improve stability. The paper supports this trade-off with a theoretical analysis.
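The solver formula can be sketched in NumPy as follows. Several details are assumptions made for illustration and are not stated on this page: the frequencies are taken to be q = 1, ..., Q, and `alpha`, `beta`, and `gamma` are treated as scalar coefficients applied element-wise (the paper's operator may use per-channel or matrix-valued coefficients).

```python
import numpy as np

def solve(z, alpha, beta, gamma):
    """Element-wise z + gamma + sum_q alpha_q*sin(q*z) + sum_q beta_q*cos(q*z).

    Assumes frequencies q = 1..Q with Q = len(alpha) = len(beta).
    """
    z = np.asarray(z, dtype=float)
    # Shape (Q, 1, ..., 1) so that each frequency broadcasts over all of z.
    q = np.arange(1, len(alpha) + 1).reshape(-1, *([1] * z.ndim))
    a = np.asarray(alpha, dtype=float).reshape(q.shape)
    b = np.asarray(beta, dtype=float).reshape(q.shape)
    sin_part = (a * np.sin(q * z)).sum(axis=0)
    cos_part = (b * np.cos(q * z)).sum(axis=0)
    return z + gamma + sin_part + cos_part

z = np.linspace(-1.0, 1.0, 5)

# With all coefficients at zero only the residual path z + gamma remains.
residual_only = solve(z, np.zeros(4), np.zeros(4), gamma=0.0)
print(np.allclose(residual_only, z))

# More basis terms allow higher-frequency structure in the output.
rng = np.random.default_rng(0)
for n_terms in (1, 4, 16):
    out = solve(z, 0.1 * rng.standard_normal(n_terms),
                0.1 * rng.standard_normal(n_terms), gamma=0.0)
    print(n_terms, np.round(out - z, 3))
```

The residual term z keeps the map close to the identity when the coefficients are small, which is one intuitive reading of why fewer terms favor stability.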

Results

State-of-the-art on CAVE and Harvard, across scales.

LSO is evaluated against eight baselines, including bicubic interpolation, on the CAVE and Harvard benchmarks at ×4 and ×8 magnification. The tables report mean values from the paper: higher is better for PSNR, and lower is better for SAM and ERGAS.
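For readers unfamiliar with the three metrics, the sketch below gives their common textbook definitions in NumPy. The exact conventions behind the reported numbers (data range, per-band versus global averaging, degrees versus radians) are not given on this page, so the choices here are assumptions and the function signatures are illustrative.

```python
import numpy as np

def psnr(ref, est, data_range=1.0):
    """Mean per-band PSNR in dB for (H, W, C) cubes; assumes est != ref."""
    mse = ((ref - est) ** 2).mean(axis=(0, 1))            # one MSE per band
    return float(np.mean(10.0 * np.log10(data_range ** 2 / mse)))

def sam(ref, est, eps=1e-12):
    """Mean spectral angle in degrees between per-pixel spectra."""
    dot = (ref * est).sum(axis=-1)
    norm = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)          # guard arccos domain
    return float(np.degrees(np.arccos(cos)).mean())

def ergas(ref, est, scale):
    """ERGAS: 100/scale times the root mean squared relative per-band RMSE."""
    rmse = np.sqrt(((ref - est) ** 2).mean(axis=(0, 1)))
    band_mean = ref.mean(axis=(0, 1))
    return float(100.0 / scale * np.sqrt(np.mean((rmse / band_mean) ** 2)))

# Toy cube: a constant offset of 0.1 on a reference of 0.5 in every band.
ref = np.full((4, 4, 3), 0.5)
est = ref + 0.1
print(psnr(ref, est), sam(ref, est), ergas(ref, est, scale=4))
```

PSNR measures per-band fidelity, SAM measures the angle between reference and estimated spectra at each pixel and is therefore insensitive to uniform scaling, and ERGAS normalizes band-wise error by band mean and by the magnification factor.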

Best (LSO): PSNR 50.728, SAM 2.381, ERGAS 1.170

Method	Params (M)	FLOPs (G)	PSNR ↑	SAM ↓	ERGAS ↓
Bicubic	-	-	31.175	4.683	9.629
DCT	8.15	2457.43	50.196	2.486	1.198
LFormer	2.28	306.94	49.847	2.584	1.313
MIMO	4.98	49.08	50.101	2.767	1.237
FeINFN	3.17	382.72	50.358	2.537	1.210
KNLConv	1.73	114.04	48.570	2.790	1.557
SpecSolver	3.10	364.75	50.568	2.489	1.212
Otias	2.99	278.35	50.213	2.538	1.211
Ours (LSO)	1.94	222.85	50.728	2.381	1.170

Best (LSO): PSNR 48.118, SAM 2.961, ERGAS 1.675

Method	Params (M)	FLOPs (G)	PSNR ↑	SAM ↓	ERGAS ↓
Bicubic	-	-	27.383	6.449	14.349
DCT	8.15	2457.43	47.441	3.164	1.791
LFormer	2.28	306.94	47.302	3.474	1.813
MIMO	4.98	49.08	47.442	3.985	1.758
FeINFN	3.17	382.72	47.476	3.565	1.755
KNLConv	1.73	114.04	46.150	3.639	1.989
SpecSolver	3.10	364.75	47.831	3.527	1.805
Otias	2.99	278.35	47.769	3.201	1.757
Ours (LSO)	1.94	222.85	48.118	2.961	1.675

Best (LSO): PSNR 49.182, SAM 2.144, ERGAS 1.658

Method	Params (M)	FLOPs (G)	PSNR ↑	SAM ↓	ERGAS ↓
Bicubic	-	-	37.173	2.645	5.445
DCT	8.15	2457.43	49.068	2.163	1.756
LFormer	2.28	306.94	48.938	2.182	1.706
MIMO	4.98	49.08	49.019	2.179	1.688
FeINFN	3.17	382.72	49.001	2.155	1.683
KNLConv	1.73	114.04	48.536	2.203	1.940
SpecSolver	3.10	364.75	48.919	2.169	1.771
Otias	2.99	278.35	49.046	2.151	1.728
Ours (LSO)	1.94	222.85	49.182	2.144	1.658

Best (LSO): PSNR 47.957, SAM 2.438, ERGAS 1.879

Method	Params (M)	FLOPs (G)	PSNR ↑	SAM ↓	ERGAS ↓
Bicubic	-	-	34.051	3.186	7.587
DCT	8.15	2457.43	47.836	2.578	2.088
LFormer	2.28	306.94	46.651	2.742	2.305
MIMO	4.98	49.08	47.839	2.475	1.958
FeINFN	3.17	382.72	47.833	2.466	1.933
KNLConv	1.73	114.04	46.706	2.588	2.487
SpecSolver	3.10	364.75	47.652	2.533	2.052
Otias	2.99	278.35	47.848	2.459	1.907
Ours (LSO)	1.94	222.85	47.957	2.438	1.879

CAVE qualitative results at ×4 and ×8 with band-wise error maps. LSO yields the cleanest residuals.

Harvard qualitative results with error maps.

Per-band spectral error on CAVE (left) and Harvard (right). LSO suppresses error across the full spectrum.

Efficiency

Compact operators without giving up accuracy.

At 1.94 M parameters, LSO is smaller than every learned baseline except KNLConv (1.73 M), and at 222.85 G FLOPs only MIMO and KNLConv are cheaper, yet it achieves the best value on all 12 reported metrics. The latent operator design keeps the parameter budget below 2 M without sacrificing reconstruction quality.

Parameters: 1.94 M (compact latent design)

FLOPs: 222.85 G (default configuration in the paper)

Best metrics: 12 / 12 (across CAVE and Harvard at ×4 and ×8)
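To make the compactness comparison concrete, the short script below ranks the learned methods by parameter count and by FLOPs using the Params and FLOPs columns of the results tables. Only the helper name `rank` is introduced here; the numbers are copied from the tables.

```python
# (parameters in M, FLOPs in G), copied from the results tables.
costs = {
    "DCT": (8.15, 2457.43),
    "LFormer": (2.28, 306.94),
    "MIMO": (4.98, 49.08),
    "FeINFN": (3.17, 382.72),
    "KNLConv": (1.73, 114.04),
    "SpecSolver": (3.10, 364.75),
    "Otias": (2.99, 278.35),
    "LSO": (1.94, 222.85),
}

def rank(column, name="LSO"):
    """1-based rank of `name` when methods are sorted ascending by `column`."""
    ordered = sorted(costs, key=lambda method: costs[method][column])
    return ordered.index(name) + 1

print("parameter rank:", rank(0), "of", len(costs))  # 2: only KNLConv is smaller
print("FLOPs rank:", rank(1), "of", len(costs))      # 3: MIMO and KNLConv are cheaper
```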

Generalization

Trained at ×4, stable at ×8 / ×16 / ×32.

Decoupling features from the coordinate space lets the operator transfer across spatial resolutions. Trained on CAVE at ×4, LSO maintains stable PSNR when tested at ×8, ×16, and ×32 — substantially reducing the resolution-induced degradation observed in coordinate-domain baselines.

Cross-scale generalization: PSNR as a function of magnification factor. LSO degrades more gracefully than coordinate-domain baselines.
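One way a fixed-size latent can serve several output resolutions is to decode it with coordinate queries, so that the output grid is chosen at inference time. The NumPy sketch below illustrates that idea only; it is not the authors' LatentToCoord block. The coordinate embedding `grid_features`, the projection `w_q`, and all sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grid_features(h, w, w_q):
    """Embed normalized (y, x) coordinates of an h-by-w grid as query vectors."""
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=-1)   # (h*w, 2)
    return coords @ w_q                                     # (h*w, d)

def latent_to_coord(latent, queries):
    """Each output position attends over the M latent tokens."""
    d = latent.shape[-1]
    attn = softmax(queries @ latent.T / np.sqrt(d), axis=-1)  # (h*w, M)
    return attn @ latent                                       # (h*w, d)

rng = np.random.default_rng(0)
m, d = 8, 16                                 # illustrative sizes only
latent = rng.standard_normal((m, d))         # stands in for a solved latent
w_q = rng.standard_normal((2, d))            # hypothetical coordinate embedding

for side in (16, 32, 64):                    # same latent, three output grids
    out = latent_to_coord(latent, grid_features(side, side, w_q))
    print(side, out.shape)                   # (side * side, d)
```

Nothing in the decoder weights depends on the grid size, which is the property the cross-scale experiment relies on; how well quality holds up at unseen scales is an empirical question answered by the figure, not by this sketch.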

Real Data

Blind real HSI recovery.

On real hyperspectral acquisitions, LSO produces the sharpest reconstructions with the fewest spectral artifacts, confirming that the latent operator generalizes beyond simulated degradations.

Blind real HSI recovery: LSO produces the sharpest reconstructions.

Additional real-data comparisons across diverse scenes.

Citation

Reference

Wei Li, Jieyuan Pei, Junnan Xu, Xuanfeng Ding, Junwei Zhu, Wanjun Chen, and Jianwei Zheng. Zhejiang University of Technology. Correspondence: Jianwei Zheng (zjw@zjut.edu.cn).

@inproceedings{lso2026,
  title     = {Solving Spatial-Spectral Fusion with Latent Spectral Operators},
  author    = {Li, Wei and Pei, Jieyuan and Xu, Junnan and Ding, Xuanfeng and Zhu, Junwei and Chen, Wanjun and Zheng, Jianwei},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  series    = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  year      = {2026}
}