LSFFNet: Large-kernel Small-span Feature Fusion Network

Qikai Zhou

doi:10.63313/AJET.9053

Authors

Qikai Zhou, College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China

Keywords:

Optical Remote Sensing Images, Super-Resolution, Lightweight Model, Feature Fusion

Abstract

High-resolution optical remote sensing images are crucial for interpreting ground objects and supporting precise earth observation. However, mainstream super-resolution methods adapt poorly to the inherent characteristics of such images, including large scale variation, weak texture details and complex imaging degradation. They commonly suffer from blurred high-frequency details and high computational complexity, which makes lightweight deployment difficult in practice. To address these limitations, this paper proposes a lightweight Large-kernel Small-span Feature Fusion Network (LSFFNet) for super-resolution reconstruction of optical remote sensing images, targeting practical scenarios with limited computing resources. The model includes a Large-kernel Small-span Feature Extraction Block (LSBlock), which uses a small number of large-kernel depthwise separable convolutions to capture multi-scale contextual information with very low parameter overhead. It also includes an Attention Multi-level Feature Fusion Block (AFFBlock), which integrates channel and spatial attention to fuse multi-layer features adaptively and selectively, mitigating the loss of feature information.
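The low parameter overhead of depthwise separable convolutions with large kernels can be checked with simple arithmetic. The sketch below is illustrative only: the function names, the channel width of 64 and the kernel sizes are assumptions chosen for the example, not values taken from LSFFNet.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def dw_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k convolution (one filter per
    input channel) followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    channels = 64  # hypothetical channel width, not from the paper
    for k in (3, 7, 13):  # hypothetical kernel sizes
        standard = conv_params(channels, channels, k)
        separable = dw_separable_params(channels, channels, k)
        print(f"k={k:2d}  standard={standard:7d}  separable={separable:6d}  "
              f"ratio={standard / separable:.1f}x")
```

Under these assumptions, a 13 x 13 standard convolution with 64 input and 64 output channels has 692,224 weights, while its depthwise separable counterpart has 14,912. The depthwise term grows with the kernel area but not with the number of output channels, which is why enlarging the kernel stays cheap.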

Experimental results on multiple remote sensing datasets show that, compared with state-of-the-art methods, LSFFNet achieves comparable or better quantitative performance with fewer parameters and lower computational cost, striking a favorable balance between reconstruction quality and inference efficiency. In terms of PSNR and SSIM, it outperforms several existing mainstream and lightweight models. Qualitative visual comparisons further indicate stronger edge restoration, texture preservation and artifact suppression. LSFFNet also provides a useful reference for optimizing lightweight super-resolution models in remote sensing tasks.
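For readers unfamiliar with the PSNR metric mentioned above, the following is a minimal sketch of its standard definition, 10 log10(MAX^2 / MSE). It operates on flat lists of pixel values for simplicity; the function name and the 8-bit peak value of 255 are assumptions for illustration, and the exact evaluation protocol used for LSFFNet (color space, border cropping) is not stated here.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel values; higher means closer to the reference."""
    if len(reference) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    hr = [100, 100, 100, 100]  # toy "ground truth" pixels
    sr = [101, 99, 101, 99]    # toy reconstruction, off by one everywhere
    print(f"PSNR: {psnr(hr, sr):.2f} dB")
```

SSIM, the second metric, compares local luminance, contrast and structure statistics rather than per-pixel error, so the two metrics are usually reported together.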
