Improved DeepLabv3+ for Semantic Segmentation via Multi-Scale Context Fusion and Feature Attention

JinYao Zhang; XiXin Yang

doi:10.63313/JCSFT.9051

Authors

JinYao Zhang, School of Science and Technology, Qingdao University, Qingdao 266071, China
XiXin Yang, School of Science and Technology, Qingdao University, Qingdao 266071, China

Keywords:

Multi-scale feature fusion, DeepLabv3+, Deep learning, Semantic segmentation

Abstract

To address the loss of edge and contour detail on small objects and the limited feature representation capability of existing semantic segmentation models, this paper proposes RDSC-DeepLabv3+, an improved DeepLabv3+ semantic segmentation algorithm that combines multi-scale contextual enhancement with feature attention optimization. Building on the original DeepLabv3+ framework, the architecture is improved in three respects: backbone selection, multi-scale contextual modeling, and feature attention enhancement. (1) ResNet-50 is adopted as the backbone to reduce the number of parameters and the computational cost while retaining strong feature representation capability; (2) a multi-scale feature fusion module that combines DenseASPP with Strip Pooling (SP) is introduced to strengthen the model's ability to capture contextual information for objects of different scales; (3) the CBAM attention mechanism is applied to the shallow (low-level) features to adaptively enhance discriminative regions while suppressing redundant background information. Experimental results on the PASCAL VOC 2007 dataset show that, compared with the baseline model, the proposed method improves pixel accuracy (ACC), mean pixel accuracy (mPA), and mean intersection over union (mIoU) by 1.86%, 1.78%, and 1.49%, respectively. These results demonstrate the advantages of RDSC-DeepLabv3+ in segmentation accuracy and computational efficiency, as well as its stronger boundary delineation and greater robustness in complex scenes.
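Item (2) builds on DenseASPP, which stacks atrous (dilated) convolutions and connects them densely, so that each layer receives the block input together with the outputs of all earlier layers. This samples receptive-field sizes more densely than the parallel branches of standard ASPP. The following PyTorch sketch shows only that generic dense-atrous pattern. The dilation rates (3, 6, 12, 18, 24) are those of the original DenseASPP design, while the class name `DenseASPPSketch` and the channel widths (`mid_ch`, `growth`) are illustrative assumptions. The abstract does not give the configuration used in RDSC-DeepLabv3+ or say how the module is joined with Strip Pooling.

```python
import torch
import torch.nn as nn


class DenseASPPSketch(nn.Module):
    """Densely connected atrous convolutions (DenseASPP-style sketch).

    Layer i receives the concatenation of the block input and the outputs of
    layers 0..i-1, so deeper layers mix several receptive-field sizes.
    Channel widths are illustrative assumptions, not the paper's settings.
    """

    def __init__(self, in_ch: int, mid_ch: int = 64, growth: int = 32,
                 rates=(3, 6, 12, 18, 24)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.layers.append(nn.Sequential(
                # 1x1 bottleneck keeps the growing concatenation affordable
                nn.Conv2d(ch, mid_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(mid_ch),
                nn.ReLU(inplace=True),
                # 3x3 atrous conv; padding == dilation preserves H and W
                nn.Conv2d(mid_ch, growth, kernel_size=3, padding=r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            ch += growth  # the next layer also sees this layer's output
        self.out_channels = ch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)


# Usage: a small random feature map stands in for the backbone output.
x = torch.randn(2, 256, 33, 33)
block = DenseASPPSketch(in_ch=256)
out = block(x)
print(out.shape)  # torch.Size([2, 416, 33, 33])
```

The output carries the input channels plus `growth` channels per dilation rate (256 + 5 × 32 = 416 here). In a full model, a 1×1 convolution would normally project this concatenation back to a fixed width before it reaches the decoder.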
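Strip Pooling (Hou et al., CVPR 2020) replaces square pooling windows with long, narrow ones. The feature map is averaged along each row (an H×1 strip) and along each column (a 1×W strip), and each strip is refined by a 1D convolution. The two results are broadcast back to H×W and summed, and a 1×1 convolution followed by a sigmoid produces a gating map that rescales the input. This captures long-range context along one axis while staying local along the other, which helps with elongated, band-shaped objects. The PyTorch sketch follows that strip pooling module formulation. The class name `StripPooling` and the choice to keep the channel count unchanged are assumptions, not details taken from RDSC-DeepLabv3+.

```python
import torch
import torch.nn as nn


class StripPooling(nn.Module):
    """Strip pooling attention sketch following Hou et al. (CVPR 2020)."""

    def __init__(self, channels: int):
        super().__init__()
        # None keeps that dimension: (None, 1) averages each row, (1, None) each column
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))
        # 1D convolutions along the direction of each strip
        self.conv_h = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0), bias=False),
            nn.BatchNorm2d(channels))
        self.conv_w = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1), bias=False),
            nn.BatchNorm2d(channels))
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # Broadcast each strip back to H x W and sum: y[c, i, j] = yh[c, i] + yw[c, j]
        yh = self.conv_h(self.pool_h(x)).expand(-1, -1, h, w)
        yw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, w)
        gate = torch.sigmoid(self.fuse(torch.relu(yh + yw)))
        return x * gate  # element-wise rescaling of the input


x = torch.randn(2, 64, 24, 40)
sp = StripPooling(64)
out = sp(x)
print(out.shape)  # torch.Size([2, 64, 24, 40])
```

Because the gate lies in [0, 1], the module can only attenuate or keep each activation. It leaves the tensor shape unchanged, so it can be dropped into a multi-scale branch without any reshaping.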
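CBAM (Woo et al., ECCV 2018) applies two attention maps in sequence. A channel map is computed by passing globally average-pooled and max-pooled descriptors through a shared MLP. A spatial map is then computed by a 7×7 convolution over the channel-wise average and maximum. Each map passes through a sigmoid and is multiplied onto the features. The following minimal PyTorch version uses the defaults from the CBAM paper (reduction ratio 16, 7×7 kernel). The abstract only says CBAM is applied to "shallow features", so the usage line assumes the 256-channel, stride-4 output of ResNet-50's first residual stage, which is the low-level input that the DeepLabv3+ decoder consumes.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Shared MLP implemented with 1x1 convolutions
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)  # shape (N, C, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (N, 1, H, W)


class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.ca(x)     # reweight channels ("what" to emphasize)
        return x * self.sa(x)  # reweight positions ("where" to emphasize)


# Assumed placement: low-level ResNet-50 features (256 channels, stride 4)
low_level = torch.randn(2, 256, 64, 64)
cbam = CBAM(256)
refined = cbam(low_level)
print(refined.shape)  # torch.Size([2, 256, 64, 64])
```

Applying the channel map before the spatial map matches the sequential order reported in the CBAM paper.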
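The abstract reports ACC, mPA, and mIoU without defining them. Under the definitions commonly used in semantic segmentation, all three come from a confusion matrix accumulated over the test set. ACC is the fraction of correctly labeled pixels, mPA averages the per-class pixel accuracy TP/(TP+FN), and mIoU averages TP/(TP+FP+FN) over classes. The NumPy sketch assumes these standard definitions and the PASCAL VOC convention of marking void boundary pixels with 255. It is not taken from the authors' evaluation code.

```python
import numpy as np


def confusion_matrix(pred, label, num_classes, ignore_index=255):
    """Rows index the ground-truth class, columns the predicted class."""
    pred = np.asarray(pred).ravel()
    label = np.asarray(label).ravel()
    valid = label != ignore_index  # skip VOC "void" pixels
    idx = num_classes * label[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def segmentation_scores(cm):
    tp = np.diag(cm).astype(np.float64)
    gt_per_class = cm.sum(axis=1)    # TP + FN for each class
    pred_per_class = cm.sum(axis=0)  # TP + FP for each class
    acc = tp.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = tp / gt_per_class
        iou = tp / (gt_per_class + pred_per_class - tp)
    # nanmean skips classes that never appear (0/0)
    return {"ACC": acc, "mPA": np.nanmean(pa), "mIoU": np.nanmean(iou)}


label = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
scores = segmentation_scores(confusion_matrix(pred, label, num_classes=3))
# ACC = 4/6, mPA = (0.5 + 1.0 + 0.5) / 3, mIoU = (1/3 + 2/3 + 1/2) / 3 = 0.5
print(scores)
```

Accumulating one confusion matrix over the whole test set and computing the scores once, instead of averaging per-image scores, is the usual way mIoU is reported on PASCAL VOC.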