DBRS-Net: A Hybrid Framework for Food Image Classification

Authors

  • Dongmei Ma, School of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu, China
  • Denghui Wang, School of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu, China

DOI:

https://doi.org/10.63313/JCSFT.9029

Keywords:

Food image classification, Deep learning, ResNet, Swin Transformer, Hybrid network

Abstract

Food image classification is important for applications such as food retrieval, nutritional assessment, and dietary management. However, food images typically exhibit pronounced intra-class variation, high inter-class similarity, and complex shooting environments, which limit the performance of traditional deep learning approaches. To address these challenges, this paper proposes a novel hybrid network architecture, the Dual-Branch ResNet-Swin Network (DBRS-Net). By integrating ResNet with the window-based attention mechanism of the Swin Transformer, DBRS-Net leverages the complementary strengths of both architectures in local fine-grained feature extraction and global context modelling. Specifically, the ResNet branch captures local features such as texture and shape, while the Swin Transformer branch learns overall structure and long-range dependencies; a simple yet effective feature fusion strategy then synthesises comprehensive image representations. Experimental results demonstrate that, even without additional complex modules, DBRS-Net achieves stable classification performance on the Food11 dataset, attaining a top-1 accuracy of 94.83%. This provides a reliable foundation for research into food image recognition on larger-scale or more diverse datasets.
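The dual-branch design described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration only: the paper does not publish code, so the two branch functions here are random stand-ins for real ResNet and Swin Transformer backbones, the feature widths (2048-d for a ResNet-50-style branch, 768-d for a Swin-T-style branch) are assumptions, and concatenation is assumed as the "simple yet effective" fusion strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def resnet_branch(image):
    """Stand-in for the ResNet branch: local texture/shape features.
    Returns a fixed-size vector (2048-d, the width of ResNet-50's
    global-average-pooled output)."""
    return rng.standard_normal(2048)

def swin_branch(image):
    """Stand-in for the Swin Transformer branch: global structure and
    long-range dependencies (768-d, the width of Swin-T's final stage)."""
    return rng.standard_normal(768)

def fuse(local_feat, global_feat):
    """Assumed fusion: concatenate the two branch outputs into one
    comprehensive representation."""
    return np.concatenate([local_feat, global_feat])

def classify(fused, num_classes=11):
    """Linear head over the fused representation (Food11 has 11 classes),
    followed by a softmax over the class logits."""
    W = rng.standard_normal((num_classes, fused.shape[0])) * 0.01
    logits = W @ fused
    e = np.exp(logits - logits.max())
    return e / e.sum()

image = rng.standard_normal((224, 224, 3))  # dummy RGB input
probs = classify(fuse(resnet_branch(image), swin_branch(image)))
print(probs.shape)  # (11,) - one probability per Food11 class
```

In a real implementation the two branches would be pretrained backbones run in parallel on the same input, and the fusion and classification head would be trained end-to-end; this sketch only shows how the two feature streams meet.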

References

[1] Y. Wu and M. Zhang, "Swin-CFNet: An Attempt at Fine-Grained Urban Green Space Classification Using Swin Transformer and Convolutional Neural Network," IEEE Geosci. Remote Sens. Lett., vol. 21, pp. 1–5, 2024, Art. no. 2503405, doi: 10.1109/LGRS.2024.3404393.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017.

[3] A. R. Bushara, R. S. V. Kumar, and S. Kumar, "An ensemble method for the detection and classification of lung cancer using computed tomography images utilizing a capsule network with Visual Geometry Group," Biomed. Signal Process. Control, vol. 85, Art. no. 104930, 2023.

[4] C. Szegedy, W. Liu, Y. Jia, et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, 2015, pp. 1–9.

[5] G. Huang, Z. Liu, K. Q. Weinberger and L. van der Maaten, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 4700–4708.

[6] Y. Kawano and K. Yanai, “Food image recognition with deep convolutional features,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. (UbiComp), New York, NY, USA, 2014, pp. 589–593.

[7] J. Chen and C. W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in Proc. 24th ACM Int. Conf. Multimedia, 2016, pp. 32–41.

[8] C. Liu, Y. Cao, Y. Luo, et al., “DeepFood: Deep learning-based food image recognition for computer-aided dietary assessment,” in Proc. IEEE Int. Conf. Smart Homes Health Telematics (ICOST), Cham, Switzerland, 2016, pp. 37–48.

[9] J. Chen and C. W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in Proc. 24th ACM Int. Conf. Multimedia, New York, NY, USA, 2016, pp. 32–41.

[10] D. J. Attokaren, I. G. Fernandes, A. Sriram, Y. V. S. Murthy, and S. G. Koolagudi, "Food classification from images using convolutional neural networks," in Proc. IEEE Region 10 Conf. (TENCON), Penang, Malaysia, 2017, pp. 2801–2806.

[11] N. Martinel, G. L. Foresti, and C. Micheloni, "Wide-Slice residual networks for food recognition," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Lake Tahoe, NV, USA, Mar. 2018, pp. 567–576.

[12] B. Mandal, N. B. Puhan, and A. Verma, "Deep convolutional generative adversarial network-based food recognition using partially labeled data," IEEE Sens. Lett., vol. 3, Art. no. 7000104, 2019, doi: 10.1109/LSENS.2019.2925538.

[13] C. S. Won, "Multi-scale CNN for fine-grained image recognition," IEEE Access, vol. 8, pp. 116663–116674, 2020. doi: 10.1109/ACCESS.2020.3001234.

[14] L. Deng et al., "Mixed Dish Recognition With Contextual Relation and Domain Alignment," IEEE Trans. Multimedia, vol. 24, pp. 2034–2045, 2022, doi: 10.1109/TMM.2021.3075037.

[15] C.-S. Chen, G.-Y. Chen, D. Zhou, D. Jiang, and D.-S. Chen, "Res-VMamba: Fine-grained food category visual classification using selective state space models with deep residual learning," arXiv:2402.15761 [cs.CV], 2024. [Online]. Available: https://arxiv.org/abs/2402.15761

Published

2025-12-08

Section

Articles

How to Cite

DBRS-Net: A Hybrid Framework for Food Image Classification. (2025). Journal of Computer Science and Frontier Technologies, 2(1), 9–18. https://doi.org/10.63313/JCSFT.9029