DBRS-Net: A Hybrid Framework for Food Image Classification

Authors

  • Dongmei Ma, School of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu, China
  • Denghui Wang, School of Physics and Electronic Engineering, Northwest Normal University, Lanzhou, Gansu, China

DOI:

https://doi.org/10.63313/JCSFT.9029

Keywords:

Food image classification, Deep learning, ResNet, Swin Transformer, Hybrid network

Abstract

Food image classification is important for applications such as food retrieval, nutritional assessment, and dietary management. However, food images typically exhibit pronounced intra-class variation, high inter-class similarity, and complex shooting environments, which limit the performance of traditional deep learning approaches. To address these challenges, this paper proposes a novel hybrid network architecture, the Dual-Branch ResNet-Swin Network (DBRS-Net). By integrating ResNet with the window-based attention mechanism of the Swin Transformer, DBRS-Net leverages the complementary strengths of both architectures in local fine-grained feature extraction and global context modelling. Specifically, the ResNet branch captures local features such as texture and shape, while the Swin Transformer branch learns overall structure and long-range dependencies; a simple yet effective feature fusion strategy then synthesises comprehensive image representations. Experimental results demonstrate that, even without additional complex modules, DBRS-Net achieves stable classification performance on the Food11 dataset, attaining a top-1 accuracy of 94.83%. This provides a reliable foundation for research into food image recognition on larger-scale or more diverse datasets.
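The dual-branch design described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration only: the paper does not publish code, so the two branch functions here are random stand-ins for real ResNet and Swin Transformer backbones, the feature widths (2048-d for a ResNet-50-style branch, 768-d for a Swin-T-style branch) are assumptions, and concatenation is assumed as the "simple yet effective" fusion strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def resnet_branch(image):
    """Stand-in for the ResNet branch: local texture/shape features.
    Returns a fixed-size vector (2048-d, the width of ResNet-50's
    global-average-pooled output)."""
    return rng.standard_normal(2048)

def swin_branch(image):
    """Stand-in for the Swin Transformer branch: global structure and
    long-range dependencies (768-d, the width of Swin-T's final stage)."""
    return rng.standard_normal(768)

def fuse(local_feat, global_feat):
    """Assumed fusion: concatenate the two branch outputs into one
    comprehensive representation."""
    return np.concatenate([local_feat, global_feat])

def classify(fused, num_classes=11):
    """Linear head over the fused representation (Food11 has 11 classes),
    followed by a softmax over the class logits."""
    W = rng.standard_normal((num_classes, fused.shape[0])) * 0.01
    logits = W @ fused
    e = np.exp(logits - logits.max())
    return e / e.sum()

image = rng.standard_normal((224, 224, 3))  # dummy RGB input
probs = classify(fuse(resnet_branch(image), swin_branch(image)))
print(probs.shape)  # (11,) - one probability per Food11 class
```

In a real implementation the two branches would be pretrained backbones run in parallel on the same input, and the fusion and classification head would be trained end-to-end; this sketch only shows how the two feature streams meet.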

References

[1] Y. Wu and M. Zhang, "Swin-CFNet: An Attempt at Fine-Grained Urban Green Space Classification Using Swin Transformer and Convolutional Neural Network," IEEE Geosci. Remote Sens. Lett., vol. 21, pp. 1–5, 2024, Art. no. 2503405, doi: 10.1109/LGRS.2024.3404393.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017.

[3] A. R. Bushara, R. S. V. Kumar, and S. Kumar, "An ensemble method for the detection and classification of lung cancer using computed tomography images utilizing a capsule network with Visual Geometry Group," Biomed. Signal Process. Control, vol. 85, Art. no. 104930, 2023.

[4] C. Szegedy, W. Liu, Y. Jia, et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, 2015, pp. 1–9.

[5] G. Huang, Z. Liu, K. Q. Weinberger and L. van der Maaten, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 4700–4708.

[6] Y. Kawano and K. Yanai, “Food image recognition with deep convolutional features,” in Proc. ACM Int. Joint Conf. Pervasive Ubiquitous Comput. (UbiComp), New York, NY, USA, 2014, pp. 589–593.

[7] J. Chen and C. W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in Proc. 24th ACM Int. Conf. Multimedia, 2016, pp. 32–41.

[8] C. Liu, Y. Cao, Y. Luo, et al., “DeepFood: Deep learning-based food image recognition for computer-aided dietary assessment,” in Proc. IEEE Int. Conf. Smart Homes Health Telematics (ICOST), Cham, Switzerland, 2016, pp. 37–48.

[9] J. Chen and C. W. Ngo, “Deep-based ingredient recognition for cooking recipe retrieval,” in Proc. 24th ACM Int. Conf. Multimedia, New York, NY, USA, 2016, pp. 32–41.

[10] D. J. Attokaren, I. G. Fernandes, A. Sriram, Y. V. S. Murthy, and S. G. Koolagudi, "Food classification from images using convolutional neural networks," in Proc. IEEE Region 10 Conf. (TENCON), Penang, Malaysia, 2017, pp. 2801–2806.

[11] N. Martinel, G. L. Foresti, and C. Micheloni, "Wide-Slice residual networks for food recognition," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Lake Tahoe, NV, USA, Mar. 2018, pp. 567–576.

[12] B. Mandal, N. B. Puhan, and A. Verma, "Deep convolutional generative adversarial network-based food recognition using partially labeled data," IEEE Sens. Lett., vol. 3, Art. no. 7000104, 2019, doi: 10.1109/LSENS.2019.2925538.

[13] C. S. Won, "Multi-scale CNN for fine-grained image recognition," IEEE Access, vol. 8, pp. 116663–116674, 2020. doi: 10.1109/ACCESS.2020.3001234.

[14] L. Deng et al., "Mixed Dish Recognition With Contextual Relation and Domain Alignment," IEEE Trans. Multimedia, vol. 24, pp. 2034–2045, 2022, doi: 10.1109/TMM.2021.3075037.

[15] C.-S. Chen, G.-Y. Chen, D. Zhou, D. Jiang, and D.-S. Chen, "Res-VMamba: Fine-grained food category visual classification using selective state space models with deep residual learning," arXiv:2402.15761 [cs.CV], 2024. [Online]. Available: https://arxiv.org/abs/2402.15761

Published

2025-12-08

Section

Articles

How to Cite

DBRS-Net: A Hybrid Framework for Food Image Classification. (2025). Journal of Computer Science and Frontier Technologies, 2(1), 9–18. https://doi.org/10.63313/JCSFT.9029