Multimodal Autonomous Navigation by Fusing Visual and Tactile Perception for Deformable Obstacle Traversal

ZiTong Zhou

doi:10.63313/AJET.9056

Authors

ZiTong Zhou, Shenzhen Yuanchuangxing Technology Co., Ltd., Shenzhen, Guangdong, 518107, China

Keywords:

Visual–Tactile Fusion, Autonomous Navigation, Deformable Obstacles, Tactile Sensing, Mobile Robots, Multimodal Perception

Abstract

Autonomous mobile robots rely predominantly on visual perception for obstacle avoidance, and such pipelines typically treat every detected obstacle as rigid and impenetrable. In real-world environments, however, many obstacles, such as curtains, vegetation, and flexible partitions, are deformable and can be traversed safely with appropriate force control, yet visual appearance alone rarely reveals how compliant an obstacle is. This paper proposes a multimodal navigation framework that fuses exteroceptive visual sensing with proprioceptive tactile perception to assess the passability of visually ambiguous obstacles. A global visual planner generates an initial path, and a novel tactile-driven local passability classifier determines whether an obstacle directly ahead of the robot is rigid or soft. A custom CNN–LSTM network processes tactile time-series signals from a dedicated probing arm and outputs a haptic passability score. When a soft obstacle is identified, the navigation system activates an admittance controller to push through it compliantly; otherwise, the obstacle is added to the costmap and the path is re-planned. Simulations and real-robot experiments in environments containing curtains and artificial foliage show that, compared with a pure vision-based detour approach, the proposed visual–tactile fusion method reduces traveled distance by 22.3% and mission time by 18.7% while maintaining a 100% hard-collision avoidance rate.
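The decision loop summarized in the abstract can be illustrated with a minimal sketch. It assumes that the CNN–LSTM classifier already yields a passability score in [0, 1], that the costmap is represented as a set of lethal grid cells, and that the admittance controller takes a one-dimensional mass-damper form along the direction of travel. The threshold, gains, and force limit, as well as the names `decide_action` and `AdmittanceController`, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Cell = Tuple[int, int]  # costmap grid index (row, col); hypothetical representation


def decide_action(passability: float, cell: Cell, lethal_cells: Set[Cell],
                  threshold: float = 0.5) -> str:
    """Gate traversal on a haptic passability score in [0, 1].

    A score at or above the (hypothetical) threshold is treated as a soft
    obstacle to push through; otherwise the cell is marked lethal in the
    costmap so that the global planner routes around it.
    """
    if passability >= threshold:
        return "push_through"
    lethal_cells.add(cell)
    return "replan"


@dataclass
class AdmittanceController:
    """1-D mass-damper admittance law (assumed form, not the paper's tuning):

        M * dv/dt + D * (v - v_ref) = -f_ext

    integrated with explicit Euler. A steady contact force f_ext settles the
    forward velocity command at v_ref - f_ext / D.
    """
    mass: float = 5.0       # virtual mass M in kg (hypothetical)
    damping: float = 20.0   # virtual damping D in N*s/m (hypothetical)
    v_ref: float = 0.3      # nominal forward speed in m/s (hypothetical)
    f_max: float = 15.0     # abort if contact force exceeds this, in N (hypothetical)
    v: float = 0.0          # current forward velocity command in m/s

    def step(self, f_ext: float, dt: float) -> Optional[float]:
        """Return the next velocity command, or None if the push must abort."""
        if f_ext > self.f_max:
            # Resistance too high for a soft obstacle: stop and hand back to the planner.
            self.v = 0.0
            return None
        accel = (-self.damping * (self.v - self.v_ref) - f_ext) / self.mass
        self.v = max(0.0, self.v + accel * dt)  # never command reversing
        return self.v


if __name__ == "__main__":
    lethal: Set[Cell] = set()
    print(decide_action(0.82, (10, 4), lethal))  # soft curtain -> push_through
    print(decide_action(0.10, (12, 7), lethal))  # rigid wall -> replan
    print(lethal)

    ctrl = AdmittanceController()
    v = ctrl.v
    for _ in range(500):  # 5 s of contact with a constant 2 N resistance
        v = ctrl.step(f_ext=2.0, dt=0.01)
    print(round(v, 3))  # settles near 0.3 - 2.0 / 20.0 = 0.2
```

A mass-damper law is used here because it lets the contact force slow the robot smoothly instead of stopping it outright. The force limit is an assumed safety fallback that hands control back to the planner if the obstacle resists more than a soft obstacle plausibly would.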
