Design and Implementation of a Robotic Dynamic Grasping System Based on Improved YOLO and Fuzzy PID Control

ZiTong Zhou

doi:10.63313/JCSFT.9070

Authors

ZiTong Zhou, Shenzhen Yuanchuangxing Technology Co., Ltd., Shenzhen, Guangdong, 518107, China

Keywords:

Dynamic Grasping, Improved YOLO, Fuzzy PID, Manipulator Control, Attention Mechanism, Multi-Scale Fusion, Real-Time Perception

Abstract

Industrial and service robots increasingly need to grasp objects that are moving on conveyors, sliding down chutes, or being handed over by humans, yet most production-grade pipelines still assume static targets. Two pain points dominate. First, detection networks lose precision when the target is small, partially occluded, or motion-blurred. Second, joint-level controllers use gains tuned offline and therefore cannot compensate for the time-varying dynamics introduced by chasing a moving object. This paper proposes an end-to-end dynamic grasping system that couples an improved YOLO detector with a fuzzy PID joint controller. The detector combines a Ghost-bottleneck CSPDarknet backbone, a coordinate-attention module that emphasizes motion-salient regions, a BiFPN neck for multi-scale fusion, and an SIoU + α-CIoU regression objective that converges faster on tightly packed parts. The controller treats the tracking error and its derivative as fuzzy variables, tunes ΔKp, ΔKi, and ΔKd online through a 49-rule Mamdani inference base, and feeds the corrected gains to each joint in real time. The two modules are connected by a Kalman-smoothed pose stream and a quintic online trajectory re-planner. We trained the detector on a 12,800-image conveyor dataset and evaluated the integrated system in 300 dynamic trials on a 6-DOF UR5 with a parallel-jaw gripper. Compared with a YOLOv5s + classical-PID baseline, the proposed pipeline raises mAP@0.5 from 74.3 % to 86.4 %, cuts joint-tracking overshoot from 18.6 % to 3.7 %, and increases the dynamic-grasp success rate from 65.4 % to 91.2 % at object speeds up to 0.30 m/s, while keeping inference at 129 FPS on a single RTX 3060.
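The fuzzy gain-tuning step summarized in the abstract can be illustrated with a minimal sketch. It assumes seven triangular fuzzy sets (NB to PB) on a normalized universe [-3, 3] for both the error and its derivative, which yields the 7 × 7 = 49 rules, with min/max Mamdani inference and centroid defuzzification. The rule tables, scaling factors, base gains, and all names below (`parse_rules`, `mamdani`, `FuzzyPID`) are illustrative assumptions in the style common in the fuzzy PID literature; they are not taken from the paper.

```python
LABELS = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]
CENTERS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # assumed normalized universe

def parse_rules(text):
    """Turn a 7x7 block of labels into a table of output-label indices."""
    idx = [LABELS.index(tok) for tok in text.split()]
    assert len(idx) == 49
    return [idx[r * 7:(r + 1) * 7] for r in range(7)]

# Rows: error label NB..PB; columns: error-derivative label NB..PB.
# ASSUMPTION: illustrative rule tables, not the paper's actual rule base.
KP_RULES = parse_rules("""
    PB PB PM PM PS ZO ZO   PB PB PM PS PS ZO NS   PM PM PM PS ZO NS NS
    PM PM PS ZO NS NM NM   PS PS ZO NS NS NM NM   PS ZO NS NM NM NM NB
    ZO ZO NM NM NM NB NB""")
KI_RULES = parse_rules("""
    NB NB NM NM NS ZO ZO   NB NB NM NS NS ZO ZO   NB NM NS NS ZO PS PS
    NM NM NS ZO PS PM PM   NM NS ZO PS PS PM PB   ZO ZO PS PS PM PB PB
    ZO ZO PS PM PM PB PB""")
KD_RULES = parse_rules("""
    PS NS NB NB NB NM PS   PS NS NB NM NM NS ZO   ZO NS NM NM NS NS ZO
    ZO NS NS NS NS NS ZO   ZO ZO ZO ZO ZO ZO ZO   PB NS PS PS PS PS PB
    PB PM PM PM PS PS PB""")

def tri(x, center):
    """Triangular membership function with unit half-width."""
    return max(0.0, 1.0 - abs(x - center))

def fuzzify(x):
    """Membership degrees of x (clamped to [-3, 3]) in the seven sets."""
    x = max(-3.0, min(3.0, x))
    return [tri(x, c) for c in CENTERS]

def mamdani(e, ec, rules, samples=601):
    """Mamdani inference: min for AND, max aggregation, centroid output."""
    mu_e, mu_ec = fuzzify(e), fuzzify(ec)
    strength = [0.0] * 7  # strongest firing level per output label
    for i in range(7):
        for j in range(7):
            k = rules[i][j]
            strength[k] = max(strength[k], min(mu_e[i], mu_ec[j]))
    num = den = 0.0
    for s in range(samples):  # discretized centroid over [-3, 3]
        u = -3.0 + 6.0 * s / (samples - 1)
        mu = max(min(strength[k], tri(u, CENTERS[k])) for k in range(7))
        num += u * mu
        den += mu
    return num / den  # den > 0 because at least one rule always fires

class FuzzyPID:
    """Single-joint PID whose gains are corrected online by fuzzy inference."""

    def __init__(self, kp, ki, kd, e_scale, ec_scale, dk=(0.1, 0.01, 0.01)):
        self.base = (kp, ki, kd)          # offline-tuned base gains
        self.e_scale, self.ec_scale = e_scale, ec_scale  # input normalization
        self.dk = dk                      # hypothetical output scaling factors
        self.integral = 0.0
        self.prev_e = 0.0

    def gains(self, e, ec):
        en, ecn = e * self.e_scale, ec * self.ec_scale
        deltas = [mamdani(en, ecn, r) for r in (KP_RULES, KI_RULES, KD_RULES)]
        # Corrected gain = base gain + scaled fuzzy correction, kept non-negative.
        return tuple(max(0.0, b + s * d)
                     for b, s, d in zip(self.base, self.dk, deltas))

    def update(self, e, dt):
        ec = (e - self.prev_e) / dt
        kp, ki, kd = self.gains(e, ec)
        self.integral += e * dt
        self.prev_e = e
        return kp * e + ki * self.integral + kd * ec
```

In a multi-joint setting, one such controller instance per joint would be stepped at the control rate with the joint's tracking error, for example `FuzzyPID(2.0, 0.5, 0.1, e_scale=3.0, ec_scale=0.3).update(0.1, 0.01)`. The numeric values here are placeholders and would need to be chosen for the actual manipulator.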
