FUSION SPARSE AND SHAPING REWARD FUNCTION IN SOFT ACTOR-CRITIC DEEP REINFORCEMENT LEARNING FOR MOBILE ROBOT NAVIGATION

Authors

  • Mohamad Hafiz Abu Bakar Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia https://orcid.org/0009-0005-2572-6718
  • Abu Ubaidah Shamsudin Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia https://orcid.org/0000-0002-7917-5967
  • Zubair Adil Soomro Faculty of Electrical and Electronic Engineering, Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia https://orcid.org/0000-0003-0297-3909
  • Satoshi Tadokoro Tohoku University, 2 Chome-1-1 Katahira, Aoba Ward, Sendai, Miyagi 980-8577, Japan
  • C. J. Salaan Department of Electrical Engineering and Technology, MSU-Iligan Institute of Technology, Andres Bonifacio Ave, Iligan City, 9200 Lanao del Norte, Philippines

DOI:

https://doi.org/10.11113/jurnalteknologi.v86.20147

Keywords:

Soft Actor-Critic Deep Reinforcement Learning (SAC DRL), Deep Reinforcement Learning, Mobile robot navigation, Reward function, Sparse reward, Shaping reward

Abstract

Advances in autonomous robots have been driven by the rapid development of new technologies. Deep Reinforcement Learning (DRL) enables a robot to operate autonomously, learning its next movement from its interaction with the environment. Because robot control requires continuous actions, Soft Actor-Critic Deep Reinforcement Learning (SAC DRL) is considered a state-of-the-art DRL solution: SAC handles continuous action spaces and therefore produces more precise movements. Although SAC is fundamentally robust to unpredictability, it has a known weakness in the exploration process, which slows accurate learning and convergence. To address this issue, this study guides the learning process with a reward function suited to the system. The research proposes several reward functions based on sparse and shaping rewards within the SAC method and investigates their effect on mobile robot learning. The experiments show that fusing sparse and shaping rewards in SAC DRL allows the robot to navigate successfully to the target position and improves accuracy, achieving an average error of 4.99%.
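Since the abstract describes the fusion of sparse and shaping rewards only at a high level, a minimal Python sketch of that fusion idea may help. All names, thresholds (GOAL_RADIUS, COLLISION_RADIUS), and the shaping weight K below are hypothetical assumptions for illustration, not the paper's actual formulation.

```python
# Minimal sketch of a fused sparse + shaping reward (illustrative only).
# GOAL_RADIUS, COLLISION_RADIUS, and the shaping weight K are assumed
# values; the paper's actual thresholds and weights are not given in
# the abstract.
GOAL_RADIUS = 0.3       # metres: within this distance the goal counts as reached (assumed)
COLLISION_RADIUS = 0.2  # metres: within this distance of an obstacle counts as a crash (assumed)
K = 1.0                 # weight on the dense shaping term (assumed)

def fused_reward(dist_to_goal, prev_dist_to_goal, dist_to_obstacle):
    """Combine a sparse terminal reward with a dense shaping reward.

    Sparse part: a large positive/negative reward only when the episode
    ends (goal reached or collision), which defines the task but gives
    little guidance during exploration.
    Shaping part: a small dense reward proportional to the progress made
    toward the goal each step, which steers SAC's exploration between
    the sparse events.
    """
    if dist_to_goal < GOAL_RADIUS:
        return 100.0   # sparse success reward
    if dist_to_obstacle < COLLISION_RADIUS:
        return -100.0  # sparse collision penalty
    # Dense shaping: positive when this step moved the robot closer to the goal.
    return K * (prev_dist_to_goal - dist_to_goal)

# Example: the robot closed from 2.0 m to 1.8 m with the nearest obstacle 1.0 m away.
print(fused_reward(1.8, 2.0, 1.0))  # ~0.2 -> small positive shaping reward
```

The design intuition is that the sparse terms alone define success but leave SAC exploring almost blindly, while the shaping term alone can be gamed; fusing them keeps the task definition crisp and the gradient of progress dense.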

Published

2024-01-15

Issue

Vol. 86 No. 2 (2024)

Section

Science and Engineering

How to Cite

FUSION SPARSE AND SHAPING REWARD FUNCTION IN SOFT ACTOR-CRITIC DEEP REINFORCEMENT LEARNING FOR MOBILE ROBOT NAVIGATION. (2024). Jurnal Teknologi, 86(2), 37-49. https://doi.org/10.11113/jurnalteknologi.v86.20147