Optimized Autonomous Drone Navigation Using Deep Q-Network Based Reinforcement Learning
DOI: https://doi.org/10.47392/IRJAEM.2026.0094

Keywords: Drone Navigation, Path Optimization, Reinforcement Learning, Deep Q-Network, Autonomous Systems, Machine Learning

Abstract
One of the critical problems in contemporary intelligent systems is the autonomous navigation of Unmanned Aerial Vehicles (UAVs) in obstacle-prone environments, especially in tasks such as surveillance, disaster response, logistics, infrastructure inspection, and smart mobility. Conventional path planning algorithms depend heavily on pre-programmed environmental profiles and deterministic optimization policies, which are often inflexible in uncertain or dynamically evolving settings. Furthermore, classical shortest-path models typically minimize geometric distance rather than optimizing long-term operational safety and decision quality. Reinforcement Learning (RL) offers a viable alternative, since it allows autonomous agents to acquire optimal navigation policies through interaction with the environment. However, traditional tabular Q-Learning scales poorly as the state-action space grows. Deep reinforcement learning, which combines deep neural networks with reinforcement learning, addresses this weakness by estimating value functions with nonlinear function approximators that generalize over high-dimensional state spaces. This paper proposes an approach to drone path optimization in obstacle-laden environments using a Deep Q-Network (DQN). The navigation problem is formulated as a Markov Decision Process (MDP), in which the drone agent learns to maximize its cumulative discounted reward by balancing path efficiency and collision avoidance. The DQN architecture incorporates experience replay and target-network synchronization to stabilize training and mitigate divergence arising from bootstrapped target estimation. Extensive experimental evaluation is conducted in a grid-based simulation environment.
The proposed DQN model is compared against a tabular Q-Learning baseline under identical environment and reward settings. Results show that the DQN approach achieves faster convergence, higher cumulative rewards, better policy stability, and substantially lower collision rates.
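The training loop summarized in the abstract (epsilon-greedy exploration, experience replay, and periodic target-network synchronization in a grid world with collision penalties) can be sketched in miniature. Everything below is an illustrative assumption rather than the paper's actual configuration: the 5x5 grid, obstacle positions, reward values, and hyperparameters are invented for the sketch, and a linear one-hot Q-function stands in for the deep network so the example stays self-contained.

```python
import random
from collections import deque
import numpy as np

rng = random.Random(0)

N = 5                                          # assumed 5x5 grid, not the paper's setup
GOAL = (4, 4)
OBSTACLES = {(2, 2), (1, 3)}                   # hypothetical static obstacles
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
PHI = np.eye(N * N)                            # one-hot state features

def one_hot(s):
    return PHI[s[0] * N + s[1]]

def step(s, a):
    """Grid transition: collision keeps the drone in place with a penalty."""
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    if not (0 <= r < N and 0 <= c < N) or (r, c) in OBSTACLES:
        return s, -5.0, False                  # assumed collision penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, True              # assumed goal reward
    return (r, c), -1.0, False                 # step cost favors short paths

# Online and target Q-weights (linear stand-in for the deep network).
W, W_tgt = np.zeros((4, N * N)), np.zeros((4, N * N))
buf = deque(maxlen=2000)                       # experience replay buffer
gamma, alpha, eps = 0.95, 0.1, 0.2

for ep in range(400):
    s, done, t = (0, 0), False, 0
    while not done and t < 50:
        # Epsilon-greedy action selection against the online network.
        a = rng.randrange(4) if rng.random() < eps else int(np.argmax(W @ one_hot(s)))
        s2, r, done = step(s, a)
        buf.append((s, a, r, s2, done))
        s, t = s2, t + 1
        if len(buf) >= 32:                     # minibatch update from replay
            for bs, ba, br, bs2, bd in rng.sample(buf, 32):
                # Bootstrapped target comes from the frozen target network.
                target = br + (0.0 if bd else gamma * np.max(W_tgt @ one_hot(bs2)))
                W[ba] += alpha * (target - W[ba] @ one_hot(bs)) * one_hot(bs)
    if ep % 10 == 0:
        W_tgt = W.copy()                       # periodic target-network sync

# Greedy rollout of the learned policy from the start cell.
s, path, done = (0, 0), [(0, 0)], False
for _ in range(20):
    s, r, done = step(s, int(np.argmax(W @ one_hot(s))))
    path.append(s)
    if done:
        break
```

Freezing `W_tgt` between syncs is what keeps the bootstrapped target from chasing its own updates, which is the divergence-mitigation role the abstract attributes to target-network synchronization; the replay buffer likewise breaks the temporal correlation of consecutive transitions before each minibatch update.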
License
Copyright (c) 2026 International Research Journal on Advanced Engineering and Management (IRJAEM)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.