Optimized Autonomous Drone Navigation Using Deep Q-Network Based Reinforcement Learning
DOI: https://doi.org/10.47392/IRJAEM.2026.0094

Keywords: Drone Navigation, Path Optimization, Reinforcement Learning, Deep Q-Network, Autonomous Systems, Machine Learning

Abstract
One of the critical problems in contemporary intelligent systems is the autonomous navigation of Unmanned Aerial Vehicles (UAVs) in obstacle-prone environments, especially in tasks such as surveillance, disaster response, logistics, infrastructure inspection, and smart mobility. Conventional path planning algorithms depend heavily on pre-programmed environmental profiles and deterministic optimization policies, which are often inflexible in uncertain or dynamically evolving settings. Furthermore, classical shortest-path models typically minimize geometric distance rather than optimizing long-term operational safety and decision quality. Reinforcement Learning (RL) offers a viable alternative, since it allows autonomous agents to acquire optimal navigation policies through interaction with the environment. However, traditional tabular Q-Learning scales poorly as the state-action space grows. Deep reinforcement learning, which combines deep neural networks with reinforcement learning, addresses this weakness by estimating value functions with nonlinear function approximators that generalize over high-dimensional state spaces. This paper proposes an approach to drone path optimization in obstacle-laden environments using a Deep Q-Network (DQN). The navigation problem is formulated as a Markov Decision Process (MDP), in which the drone agent learns to maximize its cumulative discounted reward by balancing path efficiency and collision avoidance. The DQN architecture incorporates experience replay and target-network synchronization to stabilize training and mitigate divergence arising from bootstrapped target estimation. Extensive experimental evaluation is conducted in a grid-based simulation environment.
The proposed DQN model is compared against a tabular Q-Learning baseline under identical environment and reward settings. Results show that the DQN approach achieves faster convergence, higher cumulative rewards, better policy stability, and substantially lower collision rates.
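The training loop summarized in the abstract (epsilon-greedy exploration, experience replay, and periodic target-network synchronization in a grid world with collision penalties) can be sketched in miniature. Everything below is an illustrative assumption rather than the paper's actual configuration: the 5x5 grid, obstacle positions, reward values, and hyperparameters are invented for the sketch, and a linear one-hot Q-function stands in for the deep network so the example stays self-contained.

```python
import random
from collections import deque
import numpy as np

rng = random.Random(0)

N = 5                                          # assumed 5x5 grid, not the paper's setup
GOAL = (4, 4)
OBSTACLES = {(2, 2), (1, 3)}                   # hypothetical static obstacles
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
PHI = np.eye(N * N)                            # one-hot state features

def one_hot(s):
    return PHI[s[0] * N + s[1]]

def step(s, a):
    """Grid transition: collision keeps the drone in place with a penalty."""
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    if not (0 <= r < N and 0 <= c < N) or (r, c) in OBSTACLES:
        return s, -5.0, False                  # assumed collision penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, True              # assumed goal reward
    return (r, c), -1.0, False                 # step cost favors short paths

# Online and target Q-weights (linear stand-in for the deep network).
W, W_tgt = np.zeros((4, N * N)), np.zeros((4, N * N))
buf = deque(maxlen=2000)                       # experience replay buffer
gamma, alpha, eps = 0.95, 0.1, 0.2

for ep in range(400):
    s, done, t = (0, 0), False, 0
    while not done and t < 50:
        # Epsilon-greedy action selection against the online network.
        a = rng.randrange(4) if rng.random() < eps else int(np.argmax(W @ one_hot(s)))
        s2, r, done = step(s, a)
        buf.append((s, a, r, s2, done))
        s, t = s2, t + 1
        if len(buf) >= 32:                     # minibatch update from replay
            for bs, ba, br, bs2, bd in rng.sample(buf, 32):
                # Bootstrapped target comes from the frozen target network.
                target = br + (0.0 if bd else gamma * np.max(W_tgt @ one_hot(bs2)))
                W[ba] += alpha * (target - W[ba] @ one_hot(bs)) * one_hot(bs)
    if ep % 10 == 0:
        W_tgt = W.copy()                       # periodic target-network sync

# Greedy rollout of the learned policy from the start cell.
s, path, done = (0, 0), [(0, 0)], False
for _ in range(20):
    s, r, done = step(s, int(np.argmax(W @ one_hot(s))))
    path.append(s)
    if done:
        break
```

Freezing `W_tgt` between syncs is what keeps the bootstrapped target from chasing its own updates, which is the divergence-mitigation role the abstract attributes to target-network synchronization; the replay buffer likewise breaks the temporal correlation of consecutive transitions before each minibatch update.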
License
Copyright (c) 2026 International Research Journal on Advanced Engineering and Management (IRJAEM)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.