
Deep Q-learning based optimization of VLC systems with dynamic time-division multiplexing

Title: Deep Q-learning based optimization of VLC systems with dynamic time-division multiplexing
Author: Siddiqi, U. F., Sait, S. M., Uysal, Murat
Publication Date: 2020
Publisher: IEEE
Subjects: Deep Q learning, Deep reinforcement learning, Dynamic time division multiple access, Visible light communications, Optimization, Non-deterministic algorithms
Type: Periodical
Language: English
Digital: Yes
Manuscript: No
Library: Özyeğin Üniversitesi
Inventory Number: 2169-3536
Record Number: a5e1b230-c4a2-4ce7-8a8f-de5a74dd2169
Location: Electrical & Electronics Engineering
Date: 2020
Notes: King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
Abstract: The traditional approach to solving nondeterministic-polynomial-time (NP)-hard optimization problems is to apply meta-heuristic algorithms. In contrast, deep Q-learning (DQL) uses a memory of past experiences and a deep neural network (DNN) to choose steps and make progress towards solving the problem. The dynamic time-division multiple access (DTDMA) scheme is a viable transmission method in visible light communication (VLC) systems. In DTDMA systems, the time slots of the users are adjusted to maximize the spectral efficiency (SE) of the system. The users in a VLC network have different channel gains because of their physical locations, so variable time slots can improve system performance. In this work, we propose a Markov decision process (MDP) model of the DTDMA-based VLC system. The MDP model integrates into DQL and supplies it with information reflecting the behavior of the VLC system and the objective of maximizing the SE. By using the proposed MDP model in deep Q-learning with the experience replay algorithm, we give the light-emitting diode (LED)-based transmitter the autonomy to solve the problem, so that it can adjust the users' time slots using data the device collected in the past. The proposed model includes definitions of the states, actions, and rewards based on the specific characteristics of the problem. Simulations show that the proposed DQL method can produce results competitive with well-known metaheuristic algorithms such as simulated annealing and tabu search.
DOI: 10.1109/ACCESS.2020.3005885
Volume: 8
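The abstract describes Q-learning with an experience-replay memory applied to an MDP whose states are time-slot allocations, whose actions adjust users' slots, and whose reward tracks the change in spectral efficiency. The sketch below illustrates that loop in miniature; it is not the paper's method: a tabular Q-function stands in for the DNN, and the channel gains, the toy SE formula, and all hyperparameters are illustrative assumptions.

```python
import math
import random
from collections import defaultdict, deque

GAINS = [1.0, 0.5, 0.25]   # assumed per-user channel gains (toy values)
SLOTS = 9                   # slot units in one frame

def spectral_efficiency(alloc):
    # Toy SE objective: each user contributes slot-fraction * log2(1 + gain/fraction).
    se = 0.0
    for a, g in zip(alloc, GAINS):
        if a > 0:
            frac = a / SLOTS
            se += frac * math.log2(1 + g / frac)
    return se

# Action (i, j): move one slot unit from user i to user j.
ACTIONS = [(i, j) for i in range(len(GAINS))
                  for j in range(len(GAINS)) if i != j]

def step(alloc, action):
    i, j = action
    if alloc[i] <= 1:               # keep every user at least one slot unit
        return alloc, 0.0
    new = list(alloc)
    new[i] -= 1
    new[j] += 1
    reward = spectral_efficiency(new) - spectral_efficiency(alloc)
    return tuple(new), reward

Q = defaultdict(float)              # tabular stand-in for the deep Q-network
buffer = deque(maxlen=500)          # experience-replay memory
alpha, gamma, eps = 0.5, 0.9, 0.2   # assumed hyperparameters
random.seed(0)

for episode in range(200):
    s = (3, 3, 3)                   # start from an equal allocation
    for t in range(10):
        # epsilon-greedy action selection
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda x: Q[(s, x)]))
        s2, r = step(s, a)
        buffer.append((s, a, r, s2))
        # replay a random minibatch of stored transitions
        for bs, ba, br, bs2 in random.sample(buffer, min(16, len(buffer))):
            target = br + gamma * max(Q[(bs2, x)] for x in ACTIONS)
            Q[(bs, ba)] += alpha * (target - Q[(bs, ba)])
        s = s2

# Greedy rollout with the learned Q-values from the equal allocation.
s = (3, 3, 3)
for t in range(10):
    s, _ = step(s, max(ACTIONS, key=lambda x: Q[(s, x)]))
final_se = spectral_efficiency(s)
```

In the paper the tabular `Q` is replaced by a DNN trained on minibatches from the replay memory, which lets the method generalize across the much larger state space of real slot allocations.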
