Deep reinforcement based power allocation for the max-min optimization in non-orthogonal multiple access

Title: Deep reinforcement based power allocation for the max-min optimization in non-orthogonal multiple access
Author(s): Siddiqi, U. F., Sait, S. M., Uysal, Murat
Publication Date: 2020
Publisher: IEEE
Subjects: NOMA, Optimization, Silicon carbide, Resource management, Task analysis, Relays, Reinforcement learning, Non-orthogonal multiplexing, Double deep Q learning, Deep reinforcement learning, Non-convex optimization, Power-domain NOMA
Type: Periodical
Language: English
Digital: Yes
Manuscript: No
Library: Özyeğin Üniversitesi
Inventory Number: 2169-3536
Record Number: 446ab2ca-ca20-454a-b5f2-2f4a155f5270
Location: Electrical & Electronics Engineering
Date: 2020
Notes: Deanship of Scientific Research, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
Abstract: NOMA is a radio access technique that multiplexes several users over the same frequency resource, providing high throughput and fairness among users. Maximizing the minimum data-rate, known as max-min, is a popular approach to ensuring fairness among users. NOMA optimizes the users' transmission power (or power-coefficients) to perform max-min. The problem is a constrained non-convex optimization when there are more than two users. We propose to solve this problem using Double Deep Q Learning (DDQL), a popular reinforcement learning method. DDQL employs a Deep Q-Network (DQN) that learns to choose optimal actions for optimizing the users' power-coefficients. The Markov Decision Process (MDP) model is critical to the success of the DDQL method and helps the DQN learn to take better actions. We propose an MDP model in which the state consists of the power-coefficient values, the users' data-rates, and vectors indicating which power-coefficients can be increased or decreased. An action simultaneously increases one user's power-coefficient and reduces another user's by the same amount; the amount of change can be small or large. The action-space contains all possible ways to alter the values of any two users at a time. The DQN consists of a convolutional layer and fully connected layers. We compared the proposed method with the sequential least squares programming and trust-region constrained algorithms and found that it produces competitive results.
DOI: 10.1109/ACCESS.2020.3038923
Volume: 8
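The abstract describes an action-space in which each action transfers a fixed amount of power from one user's coefficient to another's, with a small or large step. A minimal sketch of that structure, assuming illustrative step sizes of 0.01 and 0.1 and coefficients constrained to [0, 1] (the exact step sizes and feasibility bounds are assumptions, not taken from the paper):

```python
from itertools import permutations

def build_action_space(num_users, deltas=(0.01, 0.1)):
    """Enumerate all actions: (i, j, d) moves an amount d of power-coefficient
    from user j to user i. Covers every ordered pair of users and every step
    size, matching the abstract's "all possible ways to alter the values of
    any two users at a time"."""
    return [(i, j, d)
            for i, j in permutations(range(num_users), 2)
            for d in deltas]

def apply_action(coeffs, action):
    """Apply an action if the resulting coefficients stay within [0, 1];
    infeasible actions leave the state unchanged (an assumed convention)."""
    i, j, d = action
    new = list(coeffs)
    new[i] += d  # one user's power-coefficient is increased...
    new[j] -= d  # ...and another's reduced by the same amount
    if 0.0 <= new[i] <= 1.0 and 0.0 <= new[j] <= 1.0:
        return new
    return list(coeffs)
```

Because each action transfers power rather than adding it, the total transmit power across users is conserved, which is consistent with the abstract's description of simultaneously increasing one coefficient and decreasing another by the same amount.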