Actor-critic reinforcement learning for bidding in bilateral negotiation | Kütüphane.osmanlica.com

Actor-critic reinforcement learning for bidding in bilateral negotiation

Title: Actor-critic reinforcement learning for bidding in bilateral negotiation
Author: Arslan, Furkan; Aydoğan, Reyhan
Publication Date: 2022
Publisher: TÜBİTAK
Subjects: Automated bilateral negotiation, Bidding strategy, Deep reinforcement learning, Entropy reinforcement learning, Imitation learning, Multi-agent systems
Type: Periodical
Language: English
Digital: Yes
Manuscript: No
Library: Özyeğin Üniversitesi
Accession Number: 1300-0632
Record Number: 6fe911b5-a329-40d0-bb4d-afeb4f1edf3a
Location: Computer Science
Date: 2022
Notes: Scientific and Technological Research Council of Turkey; TÜBİTAK
Abstract: Designing an effective and intelligent bidding strategy is one of the most compelling research challenges in automated negotiation, where software agents negotiate with each other to find a mutual agreement when there is a conflict of interest. Instead of designing a hand-crafted decision-making module, this work proposes a novel bidding strategy adopting an actor-critic reinforcement learning approach, which learns what to offer in a bilateral negotiation. An entropy reinforcement learning framework called Soft Actor-Critic (SAC) is applied to the bidding problem, and a self-play approach is employed to train the model. Our model learns to produce the target utility of the upcoming offer based on previous offer exchanges and the remaining time. Furthermore, an imitation learning approach called behavior cloning is adopted to speed up the learning process. In addition, a novel reward function is introduced that takes into account not only the agent's own utility but also the opponent's utility at the end of the negotiation. The developed agent is evaluated empirically: a large number of negotiation sessions are run against a variety of opponents in domains varying in size and opposition. The agent's performance is compared with that of its opponents and of baseline agents negotiating against the same opponents. The empirical results show that our agent successfully negotiates against challenging opponents in different negotiation scenarios without requiring any prior information about the opponent or domain. Furthermore, it achieves better results than the baseline agents in terms of the utility received at the end of successful negotiations.
DOI: 10.55730/1300-0632.3899
Volume: 30
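The reward design described in the abstract, which credits the agent not only for its own utility but also for the opponent's utility at the end of the negotiation, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the weight `w`, the function name, and the zero payoff on failure are all hypothetical assumptions.

```python
def final_reward(own_utility, opp_utility, agreed, w=0.5):
    """Sketch of an end-of-negotiation reward that rewards the agent's
    own utility plus a weighted share of the opponent's utility.
    The weight w and the failure payoff are illustrative assumptions;
    the paper's actual reward function may differ."""
    if not agreed:
        return 0.0  # assumed payoff when no agreement is reached
    return own_utility + w * opp_utility
```

Under this sketch, a successful deal yielding utilities 0.8 (own) and 0.6 (opponent) would receive a higher reward than one yielding 0.8 and 0.2, encouraging agreements that are acceptable to both parties.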
