Conservative Learning Method for Diabetic Issues Using Reinforcement Learning & Brain Storm Optimization


S. Anusuya, K. Anandapadmanabhan

Abstract

Attention-based sequential recommendation techniques have shown promising outcomes by effectively capturing the dynamic interests of users from their previous interactions. Recent work has begun incorporating reinforcement learning (RL) into these models to further improve user representations. By framing sequential recommendation as an RL problem with reward signals, we offer a recommender system (RS) that integrates direct user feedback into the reward formulation and accounts for significant features, delivering a more personalized experience. Recent RL-based recommender systems integrate Supervised Negative Q-Learning (SNQN) and Supervised Advantage Actor-Critic (SA2C), yet several issues remain. For instance, because positive reward signals are scarce, Q-value estimates are typically biased toward negative values. Additionally, the precise timestamp of the sequence has a significant influence on the Q-value. To resolve these problems, we propose contrastive objectives combined with Brain Storm Optimization, along with extensions for handling datasets with larger ranges. Furthermore, we recognize that negative sampling can lead to instability; we therefore present an enhanced strategy that addresses these issues more effectively.
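The negative-bias issue the abstract describes can be seen in a minimal tabular sketch of an SNQN-style update: the observed (positive) item receives a reward-driven TD target, while sampled unobserved items receive a zero-reward target. This is an illustrative toy, not the authors' implementation; the item space, reward scheme, and hyperparameters are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items = 5            # toy action space (candidate items to recommend)
q = np.zeros(n_items)  # Q-value estimate per item for a single state
gamma = 0.9            # discount factor
alpha = 0.1            # learning rate

def snqn_update(q, pos_item, reward, next_max_q, n_neg=2):
    """One SNQN-style step: a TD update on the observed positive item,
    plus zero-reward TD updates on sampled negative items."""
    # TD target for the item the user actually interacted with
    target = reward + gamma * next_max_q
    q[pos_item] += alpha * (target - q[pos_item])
    # Sample unobserved items as negatives; their targets assume zero
    # reward, which drags estimates down when positive signals are rare
    candidates = [i for i in range(n_items) if i != pos_item]
    negatives = rng.choice(candidates, size=n_neg, replace=False)
    for neg in negatives:
        neg_target = 0.0 + gamma * next_max_q
        q[neg] += alpha * (neg_target - q[neg])
    return q

# Simulate a stream where only item 2 ever yields positive reward
for _ in range(200):
    q = snqn_update(q, pos_item=2, reward=1.0, next_max_q=q.max())

print(q.argmax())  # the positively rewarded item dominates
```

Running the loop to convergence, the positive item's Q-value approaches 1/(1 - gamma) while the negatives settle below it, which is the asymmetry that contrastive objectives aim to counteract.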
