Environment design. (a) The 2D gridworld environment used in Experiment 1. (b) To study the properties of the optimal reward, we made several modifications to the gridworld environment. Top row: In the one-time learning environment, the agent could choose to stay at the food location permanently after reaching it. In the lifelong learning environment, the agent was teleported to a random location in the gridworld once it reached the food state. Middle row: In the stationary environment, the food remained in the same location for the lifetime of the agent. In the non-stationary environment, the food changed position during the lifetime of the agent. Bottom row: We used a 7 x 7 grid to simulate a dense reward setup. To simulate a sparse reward setup, we increased the grid size to 13 x 13. Credit: PLOS Computational Biology (2022). DOI: 10.1371/journal.pcbi.1010316
Three researchers, two from Princeton University and one from the Max Planck Institute for Biological Cybernetics, have developed reinforcement learning-based simulations suggesting that the human desire to always want more may have evolved as a way to speed up learning. In their paper published in the open-access journal PLOS Computational Biology, Rachit Dubey, Thomas Griffiths and Peter Dayan describe the factors that went into their simulations.
Researchers studying human behavior are often puzzled by people's seemingly contradictory desires. Many people constantly want more of a particular thing, even though they know that fulfilling such desires may not lead to the result they hope for. Many want more and more money, for example, on the idea that more money will make life easier and thus make them happier. But a host of studies have shown that making more money rarely makes people happier (except for those who start at a very low income level). In this new effort, the researchers sought to better understand why people evolved this way. To that end, they built a simulation to mimic the way humans respond emotionally to stimuli, such as achieving goals. And to better understand why people feel the way they do, they added checkpoints that could be used as a measure of happiness.
The simulation was based on reinforcement learning, in which people (or a machine) keep doing things that provide a positive reward and stop doing things that provide no reward or a negative reward. The researchers also added simulated emotional reactions to the well-known negative effects of habituation and comparison: people become less happy over time as they get used to something new, and they become less happy when they see that someone else has more of the things they want.
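One way to picture how habituation and comparison could enter a reinforcement learning reward is the rough Python sketch below. It is an illustration only, not the authors' code: the function names, the weights, and the moving-average update for what the agent is "used to" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Hypothetical weights; names and values are illustrative, not from the paper."""
    objective: float = 1.0    # weight on the raw outcome
    habituation: float = 0.5  # weight on outcome minus what the agent is used to
    comparison: float = 0.5   # weight on outcome minus a reference level


def subjective_reward(outcome, expectation, reference, w):
    """Combine the raw outcome with habituation and comparison terms."""
    habituation_term = outcome - expectation  # shrinks as the agent gets used to the outcome
    comparison_term = outcome - reference     # negative when the reference has more
    return (w.objective * outcome
            + w.habituation * habituation_term
            + w.comparison * comparison_term)


def update_expectation(expectation, outcome, rate=0.5):
    """Move the expectation toward the latest outcome (assumed moving average)."""
    return expectation + rate * (outcome - expectation)


def happiness_trace(outcomes, reference, w, rate=0.5):
    """Return the subjective reward felt for each outcome in sequence."""
    expectation = 0.0
    trace = []
    for outcome in outcomes:
        trace.append(subjective_reward(outcome, expectation, reference, w))
        expectation = update_expectation(expectation, outcome, rate)
    return trace


if __name__ == "__main__":
    # The same outcome repeated three times feels less rewarding each time.
    print(happiness_trace([1.0, 1.0, 1.0], reference=0.0, w=RewardWeights()))
```

In this sketch, receiving the same outcome repeatedly yields a shrinking subjective reward (habituation), and raising the reference level lowers the reward for an identical outcome (comparison). A learning agent would then be trained on this subjective value rather than on the raw outcome.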
While running the simulations, the researchers found that the simulated agents reached goals faster when habituation and comparison kicked in, a suggestion that such emotional reactions may also play a role in faster learning in humans. They also found that the simulation became less "happy" when faced with many possible options than when it had only a few to choose from.
Researchers suggest that the reason people are prone to falling into an endless cycle of constantly wanting more is that, in general, it helps humans learn faster.
Rachit Dubey et al, The pursuit of happiness: A reinforcement learning perspective on habituation and comparisons, PLOS Computational Biology (2022). DOI: 10.1371/journal.pcbi.1010316
© 2022 Science X Network
Citation: Reinforcement Learning-Based Simulations Show Human Desire to Always Want More May Accelerate Learning (2022, Aug 5). Retrieved Aug 6, 2022 from https://phys.org/news/2022-08-learningbased-simulations-human-desire.html