exploration vs. exploitation
Hansjörg Neth, Neele Engelmann, Ralf Mayrhofer
Abstract: Melioration, defined as choosing a lesser, local gain over a greater, longer-term gain, is a behavioral tendency that people and pigeons share. As such, the empirical occurrence of meliorating behavior has frequently been interpreted as evidence that the mechanisms of human choice violate the norms of economic rationality. In some environments, the relationship between actions and outcomes is known. In this case, the rationality of choice behavior can be evaluated in terms of how successfully it maximizes utility given knowledge of the environmental contingencies. In most complex environments, however, the relationship between actions and future outcomes is uncertain and must be learned from experience. When the difficulty of this learning challenge is taken into account, it is not evident that melioration represents suboptimal choice behavior.
(…) we make search in our memory for a forgotten idea, just as we rummage our house for a lost object. In both cases we visit what seems to us the probable neighborhood of that which we miss. We turn over the things under which, or within which, or alongside of which, it may possibly be; and if it lies near them, it soon comes to view.
William James (1890), The Principles of Psychology, p. 654
[Copyright neth.de, 2007–2014]:
Steve Payne, Geoff Duggan, Hans Neth (2007).
Discretionary task interleaving: Heuristics for time allocation in cognitive foraging.
Article in the Journal of Experimental Psychology: General (JEP:G).
Abstract: When participants allocated time across 2 tasks (in which they generated as many words as possible from a fixed set of letters), they made frequent switches. This allowed them to allocate more time to the more productive task (i.e., the set of letters from which more words could be generated) even though times between the last word and the switch decision (“giving-up times”) were higher in the less productive task. These findings were reliable across 2 experiments using Scrabble tasks and 1 experiment using word-search puzzles. Switch decisions appeared relatively unaffected by the ease of the competing task or by explicit information about tasks’ potential gain. The authors propose that switch decisions reflected a dual orientation to the experimental tasks. First, there was a sensitivity to continuous rate of return — an information-foraging orientation that produced a tendency to switch in keeping with R. F. Green’s (1984) rule and a tendency to stay longer in more rewarding tasks. Second, there was a tendency to switch tasks after subgoal completion. A model combining these tendencies predicted all the reliable effects in the experimental data.
There is no reason to suppose that most human beings are engaged in maximizing anything unless it be unhappiness, and even this with incomplete success.
R.H. Coase (1980), The Firm, the Market, and the Law, p. 4
[Copyright neth.de, 2006]:
Hans Neth, Chris Sims, Wayne Gray (2006). Melioration dominates maximization: Stable suboptimal performance despite global feedback. Paper presented at CogSci 2006.
Abstract: Situations that present individuals with a conflict between local and global gains often evoke a behavioral pattern known as melioration, a preference for immediate rewards over higher long-term gains. Using a variant of a binary forced-choice paradigm by Tunney & Shanks (2002), we explored the potential role of global feedback as a means to reduce this bias.
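The conflict between local and global gains described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the payoff function, the window size, and the names `payoffs`, `meliorate`, `maximize`, and `simulate` are hypothetical and are not taken from Tunney & Shanks (2002) or from the experiments reported in the paper. Option "A" always pays more on the current trial, but every choice of "B" raises the future payoffs of both options.

```python
from collections import deque

WINDOW = 10      # hypothetical: number of past choices that affect payoffs
LOCAL_BONUS = 2  # hypothetical: immediate advantage of option "A" over "B"


def payoffs(history):
    """Return (payoff_A, payoff_B) for the next choice.

    Both payoffs grow with the number of "B" choices in the recent
    history, but "A" always pays LOCAL_BONUS more on the current trial.
    """
    n_b = sum(1 for choice in history if choice == "B")
    return 1 + LOCAL_BONUS + n_b, 1 + n_b


def meliorate(history):
    """Pick whichever option pays more right now (local comparison)."""
    pay_a, pay_b = payoffs(history)
    return "A" if pay_a >= pay_b else "B"


def maximize(history):
    """Always pick "B", which is the better long-run policy in this toy task."""
    return "B"


def simulate(policy, n_trials=100):
    """Run a policy for n_trials and return the total payoff earned."""
    # Start from a history of "A" choices only (an arbitrary assumption).
    history = deque(["A"] * WINDOW, maxlen=WINDOW)
    total = 0
    for _ in range(n_trials):
        choice = policy(history)
        pay_a, pay_b = payoffs(history)
        total += pay_a if choice == "A" else pay_b
        history.append(choice)  # the oldest choice drops out of the window
    return total


if __name__ == "__main__":
    print("melioration: ", simulate(meliorate))
    print("maximization:", simulate(maximize))
```

With these assumed parameters, the meliorating policy earns 300 points over 100 trials, whereas consistently choosing the locally inferior option earns 1045. The sketch shows only the payoff structure that makes melioration costly; it does not model learning or the global feedback manipulation studied in the paper.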