Using Intermittent Rewards in Training: The Concept of Differential Reinforcement

By Tim Frawley, M.A | Last modified: February 16, 2006

Ed's Note: My brother is a behavioral psychologist and consultant in Canada. He successfully works with special needs people, many with IQs of less than 50. Tim and I get into very interesting discussions on learning concepts, and I am always surprised at how the work he does with his patients applies to dog training. In my Basic Dog Obedience DVD, I use examples that Tim gave me to explain a lot of the training concepts.

Tim uses the information in the article to explain his work to the caregivers of some of his patients. It applies directly to dog training. I believe it will help dog trainers use the correct terminology when discussing training with other dog trainers.

The article is certainly food for thought for all dog trainers.

List of terms discussed in this article that are used in Differential Reinforcement:

Learning
Behavior
Reinforcement
Positive Reinforcement
Negative Reinforcement
Punishment
Schedules of Reinforcement
Continuous Reinforcement
Fixed Schedules
Variable Schedules
Ratio Strain
Superstitious Behavior (also referred to as "Extinction Burst")
Consistency
Communication

The single most effective and efficient technique available to those who work in the field of Human Services is Differential Reinforcement. Properly implemented, it will "solve" more than 80% of all the problems that you encounter when working with people. It will allow you to accomplish this by focusing on building new skills through the use of positive reinforcement, rather than punishing existing behaviors. It is not new. It is not difficult to explain. It is very difficult to do consistently. To use the technique effectively, it is critical to understand how and why it works the way that it does. Hopefully, this paper will help to provide you with that understanding.

There are a number of concepts, drawn from the research on Learning Theory, that provide the basis for Differential Reinforcement. Knowing these concepts and understanding how they work and interact to influence behavior is essential. They are:

Learning

A relatively permanent change in behavior that occurs as a result of reinforced practice. "Relatively" is used in the definition because not all of the things that we learn stay with us for all of our lives. Few of us can remember all of our teachers' names from K thru 13. "Practice" is used in the definition because there are very few things that we learn in our lives that take only one trial.

Behavior

A behavior is anything that we say or do. It must be observable and measurable. You can observe and measure "walking", for example, but you can only infer things like "thinking". Behaviors are described in terms of frequency, latency, duration, intensity, and topography. Frequency refers to how often the behavior happens. Latency refers to how long after an event the behavior happens (e.g., after supper, after getting up, after being denied something). Duration refers to how long the behavior lasts. Intensity refers to the severity of the behavior. Topography refers to what the behavior looks like. Keeping all of these points in mind when describing a behavior will minimize the possibility of confusion or misunderstanding among your colleagues.

All behaviors, according to Learning Theory, occur only because they are reinforced. Some clarification on this point is needed. In new situations, we don't always know the "rules". As a result, we do a couple of things. First, we may look around to see what others are doing. Second, we may ask. Third, we may begin to interact with the environment in an exploratory manner. Those things that we do that have positive outcomes, we will do again. (This is called Thorndike's Law of Effect.) Those things that have a negative outcome, we don't do again. Those things that have no discernible outcome are also things that we don't do again.
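
The five descriptors named in this section (frequency, latency, duration, intensity, and topography) can be captured in a simple record, which may help colleagues describe the same behavior in the same way. The sketch below is only one possible layout. The class name BehaviorDescription, the field names, the units, the 1-to-5 intensity scale, and the example values are all assumptions made for illustration; none of them come from the article.

```python
from dataclasses import dataclass


@dataclass
class BehaviorDescription:
    """Hypothetical record for describing one observable behavior."""

    name: str
    frequency_per_day: float  # how often the behavior happens
    latency_minutes: float    # how long after the triggering event it starts
    duration_minutes: float   # how long each episode lasts
    intensity: int            # assumed scale: 1 (mild) to 5 (severe)
    topography: str           # what the behavior looks like

    def __post_init__(self):
        # Guard the assumed intensity scale so records stay comparable.
        if not 1 <= self.intensity <= 5:
            raise ValueError("intensity must be between 1 and 5")


# Illustrative values only, not real observations.
example = BehaviorDescription(
    name="calling out for staff",
    frequency_per_day=25,
    latency_minutes=5,
    duration_minutes=2,
    intensity=3,
    topography="loud, repeated shouting from the doorway",
)
print(example)
```

Writing every description in the same shape makes it harder to leave one of the five points out.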

Reinforcement

Anything that increases the frequency of the behavior that it is paired with is a reinforcement. It is important to emphasize the point that anything can be a reinforcement. It does not need to be something that is appealing to you personally! Reinforcement and reward are often confused with each other. Both typically involve giving someone something that they like in return for something that they do. A reward, however, does not lead to a long-term increase in the frequency of the behavior that it is paired with. To illustrate, imagine that you are walking along the sidewalk, and you see a sign on a telephone pole that says "REWARD". Under the heading is a picture of a Dalmatian dog, together with relevant information on how you could get $100 for returning this valuable pet to its owner. You look up after reading the sign and see a Dalmatian walking toward you. Naturally, you catch the dog, return it to its owner and collect your reward from obviously happy owners. What you do not do is go back out looking for more Dalmatians! You have your one-time payoff for what you were "asked" to do. The likelihood of your returning "lost" dogs to their owners has not changed. Imagine now that instead of "REWARD" the sign had said "LOST" and appealed to the public to return a distraught little girl's dog. You see the dog and deliver it to the owner. Not only does the little girl shower you with hugs and kisses, but the parents are so thankful that they insist that you take $100 as a token of their appreciation for your time and effort! What do you think would be your response the next time you see an apparently lost dog? The difference between the two scenarios is the process. The reinforcement process involves giving a positive after the behavior occurs and not in any contractual way. It is not a "You did that because I told you I'd give you this" process, but rather an "I'm giving you this because you did that" process. 
This may seem like a subtle difference, but it is crucial in terms of the impact that the two processes have on an individual's behavior.

Positive Reinforcement

There are two processes for reinforcement, positive and negative. Positive Reinforcement involves giving something positive (a candy, a pat on the back, a compliment, etc.). Those things that meet basic needs (food to a hungry person, warmth to someone who is cold, etc.) are referred to as Primary reinforcements. Those things that acquire their value by being paired with a Primary reinforcer (money, compliments, etc.) are referred to as Secondary Reinforcements.

Negative Reinforcement

Negative Reinforcement is a little more complex. Technically it involves either escaping or avoiding something aversive, thereby increasing the likelihood of the escaping or avoiding behavior. Most of us have touched a hot stove at some point in our lives. Jerking our hand away is a behavior that is negatively reinforced in that we escape the aversive heat. Furthermore, keeping our hands away from hot stoves is negatively reinforced in the future because we avoid getting burned! Theoretically, negative reinforcement results in faster learning and learning which stays with us longer than positive reinforcement. Because it requires an aversive to be present "upfront", it is rarely, if ever, used in a systematic approach to teaching.

Punishment

Anything that decreases the frequency of the behavior that it is paired with is a punisher. As is the case with reinforcement, anything can be a punisher. There are two ways to punish: by presentation and by removal. Punishment by presentation occurs when a specific aversive consequence follows a behavior. Punishment by removal occurs when something positive is removed following a behavior. Within this category is a technique called Extinction, which involves not reinforcing a behavior that has been reinforced in the past.

Schedules of Reinforcement

We cannot reinforce an individual's behavior every time that it occurs, forever. It's impractical in terms of our time. It's artificial and intrusive in public settings. Most of all, the person that we are teaching will eventually experience Satiation, which means that he/she gets "full" of the reinforcer (You may like steak, but not for every meal, for weeks on end!). To avoid satiation, and to help the individual to internalize the behaviors that we are teaching, schedules of reinforcement are used. As the term implies, these are a set of different ways that we can plan for the gradual fading out of planned reinforcement, without having a negative effect on the learning process.

Continuous Reinforcement (CRF)

Continuous Reinforcement (CRF) is the first and most basic of the Schedules. Under this schedule, every time the target behavior occurs, it is reinforced. The ratio of reinforcement to behavior is then 1:1. This type of schedule results in a reasonably steady learning curve. It is most often used when we are teaching a brand new behavior to someone, or when it is considered to be critical that the person "gets" the message as soon as possible (e.g., safety skills). When extinction is introduced, the behavior rapidly disappears.

Fixed Schedules

Fixed Schedules of reinforcement are an extension of the CRF concept. Instead of one reinforcement for each behavior, a predetermined number of behaviors are required to earn reinforcement. A Fixed Ratio of 3:1 then would mean that the individual would have to demonstrate the target behavior 3 times in order to receive a reinforcement. In the same manner, a Fixed Interval of 3:1 would mean that the person would be expected to demonstrate the target behavior in each of 3 intervals before being given a reinforcer. Ratio Schedules refer to the exact number of behaviors that are required, while Interval Schedules refer to time periods wherein the behaviors must be in evidence. (In theory, there is no limit to how high the ratio could go.) This type of schedule produces a learning curve that has "plateaus", interspersed with fairly high rates of behavior. The plateaus occur when the person pauses to consume the reinforcement. When extinction is introduced, the frequency of the behavior drops off fairly rapidly, although not as rapidly as with CRF. Examples of Fixed Schedules are "piece work" (Ratio) and being on salary (Interval).
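
As a rough illustration of the Fixed Ratio idea described above, the sketch below counts demonstrations of the target behavior and marks every Nth one for reinforcement. The function name fixed_ratio_schedule and its parameters are hypothetical, chosen only for this example. It models the Ratio form only, not the Interval form.

```python
def fixed_ratio_schedule(num_behaviors, ratio):
    """Return a list of booleans: True where a behavior earns reinforcement.

    Hypothetical helper for illustration only. With ratio=1 this is
    Continuous Reinforcement (CRF); with ratio=3 it is a Fixed Ratio of 3:1.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    # Reinforce every `ratio`-th demonstration of the target behavior.
    return [(i % ratio) == 0 for i in range(1, num_behaviors + 1)]


crf = fixed_ratio_schedule(6, 1)
fr3 = fixed_ratio_schedule(6, 3)
print(crf)  # every behavior is reinforced
print(fr3)  # only the 3rd and 6th behaviors are reinforced
```

Seen this way, CRF is simply the special case of a Fixed Ratio where the ratio is 1:1.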

Variable Schedules

Variable Schedules of reinforcement are the ultimate goal of any intervention. Like the Fixed Schedules, they come in both Ratio and Interval forms. A Variable Ratio Schedule of 3:1 means that on average, the person is reinforced for every 3 demonstrations of the target behavior. Reinforcements are administered on a random basis, as far as the individual is concerned. Variable Schedules produce the highest rates of responding and the most resistance to extinction of any of the Reinforcement Schedules. Examples of the Variable Schedules are Lotteries (Ratio) and hunting or fishing (Interval). Most of us have most of our social behaviors reinforced on a Variable Schedule (Think of how often you are complimented!). It can be said that these Schedules induce a kind of paranoia in the individual, who never knows when the next reinforcement is coming. The reality is that the Schedule has to be carefully set in advance in order to ensure that enough reinforcement comes often enough to avoid a phenomenon called Ratio Strain.
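
To make the "on average" part of a Variable Ratio concrete, here is a minimal sketch. It assumes that each requirement is drawn uniformly from 1 up to (2 x the mean ratio - 1), which is just one simple way to get the stated average; the article does not say how the random requirements should be chosen. The function name variable_ratio_schedule and the seed parameter are hypothetical.

```python
import random


def variable_ratio_schedule(num_reinforcers, mean_ratio, seed=None):
    """Return the number of behaviors required before each reinforcer.

    Hypothetical helper for illustration only. Each requirement is drawn
    uniformly from 1 .. (2 * mean_ratio - 1), so the long-run average is
    mean_ratio, but no single requirement can be predicted by the learner.
    """
    if mean_ratio < 1:
        raise ValueError("mean_ratio must be at least 1")
    rng = random.Random(seed)
    return [rng.randint(1, 2 * mean_ratio - 1) for _ in range(num_reinforcers)]


# A Variable Ratio of 3:1, simulated over many reinforcers.
requirements = variable_ratio_schedule(10_000, 3, seed=1)
average = sum(requirements) / len(requirements)
print(f"average behaviors per reinforcer: {average:.2f}")
```

Because the largest requirement is capped, this particular sketch also never asks for more than five behaviors in a row without reinforcement, which is one way of setting the Schedule in advance.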

Ratio Strain

Ratio Strain happens when the Schedule of Reinforcement is set too high and the individual "gives up" before the next reinforcement becomes available. An example of this would be the person who has been paid every two weeks, whose employer decides that paychecks will only be issued once a year. For most people, this would constitute Ratio Strain. On the other hand, if the employer gradually moved toward a "once a year" pay schedule, most people would be able to adapt. Moving through CRF, Fixed, and Variable Schedules in a gradual manner, based on the individual's abilities, serves to reduce the likelihood of Ratio Strain.
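
The paycheck example can be sketched as a stepwise plan rather than a single jump. The code below assumes, purely for illustration, that each step at most doubles the previous requirement; the article does not specify how gradual "gradual" should be, and in practice the pace would depend on the individual. The function name thinning_plan and the factor parameter are hypothetical.

```python
def thinning_plan(start, target, factor=2):
    """Return the sequence of requirements from `start` up to `target`.

    Hypothetical helper for illustration only. Each step multiplies the
    previous requirement by `factor` (an assumed pace), capped at `target`.
    """
    if start < 1 or target < start:
        raise ValueError("need 1 <= start <= target")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    plan = [start]
    while plan[-1] < target:
        plan.append(min(plan[-1] * factor, target))
    return plan


# Weeks between paychecks: from every 2 weeks to once a year (52 weeks).
pay_plan = thinning_plan(2, 52)
print(pay_plan)  # [2, 4, 8, 16, 32, 52]
```

A smaller factor gives more, smaller steps, which may suit an individual who shows signs of "giving up" between reinforcements.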

Superstitious Behavior

Superstitious behavior, also referred to as "Extinction Burst", is the "burst" of behavior that happens when we, as support providers, start a new program. Invariably, the individual responds to the change by increasing the frequency of the target behavior(s) for a short period of time. If you think about it, this makes perfect sense. Imagine what your behavior would be like if the front door to your home was "re-hinged" when you were away for a little while. Now, instead of opening "in", it opens "out". It's probably safe to bet that the first few times that you use the door, you are going to end up "pushing" on it a couple of times before you remember that the "rules" have changed and pull instead. Superstitious behavior, then, is like the person "pushing" as he/she learns what the new rules are.

Consistency

This is an essential requirement for any program, and it is critical to Differential Reinforcement. It refers to the need to, as much as is humanly possible, provide the same response to the person that we want to teach. The closer that we can get to 100% consistency, the faster the teaching-learning process. Consistency might also be seen as synonymous with structure.

Communication

Communication is at least a two-party process. One party has the intent of expressing something. The second party, by definition (or default!), must listen. There is almost an implied contract in the communication process that goes something like this: "I'll listen to you if you agree to listen to me." Invariably, the reason for communicating is to achieve a task or a goal that requires action by others. That action may be as simple as acknowledgment of what was communicated (I'm going to the store, bye!) or so complex as to require the cooperation of the person being communicated to (Can you help me lift this piano?). The nature of the task and the relationship that we have with the people that we communicate with help us to decide what communication style to use. In a very broad sense, there are 5 different styles, which allow for differing levels of two-way communication. They are:

Instruct: state clearly what needs to be done, together with any further information which is essential to complete the task.
Inform: to be clear about what needs to be done but also provide further information such as why it needs to be done, what it is helping to achieve etc.
Consult: to ask for views that may or may not influence the final outcome.
Involve: to ask for views that will influence the final outcome.
Empower: to provide a clear framework and the license for others to do what they think is best.

These categories are not mutually exclusive. You need to first inform before you can involve. They are not intended to be seen as right or wrong. Each has its own value, depending on the purpose of the communication. For example, a Fire alarm would fit into the Instruct category. Clearly, it would not be appropriate to respond to a Fire alarm at the Involve level, with a flip chart and an agenda! In this field, the style that you use will be a function of the skills and abilities of the people that you are supporting and the purpose of your communication. We run into problems, however, when those that we support communicate with us. Invariably, problems occur because they use the Instruct style, either verbally or non-verbally: "This is what I want, I want it now. You get it for me or get out of my way!"

Now let's apply what we know to the most common cause of inappropriate behaviors: attention-seeking. A typical formula, in Learning Theory, includes an Antecedent (A), a resulting Behavior (B), and the Consequence (C) that follows. The formula is often called an ABC. In an attention-seeking situation, we see the following:

A: Need for attention
B: The behavior
C: Staff attention

The staff attention which follows the behavior is invariably a positive reinforcement for the individual. How do we know? Remember that a reinforcer is something, anything, that increases the behavior that it is paired with. It is logical then to conclude that attending to inappropriate behavior can and does reinforce that behavior.

One of the obstacles that need to be overcome in attention-seeking situations is the impact of the whole process on the staff involved. From the staff perspective, the behavioral formula above looks like this:

A: The Behavior
B: Attend to the Behavior
C: Peace and Quiet

Remember that escaping or avoiding something that is uncomfortable (punishing) is what Negative Reinforcement is all about. What we have, then, is a situation where both the attention-seeking person and the staff involved are being reinforced by the process. Moreover, the staff are being Negatively Reinforced, which has a stronger and longer-lasting impact!

We can get more clues as to what is happening in the attention-seeking dynamic if we look at baseline information in a different way. Typically, staff begin to document attention-seeking behavior as it occurs over time, thus developing a baseline of how often it is happening. To optimize the use of baseline data, we need to remember that, according to the theory, behaviors only occur because they are Reinforced. With that in mind, the baseline then tells us that, aside from all of the reinforcement that we are providing through contact and programs and aside from the reinforcement available in the environment, the individual is telling us that at least x (where x is the baseline frequency) more is required. When we decide that the behaviors that we see are "inappropriate" and that we want to assist the person to stop what they are doing (or at least decrease the frequency), what we are really saying is that we want the person to "give up" reinforcement. In the absence of a real solid understanding of the reasons for what we are asking, and that is generally not going to happen, there is not likely to be any motivation to participate.

An example of this scenario might be if you decided to drop the temperature in my home from 68 degrees to 64 degrees during the winter to save money. Heat to someone who is cold is a primary reinforcer. If I don't clearly understand and agree with your plan and all of its potential benefits, I'm very likely to react negatively. On the other hand, if you go out of your way to offer alternative sources of heat (read Reinforcement and Inform) such as a big fire in the fireplace, nice warm sweaters, and slippers, and perhaps the occasional hot chocolate, my motivation to change and accept your plans is likely to increase significantly!

The same is true when we begin a Differential Reinforcement program. It is absolutely essential that a wide variety of reinforcement is made available, for any and all behaviors that are appropriate and incompatible with those defined as the target behaviors. The reinforcement that is given need not be primary reinforcement (food etc.), nor does it always need to be time-consuming. Positive comments, short interactions, and even the offer to interact by playing a game or helping with a task can be highly effective reinforcements. The message that you want to give is that there are other ways to get your attention, ways that you like and appreciate. How do you know how often you should offer reinforcement? Look to your baseline information. It is a clear communication from the person about how often they need reinforcement. If the inappropriate behaviors are happening 25 times a day, then you will need to offer more than 25 reinforcers for alternative behaviors. A good rule of thumb is "The more the better". Basically, offering more means that you are providing more teaching/learning opportunities, and the more of those that you have, the quicker the process.
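
The baseline arithmetic above can be worked through in a few lines. This is a minimal sketch: the function names are hypothetical, and the 16 waking hours per day used to space the reinforcers out is an assumption added for the example, not a figure from the article.

```python
def minimum_daily_reinforcers(baseline_per_day):
    """Smallest whole number of reinforcers that exceeds the baseline.

    Hypothetical helper: the article only says "more than" the baseline,
    and that more is better, so this is a floor, not a target.
    """
    return baseline_per_day + 1


def average_minutes_between(reinforcers_per_day, waking_hours=16):
    """Average gap between reinforcers, assuming `waking_hours` per day."""
    return waking_hours * 60 / reinforcers_per_day


target = minimum_daily_reinforcers(25)  # baseline of 25 behaviors a day
gap = average_minutes_between(target)
print(target)          # 26
print(round(gap, 1))   # 36.9
```

Under these assumptions, a baseline of 25 a day means offering reinforcement for alternative behaviors at least every 37 minutes or so on average, and more often if staff time allows.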

The second component of Differential Reinforcement involves ignoring the behaviors that you do not want to see. Once again, remember that all behaviors occur because they are reinforced, therefore we can have an impact on their frequency by not providing the attention that we used to provide (read Extinction). The simplest way to ignore a behavior is to turn and walk away. It is also possible to ignore by not responding to what has happened at all. Instead, you behave as if nothing has happened and introduce a new subject or activity. In other words, you ignore what has happened, and redirect the person to something else. Activities that are chosen for redirection efforts should not be things that the person does not like. They should be presented as opportunities to interact with you and not directives or compliance episodes. Not all behaviors can be "ignored", especially if they involve danger to someone. In these cases, you can intervene as much as is necessary to ensure the safety of those involved but do so without comment and with a neutral expression. Remember that the person that you are working with is expecting specific types of responses from you, and those responses are his/her reinforcement. If you provide something that has not been experienced before it is unlikely to meet the requirements of a reinforcer.

A major problem with the use of Differential Reinforcement is that staff members tend to focus on the "ignore" portion of the technique. It is not uncommon to see reports that detail how an individual "acted out", was ignored, continued to "act out" and was essentially ignored for a whole shift. When this dynamic develops, it is very hard on both the staff and the person being supported. Functionally, what has happened is that the behavior is being responded to (by ignoring it) but the communication from the person goes unnoticed. The communication is a clear message: "I need attention!" If you respond to the behavior but not the communication, you will not be successful.

In any situation that involves attention-seeking, it is your reactions that are responsible for maintaining (reinforcing) the behavior. Unless you change the way that you do things, there will be little reason for the person that you are supporting to change their behavior.

Tim Frawley, M.A
Behavior Consultant
