Using Intermittent Rewards in Training - The Concept of Differential Reinforcement
By Tim Frawley, M.A.
Copyright 2001
My brother is a behavioral psychologist and consultant in Canada. He works successfully with people with special needs, many with IQs of less than 50. Tim and I get into very interesting discussions on learning concepts, and I am always surprised at how the work he does with his patients applies to dog training. In my Basic Dog Obedience video, I use examples that Tim gave me to explain a lot of the training concepts.
Tim wrote the article on this web page. He uses the information in the article to explain his work to the caregivers of some of his patients. It applies directly to dog training. I chose to include it on my web site because it will help dog trainers use the correct terminology when discussing training with other dog trainers.
The article is certainly food for thought for all dog trainers.
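Terms discussed in this article that are used in Differential Reinforcement: Learning, Behavior, Reinforcement, Punishment, Schedules of Reinforcement, Superstitious Behavior, Consistency, and Communication.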
The single most effective and efficient technique available to those who work in the field of Human Services is Differential Reinforcement. Properly implemented, it will "solve" more than 80% of all the problems that you encounter when working with people. It does this by focusing on building new skills through positive reinforcement, rather than on punishing existing behaviors. It is not new. It is not difficult to explain. It is very difficult to do consistently. To use the technique effectively, it is critical that you understand how and why it works the way that it does. Hopefully, this paper will help to provide you with that understanding.
There are a number of concepts, drawn from
the research on Learning Theory, that provide the basis for Differential
Reinforcement. Knowing these concepts and understanding how they
work and interact to influence behavior is essential. They are:
Learning:
A relatively permanent change in behavior that occurs as a result of reinforced practice. "Relatively" is used in the definition because not all of the things that we learn stay with us for all of our lives. Few of us can remember all of our teachers' names from kindergarten through grade 13. "Practice" is used in the definition because very few of the things that we learn in our lives take only one trial.
Behavior:
A behavior is anything that we say or do. It must be observable and measurable. You can observe and measure "walking", for example, but you cannot directly observe something like "thinking". Behaviors are described in terms of frequency, latency, duration, intensity, and topography. Frequency refers to how often the behavior happens. Latency refers to how long after an event the behavior happens (e.g., after supper, after getting up, after being denied something). Duration refers to how long the behavior lasts. Intensity refers to the severity of the behavior. Topography refers to what the behavior looks like. Keeping all of these points in mind when describing a behavior will minimize the possibility of confusion or misunderstanding among your colleagues. All behaviors, according to Learning Theory, occur only because they are reinforced.
Some clarification on this point: in new situations, we don't always know the "rules". As a result, we do a few things. First, we may look around to see what others are doing. Second, we may ask. Third, we may begin to interact with the environment in an exploratory manner. Those things that we do that have positive outcomes, we will do again. (This is called Thorndike's Law of Effect.) Those things that have a negative outcome, we don't do again. Those things that have no discernible outcome are also things that we don't do again.
Reinforcement:
Anything that increases the frequency of the behavior that it is paired with is a reinforcer. It is important to emphasize that anything can be a reinforcer. It does not need to be something that is appealing to you personally! Reinforcement and reward are often confused with each other. Both typically involve giving someone something that they like in return for something that they do. A reward, however, does not lead to a long-term increase in the frequency of the behavior that it is paired with. To illustrate, imagine that you are walking along the sidewalk and you see a sign on a telephone pole that says "REWARD". Under the heading is a picture of a Dalmatian, together with relevant information on how you could get $100 for returning this valuable pet to its owner. You look up after reading the sign and see a Dalmatian walking toward you. Naturally, you catch the dog, return it to its owner, and collect your reward from the obviously happy owners. What you do not do is go back out looking for more Dalmatians! You have your one-time payoff for what you were "asked" to do. The likelihood of your returning "lost" dogs to their owners has not changed. Imagine now that instead of "REWARD" the sign had said "LOST" and appealed to the public to return a distraught little girl's dog. You see the dog and deliver it to the owner. Not only does the little girl shower you with hugs and kisses, but the parents are so thankful that they insist that you take $100 as a token of their appreciation for your time and effort! How do you think you would respond the next time you saw an apparently lost dog? The difference between the two scenarios is the process. The reinforcement process involves giving something positive after the behavior occurs, and not in any contractual way. It is not a "You did that because I told you I'd give you this" process, but rather an "I'm giving you this because you did that" process. This may seem like a subtle difference, but it is crucial in terms of the impact that the two processes have on an individual's behavior.
There are two types of reinforcement, positive and negative. Positive Reinforcement involves giving something positive (a candy, a pat on the back, a compliment, etc.). Those things that meet basic needs (food to a hungry person, warmth to someone who is cold, etc.) are referred to as Primary reinforcers. Those things that acquire their value by being paired with a Primary reinforcer (money, compliments, etc.) are referred to as Secondary reinforcers.
Negative Reinforcement is a little more complex. Technically, it involves either escaping or avoiding something that is aversive, thereby increasing the likelihood of the escaping or avoiding behavior. Most of us have touched a hot stove at some point in our lives. Jerking our hand away is a behavior that is negatively reinforced, in that we escape the aversive heat. Furthermore, keeping our hands away from hot stoves is negatively reinforced in the future because we avoid getting burned! Theoretically, negative reinforcement results in faster learning, and learning that stays with us longer, than positive reinforcement. Because it requires an aversive to be present "up front", it is rarely, if ever, used in a systematic approach to teaching.
Punishment:
Anything that decreases the frequency of the behavior that it is paired with is a punisher. As is the case with reinforcement, anything can be a punisher. There are two ways to punish: by presentation and by removal. Punishment by presentation occurs when a specific aversive consequence follows a behavior. Punishment by removal occurs when something positive is removed following a behavior. Within this category is a technique called Extinction, which involves no longer reinforcing a behavior that has been reinforced in the past.
Schedules of Reinforcement:
We cannot reinforce an individual's behavior every time that it occurs, forever. It's impractical in terms of our time. It's artificial and intrusive in public settings. Most of all, the person that we are teaching will eventually experience Satiation, which basically means that he/she gets "full" of the reinforcer (you may like steak, but not for every meal, for weeks on end!). To avoid satiation, and to help the individual to internalize the behaviors that we are teaching, schedules of reinforcement are used. As the term implies, these are a set of different ways that we can plan for the gradual fading out of planned reinforcement, without having a negative effect on the learning process.
Continuous Reinforcement (CRF) is the first and most basic of the Schedules. Under this schedule, every time the target behavior occurs, it is reinforced. The ratio of reinforcement to behavior is then 1:1. This type of schedule results in a reasonably steady learning curve. It is most often used when we are teaching a brand new behavior to someone, or when it is considered critical that the person "get" the message as soon as possible (e.g., safety skills). When extinction is introduced, the behavior rapidly disappears.
Fixed Schedules of reinforcement are an extension of the CRF concept. Instead of one reinforcement for each behavior, a predetermined number of behaviors is required to earn a reinforcer. A Fixed Ratio of 3:1, then, would mean that the individual would have to demonstrate the target behavior 3 times in order to receive a reinforcement. In the same manner, a Fixed Interval of 3:1 would mean that the person would be expected to demonstrate the target behavior in each of 3 intervals before being given a reinforcer. Ratio Schedules refer to the exact number of behaviors that are required, while Interval Schedules refer to time periods wherein the behaviors must be in evidence. (In theory, there is no limit to how high the ratio could go.) This type of schedule produces a learning curve that has "plateaus", interspersed with fairly high rates of behavior. The plateaus occur when the person pauses to consume the reinforcement. When extinction is introduced, the frequency of the behavior drops off fairly rapidly, although not as rapidly as with CRF. Examples of Fixed Schedules are "piece work" (Ratio) and being on salary (Interval).
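The Ratio side of these schedules can be expressed as a simple counting rule. The Python below is a minimal sketch for illustration only, not a procedure from the article; the class and method names are hypothetical. It treats CRF as a Fixed Ratio of 1:1 and reinforces every `ratio`-th occurrence of the target behavior. The article's Fixed Interval version, which counts intervals rather than responses, is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class FixedRatioSchedule:
    """Hypothetical Fixed Ratio (FR) schedule: reinforce every `ratio`-th response.

    ratio=1 is Continuous Reinforcement (CRF): every response is reinforced.
    """

    ratio: int
    count: int = field(default=0, init=False)  # responses since the last reinforcer

    def __post_init__(self) -> None:
        if self.ratio < 1:
            raise ValueError("ratio must be at least 1")

    def record_response(self) -> bool:
        """Log one occurrence of the target behavior; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0  # start counting toward the next reinforcer
            return True
        return False


if __name__ == "__main__":
    crf = FixedRatioSchedule(ratio=1)
    fr3 = FixedRatioSchedule(ratio=3)
    print([crf.record_response() for _ in range(6)])  # [True, True, True, True, True, True]
    print([fr3.record_response() for _ in range(6)])  # [False, False, True, False, False, True]
```

Under FR 3:1 the reinforcer arrives on the 3rd, 6th, 9th response and so on, which is exactly the predictability that Variable Schedules remove.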
Variable Schedules of reinforcement are the ultimate goal of any intervention. Like the Fixed Schedules, they come in both Ratio and Interval form. A Variable Ratio Schedule of 3:1 means that, on average, the person is reinforced for every 3 demonstrations of the target behavior. As far as the individual is concerned, reinforcements are administered on an apparently random basis. Variable Schedules produce the highest rates of responding and the most resistance to extinction of any of the Reinforcement Schedules. Examples of the Variable Schedules are lotteries (Ratio) and hunting or fishing (Interval). Most of us have most of our social behaviors reinforced on a Variable Schedule (think of how often you are complimented!). It can be said that these Schedules induce a kind of paranoia in the individual, who never knows when the next reinforcement is coming. The reality is that the Schedule has to be carefully set in advance to ensure that enough reinforcement comes often enough to avoid a phenomenon called Ratio Strain. This happens when the Schedule of Reinforcement is set too high and the individual "gives up" before the next reinforcement becomes available. An example would be a person who has been paid every two weeks, whose employer decides that paychecks will only be issued once a year. For most people, this would constitute Ratio Strain. On the other hand, if the employer gradually moved toward a "once a year" pay schedule, most people would be able to adapt. Moving through CRF, Fixed, and Variable Schedules in a gradual manner, based on the individual's abilities, serves to reduce the likelihood of Ratio Strain.
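A Variable Ratio schedule changes only how the next requirement is chosen. Again, this is a hypothetical Python sketch rather than anything prescribed in the article: it assumes that the number of responses needed for each reinforcer is drawn at random, uniformly between 1 and twice the mean minus 1, so a VR 3:1 asks for anywhere from 1 to 5 responses and averages 3. The learner cannot predict the next payoff, which is the property the text describes.

```python
import random
from typing import Optional


class VariableRatioSchedule:
    """Hypothetical Variable Ratio (VR) schedule.

    Reinforces after an unpredictable number of responses that averages
    `mean_ratio`. Assumption: each requirement is drawn uniformly from
    1 to 2 * mean_ratio - 1, whose average is exactly mean_ratio.
    """

    def __init__(self, mean_ratio: int, seed: Optional[int] = None) -> None:
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = random.Random(seed)  # seeded generator for repeatable runs
        self.count = 0  # responses since the last reinforcer
        self.required = self._next_requirement()

    def _next_requirement(self) -> int:
        # randint is inclusive on both ends: 1..(2 * mean_ratio - 1)
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def record_response(self) -> bool:
        """Log one occurrence of the target behavior; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()  # next target is unpredictable
            return True
        return False


if __name__ == "__main__":
    vr3 = VariableRatioSchedule(mean_ratio=3, seed=2001)
    responses = 30_000
    reinforcers = sum(vr3.record_response() for _ in range(responses))
    # Over many responses the average comes out close to 3 responses per reinforcer.
    print(f"{responses / reinforcers:.2f} responses per reinforcer")
```

In these terms, thinning a schedule gradually to avoid Ratio Strain would mean raising `mean_ratio` in small steps rather than jumping straight to a large value.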
Superstitious Behavior: (also referred to as "Extinction Burst")
This term refers to the "burst" of behavior that happens when we, as support providers, start a new program. Invariably, the individual responds to the change by increasing the frequency of the target behavior(s) for a short period of time. If you think about it, this makes perfect sense. Imagine what your behavior would be like if the front door to your home were "re-hinged" while you were away for a little while. Now, instead of opening "in", it opens "out". It's probably safe to bet that the first few times that you use the door, you are going to end up "pushing" on it a couple of times before you remember that the "rules" have changed and pull instead. Superstitious behavior, then, is like the person "pushing" as he/she learns what the new rules are.
Consistency:
This is an essential requirement for any program, and it is critical to Differential Reinforcement. It refers to the need to provide, as much as is humanly possible, the same response to the person that we want to teach. The closer that we can get to 100% consistency, the faster the teaching/learning process. Consistency might also be seen as synonymous with structure.
Communication:
Communication is obviously at least a two-party process. One party has the intent of expressing something. The second party, by definition (or default!), must listen. There is almost an implied contract in the communication process that goes something like this: "I'll listen to you if you agree to listen to me." Invariably, the reason for communicating is to achieve a task or a goal that requires action by others. That action may be as simple as acknowledgment of what was communicated (I'm going to the store, bye!) or so complex as to require the cooperation of the person being communicated to (Can you help me lift this piano?). The nature of the task and the relationship that we have with the people that we communicate with help us to decide what communication style to use. In a very broad sense, there are five different styles, which allow for differing levels of two-way communication. They are:
- Instruct: state clearly what needs to be done, together with any further information that is essential to complete the task.
- Inform: be clear about what needs to be done, but also provide further information, such as why it needs to be done, what it is helping to achieve, etc.
- Consult: ask for views which may or may not influence the final outcome.
- Involve: ask for views which will influence the final outcome.
- Empower: provide a clear framework and the license for others to do what they think is best.
These categories are not mutually exclusive. You need to inform before you can involve. They are not intended to be seen as right or wrong. Each has its own value, depending on the purpose of the communication. For example, a fire alarm would fit into the Instruct category. Clearly, it would not be appropriate to respond to a fire alarm at the Involve level, with a flip chart and an agenda! In this field, the style that you use will be a function of the skills and abilities of the people that you are supporting and the purpose of your communication. We run into problems, however, when those that we support communicate with us. Invariably, problems occur because they use the Instruct style, either verbally or non-verbally: "This is what I want, I want it now, you get it for me or get out of my way!"
Now let's apply what we know to the most common cause of inappropriate behaviors: attention seeking. A typical formula in Learning Theory includes an Antecedent (A), a resulting Behavior (B), and the Consequence (C) that follows. The formula is often called an ABC. In an attention seeking situation, we see the following:
| A | B | C |
|---|---|---|
| Need for attention | The behavior | Staff attention |
The staff attention that follows the behavior is invariably a positive reinforcement for the individual. How do we know? Remember that a reinforcer is something, anything, that increases the frequency of the behavior that it is paired with. It is logical, then, to conclude that attending to an inappropriate behavior can and does reinforce that behavior.
One of the obstacles that need to be overcome in attention seeking situations is the impact of the whole process on the staff involved. From the staff perspective, the behavioral formula above looks like this:
| A | B | C |
|---|---|---|
| The behavior | Attend to the behavior | Peace and quiet |
Remember that escaping or avoiding something that is uncomfortable (aversive) is what Negative Reinforcement is all about. What we have, then, is a situation where both the attention seeking person and the staff involved are being reinforced by the process, and the staff are being Negatively Reinforced, which has a stronger and longer-lasting impact!
We can get more clues as to what is happening in the attention seeking dynamic if we look at baseline information in a different way. Typically, staff begin to document attention seeking behavior as it occurs over time, thus developing a baseline of how often it is happening. To optimize the use of baseline data, we need to remember that, according to the theory, behaviors only occur because they are Reinforced. With that in mind, the baseline tells us that, aside from all of the reinforcement that we are providing through contact and programs, and aside from the reinforcement available in the environment, the individual is telling us that at least x more is required (where x is the baseline frequency).
When we decide that the behaviors that we see are "inappropriate" and that we want to assist the person to stop what they are doing (or at least decrease its frequency), what we are really saying is that we want the person to "give up" reinforcement. In the absence of a solid understanding of the reasons for what we are asking, and that is generally not going to happen, there is not likely to be any motivation to participate. An example of this scenario might be if you decided to drop the temperature in my home from 68 degrees to 64 degrees during the winter to save money. Heat to someone who is cold is a primary reinforcer. If I don't clearly understand, and agree with, your plan and all of its potential benefits, I'm very likely to react negatively. On the other hand, if you go out of your way to offer alternative sources of heat (read: reinforcement, and Inform!) such as a big fire in the fireplace, nice warm sweaters and slippers, and perhaps the occasional hot chocolate, my motivation to change and accept your plans is likely to increase significantly!
The same is true when we begin a Differential Reinforcement program. It is absolutely essential that a wide variety of reinforcement is made available for any and all behaviors that are appropriate and incompatible with those defined as the target behaviors. The reinforcement that is given need not be primary reinforcers (food, etc.), nor does it always need to be time consuming. Positive comments, short interactions, even the offer to interact by playing a game or helping with a task can be highly effective reinforcers. The message that you want to give is that there are other ways to get your attention, ways that you like and appreciate. How do you know how often you should offer reinforcement? Look to your baseline information. It's a clear communication from the person concerning how often they need reinforcement. If the inappropriate behaviors are happening 25 times a day, then you will need to offer more than 25 reinforcers for alternative behaviors. A good rule of thumb is "the more the better". Basically, offering more means that you are providing more teaching/learning opportunities, and the more of those that you have, the quicker the process.
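The baseline reasoning in this section is simple arithmetic, sketched below in Python. The record format, the function names, and the behavior label "calling out" are hypothetical, and the sketch assumes staff know how many days they observed. It turns an ABC log into an average daily frequency and applies the "more than the baseline" rule of thumb, so 25 incidents a day means planning at least 26 reinforcers for alternative behaviors. That number is a floor, not a target; the advice is still "the more the better".

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Sequence


@dataclass(frozen=True)
class ABCRecord:
    """Hypothetical Antecedent-Behavior-Consequence observation, as staff might log it."""

    day: date
    antecedent: str
    behavior: str
    consequence: str


def daily_baseline(records: Sequence[ABCRecord], behavior: str, observed_days: int) -> float:
    """Average number of times `behavior` was logged per observed day.

    observed_days is passed in explicitly so that days with zero incidents still count.
    """
    if observed_days < 1:
        raise ValueError("observed_days must be at least 1")
    count = sum(1 for r in records if r.behavior == behavior)
    return count / observed_days


def min_reinforcers_per_day(baseline: float) -> int:
    """Rule of thumb from the text: offer MORE reinforcers than the baseline frequency."""
    return math.floor(baseline) + 1  # smallest whole number strictly above the baseline


if __name__ == "__main__":
    # Hypothetical two-day log of an attention seeking behavior.
    incident = ("Need for attention", "calling out", "Staff attention")
    log = [ABCRecord(date(2001, 3, 1), *incident) for _ in range(24)]
    log += [ABCRecord(date(2001, 3, 2), *incident) for _ in range(26)]

    baseline = daily_baseline(log, "calling out", observed_days=2)
    print(baseline)  # 25.0
    print(min_reinforcers_per_day(baseline))  # 26
```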
The second component of Differential Reinforcement involves ignoring the behaviors that you do not want to see. Once again, remember that all behaviors occur because they are reinforced; therefore we can have an impact on their frequency by not providing the attention that we used to provide (read: Extinction!). The simplest way to ignore a behavior is to turn and walk away. It is also possible to ignore a behavior by not responding to what has happened at all. Instead, you behave as if nothing has happened and introduce a new subject or activity. In other words, you ignore what has happened and redirect the person to something else. Activities that are chosen for redirection should not be things that the person dislikes. They should be presented as opportunities to interact with you, not as directives or compliance episodes. Not all behaviors can be "ignored", especially if they involve danger to someone. In these cases, you can intervene as much as is necessary to ensure the safety of those involved, but do so without comment and with a neutral expression. Remember that the person that you are working with is expecting specific types of responses from you, and those responses are his/her reinforcement. If you provide something that he/she has not experienced before, it is unlikely to meet the requirements of a reinforcer.
A major problem with the use of Differential Reinforcement is that staff members tend to focus on the "ignore" portion of the technique. It is not uncommon to see reports that detail how an individual "acted out", was ignored, continued to "act out", and was essentially ignored for a whole shift. When this dynamic develops, it is very hard on both the staff and the person being supported. Functionally, what has happened is that the behavior is being responded to (by ignoring it), but the communication from the person goes unnoticed. The communication is a clear message: "I need attention!" If you respond to the behavior but not the communication, you will not be successful.
In any situation that
involves attention seeking, it is your reactions that are responsible
for maintaining (reinforcing) the behavior. Unless you change
the way that you do things, there will be little reason for the
person that you are supporting to change their behavior.
Tim Frawley, M.A.
Behavior Consultant
Copyright Tim Frawley 2001