Experimentation in Training: Differential Reinforcement, Shaping and Errorless Protocols

Heads up: This is a long one, but there is a cute video at the end if you just want to skip to there!

I am at the stage where I’ve read a lot of dog training theory, and understand most of what I’ve read in an isolated, academic way. In the real world though, where everything is interconnected and every dog is different, I have been having issues connecting that sanitized academic understanding to reality. How do I apply this new knowledge?

How do I choose a method for this behaviour for this dog at this time in this place? How does a particular approach work with the temperament of dog I have? How does that change with different types of dogs? What approaches can I combine, and which are not really compatible?

Yup, I have reached analysis paralysis.

I spend so much time agonizing over what the ‘BEST’ approach is that I never get started. I quickly came to the conclusion that the solution to this is experimentation. I just have to get out and start trying things with my dog and see what happens. Right now I’m in a training phase where I am getting great enjoyment out of experimentation.

I am incredibly lucky that Perrin is also happy with this plan, as my changing approaches would unsettle many dogs. It also helps that we don’t have any super important competition goals in the near future, so I am not too worried about breaking behaviours without having the time to fix them. This is training for the fun of training and for learning more about how this works; there are no behaviour goals attached to this right now.

My most recent experiments have been with shaping, differential reinforcement, and errorless learning: how to design training plans to use them, when to use them, how to execute them and how my dog reacts to them. This has brought up a great many questions, few of which I know the answer to at the moment. I wanted to record those questions and musings for the future, as it will be interesting to come back and see them a year or two down the road. I am excited to keep working at it and see what I come up with, but for now, here are my musings on the matter as of April 17, 2017:

I will start with some definitions. These summaries are my understanding of each method in my own words (which may or may not be actually technically correct, and will likely change over time as I learn more), and are as follows:

Differential Reinforcement: The use of two different value rewards in a training session to mark more or less desirable behaviours. In the definition I have been working with lately, this includes jackpots, but whether that truly fits here is debatable. I still don’t have really good definitions for things, hence the experimentation.

Errorless Protocols: A training plan design where I have not used error as a learning tool for the dog, and I have thoughtfully set up the environment or progressions to maximize success.

Shaping: A dog-led activity where he offers behaviours and I reward successive approximations of the target behaviour until the final result is achieved. Sort of like a game of hot-and-cold, except there is no ‘cooler’ or ‘hotter’; only yes or no. The only information is ‘I liked that one’. This inherently splits behaviours into what is ‘right’ (dog receives reinforcement) and what is ‘wrong’ (no reinforcement).

When I execute shaping poorly and wait for too large a next step, Perrin gets frustrated when the expected reinforcement is withheld. I have been getting better at avoiding this frustration by splitting behaviours more finely, keeping sessions short, avoiding the ‘just one more’ syndrome, and particularly by being generous with my criteria. Whether this last item affects the quality of the training I do not know, but I have observed that it GREATLY reduces frustration behaviours, with the pace of learning remaining the same, so I am sticking with it for now. I have been criticized for this approach in the past, but I guess I would rather have Perrin get some extra cookies and training take longer than have him just get frustrated and quit. If he quits and doesn’t want to play the game with me, then training would take much longer in the long run. I was vindicated on this subject recently when I listened to Episode 9 of Hannah Branigan’s podcast, a conversation with Amy Cook. They discuss simply slowly dropping the least close approximation (behaviours fall on a continuum, where some are closer to the target behaviour than others) rather than having one very stringent, ‘right’ answer. They explain it much better than I could around the halfway mark!

While that was a bit of a tangent, the element of frustration in shaping is what started me on this. Much of it is caused by poor training procedures on my part (and that is getting much better with thought and time), but I always love an excuse to try something new and see how it works.

I have been mixing shaping and differential reinforcement for a while now, in the form of jackpots. For his ‘closest’ attempt to the target behaviour, or for breakthroughs, Perrin would receive a handful of cookies. For approximations that are still right, but farther from the target behaviour than his ‘best attempt’, he gets one cookie. This has been an effort to establish a way to communicate “hotter” and “cooler” in the shaping process; to add more resolution to my communications in order to make it easier for Perrin to understand. I have read a bit about how the jury is still out on the science of the effectiveness of jackpots, but it seems to work pretty well for Perrin and me. Maybe the ‘low value’ cookies were just keeping him engaged and in the game rather than actually giving him more information to process and use to make better decisions. If that is the case, I think it still worked well enough for me to keep playing with, even if it didn’t work the way that I thought. I can always stop using it if I find out differently later on.

I read about errorless protocols a few months ago, and have been playing with them ever since (although I unknowingly used one to house train Perrin). I thought that adding this approach to my toolbox would be a nice change of pace for both Perrin and me at a time when I was frequently frustrating him during shaping. I was still working on fixing that problem, but in the meantime I wanted to give Perrin a frustration-free option (and I am always trying to learn new things, so that ever-present motivation was in play too). The first training plan I built this way was for adding distance to position changes. I started with a foot target as I added distance in tiny amounts, then started the distance over again without the target. I have also been working on recalls with an errorless mindset and have been progressing really well. The rate of reinforcement for both parties (cookies for Perrin, and success for me) was quite high and it kept both of us engaged and eager to work. I have found that being in the mindset of using errorless protocols makes it extremely clear to me that it is MY responsibility to set Perrin up for success (as it always is). I am much more careful to set up good antecedents, to be mindful of Perrin’s frame of mind and his ability to work at that moment, and to not over-face him with environmental stimuli. That is something that I really need to take away and do better at in all of our training.

The differential reinforcement came into the picture with errorless learning when I was screwing up on designing errorless protocols. I would be in the middle of a training session where I was trying new errorless protocols, and things would not go as expected. To stave off the frustration I was trying to avoid, I started using some lower value rewards to fill the holes in my errorless plan. Good training? Nope. But it got me thinking…

I have a cursory understanding of all three methods, but even basic experimentation has given me so many more questions!

  • Why and where would I use one versus the other? With this dog? With others?
  • Where to use errorless protocols over shaping or vice versa, and what effect does each have on my learner?
  • How do the three interplay?
  • How can I combine them to maximize the benefits and minimize the downsides?
  • On what points (theoretical or mechanical) are they not compatible?
  • How do I define for myself how errorless protocols are different from differential reinforcement protocols where the dog always receives some sort of reinforcement, but they are of different value?
    • The differences there can be nuanced. The differences in value of your chosen reinforcers would dictate how different from one another they are. A very low value treat versus a very high value one might be more akin to shaping? Two closely valued reinforcers would be closer to ‘errorless’? Personal play vs a cookie for Perrin would yield vastly different results than a cookie vs cooked chicken, and still different results from a tug versus chicken. Depending on the environment we are in and the training item we are working on those results would all change yet again. As always, choosing an appropriate reinforcer for the activity matters.
  • Is ‘errorless’ defined by the intention to design a training plan that does not use errors as a learning tool, or is it defined by a situation in which the learner is not aware of ‘errors’ because they still receive reinforcement, and therefore avoid frustration? For some dogs, is a vast disparity in reinforcer value (or type) going to cause the same kind of frustration, if the plan is applied poorly, as a poorly executed shaping session?
    • For Perrin it seems not to matter as long as both reinforcers I have chosen are things that he ACTUALLY finds reinforcing. I lose frustration, gain enthusiasm and resiliency to my mistakes, and I don’t sacrifice much learning speed by using kibble and chicken. However, if I try to use a tug (low value) and food (high value), he just gets frustrated, as he never wants a toy when there is food available as an option.

I am starting to see this as a sort of continuum with errorless protocols on one end and shaping on the other; differential reinforcement could be anywhere between those points depending on how you design your training plan and choose your reinforcers and how those variables affect your dog and their learning. Naturally, as soon as I start thinking that way, I come across interesting things about micro shaping and it being damn near errorless. I need more info, but I know that I definitely do not have the skills to pull that off. I will do more research on that subject and come back to the drawing board on this one. And I love the fact that I have to go back and re-think everything.

I am usually an outcome-driven person. I knit because I like mittens. I cook because I want the food. As such, I am continually surprised by how much I love the process of dog training in a way that I have never enjoyed any process before. I love learning about all of these different methods and approaches, thinking through how they work, how they may interact with other methods, how my dog may react to them, how dogs with different temperaments may react to them, how they gel with my abilities and strengths. It is all a big, open-ended problem to solve; I love complex problems and dreaming up out-of-the-box solutions. This love may have led me to study engineering, but I am finding that I can apply this love and the skills I learned in school to dog training better than I can in an industrial setting.

Fittingly enough, I listened to a podcast today while out on a walk in the woods with Perrin that I identified with greatly. It is about passion and success and how most successful passions are not a lightning bolt moment. It really struck me, and I think that others may find it very much worth the time to listen to. It can be found here if you want to hear it!

That was a rambling monster of a post! So here is a video of Perrin having the zoomies in the snow:
