Operant Boxes & Schedules of Reinforcement

In this video I describe the operant boxes used by Skinner (often called “Skinner boxes”) to study the relationship between different schedules of reinforcement and behavior. Then I describe 4 possible schedules of reinforcement including fixed-ratio, variable-ratio, fixed-interval, and variable-interval, as well as how random rewards created “superstitious” behaviors in pigeons.

Don’t forget to subscribe to the channel to see future videos! Have questions or topics you’d like to see covered in a future video? Let me know by commenting or sending me an email!

Check out my psychology guide: Master Introductory Psychology, a low-priced alternative to a traditional textbook: http://amzn.to/2eTqm5s

Video with Skinner showing operant boxes:
https://www.youtube.com/watch?v=I_ctJqjlrHA&t=77s

Video Transcript:

Hi, I’m Michael Corayer and this is Psych Exam Review. In this video I want to go into a little more detail on B. F. Skinner’s work on operant conditioning. First we’ll look at a device that Skinner created called an operant box, although other people have referred to it as a “Skinner box”. An operant box was a box that allowed Skinner to track behavior over time.

Inside this box we’ll have some animal, so here’s a rat inside of this operant box. The box contains a section where food can be dropped into the box. So we have a rat in here who’s hungry, and we know he’s hungry because the animals that Skinner used, the rats and the pigeons, were kept a little bit underweight, at about 3/4 of their normal weight.

This would ensure that they’re always hungry, therefore they’re always motivated to learn. We’ll come back to this idea when we talk about motivation later.

So we have this food dispenser here and so the food can be dropped into this little tray here so the rat can eat and this happens when the rat does some specified behavior.

We can have some lever here. Let’s say every time the rat presses on this lever then food will be dropped in here and he’ll get the reward, he’ll get the reinforcement. Or it could be a disc that a pigeon pecks at. I’ll put a link in the video description where you can see a video of Skinner working with some pigeons and you can see these operant boxes in action.

The other part of the box was connected to a device over here which was essentially like a pen and a sheet of paper. What would happen is the pen would move, over time, to the right so that would tell us about time passing and then each time the animal did the behavior like press the lever or peck at the disc, the pen would click up. That would tell us that a behavior had occurred.

Over time the pen would be drawing a line on the paper that would tell us the total number of lever presses over some certain amount of time. This allowed Skinner to keep track of different schedules of reinforcement and how they influenced behavior. So he could compare something like continuous reinforcement, where he reinforces the animal every time that the lever is pressed, with some other schedule that would involve intermittent reinforcement. That refers to only rewarding the animal at certain times, for a certain number of lever presses or over a certain period of time.

Now we can look at some of the more detailed schedules of reinforcement that Skinner used to keep track of how they influenced the animal’s behavior.

Let’s look at some of these different ways he could provide intermittent reinforcement. OK, the first thing he could do was change the ratio between how many times you have to press the lever and how much food you get. So you have to press the lever 5 times to get one food pellet, or you have to press the lever 3 times.

This would be a fixed ratio schedule. That means there’s some ratio between lever presses and rewards. So you have some X number of lever presses that’s going to get you one food pellet. It could be five lever presses or it could be one lever press. Continuous reinforcement would be one lever press gets one food pellet every time. The important point is that it’s fixed. It’s predictable, the animal knows “if I press five times, I get a food pellet”.

A human example for this would be something like a vending machine. It’s supposed to be predictable. If you put in a dollar and you press this button then you get a soda. It works that way every time. So that’s an example of a fixed-ratio schedule.
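To make the fixed-ratio rule concrete, here is a minimal Python sketch. It is only an illustration of the rule described above, not anything from Skinner's equipment; the name `fixed_ratio` and the choice of 5 presses per pellet are assumptions made for the example.

```python
def fixed_ratio(ratio):
    """Build a lever for a fixed-ratio schedule (hypothetical helper).

    Every `ratio`-th press delivers one food pellet.
    """
    presses_since_reward = 0

    def press():
        nonlocal presses_since_reward
        presses_since_reward += 1
        if presses_since_reward == ratio:
            presses_since_reward = 0  # start counting again after a pellet
            return True               # pellet delivered
        return False                  # no pellet on this press

    return press


# Assumed example: 5 presses per pellet, rat presses 10 times.
press = fixed_ratio(5)
outcomes = [press() for _ in range(10)]
print(outcomes)
```

With a ratio of 1 the same function behaves like continuous reinforcement, where every press is rewarded.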

In contrast, we could have a variable ratio schedule. So in a variable ratio we don’t know how many presses will lead to the food pellet. It’s unpredictable and it’s variable, meaning it’s always changing. That’s what makes it unpredictable. Sometimes you press three times and you get a food pellet. Then you press 10 times before you get a food pellet and then you press once and you get a food pellet. So it’s a ratio that’s always changing. So it’s unpredictable. A human example for this would be a slot machine.

You don’t know how many times you have to pull the lever to get the reward. And it changes, when you win you could win again immediately or you could play 20 more times before you win, right? So this is going to change the rate of behavior that you’re going to have compared to say a vending machine.
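A variable-ratio schedule can be sketched the same way, with the required number of presses redrawn after every reward. This is a rough illustration under stated assumptions: the name `variable_ratio`, the average of 5 presses, and the choice to draw each requirement uniformly between 1 and 9 are all made up for the example, and real experiments may have used other ways of varying the ratio.

```python
import random


def variable_ratio(mean_ratio, rng):
    """Build a lever for a variable-ratio schedule (hypothetical helper).

    The number of presses needed for each pellet is redrawn after every
    reward, uniformly from 1 to 2 * mean_ratio - 1 (an assumed rule that
    averages out to mean_ratio).
    """
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0

    def press():
        nonlocal required, count
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
            return True   # pellet delivered
        return False      # no pellet on this press

    return press


rng = random.Random(0)  # seeded only so the run is repeatable
press = variable_ratio(5, rng)
outcomes = [press() for _ in range(1000)]

# How many presses each pellet took.
gaps = []
run = 0
for rewarded in outcomes:
    run += 1
    if rewarded:
        gaps.append(run)
        run = 0

print(gaps[:10])
```

The printed gaps jump around from reward to reward, which is the unpredictability that sets the slot machine apart from the vending machine.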

Now the other thing that Skinner could change rather than changing the ratio between how many lever presses get food, he could change the time interval.

So the next two schedules will involve changing the time interval. One thing he did do was have a fixed interval. In a fixed interval schedule, this means that X amount of time allows you to get one food pellet. In other words, a food pellet is only available once every 2 minutes. You still have to do the behavior, you still have to press the lever in order to get it. But once you get it, you have to wait until another 2 minutes has passed before you can get the next one. So we have some set amount of time that’s fixed and you’re allowed to get a certain amount of reward during that certain interval of time.

What would be a human example of this? Well, you have a regular paycheck. If you show up and do your job, if you press the lever, then every two weeks you get a paycheck. Or every month or something like that. It’s not the case that if you work really hard you get two paychecks two days in a row. It doesn’t work that way. You have to wait, you can only get one paycheck every two weeks. But you still also have to do your job, right? Just like the animal here still has to press the lever every 2 minutes or something to get that food pellet. Pressing it more isn’t going to get more food pellets until another 2 minutes has passed. So that’s a fixed-interval pattern.
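Here is a small Python sketch of the fixed-interval rule, using the 2-minute interval from the example. It is an illustration only; the name `fixed_interval`, the press times, and the assumption that the clock starts at the beginning of the session are all choices made for the example.

```python
def fixed_interval(interval):
    """Build a lever for a fixed-interval schedule (hypothetical helper).

    A press is rewarded only if at least `interval` seconds have passed
    since the last pellet (or since the session started, by assumption).
    """
    last_reward = 0.0

    def press(now):
        nonlocal last_reward
        if now - last_reward >= interval:
            last_reward = now
            return True   # pellet delivered
        return False      # too early, pressing does nothing

    return press


# Assumed example: pellet available once every 120 seconds.
press = fixed_interval(120)
press_times = [30, 60, 119, 120, 125, 130, 239, 240, 245]
rewarded = [t for t in press_times if press(t)]
print(rewarded)
```

Only the presses at 120 and 240 seconds pay off; the extra presses in between earn nothing, just like working harder doesn't bring the paycheck any sooner.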

Lastly, we have, you can probably guess, a variable interval. So in this case some unknown amount of time, which is variable and always changing, is associated with getting the food pellet. Here you don’t know how much time passes before the next one is available.

Again, you still have to press the lever to get it but you don’t know how long to wait before the next will be available. So what’s this going to do to your behavior? What’s a human example of this? A human example would be a pop quiz. Let’s say that in a certain class you always had pop quizzes for your tests and you never knew when they were coming. You could have two days in a row where you have a quiz or it could be 2 weeks in between them and it changes every time.

How would this influence your behavior? You’d probably study a little bit every day to try to be prepared for this pop quiz that may or may not happen. So here the unpredictable part is how much time will pass before the next one. OK so now that we have a general idea of these four different patterns we can think about, what would the pen connected to the operant box draw? What would that line look like for each of these different schedules? This is what Skinner looked at.
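Going back to the variable-interval schedule for a moment, it can be sketched in Python as a waiting time that is redrawn after every reward. This is only a rough illustration: the name `variable_interval`, the 60-second average, the uniform draw between 0 and twice the average, and the animal that checks the lever once per second are all assumptions made for the example.

```python
import random


def variable_interval(mean_interval, rng):
    """Build a lever for a variable-interval schedule (hypothetical helper).

    After each pellet, the wait until the next one becomes available is
    redrawn uniformly from 0 to 2 * mean_interval seconds (an assumed rule).
    """
    next_available = rng.uniform(0, 2 * mean_interval)

    def press(now):
        nonlocal next_available
        if now >= next_available:
            next_available = now + rng.uniform(0, 2 * mean_interval)
            return True   # pellet delivered
        return False      # not available yet

    return press


rng = random.Random(1)  # seeded only so the run is repeatable
press = variable_interval(60, rng)

# Assumed animal: checks the lever once every second for 10,000 seconds.
reward_times = [t for t in range(10_000) if press(t)]
waits = [b - a for a, b in zip(reward_times, reward_times[1:])]
print(waits[:10])
```

The waits between pellets keep changing, so the only way to collect them is to keep checking at a steady pace.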

If we imagine one of these charts here we have time over here and then this is going to show the cumulative responses, the total number of behaviors. Each time the lever is pressed the pen clicks up and so it’s adding up over time the total number of behaviors.
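One rough way to picture this chart is as a running count sampled over time. The Python sketch below is only an illustration and not anything Skinner used; the function name `cumulative_record`, the one-second sampling `step`, and the press times are all made up for the example.

```python
def cumulative_record(response_times, duration, step=1.0):
    """Return (time, cumulative_responses) points, like the pen trace.

    response_times: times in seconds at which the animal responded.
    duration: total session length in seconds.
    step: how often we sample the pen position (an assumed choice).
    """
    times = sorted(response_times)
    points = []
    count = 0
    i = 0
    for k in range(int(duration // step) + 1):
        t = k * step
        # The pen has clicked up once for every response made up to time t.
        while i < len(times) and times[i] <= t:
            count += 1
            i += 1
        points.append((t, count))
    return points


# Hypothetical session: lever presses at these seconds.
record = cumulative_record([0.5, 1.2, 1.8, 4.0], duration=5)
for t, total in record:
    print(t, total)
```

The count never goes down, so a flat stretch means the animal was resting and a steep stretch means it was responding quickly.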

Let’s look through each of these schedules and see what their lines would look like. If we did this fixed ratio every certain number of lever presses will get a food pellet. Let’s imagine we’re looking at our rat here he’s gonna press it, let’s say it’s five times, so he’s going to quickly press it five times and then he stops and eats the food pellet and then he’s going to press it five more times then he eats his food pellet, then he presses it five more times.

So we’re going to get a line looking something like this. We get five behaviors in a row then eat the food pellet, five in a row, five in a row like this.

Now we can compare that to our variable ratio. So what’s going to happen in a variable ratio? Well he’s going to press it rapidly because he doesn’t know which press is going to deliver the reward and it’s always changing. So he’s going to press it rapidly. Let’s say he presses it 10 times then he gets a reward.

Then he presses it three times and he gets a reward but then he has to press it 15 more times. So he’s going to do that very rapidly. He’s just going to sit there pressing pressing pressing, hoping the next press is going to be the one that delivers the reward.

So in the variable ratio schedule you get a very rapid response. The animal will sit there press press press, hoping to get to that reward. It’s kinda similar to what you see with someone playing a slot machine. You never see anyone sitting at a vending machine pouring money in and pressing buttons but in slot machines you see that all the time. Very rapid, the idea is the next one could be the reward so you want to get to it as quickly as possible.

Another difference here is that this variable ratio schedule is very resistant to extinction. What happens if you turn off the food delivery? The animal’s going to keep pressing that lever rapidly because the animal doesn’t know that the food has stopped. It just thinks “now I need to press it 20 times or maybe it’s 21 or maybe it’s 22” and they’ll keep pressing the lever and it takes them a long time to learn that it doesn’t work anymore. In comparison, with a fixed-ratio schedule extinction occurs more rapidly.

You can remember this if you just think of the difference between a vending machine and a slot machine. If a vending machine breaks, it’s not delivering food, how many times are you going to play? How many times are you going to do the behavior before you stop? Put your money in, press the button, no soda comes out. Put your money in, press the button, no soda comes out. Probably after 2 or 3 you’re going to give up. But let’s say a slot machine is not actually ever going to pay off, how many times might you play it before you figure that out? You might play it 20 times and just think you’re having a streak of bad luck.

The variable ratio is going to be more resistant to extinction. Now let’s look at the time intervals. If we have a fixed interval, what’s that going to look like? Every certain amount of time you can get a food pellet, then you have to wait for that time before you can get another one.

So we press the button, press the lever, then we get a food pellet, and then there’s no point in pressing for a while. There’s going to be another minute or two before you say, let’s say the interval is every 2 minutes, “OK, I think it’s been about 2 minutes, I’m going to start pressing the lever again to make sure” and then “I just got it, now I don’t have to do anything for a while” and then “it’s been 2 minutes, press it again”.

So you’re going to get what’s called a scalloped pattern, where you have these rest periods followed by a rapid increase in behavior, followed by rest, followed by rapid increase, followed by rest, like this.

You can think about how this would work in the case of your studying if you have a fixed time interval between your exams. Let’s say you have a big exam every month, what’s your studying going to look like? Exam’s coming up, study study study! Ok, just had the exam, play video games, oh the exam’s coming up in another day, study study study, then take a break. You’re going to see this same sort of pattern.

What’s your studying going to look like if you never know when the test is coming? You could have two tests in a row or you could have tests two weeks apart. How are you going to study that way? You’re not going to cram every night. What you’re probably going to do is have a nice slow steady rate of response where you’re going to study a little bit every day. That’s what happens with this variable interval.

Imagine if it was paying off at an unpredictable time. You’d press the lever every once in a while to just check. Can I get some food now? No, alright, wait a little bit, try again, no, try again. You’re going to have this slow and steady rate of behavior and that’s what Skinner found in this variable interval schedule. Ok, I want to quickly describe one other thing that Skinner did where he just rewarded the animals at random. By this I mean it wasn’t the ratio that was changing, it wasn’t the time that was changing. It was random in that they didn’t even have to press any levers.

They didn’t have to do anything, he just dropped food into their cage at random times. It was unpredictable. You might think that the animal would just sit there and wait for the food to magically rain down from the sky. If you’re a pigeon sitting in there and food just drops in, you eat it, you don’t have to do anything, just wait around for the next piece of food to drop.

But that’s not what happened. What Skinner found was evidence for superstitious behavior, or what he said was superstitious behavior in these pigeons. What this meant was the pigeons would happen to be doing something before the reward came. They’d happen to be turning their head to the left and food drops in. They happen to turn their head to the left again and food drops in again. They think “I know what to do! Turn my head to the left and I’ll get food!” so they keep turning their head to the left, they end up spinning in circles because they thought that was what was causing the food to drop into the cage.

Or the pigeon next to him would be lifting his foot as the food dropped in and so he’d think “they must want me to keep lifting my foot” and food drops in again and so he’d sit there lifting his foot over and over again or flapping one of his wings or something. He’d end up with these superstitious behaviors that were purely chance but that the animals thought were being rewarded.

We can draw a human comparison: imagine you happen to wear a new pair of socks and you just so happen to hit a home run at your game that day and you think “these are my lucky baseball socks”. Then you happen to be wearing them again and you just so happen to hit a home run again. It’s this sort of random chance but you become convinced that these are your lucky socks. Skinner would say that’s because of this random nature of the reinforcement and this chance occurrence that it’s paired with something else like your lucky socks and you become convinced that that’s the reason for the outcome that you’re getting.

Ok so in the next video we’ll look in a little more detail at how we can use conditioning to get more complex behaviors. So this was operant boxes and these schedules of reinforcement. I hope you found this helpful; if so, please like the video and subscribe to the channel for more.

Thanks for watching!
