Adam Prescott

Variables, closures & scope

One fundamental entity in programming is a variable. You can’t work effectively without giving a name to something you’d like to keep track of. On the face of it, it can all be quite simple. But bring in closures, scopes, and bindings, and there are often puzzling things to understand. This piece deals with scope and closures in Ruby specifically, but much of it is likely to apply more generally.

First up: naming things.

Identifiers

When assigning a value to some variable, such as x = 1, the idea is that the identifier — some name — x should somehow map to 1. To be more precise, it’s a mapping to the object 1, since variables in Ruby are references to objects, not the objects themselves.

str_one = "hello"
str_two = str_one
str_one << " there"
str_two #=> "hello there"

Standard example showing how variables point to objects, really. So how does this identifier-to-object mapping take place? What if there’s more than one identifier in use?

def foo
  x = 1
end

x = 5  #=> 5
foo    #=> 1
x      #=> 5

We have a single local variable identifier in this code: x. But x clearly has different values depending on where the program is during execution. x in the last line is not 1, despite foo changing the value of one of the xs. Let’s strip it down.

Being a local variable, x is a reference to something. When you dereference x (in other words, when you replace the reference with the actual thing being referenced, the referent) you’ll get a value. Which value that is depends on where you are in the code at dereference time. In this program, sometimes it’s 1, sometimes it’s 5; the mapping from identifier to value isn’t unique, because it depends on the point of execution.

Obviously, there have to be rules governing how this all works together. If there are two objects pointed to by the same identifier in this program, x, there must be more information kept behind-the-scenes, to determine the return value. It’s this behind-the-scenes information which constitutes the environment.
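To make the idea of an environment a little more concrete, here is a small sketch which jumps ahead and uses Ruby’s Binding objects (covered properly further down) to ask two different environments about the same identifier. It assumes Ruby 2.1 or later for Binding#local_variable_get, and environment_inside_foo is a name made up for the illustration.

```ruby
# Hypothetical helper: returns the environment (a Binding) as it
# exists inside a method body where x is 1.
def environment_inside_foo
  x = 1
  binding
end

x = 5
top_level_environment = binding

# Same identifier, two environments, two different values.
inside  = environment_inside_foo.local_variable_get(:x)
outside = top_level_environment.local_variable_get(:x)

puts inside   # prints 1
puts outside  # prints 5
```

The identifier alone isn’t enough to get a value; it’s the identifier together with an environment that settles it.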

Closures

At its core, a closure is a function paired with an environment. Because of the intricacies of what an environment is, fully explaining closures can be a little involved when trying to cover all its aspects. Consequently, the code examples typically given to explain them can hide some details unless they’re explicitly made clear. For lack of a better starting point before looking at what an environment is, one of those detail-hiding examples is good enough, and we can keep to a vague description of “environment” as behind-the-scenes information relating identifiers to values.

def foo
  x = 1
  lambda { x }
end

x = 2

foo.call #=> 1

The { x } part of this code is an anonymous function. That is, it’s a block of code which isn’t associated with an identifier in the program. That foo.call returns 1 and not 2 is a hint as to what distinguishes an anonymous function from a closure, which in this case is the entire lambda { x } object.

If the last line had simply been x instead of foo.call, the return value would have been 2, because that’s the value of x which is “visible” in that final line of this bit of code. foo, though, is defined with reference to its own x, and foo.call returns 1, not 2.

It’s obvious then, that the lambda part of the method body of foo is important in some way. The lambda turns the anonymous function { x } into a closure by attaching an environment. Great. But what does that mean? We can see from the simple demonstration above that it’s significant for how it changes return values, but what exactly is the environment, and in what way does it distinguish a closure from some other plain block of code?
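A common way to see the attached environment doing real work is a counter. This is only a sketch, and make_counter is a name invented for the illustration; the point is that count stays alive and is updated between calls, even though the method which created it has long since returned.

```ruby
# Hypothetical helper: builds a lambda which closes over `count`.
def make_counter
  count = 0
  lambda { count += 1 }
end

first_counter  = make_counter
second_counter = make_counter

first_counter.call   #=> 1
first_counter.call   #=> 2
second_counter.call  #=> 1, each call to make_counter gets a fresh count
```

Nothing outside the lambda can name count any more, yet the lambda can still read and write it. That retained access is the environment part of the closure.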

Bindings and scope

Closures, as mentioned, are anonymous functions coupled with some environment. That environment is the “binding.” In Ruby, there’s a Binding class and you can get access to the current binding at any moment in your code, using Kernel#binding.

# binding at this line
binding                        #=> #<Binding:0x1005d2298>

def foo
  binding  # binding within this method
end

# the binding within foo
foo                            #=> #<Binding:0x100618298>

With this Binding object, knowing that it’s tightly coupled to the notion of an environment, we can play around.

As an initial attempt at nailing down what we’re dealing with here, it’s possible to think of a binding as, in some respect, a way to keep track of a set of values for variables.

def foo
  x
end

def bar(b)
  b.eval("x")
end

x = 1
current_binding = binding

foo                  #=> NameError, undefined local variable or method `x'
bar(current_binding) #=> 1

current_binding points to a Binding object, which keeps a hold of the association, for the top-level of this snippet of code, between the identifier x and the (object) value 1. Since foo opens a new level of scope, x has no associated value inside foo. Scope can be understood as a set of identifier-value associations which are visible within some hierarchy in the program. Any method definition in Ruby opens up its own scope, clearing out the identifiers currently visible within the method body.

By contrast, bar takes a Binding as an argument and looks up the value associated to the identifier x within that binding. This is the b.eval. The resulting value is 1 due to that, even though bar opens up a new level of scope. By passing around a Binding object explicitly, we can keep a hold of associations and use them in place of some other set of associations. That is what’s happening in bar.
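Evaluating strings isn’t the only way to query a binding. Assuming Ruby 2.1 or later, Binding also has local_variable_get, local_variable_set and local_variable_defined?, which do the same lookups without building code out of strings. A minimal sketch, where read_x, binding_with_x and binding_without_x are hypothetical helpers:

```ruby
# Hypothetical helper: looks x up in whatever binding it is handed,
# not in its own scope.
def read_x(b)
  b.local_variable_defined?(:x) ? b.local_variable_get(:x) : :no_x_here
end

# Hypothetical helper: a scope with an x in it.
def binding_with_x
  x = 1
  binding
end

# Hypothetical helper: a scope without an x.
def binding_without_x
  binding
end

read_x(binding_with_x)     #=> 1
read_x(binding_without_x)  #=> :no_x_here

b = binding_with_x
b.local_variable_set(:x, 10)
read_x(b)                  #=> 10
```

As with b.eval, read_x has no x of its own; everything it knows about x comes from the Binding it was given.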

In fact, using the local_variables method, we can take a look at the known local variable identifiers within a binding.

local_variables #=> []

x = 1
local_variables #=> [:x]

def new_scope
  local_variables # From the binding of `new_scope`
end

new_scope #=> []  x doesn't exist in new_scope

def scope_from_binding(b)
  { :from_methods_scope  => local_variables,
    :from_bindings_scope => b.eval("local_variables") }
end

scope_from_binding(binding) #=> { :from_methods_scope  => [],
                            #     :from_bindings_scope => [:x] }
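On more recent Rubies (2.2 onwards, as far as I know), Binding has its own local_variables method, so the string eval isn’t needed just to list identifiers. A sketch, wrapped in methods so that each scope is easy to see; the method names are made up for the illustration:

```ruby
# Hypothetical helper: a scope with two local variables.
def scope_with_two_locals
  first = 1
  second = 2
  binding
end

# Hypothetical helper: a scope with no local variables at all.
def empty_scope
  binding
end

scope_with_two_locals.local_variables  #=> [:first, :second]
empty_scope.local_variables            #=> []
```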

Just what is that association?

It’s at this point that you might be forgiven for thinking that a binding is just all about storing a snapshot of the mappings from variable names to values — mainly because I’ve been suggesting that it is. That actually isn’t strictly the case.

x = 1
b1 = binding.dup
b1.eval("x") #=> 1, no surprises there

x = 2
b1.eval("x")

What’s the return value here? We duplicated the Binding object, so if we look at b1 as an association identifier-x → 1, then the final return value should be 1, right? It’s actually 2, though.

There’s a bit more to think about. Perhaps some more experimenting?

x = 1
b1 = binding.dup
x = 2
b1.eval("x")     #=> 2, as before

b1.eval("x = 4")
b1.eval("x")     #=> 4
x                #=> 4

If you aren’t scratching your head at this point, congratulations, you get it. Here’s the important piece of information: b1 and the top-level binding are not straight mappings from identifiers to values. Instead, they can be thought of as mappings from identifiers to storage locations, rather than to the values at those locations. So, in essence, from identifiers to a place in memory.

Even though we duplicated the top-level binding into b1, the binding is a mapping identifier-x → x-storage-location, and the storage location is the same in both the original top-level binding and b1. Calling b1.eval("x = 4") does change the value kept in the memory location for x, and hence its value. But the result is the same as simply executing x = 4; the mappings are duplicated, but the storage locations are still the same.

This binding, then, as it’s now understood, is the environment. The local variables which are within the scope can be dereferenced at call time to their values by following identifier-x → x-storage-location → x-value, where (the important part) the identifier-x → x-storage-location step is the binding as determined at call time.

As Erann Gat said it,

The association between an identifier and a place to store values is called a binding.

and,

Sometimes the term "binding" is used to refer to the storage location itself rather than the association between the identifier and the storage location. This is not strictly correct, but rarely leads to confusion.
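One consequence of bindings pointing at storage locations is that two closures created in the same scope see each other’s writes. A small sketch, with make_cell being a name made up for the illustration:

```ruby
# Hypothetical helper: returns two lambdas which both close over `value`.
def make_cell
  value = 0
  getter = lambda { value }
  setter = lambda { |new_value| value = new_value }
  [getter, setter]
end

get, set = make_cell
get.call        #=> 0
set.call(42)
get.call        #=> 42, the getter sees the setter's write

other_get, _other_set = make_cell
other_get.call  #=> 0, a separate call means a separate storage location
```

If each lambda had been handed a copy of the value, the getter would have kept returning 0. Both go through the same identifier-to-location association instead.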

Back to those closures

We can see then that a closure is an anonymous function and a binding, which is not just a set of values, but an association between identifiers and storage locations, and reassigning variables for particular identifiers changes the values in the storage locations.

x = 1
l = lambda { x }
x = 2
l.call #=> 2, not 1!

Reassigning x to 2 after the lambda has been created modifies the value in the storage location for x, by going through the binding which exists across all of the first three lines.
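If the aim is for a closure to remember the value at creation time rather than follow later reassignments, one option is to create it inside a method, so that it closes over that method’s own variable instead. This is a sketch, and remember is a hypothetical helper name:

```ruby
# Hypothetical helper: `value` lives in this method call's scope, so it
# has its own storage location, separate from the caller's x.
def remember(value)
  lambda { value }
end

x = 1
snapshot = remember(x)
follower = lambda { x }
x = 2

snapshot.call #=> 1
follower.call #=> 2
```

It’s only reassignment which is isolated here. Mutating a shared object, as with << in the very first example, would still be visible through both lambdas, because both storage locations would refer to the same object.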

x = 1
@l = lambda { x }

def foo
  # opens a new scope to make the point about which
  # binding we're changing
  x = 2
  puts x
  puts @l.binding.eval("x")
  @l.binding.eval("x = 50")
  puts x
end

x   #=> 1

foo #=> 2
    #   1
    #   2

x   #=> 50

This is a more involved example. It demonstrates that, even though foo opens up a new scope, we have a hole into the outer scope through the binding of @l. That’s why, after calling foo, the top-level x is no longer 1, as it was initially, but 50, while the x inside foo is still 2.

A more concise example, without method definitions:

x = lambda { puts x }
x.call
y = x
x = 1
y.call

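Hopefully you can see what’s happening here now: the first call prints the lambda itself, because that’s what x refers to when the block runs, and the second prints 1, because x has since been reassigned while y still points at the same lambda.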

Bindings without a current reference

An important subtlety is that if a binding does not contain a reference to a storage location at the time it’s bound to a closure, then the closure will raise a NameError even if the binding subsequently has an association created between that same identifier and some value.

first_letter = lambda { a }
first_letter.call         #=> NameError
a = "a"
first_letter.call         #=> NameError

b = "b"
second_letter = lambda { b }
second_letter.call        #=> "b"
b = "not c"
second_letter.call        #=> "not c"

We can see this more directly by working with Binding objects.

b1 = binding.dup  #=> #<Binding:0x94cb3b0>
b2 = binding.dup  #=> #<Binding:0x94c70f8>
p = 5             #=> 5
b1.eval("p = 10") #=> 10
b2.eval("p = 15") #=> 15
b1.eval("p")      #=> 10
b2.eval("p")      #=> 15
p                 #=> 5

Here, p was not assigned before we duplicated the Binding instances. When the variable is assigned beforehand, the output is different.

q = 100           # assigning it here!

b1 = binding.dup  #=> #<Binding:0x94cb3b1>
b2 = binding.dup  #=> #<Binding:0x94c70f9>
q = 5             #=> 5
b1.eval("q = 10") #=> 10
b2.eval("q = 15") #=> 15
b1.eval("q")      #=> 15  different!
b2.eval("q")      #=> 15
q                 #=> 15  different!

As an aside, these two p and q examples only reflect the behaviour of Ruby 1.9. On 1.8.7, the return values are 5, 10, 15, 15, 15, 15, for both. I think 1.9’s behaviour, shown above, is arguably the better of the two, as b1 and b2 initially have nothing associated with p, so they both create independent associations as part of eval. That said, it’s possibly counter to the previous point that b1 and b2 are “the same” binding. It’s a question of what should happen here:

first = binding
second = binding
first.eval("unseen_var = 1")
second.eval("unseen_var")

Ruby 1.8.7 says 1. Ruby 1.9 says NameError.
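To check which behaviour a given Ruby has, Binding#local_variable_defined? (assuming Ruby 2.1 or later) gives an answer without having to rescue the NameError. A sketch, with probe being a made-up method name; later MRI versions should report the 1.9-style result, though that’s worth running rather than taking on trust:

```ruby
# Hypothetical helper: creates two bindings for the same scope, defines a
# new variable through the first, and asks each whether it knows about it.
def probe
  first = binding
  second = binding
  first.eval("unseen_var = 1")
  { :first  => first.local_variable_defined?(:unseen_var),
    :second => second.local_variable_defined?(:unseen_var) }
end

result = probe
puts result.inspect
```

A result of true for :first and false for :second corresponds to the NameError answer; true for both would correspond to the 1.8.7 answer.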

Summary of closures

Courtesy of Runpaint, this summarises things nicely:

"A closure is a combination of a function and an environment." The function is a parametrised block of executable code, and the "referencing environment", or binding, is a reference to the lexical environment of the closure’s creation site. The binding represents its variables as references, which are de-referenced in the environment the closure is called, every time it is called.

Scopes in Rubinius

One of the really cool things about the Rubinius implementation of Ruby is that it exposes, by requirement, a level of internals which you can’t find in MRI, including some internals with scopes. Because these internals are exposed in Ruby itself, you can play around with scopes as objects, using VariableScope, including getting access to the available local variables within that scope, with VariableScope.current.locals.