I have this modified code example I took from the wikipedia article on anonymous functions.
comp(threshold)
{
return lambda(x)
{
return x < threshold;
};
}
main()
{
a = comp(10);
lib::print( a(5) );
}
Anonymous functions shouldn’t be too hard to add to my scripting language. It should just be a case of adding the function code in the normal fashion, with the exception that the only way to access that function is through the variable to which it is assigned.
The problem with the closure example above is that the function body for the anonymous function references (in a general sense) a memory location that is invalid (or would be) at the time the closure is called.
I have two potential solutions to the problem in my mind already; I just want to get some recommendations first, before I attempt to add this feature to my language.
Since your question wasn’t very specific, it can only be answered with a general algorithm. I also make no claims that this is the “most sensible” way, simply one that works.
First, we need to define the problem space.
Your scripting language has local variables (using the Lua term): variables which are not global. These are variables which can theoretically be captured by a lambda.
Now, let’s assume your scripting language does not have a way to dynamically select local variables. This means that, simply by inspecting the syntax tree, one can see the following:
Given this information, local variables now are split into two groups: pure local variables, and captured local variables. I will call these “pure locals” and “captured locals”.
Pure locals are, for want of a better term, registers. When you compile to your byte code, pure locals are the simplest to handle. They’re specific stack indices, or they’re specific register names. However you’re doing your stack management, pure locals are assigned fixed locations within a particular scope. If you’re wielding the power of JIT, then these are going to become registers, or the closest thing to that possible.
The very first thing you need to understand about captured locals is this: they must be managed by your memory manager. They exist independently of the current call stack and scope, and therefore, they need to be free-standing objects which are referenced by functions that capture them. This allows multiple functions to capture the same local and therefore reference each others private data.
Therefore, when you enter a scope that contains captured lambdas, you will allocate a piece of memory that contains all of the captured locals that are part of that particular scope. For example:
The root scope of the
compfunction has two local variables. Both of them are captured. Therefore, the number of captured locals is 2 and the number of pure locals is zero.Your compiler (to byte code) will therefore allocate 0 registers/stack variables for pure locals, and it will allocate a free-standing object which contains two variables. Assuming you’re using garbage collection, you will need something to reference it in order for it to continue to live. That’s easy: you reference it in a register/stack location, one that is not directly accessible by the script. So really, you do allocate a register/stack variable, but the script can’t directly touch it.
Now, let’s look at what
lambdadoes. It creates a function. Again, we know that this function captures some variables outside of its scope. And we know which variables it captures. We see that it captures two variables, but we also see that these two variables come from the same free-standing block of memory.So what
lambdadoes is create a function object that has a reference to some bytecode and a reference to the variables it is associated with. The bytecode will use that reference to get its captured variables. Your compiler knows which variables are pure local to the function (like the argumentx) and which are externally captured locals (like threshold). So it can figure out how to access each variable.Now, when
lambdacompletes, it returns a function object. At this point, the captured variables are referenced by two things: the lambda function and the stack: the current scope of the function. Whenreturnfinishes however, the current scope is destroyed and all the things that were previously referenced are no longer referenced. So when it returns the function object, only the lambda function has the reference to the captured variables.This is all rather complicated though. A much simpler implementation would be to just make all local variables effectively captured; all local variables are captured locals. If you do this, then your compiler can be a lot simpler (and probably quicker). When a new scope is entered, all of the locals for that scope are allocated in a block of memory. When a function is created, it references all of the external scopes it uses (if any). And when a scope is exited, it removes a reference to the locals it allocated. If nobody else is referencing it, then the memory can be freed.
It’s very simple and straightforward.