klapaucius

A language so inspired by lisps, forths and array languages that it resembles none of these.

A combinator-based, concatenative, tacit lisp with automatic memory management, immutability, partial application, some church encodings, without variables or lambdas, where everything is a function.

repl src

Setting Out

This language emerged as a result of thinking about computing on stack machines. I wanted to spawn some sort of convergence of lisp-like and forth-like languages, I wanted to take the point-free style one gets in forths, and bring it into something with a structure based more upon s-expressions. It's a take on stack-based programming, but where we work from the other direction in a sense.

To demonstrate, let's break down something simple, like (+ 4 5). If one has some experience with other lisps, this code might appear pretty obvious, and even without that experience, if one were to look at a plus, a four, and a five arranged like so, one might guess that this program adds four and five, returning 9. And yes, that's exactly what happens, but understanding how we get there sheds light on how we can understand this language. It's already been stated that everything is a function, but it might be helpful to think of everything as a monadic function, a function that takes only one argument (which generally holds true, save for a particularly exciting and useful exception). But isn't the above-mentioned + a function that takes two values? Sort of! In this case, it is reasonable to think of + as a function that takes one argument n, and returns another function, which takes one argument m and returns n + m. Breaking this down a bit further reveals some aspects of the underlying system.

How might we think about + being a monadic function that returns a function? Well let's start by breaking our example into 'simpler' pieces. Let's start with just (+). If we type that into our repl and hit enter, we get _512 in return. What is this? You're right to ask. Things prefixed with an underscore like that are just how internal operations are printed, the underscore followed by a numeric value, here 512 corresponds to our plus. So let's call this function now. Type in (+ 4), and you get _512 in return. The same thing? What happened to the 4? I promise it's still in there, it's just printed like that to not overwhelm the output when dealing with more complicated matters. You can even get the four back out, doing getargs (+ 4) returns a list of arguments applied to any builtin operation (here, we'd get (4)).

So, if you're still with me, now we have a nice new function (+ 4). Let's call this function now. Give it ((+ 4) 5), and indeed you'll get a 9 in return! When it's parenthesized this way, I feel it's easier to understand that (+ 4) is itself a function, and can be applied to a number to get the sum of that number and four. So now is there a difference between ((+ 4) 5) and (+ 4 5)? Practically speaking they're the same. When the interpreter evaluates a statement like (a b c d), it takes a and b off the top of the list, appplies a to b, and sticks that (let's call it ab) to the beginning of the list and starts over again. So (a b c d) becomes (ab c d) which in turn becomes (abc d), which goes to (abcd), and now since there aren't any more arguments, it drops the parenthesis giving us abcd. This gives us a decent amount of leeway with the use of parenthesis in a way that's different from other lisps. The expressions

(+ 4 5)
((+ 4) 5)
(((+) 4) 5)
(((+ ((4)) 5)))

all evaluate to the same thing, in (practically) the same way. This allows us to use parenthesis for refactoring or to make code clearer. It also allows a nice trick in the repl: we can just drop the outmost parenthesis, so instead of any of those examples we can just write + 4 5! Doing this feels so nice to me, and since I haven't encountered it in other lisps before, I consider this to be the most significant idea other lisps could learn from.

One thing to note about this style is that as it currently stands, this makes this language a bit lazier than most languages. That is, the arguments to functions are only evaluated when there's enough information to completely evaluate the function. So, say, if one were to use the function (+ <a>), the argument a will only be evaluated when the function is passed another value. This will always be in the language, though I do imagine the introduction of a way to more eagerly evaluate arguments.

Combinators

Quite a bit is needed to make a language as tacit as this one, and the method I've selected to do so is through the use and overuse of combinators. One can think of the use of combinators here as a form of rewriting expressions. Let's take as an example the C combinator. (C a b c) takes its three arguments and gives back (a c b). Let's look at how this can work in a more practical example:

← C - 1 5
 = - 5 1
 → 4

so if one were interested in a function that subtracts one from a number, there you have it, in (C - 1).

Another often-useful combinator is the B combinator, sometimes called the composition combinator. This one takes (B a b c) to (a (b c)). An example one might come across frequently is:

← B car cdr (list 1 2 3 4)
 = car (cdr (list 1 2 3 4))
 → 2

which is commonly called cadr in many other lisps. A decent amount of other compositions of car and cdr can also easily be defined, but we'll leave that for later.

This language has all the combinators from the I S K and B C K W combinator calculi. From the composition of these, many other useful combinators can be made. Two combinators provided are the phi and psi combinators, where Phi a b c d gives you a (b d) (c d), and where Psi a b c d gives you a (b c) (b d), both very useful in general, and phi particularly useful for processing lists. For more, I encourage you to check out this and this link for many nice examples of other combinators and their uses.

There is one further built-in combinator, incredibly powerful and used for recursion. The Z combinator. It is quite cumbersome at this point to work with, but it can power most anything you might want to do that requires any sort of iteration. The Z combinator takes Z g v to g (Z g) v, enabling recursion. One major implementation difference is that Z evaluates its second argument rather than just reproducing the symbols in the desired order, for the sake of performance (this may change at some point).

Okay, we've described it, now let's see how we might work with it. Please note that this is currently the least pleasant part of the language, and the brunt of the work going forward will likely be how to make actually implementing things comfortable. But let's think about implementing the factorial function (please note that this implementation is not tail-recursive, so it's not the most practical implementation). My process has been starting with an expression with variables, and abstracting them out (getting them to be the last things in the expression). So, the process starts with

(if (eq 0 n) 1 (* n (fac (- n 1))))

and our goal is to extract the n and fib terms, in that order, to get something usable with the Z combinator. There are several different things we could do first, but let's start with something we've already seen before, with getting a function to subtract one from something. We just replace below

(if (eq 0 n) 1 (* n (fac ((C - 1) n))))

Next we can use the B combinator to come to

(if (eq 0 n) 1 (* n (B fac (C - 1) n)))

At this point, it's pretty useful to remember how much freedom we have in using parenthesis to help us refactor things, and group a couple different things up to get

(if ((eq 0) n) 1 (* n ((B fac (C - 1)) n)))

Now, we get to use the S combinator, which takes S a b c to a c b c. If we put it right in front of the multiplication, we get

(if ((eq 0) n) 1 (S * (B fac (C - 1)) n))

which reduces the number of times n appears in the expression. The next several steps will be applications of the same replacements we've already done (including reparenthesizing), but in different places. Try and follow along!

(if ((eq 0) n) 1 (S * (B fac (C - 1)) n))
(if ((eq 0) n) 1 ((S * (B fac (C - 1))) n))
((B if (eq 0) n) 1 ((S * (B fac (C - 1))) n))
(((B if (eq 0)) n) 1 ((S * (B fac (C - 1))) n))

The next tool we have in our bag of tricks is another way of messing with parenthesis, which has been alluded to earlier. Since ((a b) (c d)) functions the same as (a b (c d)), we can remove parenthesis that start expressions (though not ones in the middle of expressions, because calling c on d needs to happen before being used as the argument for (a b)). So, we can rewrite our latest expression:

(((B if (eq 0)) n) 1 ((S * (B fac (C - 1))) n))
((B if (eq 0)) n 1 ((S * (B fac (C - 1))) n))

This gives us more power for refactoring and reorganizing code in general. We continue on:

((B if (eq 0)) n 1 ((S * (B fac (C - 1))) n))
(C (B if (eq 0)) 1 n ((S * (B fac (C - 1))) n))
((C (B if (eq 0)) 1) n ((S * (B fac (C - 1))) n))
(S (C (B if (eq 0)) 1) (S * (B fac (C - 1))) n)
(S (C (B if (eq 0)) 1) (S * (B fac (C - 1)))) n

At this point, n is the last term in this expression, so we can consider it factored out, and then proceed to do the same thing for fac

(S (C (B if (eq 0)) 1) (S * (B fac (C - 1))))
(S (C (B if (eq 0)) 1) (S * (C B (C - 1) fac)))
(S (C (B if (eq 0)) 1) ((S *) ((C B (C - 1)) fac)))
(S (C (B if (eq 0)) 1) (B (S *) (C B (C - 1)) fac))
((S (C (B if (eq 0)) 1)) ((B (S *) (C B (C - 1))) fac))
(B (S (C (B if (eq 0)) 1)) (B (S *) (C B (C - 1))) fac)
(B (S (C (B if (eq 0)) 1)) (B (S *) (C B (C - 1)))) fac
(B (S (C (B if (eq 0)) 1)) (B (S *) (C B (C - 1))))

And now we have factored out our recursive call. What do we do with this now? Now we get to use the Z combinator. We've just figured out the g we want to pass it, and since this is a factorial function, the v we need is just whatever iteger we want the factorial of. So,

← Z (B (S (C (B if (eq 0)) 1)) (B (S *) (C B (C - 1)))) 5
 → 120

Wonderful! Now, if we wanted to use factorial more often, we'd just assign it a (global) name, like so:

← def factorial (Z (B (S (C (B if (eq 0)) 1)) (B (S *) (C B (C - 1)))))
 → _264
← factorial 7
 → 5040
← factorial 0
→ 1

One might think all this to be a bit much to get something as simple as factorial. I'd probably agree. Aside from fixing various pain points and adding practical features, the majority of effort that will go into this language will be toward making writing things like recursive functions much easier.

Church Encodings

Everything in this language is a function. That includes numbers, strings, and everything else. Most things, including strings, are pretty unexciting, a string is just a function that returns itself. But, for unsigned integers, t, and nil, we get some exciting behaviours. When t is applied as a function, it takes two arguments and returns the first, and nil takes two and returns the second (these are represented in combinators as K and K I respectively). Numbers are pretty nice here, when treated as a function, n takes two arguments, and applies the first to the second n times. So, the code (4 f v) turns into the code (f (f (f (f v))))

Particularly for numbers, this is a nice way to iterate things. A couple simpler examples include multiplication and exponentiation:

← 5 (+ 3) 0
 → 15
← 6 (C - 4) 100
 → 76
← 4 (* 5) 1
 → 625
← 4 3 (+ 1) 0
 → 81
← 7 2 (+ 1) 0
 → 128

These are certainly not the most efficient ways to calculate these values, but they illustrate how one might use numbers as functions. Another more practical and exciting use of numbers can come in the form of composition of car and cdr, like the cadr mentioned above. We can define cadddr as (B car (3 cdr)) and so on.

The use of t and nil as functions isn't quite so broad in its uses, but the combination of them can really make code more compact. Consider any sort of expression that evaluates to a boolean, such as (eq 4 n). If it evaluates to true, it'll function as the t function, so passing it two arguments returns the first. If it evaluates to false, it'll function as nil, so passing it two arguments returns the second. So, the following examples work the same way:

if (eq 6 n) "n was six" "n was not six"
(eq 6 n) "n was six" "n was not six"

So 'if' is generally superfluous and not needed. In fact, using the I combinator (which takes I a to a), one can define (and I have, though I don't actually use it that much) 'if' with the following statement: (def if I). Simple as that. All this is the biggest use of Church encodings so far, though I may play around with doing some more useful and interesting things.

rest

The last fairly significant thing I want to tell you about here is rest. This operation could probably be considered the most 'improper' thing in this language. This is how we can do variadic functions. rest is passed a function that takes a function that operates on a list. rest takes that function, and applies it to all the rest of the arguments in that statement. So, one could implement a more commonly lisp-style variadic addition:

← def +/ (rest (fold +))
 → _7
← +/ 1 2 3 4 5
 → 15

which gives us a nice variadic addition.