My actual talk goes into more detail about the philosophy as we begin. In Ken Iverson's Turing Award lecture, Notation as a Tool of Thought, this valuable insight comes front and center.

APL: The Basics

This section primarily covers the basics of APL syntax, and introduces some functions we'll use later.

For a much more technical look, see function valence. Note the distinction between a function application and a function. A given function (like -) may be applied as a monad or dyad; it doesn't have to be one or the other.

The last example illustrates the assignment token (←), and that functions can be given names.

We'll use the function Combinations (!) later so it'll be helpful to understand it.

Like most languages, APL has parentheses for when you want a different ordering.

There are two ways to think about APL's order of operations: APL is evaluated from right to left (imperative), or each function applies to the value immediately to the left, if any, and everything to the right up to an enclosing parenthesis or the end of the statement (functional).
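A minimal session sketch of the rule (both readings give the same answer):

          2×3+4
    14
          (2×3)+4
    10

In the first expression × applies to 2 on its left and to everything to its right, namely 3+4.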

APL is not a functional language: functions are second-class rather than first class values.

APL's syntax makes functional patterns harder and array-oriented patterns easier. In fact syntactic choices like the function-value distinction probably play a large role in shaping how programming paradigms like array orientation develop.

In APL, the ordinary numbers 0 and 1 are used for booleans.

Numbers usually aren't stored as pairs of doubles. Dyalog APL is very good at keeping data in small internal types, and this is a huge performance advantage relative to languages like C which always use the declared type.

The convention of forming lists automatically from adjacent arguments is called "stranding".

The second statement shows a nested list formed by using the rule with lists as the arguments.
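A quick sketch of both cases, assuming the boxed display described below:

          2 3 5                ⍝ adjacent values strand into a list
    2 3 5
          (1 2)(3 4)           ⍝ stranding two lists gives a nested list
    ┌───┬───┐
    │1 2│3 4│
    └───┴───┘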

]box on should be set by default in the provided scripts.

There's no dedicated string type in APL; it uses lists of characters.

Here we see that Equal to (=) is an arithmetic operation, returning 0 or 1, rather than assignment. It maps over lists in exactly the same way as other arithmetic does.
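For example:

          2 = 1 2 3
    0 1 0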

⎕IO is the "index origin", that is, the index used for the leading element of a list. It's set to 1 by default in Dyalog but I need it to be 0 in various places. Like the boxed display, it's set up already in the provided docker resources.

For an operator such as Commute (⍨), regular arguments are allowed as operands as well as functions.
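A sketch of Commute in both roles:

          5 - 2
    3
          5 -⍨ 2        ⍝ arguments swapped: 2-5
    ¯3
          +⍨ 3          ⍝ monadic: same argument on both sides, 3+3
    6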

The term "reduction" comes from APL: the operator was around at least as early as 1962.

The function ⌊ takes the minimum of its two arguments. The paired floor/ceiling symbols in mathematics are also from APL, which has since dropped the pairing.
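For example:

          3 ⌊ 5         ⍝ dyadic: minimum
    3
          ⌊ 2.7         ⍝ monadic: floor
    2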

Compose (∘) in most cases does whatever writing the operands and arguments with no operator in the middle would do. In Dyalog 18.0 and later it's usually better to use Atop (⍤), which applies the right operand to all arguments and the left operand to its result.
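A minimal sketch of the difference:

          3 -∘⌊ 2.5     ⍝ Compose: ⌊ applies to the right argument only, 3-(⌊2.5)
    1
          3 (-⍤+) 5     ⍝ Atop (18.0 and later): - applies to the result of +, -(3+5)
    ¯8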

A lot shorter than most languages. APL does have a fair amount of irregular syntax though (Outer Product is one example).

The fact that operators and functions go in opposite directions sounds pretty dissonant, but works fine in practice. It was a deliberate choice, not an accident.

Hopefully by now you feel that reading APL is possible, even if you have to make your way through expressions slowly at first.

The Outer Product Operator

Outer Product was initially considered a special case of matrix product, which is an unfortunate origin for such a fundamental primitive. The syntax is not like most operators for this reason: it's derived from the ordinary dyadic operator Inner Product (.). I won't discuss Inner Product here.

This statement uses Commute (⍨) from the previous section to place the same argument on both sides of the outer product (∘. with operand +).

That argument is generated with iota (⍳; officially, "Index Generator") which here returns the first 12 indices. Since we've specified that indices start at zero we add one to get the numbers used in a traditional multiplication table.
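Here's the same pattern at a smaller size, assuming ⎕IO←0 as described earlier, shown with both + and × (the latter gives the traditional times table):

          ∘.+⍨ 1+⍳3
    2 3 4
    3 4 5
    4 5 6
          ∘.×⍨ 1+⍳3
    1 2 3
    2 4 6
    3 6 9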

This is a complicated expression. Take your time!

APL has no "true" and "false" so = returns 1 or 0. This has some significant advantages, like the ease of expressing the computation above.

The expression {⍺,' ',⍵} is called a "dfn" (DEE-fun with hardly any u). It's a way of writing lexically scoped anonymous functions. The left argument, if any, is always named ⍺ and the right argument is always named ⍵, pronounced as the Greek letters alpha and omega.

Catenate (,) combines two lists, such as the string arguments given here. It also handles single elements like the space character.
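For example:

          'ab' {⍺,' ',⍵} 'cd'
    ab cd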

So each element of the result is a string or list of characters.

The second expression is just to show that APL can do more complicated things fairly easily when required; maybe it's something to come back to when you have more experience. Do note however the parallel between the outer product used for names and the other (∘.+) for numbers.

Recall that ! is the combination counting function from mathematics.
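For instance:

          2 ! 5         ⍝ ways to choose 2 items from 5
    10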

Residue (|) computes a modular division, like % in some other languages but with the arguments reversed.
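For example:

          3 | 7         ⍝ like 7 % 3 elsewhere
    1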

Transpose (⍉) exchanges horizontal and vertical axes like in matrix arithmetic.

Using Outer Product with comparisons like Less-Than (<) is a nice way to generate triangular matrices.
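A sketch, assuming ⎕IO←0:

          ∘.<⍨ ⍳4
    0 1 1 1
    0 0 1 1
    0 0 0 1
    0 0 0 0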

The first dfn places its argument alongside that argument's transpose using stranding.

Since = compares arguments one value at a time, another function Match (≡) is provided to test whether they are identical as a whole.
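For example:

          1 2 3 = 1 2 3
    1 1 1
          1 2 3 ≡ 1 2 3
    1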

The result is really a 3-dimensional cube; to display it on the screen it's split into 2D slices.

Here two outer products with string catenation are combined. Catenation is associative because it doesn't matter which order you use to join things together; the elements all end up in order in a list.

Similarly, it doesn't matter in which order we outer-product-catenate. It's equivalent to say that each book has three chapters and each of these chapters has five paragraphs; or that each chapter is composed of five paragraphs and each book is composed of three of these chapters.

By now you should know what Outer Product does. But maybe not what it is.

Arrays

Now we are getting somewhere.

An array is something that is shaped like an outer product!

Combining indices for lists of length 2, 3, and 4 gives us indices for a 2×3×4 array.

The Index Generator can actually do this for us if passed a list of lengths instead of a single length. This list is called…

…the array's shape, and its length is the rank.
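A quick sketch of both, assuming ⎕IO←0 and the boxed display:

          ⍳ 2 3          ⍝ indices for a 2×3 array
    ┌───┬───┬───┐
    │0 0│0 1│0 2│
    ├───┼───┼───┤
    │1 0│1 1│1 2│
    └───┴───┴───┘
          ⍴ ⍳ 2 3        ⍝ the shape
    2 3
          ≢ ⍴ ⍳ 2 3      ⍝ the rank: length of the shape
    2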

A monoid is a set with an operation on that set that is associative, and has an identity element (like a group without inverses).

I skipped over the identity element here in order to discuss it later.

Don't worry too much about understanding Split.

Many languages don't really have multidimensional arrays; they simply simulate them with nested lists. The difference is highlighted here: in one of these languages, these three values would all be considered the same.

It's hard to make everything fit on a slide sometimes…

Flat (non-nested) arrays are also different from nested lists in that each dimension has to have a uniform length. So nested lists can express things that flat arrays can't (but nested arrays can).

The identity function (⊢) is used here to print the value of d.

The leading axis viewpoint brings some benefits of working with nested lists to the array model. It may seem obvious to someone used to nested lists, but the idea that it's useful to prioritise some dimensions of an array was a big shift in APL thinking.

The choice of the leading axis rather than the trailing axis is superior because it means that each cell is contiguous, both on the screen and in memory. The choice to write that axis at list index 0 (and to write index 0 at the left) is independent and arbitrary.

This is the essence of arrays in my opinion.

Also in my opinion, the most concrete difference between static and dynamic typing is that with static typing you know something about your values even if there aren't any of them (or they aren't computed yet).

We get one empty line at the top because there are two cells with no lines in them: the space is the one separating these cells. After one reduction there's just one cell with no lines.

This makes APL's transpose a true self-inverse. In other languages transpose can lose information.
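A sketch: monadic ⍉ reverses the order of all axes, so applying it twice restores the original array.

          ⍴ ⍉ 2 3 4⍴0
    4 3 2
          {⍵ ≡ ⍉⍉⍵} 2 3 4⍴⍳24
    1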

A big part of the reason why +/ or sum is called a reduction is that it reduces the rank of its argument by one.
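For example:

          ⍴ +/ 2 3 4⍴1   ⍝ rank 3 in, rank 2 out
    2 3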

Zilde (⍬) is the empty list, and is different from the empty string (''). This is because APL stores in each empty array a "prototype" indicating what shape, nested structure, and type(s) the elements would have if there were any. Unlike zeros in the shape, the prototype isn't a clean theoretical concept and doesn't make sense with all array operations.

The last code block shows that the shape of the empty list is a one-element list, not an empty list.
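That is:

          ⍴ ⍬            ⍝ the one-element list containing 0
    0
          ≢ ⍴ ⍬
    1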

The expressions shown indicate that to find the bound of an outer product, we can take the product of the shape of the outer product, or the product of the combined argument shapes, or the product of the two arguments' bounds.
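A sketch of all three, with arguments of my own choosing and ⎕IO←0:

          a ← ⍳2 3 ⋄ b ← 'abcd'
          ⍴ a∘.,b        ⍝ the combined argument shapes
    2 3 4
          (×/⍴a) × ×/⍴b  ⍝ the product of the two bounds
    24
          ×/ ⍴ a∘.,b     ⍝ equals the bound of the outer product
    24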

Bound is a composition of two monoid transformations (homomorphisms): ⍴ goes from arrays to shapes, and ×/ goes from shapes to bounds.

I now introduce the term "scalar" which I have been awkwardly stepping around before.

APL implementations universally recognise simple reductions like +/ and ×/ on empty arrays in order to return the identity element; some can have pretty extensive code to find identities for other functions.
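For example:

          +/ ⍬
    0
          ×/ ⍬
    1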

Note that it's only the identity element when the arguments are restricted to be arrays of lists. For an array of scalars, there is no identity element since the result will contain arrays with rank at least 1.

A table of monoids, and one not-quite-monoid: the lengths resulting from Tally (≢), which form a semigroup.

The columns show the name, the function to get it from an array (recall ⊢ is the identity function), the monoid operation, the identity element, and an example.

Tally doesn't fit: the operation simply returns the left argument, so every element is a right identity but no element is a left identity. The problem is that tally is required to always be a number, and scalars are given a tally of one even though they have no axes to count along. If it instead used a maybe-number, with scalars given a tally of nothing and other arrays just a number, then it would be a monoid.

Finally we can put it all into place.

Array thinking is so deeply embedded in APL that I had to introduce a new term "argument" to talk about the things functions are called on without having to first define "array".

I hope you've got some questions. I hope you're questioning everything.

The Rank Operator

And its name is the Rank operator, in case you missed it.

⌽ reverses along the last axis, and ⊖ reverses along the first axis (the leading one!). The symbols are nicely intuitive for matrices: hold onto the bar, and flip 180° around it.

Dyadic ⍴ in the first block is Reshape, which creates an array with the shape given on the left and elements taken from the right argument.
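For example, with ⎕IO←0:

          2 3 ⍴ ⍳6
    0 1 2
    3 4 5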

Now Rank tells the operand function to work individually on the two rank-2 cells of the array.

For ⌽ this has no effect but for ⊖ it provides another bit of useful functionality.
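A sketch with a small rank-3 array (⎕IO←0):

          a ← 2 2 3 ⍴ ⍳12
          (⌽a) ≡ ⌽⍤2 ⊢a  ⍝ ⌽ already acts on the last axis
    1
          (⊖a) ≡ ⊖⍤2 ⊢a  ⍝ but ⊖⍤2 reverses rows within each cell instead
    0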

The combination of Rank and a leading-axis reverse make the trailing-axis reverse redundant. The same pattern holds for many of the old-school paired APL functions and operators.

The two reductions / and ⌿ form another pair. However, ⌿ isn't a true leading-axis primitive: rather than apply the function to the major cells of its argument, it applies it to individual elements within those major cells, essentially along columns. For +/ there's no difference since + is a scalar function, but for functions that operate on whole arrays it's very different.
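For example (⎕IO←0):

          x ← 2 3⍴⍳6
          +/ x           ⍝ sums along the last axis (rows)
    3 12
          +⌿ x           ⍝ sums along the first axis (columns)
    3 5 7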

The array language J was designed in accordance with the leading axis theory, and fixes many problems like this (its version of reduction is called Insert). Dyalog has taken many primitives from J by now; Rank is one of them.

I stated in the talk that Take (↑) and Drop (↓) were a lucky design, working with leading axis theory even though they predate it. Later I learned that this is not the case: originally, a length had to be specified for every axis (or an axis specified). They were extended by SHARP APL to allow fewer lengths than axes after the development of leading axis theory (see Take's history).

The left argument 'rank' here is used three times, while each cell on the right is only used once.

Some useful formatting code! Format (⍕) converts arrays to strings like the ones shown by the session.
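For example:

          ⍕ 2 3          ⍝ the 3-character string '2 3'
    2 3
          ≢ ⍕ 2 3
    3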

The second statement shows a function with two ranks, one on the left and one on the right.

Scalar functions work like they are called with rank 0 all the time (in Dyalog, they additionally enjoy pervasion but suffer from singleton extension).

0*0 has to be 1 for polynomial evaluation to work right.
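A sketch of why, with hypothetical coefficients c for 3 + x + 4x² and ⎕IO←0:

          c ← 3 1 4
          x ← 0
          +/ c × x * ⍳3  ⍝ exponents 0 1 2; the constant term needs 0*0 = 1
    3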

For a non-scalar function f, this would be written f⍤0⍤0 1 or f⍤1 0⍤0 1.

Because Dyalog has a maximum rank of 15, 99 is used here to denote "infinity".

This example shows how Rank can be adapted to work easily with either row vectors or column vectors. In most cases row vectors are more intuitive but column vectors provide better performance because the three cells are large and the operations performed on them are easily vectorised.

With all these nested ranks the above code makes for a tough conceptual challenge, but if you're willing to work through it you may find you have a much deeper understanding of arrays!

A Modern Functional Approach

The code in this section is real Haskell, and not the pseudo-Haskell I showed earlier.

A straightforward definition with map. Note that this version is tied to lists, not arrays. You can't use it multiple times to construct a high-rank array in the same way.

This is analogous to the decomposition of Outer Product into two applications of Rank from the previous section.

Functors are pretty well-known in the typed functional programming world.

Wait, what was that last bit?

This transformation of types is known as the reader monad (every (functional programming, not APL) monad is a functor).

This combinator is something I often want when using APL; I introduced it to my programming language I (brought to you by the letter I) years ago with the name "split-compose" but only found its relation to Outer Product while preparing this talk.

The Over operator has been added in Dyalog version 18.0. I presented it in a talk at Dyalog '19.

Another example of a combinator related to the Outer Product is the "owl" combinator, which satisfies ((.)$(.)) g x h y = g x (h y) = (outer g id h) x y.

Thinking with Outer Product

We didn't reach this section in the talk. Consequently, no thinking occurred.

The Cantor set is a fairly simple construction: the visualisation above is just eight lines of JavaScript with d3. What's important about how we'll approach it in APL isn't that it's shorter, but that it gives us deeper insight.

⎕R is Dyalog's replace operator, using PCRE regex. It replaces instances matching the left operand string with the right operand; \0 on the right is expanded to match group 0, that is, the entire match.

We will use the Power operator for iteration in many of our solutions, as it's a very good fit for the structure of this problem.

Ravel (,) reshapes an array to a vector with the same bound. Like Reshape, it assumes a top-to-bottom, left-to-right ordering.
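For example (⎕IO←0):

          , 2 3⍴⍳6
    0 1 2 3 4 5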

{⍺×⍵} is the same as ×, and ×⍤0 1 on two vectors is an outer product!
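A sketch of the equivalence:

          1 0 1 ×⍤0 1 ⊢ 1 0 1    ⍝ each left scalar times the whole right vector
    1 0 1
    0 0 0
    1 0 1
          (∘.×⍨ 1 0 1) ≡ 1 0 1 ×⍤0 1 ⊢ 1 0 1
    1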

In other words, Cantor set iterations are flattened outer products of many copies of 1 0 1, and we can combine them all starting from the left or the right.
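A minimal sketch of the iteration using Power (⍣), starting from the scalar 1; this is my own rendering of the construction, not necessarily the code on the slide:

          {, ⍵ ∘.× 1 0 1} ⍣ 2 ⊢ 1
    1 0 1 0 0 0 1 0 1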

Remember that reduction always reduces the rank by one, so reduction on a vector returns a scalar. We want the high-rank array contained in this scalar (its only element), so we use First (), the opposite of Enclose, to get it out before ravelling.
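For example, with the boxed display:

          ,/ 1 2 (3 4)           ⍝ a scalar containing the joined list
    ┌───────┐
    │1 2 3 4│
    └───────┘
          ⊃ ,/ 1 2 (3 4)
    1 2 3 4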

A "sufficiently smart interpreter" could actually recognise that ∘.∧ is associative and optimise either the iteration or reduction to take advantage of this.

The construction with reduce makes the place values very concrete, but it would also be possible to do this with power multiplying the left side by 10 at each step. That operation isn't associative and would have to be done right to left.

Our counting mechanism is very flexible because the base and list of digits are independent.

That last line looks familiar…

Wikipedia makes the claim that "The geometric mean of the Cantor set is approximately 0.274974", marked as "unreliable source". With APL we can make a quick sanity check on this claim.

First we use our boolean Cantor set generator, find index of each 1 using Where (⍸), and convert it to a midpoint by adding 0.5 and scaling. The geometric mean of these midpoints approximates the Cantor set's geometric mean (under an appropriate measure).

But we can also generate those indices directly using the counting techniques above. This requires an outer product of length-2 rather than length-3 arrays, so it's exponentially more efficient.

The utility function cmpx compares two expressions, showing the runtime in seconds and an asterisk if the results differ (here they don't). Given that cgm0 processes 30 billion intervals, a second and a half isn't bad at all! On the other hand, the asymptotic advantage of the index method, which works with only the 4 million nonempty intervals, is clear.

(That's not actually a free group's Cayley graph, but that doesn't prevent it from looking cool.)

Outer Product Performance

Not performed.

These timings compare Dyalog APL's outer products to gcc implementations, compiled for a generic architecture (like the Dyalog interpreter) and natively (on a recent Intel CPU). Dyalog 18.0 uses the same code to calculate its result in this case but memory allocation is changed to make better use of cache. Both native gcc and Dyalog use AVX2 instructions; Dyalog checks for these at runtime so the programmer doesn't even need to know about them.

Benefits of the array paradigm are apparent with a large left argument and small right argument. There's an advanced algorithm at work here, based on fast methods to copy each left argument element and the entire right argument.

Dyalog's poor performance when one argument has length 1 is due to an extra copy after the operation. The outer product is reduced to a simple arithmetic function which sometimes optimises better, but then the result shape needs to be changed.

Dyalog runs close to native C performance but can beat gcc for smaller right arguments, especially when using small integer types.

Here Dyalog performed poorly for small types. This is because it always assumes the addition can overflow, even though whether it can is easy to determine from the ranges of the two arguments. It has since been improved.

However, this also highlights one of Dyalog's key performance strengths: it tends to use a type that fits the data range well. Even if Dyalog can't beat gcc on matching types, it's likely to use a smaller type since the programmer doesn't have to specify it. For nontrivial outer products the arguments are always "squeezed" to the smallest possible type.

Some not-so-great cases. Vectorised multiplication instructions are an absolute mess in x86 and 2-byte multiply is a casualty. It can be fixed but doesn't fit into our current framework well.

This is not an equivalent comparison. Dyalog supports bit booleans while C doesn't really, so the C code uses byte booleans for its result.

Packing to bit booleans is slightly harder computationally but the output can then be processed much faster. Version 18.0 takes advantage of this for small right arguments: while version 17.1 uses the same strategy for comparison as the other operations, 18.0 swaps sides and reverses the comparison to compute a transposed result. Then it transposes this with a fast bit matrix transpose (see my talk on boolean Transpose at Dyalog '17).

APL's advantages are multiplied when we consider the question of boolean outer products, such as the Cantor set. It uses the same general strategies as before, but is able to do up to eight times as much work because it automatically packs eight bits to a byte, something nearly unheard of in other languages.

In the first measurements, C beats APL for a few cases with longer right arguments because of the work required to align the rows. I really hate it when the best minds at Google, Microsoft, and Red Hat beat me at booleans, so after seeing these measurements I wrote some code to reuse result rows which are already aligned correctly.

Boolean And (∧) is identical to multiplication both in results and performance, so this is the operation used for computing Cantor set iterations. Each row is either a copy of the right argument or zero; it's a little faster than Not-Equals prior to the row-reuse code because zero rows are easier to write.


As mentioned in the talk, there are several slides I didn't write:

Look. You and I both know you just jumped down here without loading all those frames. Sure, you can now see "The End" with a nice sparkly background, but do you really feel like you've accomplished anything?

Well, while you're here…

Every background image, colors and all, is generated using an outer product in a single line of J (some of the lines are long, but nothing preposterous). The image is a shape 1080 1920 3 array of color intensities between 0 and 1, and the function wr rescales to 0–255 and writes it to a file. Everything else in the script (after 0 : 0) is a string literal used as a comment: just some utilities and other ideas that aren't needed to make the images. Images take a few seconds at most to generate.