Specification: BQN evaluation

This page describes the semantics of the code constructs whose grammar is given in grammar.md. The formation rules there are not named, so here each rule is identified either by the name of the term it defines or, if that term has several alternative productions, by copying the rule entirely.

Here we assume that the referent of each identifier, or equivalently the connections between identifiers, have been identified according to the scoping rules.

Evaluation is an ordered process: the actions required to evaluate a node always have a specified order unless performing them in any order would have the same effect. Side effects that are relevant to ordering are setting and getting the value of a variable, and causing an error. Errors described on this page are "evaluation errors" and can be caught by the Catch (⎊) modifier. If an error is caught, evaluation halts without attempting to complete any in-progress node, and is restarted by Catch.

As specified, BQN programs can involve an arbitrary amount of information, but when run there will be memory and possibly other limitations. To accommodate this, any part of evaluation can cause an error if a resource such as memory, stack memory, or limited execution time is exhausted.

Programs and blocks

The result of parsing a valid BQN program is a PROGRAM, and the program is run by evaluating this term.

A PROGRAM or BODY is a list of STMTs, which are evaluated in program order. A BODY also allows an EXPR followed by "?" in place of an STMT: then the expression is evaluated as usual but its result is checked as discussed below. A result is always required for BODY nodes, and sometimes for PROGRAM nodes (for example, when loaded with •Import). If any identifiers in the node's scope are exported, or any of its statements is an EXPORT, then the result is the namespace created in order to evaluate the node. If a result is required but the namespace case doesn't apply, then the last STMT node must be an EXPR and its result is used. An EXPR statement evaluates some BQN code and possibly assigns the results, while a nothing statement evaluates any subject or Derv terms it contains but discards the results. An EXPORT statement performs no action.
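The result rules above can be modeled with a small Python sketch. It rests on assumptions of our own. Statements are encoded as hypothetical tuples, and a namespace is modeled as a plain dict of the exported names. The helper name run_body is not part of BQN or any implementation.

```python
class EvaluationError(Exception):
    pass


def run_body(statements, exported=(), result_required=True):
    """Evaluate statements in program order and pick the node's result.

    Hypothetical statement encoding:
      ("expr", fn)     an EXPR; fn(env) evaluates it and returns its value
      ("nothing", fn)  a nothing statement; evaluated, result discarded
      ("export", name) an EXPORT statement; performs no action
    `exported` lists names exported by assignments with ⇐.
    """
    env = {}
    names = list(exported)
    last_kind, last_value = None, None
    for stmt in statements:
        last_kind = stmt[0]
        if last_kind == "expr":
            last_value = stmt[1](env)
        elif last_kind == "nothing":
            stmt[1](env)                 # evaluated only for its effects
        else:
            names.append(stmt[1])        # EXPORT: no action at run time
    if names:
        # Namespace case, modeled as a dict holding the exported names.
        return {name: env[name] for name in names}
    if result_required:
        if last_kind != "expr":
            raise EvaluationError("result required but last statement is not an EXPR")
        return last_value
    return None
```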

A block consists of several BODY terms, some of which may have an accompanying header describing accepted inputs and how they are processed. An immediate block blSub is evaluated when reached. Other types of blocks don't evaluate any BODY immediately, but instead return a function or modifier that obtains its result by evaluating a particular BODY. The BODY is identified and evaluated once the block has received enough inputs (operands or arguments), which for modifiers takes one call for an IMM_BLK and two for an ARG_BLK. If two calls are required, then on the first call the operands are simply stored and no code is evaluated yet. The stored values can be observed through equality checking, or through •Decompose if it is defined.
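The two-call behavior of a 1-modifier block can be sketched in Python, with Python callables standing in for BQN BODY terms. The classes ArgBlockMod1, ImmBlockMod1, and DerivedFunction are hypothetical, and None stands for a missing left argument.

```python
class DerivedFunction:
    """Result of the first call to an ARG_BLK modifier: stores the operand."""
    def __init__(self, modifier, f):
        self.modifier = modifier
        self.f = f                                 # stored; no code evaluated yet

    def __eq__(self, other):
        # The stored operand is observable through equality checking.
        return (isinstance(other, DerivedFunction)
                and self.modifier is other.modifier and self.f == other.f)

    def __call__(self, x, w=None):
        # Second call: arguments are now known, so the BODY is evaluated.
        return self.modifier.body(self.f, x, w)


class ArgBlockMod1:
    """1-modifier block that uses arguments: needs two calls."""
    def __init__(self, body):
        self.body = body                           # body(f, x, w)

    def __call__(self, f):
        return DerivedFunction(self, f)


class ImmBlockMod1:
    """Immediate 1-modifier block: evaluated as soon as it gets its operand."""
    def __init__(self, body):
        self.body = body                           # body(f)

    def __call__(self, f):
        return self.body(f)
```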

To evaluate a block when enough inputs have been received, each case (I_CASE, A_CASE, or S_CASE), excluding A_CASE nodes whose ARG_HEAD contains "⁼", is tried in order. If any case completes, the block returns the result of that evaluation, and if all cases are tried but none finishes, an error results. A case might not complete because of an incompatible header or failed predicate, as described below. A general case (one with no header or predicates, as defined in the grammar) is always compatible, unless it is the first of two general cases in an ARG_BLK block and a left argument is given; such a call is handled by the second case.
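The case-selection loop can be sketched in Python as follows. In this hypothetical model, each case is a Python callable that either returns a result or raises NoMatch when its header or predicate rejects the inputs. None stands for a missing left argument.

```python
class NoMatch(Exception):
    """Signals that a case did not complete (bad header or failed predicate)."""


class EvaluationError(Exception):
    pass


def call_block(cases, x, w=None):
    """Try each case in order with the block's original inputs."""
    for case in cases:
        try:
            return case(x, w)          # first case to complete gives the result
        except NoMatch:
            continue                   # next case, same inputs
    raise EvaluationError("no case of the block accepted the inputs")


def general_case(body, monadic_only=False):
    """A case with no header or predicates.

    monadic_only models the first of two general cases in an ARG_BLK,
    which is skipped when a left argument is given.
    """
    def case(x, w):
        if monadic_only and w is not None:
            raise NoMatch()
        return body(x, w)
    return case
```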

If a case has a header, then the header must structurally match the inputs for evaluation to begin. That is, if headX is an lhs, the right argument must match that structure, and similarly for HeadF with a left operand and HeadG with a right operand. If headW is an lhs, there must be a left argument matching that structure; if headW is 𝕨, it matches any left argument and also no left argument. The test for compatibility is the same as for destructuring assignment described below, except that the header may contain constants, which must match the corresponding part of the given argument. For a compatible header, inputs and other names are bound when evaluation of a BODY is begun. Special names are always bound when applicable: 𝕩 and 𝕤 if arguments are used, 𝕨 if there is a left argument, 𝕗 (and 𝕘 for a 2-modifier) if operands are used, and _𝕣 and _𝕣_ for 1-modifiers and 2-modifiers, respectively. Any names in the header are also bound, allowing multiple assignment for arguments.
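Header matching can be modeled roughly in Python. This is not the grammar's actual data structures. A Name binds anything, a Python list is a list pattern, and any other value is a constant that must equal the input. The marker ANY_W is a hypothetical stand-in for a headW that is just 𝕨, and None stands for no left argument. Because Python lists act as patterns here, list-valued constants cannot be expressed in this sketch.

```python
class Name:
    """A name in a header: binds whatever value it is matched against."""
    def __init__(self, name):
        self.name = name


ANY_W = object()   # hypothetical marker for a headW that is just 𝕨


def match(pattern, value, bindings=None):
    """Return a dict of bindings, or None if the header is incompatible."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, Name):
        bindings[pattern.name] = value
        return bindings
    if isinstance(pattern, list):
        # List pattern: value must be a list of the same length.
        if not isinstance(value, list) or len(value) != len(pattern):
            return None
        for p, v in zip(pattern, value):
            if match(p, v, bindings) is None:
                return None
        return bindings
    # Constant: must equal the corresponding part of the argument.
    return bindings if pattern == value else None


def match_left(pattern, w):
    """headW: 𝕨 matches any left argument or none; an lhs needs one."""
    if pattern is ANY_W:
        return {}
    if w is None:
        return None
    return match(pattern, w)
```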

When a predicate "?" is evaluated, the associated EXPR is evaluated and its result is checked. If it's not one of the numbers 0 or 1, an error results. If it's 1, evaluation of the BODY continues as usual. If it's 0, evaluation is stopped and the next compatible BODY term is evaluated using the block's original inputs.
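The predicate check reduces to a three-way decision: continue, stop, or error. It is sketched in Python below, assuming BQN numbers map to Python numbers. The name predicate_passes is hypothetical.

```python
class EvaluationError(Exception):
    pass


def predicate_passes(result):
    """Check the value of EXPR in `EXPR ?`.

    Returns True if the BODY continues (result 1) and False if evaluation
    stops so the next compatible BODY is tried (result 0).
    """
    if result not in (0, 1):
        raise EvaluationError("predicate result must be 0 or 1")
    return result == 1
```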

If there is no left argument, but the BODY contains 𝕨 or 𝕎 at the top level, then it is conceptually re-parsed with 𝕨 replaced by · to give a monadic version before application; this modifies the syntax tree by replacing some instances of subject, arg, or Operand with nothing. The token 𝕎 is not allowed in this case and causes an error. Re-parsing 𝕨 can also cause an error if it's used as an operand or list element, where nothing is not allowed by the grammar. Note that these errors must not appear if the block is always called with two arguments. True re-parsing is not required, as the same effect can also be achieved dynamically by treating · as a value and checking for it during execution. If it's used as a left argument, then the function should instead be called with no left argument (and similarly in trains); if it's used as a right argument, then the function and its left argument are evaluated but, rather than calling the function, · is "returned" immediately; and if it's used in any other context, then it causes an error.
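The dynamic alternative to re-parsing can be sketched in Python with a sentinel value for ·. For example, in a block {𝕨-𝕩} called with only a right argument, 𝕨-𝕩 acts like -𝕩. The names NOTHING, apply, and require_value are hypothetical. Python functions stand in for BQN functions and are called as f(x) or f(w, x).

```python
NOTHING = object()   # stands for · when no left argument is given


class EvaluationError(Exception):
    pass


def apply(f, x, w=NOTHING):
    """Apply f, treating · (NOTHING) as the dynamic rules describe."""
    if x is NOTHING:
        # f and w have already been evaluated, but f is not called.
        return NOTHING
    if w is NOTHING:
        return f(x)          # called with no left argument
    return f(w, x)


def require_value(v):
    """Any other use of · (operand, list element, ...) is an error."""
    if v is NOTHING:
        raise EvaluationError("· used where a value is required")
    return v
```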

Assignment

An assignment is one of the four rules containing ASGN. It is evaluated by first evaluating the right-hand-side subExpr, FuncExpr, _m1Expr, or _m2Expr_ expression, and then storing the result in the left-hand-side identifier or identifiers. The result of the assignment expression is the result of its right-hand side. Except for subjects, only a lone identifier is allowed on the left-hand side and storage sets it equal to the result. For subjects, destructuring assignment is performed when an lhs is lhsList, lhsStr, or lhsArray. Destructuring assignment is performed recursively by assigning right-hand-side values to the left-hand-side targets, with single-identifier assignment as the base case. The target "·" is also possible in place of a NAME, and performs no assignment.

In assignment to lhsList or lhsStr, the right-hand-side value, here called v, must be a list (rank 1 array) or namespace. If it's a list, then each LHS_ENTRY node must be an LHS_ELT. The left-hand side is treated as a list of lhs targets, and matched to v element-wise, with an error if the two lists differ in length. If v is a namespace, then the left-hand side must be an lhsStr where every LHS_ATOM is a NAME, or an lhsList where every LHS_ENTRY is a NAME or lhs "⇐" NAME, so that it can be considered a list of NAME nodes some of which are also associated with lhs nodes. To perform the assignment, the value of each name is obtained from the namespace v, giving an error if v does not define that name. The value is assigned to the lhs node if present (which may be a destructuring assignment or simple subject assignment), and otherwise assigned to the same NAME node used to get it from v.
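Here is a simplified Python model of list and namespace destructuring. It makes these assumptions:

- A BQN list is a Python list, and a namespace is a dict.
- A name target is a string.
- SKIP stands for ·, and Alias stands for an lhs "⇐" NAME entry.

SKIP and Alias are hypothetical names.

```python
SKIP = object()   # models the target ·


class EvaluationError(Exception):
    pass


class Alias:
    """Models `lhs ⇐ NAME`: NAME is read from the namespace, lhs is assigned."""
    def __init__(self, target, name):
        self.target, self.name = target, name


def assign(target, v, env):
    """Destructure v into target, writing variables into env."""
    if target is SKIP:
        return                              # · performs no assignment
    if isinstance(target, str):
        env[target] = v                     # base case: one identifier
        return
    if isinstance(v, dict):                 # namespace destructuring
        for entry in target:
            if isinstance(entry, Alias):
                name, sub = entry.name, entry.target
            elif isinstance(entry, str):
                name, sub = entry, entry
            else:
                raise EvaluationError("namespace targets must be names")
            if name not in v:
                raise EvaluationError("namespace does not define " + name)
            assign(sub, v[name], env)
        return
    if not isinstance(v, list):
        raise EvaluationError("value must be a list or a namespace")
    if any(isinstance(entry, Alias) for entry in target):
        raise EvaluationError("⇐ entries require a namespace")
    if len(target) != len(v):
        raise EvaluationError("length mismatch")
    for entry, element in zip(target, v):   # index order, depth first
        assign(entry, element, env)
```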

Assignment to lhsArray destructures the major cells of the right-hand-side value v, which must be an array of rank at least 1. The number of cells in v is its length l, that is, the first element of its shape. The shape of each cell is the shape of v without its first element, and the cell ravels are formed by splitting v's ravel evenly into l sections. Besides this difference in how v is divided, assignment behaves the same way as assignment of a list v to lhsList.
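Splitting a value into major cells can be sketched in Python by representing an array as a shape list plus a flat ravel in index order. This representation and the function name major_cells are assumptions for illustration only.

```python
import math


def major_cells(shape, ravel):
    """Split an array given as (shape, ravel) into its major cells.

    Each cell is returned as (cell_shape, cell_ravel).
    """
    if len(shape) < 1:
        raise ValueError("lhsArray assignment needs rank at least 1")
    length = shape[0]                      # number of cells l
    cell_shape = list(shape[1:])           # shape without its first element
    cell_size = math.prod(cell_shape)      # 1 when the cells are rank 0
    if len(ravel) != length * cell_size:
        raise ValueError("ravel does not match shape")
    # Split the ravel evenly into l consecutive sections.
    return [(cell_shape, ravel[i * cell_size:(i + 1) * cell_size])
            for i in range(length)]
```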

A destructuring assignment is performed in program order, or equivalently index order, with each sub-assignment fully completed before beginning the next (a depth-first order). Thus if an assignment with ↩ encounters an error but it's caught with ⎊, some of the assignment may have already been performed, changing variable values.
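The consequence of depth-first order shows up in a small Python model of something like ⟨a, ⟨b, c⟩⟩ ↩ ⟨1, ⟨2, 3, 4⟩⟩. The assignment to a completes before the nested length mismatch is found. The catch helper is a hypothetical stand-in for ⎊, not an implementation of it.

```python
class EvaluationError(Exception):
    pass


def assign(target, v, env):
    """Depth-first destructuring: target is a name or a list of targets."""
    if isinstance(target, str):
        env[target] = v
        return
    if not isinstance(v, list) or len(v) != len(target):
        raise EvaluationError("length mismatch")
    for t, element in zip(target, v):
        assign(t, element, env)      # fully completed before the next one


def catch(f, handler):
    """Hypothetical stand-in for Catch: run f, and run handler on an error."""
    try:
        return f()
    except EvaluationError:
        return handler()
```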

Modified assignment is the subject assignment rule lhs Derv "↩" subExpr?. This case results in an error if lhs contains "·", "⇐", or an empty lhsArray node (one with no LHS_ELT components). With these conditions, the grammar for lhs is a subset of subExpr; the node is evaluated as if it were a subExpr, and passed as an argument to Derv. The full application is lhs Derv subExpr, if subExpr is given, and Derv lhs otherwise. Its value is assigned to lhs, and is also the result of the modified assignment expression.
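For a single-name lhs, modified assignment can be sketched in Python as below. For instance, n -↩ 2 sets n to n-2, and n -↩ sets n to -n. The function modify and its calling convention (F(x) monadic, F(w, x) dyadic) are assumptions of this sketch.

```python
_NO_ARG = object()   # marks that no subExpr was given


def modify(env, name, F, y=_NO_ARG):
    """Model of `name F↩ y` and `name F↩` for a single-name lhs."""
    current = env[name]                  # lhs evaluated as if it were a subExpr
    if y is _NO_ARG:
        new = F(current)                 # Derv lhs
    else:
        new = F(current, y)              # lhs Derv subExpr
    env[name] = new                      # value is assigned back to lhs
    return new                           # and is the expression's result
```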

Expressions

We now give rules for evaluating an atom, Func, _mod1 or _mod2_ expression (the possible options for ANY). A literal or primitive sl, Fl, _ml, or _cl_ has a fixed value defined by the specification (literals and built-ins). An identifier s, F, _m, or _c_, if not preceded by atom ".", must have an associated variable due to the scoping rules, and returns this variable's value, or causes an error if it has not yet been set. If it is preceded by atom ".", then the atom node is evaluated first; its value must be a namespace, and the result is the value of the identifier's name in the namespace, or an error if the name is undefined. A parenthesized expression such as "(" _modExpr ")" simply returns the result of the interior expression. A block is defined by the evaluation of the statements it contains after all parameters are accepted, as described above.
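Identifier and namespace-field lookup can be sketched in Python. The sketch assumes a scope is a dict containing every variable the scoping rules created. The hypothetical marker UNSET marks variables not yet assigned, and a namespace is modeled as a dict.

```python
class EvaluationError(Exception):
    pass


UNSET = object()   # value of a variable that exists but has not been set


def lookup(scope, name):
    """Plain identifier: the scoping rules guarantee the variable exists."""
    value = scope[name]
    if value is UNSET:
        raise EvaluationError(name + " has not been set")
    return value


def field(namespace, name):
    """`atom.name`: the atom's value must be a namespace (here a dict)."""
    if not isinstance(namespace, dict):
        raise EvaluationError("value before . is not a namespace")
    if name not in namespace:
        raise EvaluationError("namespace does not define " + name)
    return namespace[name]
```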

A list "⟨" ⋄? ( ( EXPR ⋄ )* EXPR ⋄? )? "⟩" or ANY ( "‿" ANY )+ consists grammatically of a list of expressions. To evaluate it, each expression is evaluated in source order and their results are placed as elements of a rank-1 array. The two forms have identical semantics but different punctuation. The square bracket notation "[" ⋄? ( EXPR ⋄ )* EXPR ⋄? "]" evaluates expressions in the same way, but makes them into major cells of an array instead of elements. The result is identical to applying the primitive function Merge (>) to a list of the expression results.
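A minimal Python model of the two notations follows. It assumes arrays are represented by a hypothetical Arr class holding a shape tuple and a flat ravel, with atoms treated as rank-0 arrays as Merge does. The inputs are the already-evaluated expression results, in source order.

```python
from dataclasses import dataclass


class EvaluationError(Exception):
    pass


@dataclass
class Arr:
    shape: tuple
    ravel: list


def as_array(v):
    """An atom is treated as a rank-0 array, as Merge does."""
    return v if isinstance(v, Arr) else Arr((), [v])


def list_notation(values):
    """⟨a, b, c⟩ or a‿b‿c: results become elements of a rank-1 array."""
    return Arr((len(values),), list(values))


def array_notation(values):
    """[a, b, c]: results become major cells, like Merge (>) of the list.

    The grammar requires at least one expression, so values is non-empty.
    """
    cells = [as_array(v) for v in values]
    cell_shape = cells[0].shape
    if any(c.shape != cell_shape for c in cells):
        raise EvaluationError("all cells must have the same shape")
    ravel = [e for c in cells for e in c.ravel]
    return Arr((len(cells),) + cell_shape, ravel)
```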

The rules in the table below cover function and modifier evaluation.

L   Left                     Called   Right                R       Types
𝕨   ( subject | nothing )?   Derv     arg                  𝕩       Function, subject
𝕗   Operand                  _mod1    (none)               (none)  1-Modifier
𝕗   Operand                  _mod2_   ( subject | Func )   𝕘       2-Modifier

In each case the constituent expressions are evaluated in reverse source order: Right, then Called, then Left. Then the expression's result is obtained by calling the Called value on its parameters. A left argument of nothing is not used as a parameter, leaving only a right argument in that case. The type of the Called value must be appropriate to the expression type, as indicated in the "Types" column. For function application, a data type (number, character, or array) is allowed. It is called simply by returning itself. Although the arguments are ignored in this case, they are still evaluated. A block is evaluated by binding the parameter names given in columns L and R to the corresponding values. Then if all parameter levels present have been bound, its body is evaluated to give the result of application.
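The evaluation order for function application can be illustrated in Python, with zero-argument functions (thunks) standing in for the three sub-expressions. The names call, eval_application, and NONE are hypothetical. A non-callable Python value plays the role of BQN data, which returns itself when called.

```python
NONE = object()   # marks "no left argument"


def call(f, x, w=NONE):
    """Call a value as a function; data returns itself, ignoring arguments."""
    if not callable(f):
        return f
    return f(x) if w is NONE else f(w, x)


def eval_application(left, called, right):
    """Evaluate `Left Called Right` for function application.

    left may be None when there is no left argument.
    """
    x = right()                                 # Right first
    f = called()                                # then Called
    w = left() if left is not None else NONE    # then Left
    return call(f, x, w)
```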

Modifiers that are evaluated when they receive operands are called immediate. Other modifiers, including primitives and some kinds of block, simply record the operands and are called deferred. The result of applying a deferred modifier once is called a derived function, and is one kind of compound function.

The rules for trains create another kind of compound function. A compound function is identified by the rule that created it, and the values of its parts.

Left       Center   Right   Result
Operand    Derv     Fork    {(𝕨L𝕩)C(𝕨R𝕩)}
nothing?   Derv     Fork    {C(𝕨R𝕩)}

A train is a function that, when called, calls the right-hand function on its arguments, then the left-hand function (if any) on the same arguments, and calls the center function with these results as arguments. As with applications, all expressions are evaluated in reverse source order before doing anything else. Then a result is formed without calling the center value. Its behavior as a function is described in the rightmost column, using L, C, and R for the results of the expressions in the left, center, and right columns, respectively.
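The fork and atop rules can be sketched in Python with closures. The helper names are hypothetical. Python functions are called as f(x) or f(w, x), and a non-callable value acts as a constant function. In these terms, the BQN train +´÷≠ (the average) corresponds to fork(sum, divide, len).

```python
NONE = object()   # marks "no left argument"


def call(f, x, w=NONE):
    """Call f monadically or dyadically; data acts as a constant function."""
    if not callable(f):
        return f
    return f(x) if w is NONE else f(w, x)


def fork(L, C, R):
    """3-train {(𝕨L𝕩)C(𝕨R𝕩)}; forming it calls nothing."""
    def derived(x, w=NONE):
        right = call(R, x, w)          # right-hand function first
        left = call(L, x, w)           # then left-hand, same arguments
        return call(C, right, left)    # C gets left as 𝕨 and right as 𝕩
    return derived


def atop(C, R):
    """2-train (or · C R): {C(𝕨R𝕩)}."""
    def derived(x, w=NONE):
        return call(C, call(R, x, w))
    return derived
```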