# Arithmetic function benchmarks

Implementation notes

The arithmetic functions `+-×⌊⌈`, comparisons, and floating-point division are standard SIMD functionality. With `+-×` an overflow check is needed, and if overflow happens a result in a larger type needs to be created. Dyadic `|` has some optimization for integers but is only really fast when `𝕨` is an atom.
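The overflow check for `+` can be pictured with a scalar C sketch on i8 lists. This is only an illustration under assumptions: the names `add_i8` and `add_i8_to_i16` are made up here and are not CBQN's, and real code would use vector instructions rather than a plain loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: try to add two i8 lists into an i8 result.
// Returns false as soon as some sum doesn't fit in i8.
bool add_i8(const int8_t *w, const int8_t *x, int8_t *r, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int s = w[i] + x[i];  // operands promote to int, so this can't overflow
        if (s < INT8_MIN || s > INT8_MAX) return false;
        r[i] = (int8_t)s;
    }
    return true;
}

// Hypothetical fallback: the sum of two i8 values always fits in i16.
void add_i8_to_i16(const int8_t *w, const int8_t *x, int16_t *r, size_t n) {
    for (size_t i = 0; i < n; i++) r[i] = (int16_t)(w[i] + x[i]);
}
```

A caller would try `add_i8` first and redo the work with `add_i8_to_i16` only if it reports overflow, so the common case pays just for the check.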

Most other primitives, including `÷√⋆` and `⋆⁼`, require conversion to floats, so will ideally run at the same speed for all types. Libraries to compute others using SIMD exist but CBQN doesn't use anything like this yet.

## Monadic

Mostly the same as the dyadic case. There's a SIMD `√` instruction. Primitives `-` and `|` can overflow for an integer argument containing the smallest value of that type, or boolean 1 for `-`. This slows them down for i16 and smaller cases; i32 isn't likely to include the minimum value.
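A rough C sketch of the minimum-value problem for monadic `-` on i8. The function name `neg_i8` is hypothetical, and the flag-accumulating loop is just one plausible shape for such a check, not a description of CBQN's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: negate an i8 list.
// Returns false if the argument contains INT8_MIN, whose negation (128)
// doesn't fit in i8; the caller would then need a wider result type.
bool neg_i8(const int8_t *x, int8_t *r, size_t n) {
    bool ok = true;
    for (size_t i = 0; i < n; i++) {
        ok &= (x[i] != INT8_MIN);  // accumulate a flag instead of exiting early
        // Leave INT8_MIN unchanged; the result is discarded when ok is false.
        r[i] = (int8_t)(x[i] == INT8_MIN ? INT8_MIN : -x[i]);
    }
    return ok;
}
```

For i8 the minimum value is one of only 256 possibilities, so the wider fallback is a real concern; for i32 it is much less likely to show up in practice.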

## Table

Arithmetic Table works like a bunch of scalar-list operations, with the list repeated. For long right arguments this is straightforward; for shorter ones the constant overhead of setting up scalar-list arithmetic as well as the efficiency loss for uneven lengths become expensive. So for short `𝕩` Table switches to list-list arithmetic, expanding `𝕨` with a constant Replicate `k/𝕨`, to keep the overhead from going too high.
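The two strategies can be compared in a small C sketch for an addition Table on floats. Everything here is an assumption made for illustration: the function names are invented, `k` is taken to be the length of `𝕩`, and the whole of `𝕨` is expanded at once, where a real implementation might work in chunks.

```c
#include <stddef.h>
#include <stdlib.h>

// Scalar-list kernel: r[j] = w + x[j].
void add_scalar_list(double w, const double *x, double *r, size_t n) {
    for (size_t j = 0; j < n; j++) r[j] = w + x[j];
}

// List-list kernel: r[j] = w[j] + x[j].
void add_list_list(const double *w, const double *x, double *r, size_t n) {
    for (size_t j = 0; j < n; j++) r[j] = w[j] + x[j];
}

// Strategy for long x: one scalar-list call per element of w.
void table_scalar_list(const double *w, size_t wn,
                       const double *x, size_t xn, double *r) {
    for (size_t i = 0; i < wn; i++) add_scalar_list(w[i], x, r + i * xn, xn);
}

// Strategy for short x: expand w by repeating each element xn times
// (a constant Replicate), repeat x cyclically, then make one list-list call.
// Returns 0 on success, -1 if allocation fails.
int table_list_list(const double *w, size_t wn,
                    const double *x, size_t xn, double *r) {
    size_t n = wn * xn;
    if (n == 0) return 0;
    double *we = malloc(n * sizeof *we);
    double *xe = malloc(n * sizeof *xe);
    if (!we || !xe) { free(we); free(xe); return -1; }
    for (size_t i = 0; i < wn; i++) {
        for (size_t j = 0; j < xn; j++) {
            we[i * xn + j] = w[i];
            xe[i * xn + j] = x[j];
        }
    }
    add_list_list(we, xe, r, n);
    free(we);
    free(xe);
    return 0;
}
```

Both functions fill `r` with the same values; the second trades extra memory for a single long kernel call instead of many short ones.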

## Leading-axis extension

Leading-axis arithmetic is a series of scalar-list operations like Table, the difference being that it has a new list at each step. So the performance is pretty similar; not having to reshape `𝕩` speeds it up for smaller widths but the extra memory traffic slows it down for larger ones.
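As a sketch of what "a new list at each step" means, here is leading-axis addition in C, with `𝕨` a list and `𝕩` a row-major array having one row per element of `𝕨`. The name `add_leading_axis` and the use of floats are assumptions for illustration only.

```c
#include <stddef.h>

// Hypothetical helper: w has `rows` elements and x is rows-by-width, row-major.
// Each step pairs one scalar from w with a fresh row of x, unlike Table,
// where the same list is reused at every step.
void add_leading_axis(const double *w, const double *x,
                      size_t rows, size_t width, double *r) {
    for (size_t i = 0; i < rows; i++) {
        double wi = w[i];
        const double *xi = x + i * width;  // new list each iteration
        double *ri = r + i * width;
        for (size_t j = 0; j < width; j++) ri[j] = wi + xi[j];
    }
}
```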

## Trailing-axis extension, with Rank

This case does list-list operations but with one fixed list. For small widths it just has to repeat the lower-rank argument, which is less intrusive than constant Replicate. However, list-list arithmetic is slower than scalar-list.
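A minimal C sketch of repeating the lower-rank argument for small widths. The name `add_trailing_axis` and the target buffer length of 64 elements are invented for this example; the actual thresholds and structure in CBQN may be quite different.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: w has `width` elements and x is rows-by-width, row-major.
// For small widths, w is copied several times into a buffer so that each
// list-list pass covers several rows at once.
// Returns 0 on success, -1 if allocation fails.
int add_trailing_axis(const double *w, size_t width,
                      const double *x, size_t rows, double *r) {
    if (width == 0 || rows == 0) return 0;
    size_t reps = width < 64 ? 64 / width : 1;  // 64 is an arbitrary choice
    if (reps > rows) reps = rows;
    double *wr = malloc(reps * width * sizeof *wr);
    if (!wr) return -1;
    for (size_t k = 0; k < reps; k++) {
        memcpy(wr + k * width, w, width * sizeof *w);
    }
    size_t total = rows * width;
    size_t step = reps * width;  // a multiple of width, so rows stay aligned
    for (size_t off = 0; off < total; off += step) {
        size_t n = total - off < step ? total - off : step;
        for (size_t j = 0; j < n; j++) r[off + j] = wr[j] + x[off + j];
    }
    free(wr);
    return 0;
}
```

Only `𝕨` is copied here, and only `reps` times regardless of the number of rows, which is why the setup is lighter than expanding an argument to the full result size.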