# `latmath.array` — N-Dimensional Arrays

The core array module. Pure Python, zero NumPy dependency. Provides a dense, strided, n-dimensional array backed by `array.array` with full dtype support, broadcasting, vectorized arithmetic, advanced indexing, reductions, and linear algebra.

**Why it exists:** latpy's entire numerical stack rests on NDArray. It solves the problem of how to do array-oriented computing (element-wise operations, broadcasting, reductions, linear algebra) using only the Python standard library. Every other module — stats, random, ML, io, viz — consumes or produces NDArray objects.

### Element-wise by default

All arithmetic operators (`+`, `-`, `*`, `/`, `//`, `**`, comparisons, bitwise) are **element-wise**, following NumPy convention. This means `A * B` multiplies corresponding elements, not matrices. Matrix multiplication is available via `numpy_compat.np.dot` / `numpy_compat.np.matmul`. The rationale is consistency: element-wise semantics make broadcasting natural, simplify dtype promotion, and match the expectation from NumPy that `+` never means "matrix multiply."

### Internal module split — rationale

The module is split into focused sub-modules, each owning one concern:

| Sub-module | Responsibility | Why separate |
|---|---|---|
| `ndarray.py` | `NDArray` class, constructors, indexing dispatchers | The class definition is the API surface; keeping it focused prevents a monolithic 5000-line file. |
| `ufuncs.py` | Element-wise ops (`add`, `mul`, `where`, comparisons, copy helpers) | Ufuncs have a distinct pattern (broadcast → promote → kernel → output); grouping them simplifies optimization and dtype dispatch. |
| `reduce.py` | Reductions (`sum`, `min`, `max` with axis support) | Reductions share axis-normalization logic, contiguity fast-paths, and output-shape computation. Isolating them avoids duplicating this machinery in ndarray.py. |
| `broadcast.py` | `broadcast_to` view | Broadcasting is purely a view-manipulation problem (0-stride expansion). Its logic is independent of element values. |
| `sort.py` | Sort, argsort, nonzero, std, var, cumsum, any, all, prod, clip | These are "further operations" that build on ndarray but are not needed for the core array interface. Separating them keeps ndarray.py lean. |
| `index.py` | Advanced indexing (fancy, orthogonal, setitem) | Indexing logic is complex (normalization, bounds checking, view construction, fill iteration). It merits its own file. |
| `linalg.py` | Linear algebra (sub, ssd, argmin, qr, eig, solve) | Linear algebra is algorithmically heavy and independently testable. Keeping it separate means you can use NDArray without pulling in QR decomposition. |
| `numpy_compat.py` | NumPy-compatible singleton API | A migration convenience (see [note below](#numpy-compatibility)). |
| `kernels/` | Dtype-specific element/reduction loops | Hot loops that differ per dtype; kept separate so they can be swapped for C extensions in future versions without touching higher-level code. |
| `dtypes.py` | DType, promotion | The type system is a separate concern used by every other sub-module. |

---

## Data Types

```python
from latpy.latmath.array.dtypes import DType, I64, F64, B1, parse_dtype, promote, dtype_of_scalar
```

| Constant | Kind | Size | Description |
|---|---|---|---|
| `I64` | `"i"` | 8 B | Signed 64-bit integer |
| `F64` | `"f"` | 8 B | Double-precision float |
| `B1` | `"b"` | 1 B | Boolean (0/1) |

**Functions:**

- `parse_dtype(dtype: str | DType | None) -> DType` — Resolve `"i64"`, `"f64"`, `"b1"`, or a `DType` instance. Raises `DTypeError` on unknown strings.
- `dtype_of_scalar(x) -> DType` — Infer dtype from a Python value (`int` → `I64`, `float` → `F64`, `bool` → `I64`). Note: `bool` is treated as `I64` in v0.0.7 (for consistency with integer arithmetic). Raises `DTypeError` for unsupported types (e.g., `str`, `complex`).
- `promote(a: DType, b: DType) -> DType` — Result type for mixed-dtype operations (`F64 > I64`). `promote(I64, F64) → F64`, `promote(B1, I64) → I64`, `promote(B1, F64) → F64`.

### Edge cases

```python
# Type inference from mixed lists
a = array([1, 2, 3.0])      # dtype=F64 because float promotes int
b = array([True, False, 1]) # dtype=I64 (bool treated as int)
c = array([True, 2.5])      # dtype=F64 (bool → int → float)

# Scalar wrapping
d = array(42)                # NDArray(shape=(1,), dtype=I64)
e = array(3.14)              # NDArray(shape=(1,), dtype=F64)

# Empty arrays
f = array([])                # NDArray(shape=(0,), dtype=I64)
g = zeros((0, 5), dtype=F64) # NDArray(shape=(0, 5), dtype=F64) — 0×5 matrix
h = zeros((5, 0), dtype=I64) # NDArray(shape=(5, 0), dtype=I64) — 5×0 matrix
```

---

## NDArray

```python
from latpy.latmath.array import NDArray, zeros, array
```

### Constructors

| Signature | Description |
|---|---|
| `zeros(shape, dtype=I64, axes=None) -> NDArray` | Zero-filled array. `shape` can be `int` or `tuple[int, ...]`. Raises `ShapeError` if shape has a negative dimension. |
| `array(obj, dtype=None, axes=None) -> NDArray` | From nested list, scalar, or NDArray. If `dtype` is `None`, the dtype is inferred by promoting the types of all scalar elements. |

**Examples with edge cases:**

```python
from latpy.latmath.array import zeros, array

# Basic construction
a = zeros((2, 3), dtype="f64")   # 2×3 float64 array of zeros
b = zeros(4, dtype=I64)           # 1D array of 4 zeros

# Empty arrays — valid shapes with size-0 dimensions
empty_row = zeros((0, 5))         # NDArray(shape=(0, 5), dtype=I64)
empty_col = zeros((5, 0))         # NDArray(shape=(5, 0), dtype=I64)
empty_1d  = array([])             # NDArray(shape=(0,), dtype=I64)

# Scalar wrapping — single Python values become 1-element 1D arrays
scalar_int = array(42)            # NDArray(shape=(1,), dtype=I64) — value: [42]
scalar_flt = array(3.14)          # NDArray(shape=(1,), dtype=F64) — value: [3.14]
scalar_bool = array(True)         # NDArray(shape=(1,), dtype=I64) — value: [1]

# Type inference from mixed lists — promotion follows F64 > I64 > B1
mixed_int_float = array([1, 2, 3.0])    # NDArray(shape=(3,), dtype=F64) — values: [1.0, 2.0, 3.0]
mixed_bool_int = array([True, False, 5]) # NDArray(shape=(3,), dtype=I64) — values: [1, 0, 5]
all_bools = array([True, False, True])   # NDArray(shape=(3,), dtype=I64) — values: [1, 0, 1]

# Ragged nested lists raise ShapeError
# array([[1, 2], [3]])  → ShapeError: ragged nested list detected

# From existing NDArray (copy or dtype change)
original = array([1, 2, 3], dtype=I64)
copy_exact = array(original)            # same data, new buffer
copy_f64 = array(original, dtype="f64") # copy with type cast → [1.0, 2.0, 3.0]
```

### Properties

| Name | Type | Description |
|---|---|---|
| `.shape` | `tuple[int, ...]` | Dimension lengths |
| `.strides` | `tuple[int, ...]` | Strides in elements (not bytes) |
| `.dtype` | `DType` | Element type |
| `.ndim` | `int` | Number of dimensions |
| `.axes` | `tuple[str, ...]` | Axis names (optional, defaults to `("a0", "a1", ...)`) |
| `.T` | `NDArray` | Transposed view (2D only; raises `ShapeError` for ndim≠2) |

```python
a = zeros((3, 4), dtype=F64, axes=("row", "col"))
a.shape    # (3, 4)
a.strides  # (4, 1) — row-major, each row is 4 elements apart
a.dtype    # DType(F64)
a.ndim     # 2
a.axes     # ("row", "col")
a.T.shape  # (4, 3)
```

### Methods

| Signature | Description |
|---|---|
| `.copy() -> NDArray` | Deep copy, materialized as contiguous |
| `.transpose(*perm) -> NDArray` | View with permuted axes; no args → reverse |
| `.reshape(*shape) -> NDArray` | View reshape (requires C-contiguous) |
| `.ravel() -> NDArray` | Flatten to 1D view (C-contiguous only) |
| `.flatten() -> NDArray` | Flatten to 1D copy |
| `.tolist() -> list` | To nested Python lists (B1 elements returned as `bool`) |
| `.astype(dtype) -> NDArray` | Type-cast copy |
| `.broadcast_to(shape) -> NDArray` | Broadcast view (zero-copy, 0-stride expansion) |
| `.mean(axis=None) -> float | NDArray` | Arithmetic mean |
| `.sum(axis=None) -> scalar | NDArray` |
| `.min(axis=None) -> scalar | NDArray` |
| `.max(axis=None) -> scalar | NDArray` |
| `.sort(axis=-1) -> NDArray` | In-place sort (last axis by default) |
| `.argsort() -> NDArray` | Indices that sort (1D only) |
| `.nonzero() -> NDArray` | Indices of non-zero elements |
| `.std(axis=None, ddof=0) -> float | NDArray` | Standard deviation |
| `.var(axis=None) -> float | NDArray` | Variance |
| `.cumsum(axis=None) -> NDArray` | Cumulative sum |
| `.any(axis=None) -> bool | NDArray(B1)` | Any element true |
| `.all(axis=None) -> bool | NDArray(B1)` | All elements true |
| `.prod(axis=None) -> int | float | NDArray` | Product of elements |
| `.clip(lo, hi) -> NDArray` | Clip values to `[lo, hi]` |

### Operators

All operators are **element-wise** — see [rationale above](#element-wise-by-default).

| Operator | Method | Description |
|---|---|---|
| `A + B` | `__add__` / `__radd__` | Element-wise add |
| `A - B` | `__sub__` / `__rsub__` | Element-wise subtract |
| `A * B` | `__mul__` / `__rmul__` | Element-wise multiply |
| `A / B` | `__truediv__` | Float division (always returns F64) |
| `A // B` | `__floordiv__` | Integer division |
| `A ** B` | `__pow__` | Power |
| `-A` | `__neg__` | Negation |
| `A == B` | `__eq__` | Equal (returns B1 array) |
| `A != B` | `__ne__` | Not equal (returns B1 array) |
| `A > B` | `__gt__` | Greater than (returns B1 array) |
| `A >= B` | `__ge__` | Greater or equal (returns B1 array) |
| `A < B` | `__lt__` | Less than (returns B1 array) |
| `A <= B` | `__le__` | Less or equal (returns B1 array) |
| `~A` | `__invert__` | Bitwise not (B1 only; raises `DTypeError` for I64/F64) |
| `A & B` | `__and__` | Bitwise and (B1 only) |
| `A | B` | `__or__` | Bitwise or (B1 only) |
| `A ^ B` | `__xor__` | Bitwise xor (B1 only) |
| `len(A)` | `__len__` | Size of first dimension (raises `IndexError` for 0-D?) |

### Indexing (`A[key]`)

| Key Type | Behavior |
|---|---|
| `int` | Scalar element (reduces a dimension) |
| `slice` | View (shares buffer) |
| `tuple` | Mixed int/slice/newaxis (e.g. `A[0, :, None]`) |
| `None` | Newaxis (inserts dim of size 1) |
| NDArray (B1) | Boolean mask compression: `A[A > 0]` (1D only in v0.0.7) |
| NDArray (I64) | Fancy integer indexing (1D or row selection for ndim≥2) |
| `tuple` of NDArray(I64) | Paired fancy indexing (n elements from n indices per axis) |
| `tuple` of (NDArray(I64), int) | Fancy rows + fixed column |
| `int, slice, ...` | Orthogonal indexing |

**Edge cases:**

```python
a = array([10, 20, 30, 40, 50])

# Out-of-bounds raises IndexError (via ShapeError)
# a[10]     → ShapeError: index out of bounds
# a[-100]   → ShapeError: index out of bounds

# Negative indices are supported
a[-1]      # 50  — last element
a[-3]      # 30  — third from end

# Slice with step, negative step
a[::2]     # NDArray([10, 30, 50]) — every other element
a[::-1]    # NDArray([50, 40, 30, 20, 10]) — reversed view

# Tuple of wrong length
# a[0, 1]  → ShapeError: Index arity mismatch: got 2 indices for ndim=1

# 2D: tuple length must equal ndim (after newaxis expansion)
b = zeros((3, 4))
b[0]       # OK — implicit trailing slice: equivalent to b[0, :]
b[0, :]    # OK — first row, all columns
# b[0, 1, 2]  → ShapeError: Index arity mismatch: got 3 indices for ndim=2

# Newaxis inserts a dimension
b[:, None].shape  # (3, 1, 4)

# Boolean mask (1D only)
c = array([1, -1, 3, -2, 5])
c[c > 0]   # NDArray([1, 3, 5])

# Fancy indexing out of bounds
# a[array([0, 10])]  → ShapeError: fancy index out of bounds: index 10 for size 5

# B1 mask with 2D array (not supported — raises ShapeError)
# b[b > 0]  → ShapeError: A[mask]: only supported for 1D arrays in v0.0.5
```

---

## Broadcasting

Broadcasting follows NumPy semantics: two shapes are compatible when, for each trailing dimension, the sizes are equal or one is 1. The result shape is the element-wise maximum. Broadcasting is implemented as a **zero-copy view** via 0-stride axis expansion.

### Edge cases

```python
# Scalar broadcasting — scalar is treated as shape (1,), broadcasts to any shape
a = array([1, 2, 3])
a + 10           # NDArray([11, 12, 13])

# 1D × 2D broadcast
row = array([1, 2, 3])          # shape (3,)
mat = zeros((4, 3))             # shape (4, 3)
row + mat                       # broadcasts: (3,) → (1, 3) → (4, 3)

# Incompatible shapes raise ShapeError
# array([1, 2, 3]) + array([1, 2])  → ShapeError: broadcast_shape: incompatible dims 3 vs 2
# zeros((3, 4)) + zeros((4, 3))      → ShapeError (trailing dims 4 vs 3)

# 1D → 2D broadcast with explicit broadcast_to
v = array([1, 2, 3])
v.broadcast_to((4, 3)).shape   # (4, 3) — 0-stride on axis 0
v.broadcast_to((4, 3)).strides # (0, 1) — axis 0 is broadcast (stride 0)

# Broadcasting to smaller ndim raises ShapeError
v.broadcast_to((3,))            # OK — same shape, returns self
# v.broadcast_to((3, 4, 5))    → OK — pad dims 0, stride 0
# v.broadcast_to((2,))          → ShapeError — incompatible dim 3 vs 2
```

---

## Reductions

Reductions (`sum`, `min`, `max`) support `axis=None` (full reduction to scalar), `axis=int` (reduce a single dimension), and negative axes.

### Edge cases

```python
from latpy.latmath.array import sum, min, max

a = array([[1, 2, 3], [4, 5, 6]], dtype=I64)

# axis=None — full reduction to scalar
sum(a)           # 21
min(a)           # 1
max(a)           # 6

# axis=0 — reduce rows (collapse first dimension)
sum(a, axis=0)   # NDArray([5, 7, 9])  — column sums
min(a, axis=0)   # NDArray([1, 2, 3])  — column minimums

# axis=1 — reduce columns (collapse second dimension)
sum(a, axis=1)   # NDArray([6, 15])    — row sums

# Negative axis
sum(a, axis=-1)  # NDArray([6, 15])    — same as axis=1
sum(a, axis=-2)  # NDArray([5, 7, 9])  — same as axis=0

# Empty array — sum returns 0, min/max raise DomainError
empty = array([], dtype=I64)
sum(empty)       # 0
# min(empty)     → DomainError: min: empty array
# max(empty)     → DomainError: max: empty array

# Empty reduction along a non-empty axis
empty_row = zeros((0, 5), dtype=I64)
sum(empty_row, axis=0)   # NDArray([0, 0, 0, 0, 0])  — shape (5,)
sum(empty_row, axis=1)   # NDArray([], dtype=I64)    — shape (0,)
# min(empty_row, axis=1)  → DomainError: min: empty reduction axis

# B1 reductions
b = array([True, False, True], dtype=B1)
sum(b)           # 2  — counts True values
min(b)           # False
max(b)           # True
```

---

## Module-level Functions

```python
from latpy.latmath.array import add, mul, sum, min, max, where, eq, ne, lt, le, gt, ge
from latpy.latmath.array import sub, ssd, argmin, qr, eig, solve
```

| Signature | Description |
|---|---|
| `add(A, B) -> NDArray` | Element-wise addition |
| `mul(A, B) -> NDArray` | Element-wise multiplication |
| `sum(A, axis=None) -> scalar | NDArray` |
| `min(A, axis=None) -> scalar | NDArray` |
| `max(A, axis=None) -> scalar | NDArray` |
| `where(M, A, B) -> NDArray` | Select from A (mask True) or B (mask False) |
| `eq(A, B) -> NDArray` | Element-wise `==` (returns B1) |
| `ne(A, B) -> NDArray` | Element-wise `!=` (returns B1) |
| `lt(A, B) -> NDArray` | Element-wise `<` (returns B1) |
| `le(A, B) -> NDArray` | Element-wise `<=` (returns B1) |
| `gt(A, B) -> NDArray` | Element-wise `>` (returns B1) |
| `ge(A, B) -> NDArray` | Element-wise `>=` (returns B1) |
| `sub(A, B) -> NDArray` | Element-wise subtraction |
| `ssd(A, axis=None) -> int | float | NDArray` | Sum of squared deviations |
| `argmin(A) -> NDArray` | Index of minimum (1D only) |
| `qr(A) -> (Q, R)` | QR decomposition |
| `eig(A) -> (λ, v)` | Dominant eigenvalue/vector (power iteration) |
| `solve(A, b) -> NDArray` | Linear solve (Gaussian elimination) |

### Ufunc edge cases: type promotion

```python
from latpy.latmath.array import add, mul

# Mixed I64/F64 → promoted to F64
i = array([1, 2, 3], dtype=I64)
f = array([1.5, 2.5, 3.5], dtype=F64)
add(i, f)          # NDArray([2.5, 4.5, 6.5], dtype=F64)

# Mixed B1/I64 → promoted to I64
b = array([True, False, True], dtype=B1)
i = array([10, 20, 30], dtype=I64)
add(b, i)          # NDArray([11, 20, 31], dtype=I64)  — True→1

# Boolean arithmetic — B1 operands produce I64 results via promotion
b1 = array([True, False], dtype=B1)
b2 = array([False, True], dtype=B1)
add(b1, b2)        # NDArray([1, 1], dtype=I64)  — True+False=1, False+True=1

# Scalar arithmetic with float
add(i, 2.5)        # NDArray([3.5, 4.5, 5.5], dtype=F64)
```

### Linear algebra edge cases

```python
from latpy.latmath.array.linalg import qr, eig, solve

# QR — requires ndim=2, rows ≥ cols
A_fine = array([[3.0, 2.0], [1.0, 4.0]], dtype=F64)
Q, R = qr(A_fine)   # A ≈ Q @ R

# Non-square QR: 4×2 matrix
A_rect = array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]], dtype=F64)
Q, R = qr(A_rect)    # Q is 4×2, R is 2×2

# QR with more columns than rows — raises ShapeError
# qr(array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=F64))
#   → ShapeError: qr: number of rows must be >= number of columns

# QR with linearly dependent columns — raises DomainError
# qr(array([[1.0, 2.0], [2.0, 4.0]], dtype=F64))
#   → DomainError: qr: column is linearly dependent

# solve — requires square A, 1D b
x = solve(A_fine, array([5.0, 6.0], dtype=F64))  # NDArray([1.6, 0.7]) (approximately)

# Singular matrix solve — raises DomainError
# solve(array([[1.0, 2.0], [2.0, 4.0]], dtype=F64), array([1.0, 2.0], dtype=F64))
#   → DomainError: solve: singular matrix
# (The second row is 2× the first; determinant is 0.)

# Non-square solve — raises ShapeError
# solve(array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=F64), array([1.0, 2.0], dtype=F64))
#   → ShapeError: solve: A must be square, got 3x2

# eig — requires square 2D
val, vec = eig(A_fine)  # dominant eigenvalue and corresponding eigenvector

# eig on non-square — raises ShapeError
# eig(array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=F64))
#   → ShapeError: eig requires square matrix, got 2x3

# eig convergence — power iteration has max_iter=1000, tol=1e-12
# For well-conditioned matrices, converges reliably. For pathological
# matrices (equal-magnitude eigenvalues), may converge slowly or
# to a combined mode. This is a known limitation of power iteration.
```

---

## Index Utilities

```python
from latpy.latmath.array.index import normalize_key, getitem, setitem
```

| Signature | Description |
|---|---|
| `normalize_key(key, ndim) -> tuple` | Canonicalize indices: int → slice, insert implicit trailing slices, expand `None` to newaxis |
| `getitem(A, key) -> scalar | NDArray` | Low-level get, bypassing NDArray.__getitem__ dispatch |
| `setitem(A, key, value) -> None` | Low-level set (scalar fill or array fill via `_fill_from_seq`) |

`normalize_key` handles:
- Non-tuple keys wrapping: `normalize_key(0, 2)` → `(0, slice(None))`
- Implicit trailing slices: `normalize_key((0,), 2)` → `(0, slice(None, None, None))`
- Newaxis positions: `normalize_key((None, 0), 2)` → `(0, slice(None, None, None))` with newaxis tracking
- Raises `ShapeError` on arity mismatch or unsupported index types

---

## NumPy Compatibility

```python
from latpy.latmath.array.numpy_compat import np

# Use like real numpy:
a = np.array([1, 2, 3])
b = np.zeros((2, 3), dtype=np.float64)
c = np.eye(4)
d = np.concatenate([a, b])
e = np.dot(a, b)
```

The `np` singleton (`NumpyCompatNamespace`) provides a familiar NumPy-like API backed entirely by latpy arrays. It maps:
- `np.int64` → `"i64"`, `np.float64` → `"f64"`, `np.bool_` → `"b1"`
- `np.array`, `np.zeros`, `np.ones`, `np.full`, `np.arange`, `np.linspace`, `np.eye`
- `np.concatenate`, `np.stack`, `np.expand_dims`, `np.reshape`, `np.copy`, `np.pad`, `np.triu`, `np.tril`
- Unary: `np.neg`, `np.abs`, `np.sign`, `np.square`, `np.sqrt`, `np.exp`, `np.log`, `np.log10`, `np.sin`, `np.cos`, `np.tanh`, `np.degrees`, `np.arccos`, `np.round`, `np.clip`
- Binary: `np.add`, `np.multiply`, `np.subtract`, `np.dot`, `np.matmul`, `np.where`, `np.maximum`, `np.minimum`
- Reductions: `np.sum`, `np.mean`, `np.std`, `np.var`, `np.min`, `np.max`, `np.any`, `np.all`, `np.prod`, `np.cumsum`
- Sort/Search: `np.argsort`, `np.argpartition`, `np.sort`, `np.nonzero`, `np.asarray`
- `np.linalg`: `norm`, `solve`, `inv`, `det`, `eigvals`, `svd`, `qr`, `eig`
- `np.random`: `seed`, `randn`, `randint`, `uniform`, `choice`, `rand`, `shuffle`

> **Note:** The `numpy_compat` module exists for **migration convenience** — it lets you take existing NumPy code and run it against latpy with minimal changes. It is **not recommended for new code**. Using the direct `latmath.array` API (NDArray, `zeros`, `array`, `add`, `sum`, etc.) produces clearer error messages, avoids the indirection of string-based dtype handling, and makes the dependency on latpy (not NumPy) explicit. The compat layer is maintained to ease gradual porting, not as a primary API.

```python
class NumpyCompatNamespace:
    # Dtypes
    int8, int16, int32, int64, float16, float32, float64, ndarray, newaxis

    # Creation
    array, zeros, ones, full, arange, linspace, eye

    # Manipulation
    concatenate, stack, expand_dims, reshape, copy, pad, triu, tril

    # Unary
    neg, abs, sign, square, sqrt, exp, log, log10, sin, cos, tanh, degrees, arccos, round, clip

    # Binary
    add, multiply, subtract, dot, matmul, where, maximum, minimum

    # Reductions
    sum, mean, std, var, min, max, any, all, prod, cumsum

    # Sorting / Searching
    argsort, argpartition, sort, nonzero, asarray

    # Sub-namespaces
    linalg:  norm, solve, inv, det, eigvals, svd, qr, eig
    random:  seed, randn, randint, uniform, choice, rand, shuffle
```

---

## Kernels

Internal element-wise and reduction routines — one per dtype. These are the hot loops that would be replaced by C extensions in a performance-optimized build:

```
kernels/elem_i64.py      add, mul, cmp, where (I64)
kernels/elem_f64.py      add, mul, cmp, where (F64)
kernels/reduce_i64.py    sum, min, max (I64)
kernels/reduce_f64.py    sum, min, max (F64)
kernels/propagate_b1.py  boolean propagation
kernels/propagate_sparse.py  sparse interval utilities
```

The kernel split mirrors the dtype dispatch in `ufuncs.py` and `reduce.py`: each operation checks the dtype at call time and dispatches to the appropriate kernel module. This design makes it straightforward to:
1. Add a new dtype (write `elem_<type>.py` and `reduce_<type>.py`, register in dispatch)
2. Replace Python loops with Cython/C extensions by swapping the kernel package
3. Test each dtype path independently