# `torch`: Autograd for NDArray

Lightweight automatic differentiation built on latpy's NDArray. Reverse-mode autograd with a PyTorch-like API. Pure stdlib, zero dependencies.

**Why it exists:** Gradient computation is essential for optimization and ML. This module tracks operations on `Tensor` objects, builds a computation graph, and computes gradients via reverse-mode automatic differentiation (backpropagation). The API mirrors PyTorch so users familiar with it can start immediately.

---

## Tensor

```python
from latpy.torch import Tensor, tensor, zeros_like
```

| Signature | Description |
|-----------|-------------|
| `tensor(data, requires_grad=False) -> Tensor` | Create a new tensor from lists, scalars, or NDArray |
| `Tensor(data, requires_grad, ...)` | Low-level constructor |
| `zeros_like(t, requires_grad=False) -> Tensor` | Zero-filled tensor with same shape |

```python
from latpy.torch import tensor

x = tensor([1.0, 2.0, 3.0], requires_grad=True)
print(x.tolist())       # [1.0, 2.0, 3.0]
print(x.shape)          # (3,)
```

**Properties:**

- `.data`: underlying `NDArray`
- `.grad`: accumulated gradient `NDArray` (or `None` before `backward()`)
- `.requires_grad`: whether this tensor tracks gradients
- `.shape`, `.ndim`: shape convenience properties

---

## Differentiable Operations

```python
from latpy.torch import (
    add, mul, sub, div, neg, pow,
    sin, cos, exp, log,
    sum, mean, matmul,
)
```

All operations support broadcasting and return new `Tensor` instances that track the computation graph.
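
To make "track the computation graph" concrete, here is a minimal scalar sketch of reverse-mode differentiation. It is an illustration only, not latpy's implementation: the `Value` class, its `_parents` and `_local_grads` fields, and the topological-sort traversal are all assumptions chosen for brevity, and it handles only scalar `+` and `*` without broadcasting.

```python
class Value:
    """Hypothetical scalar graph node (a stand-in, not latpy's Tensor)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        # d(a + b)/da = 1, d(a + b)/db = 1
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        # d(a * b)/da = b, d(a * b)/db = a
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Build a topological order so that every node is processed
        # only after all nodes that consume it.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node._parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(self)/d(self) = 1
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                # Chain rule; += accumulates when a node is used more than once.
                parent.grad += local * node.grad


x = Value(3.0)
z = x * x + x      # z = x² + x
z.backward()
print(x.grad)      # 7.0, i.e. 2x + 1 at x = 3
```

Each operation records its inputs and the local partial derivatives at forward time; `backward()` then walks the recorded graph in reverse and applies the chain rule. A tensor-based version would additionally need to reduce gradients over broadcast dimensions.
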

The table below lists, for each operation, the forward result and the local gradients applied during `backward()`.

| Operation | Forward | `backward()` gradient |
|-----------|---------|----------------------|
| `add(a, b)` | a + b | ∂/∂a = 1, ∂/∂b = 1 |
| `mul(a, b)` | a \* b | ∂/∂a = b, ∂/∂b = a |
| `sub(a, b)` | a − b | ∂/∂a = 1, ∂/∂b = −1 |
| `div(a, b)` | a / b | ∂/∂a = 1/b, ∂/∂b = −a/b² |
| `neg(a)` | −a | ∂/∂a = −1 |
| `pow(a, b)` | aᵇ | ∂/∂a = b·aᵇ⁻¹, ∂/∂b = aᵇ·ln(a) |
| `sin(a)` | sin(a) | ∂/∂a = cos(a) |
| `cos(a)` | cos(a) | ∂/∂a = −sin(a) |
| `exp(a)` | eᵃ | ∂/∂a = eᵃ |
| `log(a)` | ln(a) | ∂/∂a = 1/a |
| `sum(a)` | Σa | ∂/∂a = 1 |
| `mean(a)` | Σa / n | ∂/∂a = 1/n |
| `matmul(a, b)` | a @ b | ∂/∂a = grad @ bᵀ, ∂/∂b = aᵀ @ grad |

---

## Automatic Differentiation

```python
from latpy.torch import tensor, mul, add

x = tensor([3.0], requires_grad=True)
y = mul(x, x)          # y = x²
z = add(y, x)          # z = x² + x

z.backward()           # dz/dx = 2x + 1 = 7
print(x.grad.tolist()) # [7.0]
```

The chain rule composes naturally:

```python
from latpy.torch import tensor, pow, sin

x = tensor([2.0], requires_grad=True)
y = pow(x, 2.0)        # y = x²
z = sin(y)             # z = sin(x²)

z.backward()           # dz/dx = cos(x²) · 2x
# ≈ cos(4.0) * 4.0 = -2.614...
```

---

## Optimizer

```python
from latpy.torch import SGD
```

| Signature | Description |
|-----------|-------------|
| `SGD(params, lr=0.01)` | Stochastic gradient descent |
| `.step()` | Update all parameters: p ← p − lr · p.grad |
| `.zero_grad()` | Clear gradients from all parameters |

```python
from latpy.torch import tensor, pow, SGD

x = tensor([5.0], requires_grad=True)
opt = SGD([x], lr=0.1)

# Training loop
for _ in range(50):
    loss = pow(x, 2.0)   # minimize x²
    loss.backward()      # populate x.grad
    opt.step()           # x ← x − lr · x.grad
    opt.zero_grad()      # clear x.grad before the next iteration

print(x.tolist()[0])     # ≈ 0.0
```

---

## Design Notes

- The computation graph is **dynamic**: it is rebuilt on every forward pass.
- Gradients **accumulate** across multiple `backward()` calls, so call `zero_grad()` between iterations.
- `SGD` is a minimal optimizer. Extend the pattern for momentum, Adam, etc.
- All tensors are float64 (`F64`); there is no mixed-precision support yet.
- The `ndarray` `.T` property is used for transposition in the `matmul` backward pass.
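
As one way to read "extend the pattern for momentum", the sketch below adds a velocity buffer to the same `step()` / `zero_grad()` interface. Everything in it is hypothetical: `Param` is a plain-Python stand-in that stores `.data` and `.grad` as lists of floats, and `MomentumSGD` with its `momentum` argument is not part of latpy. The gradient of x² is written by hand so the example runs without the library.

```python
class Param:
    """Hypothetical stand-in exposing .data and .grad as lists of floats."""

    def __init__(self, data):
        self.data = list(data)
        self.grad = None


class MomentumSGD:
    """Sketch of SGD with classical momentum (not a latpy class)."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum
        # One velocity entry per parameter element, initialised to zero.
        self.velocity = [[0.0] * len(p.data) for p in self.params]

    def step(self):
        for p, v in zip(self.params, self.velocity):
            if p.grad is None:
                continue  # no gradient computed for this parameter
            for i, g in enumerate(p.grad):
                v[i] = self.momentum * v[i] + g   # v ← momentum · v + grad
                p.data[i] -= self.lr * v[i]       # p ← p − lr · v

    def zero_grad(self):
        for p in self.params:
            p.grad = None


x = Param([5.0])
opt = MomentumSGD([x], lr=0.1, momentum=0.5)

for _ in range(100):
    x.grad = [2.0 * xi for xi in x.data]  # hand-written gradient of x²
    opt.step()
    opt.zero_grad()

print(x.data[0])  # very close to 0
```

With `momentum=0.0` the update reduces to the plain rule p ← p − lr · p.grad described in the Optimizer section.
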