# `torch` — Autograd for NDArray

Lightweight automatic differentiation built on latpy's NDArray. Reverse-mode autograd with a PyTorch-like API. Pure stdlib, zero dependencies.

**Why it exists:** Gradient computation is essential for optimization and ML. This module tracks operations on `Tensor` objects, builds a computation graph, and computes gradients via reverse-mode automatic differentiation (backpropagation). The API mirrors PyTorch so users familiar with it can start immediately.

---

## Tensor

```python
from latpy.torch import Tensor, tensor, zeros_like
```

| Signature | Description |
|-----------|-------------|
| `tensor(data, requires_grad=False) -> Tensor` | Create a new tensor from lists, scalars, or NDArray |
| `Tensor(data, requires_grad, ...)` | Low-level constructor |
| `zeros_like(t, requires_grad=False) -> Tensor` | Zero-filled tensor with same shape |

```python
from latpy.torch import tensor
x = tensor([1.0, 2.0, 3.0], requires_grad=True)
print(x.tolist())  # [1.0, 2.0, 3.0]
print(x.shape)     # (3,)
```

**Properties:**
- `.data` — underlying `NDArray`
- `.grad` — accumulated gradient `NDArray` (or `None` before `backward()`)
- `.requires_grad` — whether this tensor tracks gradients
- `.shape`, `.ndim` — shape convenience properties

---

## Differentiable Operations

```python
from latpy.torch import (
    add, mul, sub, div, neg, pow,
    sin, cos, exp, log,
    sum, mean, matmul,
)
```

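All operations support broadcasting and return new `Tensor` instances that track the computation graph. Importing `pow` and `sum` by name shadows the Python built-ins of the same name in the importing module. The gradient column of the table below lists local derivatives; `backward()` multiplies each one by the gradient arriving from the output, which the `matmul` row writes as `grad`.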
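Broadcasting has a consequence for the backward pass: when an input is stretched to match a larger shape, its gradient has to be summed back down to the input's own shape. The sketch below works this out by hand with plain Python lists for `mul(a, b)` where `b` has shape `(1,)`. It does not import latpy, and whether latpy reduces broadcast gradients in exactly this way internally is an assumption.

```python
# Plain-list illustration, not latpy code.
a = [1.0, 2.0, 3.0]   # shape (3,)
b = [10.0]            # shape (1,), broadcast against shape (3,)

# Forward: out[i] = a[i] * b[0]
out = [ai * b[0] for ai in a]

# Upstream gradient of sum(out) with respect to out is all ones.
upstream = [1.0, 1.0, 1.0]

# d/da = b, applied element-wise.
grad_a = [g * b[0] for g in upstream]

# d/db = a, but b was reused for every element, so contributions are summed.
grad_b = [sum(g * ai for g, ai in zip(upstream, a))]

print(grad_a)  # [10.0, 10.0, 10.0]
print(grad_b)  # [6.0]
```

The gradient for `b` keeps the shape `(1,)` of `b` itself, which is what an optimizer needs in order to update it.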

| Operation | Forward | `backward()` gradient |
|-----------|---------|----------------------|
| `add(a, b)` | a + b | ∂/∂a = 1, ∂/∂b = 1 |
| `mul(a, b)` | a \* b | ∂/∂a = b, ∂/∂b = a |
| `sub(a, b)` | a − b | ∂/∂a = 1, ∂/∂b = −1 |
| `div(a, b)` | a / b | ∂/∂a = 1/b, ∂/∂b = −a/b² |
| `neg(a)` | −a | ∂/∂a = −1 |
| `pow(a, b)` | aᵇ | ∂/∂a = b·aᵇ⁻¹, ∂/∂b = aᵇ·ln(a) |
| `sin(a)` | sin(a) | ∂/∂a = cos(a) |
| `cos(a)` | cos(a) | ∂/∂a = −sin(a) |
| `exp(a)` | eᵃ | ∂/∂a = eᵃ |
| `log(a)` | ln(a) | ∂/∂a = 1/a |
| `sum(a)` | Σa | ∂/∂a = 1 |
| `mean(a)` | Σa / n | ∂/∂a = 1/n |
| `matmul(a, b)` | a @ b | ∂/∂a = grad @ bᵀ, ∂/∂b = aᵀ @ grad |
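The derivative rules in the table can be sanity-checked numerically. The following is a minimal sketch using only the standard library: it compares a few of the analytic partials against central-difference estimates on plain floats. The helper `numeric_partial` and the `rules` dictionary are hypothetical names introduced for this illustration and are not part of latpy.

```python
import math

def numeric_partial(f, args, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. args[i]."""
    hi = list(args)
    lo = list(args)
    hi[i] += h
    lo[i] -= h
    return (f(*hi) - f(*lo)) / (2 * h)

a, b = 1.5, 2.5

# name -> (forward function, analytic partials (d/da, d/db) taken from the table)
rules = {
    "mul": (lambda a, b: a * b, (b, a)),
    "div": (lambda a, b: a / b, (1 / b, -a / b ** 2)),
    "pow": (lambda a, b: a ** b, (b * a ** (b - 1), a ** b * math.log(a))),
}

max_error = 0.0
for name, (f, partials) in rules.items():
    for i, expected in enumerate(partials):
        estimate = numeric_partial(f, (a, b), i)
        max_error = max(max_error, abs(estimate - expected))
        print(f"{name}: d/d{'ab'[i]} analytic={expected:.6f} numeric={estimate:.6f}")

print(max_error < 1e-6)
```

The same pattern is a common way to test a new differentiable operation before trusting its backward rule.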

---

## Automatic Differentiation

```python
from latpy.torch import tensor, mul, add

x = tensor([3.0], requires_grad=True)
y = mul(x, x)            # y = x²
z = add(y, x)            # z = x² + x
z.backward()             # dz/dx = 2x + 1 = 7
print(x.grad.tolist())   # [7.0]
```

Chain rule composes naturally:

```python
from latpy.torch import tensor, pow, sin

x = tensor([2.0], requires_grad=True)
y = pow(x, 2.0)          # y = x²
z = sin(y)               # z = sin(x²)
z.backward()             # dz/dx = cos(x²) · 2x
print(x.grad.tolist())   # ≈ [-2.6146], i.e. cos(4.0) * 4.0
```
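To make the mechanics concrete, here is a minimal scalar sketch of reverse-mode differentiation in pure Python. It is not latpy's implementation: the `Value` class and the local `mul`, `add`, and `sin` helpers are hypothetical stand-ins that only borrow the names used above. Each node stores its parents and the local derivative with respect to each parent; `backward()` walks the graph in reverse topological order and applies the chain rule.

```python
import math

class Value:
    """Hypothetical scalar graph node (illustration only, not latpy's Tensor)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # input nodes
        self.local_grads = local_grads  # d(self)/d(parent), one per parent

    def backward(self):
        # Build a topological order: every node appears after its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk outputs first, pushing gradient back to inputs.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad  # chain rule, accumulated

def mul(a, b):
    return Value(a.data * b.data, (a, b), (b.data, a.data))

def add(a, b):
    return Value(a.data + b.data, (a, b), (1.0, 1.0))

def sin(a):
    return Value(math.sin(a.data), (a,), (math.cos(a.data),))

x = Value(3.0)
z = add(mul(x, x), x)   # z = x² + x
z.backward()
print(x.grad)           # 7.0

w = Value(2.0)
s = sin(mul(w, w))      # s = sin(w²)
s.backward()
print(w.grad)           # equals 4 * cos(4.0)
```

Because `x` feeds into the graph through several edges, its gradient is the sum of the contributions along each path, which is why the update uses `+=`.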

---

## Optimizer

```python
from latpy.torch import SGD
```

| Signature | Description |
|-----------|-------------|
| `SGD(params, lr=0.01)` | Stochastic gradient descent |
| `.step()` | Update all parameters: p ← p − lr · p.grad |
| `.zero_grad()` | Clear gradients from all parameters |

```python
from latpy.torch import tensor, pow, SGD

x = tensor([5.0], requires_grad=True)
opt = SGD([x], lr=0.1)

# Training loop
for _ in range(50):
    loss = pow(x, 2.0)   # minimize x²
    loss.backward()
    opt.step()
    opt.zero_grad()

print(x.tolist()[0])  # ≈ 0.0
```
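The loop above can be traced by hand with plain floats, which shows why the result lands near zero. This sketch assumes the update is exactly p ← p − lr · p.grad as listed in the table, and it does not import latpy. For the loss x², the gradient is 2x, so each step multiplies x by 1 − 2·lr = 0.8.

```python
x = 5.0
lr = 0.1

for _ in range(50):
    grad = 2.0 * x       # d(x²)/dx
    x = x - lr * grad    # p <- p - lr * p.grad

closed_form = 5.0 * 0.8 ** 50
print(x)                 # about 7.1e-05
print(closed_form)
```

A larger `lr` shrinks x faster up to `lr = 0.5`; beyond `lr = 1.0` the factor 1 − 2·lr exceeds 1 in magnitude and the iteration diverges.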

---

## Design Notes

- The computation graph is **dynamic** — rebuilt on every forward pass.
- Gradients **accumulate** across multiple `backward()` calls (call `zero_grad()` between iterations).
- `SGD` is a minimal optimizer. Extend the pattern for momentum, Adam, etc.
- All tensors are float64 (`F64`) — no mixed-precision support yet.
- The `matmul` backward pass uses the `NDArray` `.T` property to transpose its operands.
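As one way to extend the optimizer pattern mentioned above, the sketch below adds a momentum term. It is written against plain floats so that it runs without latpy: `Param` and `MomentumSGD` are hypothetical names, and only the `step()` / `zero_grad()` shape is borrowed from `SGD`. The `+=` on `grad` imitates the accumulation behavior described in the notes.

```python
class Param:
    """Hypothetical stand-in for a tensor: a float value plus an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0

class MomentumSGD:
    """Hypothetical optimizer following the SGD(params, lr) pattern, plus a velocity term."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params = list(params)
        self.lr = lr
        self.momentum = momentum
        self.velocity = [0.0 for _ in self.params]

    def step(self):
        for i, p in enumerate(self.params):
            # v <- momentum * v + grad, then p <- p - lr * v
            self.velocity[i] = self.momentum * self.velocity[i] + p.grad
            p.value -= self.lr * self.velocity[i]

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

x = Param(5.0)
opt = MomentumSGD([x], lr=0.1, momentum=0.5)

for _ in range(100):
    x.grad += 2.0 * x.value   # what backward() would accumulate for the loss x²
    opt.step()
    opt.zero_grad()           # without this, stale gradients would keep adding up

print(abs(x.value) < 1e-6)    # True
```

Setting `momentum=0.0` reduces the update to the plain rule p ← p − lr · p.grad.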