torch — Autograd for NDArray

Lightweight automatic differentiation built on latpy’s NDArray. Reverse-mode autograd with a PyTorch-like API. Pure stdlib, zero dependencies.

Why it exists: Gradient computation is essential for optimization and ML. This module tracks operations on Tensor objects, builds a computation graph, and computes gradients via reverse-mode automatic differentiation (backpropagation). The API mirrors PyTorch so users familiar with it can start immediately.
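
To make the idea of "track operations, build a graph, walk it backwards" concrete, here is a minimal scalar sketch in plain Python. The Value class and its fields are hypothetical names invented for this illustration; they are not latpy's Tensor and say nothing about how latpy is implemented internally.

```python
class Value:
    """Hypothetical scalar graph node (illustration only, not latpy's Tensor)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        # Each entry is (parent_node, local_derivative_of_self_wrt_parent).
        self._parents = parents

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # Topologically order the graph so that a node's gradient is complete
        # before it is pushed on to its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad  # chain rule, accumulated


x = Value(3.0)
z = x * x + x      # z = x² + x
z.backward()
print(x.grad)      # 7.0
```

Gradients are accumulated with += because x feeds into the result along more than one path; each path contributes its own term.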


Tensor

from latpy.torch import Tensor, tensor, zeros_like

Signature                                     Description
--------------------------------------------  ---------------------------------------------------
tensor(data, requires_grad=False) -> Tensor   Create a new tensor from lists, scalars, or NDArray
Tensor(data, requires_grad, ...)              Low-level constructor
zeros_like(t, requires_grad=False) -> Tensor  Zero-filled tensor with same shape

from latpy.torch import tensor
x = tensor([1.0, 2.0, 3.0], requires_grad=True)
print(x.tolist())  # [1.0, 2.0, 3.0]
print(x.shape)     # (3,)

Properties:

  • .data — underlying NDArray

  • .grad — accumulated gradient NDArray (or None before backward())

  • .requires_grad — whether this tensor tracks gradients

  • .shape, .ndim — shape convenience properties


Differentiable Operations

from latpy.torch import (
    add, mul, sub, div, neg, pow,
    sin, cos, exp, log,
    sum, mean, matmul,
)

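All operations support broadcasting and return new Tensor instances that are recorded in the computation graph. Note that importing sum and pow by name shadows Python's built-in functions of the same name in the importing module.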
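
Broadcasting has a consequence for the backward pass: an operand that was stretched in the forward pass is used several times, so its gradient is the upstream gradient summed over the broadcast axis. The sketch below shows that general rule with nested lists; the helper names are hypothetical and this is not a description of latpy's internals.

```python
def add_broadcast(matrix, row):
    """Forward: add a length-n row to every row of an m x n matrix."""
    return [[m + r for m, r in zip(matrix_row, row)] for matrix_row in matrix]


def grad_for_row(upstream):
    """Backward for the broadcast operand: sum upstream over the row axis."""
    return [sum(column) for column in zip(*upstream)]


a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [10.0, 20.0, 30.0]

out = add_broadcast(a, b)

# Upstream gradient of sum(out) with respect to out is all ones.
upstream = [[1.0] * 3 for _ in range(2)]
grad_a = upstream                  # same shape as a, passes straight through
grad_b = grad_for_row(upstream)    # shape (3,): each entry of b was used twice

print(out)     # [[11.0, 22.0, 33.0], [14.0, 25.0, 36.0]]
print(grad_b)  # [2.0, 2.0, 2.0]
```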

Operation      Forward   backward() gradient
-------------  --------  ---------------------------------------
add(a, b)      a + b     ∂/∂a = 1, ∂/∂b = 1
mul(a, b)      a * b     ∂/∂a = b, ∂/∂b = a
sub(a, b)      a − b     ∂/∂a = 1, ∂/∂b = −1
div(a, b)      a / b     ∂/∂a = 1/b, ∂/∂b = −a/b²
neg(a)         −a        ∂/∂a = −1
pow(a, b)      aᵇ        ∂/∂a = b·aᵇ⁻¹, ∂/∂b = aᵇ·ln(a)
sin(a)         sin(a)    ∂/∂a = cos(a)
cos(a)         cos(a)    ∂/∂a = −sin(a)
exp(a)         eᵃ        ∂/∂a = eᵃ
log(a)         ln(a)     ∂/∂a = 1/a
sum(a)         Σa        ∂/∂a = 1
mean(a)        Σa / n    ∂/∂a = 1/n
matmul(a, b)   a @ b     ∂/∂a = grad @ bᵀ, ∂/∂b = aᵀ @ grad
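
The matmul row is the least obvious entry, so it can help to check it numerically. The following sketch uses plain nested lists (no latpy) and compares grad @ bᵀ and aᵀ @ grad against central finite differences for the loss "sum of all entries of a @ b". All helper names are hypothetical.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def total(m):
    return sum(sum(row) for row in m)


def numeric_grad(loss, m, eps=1e-6):
    """Central finite differences of loss() with respect to each entry of m."""
    result = []
    for i in range(len(m)):
        row = []
        for j in range(len(m[0])):
            original = m[i][j]
            m[i][j] = original + eps
            high = loss()
            m[i][j] = original - eps
            low = loss()
            m[i][j] = original          # restore the entry
            row.append((high - low) / (2 * eps))
        result.append(row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, -1.0], [2.0, 1.5]]

# For loss = sum(a @ b), the upstream gradient is a matrix of ones.
grad = [[1.0, 1.0], [1.0, 1.0]]
grad_a = matmul(grad, transpose(b))   # grad @ bᵀ
grad_b = matmul(transpose(a), grad)   # aᵀ @ grad

num_a = numeric_grad(lambda: total(matmul(a, b)), a)
num_b = numeric_grad(lambda: total(matmul(a, b)), b)

print(grad_a)  # [[-0.5, 3.5], [-0.5, 3.5]]
print(grad_b)  # [[4.0, 4.0], [6.0, 6.0]]
```

The numeric results in num_a and num_b agree with the analytic ones up to floating-point rounding.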


Automatic Differentiation

from latpy.torch import tensor, mul, add

x = tensor([3.0], requires_grad=True)
y = mul(x, x)            # y = x²
z = add(y, x)            # z = x² + x
z.backward()             # dz/dx = 2x + 1 = 7
print(x.grad.tolist())   # [7.0]

Chain rule composes naturally:

from latpy.torch import tensor, pow, sin

x = tensor([2.0], requires_grad=True)
y = pow(x, 2.0)          # y = x²
z = sin(y)               # z = sin(x²)
z.backward()             # dz/dx = cos(x²) · 2x
print(x.grad.tolist())   # ≈ [-2.6146], i.e. cos(4.0) * 4.0
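
The same chain-rule result can be reproduced by hand with only the math module, which is a useful sanity check when debugging gradients. This sketch multiplies the local derivatives from the operations table and compares the product with a central finite difference; it does not use latpy.

```python
import math

x = 2.0

# Forward pass, keeping the intermediate value.
y = x ** 2            # y = x²
z = math.sin(y)       # z = sin(y)

# Backward pass: multiply local derivatives from the output back to the input.
dz_dy = math.cos(y)   # sin rule: d sin(y)/dy = cos(y)
dy_dx = 2.0 * x       # pow rule: b·a^(b-1) with b = 2
dz_dx = dz_dy * dy_dx

# Independent check with a central finite difference.
eps = 1e-6
numeric = (math.sin((x + eps) ** 2) - math.sin((x - eps) ** 2)) / (2 * eps)

print(round(dz_dx, 4))                # -2.6146
print(abs(dz_dx - numeric) < 1e-6)    # True
```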

Optimizer

from latpy.torch import SGD

Signature             Description
--------------------  -------------------------------------------
SGD(params, lr=0.01)  Stochastic gradient descent
.step()               Update all parameters: p ← p − lr · p.grad
.zero_grad()          Clear gradients from all parameters

from latpy.torch import tensor, pow, SGD

x = tensor([5.0], requires_grad=True)
opt = SGD([x], lr=0.1)

# Training loop
for _ in range(50):
    loss = pow(x, 2.0)   # minimize x²
    loss.backward()
    opt.step()
    opt.zero_grad()

print(x.tolist()[0])  # ≈ 0.0
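
To see why the loop converges, the update rule p ← p − lr · p.grad can be traced on a plain float. For the loss x² the gradient is 2x, so every step multiplies x by (1 − 2·lr) = 0.8. The sketch below assumes only that rule, not any latpy internals.

```python
x = 5.0
lr = 0.1

for _ in range(50):
    grad = 2.0 * x        # d(x²)/dx
    x = x - lr * grad     # p ← p − lr · p.grad

# Each step scales x by (1 - 2·lr), so after 50 steps x = 5 · 0.8^50.
closed_form = 5.0 * (1 - 2 * lr) ** 50

print(f"{x:.2e}")  # 7.14e-05
```

With this loss, any lr above 1.0 makes the factor |1 − 2·lr| exceed 1 and the iterates diverge.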

Design Notes

  • The computation graph is dynamic — rebuilt on every forward pass.

  • Gradients accumulate across multiple backward() calls (call zero_grad() between iterations).

  • SGD is a minimal optimizer. Extend the pattern for momentum, Adam, etc.

  • All tensors are float64 (F64) — no mixed-precision support yet.

  • The backward pass of matmul transposes its operands with NDArray's .T property.
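
As one possible way to "extend the pattern", here is a sketch of SGD with classical momentum operating on a plain list of floats. MomentumSGD, its momentum parameter, and the explicit grads argument to step() are all hypothetical; latpy.torch is not documented here as providing any of them.

```python
class MomentumSGD:
    """Hypothetical optimizer on a list of floats (not part of latpy.torch)."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params = params                     # mutable list of floats
        self.lr = lr
        self.momentum = momentum
        self.velocity = [0.0] * len(params)      # one running average per param

    def step(self, grads):
        for i, g in enumerate(grads):
            # v ← momentum · v + g, then p ← p − lr · v
            self.velocity[i] = self.momentum * self.velocity[i] + g
            self.params[i] -= self.lr * self.velocity[i]


params = [5.0]
opt = MomentumSGD(params, lr=0.1, momentum=0.5)

for _ in range(100):
    grads = [2.0 * p for p in params]   # gradient of x²
    opt.step(grads)

print(abs(params[0]) < 1e-6)  # True
```

Setting momentum=0.0 reduces the update to the plain rule p ← p − lr · p.grad used by SGD above.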