`latpy.ml` — Machine Learning

Pure stdlib ML algorithms backed by NDArray. Every model operates on latpy.latmath.array.NDArray and requires zero third-party dependencies — no numpy, scipy, or sklearn.

Clustering

from latpy.ml import kmeans

Signature	Description
`kmeans(X, k, max_iter=100, tol=1e-4, seed=42) -> (centroids, labels, inertia)`	K-Means clustering. Returns (k×d centroids, n×1 labels, inertia).

Algorithm: Lloyd’s K-Means

K-Means minimizes the within-cluster sum of squares:

[ \arg\min_{{c_j},{z_i}} \sum_{i=1}^n |x_i - c_{z_i}|_2^2 ]

by alternating between assignment (each point to nearest centroid) and update (recompute centroids as cluster means). It is a greedy, iterative algorithm that converges to a local optimum (the objective is non-convex). Initial centroids are chosen as k distinct random samples from X (seeded for reproducibility).

Why K-Means? Simple, interpretable, O(nkd) per iteration. Works well on spherical, well-separated clusters. Sensitive to initialization and outliers.

Complexity: O(nkd) per iteration, O(kd) memory for centroids.

Example

from latpy.latmath.array import array
from latpy.ml import kmeans

X = array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
           [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

centroids, labels, inertia = kmeans(X, k=2, seed=42)

print(centroids.tolist())
# [[1.167, 1.467], [7.333, 9.0]]

print(labels.tolist())
# [0, 0, 1, 1, 0, 1]

print(inertia)
# 9.207   (sum of squared distances to nearest centroid)

Edge Cases

Scenario	Behavior	Rationale
k=1	Returns single centroid at data mean, all labels 0	A single cluster is just the global mean; every point is assigned to it.
k >= n_samples	Raises `ValueError`	Cannot choose `k` distinct random samples from fewer points.
k=0 or negative	Raises `ValueError`	Guarded at entry.
Convergence tolerance (tol)	Halts when max centroid shift < tol (default 1e-4)	Avoids wasted iterations once movement is negligible. Also halts early if no labels change between iterations.
Different seeds	Different random initial centroids → different local optima	Lloyd’s is sensitive to initialization; seed gives reproducible restarts.
Axis-aligned data	Works, but may split elongated clusters artificially	K-Means assumes isotropic (spherical) clusters; axis-aligned elongated clusters can be split across the “middle” where Euclidean distance is ambiguous.
Single-sample clusters	Centroids collapse to the lone point, inertia contribution zero for that point	A cluster with count=1 yields zero shift but is still valid.

Linear Models

from latpy.ml import LinearRegression

Signature	Description
`LinearRegression(fit_intercept=True)`	OLS via normal equations
`.fit(X, y) -> self`	Fit model
`.predict(X) -> NDArray`	Predict
`.score(X, y) -> float`	R² coefficient of determination

Algorithm: Ordinary Least Squares (Closed-Form)

Solves the normal equations:

[ \hat{w} = (X^T X)^{-1} X^T y ]

via Gaussian elimination with partial pivoting. When fit_intercept=True, a column of ones is prepended to the design matrix; the first coefficient becomes intercept_ and the rest become coef_.

Why OLS closed-form? No learning rate, no iteration, globally optimal for the linear least-squares objective. The normal equations are the standard approach for small-to-moderate d (number of features). For very large d, iterative methods (gradient descent) would be more efficient.

Complexity: O(d²n + d³) — constructing X^T X is O(d²n), solving the d×d system is O(d³).

Example

from latpy.latmath.array import array
from latpy.ml import LinearRegression

X = array([[1.0], [2.0], [3.0], [4.0]])
y = array([2.0, 4.0, 6.0, 8.0])

lr = LinearRegression(fit_intercept=True)
lr.fit(X, y)

print(lr.coef_.tolist())   # [2.0]
print(lr.intercept_)       # ~0.0
print(lr.predict(array([[5.0]])).tolist())  # [10.0]
print(lr.score(X, y))      # 1.0  (perfect fit)

# Without intercept — forces line through origin
lr2 = LinearRegression(fit_intercept=False)
lr2.fit(array([[1.0], [2.0], [3.0]]), array([1.0, 2.0, 3.0]))
print(lr2.coef_.tolist())  # [1.0]
print(lr2.predict(array([[4.0]])).tolist())  # [4.0]

Edge Cases

Scenario	Behavior	Rationale
Singular X (rank-deficient)	Gaussian elimination produces a solution, but coefficients may be numerically unstable; no explicit rank check	X^T X is singular when features are linearly dependent. The pivot search in `_gaussian_elimination` may skip zero columns, producing an arbitrary solution among infinitely many. Always check that features are linearly independent.
n < d (underdetermined)	OLS still runs but is ill-posed — infinitely many solutions	Normal equations produce a particular solution; use regularization or reduce features.
Single feature	Works as simple linear regression; slope in `coef_[0]`, intercept in `intercept_`	Trivially correct.
Constant features (entire column same value)	Makes X^T X nearly singular; coefficients may be large or NaN	A constant column is collinear with the intercept column when `fit_intercept=True`. Either drop the column or set `fit_intercept=False`.
fit_intercept=False	No column of ones; model forced through origin	Useful when the relationship is known to be proportional. R² can be negative if the origin-constrained fit is worse than predicting the mean.
Empty X (n=0)	Raises `ValueError`	Need at least one sample.
Zero-feature X (p=0)	Raises `ValueError`	Need at least one feature.

Metrics

from latpy.ml import accuracy, precision, recall, f1_score, confusion_matrix
from latpy.ml import mean_squared_error, mean_absolute_error, r2_score

Signature	Description
`accuracy(y_true, y_pred) -> float`	Classification accuracy (correct / total)
`precision(y_true, y_pred, pos_label=1) -> float`	Precision = TP / (TP + FP)
`recall(y_true, y_pred, pos_label=1) -> float`	Recall = TP / (TP + FN)
`f1_score(y_true, y_pred, pos_label=1) -> float`	F1 = 2·P·R / (P + R)
`confusion_matrix(y_true, y_pred, labels=None) -> NDArray(I64)`	Confusion matrix
`mean_squared_error(y_true, y_pred) -> float`	MSE
`mean_absolute_error(y_true, y_pred) -> float`	MAE
`r2_score(y_true, y_pred) -> float`	R² coefficient of determination

Examples

from latpy.latmath.array import array
from latpy.ml import accuracy, precision, recall, f1_score, confusion_matrix
from latpy.ml import mean_squared_error, mean_absolute_error, r2_score

# --- Classification ---
y_true = array([1, 0, 1, 1, 0])
y_pred = array([1, 0, 0, 1, 0])

print(accuracy(y_true, y_pred))        # 0.8
print(precision(y_true, y_pred))       # 1.0    (TP=2, FP=0)
print(recall(y_true, y_pred))          # 0.6667 (TP=2, FN=1)
print(f1_score(y_true, y_pred))        # 0.8

cm = confusion_matrix(y_true, y_pred)
print(cm.tolist())
# [[2, 0],
#  [1, 2]]
# Rows: true, Cols: pred.  Order: [0, 1]

# --- Multiclass confusion matrix ---
y_mc_t = array([0, 1, 2, 0, 1, 2])
y_mc_p = array([0, 2, 1, 0, 1, 2])
cm3 = confusion_matrix(y_mc_t, y_mc_p)
print(cm3.tolist())
# [[2, 0, 0],
#  [0, 1, 1],
#  [0, 1, 1]]

# Custom label ordering
cm_custom = confusion_matrix(y_mc_t, y_mc_p, labels=[2, 1, 0])
print(cm_custom.tolist())
# [[1, 1, 0],
#  [1, 1, 0],
#  [0, 0, 2]]

# --- Regression ---
y_t = array([1.0, 2.0, 3.0])
y_p = array([1.5, 2.5, 3.5])

print(mean_squared_error(y_t, y_p))    # 0.25
print(mean_absolute_error(y_t, y_p))   # 0.5
print(r2_score(y_t, y_p))             # negative (worse than mean predictor)

Edge Cases

Scenario	Behavior	Rationale
Empty y_true/y_pred (n=0)	Accuracy returns 0.0; MSE/MAE return nan via division by zero	No samples to evaluate. Accuracy explicitly guards with `n > 0`.
All-correct predictions	accuracy=1.0, precision=1.0, recall=1.0, f1=1.0, MSE=0.0, MAE=0.0, R²=1.0	Ideal scenario; all metrics at their best possible values.
All-wrong predictions	accuracy=0.0; precision depends on FP; recall=0.0; F1=0.0 if P+R=0; R² may be negative	No correct predictions. R² can be < 0 because predicting the mean would be better.
No positive predictions (TP+FP=0)	Precision returns 0.0	Guarded: `tp / (tp + fp)` when denominator is 0 → 0.
No positive ground truth (TP+FN=0)	Recall returns 0.0	Guarded: `tp / (tp + fn)` when denominator is 0 → 0.
P + R = 0	F1 returns 0.0	Guarded before the harmonic mean formula to avoid division by zero.
R² with constant y_true	Returns 0.0 (ss_tot=0 guard)	If the target has zero variance, any model that doesn’t predict exactly the constant will have ss_res > 0, but the guard returns 0.0 instead of -inf.
Multi-class confusion matrix	Auto-detects labels from the union of y_true and y_pred	Ensures all classes (even those with zero predictions) appear in the matrix.

SOV Models (State-Observation-Vector)

from latpy.ml.sov import SOVRegression, SOVClassifier, SOVDynamics

Architecture & Rationale

SOV models use a two-stage architecture:

Encoder (random projection): A fixed random matrix ( W_{\text{enc}} \in \mathbb{R}^{p \times m} ) (entries drawn i.i.d. from ( \mathcal{N}(0, 1/m) )) projects the p-dimensional input into an m-dimensional latent state space. The encoder is never trained — it is seeded with random.seed(42) and fixed after construction.
Decoder (learned): A simple model operates in the reduced state space:
- SOVRegression: Least-squares linear decoder from states to output.
- SOVClassifier: Nearest-centroid classification in state space.
- SOVDynamics: Linear state transition matrix A + observation matrix C.

Why random projections? The Johnson-Lindenstrauss lemma guarantees that random projections approximately preserve pairwise distances with high probability when projecting into dimension ( O(\log n) ). This makes the state space a “sketch” of the input that retains enough structure for linear methods to work. The seed ensures reproducibility — the same projection is always used for a given input dimension.

Why SOV exists: For very high-dimensional inputs where memory or compute is constrained, random projections provide a fast, lightweight dimensionality reduction that doesn’t require the full SVD or gradient-based training. The encoder is O(pm) once, the decoder is trained in the smaller state space.

Complexity: O(nds) where n = samples, d = features, s = n_states (the latent dimension).

SOVRegression

Random-projection encoder + least-squares decoder.

Training: Normalizes features, projects to latent states via random matrix, then solves ( Z^T Z w = Z^T y ) via Gaussian elimination on the state-space representation.

Signature	Description
`SOVRegression(n_states=2)`	SOV regression model
`.fit(X, y) -> self`	Fit encoder/decoder
`.predict(X) -> NDArray`	Predict
`.score(X, y) -> float`	R²

Example

from latpy.latmath.array import array
from latpy.ml.sov import SOVRegression

X = array([[1.0], [2.0], [3.0], [4.0]])
y = array([2.0, 4.0, 6.0, 8.0])

sov = SOVRegression(n_states=2)
sov.fit(X, y)
pred = sov.predict(X)
print(pred.tolist())         # e.g. [2.0, 4.0, 6.0, 8.0] (near-perfect fit)
print(sov.score(X, y))       # ~1.0 (perfect linear relationship captured)

Edge Cases

Scenario	Behavior
n_states > n_features	Works but the random projection expands dimensionality rather than reducing it; decoder still fits via least squares
n_states = 0	Not guarded at init; fit will likely fail with singular Z^T Z
Predict before fit	Raises `RuntimeError`

SOVClassifier

Random-projection encoder + nearest-centroid classification.

Training: Projects inputs to latent states via random encoder, then computes class centroids in state space. Prediction assigns each point to the nearest centroid.

Signature	Description
`SOVClassifier(n_states=4)`	SOV classifier
`.fit(X, y) -> self`	Fit model
`.predict(X) -> NDArray(I64)`	Predict labels
`.score(X, y) -> float`	Accuracy

Example

from latpy.latmath.array import array
from latpy.ml.sov import SOVClassifier

X = array([[1.0, 2.0], [2.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
y = array([0, 0, 1, 1])

clf = SOVClassifier(n_states=2)
clf.fit(X, y)
pred = clf.predict(X)
print(pred.tolist())         # e.g. [0, 0, 1, 1]
print(clf.score(X, y))       # 1.0 (linearly separable in projected space)

Edge Cases

Scenario	Behavior
Predict before fit	Raises `RuntimeError`
Large n_states	Can overfit; state space becomes as large as feature space, nearest-centroid may memorize
Single class	Works; all predictions collapse to that class

SOVDynamics

Discrete-time linear state-space model:

[ z_{t+1} = A , z_t \quad (\text{state transition}) ] [ x_t = C , z_t \quad (\text{observation}) ]

The system is designed to be contractive: A is scaled so its spectral radius < 0.9, ensuring stability. The equilibrium state is the dominant eigenvector of A (the direction that survives repeated application).

Signature	Description
`SOVDynamics(n_states=2, n_obs=3)`	SOV dynamics model
`.fit_random(seed=42) -> self`	Initialize stable random A, C
`.equilibrium(max_iter=1000, tol=1e-10) -> NDArray`	Dominant eigenvector (equilibrium state)
`.step(z) -> NDArray`	Advance state by one step: z_{t+1} = A @ z
`.observe(z) -> NDArray`	Project state to observation: x = C @ z
`.simulate(n_steps, z0=None) -> (states, obs)`	Simulate full trajectory

Example

from latpy.latmath.array import array
from latpy.ml.sov import SOVDynamics

dyn = SOVDynamics(n_states=2, n_obs=3)
dyn.fit_random(seed=42)

# Step
z = array([1.0, 0.0])
z1 = dyn.step(z)
print(z1.tolist())  # e.g. [0.246, -0.121]

# Observe
x = dyn.observe(z)
print(x.tolist())   # e.g. [0.185, -0.291, -0.084]

# Simulate 10 steps
states, obs = dyn.simulate(n_steps=10)
print(states.shape)  # (11, 2)
print(obs.shape)     # (11, 3)

# Find equilibrium
z_star = dyn.equilibrium()
print(z_star.tolist())  # unit-norm vector, dominant eigenvector of A

# Verify equilibrium: A @ z_star ≈ λ z_star (direction preserved)
z_next = dyn.step(z_star)
cos_sim = sum(z_next[i] * z_star[i] for i in range(2)) / \
          (sum(z_next[i]**2 for i in range(2))**0.5 * sum(z_star[i]**2 for i in range(2))**0.5)
print(cos_sim)  # > 0.9 (direction nearly preserved)

Edge Cases

Scenario	Behavior	Rationale
n_states > n_features	Works; A is an n_states×n_states matrix independent of input dimension	The dynamics operate purely in latent state space; input dimension is irrelevant.
Different random seeds	Different A, C matrices → different dynamics, different equilibrium	The seed parameterizes different random systems.
Unstable system (spectral radius > 1)	`fit_random` scales A to have spectral radius < 0.9, so this is prevented by design	Without scaling, power iteration in `equilibrium()` would diverge.
Zero initial state	`simulate` with z0 = zeros produces all-zero trajectory	Linear system: A·0 = 0. Equilibrium is also zero (trivial eigenvector).
fit_random before equilibrium/step/simulate	Required; raises `RuntimeError` if A_ is None	The matrices must be initialized before use.
Equilibrium on unstable system	Power iteration still converges to the dominant eigenvector, but its norm grows without bound	Since `fit_random` guarantees stability, this is not an issue in practice.

latpy.ml — Machine Learning

Clustering

Algorithm: Lloyd’s K-Means

Example

Edge Cases

Linear Models

Algorithm: Ordinary Least Squares (Closed-Form)

Example

Edge Cases

Metrics

Examples

Edge Cases

SOV Models (State-Observation-Vector)

Architecture & Rationale

SOVRegression

Example

Edge Cases

SOVClassifier

Example

Edge Cases

SOVDynamics

Example

Edge Cases

`latpy.ml` — Machine Learning