latpy.ml — Machine Learning

Pure stdlib ML algorithms backed by NDArray. Every model operates on latpy.latmath.array.NDArray and requires zero third-party dependencies — no numpy, scipy, or sklearn.


Clustering

from latpy.ml import kmeans

Signature

Description

kmeans(X, k, max_iter=100, tol=1e-4, seed=42) -> (centroids, labels, inertia)

K-Means clustering. Returns (k×d centroids, n×1 labels, inertia).

Algorithm: Lloyd’s K-Means

K-Means minimizes the within-cluster sum of squares:

[ \arg\min_{{c_j},{z_i}} \sum_{i=1}^n |x_i - c_{z_i}|_2^2 ]

by alternating between assignment (each point to nearest centroid) and update (recompute centroids as cluster means). It is a greedy, iterative algorithm that converges to a local optimum (the objective is non-convex). Initial centroids are chosen as k distinct random samples from X (seeded for reproducibility).

Why K-Means? Simple, interpretable, O(nkd) per iteration. Works well on spherical, well-separated clusters. Sensitive to initialization and outliers.

Complexity: O(nkd) per iteration, O(kd) memory for centroids.

Example

from latpy.latmath.array import array
from latpy.ml import kmeans

X = array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
           [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

centroids, labels, inertia = kmeans(X, k=2, seed=42)

print(centroids.tolist())
# [[1.167, 1.467], [7.333, 9.0]]

print(labels.tolist())
# [0, 0, 1, 1, 0, 1]

print(inertia)
# 9.207   (sum of squared distances to nearest centroid)

Edge Cases

Scenario

Behavior

Rationale

k=1

Returns single centroid at data mean, all labels 0

A single cluster is just the global mean; every point is assigned to it.

k >= n_samples

Raises ValueError

Cannot choose k distinct random samples from fewer points.

k=0 or negative

Raises ValueError

Guarded at entry.

Convergence tolerance (tol)

Halts when max centroid shift < tol (default 1e-4)

Avoids wasted iterations once movement is negligible. Also halts early if no labels change between iterations.

Different seeds

Different random initial centroids → different local optima

Lloyd’s is sensitive to initialization; seed gives reproducible restarts.

Axis-aligned data

Works, but may split elongated clusters artificially

K-Means assumes isotropic (spherical) clusters; axis-aligned elongated clusters can be split across the “middle” where Euclidean distance is ambiguous.

Single-sample clusters

Centroids collapse to the lone point, inertia contribution zero for that point

A cluster with count=1 yields zero shift but is still valid.


Linear Models

from latpy.ml import LinearRegression

Signature

Description

LinearRegression(fit_intercept=True)

OLS via normal equations

.fit(X, y) -> self

Fit model

.predict(X) -> NDArray

Predict

.score(X, y) -> float

R² coefficient of determination

Algorithm: Ordinary Least Squares (Closed-Form)

Solves the normal equations:

[ \hat{w} = (X^T X)^{-1} X^T y ]

via Gaussian elimination with partial pivoting. When fit_intercept=True, a column of ones is prepended to the design matrix; the first coefficient becomes intercept_ and the rest become coef_.

Why OLS closed-form? No learning rate, no iteration, globally optimal for the linear least-squares objective. The normal equations are the standard approach for small-to-moderate d (number of features). For very large d, iterative methods (gradient descent) would be more efficient.

Complexity: O(d²n + d³) — constructing X^T X is O(d²n), solving the d×d system is O(d³).

Example

from latpy.latmath.array import array
from latpy.ml import LinearRegression

X = array([[1.0], [2.0], [3.0], [4.0]])
y = array([2.0, 4.0, 6.0, 8.0])

lr = LinearRegression(fit_intercept=True)
lr.fit(X, y)

print(lr.coef_.tolist())   # [2.0]
print(lr.intercept_)       # ~0.0
print(lr.predict(array([[5.0]])).tolist())  # [10.0]
print(lr.score(X, y))      # 1.0  (perfect fit)
# Without intercept — forces line through origin
lr2 = LinearRegression(fit_intercept=False)
lr2.fit(array([[1.0], [2.0], [3.0]]), array([1.0, 2.0, 3.0]))
print(lr2.coef_.tolist())  # [1.0]
print(lr2.predict(array([[4.0]])).tolist())  # [4.0]

Edge Cases

Scenario

Behavior

Rationale

Singular X (rank-deficient)

Gaussian elimination produces a solution, but coefficients may be numerically unstable; no explicit rank check

X^T X is singular when features are linearly dependent. The pivot search in _gaussian_elimination may skip zero columns, producing an arbitrary solution among infinitely many. Always check that features are linearly independent.

n < d (underdetermined)

OLS still runs but is ill-posed — infinitely many solutions

Normal equations produce a particular solution; use regularization or reduce features.

Single feature

Works as simple linear regression; slope in coef_[0], intercept in intercept_

Trivially correct.

Constant features (entire column same value)

Makes X^T X nearly singular; coefficients may be large or NaN

A constant column is collinear with the intercept column when fit_intercept=True. Either drop the column or set fit_intercept=False.

fit_intercept=False

No column of ones; model forced through origin

Useful when the relationship is known to be proportional. R² can be negative if the origin-constrained fit is worse than predicting the mean.

Empty X (n=0)

Raises ValueError

Need at least one sample.

Zero-feature X (p=0)

Raises ValueError

Need at least one feature.


Metrics

from latpy.ml import accuracy, precision, recall, f1_score, confusion_matrix
from latpy.ml import mean_squared_error, mean_absolute_error, r2_score

Signature

Description

accuracy(y_true, y_pred) -> float

Classification accuracy (correct / total)

precision(y_true, y_pred, pos_label=1) -> float

Precision = TP / (TP + FP)

recall(y_true, y_pred, pos_label=1) -> float

Recall = TP / (TP + FN)

f1_score(y_true, y_pred, pos_label=1) -> float

F1 = 2·P·R / (P + R)

confusion_matrix(y_true, y_pred, labels=None) -> NDArray(I64)

Confusion matrix

mean_squared_error(y_true, y_pred) -> float

MSE

mean_absolute_error(y_true, y_pred) -> float

MAE

r2_score(y_true, y_pred) -> float

R² coefficient of determination

Examples

from latpy.latmath.array import array
from latpy.ml import accuracy, precision, recall, f1_score, confusion_matrix
from latpy.ml import mean_squared_error, mean_absolute_error, r2_score

# --- Classification ---
y_true = array([1, 0, 1, 1, 0])
y_pred = array([1, 0, 0, 1, 0])

print(accuracy(y_true, y_pred))        # 0.8
print(precision(y_true, y_pred))       # 1.0    (TP=2, FP=0)
print(recall(y_true, y_pred))          # 0.6667 (TP=2, FN=1)
print(f1_score(y_true, y_pred))        # 0.8

cm = confusion_matrix(y_true, y_pred)
print(cm.tolist())
# [[2, 0],
#  [1, 2]]
# Rows: true, Cols: pred.  Order: [0, 1]

# --- Multiclass confusion matrix ---
y_mc_t = array([0, 1, 2, 0, 1, 2])
y_mc_p = array([0, 2, 1, 0, 1, 2])
cm3 = confusion_matrix(y_mc_t, y_mc_p)
print(cm3.tolist())
# [[2, 0, 0],
#  [0, 1, 1],
#  [0, 1, 1]]

# Custom label ordering
cm_custom = confusion_matrix(y_mc_t, y_mc_p, labels=[2, 1, 0])
print(cm_custom.tolist())
# [[1, 1, 0],
#  [1, 1, 0],
#  [0, 0, 2]]

# --- Regression ---
y_t = array([1.0, 2.0, 3.0])
y_p = array([1.5, 2.5, 3.5])

print(mean_squared_error(y_t, y_p))    # 0.25
print(mean_absolute_error(y_t, y_p))   # 0.5
print(r2_score(y_t, y_p))             # negative (worse than mean predictor)

Edge Cases

Scenario

Behavior

Rationale

Empty y_true/y_pred (n=0)

Accuracy returns 0.0; MSE/MAE return nan via division by zero

No samples to evaluate. Accuracy explicitly guards with n > 0.

All-correct predictions

accuracy=1.0, precision=1.0, recall=1.0, f1=1.0, MSE=0.0, MAE=0.0, R²=1.0

Ideal scenario; all metrics at their best possible values.

All-wrong predictions

accuracy=0.0; precision depends on FP; recall=0.0; F1=0.0 if P+R=0; R² may be negative

No correct predictions. R² can be < 0 because predicting the mean would be better.

No positive predictions (TP+FP=0)

Precision returns 0.0

Guarded: tp / (tp + fp) when denominator is 0 → 0.

No positive ground truth (TP+FN=0)

Recall returns 0.0

Guarded: tp / (tp + fn) when denominator is 0 → 0.

P + R = 0

F1 returns 0.0

Guarded before the harmonic mean formula to avoid division by zero.

R² with constant y_true

Returns 0.0 (ss_tot=0 guard)

If the target has zero variance, any model that doesn’t predict exactly the constant will have ss_res > 0, but the guard returns 0.0 instead of -inf.

Multi-class confusion matrix

Auto-detects labels from the union of y_true and y_pred

Ensures all classes (even those with zero predictions) appear in the matrix.


SOV Models (State-Observation-Vector)

from latpy.ml.sov import SOVRegression, SOVClassifier, SOVDynamics

Architecture & Rationale

SOV models use a two-stage architecture:

  1. Encoder (random projection): A fixed random matrix ( W_{\text{enc}} \in \mathbb{R}^{p \times m} ) (entries drawn i.i.d. from ( \mathcal{N}(0, 1/m) )) projects the p-dimensional input into an m-dimensional latent state space. The encoder is never trained — it is seeded with random.seed(42) and fixed after construction.

  2. Decoder (learned): A simple model operates in the reduced state space:

    • SOVRegression: Least-squares linear decoder from states to output.

    • SOVClassifier: Nearest-centroid classification in state space.

    • SOVDynamics: Linear state transition matrix A + observation matrix C.

Why random projections? The Johnson-Lindenstrauss lemma guarantees that random projections approximately preserve pairwise distances with high probability when projecting into dimension ( O(\log n) ). This makes the state space a “sketch” of the input that retains enough structure for linear methods to work. The seed ensures reproducibility — the same projection is always used for a given input dimension.

Why SOV exists: For very high-dimensional inputs where memory or compute is constrained, random projections provide a fast, lightweight dimensionality reduction that doesn’t require the full SVD or gradient-based training. The encoder is O(pm) once, the decoder is trained in the smaller state space.

Complexity: O(nds) where n = samples, d = features, s = n_states (the latent dimension).


SOVRegression

Random-projection encoder + least-squares decoder.

Training: Normalizes features, projects to latent states via random matrix, then solves ( Z^T Z w = Z^T y ) via Gaussian elimination on the state-space representation.

Signature

Description

SOVRegression(n_states=2)

SOV regression model

.fit(X, y) -> self

Fit encoder/decoder

.predict(X) -> NDArray

Predict

.score(X, y) -> float

Example

from latpy.latmath.array import array
from latpy.ml.sov import SOVRegression

X = array([[1.0], [2.0], [3.0], [4.0]])
y = array([2.0, 4.0, 6.0, 8.0])

sov = SOVRegression(n_states=2)
sov.fit(X, y)
pred = sov.predict(X)
print(pred.tolist())         # e.g. [2.0, 4.0, 6.0, 8.0] (near-perfect fit)
print(sov.score(X, y))       # ~1.0 (perfect linear relationship captured)

Edge Cases

Scenario

Behavior

n_states > n_features

Works but the random projection expands dimensionality rather than reducing it; decoder still fits via least squares

n_states = 0

Not guarded at init; fit will likely fail with singular Z^T Z

Predict before fit

Raises RuntimeError


SOVClassifier

Random-projection encoder + nearest-centroid classification.

Training: Projects inputs to latent states via random encoder, then computes class centroids in state space. Prediction assigns each point to the nearest centroid.

Signature

Description

SOVClassifier(n_states=4)

SOV classifier

.fit(X, y) -> self

Fit model

.predict(X) -> NDArray(I64)

Predict labels

.score(X, y) -> float

Accuracy

Example

from latpy.latmath.array import array
from latpy.ml.sov import SOVClassifier

X = array([[1.0, 2.0], [2.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
y = array([0, 0, 1, 1])

clf = SOVClassifier(n_states=2)
clf.fit(X, y)
pred = clf.predict(X)
print(pred.tolist())         # e.g. [0, 0, 1, 1]
print(clf.score(X, y))       # 1.0 (linearly separable in projected space)

Edge Cases

Scenario

Behavior

Predict before fit

Raises RuntimeError

Large n_states

Can overfit; state space becomes as large as feature space, nearest-centroid may memorize

Single class

Works; all predictions collapse to that class


SOVDynamics

Discrete-time linear state-space model:

[ z_{t+1} = A , z_t \quad (\text{state transition}) ] [ x_t = C , z_t \quad (\text{observation}) ]

The system is designed to be contractive: A is scaled so its spectral radius < 0.9, ensuring stability. The equilibrium state is the dominant eigenvector of A (the direction that survives repeated application).

Signature

Description

SOVDynamics(n_states=2, n_obs=3)

SOV dynamics model

.fit_random(seed=42) -> self

Initialize stable random A, C

.equilibrium(max_iter=1000, tol=1e-10) -> NDArray

Dominant eigenvector (equilibrium state)

.step(z) -> NDArray

Advance state by one step: z_{t+1} = A @ z

.observe(z) -> NDArray

Project state to observation: x = C @ z

.simulate(n_steps, z0=None) -> (states, obs)

Simulate full trajectory

Example

from latpy.latmath.array import array
from latpy.ml.sov import SOVDynamics

dyn = SOVDynamics(n_states=2, n_obs=3)
dyn.fit_random(seed=42)

# Step
z = array([1.0, 0.0])
z1 = dyn.step(z)
print(z1.tolist())  # e.g. [0.246, -0.121]

# Observe
x = dyn.observe(z)
print(x.tolist())   # e.g. [0.185, -0.291, -0.084]

# Simulate 10 steps
states, obs = dyn.simulate(n_steps=10)
print(states.shape)  # (11, 2)
print(obs.shape)     # (11, 3)

# Find equilibrium
z_star = dyn.equilibrium()
print(z_star.tolist())  # unit-norm vector, dominant eigenvector of A

# Verify equilibrium: A @ z_star ≈ λ z_star (direction preserved)
z_next = dyn.step(z_star)
cos_sim = sum(z_next[i] * z_star[i] for i in range(2)) / \
          (sum(z_next[i]**2 for i in range(2))**0.5 * sum(z_star[i]**2 for i in range(2))**0.5)
print(cos_sim)  # > 0.9 (direction nearly preserved)

Edge Cases

Scenario

Behavior

Rationale

n_states > n_features

Works; A is an n_states×n_states matrix independent of input dimension

The dynamics operate purely in latent state space; input dimension is irrelevant.

Different random seeds

Different A, C matrices → different dynamics, different equilibrium

The seed parameterizes different random systems.

Unstable system (spectral radius > 1)

fit_random scales A to have spectral radius < 0.9, so this is prevented by design

Without scaling, power iteration in equilibrium() would diverge.

Zero initial state

simulate with z0 = zeros produces all-zero trajectory

Linear system: A·0 = 0. Equilibrium is also zero (trivial eigenvector).

fit_random before equilibrium/step/simulate

Required; raises RuntimeError if A_ is None

The matrices must be initialized before use.

Equilibrium on unstable system

Power iteration still converges to the dominant eigenvector, but its norm grows without bound

Since fit_random guarantees stability, this is not an issue in practice.