latpy.ml — Machine Learning
Pure stdlib ML algorithms backed by NDArray. Every model operates on latpy.latmath.array.NDArray and requires zero third-party dependencies — no numpy, scipy, or sklearn.
Clustering
from latpy.ml import kmeans
Signature |
Description |
|---|---|
|
K-Means clustering. Returns (k×d centroids, n×1 labels, inertia). |
Algorithm: Lloyd’s K-Means
K-Means minimizes the within-cluster sum of squares:
[ \arg\min_{{c_j},{z_i}} \sum_{i=1}^n |x_i - c_{z_i}|_2^2 ]
by alternating between assignment (each point to nearest centroid) and update (recompute centroids as cluster means). It is a greedy, iterative algorithm that converges to a local optimum (the objective is non-convex). Initial centroids are chosen as k distinct random samples from X (seeded for reproducibility).
Why K-Means? Simple, interpretable, O(nkd) per iteration. Works well on spherical, well-separated clusters. Sensitive to initialization and outliers.
Complexity: O(nkd) per iteration, O(kd) memory for centroids.
Example
from latpy.latmath.array import array
from latpy.ml import kmeans
X = array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
[8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
centroids, labels, inertia = kmeans(X, k=2, seed=42)
print(centroids.tolist())
# [[1.167, 1.467], [7.333, 9.0]]
print(labels.tolist())
# [0, 0, 1, 1, 0, 1]
print(inertia)
# 9.207 (sum of squared distances to nearest centroid)
Edge Cases
Scenario |
Behavior |
Rationale |
|---|---|---|
k=1 |
Returns single centroid at data mean, all labels 0 |
A single cluster is just the global mean; every point is assigned to it. |
k >= n_samples |
Raises |
Cannot choose |
k=0 or negative |
Raises |
Guarded at entry. |
Convergence tolerance (tol) |
Halts when max centroid shift < tol (default 1e-4) |
Avoids wasted iterations once movement is negligible. Also halts early if no labels change between iterations. |
Different seeds |
Different random initial centroids → different local optima |
Lloyd’s is sensitive to initialization; seed gives reproducible restarts. |
Axis-aligned data |
Works, but may split elongated clusters artificially |
K-Means assumes isotropic (spherical) clusters; axis-aligned elongated clusters can be split across the “middle” where Euclidean distance is ambiguous. |
Single-sample clusters |
Centroids collapse to the lone point, inertia contribution zero for that point |
A cluster with count=1 yields zero shift but is still valid. |
Linear Models
from latpy.ml import LinearRegression
Signature |
Description |
|---|---|
|
OLS via normal equations |
|
Fit model |
|
Predict |
|
R² coefficient of determination |
Algorithm: Ordinary Least Squares (Closed-Form)
Solves the normal equations:
[ \hat{w} = (X^T X)^{-1} X^T y ]
via Gaussian elimination with partial pivoting. When fit_intercept=True, a column of ones is prepended to the design matrix; the first coefficient becomes intercept_ and the rest become coef_.
Why OLS closed-form? No learning rate, no iteration, globally optimal for the linear least-squares objective. The normal equations are the standard approach for small-to-moderate d (number of features). For very large d, iterative methods (gradient descent) would be more efficient.
Complexity: O(d²n + d³) — constructing X^T X is O(d²n), solving the d×d system is O(d³).
Example
from latpy.latmath.array import array
from latpy.ml import LinearRegression
X = array([[1.0], [2.0], [3.0], [4.0]])
y = array([2.0, 4.0, 6.0, 8.0])
lr = LinearRegression(fit_intercept=True)
lr.fit(X, y)
print(lr.coef_.tolist()) # [2.0]
print(lr.intercept_) # ~0.0
print(lr.predict(array([[5.0]])).tolist()) # [10.0]
print(lr.score(X, y)) # 1.0 (perfect fit)
# Without intercept — forces line through origin
lr2 = LinearRegression(fit_intercept=False)
lr2.fit(array([[1.0], [2.0], [3.0]]), array([1.0, 2.0, 3.0]))
print(lr2.coef_.tolist()) # [1.0]
print(lr2.predict(array([[4.0]])).tolist()) # [4.0]
Edge Cases
Scenario |
Behavior |
Rationale |
|---|---|---|
Singular X (rank-deficient) |
Gaussian elimination produces a solution, but coefficients may be numerically unstable; no explicit rank check |
X^T X is singular when features are linearly dependent. The pivot search in |
n < d (underdetermined) |
OLS still runs but is ill-posed — infinitely many solutions |
Normal equations produce a particular solution; use regularization or reduce features. |
Single feature |
Works as simple linear regression; slope in |
Trivially correct. |
Constant features (entire column same value) |
Makes X^T X nearly singular; coefficients may be large or NaN |
A constant column is collinear with the intercept column when |
fit_intercept=False |
No column of ones; model forced through origin |
Useful when the relationship is known to be proportional. R² can be negative if the origin-constrained fit is worse than predicting the mean. |
Empty X (n=0) |
Raises |
Need at least one sample. |
Zero-feature X (p=0) |
Raises |
Need at least one feature. |
Metrics
from latpy.ml import accuracy, precision, recall, f1_score, confusion_matrix
from latpy.ml import mean_squared_error, mean_absolute_error, r2_score
Signature |
Description |
|---|---|
|
Classification accuracy (correct / total) |
|
Precision = TP / (TP + FP) |
|
Recall = TP / (TP + FN) |
|
F1 = 2·P·R / (P + R) |
|
Confusion matrix |
|
MSE |
|
MAE |
|
R² coefficient of determination |
Examples
from latpy.latmath.array import array
from latpy.ml import accuracy, precision, recall, f1_score, confusion_matrix
from latpy.ml import mean_squared_error, mean_absolute_error, r2_score
# --- Classification ---
y_true = array([1, 0, 1, 1, 0])
y_pred = array([1, 0, 0, 1, 0])
print(accuracy(y_true, y_pred)) # 0.8
print(precision(y_true, y_pred)) # 1.0 (TP=2, FP=0)
print(recall(y_true, y_pred)) # 0.6667 (TP=2, FN=1)
print(f1_score(y_true, y_pred)) # 0.8
cm = confusion_matrix(y_true, y_pred)
print(cm.tolist())
# [[2, 0],
# [1, 2]]
# Rows: true, Cols: pred. Order: [0, 1]
# --- Multiclass confusion matrix ---
y_mc_t = array([0, 1, 2, 0, 1, 2])
y_mc_p = array([0, 2, 1, 0, 1, 2])
cm3 = confusion_matrix(y_mc_t, y_mc_p)
print(cm3.tolist())
# [[2, 0, 0],
# [0, 1, 1],
# [0, 1, 1]]
# Custom label ordering
cm_custom = confusion_matrix(y_mc_t, y_mc_p, labels=[2, 1, 0])
print(cm_custom.tolist())
# [[1, 1, 0],
# [1, 1, 0],
# [0, 0, 2]]
# --- Regression ---
y_t = array([1.0, 2.0, 3.0])
y_p = array([1.5, 2.5, 3.5])
print(mean_squared_error(y_t, y_p)) # 0.25
print(mean_absolute_error(y_t, y_p)) # 0.5
print(r2_score(y_t, y_p)) # negative (worse than mean predictor)
Edge Cases
Scenario |
Behavior |
Rationale |
|---|---|---|
Empty y_true/y_pred (n=0) |
Accuracy returns 0.0; MSE/MAE return nan via division by zero |
No samples to evaluate. Accuracy explicitly guards with |
All-correct predictions |
accuracy=1.0, precision=1.0, recall=1.0, f1=1.0, MSE=0.0, MAE=0.0, R²=1.0 |
Ideal scenario; all metrics at their best possible values. |
All-wrong predictions |
accuracy=0.0; precision depends on FP; recall=0.0; F1=0.0 if P+R=0; R² may be negative |
No correct predictions. R² can be < 0 because predicting the mean would be better. |
No positive predictions (TP+FP=0) |
Precision returns 0.0 |
Guarded: |
No positive ground truth (TP+FN=0) |
Recall returns 0.0 |
Guarded: |
P + R = 0 |
F1 returns 0.0 |
Guarded before the harmonic mean formula to avoid division by zero. |
R² with constant y_true |
Returns 0.0 (ss_tot=0 guard) |
If the target has zero variance, any model that doesn’t predict exactly the constant will have ss_res > 0, but the guard returns 0.0 instead of -inf. |
Multi-class confusion matrix |
Auto-detects labels from the union of y_true and y_pred |
Ensures all classes (even those with zero predictions) appear in the matrix. |
SOV Models (State-Observation-Vector)
from latpy.ml.sov import SOVRegression, SOVClassifier, SOVDynamics
Architecture & Rationale
SOV models use a two-stage architecture:
Encoder (random projection): A fixed random matrix ( W_{\text{enc}} \in \mathbb{R}^{p \times m} ) (entries drawn i.i.d. from ( \mathcal{N}(0, 1/m) )) projects the
p-dimensional input into anm-dimensional latent state space. The encoder is never trained — it is seeded withrandom.seed(42)and fixed after construction.Decoder (learned): A simple model operates in the reduced state space:
SOVRegression: Least-squares linear decoder from states to output.
SOVClassifier: Nearest-centroid classification in state space.
SOVDynamics: Linear state transition matrix A + observation matrix C.
Why random projections? The Johnson-Lindenstrauss lemma guarantees that random projections approximately preserve pairwise distances with high probability when projecting into dimension ( O(\log n) ). This makes the state space a “sketch” of the input that retains enough structure for linear methods to work. The seed ensures reproducibility — the same projection is always used for a given input dimension.
Why SOV exists: For very high-dimensional inputs where memory or compute is constrained, random projections provide a fast, lightweight dimensionality reduction that doesn’t require the full SVD or gradient-based training. The encoder is O(pm) once, the decoder is trained in the smaller state space.
Complexity: O(nds) where n = samples, d = features, s = n_states (the latent dimension).
SOVRegression
Random-projection encoder + least-squares decoder.
Training: Normalizes features, projects to latent states via random matrix, then solves ( Z^T Z w = Z^T y ) via Gaussian elimination on the state-space representation.
Signature |
Description |
|---|---|
|
SOV regression model |
|
Fit encoder/decoder |
|
Predict |
|
R² |
Example
from latpy.latmath.array import array
from latpy.ml.sov import SOVRegression
X = array([[1.0], [2.0], [3.0], [4.0]])
y = array([2.0, 4.0, 6.0, 8.0])
sov = SOVRegression(n_states=2)
sov.fit(X, y)
pred = sov.predict(X)
print(pred.tolist()) # e.g. [2.0, 4.0, 6.0, 8.0] (near-perfect fit)
print(sov.score(X, y)) # ~1.0 (perfect linear relationship captured)
Edge Cases
Scenario |
Behavior |
|---|---|
n_states > n_features |
Works but the random projection expands dimensionality rather than reducing it; decoder still fits via least squares |
n_states = 0 |
Not guarded at init; fit will likely fail with singular Z^T Z |
Predict before fit |
Raises |
SOVClassifier
Random-projection encoder + nearest-centroid classification.
Training: Projects inputs to latent states via random encoder, then computes class centroids in state space. Prediction assigns each point to the nearest centroid.
Signature |
Description |
|---|---|
|
SOV classifier |
|
Fit model |
|
Predict labels |
|
Accuracy |
Example
from latpy.latmath.array import array
from latpy.ml.sov import SOVClassifier
X = array([[1.0, 2.0], [2.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
y = array([0, 0, 1, 1])
clf = SOVClassifier(n_states=2)
clf.fit(X, y)
pred = clf.predict(X)
print(pred.tolist()) # e.g. [0, 0, 1, 1]
print(clf.score(X, y)) # 1.0 (linearly separable in projected space)
Edge Cases
Scenario |
Behavior |
|---|---|
Predict before fit |
Raises |
Large n_states |
Can overfit; state space becomes as large as feature space, nearest-centroid may memorize |
Single class |
Works; all predictions collapse to that class |
SOVDynamics
Discrete-time linear state-space model:
[ z_{t+1} = A , z_t \quad (\text{state transition}) ] [ x_t = C , z_t \quad (\text{observation}) ]
The system is designed to be contractive: A is scaled so its spectral radius < 0.9, ensuring stability. The equilibrium state is the dominant eigenvector of A (the direction that survives repeated application).
Signature |
Description |
|---|---|
|
SOV dynamics model |
|
Initialize stable random A, C |
|
Dominant eigenvector (equilibrium state) |
|
Advance state by one step: z_{t+1} = A @ z |
|
Project state to observation: x = C @ z |
|
Simulate full trajectory |
Example
from latpy.latmath.array import array
from latpy.ml.sov import SOVDynamics
dyn = SOVDynamics(n_states=2, n_obs=3)
dyn.fit_random(seed=42)
# Step
z = array([1.0, 0.0])
z1 = dyn.step(z)
print(z1.tolist()) # e.g. [0.246, -0.121]
# Observe
x = dyn.observe(z)
print(x.tolist()) # e.g. [0.185, -0.291, -0.084]
# Simulate 10 steps
states, obs = dyn.simulate(n_steps=10)
print(states.shape) # (11, 2)
print(obs.shape) # (11, 3)
# Find equilibrium
z_star = dyn.equilibrium()
print(z_star.tolist()) # unit-norm vector, dominant eigenvector of A
# Verify equilibrium: A @ z_star ≈ λ z_star (direction preserved)
z_next = dyn.step(z_star)
cos_sim = sum(z_next[i] * z_star[i] for i in range(2)) / \
(sum(z_next[i]**2 for i in range(2))**0.5 * sum(z_star[i]**2 for i in range(2))**0.5)
print(cos_sim) # > 0.9 (direction nearly preserved)
Edge Cases
Scenario |
Behavior |
Rationale |
|---|---|---|
n_states > n_features |
Works; A is an n_states×n_states matrix independent of input dimension |
The dynamics operate purely in latent state space; input dimension is irrelevant. |
Different random seeds |
Different A, C matrices → different dynamics, different equilibrium |
The seed parameterizes different random systems. |
Unstable system (spectral radius > 1) |
|
Without scaling, power iteration in |
Zero initial state |
|
Linear system: A·0 = 0. Equilibrium is also zero (trivial eigenvector). |
fit_random before equilibrium/step/simulate |
Required; raises |
The matrices must be initialized before use. |
Equilibrium on unstable system |
Power iteration still converges to the dominant eigenvector, but its norm grows without bound |
Since |