# Getting Started with latpy

latpy is a **pure-Python** array and data library built from the ground up for clarity, portability, and zero compiled dependencies. It provides a NumPy-like experience (arrays, broadcasting, linear algebra, stats, ML, labeled tables, and SVG visualisation) entirely in Python, with no C extensions or Fortran libraries.

This guide walks you through every feature with practical examples, explains *why* things work the way they do, and points out edge cases you'll encounter in real use.

---

## Installation

latpy is installed from its Git repository. The `-e` (editable) flag means changes to the source take effect immediately, which is useful if you're developing or debugging.

```bash
git clone https://gitlab.com/tyarc-lab/latpy.git
cd latpy
pip install -e .
```

Alternatively, you can add the source directory to your `PYTHONPATH`. This works when you want to point an existing environment at a specific checkout without re-running pip.

```bash
export PYTHONPATH=/path/to/latpy/src:$PYTHONPATH
```

On Windows (`cmd.exe`), the equivalent is `set PYTHONPATH=C:\path\to\latpy\src;%PYTHONPATH%`.

**Edge case: fresh environment.** If you see `ModuleNotFoundError: No module named 'latpy'`, double-check that (a) the `src/` directory is on `PYTHONPATH`, or (b) `pip install -e .` completed without error. latpy has **no required runtime dependencies**, so a missing module is almost always a path issue.

---

## Your First Array

`array()` and `zeros()` are your primary constructors. They live in the `latpy.latmath.array` namespace and return `NDArray` objects, the core data structure.

```python
from latpy.latmath.array import array, zeros

# From a Python list
a = array([1, 2, 3, 4, 5])
print(a)        # NDArray([1, 2, 3, 4, 5])
print(a.shape)  # (5,)
print(a.dtype)  # DType(name='i64', code='q', size=8)
```

**Why I64?** When you pass integers, latpy chooses `I64` (signed 64-bit), the widest platform-safe integer type. This avoids overflow on most common operations and mimics NumPy's default int64 on 64-bit platforms.

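
The dtype choice described above can be sketched in plain Python. This is only an illustration of the rule "any float gives `F64`, otherwise integers give `I64`", not latpy's actual code; the helper name `infer_dtype_name` and the decision to treat an empty list as `i64` are assumptions made for the example.

```python
def infer_dtype_name(values):
    """Return 'f64', 'i64', or 'b1' for a (possibly nested) list of scalars."""
    flat = []
    stack = [values]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)      # descend into nested lists
        else:
            flat.append(item)

    if any(isinstance(v, float) for v in flat):
        return "f64"   # any float promotes the whole array
    if flat and all(isinstance(v, bool) for v in flat):
        return "b1"    # only booleans present
    return "i64"       # integers, int/bool mixes, and (by assumption) empty input


print(infer_dtype_name([1, 2, 3]))        # i64
print(infer_dtype_name([1, 2.0]))         # f64
print(infer_dtype_name([[True, False]]))  # b1
```

Note that `bool` is a subclass of `int` in Python, which is why the all-boolean check has to come before the integer fallback.
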
```python
# 2D array
b = array([[1, 2], [3, 4], [5, 6]])
print(b.shape)  # (3, 2)
```

Shapes are always tuples: `(N,)` for 1-D, `(M, N)` for 2-D, etc. A scalar value extracted via `a[0]` is a plain Python `int` or `float`, not a 0-D array.

```python
# Zero-filled
c = zeros((2, 3))
print(c.tolist())  # [[0, 0, 0], [0, 0, 0]]
```

`zeros()` returns an `I64` array by default. Use `dtype="f64"` to get floats.

```python
# Float array
d = array([1.5, 2.5], dtype="f64")
```

**Edge cases:**

- **Empty list:** `array([])` creates an array of shape `(0,)`. Most reductions (`.sum()`, `.mean()`) return `0` or `0.0` on empty arrays; `min()` / `max()` raise `ValueError`.
- **Mixed int/float:** `array([1, 2.0])` promotes to `F64` (float has higher priority). See the type promotion rules in the Data Types section.
- **Ragged nesting:** `array([[1, 2], [3]])` raises `ValueError`; all sub-lists must have the same length.

---

## Data Types

Three built-in dtypes cover the vast majority of use cases. There are no unsigned or half-precision types.

```python
from latpy.latmath.array.dtypes import I64, F64, B1, parse_dtype

# Three built-in dtypes:
#   I64: signed 64-bit integer
#   F64: double-precision float
#   B1:  boolean (0/1)
```

**Why only three?** latpy is designed for teaching, prototyping, and data analysis, not system programming. Restricting to `I64`, `F64`, and `B1` eliminates the "which int size?" confusion that beginners face in NumPy, while covering every operation in this guide.

```python
# Parse from string name:
dt = parse_dtype("f64")  # F64
dt = parse_dtype("b1")   # B1
dt = parse_dtype(None)   # I64 (default)
```

`parse_dtype(None)` returns `I64`, which is the fallback when no dtype is specified.

```python
# DType properties:
print(F64.name)  # "f64"
print(F64.kind)  # "f" ("i" for int, "f" for float, "b" for bool)
print(F64.size)  # 8 (bytes)
```

**Type promotion rules:** When two dtypes meet in an operation:

- `F64` wins over everything (float + int → float, float + bool → float).
- `I64` wins over `B1` (int + bool → int, treating `True` as 1 and `False` as 0).
- `B1` with `B1` stays `B1`.

This mirrors NumPy's "safe" promotion rules but with a much smaller type set.

---

## Indexing

Indexing supports scalars, slices, `None` (newaxis), boolean masks, and integer arrays (fancy indexing). Understanding the **copy vs. view** distinction is critical.

```python
a = array([10, 20, 30, 40, 50])

# Scalar
print(a[0])  # 10
```

Scalar indexing returns a plain Python `int` or `float`.

```python
# Slice (returns view)
print(a[1:4])  # NDArray([20, 30, 40])
```

**Why views?** Slices return a *view* (not a copy): they share memory with the original. This makes slicing cheap (O(1)) and avoids data duplication. Changes to the slice affect the original array, and vice versa. If you need an independent copy, call `.copy()` explicitly.

```python
# Newaxis (inserts dimension of size 1)
print(a[None].shape)     # (1, 5)
print(a[:, None].shape)  # (5, 1)
```

`None` (or `np.newaxis`) inserts a dimension of size 1 at that position. This is primarily used for broadcasting. For example, `a[:, None] - a[None, :]` builds a pairwise-difference matrix.

```python
# Boolean mask
mask = a > 25
print(mask)     # NDArray([0, 0, 1, 1, 1])
print(a[mask])  # NDArray([30, 40, 50])
```

**Boolean indexing always copies.** Because the selected elements may not occupy a contiguous memory region, latpy returns a fresh array.

```python
# Fancy indexing
idx = array([0, 2, 4])
print(a[idx])  # NDArray([10, 30, 50])
```

**Fancy indexing (using integer arrays) also always copies.** This matches NumPy's contract: when you index with an array of positions, the result is guaranteed to be contiguous and independent of the original.

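
The view-versus-copy distinction above can be seen with nothing but the standard library. The sketch below uses `array.array` and `memoryview` as stand-ins for a shared buffer; it does not use latpy and says nothing about how `NDArray` is implemented internally.

```python
from array import array as buffer_array  # stdlib typed buffer, unrelated to latpy

data = buffer_array("q", [10, 20, 30, 40, 50])  # "q" = signed 64-bit integers

# A slice of a memoryview shares memory with the underlying buffer (a view).
view = memoryview(data)[1:4]
view[0] = 99
print(data.tolist())    # [10, 99, 30, 40, 50]

# Gathering by position builds a new buffer, so the result is an independent copy.
picked = buffer_array("q", (data[i] for i in [0, 2, 4]))
picked[0] = -1
print(data.tolist())    # [10, 99, 30, 40, 50]
print(picked.tolist())  # [-1, 30, 50]
```

Writing through the view changes the original buffer, while writing to the gathered copy leaves it untouched, which is the same contract the slice and fancy-indexing rules describe.
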
```python
# 2D fancy indexing (row selection + paired indices)
A = array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(A[array([0, 2])])                 # NDArray([[10, 20, 30], [70, 80, 90]])
print(A[array([0, 2]), array([1, 2])])  # NDArray([20, 90])
print(A[array([0, 1]), 1])              # NDArray([20, 50])
```

When you pass *two* index arrays (`A[[i0, i1], [j0, j1]]`), they are paired element-wise: you get `A[i0, j0], A[i1, j1]`. Mixing a 1-D array with a scalar broadcasts the scalar.

**Edge cases:**

- **Out-of-bounds scalar:** `a[100]` raises `IndexError`. latpy validates bounds eagerly.
- **Out-of-bounds slice:** `a[3:100]` does **not** raise; it silently returns whatever elements overlap (like Python's own list slicing).
- **Boolean mask size mismatch:** `a[array([True, False])]` on a length-5 array raises `IndexError`; the mask must match the axis size.
- **Empty slice:** `a[2:2]` returns an empty array of shape `(0,)`.

---

## Operations

Arithmetic, comparison, and reduction operations follow **NumPy broadcasting semantics** and **type promotion** rules.

```python
a = array([1, 2, 3])
b = array([4, 5, 6])

# Arithmetic
print(a + b)   # NDArray([5, 7, 9])
print(a - b)   # NDArray([-3, -3, -3])
print(a * b)   # NDArray([4, 10, 18])
print(a / b)   # NDArray([0.25, 0.4, 0.5])
print(a ** 2)  # NDArray([1, 4, 9])
```

**Why NumPy broadcasting?** Broadcasting allows arrays of different shapes to be combined without explicit looping or replication. latpy follows the same rules:

1. Right-align shapes, padding the shorter one with leading 1s: `(3,)` vs `()` becomes `(3,)` vs `(1,)`.
2. Dimensions of size 1 stretch to match the other.
3. Mismatched sizes raise `ValueError`.

So `a ** 2` works because `2` (shape `()`) broadcasts to `(3,)`. Likewise, `a + array([[1], [2]])` broadcasts `(3,)` against `(2, 1)`: the shapes align as `(1, 3)` and `(2, 1)`, giving a result of shape `(2, 3)`.

**Why division returns float:** `a / b` with integer entries produces `F64` results. This avoids integer-truncation surprises. All other operations preserve dtype unless promotion is needed (e.g., `F64 + I64 → F64`).

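
The three broadcasting rules listed in this section can be sketched as a small shape calculator. The function name `broadcast_shape` is made up for this example, and the code illustrates the NumPy-style rule rather than latpy's internals.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: right-align by padding the shorter shape with leading 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)   # Rule 2: a size-1 dimension stretches
        elif dim_a == 1:
            result.append(dim_b)
        else:
            # Rule 3: mismatched sizes cannot be combined
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)


print(broadcast_shape((3,), ()))      # (3,)
print(broadcast_shape((3,), (2, 1)))  # (2, 3)
```

Working through the shapes first is often the quickest way to debug a broadcasting `ValueError` before looking at the data itself.
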
```python
# Comparisons (returns B1 mask)
print(a > 1)   # NDArray([0, 1, 1])  (B1 dtype)
print(a == 2)  # NDArray([0, 1, 0])
```

Comparisons **always** return a `B1` (0/1) array, even when comparing floats or mixed types.

```python
# Reductions
print(a.sum())   # 6
print(a.mean())  # 2.0
```

**Why reductions support an `axis` parameter:** When you call `a.sum(axis=0)` on a 2-D array, you collapse that dimension, which is useful for row-wise or column-wise aggregation. Without `axis`, reductions flatten the array. The `axis` parameter exists so you can control *which* dimension to eliminate.

**Edge cases:**

- **Empty array:** `array([]).sum()` returns `0` (the identity). `array([]).mean()` returns `0.0`. `array([]).min()` raises `ValueError`, since there is no minimum of nothing.
- **Integer overflow:** latpy does **not** check for overflow on `I64`. `array([2**62, 2**62]).sum()` exceeds the signed 64-bit range and will silently wrap on CPython. Use `F64` for large accumulations.
- **Division by zero:** `array([1, 0]) / array([0, 0])` produces `F64` values of `inf` and `nan`, not an exception.

---

## Linear Algebra

Linear algebra routines live in `latmath.array.linalg`. They operate on 1-D and 2-D `NDArray` objects and are pure-Python implementations (not LAPACK wrappers).

```python
from latpy.latmath.array.linalg import sub, ssd, argmin, qr, eig, solve

a = array([1, 2, 3, 4])
b = array([4, 3, 2, 1])
print(sub(a, b))  # NDArray([-3, -1, 1, 3])
print(ssd(a))     # 5.0 (sum of squared diffs from mean)
print(argmin(a))  # NDArray([0]) (index of minimum)
```

`sub` computes element-wise subtraction (equivalent to `a - b` but explicit). `ssd` computes the sum of squared deviations from the mean, a building block for variance. `argmin` returns the *index* of the minimum as an array (so you can use it for fancy indexing).

```python
# QR decomposition
A = array([[3.0, 2.0], [1.0, 4.0]], dtype="f64")
Q, R = qr(A)
```

QR decomposition factors `A = Q @ R` where `Q` is orthogonal and `R` is upper-triangular. It is used internally for least-squares and eigenvalue computation.

```python
# Dominant eigenvalue
lam, v = eig(A)
print(lam)  # ~5.0 (the eigenvalues of A are 5 and 2)
```

`eig` computes the **dominant** eigenvalue (largest magnitude) and its corresponding eigenvector via power iteration; it does **not** return all eigenvalues. For the full spectrum, use `np.linalg.eigvals` from the NumPy compat layer (which calls `numpy` if available).

```python
# Linear solve
x = solve(A, array([5.0, 6.0], dtype="f64"))
print(x.tolist())  # [0.8, 1.3]
```

`solve` solves `Ax = b` using Gaussian elimination. It requires `A` to be square and invertible.

**Edge cases:**

- **Non-square matrices for `solve`:** Raises `ValueError`; only square systems are supported.
- **Singular matrix for `solve`:** Raises `LinAlgError`; no solution exists (or there are infinitely many).
- **QR on rectangular matrices:** Supported, but `Q` and `R` may not be the "thin" form you expect from LAPACK.
- **Power iteration convergence:** `eig` on a matrix with two eigenvalues of equal magnitude may not converge to a single result.

---

## NumPy Compatibility Layer

The `np` singleton wraps a subset of the NumPy API so that you can write latpy code that *looks* like NumPy, which is useful for migration or for users familiar with the NumPy ecosystem.

```python
from latpy.latmath.array.numpy_compat import np

a = np.array([1, 2, 3, 4, 5])
b = np.zeros((2, 3), dtype=np.float64)
c = np.eye(3)
d = np.arange(0, 10, 2)   # NDArray([0, 2, 4, 6, 8])
e = np.linspace(0, 1, 5)  # NDArray([0., 0.25, 0.5, 0.75, 1.])
f = np.concatenate([a, np.array([6, 7])])
g = np.dot(a, a)          # 55 (dot product)
```

**Why a compatibility layer?** If you already know NumPy, `np` removes the need to learn a new API. It also makes it easy to swap real NumPy in later if performance demands it: just change the import. Note that `np` here is **not** the real NumPy; it's a latpy object that returns `NDArray`.

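
One hedged way to keep that swap to a single line is to resolve the array backend in one place. The helper `resolve_backend` below is hypothetical (it is not part of latpy); it only relies on `importlib.import_module` from the standard library and is demonstrated with stdlib module names so that it runs anywhere.

```python
import importlib


def resolve_backend(candidates):
    """Return the first importable backend from (module_name, attribute) pairs."""
    for module_name, attribute in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed, try the next candidate
        # Return either the module itself or a named object inside it.
        return getattr(module, attribute) if attribute else module
    raise ImportError("none of the candidate backends could be imported")


# Demonstrated with stdlib names; the first candidate deliberately does not exist.
backend = resolve_backend([("module_that_does_not_exist", None), ("math", None)])
print(backend.sqrt(16.0))  # 4.0
```

In a project following this guide, the candidate list might be `[("numpy", None), ("latpy.latmath.array.numpy_compat", "np")]`, so that real NumPy is preferred when present and the compatibility object is the fallback.
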
```python
# Random
np.random.seed(42)
r = np.random.randn(3, 3)
```

Seeding and random generation are forwarded to latpy's own `random` module, not `numpy.random`.

```python
# Linear algebra
M = np.array([[1, 2], [3, 4]])
print(np.linalg.det(M))  # -2.0
print(np.linalg.inv(M))
print(np.linalg.qr(M))
```

`np.linalg` provides `det`, `inv`, `qr`, `eigvals`, `solve`, and more. Each delegates to the corresponding latpy `linalg` module.

**Edge cases:**

- **`np.float64` vs `np.float32`:** Only `np.float64` is defined; `np.float32` raises `AttributeError`. Use `F64` or `"f64"` for precision control.
- **`np.random.rand` vs `np.random.randn`:** Both exist but return latpy arrays, not NumPy arrays.
- **Missing functions:** `np.fft`, `np.linalg.svd`, `np.unique` are not provided. Check `dir(np)` for available names.

---

## Statistics

The `stats` module provides descriptive statistics, histogramming, moment calculations, and probability density functions, with no SciPy required.

```python
from latpy.latmath.stats import describe, histogram, skew, kurtosis, norm_pdf

a = array([1, 2, 3, 4, 5])

# Descriptive summary
desc = describe(a)
print(desc["mean"])  # 3.0
print(desc["std"])   # 1.414...
```

`describe` returns a dict with `min`, `max`, `mean`, `median`, `std`, `q1`, `q3`. The value shown is the population standard deviation (denominator `n`): for `[1, 2, 3, 4, 5]` that is √2 ≈ 1.414, whereas the sample standard deviation (denominator `n-1`) would be about 1.581.

```python
# Histogram
counts, edges = histogram(a, bins=3, range_=(1, 5))
print(counts.tolist())  # [2, 1, 2]
```

`histogram` computes bin counts and bin edges. The `range_` parameter constrains the domain (data outside is ignored). Note the trailing underscore to avoid shadowing Python's `range`. The counts shown assume NumPy-style equal-width bins with a closed last bin: the edges are 1, 2.33, 3.67, and 5, so the bins hold `{1, 2}`, `{3}`, and `{4, 5}`.

```python
# Moments
print(skew(a))      # 0.0
print(kurtosis(a))  # -1.3
```

Skewness measures asymmetry (0 = symmetric). Kurtosis here is *excess* kurtosis (Fisher definition, normal = 0). A continuous uniform distribution has excess kurtosis -1.2, so a value of `-1.3` for the five equally spaced points `[1, 2, 3, 4, 5]` is expected.

```python
# Normal PDF
print(norm_pdf(0.0))  # 0.3989...
```

`norm_pdf(x)` returns the standard Normal PDF at `x`: `exp(-x²/2) / sqrt(2π)`.

**Edge cases:**

- **Single-element array:** `describe([5])` works, but `std` will be 0 (single value, zero variance).
- **Zero-variance data:** `skew([5, 5, 5])` returns `nan`; skewness is undefined when there is no spread.
- **Histogram with zero bins or invalid range:** Raises `ValueError`.
- **Out-of-range `norm_pdf`:** Handled gracefully. `norm_pdf(1e10)` returns `0.0` (underflow to zero is mathematically correct).

---

## Random Numbers

The `random` module provides a self-contained pseudo-random number generator (Mersenne Twister, independently implemented). It does **not** depend on Python's `random` or NumPy's.

```python
from latpy.latmath.random import seed, rand, randn, randint, uniform, choice, shuffle

seed(42)  # deterministic reproducibility
```

**Why `seed(42)`?** Setting a seed makes random output deterministic and reproducible. This is essential for tests, tutorials, and debugging. Any integer seed works; 42 is simply a convention.

```python
# Continuous
print(randn(3).tolist())       # 3 standard normal samples
print(uniform(0, 10, size=4))  # 4 samples in [0, 10)
print(rand(2, 2).tolist())     # 2x2 uniform[0,1)
```

- `randn(n)` samples from N(0, 1) using the Box-Muller transform.
- `uniform(low, high, size=n)` samples uniformly in `[low, high)`.
- `rand(m, n)` is shorthand for `uniform(0, 1, size=(m, n))`.

```python
# Discrete
print(randint(0, 10, size=5))  # 5 integers 0..9
```

`randint(low, high, size)` samples uniformly from `{low, low+1, ..., high-1}`. The upper bound is **exclusive**, matching NumPy's convention.

```python
# Sampling
deck = array([1, 2, 3, 4, 5])
print(choice(deck, size=3))  # 3 draws with replacement
shuffle(deck)                # in-place shuffle
```

`choice` draws with replacement by default (elements can repeat). `shuffle` permutes the array **in place** and returns `None`.

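
The Box-Muller transform mentioned for `randn` turns pairs of uniform samples into pairs of standard normal samples. The sketch below is a generic version of that transform and not latpy's implementation: it draws its uniforms from Python's standard `random.Random` purely for convenience, and the function name `box_muller` is an assumption for the example.

```python
import math
import random


def box_muller(n, rng):
    """Return n standard normal samples built from uniform draws of rng."""
    samples = []
    while len(samples) < n:
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always finite
        u2 = rng.random()        # in [0, 1)
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent normal samples.
        samples.append(radius * math.cos(angle))
        samples.append(radius * math.sin(angle))
    return samples[:n]


draws = box_muller(10_000, random.Random(42))
mean = sum(draws) / len(draws)
variance = sum((x - mean) ** 2 for x in draws) / len(draws)
print(round(mean, 2), round(variance, 2))  # should be close to 0 and 1
```

Passing the generator in as an argument keeps the sketch reproducible: two generators created with the same seed produce the same samples.
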
**Edge cases:**

- **Seed repeatability:** Two `seed(42)` calls in the same session reset the generator to the same state, so you'll get the same sequence again.
- **Empty array in `choice`:** Raises `ValueError`.
- **`size=0` in `randint` or `uniform`:** Returns an empty array of shape `(0,)`.
- **`shuffle` on empty or 1-element array:** No-op (no error).

---

## Working with Labeled Data

latpy's `latdata` module adds named axes (like a pandas Index) on top of NDArray, giving you labeled tables.

```python
from latpy.latdata import Axis, Table

# Named axis
rows = Axis("row", ["a", "b", "c"])
cols = Axis("col", ["x", "y"])

# Table from nested lists
t = Table.from_list(
    [[1, 2], [3, 4], [5, 6]],
    row_labels=["a", "b", "c"],
    col_labels=["x", "y"],
)
```

`Table.from_list` infers the shape from the data and creates named axes. The underlying data is an `NDArray` stored at `t.data`.

```python
# Label-based indexing
print(t["a", "x"])    # 1
print(t["a":"c", :])  # Table (3 rows, 2 cols)
```

**How label indexing works:** The first index labels rows, the second labels columns. Slices use label strings (not positions): `"a":"c"` selects rows from `"a"` up to **and including** `"c"`, unlike Python slices, which exclude the end. This matches pandas' label-slice behavior.

**Edge cases:**

- **Key not found:** `t["z", :]` raises `KeyError`.
- **Duplicate labels:** Not prohibited. Label-based slicing on duplicate labels may skip intervening rows.
- **Slice with non-existent label:** `t["a":"z", :]` raises `KeyError`; the end label must exist.
- **Empty table:** `Table.from_list([[]])` is allowed; shape will reflect the empty dimension.

---

### GroupBy

GroupBy partitions a table's rows (or columns) by matching label values, like SQL's `GROUP BY` or pandas' `groupby`.

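
The partitioning idea can be sketched in plain Python before looking at latpy's own API. The helpers `group_rows` and `column_sums` are hypothetical names used only here; the sketch assumes rows are plain lists and that groups should appear in order of first appearance.

```python
def group_rows(labels, rows):
    """Collect rows that share a label, keeping first-appearance order."""
    groups = {}  # dicts preserve insertion order
    for label, row in zip(labels, rows):
        groups.setdefault(label, []).append(row)
    return groups


def column_sums(rows):
    """Sum a list of equal-length rows column by column."""
    return [sum(column) for column in zip(*rows)]


groups = group_rows(["a", "b", "a"], [[1, 2], [3, 4], [5, 6]])
sums = {label: column_sums(rows) for label, rows in groups.items()}
print(sums)  # {'a': [6, 8], 'b': [3, 4]}
```

Any other per-group reduction (mean, count) follows the same pattern: partition first, then reduce each partition independently.
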
```python
from latpy.latdata import GroupBy

t = Table.from_list(
    [[1, 2], [3, 4], [5, 6]],
    row_labels=["a", "b", "a"],
    col_labels=["x", "y"],
)

# Group rows by label
gb = GroupBy(t, "row")
print(gb.sum().data.tolist())    # [[6, 8], [3, 4]]
print(gb.mean().data.tolist())   # [[3.0, 4.0], [3.0, 4.0]]
print(gb.count().data.tolist())  # [[2], [1]]
```

Rows `"a"` (indices 0 and 2) are aggregated together; row `"b"` (index 1) forms its own group. `sum`, `mean`, and `count` each reduce the grouped axis. The `count` result has shape `(2, 1)` because count returns a single value per group (not per column).

**Edge cases:**

- **No matching groups:** A label value with no rows is simply absent from the result.
- **Unordered labels:** Groups are returned in order of first appearance, not sorted.
- **Single group:** If all labels are identical, `gb.sum()` returns a single-row table.

---

## I/O

latpy reads and writes arrays and tables in CSV and JSON formats. CSV is portable (works with Excel, pandas); JSON preserves dtype, shape, and axis metadata.

```python
from latpy.io import save_csv, load_csv, save_json, load_json

a = array([[1, 2, 3], [4, 5, 6]])

# CSV with automatic header
save_csv("data.csv", a)
b = load_csv("data.csv")
```

CSV output includes a header row by default. Data is comma-separated with each row on its own line. On load, latpy infers dtype from the CSV content: integer-only data loads as `I64`, and a column containing decimal points becomes `F64`.

```python
# JSON with full metadata (dtype, shape, axes)
from latpy.latdata import Axis, Table

save_json("data.json", a)
c = load_json("data.json")
```

JSON output stores the array as a flat list alongside `dtype`, `shape`, and optionally axis labels. This means `save_json` / `load_json` round-trip perfectly, even for labeled `Table` objects.

**Edge cases:**

- **Empty array to CSV:** Writes a file containing only the header row. On load, you get an array of shape `(0,)`.
- **Missing file:** `load_csv` / `load_json` raise `FileNotFoundError`.
- **Corrupt JSON:** Raises `json.JSONDecodeError`.
- **Non-ASCII data:** CSV is written with UTF-8 encoding; JSON uses ASCII-safe escaped Unicode by default.

---

## Machine Learning

The `ml` module provides simple, pure-Python implementations of common ML algorithms: k-means, linear regression, and classification metrics. These are **not** production-grade (no GPU, no regularization paths), but are suitable for learning, prototyping, and small datasets.

```python
from latpy.ml import kmeans, LinearRegression, accuracy, f1_score, confusion_matrix

# K-Means clustering
X = array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
           [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
centroids, labels, inertia = kmeans(X, k=2)
print(labels.tolist())  # cluster assignments
```

`kmeans` uses random initialisation (not k-means++), so results vary between runs unless you `seed()` first. `centroids` holds the final cluster centers (shape `(k, n_features)`), `labels` is the assignment per point, and `inertia` is the sum of squared distances to the nearest centroid.

**Seed dependence:** `kmeans` is particularly sensitive to the random seed. Calling `seed(42)` before `kmeans` guarantees reproducible cluster assignments.

```python
# Linear regression
X = array([[1.0], [2.0], [3.0], [4.0]])
y = array([2.0, 4.0, 6.0, 8.0])
lr = LinearRegression()
lr.fit(X, y)
print(lr.predict(array([[5.0]])))  # ~10.0
print(lr.score(X, y))              # R² = 1.0
```

`LinearRegression` fits an ordinary least-squares model. The `score` method returns R² (coefficient of determination). A perfect fit gives 1.0.

```python
# Classification metrics
y_true = array([1, 0, 1, 1, 0])
y_pred = array([1, 0, 0, 1, 0])
print(accuracy(y_true, y_pred))  # 0.8
print(f1_score(y_true, y_pred))  # 0.8
print(confusion_matrix(y_true, y_pred).tolist())
```

Metrics compare predicted labels against ground truth. `confusion_matrix` returns a 2-D array where row `i`, column `j` counts the number of times true class `i` was predicted as class `j`.

**Edge cases:**

- **k=1 for kmeans:** Returns a single centroid at the data mean; `inertia` is the total sum of squared distances from that mean (the unnormalised variance).
- **k > n_samples:** Raises `ValueError`; you cannot have more clusters than data points.
- **LinearRegression with singular X:** Raises `LinAlgError` if the normal equations matrix is not invertible.
- **All-zero predictions in `f1_score`:** Raises `ZeroDivisionError`; F1 is undefined when both precision and recall are zero.
- **Mismatched `y_true` / `y_pred` lengths:** Raises `ValueError`.

---

## SOV Models (State-Observation-Vector)

SOV is a lightweight state-space / linear dynamical system framework built on latpy arrays. It models hidden states that evolve linearly and emit observations.

```python
from latpy.ml.sov import SOVRegression, SOVClassifier, SOVDynamics

# SOV Regression
X = array([[1.0], [2.0], [3.0]])
y = array([2.0, 4.0, 6.0])
sov = SOVRegression(n_states=2)
sov.fit(X, y)
print(sov.score(X, y))
```

SOVRegression maps observations `X` to outputs `y` through a latent state of dimension `n_states`. The internal state captures temporal or latent structure that a direct regression might miss.

```python
# SOV dynamics simulation
dyn = SOVDynamics(n_states=2, n_obs=3)
dyn.fit_random(seed=42)
states, obs = dyn.simulate(n_steps=10)
print(states.shape)                # (11, 2)
print(dyn.equilibrium().tolist())  # stable equilibrium state
```

`simulate` runs the dynamics forward for `n_steps` timesteps, returning both hidden states (shape `(n_steps+1, n_states)`) and observations (shape `(n_steps, n_obs)`). The extra `+1` on states is the initial state. `equilibrium()` computes the fixed point `S = A @ S` (if stable).

**Edge cases:**

- **`n_states` larger than data rank:** The fit may underdetermine the state.
- **Unstable dynamics:** `equilibrium()` may diverge if the transition matrix has eigenvalues > 1 in magnitude.

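
The shapes returned by `simulate` follow directly from how a linear state-space model is stepped. The sketch below is a plain-Python illustration under stated assumptions, not latpy's `SOVDynamics`: it assumes a noise-free update `s[t+1] = A @ s[t]`, an observation `o[t] = C @ s[t+1]`, and hand-picked matrices `A` and `C`.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def simulate(A, C, initial_state, n_steps):
    """Step a noise-free linear system; returns (states, observations)."""
    states = [list(initial_state)]   # the initial state accounts for the extra +1
    observations = []
    for _ in range(n_steps):
        next_state = matvec(A, states[-1])
        states.append(next_state)
        observations.append(matvec(C, next_state))
    return states, observations


A = [[0.5, 0.0], [0.0, 0.25]]             # eigenvalues 0.5 and 0.25: stable
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 observations from 2 states
states, obs = simulate(A, C, [1.0, 1.0], n_steps=10)
print(len(states), len(states[0]))  # 11 2
print(len(obs), len(obs[0]))        # 10 3
print(states[-1])                   # decays toward the fixed point [0.0, 0.0]
```

Because both eigenvalues of this `A` are below 1 in magnitude, the state shrinks toward the zero vector, which is the only solution of `S = A @ S` here; an eigenvalue above 1 would make the same loop grow without bound.
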
---

## Visualization

latpy's `viz` module renders plots as **SVG**, a vector format viewable in any browser. No GUI, no matplotlib, no JavaScript.

```python
from latpy.viz import plot, scatter, bar, hist, Figure

# Line plot (auto-scales, returns SVG)
fig, line_el = plot([1, 2, 3, 4, 5], [2, 4, 1, 8, 6])
fig.save("line_plot.svg")
```

`plot(x, y)` automatically determines axis ranges to fit all data points. It returns a `Figure` object and the SVG line element. The figure is not displayed automatically; you must call `fig.save(filename)` to write the SVG file.

```python
# Scatter plot
fig, dots = scatter([1, 2, 3, 4], [2, 5, 3, 7], r=5)
fig.save("scatter.svg")
```

The `r=5` argument controls the radius of each plotted circle.

```python
# Bar chart
fig, bars = bar(["A", "B", "C", "D"], [3, 7, 2, 5])
fig.save("bar.svg")
```

Labels are strings; numeric axes are auto-scaled.

```python
# Histogram
fig, bins_ = hist([1, 1, 2, 3, 3, 3, 4, 5], bins=4)
fig.save("hist.svg")
```

`hist(data, bins=n)` bins the data, computes frequencies, and draws rectangles.

```python
# Graph visualization
from latpy.viz import draw_graph

svg = draw_graph(
    ["A", "B", "C", "D"],
    [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")],
    width=400,
    height=300,
)
with open("graph.svg", "w") as f:
    f.write(svg)
```

`draw_graph` returns a raw SVG string (not a `Figure`) that you write to a file yourself. It places nodes in a simple layout and draws edges with lines.

**Why SVG?** SVG is pure text (XML), not a binary format. You can view it in any browser, embed it in web pages, and include it in Jupyter Notebooks (the browser renders it inline). The downside is that SVG files can be larger than PNG for the same data.

**Edge cases:**

- **Empty data for `plot`:** Raises `ValueError`; at least two points are needed.
- **Single bar for `bar`:** Works fine; a lone bar is drawn.
- **All identical values for `plot`:** The y-range defaults to `[value-1, value+1]` to avoid a zero-height plot.
- **SVG showing as text in browser:** This happens if you open the `.svg` file in a text editor or serve it with the wrong MIME type. Save with a `.svg` extension and open in a browser, or configure your server to serve `.svg` files as `image/svg+xml`.

---

## Performance Notes

latpy is a **pure-Python** library with no C, Cython, or Fortran extensions. This means:

- **Slower than NumPy:** For most operations (addition, multiplication, reductions), latpy is 10–50× slower than NumPy, because the tight loops run in CPython rather than compiled C. This is acceptable for teaching, prototyping, and datasets under ~10⁵ elements.
- **Comparable to native Python lists:** For small arrays (under ~1,000 elements), latpy overhead is small, on par with manual list comprehensions.
- **No parallelism:** latpy does not use threads, multiprocessing, or SIMD. All operations are single-threaded Python.

**Big-O complexity of common operations:**

| Operation | Complexity | Notes |
|-----------|-----------|-------|
| `sum()`, `mean()`, `min()`, `max()` | O(n) | Single pass over data |
| `a + b`, `a * b` (element-wise) | O(n) | Array-with-scalar operations are also O(n) |
| `dot(a, b)` | O(n) for 1-D, O(mnk) for matmul | |
| `solve(A, b)` | O(n³) | Gaussian elimination |
| `eig(A)` | O(k·n²) | Power iteration, k iterations |
| `qr(A)` | O(m·n²) | Gram-Schmidt |
| `sort()` | O(n log n) | Uses Python's TimSort |
| `kmeans()` | O(k·n·d·iter) per run | k clusters, n points, d dimensions |
| `histogram(data, bins=b)` | O(n + b) | Count in bins |

**Migration path:** If you outgrow latpy's performance, switch to NumPy by:

1. Using `np = latpy.latmath.array.numpy_compat.np`; the API is similar.
2. Converting latpy arrays to NumPy with `np.array(ndarray_obj)` (requires real NumPy installed).
3. Replacing `from latpy.ml import ...` with `sklearn` equivalents.

---

## Troubleshooting

### "Why is my array `I64` when I passed floats?"

latpy picks the dtype that fits **all** input values. For example:

```python
a = array([1, 2, 3])    # all ints → I64
b = array([1, 2, 3.0])  # has a float → F64
```

If you explicitly want floats, pass `dtype="f64"` or include a decimal value. See the "Type promotion rules" in the Data Types section.

### "Why did my indexing return a copy instead of a view?"

Only **slices** (and `None`) return views. Everything else returns a copy:

- `a[1:4]` → **view** (contiguous, shared memory)
- `a[[0, 2, 4]]` → **copy** (fancy indexing, non-contiguous)
- `a[a > 25]` → **copy** (boolean mask, non-contiguous)

Note that `arr[1:4] is arr` is `False` even for views, because Python's `is` compares object identity rather than underlying memory; modifications to the slice still affect the original. If you need a guaranteed copy, call `.copy()`.

### "Why did `kmeans` give different results each time?"

`kmeans` initialises centroids randomly. Without a fixed seed, each run picks different starting points, which can converge to different local minima. For reproducible results:

```python
from latpy.latmath.random import seed

seed(42)  # or any integer
centroids, labels, inertia = kmeans(X, k=2)
```

For a deterministic run, also consider that the data order affects tie-breaking in label assignment.

### "Why is my SVG showing as text?"

You're likely viewing the `.svg` file in a text editor or terminal. SVG is plain XML, so it looks like `<svg ...>` markup rather than a picture. To see the rendered graphic:

- **Save to a `.svg` file**, then open that file in a web browser (Chrome, Firefox, Edge).
- **Serve over HTTP** with the correct MIME type: your web server must serve `.svg` files as `image/svg+xml`. If you see XML in the browser, the MIME type is wrong.
- **In Jupyter Notebook**, calling `fig._repr_svg_()` (if available) or using `IPython.display.SVG(fig.svg)` will render inline.

### "ModuleNotFoundError: No module named 'latpy'"

Python cannot find the latpy package. Solutions:

1. **Did you install?** Run `pip install -e .` from the `latpy/` directory (the one containing `pyproject.toml` or `setup.py`).
2. **Check PYTHONPATH:** The `src/` directory inside the repository must be discoverable. Either install with pip, or set:
   - **Windows:** `set PYTHONPATH=C:\path\to\latpy\src;%PYTHONPATH%`
   - **Linux/macOS:** `export PYTHONPATH=/path/to/latpy/src:$PYTHONPATH`
3. **Virtual environment:** If you're using a virtual environment, activate it before running pip install. A library installed globally won't be visible inside a virtual environment (and vice versa).
4. **Spelling:** The package name is `latpy` (all lowercase, no hyphen). `import latpy` works; `import latPy` does not.

---

## Next Steps

- Browse the [API documentation](api/overview.md) for detailed reference
- Run tests: `python -m pytest tests/`
- Read the [CHANGELOG](../CHANGELOG.md) for version history