OPEN-SOURCE SCRIPT

Static K-means Clustering | InvestorUnknown

Static K-Means Clustering is a machine-learning-driven market regime classifier designed for traders who want a data-driven structure instead of subjective indicators or manually drawn zones.

This script performs offline (static) K-means training on your chosen historical window, using four engineered features:
  • RSI (Momentum)
  • CCI (Price deviation / Mean reversion)
  • CMF (Money flow / Strength)
  • MACD Histogram (Trend acceleration)


It groups past market conditions into K distinct clusters (regimes). After training, every new bar is assigned to the nearest cluster via Euclidean distance in 4-dimensional standardized feature space.

This allows you to create models like:
  • Regime-based long/short filters
  • Volatility phase detectors
  • Trend vs. chop separation
  • Mean-reversion vs. breakout classification
  • Volume-enhanced money-flow regime shifts
  • Full machine-learning trading systems based solely on regimes


Note:
  • This script is not a universal ML strategy out of the box.
  • The user must engineer the feature set to match their trading style and target market.
  • K-means is a tool, not a ready-made system; this script provides the framework.


Core Idea
K-means clustering takes raw, unlabeled market observations and attempts to discover structure by grouping similar bars together.

Pine Script®
// STEP 1: DATA POINTS ON A COORDINATE PLANE
// We start with raw, unlabeled data scattered in 2D space (x/y).
// At this point, nothing is grouped; these are just observations.
// K-means will try to discover structure by grouping nearby points.
//
//  y ↑
//    |
// 12 |                                 •
//    |      •
// 10 |            •
//    |                              •
//  8 |      •                 •
//    |
//  6 |                           •
//    |
//  4 |               •
//    |
//  2 |______________________________________________→ x
//        2     4     6     8     10    12    14
//
//
// STEP 2: RANDOMLY PLACE INITIAL CENTROIDS
// The algorithm begins by placing K centroids at random positions.
// These centroids act as the temporary “representatives” of clusters.
// Their starting positions heavily influence the first assignment step.
//
//  y ↑
//    |
// 12 |                                 •
//    |      •
// 10 |            •           C2 ×
//    |                              •
//  8 |      •                 •
//    |
//  6 |      C1 ×                 •
//    |
//  4 |               •
//    |
//  2 |______________________________________________→ x
//        2     4     6     8     10    12    14
//
//
// STEP 3: ASSIGN POINTS TO NEAREST CENTROID
// Each point is compared to all centroids.
// Using simple Euclidean distance, each point joins the cluster
// of the centroid it is closest to.
// This creates a temporary grouping of the data.
//
// (Coloring concept shown using labels)
//
// - Points closer to C1 → Cluster 1
// - Points closer to C2 → Cluster 2
//
//  y ↑
//    |
// 12 |                                 2
//    |      1
// 10 |            1           C2 ×
//    |                              2
//  8 |      1                 2
//    |
//  6 |      C1 ×                 2
//    |
//  4 |               1
//    |
//  2 |______________________________________________→ x
//        2     4     6     8     10    12    14
//
// (1 = assigned to Cluster 1, 2 = assigned to Cluster 2)
// At this stage, clusters are formed purely by distance.


Your chosen historical window becomes the static training dataset, and after fitting, the centroids never change again.

This makes the model:
  • Predictable
  • Repeatable
  • Consistent across backtests
  • Fast for live use (no recalculation of centroids every bar)


Static Training Window

You select a period with:
  • Training Start
  • Training End


Only bars inside this range are used to fit the K-means model. This window defines:
  • the market regime examples
  • the statistical distributions (means/std) for each feature
  • how the centroids will be positioned post-training


Snapshot (bar coloring by phase):
  • Bars before training = fully transparent
  • Training bars = gray
  • Post-training bars = full colored regimes
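A minimal sketch of how the three coloring phases could be derived from the two window inputs. The input names, the default dates, and the stand-in regime color are assumptions for illustration, not the script's actual code:

```pine
//@version=5
indicator("Training window sketch", overlay = true)

// Hypothetical inputs mirroring "Training Start" / "Training End"
train_start = input.time(timestamp("01 Jan 2020 00:00 +0000"), "Training Start")
train_end   = input.time(timestamp("01 Jan 2022 00:00 +0000"), "Training End")

// Phase flags based on the bar's opening time
before_train = time < train_start
in_train     = time >= train_start and time <= train_end

// Stand-in for the regime color; the real script colors by cluster index
regime_color = close >= open ? color.green : color.red

// Pre-training = fully transparent, training = gray, post-training = regime color
bar_col = before_train ? color.new(color.gray, 100) : in_train ? color.gray : regime_color
barcolor(bar_col)
```

Only the `in_train` flag would gate which bars feed the model; the coloring is purely visual.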


Feature Engineering (4D Input Vector)

Every bar during training becomes a 4-dimensional point: [rsi, cci, cmf, macd_histogram]
This combination balances momentum, mean reversion, money flow, and trend acceleration, giving the algorithm a richer "market fingerprint" per bar.
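As a rough sketch, the four features could be computed per bar as follows. The lengths and sources are assumed defaults (the script's actual inputs are not shown here), and CMF is written out by hand from its usual definition:

```pine
//@version=5
indicator("Feature vector sketch")

// Hypothetical lengths, chosen only for illustration
len = input.int(14, "Feature length", minval = 1)

// Momentum
rsi_val = ta.rsi(close, len)

// Price deviation / mean reversion
cci_val = ta.cci(hlc3, len)

// Chaikin Money Flow: sum of money-flow volume divided by sum of volume
bar_range = high - low
mf_mult   = bar_range == 0 ? 0.0 : ((close - low) - (high - close)) / bar_range
cmf_val   = math.sum(mf_mult * volume, len) / math.sum(volume, len)

// Trend acceleration: MACD histogram (standard 12/26/9 assumed)
[_, _, macd_hist] = ta.macd(close, 12, 26, 9)

plot(rsi_val, "RSI")
plot(cci_val, "CCI")
plot(cmf_val, "CMF")
plot(macd_hist, "MACD histogram")
```

Plotted raw, the four series sit on very different scales, which is why the features need rescaling before any distance is measured.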

Standardization
To prevent any feature from dominating due to scale differences (e.g., CMF near zero vs CCI ±200), all features are standardized:

Pine Script®
standardize(value, mean, std) => (value - mean) / std
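The mean and standard deviation have to come from the training window only. One way to get them, sketched here for a single feature, is to accumulate running sums while inside the window and freeze them afterwards. The variable names, the default dates, and the use of the population standard deviation are assumptions:

```pine
//@version=5
indicator("Standardization sketch")

train_start = input.time(timestamp("01 Jan 2020 00:00 +0000"), "Training Start")
train_end   = input.time(timestamp("01 Jan 2022 00:00 +0000"), "Training End")
in_train    = time >= train_start and time <= train_end

standardize(value, mean, std) => (value - mean) / std

x = ta.rsi(close, 14)

// Running sums over training bars only; they stop changing after train_end
var float sum_x  = 0.0
var float sum_x2 = 0.0
var int   n      = 0
if in_train and not na(x)
    sum_x  += x
    sum_x2 += x * x
    n      += 1

// Population mean and standard deviation of the training sample
float mean_x = n > 0 ? sum_x / n : na
float var_x  = n > 0 ? math.max(sum_x2 / n - mean_x * mean_x, 0.0) : na
float std_x  = math.sqrt(var_x)

// Z-score for post-training bars, skipping the degenerate zero-variance case
float z = time > train_end and std_x > 0 ? standardize(x, mean_x, std_x) : na
plot(z, "Standardized RSI")
```

The same pattern would be repeated for each of the four features.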


Centroid Initialization

Centroids start at diverse coordinates using various curves:
  • linear
  • sinusoidal
  • sign-preserving quadratic
  • tanh compression


Pine Script®
init_centroids() =>
    // Spread centroids across [-1, 1] using different shapes per feature
    for c = 0 to k_clusters - 1
        frac = k_clusters == 1 ? 0.0 : c / (k_clusters - 1.0) // 0 → 1
        v = frac * 2 - 1 // -1 → +1
        array.set(cent_rsi, c, v) // linear
        array.set(cent_cci, c, math.sin(v)) // sinusoidal
        array.set(cent_cmf, c, v * v * (v < 0 ? -1 : 1)) // quadratic sign-preserving
        array.set(cent_mac, c, tanh(v)) // compressed


This gives the initial centroids a varied, “random”-looking spread, even though the placement is deterministic and true randomness is hard to achieve in Pine Script.
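The initialization function relies on a `tanh` helper and on the `cent_*` arrays, neither of which is shown. A self-contained sketch, assuming a hand-written `tanh` (Pine Script has no built-in one) and one float array per feature, might look like this:

```pine
//@version=5
indicator("Centroid initialization sketch", overlay = true)

k_clusters = input.int(3, "Clusters (K)", minval = 1)

// Hypothetical helper: tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
tanh(float x) =>
    e2x = math.exp(2 * x)
    (e2x - 1) / (e2x + 1)

// One centroid coordinate array per feature, sized to K
var cent_rsi = array.new_float(k_clusters, 0.0)
var cent_cci = array.new_float(k_clusters, 0.0)
var cent_cmf = array.new_float(k_clusters, 0.0)
var cent_mac = array.new_float(k_clusters, 0.0)

init_centroids() =>
    for c = 0 to k_clusters - 1
        frac = k_clusters == 1 ? 0.0 : c / (k_clusters - 1.0)
        v = frac * 2 - 1
        array.set(cent_rsi, c, v)
        array.set(cent_cci, c, math.sin(v))
        array.set(cent_cmf, c, v * v * (v < 0 ? -1 : 1))
        array.set(cent_mac, c, tanh(v))

// Seed the centroids once, on the first bar
if barstate.isfirst
    init_centroids()

// Show the starting coordinates on the last bar
if barstate.islast
    txt = "rsi: " + str.tostring(cent_rsi) + "\ncci: " + str.tostring(cent_cci) + "\ncmf: " + str.tostring(cent_cmf) + "\nmac: " + str.tostring(cent_mac)
    label.new(bar_index, high, txt)
```

With K = 3 the seed values are v = -1, 0, +1, so the linear and quadratic coordinates start at -1, 0, 1, the sinusoidal one at about -0.841, 0, 0.841, and the tanh one at about -0.762, 0, 0.762.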

K-Means Iterative Refinement

The algorithm repeats these steps:
(A) Assignment step: each bar is assigned to the nearest centroid via Euclidean distance in 4D:
  • distance = sqrt(dx² + dy² + dz² + dw²)


(B) Update step: each centroid moves to the mean of the points assigned to it. Both steps repeat for a configurable number of iterations.
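The two steps can be sketched on a toy dataset. This version stores points and centroids in matrices (one row per point, one column per feature) rather than in per-feature arrays, uses 2D data for brevity, and leaves an empty cluster's centroid where it was; all of these are illustrative choices, not the script's implementation:

```pine
//@version=5
indicator("K-means refinement sketch", overlay = true)

// Index of the centroid (row of `cents`) nearest to row `r` of `data`
nearest(matrix<float> data, int r, matrix<float> cents) =>
    int best = 0
    float best_d = na
    for c = 0 to matrix.rows(cents) - 1
        float d = 0.0
        for j = 0 to matrix.columns(cents) - 1
            diff = matrix.get(data, r, j) - matrix.get(cents, c, j)
            d += diff * diff
        // Squared distance preserves the ordering of Euclidean distance
        if na(best_d) or d < best_d
            best_d := d
            best := c
    best

// Refines `cents` in place; assumes at least one data row and iterations >= 1
kmeans(matrix<float> data, matrix<float> cents, int iterations) =>
    n = matrix.rows(data)
    k = matrix.rows(cents)
    dims = matrix.columns(cents)
    for it = 1 to iterations
        sums = matrix.new<float>(k, dims, 0.0)
        counts = array.new_int(k, 0)
        // (A) Assignment: add each point to the running sum of its nearest centroid
        for r = 0 to n - 1
            c = nearest(data, r, cents)
            array.set(counts, c, array.get(counts, c) + 1)
            for j = 0 to dims - 1
                matrix.set(sums, c, j, matrix.get(sums, c, j) + matrix.get(data, r, j))
        // (B) Update: move each non-empty centroid to the mean of its points
        for c = 0 to k - 1
            cnt = array.get(counts, c)
            if cnt > 0
                for j = 0 to dims - 1
                    matrix.set(cents, c, j, matrix.get(sums, c, j) / cnt)
    cents

if barstate.islast
    // Toy points: (0, 0), (0, 1), (10, 10), (10, 11)
    data = matrix.new<float>(4, 2, 0.0)
    matrix.set(data, 1, 1, 1.0)
    matrix.set(data, 2, 0, 10.0)
    matrix.set(data, 2, 1, 10.0)
    matrix.set(data, 3, 0, 10.0)
    matrix.set(data, 3, 1, 11.0)
    // Starting centroids: (1, 1) and (9, 9)
    cents = matrix.new<float>(2, 2, 1.0)
    matrix.set(cents, 1, 0, 9.0)
    matrix.set(cents, 1, 1, 9.0)
    kmeans(data, cents, 5)
    txt = "C0: " + str.tostring(matrix.get(cents, 0, 0)) + ", " + str.tostring(matrix.get(cents, 0, 1)) + "\nC1: " + str.tostring(matrix.get(cents, 1, 0)) + ", " + str.tostring(matrix.get(cents, 1, 1))
    label.new(bar_index, high, txt)
```

On this toy data the first pass already separates the two groups, so the centroids settle at (0, 0.5) and (10, 10.5) and further iterations change nothing.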

Live Regime Classification

After training, each new bar is:
  • Standardized using the training mean/std
  • Compared to all centroids
  • Assigned to the nearest cluster
  • Colored according to its cluster
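The four steps above can be sketched with a frozen model. To keep it short, this example uses only two of the four features, and every number in it (means, standard deviations, centroid coordinates, colors) is a hypothetical placeholder rather than a fitted value:

```pine
//@version=5
indicator("Live classification sketch", overlay = true)

// Hypothetical frozen training statistics (placeholders, not fitted values)
rsi_mean = 50.0
rsi_std  = 12.0
cci_mean = 0.0
cci_std  = 110.0

// Hypothetical frozen centroids in standardized space, one entry per cluster
var cent_rsi = array.from(-1.0, 0.0, 1.0)
var cent_cci = array.from(-1.0, 0.0, 1.0)

standardize(value, mean, std) => (value - mean) / std

// 1) Standardize with the training mean/std
z_rsi = standardize(ta.rsi(close, 14), rsi_mean, rsi_std)
z_cci = standardize(ta.cci(hlc3, 20), cci_mean, cci_std)

// 2) Compare to every centroid and 3) keep the nearest one
int cluster = na
float best = na
for c = 0 to array.size(cent_rsi) - 1
    d_rsi = z_rsi - array.get(cent_rsi, c)
    d_cci = z_cci - array.get(cent_cci, c)
    dist = math.sqrt(d_rsi * d_rsi + d_cci * d_cci)
    if not na(dist) and (na(best) or dist < best)
        best := dist
        cluster := c

// 4) Color the bar by cluster; bars without a valid distance stay uncolored
barcolor(cluster == 0 ? color.red : cluster == 1 ? color.gray : cluster == 2 ? color.green : na)
```

Because the statistics and centroids are constants here, the same bar always lands in the same cluster no matter how much history follows it.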


No re-training occurs. This ensures:
  • No lookahead bias on post-training bars
  • Clean historical testing
  • Stable regimes over time


Cluster Behavior & Trading Logic

Clusters (0, 1, 2, 3…) hold no inherent meaning. The user defines what each cluster does.
Example of custom actions:
  • Cluster 0 → Cash
  • Cluster 1 → Long
  • Cluster 2 → Short
  • Cluster 3+ → Cash (noise regime)



This flexibility means:
  • One trader might have cluster 0 as consolidation.
  • Another might repurpose it as a breakout-loading zone.
  • A third might ignore 3 clusters entirely.


Snapshot: example on ETHUSD

Important Note:
  • Any change to the parameters, chart timeframe, or ticker can change the “order” of the clusters.
  • The script does NOT assume that any cluster carries an actionable bias; the user decides.


Performance Metrics & ROC Table

The indicator computes the average 1-bar ROC (rate of change) for each cluster in:
  • Training set
  • Test (live) set


This helps measure:
  • Cluster profitability consistency
  • Regime forward predictability
  • Whether a regime is noise, trend, or reversion-biased
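A sketch of how per-cluster averages could be accumulated and tabulated. It assumes that each bar's 1-bar ROC is credited to the previous bar's cluster, treats every bar up to the training end as "training" (no start date, for brevity), and uses a stand-in two-state label in place of the real classifier:

```pine
//@version=5
indicator("Per-cluster ROC sketch", overlay = true)

k_clusters = 2
train_end  = input.time(timestamp("01 Jan 2022 00:00 +0000"), "Training End")

// Stand-in regime label, NOT the script's classifier
cluster = ta.rsi(close, 14) > 50 ? 1 : 0

// Running sums and counts per cluster for each data set
var sum_train = array.new_float(k_clusters, 0.0)
var cnt_train = array.new_int(k_clusters, 0)
var sum_test  = array.new_float(k_clusters, 0.0)
var cnt_test  = array.new_int(k_clusters, 0)

accumulate(array<float> sums, array<int> counts, int idx, float value) =>
    array.set(sums, idx, array.get(sums, idx) + value)
    array.set(counts, idx, array.get(counts, idx) + 1)

// 1-bar rate of change in percent, credited to the previous bar's cluster
roc1 = (close / close[1] - 1) * 100
prev = cluster[1]
if not na(roc1) and not na(prev)
    if time <= train_end
        accumulate(sum_train, cnt_train, prev, roc1)
    else
        accumulate(sum_test, cnt_test, prev, roc1)

var tbl = table.new(position.top_right, 3, k_clusters + 1)
if barstate.islast
    table.cell(tbl, 0, 0, "Cluster")
    table.cell(tbl, 1, 0, "Train avg ROC %")
    table.cell(tbl, 2, 0, "Test avg ROC %")
    for c = 0 to k_clusters - 1
        n_tr = array.get(cnt_train, c)
        n_te = array.get(cnt_test, c)
        table.cell(tbl, 0, c + 1, str.tostring(c))
        table.cell(tbl, 1, c + 1, n_tr > 0 ? str.tostring(array.get(sum_train, c) / n_tr, "#.###") : "n/a")
        table.cell(tbl, 2, c + 1, n_te > 0 ? str.tostring(array.get(sum_test, c) / n_te, "#.###") : "n/a")
```

A cluster whose training and test averages share the same sign and rough size is behaving consistently; one that flips sign out of sample is more likely noise.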



Equity Simulation & Fees

Designed for realistic close-to-close backtesting.
Position = cluster of the previous bar.
Fees are applied only on regime switches, meaning:
  • Staying long → no fee
  • Switching long→short → fee applied
  • Switching any→cash → fee applied


The fee input is a percentage; the script converts it to a fraction internally.
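The rules above can be sketched as a simple equity curve. The cluster-to-position map, the stand-in regime label, and the choice to charge the fee on every position change (including cash to long or short) are assumptions for illustration:

```pine
//@version=5
indicator("Regime equity sketch")

fee_pct = input.float(0.1, "Fee (%)", minval = 0.0)
fee = fee_pct / 100.0 // percentage input converted to a fraction

// Stand-in regime label, NOT the script's classifier: 1 when RSI > 50, else 2
cluster = ta.rsi(close, 14) > 50 ? 1 : 2

// Hypothetical map: cluster 1 = long, cluster 2 = short, anything else = cash
position_for(int c) =>
    switch c
        1 => 1
        2 => -1
        => 0

// The position held during this bar comes from the previous bar's cluster
pos = position_for(cluster[1])
prev_pos = nz(pos[1])

// Close-to-close compounding, with the fee deducted only when the position changes
var float equity = 1.0
bar_ret = nz(close / close[1] - 1)
equity := equity * (1 + pos * bar_ret)
if pos != prev_pos
    equity := equity * (1 - fee)

plot(equity, "Equity")
```

Using the previous bar's cluster means the position is known before the bar it is applied to, which keeps the simulation free of same-bar hindsight.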


Disclaimers
⚠️ This indicator uses machine learning but does not predict the future. It classifies similarity to past regimes, nothing more.
⚠️ Backtest results are not indicative of future performance.
⚠️ Clusters have no inherent “bullish” or “bearish” meaning. You must interpret them based on your testing and your own feature engineering.
