OPEN-SOURCE SCRIPT

Static K-means Clustering | InvestorUnknown

Static K-Means Clustering is a machine-learning-driven market regime classifier designed for traders who want a data-driven structure instead of subjective indicators or manually drawn zones.

This script performs offline (static) K-means training on your chosen historical window, using four engineered features:

RSI (Momentum)
CCI (Price deviation / Mean reversion)
CMF (Money flow / Strength)
MACD Histogram (Trend acceleration)

It groups past market conditions into K distinct clusters (regimes). After training, every new bar is assigned to the nearest cluster via Euclidean distance in 4-dimensional standardized feature space.

This allows you to create models like:

Regime-based long/short filters
Volatility phase detectors
Trend vs. chop separation
Mean-reversion vs. breakout classification
Volume-enhanced money-flow regime shifts
Full machine-learning trading systems based solely on regimes

Note:

* This script is not a universal ML strategy out of the box.
* The user must engineer the feature set to match their trading style and target market.
* K-means is a tool, not a ready-made system; this script provides the framework.

Core Idea
K-means clustering takes raw, unlabeled market observations and attempts to discover structure by grouping similar bars together.

Pine Script®

// STEP 1: DATA POINTS ON A COORDINATE PLANE
// We start with raw, unlabeled data scattered in 2D space (x/y).
// At this point, nothing is grouped; these are just observations.
// K-means will try to discover structure by grouping nearby points.
//
//  y ↑
//    |
// 12 |                              •
//    |      •
// 10 |   •
//    |                        •
//  8 |            •                    •
//    |
//  6 |                                    •
//    |
//  4 |               •
//    |
//  2 |__________________________________________→ x
//        2     4     6     8     10    12    14
//
//
// STEP 2: RANDOMLY PLACE INITIAL CENTROIDS
// The algorithm begins by placing K centroids at random positions.
// These centroids act as the temporary “representatives” of clusters.
// Their starting positions heavily influence the first assignment step.
//
//  y ↑
//    |
// 12 |                              •
//    |      •
// 10 |   •                    C2 ×
//    |                        •
//  8 |            •                    •
//    |
//  6 |      C1 ×                          •
//    |
//  4 |               •
//    |
//  2 |__________________________________________→ x
//        2     4     6     8     10    12    14
//
//
// STEP 3: ASSIGN POINTS TO NEAREST CENTROID
// Each point is compared to all centroids.
// Using simple Euclidean distance, each point joins the cluster
// of the centroid it is closest to.
// This creates a temporary grouping of the data.
//
// (Coloring concept shown using labels)
//
// - Points closer to C1 → Cluster 1
// - Points closer to C2 → Cluster 2
//
//  y ↑
//    |
// 12 |                              2
//    |      1
// 10 |   1                    C2 ×
//    |                        2
//  8 |            1                    2
//    |
//  6 |      C1 ×                          2
//    |
//  4 |               1
//    |
//  2 |__________________________________________→ x
//        2     4     6     8     10    12    14
//
// (1 = assigned to Cluster 1, 2 = assigned to Cluster 2)
// At this stage, clusters are formed purely by distance.

Your chosen historical window becomes the static training dataset, and after fitting, the centroids never change again.

This makes the model:

Predictable
Repeatable
Consistent across backtests
Fast for live use (no recalculation of centroids every bar)

Static Training Window

You select a period with:

Training Start
Training End

Only bars inside this range are used to fit the K-means model. This window defines:

the market regime examples
the statistical distributions (means/std) for each feature
how the centroids will be positioned post-training

Bars before training = fully transparent
Training bars = gray
Post-training bars = full colored regimes

Feature Engineering (4D Input Vector)

Every bar during training becomes a 4-dimensional point: [rsi, cci, cmf, macd_histogram]
This combination balances momentum, mean reversion, money flow, and trend acceleration, giving the algorithm a richer "market fingerprint" per bar.

Standardization
To prevent any feature from dominating due to scale differences (e.g., CMF near zero vs CCI ±200), all features are standardized:

Pine Script®

standardize(value, mean, std) => (value - mean) / std
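As a rough illustration of the standardization step, here is a minimal sketch in Python (not Pine Script, so it can be run outside TradingView). It assumes the mean and standard deviation are computed once from the training window and then reused unchanged. The `fit_scaler` helper, the use of the population standard deviation, and the zero-std guard are assumptions made for this example and are not taken from the script.

```python
import statistics

def fit_scaler(training_values):
    """Return (mean, std) computed only from the training window."""
    mean = statistics.fmean(training_values)
    std = statistics.pstdev(training_values)  # population std: an assumption
    return mean, std

def standardize(value, mean, std):
    # Guard for a flat feature (std == 0): an addition for this sketch,
    # the one-line Pine function above has no such guard.
    return (value - mean) / std if std > 0 else 0.0

# Hypothetical RSI readings from a training window
rsi_train = [30.0, 50.0, 70.0]
mean, std = fit_scaler(rsi_train)
z = [standardize(v, mean, std) for v in rsi_train]
print([round(v, 4) for v in z])
```

After this step every feature has mean 0 and unit spread over the training window, so a CMF move and a CCI move of "one standard deviation" count equally in the distance calculation.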

Centroid Initialization

Centroids start at diverse coordinates using various curves:

linear
sinusoidal
sign-preserving quadratic
tanh compression

Pine Script®

init_centroids() =>
    // Spread centroids across [-1, 1] using different shapes per feature
    for c = 0 to k_clusters - 1
        frac = k_clusters == 1 ? 0.0 : c / (k_clusters - 1.0) // 0 → 1
        v = frac * 2 - 1 // -1 → +1
        array.set(cent_rsi, c, v) // linear
        array.set(cent_cci, c, math.sin(v)) // sinusoidal
        array.set(cent_cmf, c, v * v * (v < 0 ? -1 : 1)) // quadratic sign-preserving
        array.set(cent_mac, c, tanh(v)) // compressed

This makes the initial cluster spread look “random”, even though the placement is deterministic; true randomness is hard to achieve in Pine Script.
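To see what these starting positions look like numerically, the following is a Python port of the snippet above (Python rather than Pine Script, purely so it can be run anywhere). Returning a list of `(rsi, cci, cmf, macd)` tuples instead of writing into four arrays is a choice made for this sketch.

```python
import math

def init_centroids(k_clusters):
    """Deterministic starting centroids, one 4D tuple per cluster."""
    centroids = []
    for c in range(k_clusters):
        frac = 0.0 if k_clusters == 1 else c / (k_clusters - 1.0)  # 0 -> 1
        v = frac * 2 - 1                                           # -1 -> +1
        centroids.append((
            v,                             # RSI: linear
            math.sin(v),                   # CCI: sinusoidal
            v * v * (-1 if v < 0 else 1),  # CMF: sign-preserving quadratic
            math.tanh(v),                  # MACD histogram: tanh compression
        ))
    return centroids

for row in init_centroids(3):
    print(tuple(round(x, 3) for x in row))
```

With three clusters the first and last centroids sit near the corners of the standardized space and the middle one at the origin. Note that with a single cluster the formula places it at v = -1 rather than at the origin.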

K-Means Iterative Refinement

The algorithm repeats these steps:
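(A) Assignment step: each bar is assigned to the nearest centroid by Euclidean distance in 4D:

distance = sqrt(dx² + dy² + dz² + dw²)

where dx, dy, dz and dw are the differences between the bar and the centroid along each of the four standardized features.

(B) Update step: each centroid moves to the mean of the points assigned to it. Both steps repeat for a configurable number of iterations.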
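The two steps can be sketched in a few lines of Python (again Python rather than Pine Script, for portability). This is a minimal sketch of standard K-means with a fixed iteration count, not the script's exact code. The function names and the rule that an empty cluster keeps its previous centroid are assumptions.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, centroids, iterations):
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # (A) Assignment: label every training point with its nearest centroid
        labels = [nearest(p, centroids) for p in points]
        # (B) Update: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster stays where it is
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids

# Hypothetical standardized training points forming two obvious groups
points = [(-1.0, -1.0, -1.0, -1.0), (-0.8, -1.2, -0.9, -1.1),
          (1.0, 1.0, 1.0, 1.0), (1.2, 0.8, 1.1, 0.9)]
start = [(-0.5, 0.0, 0.0, 0.0), (0.5, 0.0, 0.0, 0.0)]
fitted = kmeans(points, start, iterations=5)
print(fitted)
```

On this toy data the centroids settle on the two group means after the first pass, and further iterations leave them unchanged.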

LIVE REGIME CLASSIFICATION

After training, each new bar is:

Standardized using the training mean/std
Compared to all centroids
Assigned to the nearest cluster
Bar color updates based on cluster

No re-training occurs. This ensures:

No lookahead bias
Clean historical testing
Stable regimes over time
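The live classification steps above can be sketched as follows, in Python rather than Pine Script. All numbers (training means, standard deviations, centroids, and the two sample bars) are made up for illustration, and `classify_bar` is a hypothetical name.

```python
import math

def classify_bar(raw_features, train_means, train_stds, centroids):
    """Standardize one bar with frozen training stats, return nearest cluster."""
    z = [(x - m) / s for x, m, s in zip(raw_features, train_means, train_stds)]
    distances = [math.dist(z, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical frozen model: order is [rsi, cci, cmf, macd_histogram]
train_means = [50.0, 0.0, 0.0, 0.0]
train_stds = [10.0, 100.0, 0.1, 2.0]
centroids = [(-1.0, -1.0, -1.0, -1.0), (1.0, 1.0, 1.0, 1.0)]

strong_bar = [62.0, 120.0, 0.08, 1.5]    # standardizes to [1.2, 1.2, 0.8, 0.75]
weak_bar = [35.0, -90.0, -0.12, -3.0]    # standardizes to [-1.5, -0.9, -1.2, -1.5]
print(classify_bar(strong_bar, train_means, train_stds, centroids))
print(classify_bar(weak_bar, train_means, train_stds, centroids))
```

Because the means, standard deviations, and centroids are fixed after training, classifying a bar needs only that bar's own feature values, which is why no future data can leak into the label.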

CLUSTER BEHAVIOR & TRADING LOGIC

Clusters (0, 1, 2, 3…) hold no inherent meaning. The user defines what each cluster does.
Example of custom actions:

Cluster 0 → Cash
Cluster 1 → Long
Cluster 2 → Short
Cluster 3+ → Cash (noise regime)

This flexibility means:

One trader might have cluster 0 as consolidation.
Another might repurpose it as a breakout-loading zone.
A third might ignore 3 clusters entirely.
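One simple way to express such a mapping is a lookup table with a default. The sketch below, in Python, mirrors the example mapping above. The constant names and the `position_for` helper are hypothetical and not part of the script.

```python
CASH, LONG, SHORT = 0, 1, -1

# Hypothetical mapping: cluster 0 -> cash, 1 -> long, 2 -> short
CLUSTER_ACTION = {0: CASH, 1: LONG, 2: SHORT}

def position_for(cluster):
    # Any cluster not listed (3 and above) is treated as a noise regime: cash
    return CLUSTER_ACTION.get(cluster, CASH)

print([position_for(c) for c in range(5)])
```

Keeping the mapping in one table makes it easy to re-assign actions when a parameter change reshuffles the cluster order.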

Example on ETHUSD

Important Note:

Any change of parameters, chart timeframe, or ticker can change the “order” of the clusters.
The script does NOT assume that any cluster equals any actionable bias; the user decides.

PERFORMANCE METRICS & ROC TABLE

The indicator computes average 1-bar ROC for each cluster in:

Training set
Test (live) set

This helps measure:

Cluster profitability consistency
Regime forward predictability
Whether a regime is noise, trend, or reversion-biased
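A per-cluster ROC table of this kind can be sketched in Python as below. The sketch assumes that the 1-bar ROC of each bar is credited to the cluster of the previous bar, which matches the "position = cluster of previous bar" convention used for the equity simulation. The script's exact attribution may differ, and the prices and labels are invented.

```python
from collections import defaultdict

def mean_roc_by_cluster(closes, clusters):
    """Average 1-bar ROC (%) grouped by the cluster of the preceding bar."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t in range(1, len(closes)):
        roc = (closes[t] / closes[t - 1] - 1.0) * 100.0
        sums[clusters[t - 1]] += roc
        counts[clusters[t - 1]] += 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical closes and cluster labels
closes = [100.0, 110.0, 99.0, 108.9, 98.01]
clusters = [1, 2, 1, 2, 1]
table = mean_roc_by_cluster(closes, clusters)
print({c: round(v, 2) for c, v in table.items()})
```

Running the same function separately on the training bars and on the post-training bars gives the two columns to compare: a cluster whose average ROC keeps its sign out of sample is a better candidate for a directional rule than one that flips.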

EQUITY SIMULATION & FEES

Designed for realistic close-to-close backtesting.
Position = cluster of the previous bar.
Fees are applied only on regime switches, meaning:

Staying long → no fee
Switching long→short → fee applied
Switching any→cash → fee applied

The fee input is a percentage; the script converts it to a fraction internally.
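The fee logic can be illustrated with a small close-to-close simulation in Python. This is a sketch under stated assumptions, not the script's implementation: positions are encoded as 1 (long), 0 (cash) and -1 (short), the account starts in cash, and one fee is charged on every change of position, including the first entry.

```python
def simulate_equity(closes, positions, fee_pct):
    """Equity curve where the position held over bar t is positions[t - 1]."""
    fee = fee_pct / 100.0          # percentage input converted to a fraction
    equity = [1.0]
    held = 0                       # assumption: start in cash
    for t in range(1, len(closes)):
        target = positions[t - 1]  # signal from the previous bar
        value = equity[-1]
        if target != held:         # fee only when the position changes
            value *= 1.0 - fee
            held = target
        bar_return = closes[t] / closes[t - 1] - 1.0
        value *= 1.0 + held * bar_return
        equity.append(value)
    return equity

# Hypothetical data: go long, then flip short, with a 0.1% fee
curve = simulate_equity([100.0, 110.0, 99.0], [1, -1, 0], fee_pct=0.1)
print([round(v, 6) for v in curve])
```

Using the previous bar's signal for the current bar's return keeps the simulation free of same-bar lookahead, and charging fees only on changes means a long-lived regime is not penalized for simply staying in place.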

Disclaimers
⚠️ This indicator uses machine-learning but does not predict the future. It classifies similarity to past regimes, nothing more.
⚠️ Backtest results are not indicative of future performance.
⚠️ Clusters have no inherent “bullish” or “bearish” meaning. You must interpret them based on your testing and your own feature engineering.
