Bayesian LDA - Bayesian Linear Discriminant Analysis
Python: NimbusLDA | Julia: RxLDAModel
Mathematical Model: Pooled Gaussian Classifier (PGC)
Bayesian LDA is a Bayesian classification model with uncertainty quantification. It uses a shared precision matrix across all classes, making it fast and efficient for BCI classification.
Available in Both SDKs:
- Python SDK: NimbusLDA class (sklearn-compatible)
- Julia SDK: RxLDAModel (RxInfer.jl-based)
Both implementations provide the same Bayesian inference with uncertainty quantification.
Overview
Bayesian LDA extends traditional Linear Discriminant Analysis with full Bayesian inference, providing:
- ✅ Posterior probability distributions (not just point estimates)
- ✅ Uncertainty quantification for each prediction
- ✅ Probabilistic confidence scores
- ✅ Fast inference (<20ms per trial)
- ✅ Training and calibration support
- ✅ Batch and streaming inference modes
Quick Start
from nimbus_bci import NimbusLDA
import numpy as np
# Create and fit classifier
clf = NimbusLDA(mu_scale=3.0)
clf.fit(X_train, y_train)
# Predict with uncertainty
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
# Online learning
clf.partial_fit(X_new, y_new)
using NimbusSDK
# Train model
model = train_model(
RxLDAModel,
train_data;
iterations=50
)
# Predict
results = predict_batch(model, test_data)
When to Use Bayesian LDA
Bayesian LDA is ideal for:
- Motor Imagery classification (2-4 classes)
- Well-separated class distributions
- Fast inference requirements (<20ms)
- Interpretable results for medical applications
- When classes have similar covariance structures
Consider Bayesian GMM instead if:
- Classes have significantly different covariance structures
- Complex, overlapping distributions
- Need more flexibility in modeling per-class variances
Model Architecture
Mathematical Foundation (Pooled Gaussian Classifier)
Bayesian LDA implements a Pooled Gaussian Classifier (PGC), which models class-conditional distributions with a shared precision matrix:
p(x | y=k) = N(μ_k, W^-1)
Where:
μ_k = mean vector for class k (learned from data)
W = shared precision matrix (same for all classes)
- Assumes classes have similar covariance structure
Key Assumption: All classes share the same covariance structure, which makes training and inference faster.
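To make the pooled-precision idea concrete, here is a minimal NumPy sketch of a plain (non-Bayesian) pooled Gaussian classifier: per-class means μ_k, one covariance pooled over all classes, and softmax-normalized posteriors. This is illustrative only, not the SDK's variational implementation, which maintains full posterior distributions over μ_k and W.

```python
import numpy as np

def fit_pgc(X, y):
    """Fit per-class means and one covariance shared (pooled) across classes."""
    classes = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in classes}
    # Pool residuals from all classes into a single covariance estimate
    resid = np.vstack([X[y == k] - means[k] for k in classes])
    cov = resid.T @ resid / (len(X) - len(classes))
    priors = {k: np.mean(y == k) for k in classes}
    return classes, means, cov, priors

def predict_proba_pgc(X, classes, means, cov, priors):
    """p(y=k | x) ∝ N(x; mu_k, cov) * p(y=k), normalized with a stable softmax."""
    W = np.linalg.inv(cov)  # shared precision matrix
    log_post = np.empty((len(X), len(classes)))
    for j, k in enumerate(classes):
        d = X - means[k]
        log_post[:, j] = -0.5 * np.einsum('ij,jk,ik->i', d, W, d) + np.log(priors[k])
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_post)
    return p / p.sum(axis=1, keepdims=True)

# Two synthetic well-separated classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_pgc(X, y)
proba = predict_proba_pgc(X, *params)
acc = float(np.mean(proba.argmax(axis=1) == y))
```

Note how a single `cov` serves every class: only the means differ between classes, which is exactly the PGC assumption above.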
Hyperparameters
Bayesian LDA supports configurable hyperparameters for optimal performance tuning:
Available Hyperparameters (training):
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| dof_offset | Int | 2 | [1, 5] | Degrees of freedom offset for Wishart prior |
| mean_prior_precision | Float64 | 0.01 | [0.001, 0.1] | Prior precision for class means |
Parameter Effects:
- dof_offset: Controls regularization strength
  - Lower values (1) → More data-driven, less regularization
  - Higher values (3-5) → More regularization, more conservative
- mean_prior_precision: Controls prior strength on class means
  - Lower values (0.001) → Weaker prior, trusts data more
  - Higher values (0.05-0.1) → Stronger prior, more regularization
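The effect of mean_prior_precision can be seen in the standard conjugate Gaussian update, where the posterior mean is a precision-weighted blend of the zero prior mean and the sample mean. This is a schematic one-dimensional illustration with an assumed unit observation precision, not the SDK's exact update rule:

```python
def posterior_mean(x_bar, n, obs_precision, prior_precision):
    """Conjugate Gaussian update with a zero-mean prior:
    posterior mean = precision-weighted blend of prior mean (0) and sample mean."""
    return (n * obs_precision * x_bar) / (n * obs_precision + prior_precision)

x_bar = 2.0          # sample mean of one feature for one class
obs_precision = 1.0  # assumed observation precision

# Stronger prior precision pulls the estimate harder toward the prior mean (0)
weak = posterior_mean(x_bar, n=100, obs_precision=obs_precision, prior_precision=0.001)
strong = posterior_mean(x_bar, n=100, obs_precision=obs_precision, prior_precision=0.1)
# With few trials, the same prior has a proportionally larger effect
few_trials = posterior_mean(x_bar, n=5, obs_precision=obs_precision, prior_precision=0.1)
```

With plenty of data the shrinkage is tiny; with few trials the prior matters more, which is why noisier or smaller datasets warrant larger mean_prior_precision values.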
Model Structure
struct RxLDAModel <: BCIModel
mean_posteriors::Vector # MvNormal posteriors for class means
precision_posterior::Any # Wishart posterior for shared precision
priors::Vector{Float64} # Empirical class priors
metadata::ModelMetadata # Model info
dof_offset::Int # Degrees of freedom offset (training)
mean_prior_precision::Float64 # Mean prior precision (training)
end
RxInfer Implementation
The Bayesian LDA model uses RxInfer.jl for variational Bayesian inference:
Learning Phase:
@model function RxLDA_learning_model(y, labels, n_features, n_classes, dof_offset, mean_prior_precision)
# Prior on shared precision
dof = n_features + dof_offset
W ~ Wishart(dof, I)
# Priors on class means
for k in 1:n_classes
m[k] ~ MvNormal(mean=zeros(n_features), precision=mean_prior_precision * I)
end
# Likelihood with known labels
for i in eachindex(y)
k = labels[i]
y[i] ~ MvNormal(mean=m[k], precision=W)
end
end
Prediction Phase (Per-Class Likelihood):
To avoid mixture collapse with single observations, inference computes per-class likelihoods independently:
@model function RxLDA_predictive_single_class(y, class_mean, class_precision)
n_features = length(class_mean)
# Prior for this specific class (illustrative; actual implementation uses full posteriors)
m ~ MvNormalMeanPrecision(class_mean, 1e6 * I)
w ~ Wishart(n_features + 2, class_precision)
# Observations (no mixture variable)
for i in eachindex(y)
y[i] ~ MvNormal(mean=m, precision=w)
end
end
For each class k, compute p(y | μ_k, W) independently, then combine using Bayes’ rule with softmax normalization.
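A minimal sketch of that combination step, using hypothetical per-class log-likelihood values: add log-priors, shift by the maximum in log space for numerical stability, then normalize.

```python
import numpy as np

def combine_class_likelihoods(log_liks, priors):
    """Combine per-class log-likelihoods log p(y | mu_k, W) with class priors
    via Bayes' rule, using a numerically stable softmax in log space."""
    log_post = np.asarray(log_liks, dtype=float) + np.log(np.asarray(priors, dtype=float))
    log_post -= log_post.max()   # stable softmax shift
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical values for one 4-class trial; class 2 has the highest likelihood
log_liks = [-14.2, -11.7, -13.0, -15.5]
priors = [0.25, 0.25, 0.25, 0.25]
post = combine_class_likelihoods(log_liks, priors)
prediction = int(np.argmax(post)) + 1  # 1-based class index
```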
Usage
1. Load Pre-trained Model
Python SDK: The Python SDK (nimbus-bci) trains models locally and doesn’t require pre-trained model loading. See Python SDK Quickstart for training examples.
from nimbus_bci import NimbusLDA
import pickle
# Python SDK: Train your own model locally
# No authentication or model zoo needed
# Quick training example
clf = NimbusLDA(mu_scale=3.0)
clf.fit(X_train, y_train)
# Save for later use
with open("my_motor_imagery.pkl", "wb") as f:
    pickle.dump(clf, f)
# Load saved model
with open("my_motor_imagery.pkl", "rb") as f:
    clf = pickle.load(f)
print("Model info:")
print(f" Classes: {clf.classes_}")
print(f" Features: {clf.n_features_in_}")
using NimbusSDK
# Authenticate
NimbusSDK.install_core("nbci_live_your_key")
# Load from Nimbus model zoo
model = load_model(RxLDAModel, "motor_imagery_4class_v1")
println("Model loaded:")
println(" Features: $(get_n_features(model))")
println(" Classes: $(get_n_classes(model))")
println(" Paradigm: $(get_paradigm(model))")
2. Train Custom Model
from nimbus_bci import NimbusLDA
from nimbus_bci.compat import extract_csp_features
import mne
import pickle
# Load and preprocess EEG data
raw = mne.io.read_raw_gdf("motor_imagery.gdf", preload=True)
raw.filter(8, 30)
events = mne.find_events(raw)
event_id = {'left': 1, 'right': 2, 'feet': 3, 'tongue': 4}
epochs = mne.Epochs(raw, events, event_id, tmin=0, tmax=4, preload=True)
# Extract CSP features
csp_features, csp = extract_csp_features(epochs, n_components=8)
labels = epochs.events[:, 2]
# Train NimbusLDA model with default hyperparameters
clf = NimbusLDA()
clf.fit(csp_features, labels)
# Or train with custom hyperparameters
clf_tuned = NimbusLDA(
mu_scale=3.0, # Prior strength for means (default: 1.0)
sigma_scale=1.0 # Covariance regularization (default: 1.0)
)
clf_tuned.fit(csp_features, labels)
# Save for later use
with open("my_motor_imagery.pkl", "wb") as f:
    pickle.dump(clf_tuned, f)
print("Model trained successfully!")
print(f"Training accuracy: {clf_tuned.score(csp_features, labels):.1%}")
using NimbusSDK
# Prepare training data with labels
train_features = csp_features # (16 × 250 × 100)
train_labels = [1, 2, 3, 4, 1, 2, ...] # 100 labels
train_data = BCIData(
train_features,
BCIMetadata(
sampling_rate = 250.0,
paradigm = :motor_imagery,
feature_type = :csp,
n_features = 16,
n_classes = 4,
chunk_size = nothing
),
train_labels # Required for training!
)
# Train RxLDA model with default hyperparameters
model = train_model(
RxLDAModel,
train_data;
iterations = 50, # Inference iterations
showprogress = true, # Show progress bar
name = "my_motor_imagery",
description = "4-class MI classifier with CSP"
)
# Or train with custom hyperparameters (v0.2.0+)
model = train_model(
RxLDAModel,
train_data;
iterations = 50,
showprogress = true,
name = "my_motor_imagery_tuned",
description = "4-class MI with tuned hyperparameters",
dof_offset = 2, # DOF offset (default: 2)
mean_prior_precision = 0.01 # Prior precision (default: 0.01)
)
# Save for later use
save_model(model, "my_model.jld2")
Training Parameters:
- iterations: Number of variational inference iterations (default: 50)
  - More iterations = better convergence but slower training
  - 50-100 is typically sufficient
- showprogress: Display progress bar during training
- name: Model identifier
- description: Model description for documentation
- dof_offset: Degrees of freedom offset (default: 2, range: [1, 5])
- mean_prior_precision: Prior precision for means (default: 0.01, range: [0.001, 0.1])
3. Subject-Specific Calibration
Fine-tune a pre-trained model with subject-specific data (much faster than training from scratch):
from nimbus_bci import NimbusLDA
import numpy as np
import pickle
# Load baseline model (trained on multiple subjects)
with open("motor_imagery_baseline.pkl", "rb") as f:
    base_clf = pickle.load(f)
# Collect 10-20 calibration trials from new subject
X_calib, y_calib = collect_calibration_trials() # Your function
# Personalize using online learning (partial_fit)
personalized_clf = base_clf  # Start from the loaded baseline model
# Fine-tune on calibration data
for _ in range(10):  # Multiple passes for adaptation
    personalized_clf.partial_fit(X_calib, y_calib)
# Save personalized model
with open("subject_001_calibrated.pkl", "wb") as f:
    pickle.dump(personalized_clf, f)
print("Model personalized successfully!")
# Load base model
base_model = load_model(RxLDAModel, "motor_imagery_baseline_v1")
# Collect 10-20 calibration trials from new subject
calib_features = collect_calibration_trials() # Your function
calib_labels = [1, 2, 3, 4, 1, 2, ...]
calib_data = BCIData(calib_features, metadata, calib_labels)
# Calibrate (personalize) the model
personalized_model = calibrate_model(
base_model,
calib_data;
iterations = 20 # Fewer iterations needed
)
save_model(personalized_model, "subject_001_calibrated.jld2")
Calibration Benefits:
- Requires only 10-20 trials per class (vs 50-100 for training from scratch)
- Faster: 20 iterations vs 50-100
- Better generalization: Uses pre-trained model as prior
- Typical accuracy improvement: 5-15% over generic model
- Hyperparameters automatically preserved: calibrate_model() inherits the training hyperparameters (dof_offset, mean_prior_precision) from the base model, ensuring consistency
Important: You don't need to specify hyperparameters when calibrating; they are automatically inherited from the base model, so the calibrated model uses the same regularization strategy that worked well during initial training.
4. Batch Inference
Process multiple trials efficiently:
from nimbus_bci import NimbusLDA
import numpy as np
# Run batch inference
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
confidences = np.max(probabilities, axis=1)
# Analyze results
print(f"Predictions: {predictions}")
print(f"Mean confidence: {np.mean(confidences):.3f}")
# Calculate accuracy
accuracy = np.mean(predictions == y_test)
print(f"Accuracy: {accuracy * 100:.1f}%")
# Prepare test data
test_data = BCIData(test_features, metadata, test_labels)
# Run batch inference
results = predict_batch(model, test_data; iterations=10)
# Analyze results
println("Predictions: ", results.predictions)
println("Mean confidence: ", mean(results.confidences))
# Calculate accuracy
accuracy = sum(results.predictions .== test_labels) / length(test_labels)
println("Accuracy: $(round(accuracy * 100, digits=1))%")
# Calculate ITR
itr = calculate_ITR(accuracy, 4, 4.0) # 4 classes, 4-second trials
println("ITR: $(round(itr, digits=1)) bits/minute")
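The ITR figure above follows the standard Wolpaw definition; a self-contained Python version of that formula is shown below for reference (the SDK's calculate_ITR is assumed to implement this definition, but its exact behavior may differ):

```python
import math

def wolpaw_itr(accuracy, n_classes, trial_seconds):
    """Wolpaw information transfer rate in bits/minute.

    bits/trial = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1)),
    scaled by trials per minute."""
    p, n = accuracy, n_classes
    if p <= 1.0 / n:
        return 0.0  # at or below chance, no information transferred
    bits = math.log2(n) + p * math.log2(p)
    if p < 1.0:
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * (60.0 / trial_seconds)

# Example: 80% accuracy, 4 classes, 4-second trials → roughly 14.4 bits/minute
itr = wolpaw_itr(0.80, 4, 4.0)
```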
5. Streaming Inference
Real-time chunk-by-chunk processing:
from nimbus_bci import StreamingSession
from nimbus_bci.data import BCIMetadata
# Configure metadata with chunk size
metadata = BCIMetadata(
sampling_rate=250.0,
paradigm="motor_imagery",
feature_type="csp",
n_features=16,
n_classes=4,
chunk_size=250 # 1-second chunks
)
# Initialize streaming session
session = StreamingSession(clf.model_, metadata)
# Process chunks as they arrive
for chunk in eeg_feature_stream:
    result = session.process_chunk(chunk)
    print(f"Chunk: pred={result.prediction}, conf={result.confidence:.3f}")
# Finalize trial with aggregation
final_result = session.finalize_trial()
print(f"Final: pred={final_result.prediction}, conf={final_result.confidence:.3f}")
# Reset for next trial
session.reset()
# Initialize streaming session
session = init_streaming(model, metadata_with_chunk_size)
# Process chunks as they arrive
for chunk in eeg_feature_stream
result = process_chunk(session, chunk; iterations=10)
println("Chunk: pred=$(result.prediction), conf=$(round(result.confidence, digits=3))")
end
# Finalize trial with aggregation
final_result = finalize_trial(session; method=:weighted_vote)
println("Final: pred=$(final_result.prediction), conf=$(round(final_result.confidence, digits=3))")
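One plausible reading of method=:weighted_vote, sketched in Python with hypothetical chunk results: each chunk votes for its predicted class, weighted by its confidence. The SDK's actual aggregation may differ.

```python
from collections import defaultdict

def weighted_vote(chunk_results):
    """Aggregate (prediction, confidence) pairs from the chunks of one trial:
    each chunk votes for its predicted class, weighted by its confidence."""
    scores = defaultdict(float)
    for pred, conf in chunk_results:
        scores[pred] += conf
    total = sum(scores.values())
    final_pred = max(scores, key=scores.get)
    return final_pred, scores[final_pred] / total

# Hypothetical per-chunk (prediction, confidence) results for one trial
chunks = [(2, 0.71), (2, 0.64), (3, 0.52), (2, 0.80)]
final_pred, final_conf = weighted_vote(chunks)
```

A confidence-weighted vote lets one low-confidence outlier chunk (here, the vote for class 3) be outweighed by consistent, higher-confidence chunks.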
Hyperparameter Tuning (v0.2.0+)
Fine-tune Bayesian LDA for optimal performance on your specific dataset.
When to Tune Hyperparameters
Consider tuning when:
- Default performance is unsatisfactory
- You have specific data characteristics (very noisy or very clean)
- You have limited or extensive training data
- You want to optimize for your specific paradigm
Tuning Strategies
For High SNR / Clean Data / Many Trials
Use lower regularization to let the data drive the model:
from nimbus_bci import NimbusLDA
# Lower regularization for clean data
clf = NimbusLDA(
mu_scale=1.0, # Weaker prior, trust data more
sigma_scale=0.1 # Less covariance regularization
)
clf.fit(X_train, y_train)
model = train_model(
RxLDAModel,
train_data;
iterations = 50,
dof_offset = 1, # Less regularization
mean_prior_precision = 0.001 # Weaker prior, trust data more
)
Use when:
- SNR > 5 dB
- 100+ trials per class
- Clean, artifact-free data
- Well-controlled experimental conditions
For Low SNR / Noisy Data / Few Trials
Use higher regularization for stability:
from nimbus_bci import NimbusLDA
# Higher regularization for noisy data
clf = NimbusLDA(
mu_scale=5.0, # Stronger prior
sigma_scale=10.0 # More covariance regularization
)
clf.fit(X_train, y_train)
model = train_model(
RxLDAModel,
train_data;
iterations = 50,
dof_offset = 3, # More regularization
mean_prior_precision = 0.05 # Stronger prior
)
Use when:
- SNR < 2 dB
- 40-80 trials per class
- Noisy data or limited artifact removal
- Challenging recording conditions
Balanced / Default Settings
The defaults work well for most scenarios:
model = train_model(
RxLDAModel,
train_data;
iterations = 50,
dof_offset = 2, # Balanced (default)
mean_prior_precision = 0.01 # Balanced (default)
)
Use when:
- Moderate SNR (2-5 dB)
- 80-150 trials per class
- Standard BCI recording conditions
- Starting point for experimentation
Hyperparameter Search Example
Systematically search for optimal hyperparameters:
from nimbus_bci import NimbusLDA
from sklearn.model_selection import GridSearchCV, train_test_split
# Define parameter grid
param_grid = {
'mu_scale': [1.0, 3.0, 5.0, 10.0],
'sigma_scale': [0.1, 1.0, 10.0]
}
# Split data
X_train_sub, X_val, y_train_sub, y_val = train_test_split(
X_train, y_train, test_size=0.2, stratify=y_train, random_state=42
)
print("Searching hyperparameters...")
clf = NimbusLDA()
grid_search = GridSearchCV(
clf,
param_grid,
cv=5,
scoring='accuracy',
verbose=1
)
grid_search.fit(X_train_sub, y_train_sub)
print("\nBest hyperparameters:")
print(f" mu_scale: {grid_search.best_params_['mu_scale']}")
print(f" sigma_scale: {grid_search.best_params_['sigma_scale']}")
print(f" CV accuracy: {grid_search.best_score_*100:.1f}%")
# Get best model and check it on the held-out validation split
best_clf = grid_search.best_estimator_
print(f" Held-out accuracy: {best_clf.score(X_val, y_val)*100:.1f}%")
# Retrain on all training data
best_clf.fit(X_train, y_train)
using NimbusSDK
# Define search grid
dof_values = [1, 2, 3, 4]
prior_values = [0.001, 0.01, 0.05, 0.1]
# Split data into train/validation
train_data, val_data = split_data(all_data, ratio=0.8)
best_accuracy = 0.0
best_params = nothing
println("Searching hyperparameters...")
for dof in dof_values
for prior in prior_values
# Train model with these hyperparameters
model = train_model(
RxLDAModel,
train_data;
iterations = 50,
dof_offset = dof,
mean_prior_precision = prior,
showprogress = false
)
# Validate
results = predict_batch(model, val_data)
accuracy = sum(results.predictions .== val_data.labels) / length(val_data.labels)
println(" dof=$dof, prior=$prior: $(round(accuracy*100, digits=1))%")
# Track best
if accuracy > best_accuracy
best_accuracy = accuracy
best_params = (dof=dof, prior=prior)
end
end
end
println("\nBest hyperparameters:")
println(" dof_offset: $(best_params.dof)")
println(" mean_prior_precision: $(best_params.prior)")
println(" Validation accuracy: $(round(best_accuracy*100, digits=1))%")
# Retrain on all data with best hyperparameters
final_model = train_model(
RxLDAModel,
all_data;
iterations = 50,
dof_offset = best_params.dof,
mean_prior_precision = best_params.prior
)
Quick Tuning Guidelines
| Scenario | dof_offset | mean_prior_precision | Notes |
|---|---|---|---|
| Excellent data quality | 1 | 0.001 | Minimal regularization |
| Good data quality | 2 (default) | 0.01 (default) | Balanced approach |
| Moderate data quality | 2-3 | 0.01-0.03 | Slight regularization |
| Poor data quality | 3-4 | 0.05-0.1 | Strong regularization |
| Very limited trials | 4 | 0.1 | Maximum regularization |
Pro Tip: Start with defaults (dof_offset=2, mean_prior_precision=0.01) and only tune if performance is unsatisfactory. The defaults are optimized for typical BCI scenarios.
Training Requirements
Data Requirements
- Minimum: 40 trials per class (160 total for 4-class)
- Recommended: 80+ trials per class (320+ total for 4-class)
- For calibration: 10-20 trials per class sufficient
Bayesian LDA requires at least 2 trials to estimate the class statistics and the shared precision matrix. Single-trial training is not statistically valid for LDA and will raise an ArgumentError. Your training data must have shape (n_features, n_samples, n_trials) with n_trials >= 2.
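A minimal sketch of the shape requirement described above, as an illustrative helper (not an SDK function):

```python
import numpy as np

def check_training_shape(features):
    """Validate (n_features, n_samples, n_trials) training data with n_trials >= 2."""
    if features.ndim != 3:
        raise ValueError(
            f"expected 3-D (n_features, n_samples, n_trials), got {features.ndim}-D"
        )
    if features.shape[2] < 2:
        raise ValueError("LDA needs at least 2 trials to estimate class statistics")
    return features.shape

# Example: 16 CSP features, 250 samples per trial, 100 trials
shape = check_training_shape(np.zeros((16, 250, 100)))
```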
Feature Normalization
Critical for cross-session BCI performance! Normalizing your features before training typically improves cross-session accuracy by 15-30%.
from sklearn.preprocessing import StandardScaler
import pickle
# Estimate normalization from training data
scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)
# Train with normalized features
clf = NimbusLDA()
clf.fit(X_train_norm, y_train)
# Save model and scaler together
with open("model_with_scaler.pkl", "wb") as f:
    pickle.dump({'model': clf, 'scaler': scaler}, f)
# Later: Apply same normalization to test data
X_test_norm = scaler.transform(X_test)
predictions = clf.predict(X_test_norm)
# Estimate normalization from training data
norm_params = estimate_normalization_params(train_features; method=:zscore)
train_norm = apply_normalization(train_features, norm_params)
# Train with normalized features
train_data = BCIData(train_norm, metadata, labels)
model = train_model(RxLDAModel, train_data)
# Save params with model
@save "model.jld2" model norm_params
# Later: Apply same params to test data
test_norm = apply_normalization(test_features, norm_params)
See the Feature Normalization guide for complete details.
Feature Requirements
Bayesian LDA expects preprocessed features, not raw EEG:
✅ Required preprocessing:
- Bandpass filtering (8-30 Hz for motor imagery)
- Artifact removal (ICA recommended)
- Spatial filtering (CSP for motor imagery)
- Feature extraction (log-variance for CSP features)
- Temporal aggregation (handled automatically during training/inference)
❌ NOT accepted:
- Raw EEG channels
- Unfiltered data
- Non-extracted features
See Preprocessing Requirements for details.
Temporal Aggregation
Critical Preprocessing Step: Before training, the SDK automatically aggregates the temporal dimension of each trial into a single feature vector. This prevents treating temporally correlated samples as independent observations, which would violate the i.i.d. assumption of the model.
The aggregation method depends on your feature type and paradigm:
- CSP features: Log-variance aggregation (default for motor imagery)
- Power spectral features: Mean or median aggregation
- Other features: Configurable via BCIMetadata.temporal_aggregation
This aggregation happens automatically during train_model() and predict_batch() calls.
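For CSP features, log-variance aggregation is a one-liner: take the variance of each component over the time axis, then the log (which makes the band-power features closer to Gaussian). A self-contained sketch; the SDK performs this internally, and the names here are illustrative:

```python
import numpy as np

def log_variance_aggregate(trial):
    """Collapse one (n_features, n_samples) trial into a single feature vector:
    per-component variance over time, then log for approximate normality."""
    return np.log(np.var(trial, axis=1))

# Synthetic trial: 16 CSP components, 1 second at 250 Hz
rng = np.random.default_rng(42)
trial = rng.normal(0.0, 2.0, size=(16, 250))
features = log_variance_aggregate(trial)  # shape: (16,)
```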
Performance
| Operation | Latency | Notes |
|---|---|---|
| Training | 10-30 seconds | 50 iterations, 100 trials per class |
| Calibration | 5-15 seconds | 20 iterations, 20 trials per class |
| Batch Inference | 10-20ms per trial | 10 iterations |
| Streaming Chunk | 10-20ms | 10 iterations per chunk |
All measurements on standard CPU (no GPU required).
Classification Accuracy
| Paradigm | Classes | Typical Accuracy | ITR |
|---|---|---|---|
| Motor Imagery | 2 (L/R hand) | 75-90% | 15-25 bits/min |
| Motor Imagery | 4 (L/R/Feet/Tongue) | 70-85% | 20-35 bits/min |
| P300 | 2 (Target/Non-target) | 80-95% | 25-40 bits/min |
Accuracy is highly subject-dependent. Subject-specific calibration typically improves accuracy by 5-15%.
Model Inspection
View Model Parameters
import numpy as np
# Class means
print("Class means:")
for k, class_label in enumerate(clf.classes_):
    print(f" Class {class_label}: {clf.model_['means'][k]}")
# Shared covariance matrix (first 3x3)
print("\nShared covariance matrix (first 3x3):")
print(clf.model_['covariance'][:3, :3])
# Model info
print("\nModel info:")
print(f" Features: {clf.n_features_in_}")
print(f" Classes: {clf.classes_}")
# Class priors (learned from data)
print("\nClass priors:")
for k, class_label in enumerate(clf.classes_):
    print(f" Class {class_label}: {clf.model_['priors'][k]:.3f}")
# Extract point estimates from posterior distributions
using Distributions
# Class means (extract from posterior distributions)
println("Class means:")
for (k, mean_posterior) in enumerate(model.mean_posteriors)
mean_point = mean(mean_posterior) # Extract point estimate
println(" Class $k: ", mean_point)
end
# Shared precision matrix (extract from posterior distribution)
println("\nShared precision matrix (first 3x3):")
precision_point = mean(model.precision_posterior) # Extract point estimate
println(precision_point[1:3, 1:3])
# Model metadata
println("\nMetadata:")
println(" Name: ", model.metadata.name)
println(" Paradigm: ", model.metadata.paradigm)
println(" Features: ", model.metadata.n_features)
println(" Classes: ", model.metadata.n_classes)
# Class priors
println("\nClass priors:")
for (k, prior) in enumerate(model.priors)
println(" Class $k: ", prior)
end
Accessing model parameters: The SDK stores full posterior distributions (not just point estimates) for proper Bayesian inference. To get point estimates, use mean(posterior) to extract the mean of the posterior distribution. For precision matrices, use mean(precision_posterior) to get the expected precision matrix.
Compare Models
from nimbus_bci import NimbusLDA
import numpy as np
# Train multiple models with different hyperparameters and compare
models = []
for mu_scale in [1.0, 3.0, 5.0]:
    clf = NimbusLDA(mu_scale=mu_scale)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)
    print(f"mu_scale: {mu_scale}, Accuracy: {accuracy*100:.1f}%")
    models.append((mu_scale, clf, accuracy))
# Select best model
best_config, best_clf, best_acc = max(models, key=lambda x: x[2])
print(f"\nBest configuration: mu_scale={best_config} ({best_acc*100:.1f}%)")
# Train multiple models and compare
models = []
for n_iter in [20, 50, 100]
model = train_model(RxLDAModel, train_data; iterations=n_iter)
results = predict_batch(model, test_data)
accuracy = sum(results.predictions .== test_labels) / length(test_labels)
println("Iterations: $n_iter, Accuracy: $(round(accuracy*100, digits=1))%")
push!(models, (n_iter, model, accuracy))
end
Advantages & Limitations
Advantages
✅ Fast Training: Shared covariance estimation is efficient
✅ Fast Inference: Analytical posterior computation (<20ms)
✅ Interpretable: Clear probabilistic formulation
✅ Memory Efficient: Single shared precision matrix
✅ Robust: Handles uncertainty naturally via Bayesian inference
✅ Production-Ready: Battle-tested in real BCI applications
Limitations
❌ Shared Covariance Assumption: May not fit well if classes have very different spreads
❌ Linear Decision Boundary: Cannot capture non-linear class boundaries
❌ Gaussian Assumption: Assumes normal class distributions
❌ Not Ideal for Overlapping Classes: Use RxGMM for complex distributions
Comparison: Bayesian LDA vs Bayesian GMM
| Aspect | Bayesian LDA (RxLDA) | Bayesian GMM (RxGMM) |
|---|---|---|
| Precision Matrix | Shared across all classes | Class-specific |
| Mathematical Model | Pooled Gaussian Classifier (PGC) | Heteroscedastic Gaussian Classifier (HGC) |
| Training Speed | Faster | Slower |
| Inference Speed | Faster (~10-15ms) | Slightly slower (~15-20ms) |
| Flexibility | Less flexible | More flexible |
| Best For | Well-separated classes | Overlapping/complex distributions |
| Memory | Lower | Higher |
| Parameters | Fewer (n_classes means + 1 precision) | More (n_classes means + precisions) |
Rule of thumb: Start with Bayesian LDA. Switch to Bayesian GMM if accuracy is unsatisfactory.
References
Theory:
- Fisher, R. A. (1936). “The use of multiple measurements in taxonomic problems”
- Bishop, C. M. (2006). “Pattern Recognition and Machine Learning” (Chapter 4)
- Pooled Gaussian Classifier (PGC) with shared covariance structure
BCI Applications:
- Blankertz et al. (2008). “Optimizing spatial filters for robust EEG single-trial analysis”
- Lotte et al. (2018). “A review of classification algorithms for EEG-based BCI”