
Bayesian LDA - Bayesian Linear Discriminant Analysis

Python: NimbusLDA | Julia: RxLDAModel
Mathematical Model: Pooled Gaussian Classifier (PGC)
Bayesian LDA is a Bayesian classification model with uncertainty quantification. It uses a shared precision matrix across all classes, making it fast and efficient for BCI classification.
Available in Both SDKs:
  • Python SDK: NimbusLDA class (sklearn-compatible)
  • Julia SDK: RxLDAModel (RxInfer.jl-based)
Both implementations provide the same Bayesian inference with uncertainty quantification.

Overview

Bayesian LDA extends traditional Linear Discriminant Analysis with full Bayesian inference, providing:
  • Posterior probability distributions (not just point estimates)
  • Uncertainty quantification for each prediction
  • Probabilistic confidence scores
  • Fast inference (<20ms per trial)
  • Training and calibration support
  • Batch and streaming inference modes

Quick Start

from nimbus_bci import NimbusLDA
import numpy as np

# Create and fit classifier
clf = NimbusLDA(mu_scale=3.0)
clf.fit(X_train, y_train)

# Predict with uncertainty
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)

# Online learning
clf.partial_fit(X_new, y_new)

When to Use Bayesian LDA

Bayesian LDA is ideal for:
  • Motor Imagery classification (2-4 classes)
  • Well-separated class distributions
  • Fast inference requirements (<20ms)
  • Interpretable results for medical applications
  • When classes have similar covariance structures
Consider Bayesian GMM instead if:
  • Classes have significantly different covariance structures
  • Complex, overlapping distributions
  • Need more flexibility in modeling per-class variances

Model Architecture

Mathematical Foundation (Pooled Gaussian Classifier)

Bayesian LDA implements a Pooled Gaussian Classifier (PGC), which models class-conditional distributions with a shared precision matrix:
p(x | y=k) = N(μ_k, W^-1)
Where:
  • μ_k = mean vector for class k (learned from data)
  • W = shared precision matrix (same for all classes)
  • Assumes classes have similar covariance structure
Key Assumption: All classes share the same covariance structure, which makes training and inference faster.
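Under this assumption the decision rule is simple: score each class by its log prior minus half the Mahalanobis distance to μ_k under the shared precision W. A minimal NumPy sketch of this rule (illustrative only; function and variable names are not SDK API):

```python
import numpy as np

def pooled_gaussian_log_posterior(x, means, precision, priors):
    """Unnormalized log-posterior log p(y=k | x) for a pooled Gaussian classifier.

    means:     (K, d) class mean vectors mu_k
    precision: (d, d) shared precision matrix W
    priors:    (K,)   class priors pi_k
    """
    # log N(x; mu_k, W^-1) up to a constant shared by all classes:
    # -0.5 * (x - mu_k)^T W (x - mu_k)
    diffs = means - x                                    # (K, d)
    mahal = np.einsum('kd,de,ke->k', diffs, precision, diffs)
    return np.log(priors) - 0.5 * mahal

# Example: 2 classes in 2D with identity precision
means = np.array([[0.0, 0.0], [2.0, 2.0]])
W = np.eye(2)
priors = np.array([0.5, 0.5])
scores = pooled_gaussian_log_posterior(np.array([0.1, -0.2]), means, W, priors)
print(scores.argmax())  # → 0 (closer to class 0's mean)
```

Because W is shared, the quadratic term in x cancels between classes and the decision boundary is linear, which is exactly why LDA is fast.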

Hyperparameters

Bayesian LDA supports configurable hyperparameters for optimal performance tuning.

Available Hyperparameters (training):

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| dof_offset | Int | 2 | [1, 5] | Degrees of freedom offset for Wishart prior |
| mean_prior_precision | Float64 | 0.01 | [0.001, 0.1] | Prior precision for class means |
Parameter Effects:
  • dof_offset: Controls regularization strength
    • Lower values (1) → More data-driven, less regularization
    • Higher values (3-5) → More regularization, more conservative
  • mean_prior_precision: Controls prior strength on class means
    • Lower values (0.001) → Weaker prior, trusts data more
    • Higher values (0.05-0.1) → Stronger prior, more regularization

Model Structure

struct RxLDAModel <: BCIModel
    mean_posteriors::Vector          # MvNormal posteriors for class means
    precision_posterior::Any         # Wishart posterior for shared precision
    priors::Vector{Float64}          # Empirical class priors
    metadata::ModelMetadata          # Model info
    dof_offset::Int                  # Degrees of freedom offset (training)
    mean_prior_precision::Float64    # Mean prior precision (training)
end

RxInfer Implementation

The Bayesian LDA model uses RxInfer.jl for variational Bayesian inference.

Learning Phase:
@model function RxLDA_learning_model(y, labels, n_features, n_classes, dof_offset, mean_prior_precision)
    # Prior on shared precision
    dof = n_features + dof_offset
    W ~ Wishart(dof, I)
    
    # Priors on class means
    for k in 1:n_classes
        m[k] ~ MvNormal(mean=zeros(n_features), precision=mean_prior_precision * I)
    end
    
    # Likelihood with known labels
    for i in eachindex(y)
        k = labels[i]
        y[i] ~ MvNormal(mean=m[k], precision=W)
    end
end
Prediction Phase (Per-Class Likelihood):

To avoid mixture collapse with single observations, inference computes per-class likelihoods independently:
@model function RxLDA_predictive_single_class(y, class_mean, class_precision)
    n_features = length(class_mean)
    
    # Prior for this specific class (illustrative; actual implementation uses full posteriors)
    m ~ MvNormalMeanPrecision(class_mean, 1e6 * I)
    w ~ Wishart(n_features + 2, class_precision)
    
    # Observations (no mixture variable)
    for i in eachindex(y)
        y[i] ~ MvNormal(mean=m, precision=w)
    end
end
For each class k, compute p(y | μ_k, W) independently, then combine using Bayes’ rule with softmax normalization.
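That combination step can be sketched in a few lines (illustrative; the SDK performs this internally during prediction):

```python
import numpy as np

def combine_class_likelihoods(log_liks, log_priors):
    """Combine per-class log-likelihoods log p(y | mu_k, W) with class priors
    via Bayes' rule; a softmax over the log scores yields normalized posteriors."""
    scores = log_liks + log_priors
    scores -= scores.max()           # subtract max for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()

probs = combine_class_likelihoods(
    np.array([-10.0, -12.0, -11.0]),     # per-class log-likelihoods
    np.log(np.array([1/3, 1/3, 1/3])),   # uniform priors
)
print(probs.round(3))
```

With uniform priors the posterior depends only on the likelihood gaps; here class 0 receives the highest probability.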

Usage

1. Load Pre-trained Model

Python SDK: The Python SDK (nimbus-bci) trains models locally and doesn’t require pre-trained model loading. See Python SDK Quickstart for training examples.
from nimbus_bci import NimbusLDA
import pickle

# Python SDK: Train your own model locally
# No authentication or model zoo needed

# Quick training example
clf = NimbusLDA(mu_scale=3.0)
clf.fit(X_train, y_train)

# Save for later use
with open("my_motor_imagery.pkl", "wb") as f:
    pickle.dump(clf, f)

# Load saved model
with open("my_motor_imagery.pkl", "rb") as f:
    clf = pickle.load(f)

print("Model info:")
print(f"  Classes: {clf.classes_}")
print(f"  Features: {clf.n_features_in_}")

2. Train Custom Model

from nimbus_bci import NimbusLDA
from nimbus_bci.compat import extract_csp_features
import mne
import pickle

# Load and preprocess EEG data
raw = mne.io.read_raw_gdf("motor_imagery.gdf", preload=True)
raw.filter(8, 30)

events = mne.find_events(raw)
event_id = {'left': 1, 'right': 2, 'feet': 3, 'tongue': 4}
epochs = mne.Epochs(raw, events, event_id, tmin=0, tmax=4, preload=True)

# Extract CSP features
csp_features, csp = extract_csp_features(epochs, n_components=8)
labels = epochs.events[:, 2]

# Train NimbusLDA model with default hyperparameters
clf = NimbusLDA()
clf.fit(csp_features, labels)

# Or train with custom hyperparameters
clf_tuned = NimbusLDA(
    mu_scale=3.0,        # Prior strength for means (default: 1.0)
    sigma_scale=1.0      # Covariance regularization (default: 1.0)
)
clf_tuned.fit(csp_features, labels)

# Save for later use
with open("my_motor_imagery.pkl", "wb") as f:
    pickle.dump(clf_tuned, f)

print("Model trained successfully!")
print(f"Training accuracy: {clf_tuned.score(csp_features, labels):.1%}")
Training Parameters (Julia SDK train_model):
  • iterations: Number of variational inference iterations (default: 50)
    • More iterations = better convergence but slower training
    • 50-100 is typically sufficient
  • showprogress: Display progress bar during training
  • name: Model identifier
  • description: Model description for documentation
  • dof_offset: Degrees of freedom offset (default: 2, range: [1, 5])
  • mean_prior_precision: Prior precision for means (default: 0.01, range: [0.001, 0.1])

3. Subject-Specific Calibration

Fine-tune a pre-trained model with subject-specific data (much faster than training from scratch):
from nimbus_bci import NimbusLDA
import numpy as np
import pickle

# Load baseline model (trained on multiple subjects)
with open("motor_imagery_baseline.pkl", "rb") as f:
    base_clf = pickle.load(f)

# Collect 10-20 calibration trials from new subject
X_calib, y_calib = collect_calibration_trials()  # Your function

# Personalize using online learning (partial_fit)
personalized_clf = NimbusLDA()
personalized_clf.fit(X_baseline, y_baseline)  # Start with baseline

# Fine-tune on calibration data
for _ in range(10):  # Multiple passes for adaptation
    personalized_clf.partial_fit(X_calib, y_calib)

# Save personalized model
with open("subject_001_calibrated.pkl", "wb") as f:
    pickle.dump(personalized_clf, f)

print("Model personalized successfully!")
Calibration Benefits:
  • Requires only 10-20 trials per class (vs 50-100 for training from scratch)
  • Faster: 20 iterations vs 50-100
  • Better generalization: Uses pre-trained model as prior
  • Typical accuracy improvement: 5-15% over generic model
  • Hyperparameters automatically preserved: calibrate_model() (Julia SDK) inherits training hyperparameters from the base model (dof_offset, mean_prior_precision), ensuring consistency
Important: You don’t need to specify hyperparameters when calibrating; they are automatically inherited from the base model. This ensures the calibrated model uses the same regularization strategy that worked well during initial training.

4. Batch Inference

Process multiple trials efficiently:
from nimbus_bci import NimbusLDA
import numpy as np

# Run batch inference
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
confidences = np.max(probabilities, axis=1)

# Analyze results
print(f"Predictions: {predictions}")
print(f"Mean confidence: {np.mean(confidences):.3f}")

# Calculate accuracy
accuracy = np.mean(predictions == y_test)
print(f"Accuracy: {accuracy * 100:.1f}%")

5. Streaming Inference

Real-time chunk-by-chunk processing:
For detailed Python streaming examples, see Python SDK Streaming Inference.
from nimbus_bci import StreamingSession
from nimbus_bci.data import BCIMetadata

# Configure metadata with chunk size
metadata = BCIMetadata(
    sampling_rate=250.0,
    paradigm="motor_imagery",
    feature_type="csp",
    n_features=16,
    n_classes=4,
    chunk_size=250  # 1-second chunks
)

# Initialize streaming session
session = StreamingSession(clf.model_, metadata)

# Process chunks as they arrive
for chunk in eeg_feature_stream:
    result = session.process_chunk(chunk)
    print(f"Chunk: pred={result.prediction}, conf={result.confidence:.3f}")

# Finalize trial with aggregation
final_result = session.finalize_trial()
print(f"Final: pred={final_result.prediction}, conf={final_result.confidence:.3f}")

# Reset for next trial
session.reset()

Hyperparameter Tuning (v0.2.0+)

Fine-tune Bayesian LDA for optimal performance on your specific dataset.

When to Tune Hyperparameters

Consider tuning when:
  • Default performance is unsatisfactory
  • You have specific data characteristics (very noisy or very clean)
  • You have limited or extensive training data
  • You want to optimize for your specific paradigm

Tuning Strategies

For High SNR / Clean Data / Many Trials

Use lower regularization to let the data drive the model:
from nimbus_bci import NimbusLDA

# Lower regularization for clean data
clf = NimbusLDA(
    mu_scale=1.0,      # Weaker prior, trust data more
    sigma_scale=0.1    # Less covariance regularization
)
clf.fit(X_train, y_train)
Use when:
  • SNR > 5 dB
  • 100+ trials per class
  • Clean, artifact-free data
  • Well-controlled experimental conditions

For Low SNR / Noisy Data / Few Trials

Use higher regularization for stability:
from nimbus_bci import NimbusLDA

# Higher regularization for noisy data
clf = NimbusLDA(
    mu_scale=5.0,      # Stronger prior
    sigma_scale=10.0   # More covariance regularization
)
clf.fit(X_train, y_train)
Use when:
  • SNR < 2 dB
  • 40-80 trials per class
  • Noisy data or limited artifact removal
  • Challenging recording conditions

Balanced / Default Settings

The defaults work well for most scenarios (shown here with the Julia SDK's train_model):
model = train_model(
    RxLDAModel,
    train_data;
    iterations = 50,
    dof_offset = 2,               # Balanced (default)
    mean_prior_precision = 0.01   # Balanced (default)
)
Use when:
  • Moderate SNR (2-5 dB)
  • 80-150 trials per class
  • Standard BCI recording conditions
  • Starting point for experimentation

Hyperparameter Search Example

Systematically search for optimal hyperparameters:
from nimbus_bci import NimbusLDA
from sklearn.model_selection import GridSearchCV, train_test_split

# Define parameter grid
param_grid = {
    'mu_scale': [1.0, 3.0, 5.0, 10.0],
    'sigma_scale': [0.1, 1.0, 10.0]
}

# Split data
X_train_sub, X_val, y_train_sub, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42
)

print("Searching hyperparameters...")
clf = NimbusLDA()
grid_search = GridSearchCV(
    clf,
    param_grid,
    cv=5,
    scoring='accuracy',
    verbose=1
)

grid_search.fit(X_train_sub, y_train_sub)

print("\nBest hyperparameters:")
print(f"  mu_scale: {grid_search.best_params_['mu_scale']}")
print(f"  sigma_scale: {grid_search.best_params_['sigma_scale']}")
print(f"  CV accuracy: {grid_search.best_score_*100:.1f}%")

# Get best model
best_clf = grid_search.best_estimator_

# Retrain on all training data
best_clf.fit(X_train, y_train)

Quick Tuning Guidelines

| Scenario | dof_offset | mean_prior_precision | Notes |
| --- | --- | --- | --- |
| Excellent data quality | 1 | 0.001 | Minimal regularization |
| Good data quality | 2 (default) | 0.01 (default) | Balanced approach |
| Moderate data quality | 2-3 | 0.01-0.03 | Slight regularization |
| Poor data quality | 3-4 | 0.05-0.1 | Strong regularization |
| Very limited trials | 4 | 0.1 | Maximum regularization |
Pro Tip: Start with defaults (dof_offset=2, mean_prior_precision=0.01) and only tune if performance is unsatisfactory. The defaults are optimized for typical BCI scenarios.

Training Requirements

Data Requirements

  • Minimum: 40 trials per class (160 total for 4-class)
  • Recommended: 80+ trials per class (320+ total for 4-class)
  • For calibration: 10-20 trials per class sufficient
Bayesian LDA requires at least 2 trials to estimate the class statistics and shared precision matrix. Single-trial training is not statistically valid for LDA and will raise an ArgumentError. Your training data must have shape (n_features, n_samples, n_trials) with n_trials >= 2.
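A minimal sketch of that shape check, assuming the (n_features, n_samples, n_trials) layout described above (the function name is hypothetical, not SDK API; the actual SDK raises ArgumentError on the Julia side):

```python
import numpy as np

def check_training_data(X):
    """Validate the (n_features, n_samples, n_trials) layout described above."""
    if X.ndim != 3:
        raise ValueError(f"expected 3D (n_features, n_samples, n_trials), got {X.ndim}D")
    n_features, n_samples, n_trials = X.shape
    if n_trials < 2:
        raise ValueError("n_trials >= 2 is required to estimate class statistics")
    return n_features, n_samples, n_trials

print(check_training_data(np.zeros((8, 250, 40))))  # → (8, 250, 40)
```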

Feature Normalization

Critical for cross-session BCI performance! Normalize your features before training; this typically yields a 15-30% accuracy improvement across sessions.
from sklearn.preprocessing import StandardScaler
import pickle

# Estimate normalization from training data
scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)

# Train with normalized features
clf = NimbusLDA()
clf.fit(X_train_norm, y_train)

# Save model and scaler together
with open("model_with_scaler.pkl", "wb") as f:
    pickle.dump({'model': clf, 'scaler': scaler}, f)

# Later: Apply same normalization to test data
X_test_norm = scaler.transform(X_test)
predictions = clf.predict(X_test_norm)
See the Feature Normalization guide for complete details.

Feature Requirements

Bayesian LDA expects preprocessed features, not raw EEG.

Required preprocessing:
  • Bandpass filtering (8-30 Hz for motor imagery)
  • Artifact removal (ICA recommended)
  • Spatial filtering (CSP for motor imagery)
  • Feature extraction (log-variance for CSP features)
  • Temporal aggregation (handled automatically during training/inference)
NOT accepted:
  • Raw EEG channels
  • Unfiltered data
  • Non-extracted features
See Preprocessing Requirements for details.

Temporal Aggregation

Critical Preprocessing Step: Before training, the SDK automatically aggregates the temporal dimension of each trial into a single feature vector. This prevents treating temporally correlated samples as independent observations, which would violate the i.i.d. assumption of the model.
The aggregation method depends on your feature type and paradigm:
  • CSP features: Log-variance aggregation (default for motor imagery)
  • Power spectral features: Mean or median aggregation
  • Other features: Configurable via BCIMetadata.temporal_aggregation
This aggregation happens automatically during train_model() and predict_batch() calls.
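For motor-imagery CSP features, the default log-variance aggregation can be sketched as follows (illustrative; the SDK applies this step automatically):

```python
import numpy as np

def log_variance_aggregate(trial):
    """Collapse the temporal dimension of one trial into a feature vector.

    trial: (n_features, n_samples) CSP-filtered signal for a single trial.
    Returns (n_features,) log-variance features, the default aggregation
    for motor imagery described above.
    """
    return np.log(np.var(trial, axis=1))

rng = np.random.default_rng(0)
trial = rng.standard_normal((8, 250))   # 8 CSP components, 1 s at 250 Hz
features = log_variance_aggregate(trial)
print(features.shape)  # → (8,)
```

The log transform makes band-power features roughly Gaussian, which matches the model's normality assumption.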

Performance Characteristics

Computational Performance

| Operation | Latency | Notes |
| --- | --- | --- |
| Training | 10-30 seconds | 50 iterations, 100 trials per class |
| Calibration | 5-15 seconds | 20 iterations, 20 trials per class |
| Batch Inference | 10-20ms per trial | 10 iterations |
| Streaming Chunk | 10-20ms | 10 iterations per chunk |
All measurements on standard CPU (no GPU required).

Classification Accuracy

| Paradigm | Classes | Typical Accuracy | ITR |
| --- | --- | --- | --- |
| Motor Imagery | 2 (L/R hand) | 75-90% | 15-25 bits/min |
| Motor Imagery | 4 (L/R/Feet/Tongue) | 70-85% | 20-35 bits/min |
| P300 | 2 (Target/Non-target) | 80-95% | 25-40 bits/min |
Accuracy is highly subject-dependent. Subject-specific calibration typically improves accuracy by 5-15%.
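The ITR values above follow the standard Wolpaw formula, B = log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1)) bits per trial, scaled by the trial rate. A sketch (the 12 trials/min rate is an assumed example):

```python
import numpy as np

def wolpaw_itr(n_classes, accuracy, trials_per_min):
    """Wolpaw information transfer rate in bits/min."""
    n, p = n_classes, accuracy
    bits = np.log2(n)
    if 0 < p < 1:  # the entropy terms vanish at p = 1
        bits += p * np.log2(p) + (1 - p) * np.log2((1 - p) / (n - 1))
    return bits * trials_per_min

# 4-class motor imagery at 80% accuracy, 12 trials/min (5 s per trial)
print(round(wolpaw_itr(4, 0.80, 12), 1))  # → 11.5
```

Note that ITR depends heavily on trial duration, so reported bits/min vary with paradigm timing as well as accuracy.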

Model Inspection

View Model Parameters

import numpy as np

# Class means
print("Class means:")
for k, class_label in enumerate(clf.classes_):
    print(f"  Class {class_label}: {clf.model_['means'][k]}")

# Shared covariance matrix (first 3x3)
print("\nShared covariance matrix (first 3x3):")
print(clf.model_['covariance'][:3, :3])

# Model info
print("\nModel info:")
print(f"  Features: {clf.n_features_in_}")
print(f"  Classes: {clf.classes_}")

# Class priors (learned from data)
print("\nClass priors:")
for k, class_label in enumerate(clf.classes_):
    print(f"  Class {class_label}: {clf.model_['priors'][k]:.3f}")
Accessing model parameters: The SDK stores full posterior distributions (not just point estimates) for proper Bayesian inference. To get point estimates, use mean(posterior) to extract the mean of the posterior distribution. For precision matrices, use mean(precision_posterior) to get the expected precision matrix.

Compare Models

from nimbus_bci import NimbusLDA
import numpy as np

# Train multiple models with different hyperparameters and compare
models = []
for mu_scale in [1.0, 3.0, 5.0]:
    clf = NimbusLDA(mu_scale=mu_scale)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)
    
    print(f"mu_scale: {mu_scale}, Accuracy: {accuracy*100:.1f}%")
    models.append((mu_scale, clf, accuracy))

# Select best model
best_config, best_clf, best_acc = max(models, key=lambda x: x[2])
print(f"\nBest configuration: mu_scale={best_config} ({best_acc*100:.1f}%)")

Advantages & Limitations

Advantages

  • Fast Training: Shared covariance estimation is efficient
  • Fast Inference: Analytical posterior computation (<20ms)
  • Interpretable: Clear probabilistic formulation
  • Memory Efficient: Single shared precision matrix
  • Robust: Handles uncertainty naturally via Bayesian inference
  • Production-Ready: Battle-tested in real BCI applications

Limitations

  • Shared Covariance Assumption: May not fit well if classes have very different spreads
  • Linear Decision Boundary: Cannot capture non-linear class boundaries
  • Gaussian Assumption: Assumes normal class distributions
  • Not Ideal for Overlapping Classes: Use RxGMM for complex distributions

Comparison: Bayesian LDA vs Bayesian GMM

| Aspect | Bayesian LDA (RxLDA) | Bayesian GMM (RxGMM) |
| --- | --- | --- |
| Precision Matrix | Shared across all classes | Class-specific |
| Mathematical Model | Pooled Gaussian Classifier (PGC) | Heteroscedastic Gaussian Classifier (HGC) |
| Training Speed | Faster | Slower |
| Inference Speed | Faster (~10-15ms) | Slightly slower (~15-20ms) |
| Flexibility | Less flexible | More flexible |
| Best For | Well-separated classes | Overlapping/complex distributions |
| Memory | Lower | Higher |
| Parameters | Fewer (n_classes means + 1 precision) | More (n_classes means + precisions) |
Rule of thumb: Start with Bayesian LDA. Switch to Bayesian GMM if accuracy is unsatisfactory.

References

Theory:
  • Fisher, R. A. (1936). “The use of multiple measurements in taxonomic problems”
  • Bishop, C. M. (2006). “Pattern Recognition and Machine Learning” (Chapter 4)
  • Pooled Gaussian Classifier (PGC) with shared covariance structure
BCI Applications:
  • Blankertz et al. (2008). “Optimizing spatial filters for robust EEG single-trial analysis”
  • Lotte et al. (2018). “A review of classification algorithms for EEG-based BCI”