Bayesian Multinomial (NimbusSoftmax / NimbusProbit)
Python : NimbusSoftmax | Julia : NimbusProbit
Model family : Non-Gaussian multinomial classification with uncertainty quantification
Bayesian Softmax is a Bayesian multinomial classification model with uncertainty quantification. It uses continuous transitions to map features to a latent space for probabilistic classification, making it ideal for complex multinomial tasks.
Availability
Python SDK : ✅ NimbusSoftmax (Polya-Gamma VI; optional dependency)
Julia SDK : ✅ NimbusProbit
Start Here
Quickstart Start with SDK setup and first inference workflow.
Model Selection Compare Nimbus models by data characteristics and use case.
Examples See practical BCI examples for training and inference.
Overview
Bayesian Softmax provides a powerful alternative to traditional Gaussian classifiers:
✅ Full Bayesian inference with posterior probability distributions
✅ Uncertainty quantification for each prediction
✅ Continuous transition mapping from features to latent space
✅ Flexible decision boundaries (non-Gaussian)
✅ Fast inference (~15-25ms per trial)
✅ Training and calibration support
✅ Batch and streaming inference modes
Quick Start
from nimbus_bci import NimbusSoftmax
import numpy as np

# Create and fit classifier
clf = NimbusSoftmax(reg_strength=1.0)
clf.fit(X_train, y_train)

# Predict with uncertainty
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)

# Good for non-Gaussian decision boundaries
using NimbusSDK

# Train model
model = train_model(
    NimbusProbit,
    train_data;
    iterations = 50
)

# Predict
results = predict_batch(model, test_data)
When to Use Bayesian Softmax
Bayesian Softmax is ideal for:
Complex multinomial classification tasks
When you need full posterior uncertainty over multinomial distributions
Advanced BCI applications requiring flexible classification
Problems where Gaussian class-conditional structure (NimbusLDA/NimbusQDA) is a poor fit
When NimbusLDA/NimbusQDA accuracy is insufficient for complex distributions
Consider Bayesian LDA or Bayesian QDA instead if:
You need faster inference (NimbusLDA/NimbusQDA are 5-10ms faster)
Classes are well-separated with Gaussian distributions
Interpretability of class centers is important (Bayesian Softmax is discriminative)
You want Mahalanobis distance-based outlier detection
Model Architecture
Mathematical Foundation (Bayesian Multinomial Logistic Regression)
Bayesian Softmax implements Bayesian Multinomial Logistic Regression using continuous transitions and a MultinomialPolya likelihood:
Generative Model:
B ~ N(ξβ, Wβ⁻¹) # Regression coefficients prior
W ~ Wishart(ν, Λ) # Precision matrix prior
Ψᵢ ~ ContinuousTransition(xᵢ, B, W) # Latent transformation
yᵢ ~ MultinomialPolya(N, Ψᵢ) # Observation model
Where:
B ∈ ℝ^((K-1)×D) = regression coefficient matrix
W = precision matrix in (K-1)-dimensional latent space
xᵢ = feature vector for observation i
Ψᵢ = latent (K-1)-dimensional representation
N = number of trials per observation (typically 1 for classification)
Key Feature : The model works in (K-1) dimensions due to the sum-to-one constraint in multinomial distributions, with the K-th class serving as the reference category.
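To make the reference-category construction concrete, here is a minimal Python sketch of the standard convention for mapping a (K-1)-dimensional latent vector to K class probabilities: the K-th class gets a fixed zero logit and a softmax is taken over all K logits. This is for intuition only; the exact link used by the MultinomialPolya node (for example, a stick-breaking parameterization) may differ.

import numpy as np

def latent_to_probabilities(psi):
    """Map a (K-1)-dimensional latent vector to K class probabilities
    using the reference-category convention (zero logit for class K)."""
    logits = np.append(psi, 0.0)         # append the reference-category logit
    logits -= logits.max()               # numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

# Example: 4 classes -> 3-dimensional latent vector
psi = np.array([1.2, -0.3, 0.5])
print(latent_to_probabilities(psi))      # four probabilities summing to 1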
Hyperparameters
Bayesian Softmax supports configurable hyperparameters for optimal performance tuning:
Available Hyperparameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| N | Int | 1 | Number of trials per observation (typically 1 for classification) |
| ξβ | Vector | ones((K-1)×D) | Prior mean for regression coefficients B |
| Wβ | Matrix | 1e-5 × I | Prior precision for regression coefficients B |
| W_df | Float64 | K + 5 | Wishart degrees of freedom for precision matrix W |
| W_scale | Matrix | I | Wishart scale matrix ((K-1) × (K-1)) |
Parameter Effects:
N : Number of trials per observation
Typically set to 1 for classification tasks
Higher values model count data (less common in BCI)
ξβ : Prior mean for regression coefficients
Default: ones((K-1)×D) provides mild regularization
Can be customized if you have prior knowledge of coefficient values
Wβ : Prior precision for regression coefficients
Lower values (1e-6) → Weaker prior, more data-driven
Higher values (1e-4) → Stronger prior, more regularization
Default (1e-5) provides balanced regularization
W_df : Wishart degrees of freedom
Controls strength of precision matrix prior
Higher values → Stronger regularization
Default (K + 5) provides reasonable regularization
W_scale : Wishart scale matrix
Shape of the prior precision distribution
Default (identity matrix) assumes no prior covariance structure
Hyperparameter configuration allows you to optimize model behavior for your specific dataset characteristics (SNR, trial count, data quality).
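To see what these defaults look like in practice, the NumPy sketch below builds them for a hypothetical problem with K classes and D features. It only illustrates the shapes and scales involved; it is not the SDK's internal construction code.

import numpy as np

# Illustrative shapes for the default priors (assumed example: 4-class MI, 16 CSP features)
K, D = 4, 16
B_dim = (K - 1) * D                 # flattened size of the coefficient matrix B

xi_beta = np.ones(B_dim)            # ξβ: prior mean for B (default: ones)
W_beta = 1e-5 * np.eye(B_dim)       # Wβ: prior precision for B (default: 1e-5 × I)
W_df = float(K + 5)                 # Wishart degrees of freedom (default: K + 5)
W_scale = np.eye(K - 1)             # Wishart scale matrix, (K-1) × (K-1)

print(xi_beta.shape, W_beta.shape, W_scale.shape)   # (48,) (48, 48) (3, 3)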
Model Structure
struct NimbusProbit <: BCIModel
B_posterior # Learned regression coefficients posterior
W_posterior # Learned precision matrix posterior
metadata :: ModelMetadata # Model info
N :: Int # Trials per observation
ξβ :: Vector{Float64} # Prior mean for B
Wβ :: Matrix{Float64} # Prior precision for B
W_df :: Float64 # Wishart DOF
W_scale :: Matrix{Float64} # Wishart scale matrix
end
RxInfer Implementation
The NimbusSoftmax model uses RxInfer.jl for variational Bayesian inference:
Learning Phase:
@model function RxPolya_learning_model(obs, N, X, n_classes, n_features,
                                       ξβ, Wβ, W_df, W_scale)
    # Prior on regression coefficients
    B ~ MvNormalWeightedMeanPrecision(ξβ, Wβ)

    # Prior on precision matrix
    W ~ Wishart(W_df, W_scale)

    # Likelihood with continuous transition
    for i in eachindex(obs)
        Ψ[i] ~ ContinuousTransition(X[i], B, W)
        obs[i] ~ MultinomialPolya(N, Ψ[i])
    end
end
Prediction Phase:
@model function RxPolya_predictive(obs, N, X, B_posterior, W_posterior,
                                   n_classes, n_features)
    B ~ B_posterior
    W ~ W_posterior

    # Predict latent representations
    for i in eachindex(X)
        Ψ[i] ~ ContinuousTransition(X[i], B, W)
        obs[i] ~ MultinomialPolya(N, Ψ[i])
    end
end
Usage
1. Load Pre-trained Model
from nimbus_bci import NimbusSoftmax
import pickle

# Python SDK: Train locally
clf = NimbusSoftmax(reg_strength=1.0)
clf.fit(X_train, y_train)

# Save and load
with open("my_motor_imagery_softmax.pkl", "wb") as f:
    pickle.dump(clf, f)

with open("my_motor_imagery_softmax.pkl", "rb") as f:
    clf = pickle.load(f)

print(f"Model info: {clf.classes_} classes, {clf.n_features_in_} features")
using NimbusSDK

# Authenticate
NimbusSDK.install_core("nbci_live_your_key")

# Load from Nimbus model zoo
model = load_model(NimbusProbit, "motor_imagery_polya_v1")

println("Model loaded:")
println("  Features: $(get_n_features(model))")
println("  Classes: $(get_n_classes(model))")
println("  Paradigm: $(get_paradigm(model))")
2. Train Custom Model
from nimbus_bci import NimbusSoftmax
from nimbus_bci.compat import extract_csp_features
import mne
import pickle

# Load and preprocess EEG data
raw = mne.io.read_raw_gdf("motor_imagery.gdf", preload=True)
raw.filter(8, 30)
events = mne.find_events(raw)
event_id = {'left': 1, 'right': 2, 'feet': 3, 'tongue': 4}
epochs = mne.Epochs(raw, events, event_id, tmin=0, tmax=4, preload=True)

# Extract CSP features
csp_features, csp = extract_csp_features(epochs, n_components=8)
labels = epochs.events[:, 2]

# Train NimbusSoftmax with default hyperparameters
clf = NimbusSoftmax()
clf.fit(csp_features, labels)

# Or train with custom hyperparameters
clf_tuned = NimbusSoftmax(
    reg_strength=1.0  # L2 regularization strength
)
clf_tuned.fit(csp_features, labels)

# Save
with open("my_motor_imagery_softmax.pkl", "wb") as f:
    pickle.dump(clf_tuned, f)

print(f"Training accuracy: {clf_tuned.score(csp_features, labels):.1%}")
using NimbusSDK

# Prepare training data with labels
train_features = csp_features              # (16 × 250 × 100)
train_labels = [1, 2, 3, 4, 1, 2, ...]     # 100 labels

train_data = BCIData(
    train_features,
    BCIMetadata(
        sampling_rate = 250.0,
        paradigm = :motor_imagery,
        feature_type = :csp,
        n_features = 16,
        n_classes = 4,
        chunk_size = nothing
    ),
    train_labels                           # Required for training!
)

# Train NimbusSoftmax model with default hyperparameters
model = train_model(
    NimbusProbit,
    train_data;
    iterations = 50,                       # Inference iterations
    showprogress = true,                   # Show progress bar
    name = "my_motor_imagery_polya",
    description = "4-class MI classifier with NimbusSoftmax"
)

# Or train with custom hyperparameters
model = train_model(
    NimbusProbit,
    train_data;
    iterations = 50,
    showprogress = true,
    name = "my_motor_imagery_polya_tuned",
    description = "4-class MI with tuned hyperparameters",
    N = 1,                                 # Trials per observation (default: 1)
    ξβ = nothing,                          # Auto-configured if not provided
    Wβ = nothing,                          # Auto-configured if not provided
    W_df = nothing,                        # Auto-configured if not provided
    W_scale = nothing                      # Auto-configured if not provided
)

# Save for later use
save_model(model, "my_model.jld2")
Training Parameters:
iterations: Number of variational inference iterations (default: 50)
More iterations = better convergence but slower training
50-100 is typically sufficient
showprogress: Display progress bar during training
name: Model identifier
description: Model description for documentation
N: Trials per observation (default: 1, range: [1, ∞))
ξβ: Prior mean for B (default: ones((K-1)×D))
Wβ: Prior precision for B (default: 1e-5 × I)
W_df: Wishart degrees of freedom (default: K + 5)
W_scale: Wishart scale matrix (default: I)
3. Subject-Specific Calibration
Fine-tune a pre-trained model with subject-specific data (much faster than training from scratch):
from nimbus_bci import NimbusSoftmax
import pickle

# Load baseline model
with open("motor_imagery_softmax_baseline.pkl", "rb") as f:
    base_clf = pickle.load(f)

# Collect 10-20 calibration trials from new subject
X_calib, y_calib = collect_calibration_trials()

# Personalize the pre-trained baseline using online learning
personalized_clf = base_clf
for _ in range(10):
    personalized_clf.partial_fit(X_calib, y_calib)

# Save
with open("subject_001_calibrated.pkl", "wb") as f:
    pickle.dump(personalized_clf, f)
# Load base model
base_model = load_model(NimbusProbit, "motor_imagery_polya_baseline_v1")

# Collect 10-20 calibration trials from new subject
calib_features = collect_calibration_trials()   # Your function
calib_labels = [1, 2, 3, 4, 1, 2, ...]
calib_data = BCIData(calib_features, metadata, calib_labels)

# Calibrate (personalize) the model
personalized_model = calibrate_model(
    base_model,
    calib_data;
    iterations = 20                             # Fewer iterations needed
)

save_model(personalized_model, "subject_001_calibrated.jld2")
Calibration Benefits:
Requires only 10-20 trials per class (vs 50-100 for training from scratch)
Faster: 20 iterations vs 50-100
Better generalization: Uses pre-trained model as prior
Typical accuracy improvement: 5-15% over generic model
Hyperparameters preserved : calibrate_model() automatically uses the same hyperparameters as the base model
4. Batch Inference
Process multiple trials efficiently:
import numpy as np

# Run batch inference
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
confidences = np.max(probabilities, axis=1)

# Analyze results
print(f"Predictions: {predictions}")
print(f"Mean confidence: {np.mean(confidences):.3f}")

# Calculate accuracy
accuracy = np.mean(predictions == y_test)
print(f"Accuracy: {accuracy * 100:.1f}%")

# Calculate ITR
from nimbus_bci.metrics import calculate_itr
itr = calculate_itr(accuracy, n_classes=4, trial_time=4.0)
print(f"ITR: {itr:.1f} bits/minute")
# Prepare test data
test_data = BCIData(test_features, metadata, test_labels)

# Run batch inference
results = predict_batch(model, test_data; iterations = 10)

# Analyze results
println("Predictions: ", results.predictions)
println("Mean confidence: ", mean(results.confidences))

# Calculate accuracy
accuracy = sum(results.predictions .== test_labels) / length(test_labels)
println("Accuracy: $(round(accuracy * 100, digits = 1))%")

# Calculate ITR
itr = calculate_ITR(accuracy, 4, 4.0)   # 4 classes, 4-second trials
println("ITR: $(round(itr, digits = 1)) bits/minute")
5. Streaming Inference
Real-time chunk-by-chunk processing:
from nimbus_bci import StreamingSession

# Initialize streaming session
session = StreamingSession(clf.model_, metadata_with_chunk_size)

# Process chunks as they arrive
for chunk in eeg_feature_stream:
    result = session.process_chunk(chunk)
    print(f"Chunk: pred={result.prediction}, conf={result.confidence:.3f}")

# Finalize trial with aggregation
final_result = session.finalize_trial()
print(f"Final: pred={final_result.prediction}, conf={final_result.confidence:.3f}")
session.reset()
# Initialize streaming session
session = init_streaming(model, metadata_with_chunk_size)

# Process chunks as they arrive
for chunk in eeg_feature_stream
    result = process_chunk(session, chunk; iterations = 10)
    println("Chunk: pred=$(result.prediction), conf=$(round(result.confidence, digits = 3))")
end

# Finalize trial with aggregation
final_result = finalize_trial(session; method = :weighted_vote)
println("Final: pred=$(final_result.prediction), conf=$(round(final_result.confidence, digits = 3))")
Hyperparameter Tuning
Fine-tune Bayesian Softmax for optimal performance on your specific dataset.
When to Tune Hyperparameters
Consider tuning when:
Default performance is unsatisfactory
You have specific data characteristics (very noisy or very clean)
You have limited or extensive training data
You want to optimize for your specific paradigm
Tuning Strategies
For High SNR / Clean Data / Many Trials
Use weaker priors to let the data drive the model:
from nimbus_bci import NimbusSoftmax

# Weaker regularization for clean data
clf = NimbusSoftmax(reg_strength=0.1)  # Lower regularization
clf.fit(X_train, y_train)
model = train_model(
    NimbusProbit,
    train_data;
    iterations = 50,
    N = 1,
    Wβ = 1e-6 * diageye((n_classes - 1) * n_features),   # Weaker prior
    W_df = Float64(n_classes + 2)                        # Less regularization
)
Use when:
SNR > 5 dB
100+ trials per class
Clean, artifact-free data
Well-controlled experimental conditions
For Low SNR / Noisy Data / Few Trials
Use stronger priors for stability:
from nimbus_bci import NimbusSoftmax

# Stronger regularization for noisy data
clf = NimbusSoftmax(reg_strength=10.0)  # Higher regularization
clf.fit(X_train, y_train)
model = train_model(
    NimbusProbit,
    train_data;
    iterations = 50,
    N = 1,
    Wβ = 1e-4 * diageye((n_classes - 1) * n_features),   # Stronger prior
    W_df = Float64(n_classes + 10)                       # More regularization
)
Use when:
SNR < 2 dB
40-80 trials per class
Noisy data or limited artifact removal
Challenging recording conditions
Balanced / Default Settings
The defaults work well for most scenarios:
from nimbus_bci import NimbusSoftmax

# Default regularization (balanced)
clf = NimbusSoftmax(reg_strength=1.0)  # Default
clf.fit(X_train, y_train)
model = train_model(
    NimbusProbit,
    train_data;
    iterations = 50,
    N = 1,                                # Standard (default)
    Wβ = 1e-5 * diageye(B_dim),           # Balanced (default)
    W_df = Float64(n_classes + 5)         # Balanced (default)
)
Use when:
Moderate SNR (2-5 dB)
80-150 trials per class
Standard BCI recording conditions
Starting point for experimentation
Hyperparameter Search Example
Systematically search for optimal hyperparameters:
from sklearn.model_selection import GridSearchCV, train_test_split
from nimbus_bci import NimbusSoftmax

# Hold out a validation set for a final check
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define search grid
param_grid = {
    'reg_strength': [0.1, 1.0, 10.0]
}

# GridSearchCV
clf = NimbusSoftmax()
grid_search = GridSearchCV(
    clf, param_grid, cv=5, scoring='accuracy', verbose=2
)
grid_search.fit(X_train, y_train)

# Best parameters
print("\nBest hyperparameters:")
print(f"  reg_strength: {grid_search.best_params_['reg_strength']}")
print(f"  Cross-validation accuracy: {grid_search.best_score_ * 100:.1f}%")

# Use best model and confirm on the held-out validation set
best_clf = grid_search.best_estimator_
print(f"  Validation accuracy: {best_clf.score(X_val, y_val) * 100:.1f}%")
using NimbusSDK

# Define search grid
Wβ_scales = [1e-6, 1e-5, 1e-4]
W_df_offsets = [2, 5, 10]

# Split data into train/validation
train_data, val_data = split_data(all_data, ratio = 0.8)

best_accuracy = 0.0
best_params = nothing

println("Searching hyperparameters...")
for Wβ_scale in Wβ_scales
    for df_offset in W_df_offsets
        B_dim = (n_classes - 1) * n_features

        # Train model with these hyperparameters
        model = train_model(
            NimbusProbit,
            train_data;
            iterations = 50,
            N = 1,
            Wβ = Wβ_scale * diageye(B_dim),
            W_df = Float64(n_classes + df_offset),
            showprogress = false
        )

        # Validate
        results = predict_batch(model, val_data)
        accuracy = sum(results.predictions .== val_data.labels) / length(val_data.labels)
        println("  Wβ_scale=$Wβ_scale, df_offset=$df_offset: $(round(accuracy * 100, digits = 1))%")

        # Track best
        if accuracy > best_accuracy
            best_accuracy = accuracy
            best_params = (Wβ_scale = Wβ_scale, df_offset = df_offset)
        end
    end
end

println("\nBest hyperparameters:")
println("  Wβ_scale: $(best_params.Wβ_scale)")
println("  W_df offset: $(best_params.df_offset)")
println("  Validation accuracy: $(round(best_accuracy * 100, digits = 1))%")

# Retrain on all data with best hyperparameters
final_model = train_model(
    NimbusProbit,
    all_data;
    iterations = 50,
    N = 1,
    Wβ = best_params.Wβ_scale * diageye((n_classes - 1) * n_features),
    W_df = Float64(n_classes + best_params.df_offset)
)
Quick Tuning Guidelines
| Scenario | Wβ scale | W_df offset | Notes |
|---|---|---|---|
| Excellent data quality | 1e-6 | 2 | Minimal regularization |
| Good data quality | 1e-5 (default) | 5 (default) | Balanced approach |
| Moderate data quality | 1e-5 to 1e-4 | 5-8 | Slight regularization |
| Poor data quality | 1e-4 | 10 | Strong regularization |
| Very limited trials | 1e-4 | 15 | Maximum regularization |
Pro Tip : Start with defaults (Wβ = 1e-5 × I, W_df = K + 5) and only tune if performance is unsatisfactory. The defaults are optimized for typical BCI scenarios.
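If you prefer to encode the guideline table in a script, a small lookup like the one below makes the choice explicit. The scenario names and helper are illustrative, not an SDK API; for "moderate" quality it picks the stronger end of the suggested ranges.

# Hypothetical helper mirroring the guideline table above (not part of the SDK)
TUNING_GUIDELINES = {
    "excellent":       {"W_beta_scale": 1e-6, "W_df_offset": 2},
    "good":            {"W_beta_scale": 1e-5, "W_df_offset": 5},   # defaults
    "moderate":        {"W_beta_scale": 1e-4, "W_df_offset": 8},   # upper end of range
    "poor":            {"W_beta_scale": 1e-4, "W_df_offset": 10},
    "very_few_trials": {"W_beta_scale": 1e-4, "W_df_offset": 15},
}

def suggested_priors(data_quality, n_classes):
    params = TUNING_GUIDELINES[data_quality]
    return {"W_beta_scale": params["W_beta_scale"],
            "W_df": float(n_classes + params["W_df_offset"])}

print(suggested_priors("good", n_classes=4))   # {'W_beta_scale': 1e-05, 'W_df': 9.0}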
Training Requirements
Data Requirements
Minimum : 40 trials per class (160 total for 4-class)
Recommended : 80+ trials per class (320+ total for 4-class)
For calibration : 10-20 trials per class sufficient
NimbusSoftmax requires at least 2 observations to estimate regression coefficients and the precision matrix. Single-trial training will raise an error.
Feature Normalization
Critical for cross-session BCI performance! Normalizing features before training typically improves cross-session accuracy by 15-30%.
from sklearn.preprocessing import StandardScaler
import pickle

# Estimate normalization from training data
scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)

# Train with normalized features
clf = NimbusSoftmax()
clf.fit(X_train_norm, y_train)

# Save model and scaler together
with open("model_with_scaler.pkl", "wb") as f:
    pickle.dump({'model': clf, 'scaler': scaler}, f)

# Later: Apply same normalization to test data
X_test_norm = scaler.transform(X_test)
predictions = clf.predict(X_test_norm)
# Estimate normalization from training data
norm_params = estimate_normalization_params(train_features; method = :zscore)
train_norm = apply_normalization(train_features, norm_params)

# Train with normalized features
train_data = BCIData(train_norm, metadata, labels)
model = train_model(NimbusProbit, train_data)

# Save params with model
@save "model.jld2" model norm_params

# Later: Apply same params to test data
test_norm = apply_normalization(test_features, norm_params)
See the Feature Normalization guide for complete details.
Feature Requirements
Bayesian Softmax expects preprocessed features , not raw EEG:
✅ Required preprocessing:
Bandpass filtering (8-30 Hz for motor imagery)
Artifact removal (ICA recommended)
Spatial filtering (CSP for motor imagery)
Feature extraction (log-variance for CSP features; see the sketch after this section)
❌ NOT accepted:
Raw EEG channels
Unfiltered data
Non-extracted features
See Preprocessing Requirements for details.
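As a rough sketch of the pipeline behind the "log-variance for CSP features" requirement above, the snippet below extracts such features with MNE directly. The parameters are assumptions, and the SDK's extract_csp_features helper shown earlier presumably wraps a similar chain.

import numpy as np
from mne.decoding import CSP

# Assumes `epochs` is an already band-passed (8-30 Hz), artifact-cleaned mne.Epochs object
X = epochs.get_data()                  # (n_trials, n_channels, n_times)
y = epochs.events[:, 2]                # class labels

csp = CSP(n_components=8, log=True)    # log-transformed band power ≈ log-variance
features = csp.fit_transform(X, y)     # (n_trials, 8), ready for NimbusSoftmax.fit

# The same idea written out by hand: project onto CSP spatial filters, then take log-variance
projected = np.einsum('kc,nct->nkt', csp.filters_[:8], X)
manual_features = np.log(projected.var(axis=2))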
Performance

| Operation | Latency | Notes |
|---|---|---|
| Training | 15-40 seconds | 50 iterations, 100 trials per class |
| Calibration | 8-20 seconds | 20 iterations, 20 trials per class |
| Batch Inference | 15-25ms per trial | 10 iterations |
| Streaming Chunk | 15-25ms | 10 iterations per chunk |
Slightly slower than NimbusLDA/NimbusQDA due to continuous transition complexity.
Classification Accuracy
| Paradigm | Classes | Typical Accuracy | When to Use NimbusSoftmax |
|---|---|---|---|
| Motor Imagery | 2-4 | 70-85% | When NimbusLDA/NimbusQDA insufficient |
| P300 | 2 (Target/Non-target) | 85-95% | Complex distributions |
| SSVEP | 2-6 | 85-98% | Advanced applications |
Bayesian Softmax typically provides comparable or slightly better accuracy than NimbusLDA/NimbusQDA for complex distributions, at the cost of ~5-10ms additional latency.
Model Inspection
View Model Parameters
import numpy as np

# Model coefficients
print("Regression coefficients (weights):")
print(f"  Shape: {clf.model_['coef'].shape}")
print(f"  Coefficients:\n{clf.model_['coef']}")

print("\nIntercepts:")
print(f"  Intercepts: {clf.model_['intercept']}")

# Model info
print("\nModel info:")
print(f"  Features: {clf.n_features_in_}")
print(f"  Classes: {clf.classes_}")
print(f"  Regularization: {clf.reg_strength}")

# Class probabilities (from training)
print("\nClass priors (estimated from training):")
for k, class_label in enumerate(clf.classes_):
    prior = np.sum(y_train == class_label) / len(y_train)
    print(f"  Class {class_label}: {prior:.3f}")
# Learned posteriors
println("B posterior (regression coefficients):")
println("  Type: ", typeof(model.B_posterior))
println("  Mean: ", mean(model.B_posterior))

println("\nW posterior (precision matrix):")
println("  Type: ", typeof(model.W_posterior))
println("  Mean: ", mean(model.W_posterior))

# Hyperparameters
println("\nHyperparameters:")
println("  N: ", model.N)
println("  ξβ dimensions: ", size(model.ξβ))
println("  Wβ dimensions: ", size(model.Wβ))
println("  W_df: ", model.W_df)
println("  W_scale dimensions: ", size(model.W_scale))

# Model metadata
println("\nMetadata:")
println("  Name: ", model.metadata.name)
println("  Paradigm: ", model.metadata.paradigm)
println("  Features: ", model.metadata.n_features)
println("  Classes: ", model.metadata.n_classes)
Compare Models
from sklearn.model_selection import cross_val_score
from nimbus_bci import NimbusSoftmax
import numpy as np

# Compare different regularization strengths
results = []
for reg in [0.1, 1.0, 10.0]:
    clf = NimbusSoftmax(reg_strength=reg)
    scores = cross_val_score(clf, X_train, y_train, cv=5)
    accuracy = np.mean(scores)
    print(f"Regularization: {reg} → Accuracy: {accuracy * 100:.1f}%")
    results.append((reg, clf, accuracy))
# Train multiple models and compare
models = []
for n_iter in [20, 50, 100]
    model = train_model(NimbusProbit, train_data; iterations = n_iter)
    results = predict_batch(model, test_data)
    accuracy = sum(results.predictions .== test_labels) / length(test_labels)
    println("Iterations: $n_iter, Accuracy: $(round(accuracy * 100, digits = 1))%")
    push!(models, (n_iter, model, accuracy))
end
Advantages & Limitations
Advantages
✅ Flexible Multinomial Classification : Natural handling of multinomial distributions
✅ Continuous Transition Mapping : Sophisticated feature-to-latent-space transformation
✅ Full Bayesian Uncertainty : Complete posterior distributions over predictions
✅ No Gaussian Assumption : More flexible than Gaussian classifiers (NimbusLDA/NimbusQDA)
✅ Production-Ready : Battle-tested in real BCI applications
✅ Calibration Support : Fast subject-specific adaptation
Limitations
❌ More Complex : More parameters than NimbusLDA, requires careful hyperparameter tuning
❌ Slower Inference : ~15-25ms vs ~10-15ms for NimbusLDA
❌ (K-1) Dimensional Space : Works in reduced dimension (not always intuitive)
❌ No Mahalanobis Distance : Discriminative model lacks explicit class centers for outlier detection
❌ Requires Multiple Trials : Cannot train on single trial (minimum 2 observations)
❌ Less Interpretable : Harder to interpret than generative models with explicit means
Comparison: Bayesian Softmax vs NimbusLDA vs NimbusQDA vs NimbusSTS
| Aspect | Bayesian Softmax | NimbusLDA | NimbusQDA | NimbusSTS |
|---|---|---|---|---|
| Mathematical Model | Bayesian Multinomial Logistic Regression | Pooled Gaussian Classifier (PGC) | Heteroscedastic Gaussian Classifier (HGC) | State-Space Model (EKF) |
| Representation | (K-1)-dimensional latent space | K class means with shared precision | K class means with class-specific precisions | Features + latent state dynamics |
| Temporal Modeling | Static (stateless) | Static (stateless) | Static (stateless) | Dynamic (stateful) |
| Training Speed | Moderate | Fastest | Moderate | Moderate |
| Inference Speed | 15-25ms | 10-15ms | 15-20ms | 20-30ms |
| Flexibility | Highest | Lowest | High | Adaptive |
| Best For | Complex multinomial tasks | Well-separated classes | Overlapping classes with different covariances | Non-stationary, long sessions |
| Interpretability | Low (discriminative) | High (generative with means) | High (generative with means) | Moderate (state evolution) |
| Mahalanobis Distance | ❌ No (no explicit means) | ✅ Yes | ✅ Yes | ❌ N/A |
| Entropy Metrics | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Free Energy | ✅ Yes (training only) | ✅ Yes | ✅ Yes | ✅ Yes |
Decision Tree
Is your data non-stationary (temporal drift)?
├─ Yes → Use NimbusSTS (handles temporal dynamics)
└─ No → Do you need Mahalanobis-based outlier detection?
    ├─ Yes → Use NimbusLDA or NimbusQDA
    └─ No → Are classes well-separated with Gaussian structure?
        ├─ Yes → Use NimbusLDA (fastest)
        └─ No → Do classes have different covariances?
            ├─ Yes → Use NimbusQDA
            └─ No → Are distributions complex/non-Gaussian?
                ├─ Yes → Use Bayesian Softmax
                └─ No → Use NimbusLDA (simplest)
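The same logic, written as a small helper for scripting model selection (the function and its boolean inputs are illustrative, not an SDK API):

def recommend_model(non_stationary, need_mahalanobis, well_separated_gaussian,
                    different_covariances, complex_non_gaussian):
    # Mirrors the decision tree above, evaluated top to bottom
    if non_stationary:
        return "NimbusSTS"
    if need_mahalanobis:
        return "NimbusLDA or NimbusQDA"
    if well_separated_gaussian:
        return "NimbusLDA"
    if different_covariances:
        return "NimbusQDA"
    if complex_non_gaussian:
        return "NimbusSoftmax"
    return "NimbusLDA"

print(recommend_model(False, False, False, False, True))   # NimbusSoftmax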
Practical Examples
Motor Imagery Classification
from nimbus_bci import NimbusSoftmax
from sklearn.metrics import classification_report
import numpy as np

# Train on motor imagery data (4-class MI)
clf = NimbusSoftmax(reg_strength=1.0)
clf.fit(X_train, y_train)  # CSP features, 4 classes

# Test
predictions = clf.predict(X_test)

# Calculate per-class accuracy
print(classification_report(y_test, predictions))

# Per-class breakdown
for class_id in clf.classes_:
    class_mask = y_test == class_id
    class_acc = np.mean(predictions[class_mask] == class_id)
    print(f"Class {class_id} accuracy: {class_acc * 100:.1f}%")
# Bayesian Softmax for complex motor imagery with overlapping classes
# Train on motor imagery data
mi_data = BCIData(csp_features, mi_metadata, labels)   # 4-class MI
model = train_model(NimbusProbit, mi_data; iterations = 50)

# Test
results = predict_batch(model, test_data)

# Calculate per-class accuracy
for class_id in 1:4
    class_mask = test_labels .== class_id
    class_accuracy = sum(results.predictions[class_mask] .== class_id) / sum(class_mask)
    println("Class $class_id accuracy: $(round(class_accuracy * 100, digits = 1))%")
end
P300 Detection
from nimbus_bci import NimbusSoftmax
from sklearn.metrics import confusion_matrix

# Train on P300 data (binary: 0=non-target, 1=target)
clf = NimbusSoftmax(reg_strength=1.0)
clf.fit(X_train, y_train)  # ERP features

# Test
predictions = clf.predict(X_test)

# Calculate sensitivity/specificity
cm = confusion_matrix(y_test, predictions)
tn, fp, fn, tp = cm.ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"Sensitivity: {sensitivity * 100:.1f}%")
print(f"Specificity: {specificity * 100:.1f}%")
# Bayesian Softmax for P300 target/non-target classification
# Train on P300 data
p300_data = BCIData(erp_features, p300_metadata, labels)   # 1=target, 2=non-target
model = train_model(NimbusProbit, p300_data; iterations = 50)

# Test
results = predict_batch(model, test_data)

# Calculate sensitivity/specificity
targets = test_labels .== 1
target_preds = results.predictions .== 1
sensitivity = sum(target_preds .& targets) / sum(targets)
specificity = sum(.!target_preds .& .!targets) / sum(.!targets)
println("Sensitivity: $(round(sensitivity * 100, digits = 1))%")
println("Specificity: $(round(specificity * 100, digits = 1))%")
Comparing with NimbusLDA/NimbusQDA
from nimbus_bci import NimbusLDA, NimbusQDA, NimbusSoftmax, NimbusSTS
import numpy as np

# Train all models
lda_clf = NimbusLDA()
lda_clf.fit(X_train, y_train)

gmm_clf = NimbusQDA()
gmm_clf.fit(X_train, y_train)

softmax_clf = NimbusSoftmax()
softmax_clf.fit(X_train, y_train)

sts_clf = NimbusSTS()
sts_clf.fit(X_train, y_train)

# Evaluate all models
models = [
    ("NimbusLDA", lda_clf),
    ("NimbusQDA", gmm_clf),
    ("NimbusSoftmax", softmax_clf),
    ("NimbusSTS", sts_clf)
]

for name, clf in models:
    predictions = clf.predict(X_test)
    probas = clf.predict_proba(X_test)
    accuracy = np.mean(predictions == y_test)
    mean_conf = np.mean(np.max(probas, axis=1))
    mean_entropy = -np.mean(np.sum(probas * np.log2(probas + 1e-10), axis=1))
    print(f"{name}:")
    print(f"  Accuracy: {accuracy * 100:.1f}%")
    print(f"  Mean confidence: {mean_conf:.3f}")
    print(f"  Mean entropy: {mean_entropy:.3f} bits")
    print()
# Compare Bayesian Softmax with other models
# Note: NimbusSTS (Bayesian STS) is currently Python-only

# Train all three Julia models
rxlda_model = train_model(NimbusLDA, train_data)
rxgmm_model = train_model(NimbusQDA, train_data)
rxpolya_model = train_model(NimbusProbit, train_data)

# Evaluate all three
models = [
    ("NimbusLDA", rxlda_model),
    ("NimbusQDA", rxgmm_model),
    ("Bayesian Softmax", rxpolya_model)
]

for (name, model) in models
    results = predict_batch(model, test_data)
    accuracy = sum(results.predictions .== test_labels) / length(test_labels)
    mean_conf = mean(results.confidences)
    mean_entropy = mean(results.entropy)
    println("$name:")
    println("  Accuracy: $(round(accuracy * 100, digits = 1))%")
    println("  Mean confidence: $(round(mean_conf, digits = 3))")
    println("  Mean entropy: $(round(mean_entropy, digits = 3)) bits")
    println("  Latency: $(results.latency_ms) ms")
    println()
end
Next Read
Bayesian LDA (NimbusLDA) Faster model with shared covariance
Bayesian QDA (NimbusQDA) Flexible model with class-specific covariances
Bayesian STS (NimbusSTS) Adaptive model for non-stationary data
Training Guide Complete training tutorial
Code Examples Working examples
Preprocessing Requirements Feature extraction guide
References
Theory:
Polson, N. G., Scott, J. G., & Windle, J. (2013). “Bayesian inference for logistic models using Pólya–Gamma latent variables”
Windle, J., Polson, N. G., & Scott, J. G. (2014). “Sampling Pólya–Gamma random variates: alternative and approximate techniques”
Bayesian multinomial logistic regression (softmax) with Pólya–Gamma augmentation and variational inference
BCI Applications:
Blankertz et al. (2008). “Optimizing spatial filters for robust EEG single-trial analysis”
Lotte et al. (2018). “A review of classification algorithms for EEG-based BCI”