Bayesian MPR (RxPolya)
Primary Name: Bayesian MPR (Bayesian Multinomial Probit Regression)
API Name: RxPolyaModel
Mathematical Model: Bayesian Multinomial Probit Regression

Bayesian MPR is a Bayesian multinomial classification model with uncertainty quantification, implemented using RxInfer.jl’s reactive message passing framework. It uses continuous transitions to map features into a (K-1)-dimensional latent space and a MultinomialPolya likelihood for probabilistic classification, making it ideal for complex multinomial tasks.
Bayesian MPR is currently implemented in NimbusSDK.jl (API name:
RxPolyaModel) and is ready for use in production BCI applications. This model provides full Bayesian uncertainty quantification over multinomial distributions, offering advanced capabilities beyond traditional Gaussian classifiers.

Overview
Bayesian MPR implements Bayesian Multinomial Probit Regression, providing a powerful alternative to traditional Gaussian classifiers:

- ✅ Full Bayesian inference with posterior probability distributions
- ✅ Uncertainty quantification for each prediction
- ✅ Continuous transition mapping from features to latent space
- ✅ MultinomialPolya likelihood for natural multinomial classification
- ✅ (K-1) dimensional latent representation with sum-to-one constraint
- ✅ Fast inference (~15-25ms per trial)
- ✅ Training and calibration support
- ✅ Batch and streaming inference modes
When to Use Bayesian MPR
Bayesian MPR is ideal for:

- Complex multinomial classification tasks
- When you need full posterior uncertainty over multinomial distributions
- Advanced BCI applications requiring flexible classification
- Tasks where Bayesian multinomial probit regression is theoretically appropriate
- When RxLDA/RxGMM accuracy is insufficient for complex distributions
Consider RxLDA/RxGMM instead when:

- You need faster inference (RxLDA/RxGMM are 5-10ms faster)
- Classes are well-separated with Gaussian distributions
- Interpretability of class centers is important (Bayesian MPR is discriminative)
- You want Mahalanobis distance-based outlier detection
Model Architecture
Mathematical Foundation (Bayesian Multinomial Probit Regression)
Bayesian MPR implements Bayesian Multinomial Probit Regression using continuous transitions and a MultinomialPolya likelihood.

Generative Model:

- B ∈ ℝ^((K-1)×D) = regression coefficient matrix
- W = precision matrix in the (K-1)-dimensional latent space
- xᵢ = feature vector for observation i
- ψᵢ = latent (K-1)-dimensional representation
- N = number of trials per observation (typically 1 for classification)
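Combining these definitions with the priors documented below, a plausible generative model reads as follows (a sketch; the exact factorization inside NimbusSDK.jl may differ):

```latex
\begin{aligned}
\beta &\sim \mathcal{N}\!\left(\xi_\beta,\; W_\beta^{-1}\right)
  && \text{vectorized coefficients, } B = \operatorname{reshape}\!\left(\beta,\, (K-1) \times D\right) \\
W &\sim \mathrm{Wishart}\!\left(W_{\mathrm{df}},\; W_{\mathrm{scale}}\right)
  && \text{precision of the latent space} \\
\psi_i \mid \beta, W &\sim \mathcal{N}\!\left(B x_i,\; W^{-1}\right)
  && \text{continuous transition to the latent space} \\
y_i \mid \psi_i &\sim \mathrm{MultinomialPolya}\!\left(N,\; \psi_i\right)
  && \text{likelihood over } K \text{ categories}
\end{aligned}
```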
Hyperparameters
Bayesian MPR supports configurable hyperparameters for performance tuning.

Available Hyperparameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| N | Int | 1 | Number of trials per observation (typically 1 for classification) |
| ξβ | Vector | ones((K-1)×D) | Prior mean for regression coefficients B |
| Wβ | Matrix | 1e-5 × I | Prior precision for regression coefficients B |
| W_df | Float64 | K + 5 | Wishart degrees of freedom for precision matrix W |
| W_scale | Matrix | I | Wishart scale matrix ((K-1) × (K-1)) |
- N: Number of trials per observation
  - Typically set to 1 for classification tasks
  - Higher values model count data (less common in BCI)
- ξβ: Prior mean for regression coefficients
  - Default: ones((K-1)×D) provides mild regularization
  - Can be customized if you have prior knowledge of coefficient values
- Wβ: Prior precision for regression coefficients
  - Lower values (1e-6) → weaker prior, more data-driven
  - Higher values (1e-4) → stronger prior, more regularization
  - Default (1e-5) provides balanced regularization
- W_df: Wishart degrees of freedom
  - Controls the strength of the precision matrix prior
  - Higher values → stronger regularization
  - Default (K + 5) provides reasonable regularization
- W_scale: Wishart scale matrix
  - Sets the shape of the prior precision distribution
  - Default (identity matrix) assumes no prior covariance structure
Hyperparameter configuration allows you to optimize model behavior for your specific dataset characteristics (SNR, trial count, data quality).
Model Structure
RxInfer Implementation
The RxPolya model uses RxInfer.jl for variational Bayesian inference.

Learning Phase:
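A minimal sketch of what the learning-phase model could look like in RxInfer.jl. The nodes used (`ContinuousTransition`, `MultinomialPolya`, `CTMeta`) exist in RxInfer, but the overall structure here is an assumption, not the SDK's actual source:

```julia
using RxInfer

# Sketch of the assumed factor graph (not the SDK's actual source).
@model function rxpolya(y, x, K, D, N, ξβ, Wβ, W_df, W_scale)
    # Vectorized regression coefficients β (reshaped to B ∈ ℝ^((K-1)×D))
    β ~ MvNormalMeanPrecision(ξβ, Wβ)
    # Precision matrix of the (K-1)-dimensional latent space
    W ~ Wishart(W_df, W_scale)
    for i in eachindex(y)
        # ψᵢ ≈ B xᵢ under precision W; CTMeta reshapes β into the matrix B
        ψ[i] ~ ContinuousTransition(x[i], β, W) where {
            meta = CTMeta(b -> reshape(b, K - 1, D))
        }
        # MultinomialPolya likelihood over K categories with N trials
        y[i] ~ MultinomialPolya(N, ψ[i])
    end
end
```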
Usage

1. Load Pre-trained Model
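A minimal loading sketch; `load_model`, `predict`, the model identifier, and the result fields are hypothetical, not confirmed NimbusSDK.jl API:

```julia
using NimbusSDK

# Hypothetical call — check the Julia SDK reference for the exact API.
model = load_model(RxPolyaModel, "rxpolya-motor-imagery-4class")

# Classify one preprocessed feature vector (hypothetical `predict`).
result = predict(model, features)
println(result.class, "  ", result.probabilities)
```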
2. Train Custom Model
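A training sketch; `train_model` is a hypothetical name, while the keyword arguments mirror the documented parameters below:

```julia
using NimbusSDK, LinearAlgebra

K, D = 4, 8   # classes and feature dimension (example values)

# X_train / y_train are placeholder training features and labels.
# Hypothetical `train_model`; keywords follow the documented defaults.
model = train_model(RxPolyaModel, X_train, y_train;
    iterations   = 50,
    showprogress = true,
    name         = "my-rxpolya",
    description  = "4-class motor imagery, CSP log-variance features",
    N            = 1,
    ξβ           = ones((K - 1) * D),
    Wβ           = 1e-5 * I((K - 1) * D),
    W_df         = Float64(K + 5),
    W_scale      = Matrix{Float64}(I, K - 1, K - 1),
)
```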
Training parameters:

- iterations: Number of variational inference iterations (default: 50)
  - More iterations = better convergence but slower training
  - 50-100 is typically sufficient
- showprogress: Display progress bar during training
- name: Model identifier
- description: Model description for documentation
- N: Trials per observation (default: 1, range: [1, ∞))
- ξβ: Prior mean for B (default: ones((K-1)×D))
- Wβ: Prior precision for B (default: 1e-5 × I)
- W_df: Wishart degrees of freedom (default: K + 5)
- W_scale: Wishart scale matrix (default: I)
3. Subject-Specific Calibration
Fine-tune a pre-trained model with subject-specific data, much faster than training from scratch (see the sketch after this list):

- Requires only 10-20 trials per class (vs 50-100 for training from scratch)
- Faster: 20 iterations vs 50-100
- Better generalization: Uses pre-trained model as prior
- Typical accuracy improvement: 5-15% over generic model
- Hyperparameters preserved: calibrate_model() automatically uses the same hyperparameters as the base model
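A calibration sketch; only the name `calibrate_model` is confirmed by this page, and the argument order is an assumption:

```julia
# 10-20 trials per class suffice; hyperparameters are inherited
# from the base model automatically.
calibrated = calibrate_model(model, X_calib, y_calib; iterations = 20)
```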
4. Batch Inference
Process multiple trials efficiently:
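A batch sketch; `predict_batch` and the result fields are hypothetical names:

```julia
# Hypothetical batch API: one result per trial, ~15-25 ms each.
results = predict_batch(model, X_test)
for r in results
    println(r.class, "  ", round.(r.probabilities, digits = 3))
end
```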
5. Streaming Inference

Real-time chunk-by-chunk processing:
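A streaming sketch; `init_stream` and `process_chunk` are hypothetical names:

```julia
# Hypothetical streaming API: feed preprocessed feature chunks as they arrive.
stream = init_stream(model)
for chunk in feature_chunks
    r = process_chunk(stream, chunk)   # ~15-25 ms per chunk
    r === nothing && continue          # not enough data accumulated yet
    println(r.class, "  ", r.probabilities)
end
```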
Hyperparameter Tuning

Fine-tune Bayesian MPR for optimal performance on your specific dataset.

When to Tune Hyperparameters
Consider tuning when:

- Default performance is unsatisfactory
- You have specific data characteristics (very noisy or very clean)
- You have limited or extensive training data
- You want to optimize for your specific paradigm
Tuning Strategies
For High SNR / Clean Data / Many Trials
Use weaker priors to let the data drive the model (see the sketch after this list). Recommended when:

- SNR > 5 dB
- 100+ trials per class
- Clean, artifact-free data
- Well-controlled experimental conditions
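A weak-prior configuration sketch, assuming the hypothetical `train_model` from above:

```julia
using LinearAlgebra

# Weaker priors: let plentiful, clean data drive the posterior.
model = train_model(RxPolyaModel, X_train, y_train;
    Wβ   = 1e-6 * I((K - 1) * D),   # weaker coefficient prior
    W_df = Float64(K + 2),          # lighter precision regularization
)
```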
For Low SNR / Noisy Data / Few Trials
Use stronger priors for stability (see the sketch after this list). Recommended when:

- SNR < 2 dB
- 40-80 trials per class
- Noisy data or limited artifact removal
- Challenging recording conditions
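A strong-prior configuration sketch, again assuming the hypothetical `train_model`:

```julia
using LinearAlgebra

# Stronger priors: stabilize estimates under noise and scarce trials.
model = train_model(RxPolyaModel, X_train, y_train;
    Wβ   = 1e-4 * I((K - 1) * D),   # stronger coefficient prior
    W_df = Float64(K + 10),         # heavier precision regularization
)
```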
Balanced / Default Settings
The defaults work well for most scenarios:

- Moderate SNR (2-5 dB)
- 80-150 trials per class
- Standard BCI recording conditions
- Starting point for experimentation
Hyperparameter Search Example
Systematically search for optimal hyperparameters:
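A grid-search sketch over the two most influential hyperparameters; `train_model` and `accuracy` (a validation-set scorer) are hypothetical helpers:

```julia
using LinearAlgebra

function tune(X_train, y_train, X_val, y_val, K, D)
    best = (acc = -Inf, Wβ_scale = NaN, df_offset = 0)
    for Wβ_scale in (1e-6, 1e-5, 1e-4), df_offset in (2, 5, 10)
        m   = train_model(RxPolyaModel, X_train, y_train;
                          Wβ   = Wβ_scale * I((K - 1) * D),
                          W_df = Float64(K + df_offset))
        acc = accuracy(m, X_val, y_val)   # hypothetical validation scorer
        acc > best.acc && (best = (; acc, Wβ_scale, df_offset))
    end
    return best
end

best = tune(X_train, y_train, X_val, y_val, 4, 8)
@info "Best configuration" best
```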
Quick Tuning Guidelines

| Scenario | Wβ scale | W_df offset | Notes |
|---|---|---|---|
| Excellent data quality | 1e-6 | 2 | Minimal regularization |
| Good data quality | 1e-5 (default) | 5 (default) | Balanced approach |
| Moderate data quality | 1e-5 to 1e-4 | 5-8 | Slight regularization |
| Poor data quality | 1e-4 | 10 | Strong regularization |
| Very limited trials | 1e-4 | 15 | Maximum regularization |
Pro Tip: Start with defaults (Wβ = 1e-5 × I, W_df = K + 5) and only tune if performance is unsatisfactory. The defaults are optimized for typical BCI scenarios.
Training Requirements
Data Requirements
- Minimum: 40 trials per class (160 total for 4-class)
- Recommended: 80+ trials per class (320+ total for 4-class)
- For calibration: 10-20 trials per class sufficient
RxPolya requires at least 2 observations to estimate regression coefficients and the precision matrix. Single-trial training will raise an error.
Feature Requirements
Bayesian MPR expects preprocessed features, not raw EEG.

✅ Required preprocessing:

- Bandpass filtering (8-30 Hz for motor imagery)
- Artifact removal (ICA recommended)
- Spatial filtering (CSP for motor imagery)
- Feature extraction (log-variance for CSP features)
❌ Not accepted as input:

- Raw EEG channels
- Unfiltered data
- Non-extracted features
Performance Characteristics
Computational Performance
| Operation | Latency | Notes |
|---|---|---|
| Training | 15-40 seconds | 50 iterations, 100 trials per class |
| Calibration | 8-20 seconds | 20 iterations, 20 trials per class |
| Batch Inference | 15-25ms per trial | 10 iterations |
| Streaming Chunk | 15-25ms | 10 iterations per chunk |
Classification Accuracy
| Paradigm | Classes | Typical Accuracy | When to Use RxPolya |
|---|---|---|---|
| Motor Imagery | 2-4 | 70-85% | When RxLDA/RxGMM insufficient |
| P300 | 2 (Target/Non-target) | 85-95% | Complex distributions |
| SSVEP | 2-6 | 85-98% | Advanced applications |
Bayesian MPR typically provides comparable or slightly better accuracy than RxLDA/RxGMM for complex distributions, at the cost of ~5-10ms additional latency.
Model Inspection
View Model Parameters
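An inspection sketch; the accessor names are hypothetical:

```julia
# Hypothetical accessors for the fitted posteriors.
B̂ = coefficient_mean(model)   # (K-1)×D posterior mean of B
Ŵ = precision_mean(model)     # (K-1)×(K-1) posterior mean of W
display(B̂)
display(Ŵ)
```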
Compare Models
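A quick comparison sketch; `free_energy` is a hypothetical accessor (the comparison table below notes that free energy is available for all three models during training):

```julia
# Hypothetical `free_energy` accessor; lower is better.
for (name, m) in ("RxPolya" => polya, "RxLDA" => lda, "RxGMM" => gmm)
    println(rpad(name, 10), free_energy(m))
end
```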
Advantages & Limitations
Advantages
✅ Flexible Multinomial Classification: Natural handling of multinomial distributions
✅ Continuous Transition Mapping: Sophisticated feature-to-latent-space transformation
✅ Full Bayesian Uncertainty: Complete posterior distributions over predictions
✅ No Gaussian Assumption: More flexible than Gaussian classifiers (RxLDA/RxGMM)
✅ Production-Ready: Battle-tested in real BCI applications
✅ Calibration Support: Fast subject-specific adaptation
Limitations
❌ More Complex: More parameters than RxLDA; requires careful hyperparameter tuning
❌ Slower Inference: ~15-25ms vs ~10-15ms for RxLDA
❌ (K-1) Dimensional Space: Works in reduced dimension (not always intuitive)
❌ No Mahalanobis Distance: Discriminative model lacks explicit class centers for outlier detection
❌ Requires Multiple Trials: Cannot train on single trial (minimum 2 observations)
❌ Less Interpretable: Harder to interpret than generative models with explicit means
Comparison: Bayesian MPR vs RxLDA vs RxGMM
| Aspect | Bayesian MPR | RxLDA | RxGMM |
|---|---|---|---|
| Mathematical Model | Bayesian Multinomial Probit Regression | Pooled Gaussian Classifier (PGC) | Heteroscedastic Gaussian Classifier (HGC) |
| Representation | (K-1) dimensional latent space | K class means with shared precision | K class means with class-specific precisions |
| Training Speed | Moderate | Fastest | Moderate |
| Inference Speed | 15-25ms | 10-15ms | 15-20ms |
| Flexibility | Highest | Lowest | High |
| Best For | Complex multinomial tasks | Well-separated classes | Overlapping classes with different covariances |
| Interpretability | Low (discriminative) | High (generative with means) | High (generative with means) |
| Mahalanobis Distance | ❌ No (no explicit means) | ✅ Yes | ✅ Yes |
| Entropy Metrics | ✅ Yes | ✅ Yes | ✅ Yes |
| Free Energy | ✅ Yes (training only) | ✅ Yes | ✅ Yes |
Decision Tree
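A compact decision summary, reconstructed from the comparison table above:

```
Need fastest inference, well-separated Gaussian classes?   → RxLDA
Overlapping classes with class-specific covariances?       → RxGMM
Complex multinomial structure, RxLDA/RxGMM accuracy
insufficient, latency budget of ~25 ms per trial?          → Bayesian MPR (RxPolya)
```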
Practical Examples
Motor Imagery Classification
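A hypothetical motor imagery sketch, using the same assumed helpers as earlier sections (CSP log-variance features, 4 classes):

```julia
# Train and classify (hypothetical `train_model` / `predict`).
model  = train_model(RxPolyaModel, X_train, y_train; name = "mi-4class")
result = predict(model, X_test[1, :])
println("Predicted class: ", result.class)
println("Posterior:       ", round.(result.probabilities, digits = 3))
```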
P300 Detection
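A hypothetical P300 sketch; with K = 2 the latent space is one-dimensional (K - 1 = 1):

```julia
# Binary target/non-target detection (hypothetical helpers).
p300 = train_model(RxPolyaModel, X_train, y_train; name = "p300-detector")
r    = predict(p300, epoch_features)
r.probabilities[1] > 0.9 && println("Target detected with high confidence")
```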
Comparing with RxLDA/RxGMM
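A hypothetical comparison sketch reporting accuracy and mean predictive entropy; `predict_batch` is the assumed batch API from above, and all three models expose entropy metrics per the comparison table:

```julia
using Statistics: mean
using StatsBase: entropy

for (name, m) in ("RxPolya" => polya, "RxLDA" => lda, "RxGMM" => gmm)
    rs    = predict_batch(m, X_test)
    acc   = mean(getfield.(rs, :class) .== y_test)
    Hmean = mean(entropy(r.probabilities) for r in rs)
    println(rpad(name, 10), "acc = ", round(acc, digits = 3),
            "   mean entropy = ", round(Hmean, digits = 3))
end
```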
Next Steps
- Bayesian LDA (RxLDA): faster model with shared covariance
- Bayesian GMM (RxGMM): flexible model with class-specific covariances
- Training Guide: complete training tutorial
- Julia SDK: full SDK reference
- Code Examples: working examples
- Preprocessing Requirements: feature extraction guide
References

Implementation:

- RxInfer.jl: https://rxinfer.com/
- Source code: /src/models/rxpolya/ in NimbusSDK.jl

Theory:

- Albert, J. H., & Chib, S. (1993). “Bayesian analysis of binary and polychotomous response data”
- Imai, K., & van Dyk, D. A. (2005). “A Bayesian analysis of the multinomial probit model” (multinomial probit regression with Bayesian inference)

BCI Background:

- Blankertz et al. (2008). “Optimizing spatial filters for robust EEG single-trial analysis”
- Lotte et al. (2018). “A review of classification algorithms for EEG-based BCI”