
EEG Preprocessing Requirements

NimbusSDK expects preprocessed features, NOT raw EEG. The SDK performs Bayesian inference on feature-space data, so you must preprocess your raw EEG signals before passing them to NimbusSDK.

What NimbusSDK Does

✅ Bayesian inference with Bayesian LDA (RxLDA) and Bayesian GMM (RxGMM) models
✅ Real-time and batch classification
✅ Confidence scoring and quality assessment
✅ Performance metrics (ITR, accuracy)

What NimbusSDK Does NOT Do

❌ Raw EEG filtering (bandpass, notch)
❌ Artifact removal (ICA, regression, ASR)
❌ Spatial filtering (CSP, ICA, Laplacian)
❌ Feature extraction (bandpower, CSP, ERP)

Why This Separation?

  1. Preprocessing is paradigm-specific: Motor Imagery typically uses CSP on 8-30 Hz data, while P300 uses ERP features from 0.5-10 Hz data
  2. Hardware-dependent: Different amplifiers require different artifact handling
  3. Domain expertise: Use established tools (MNE-Python, EEGLAB, etc.)
  4. Flexibility: You can use any preprocessing pipeline (Python, MATLAB, custom)
  5. Focus: NimbusSDK excels at Bayesian inference, not signal processing
For integration examples with preprocessing tools, see Real-time Setup.

Required Preprocessing Pipeline

Step 1: Bandpass Filtering

Remove frequencies outside the band of interest:
| BCI Paradigm | Frequency Band | Rationale |
|---|---|---|
| Motor Imagery | 8-30 Hz | Mu rhythm (8-13 Hz) and beta rhythm (13-30 Hz) |
| P300 | 0.5-10 Hz | Slow event-related potentials |
| SSVEP | Target ± 2 Hz | Narrow band around the stimulation frequency |
Tools: MNE-Python raw.filter(), EEGLAB pop_eegfiltnew()
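If you are not using MNE or EEGLAB, the same step can be done directly with SciPy. The sketch below assumes a zero-phase Butterworth design (4th order, applied forward and backward with sosfiltfilt); the helper names bandpass and ssvep_band and the filter order are illustrative choices, not part of NimbusSDK.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Example bands from the table above; SSVEP bands depend on the stimulus frequency
PARADIGM_BANDS = {"motor_imagery": (8.0, 30.0), "p300": (0.5, 10.0)}


def ssvep_band(target_hz, half_width=2.0):
    """Narrow band around an SSVEP stimulation frequency (target ± 2 Hz by default)."""
    return (target_hz - half_width, target_hz + half_width)


def bandpass(data, fs, low, high, order=4):
    """Zero-phase Butterworth bandpass along the last (time) axis.

    data: array of shape (..., n_times), e.g. (n_channels, n_times).
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)


# Usage on synthetic data: 8 channels, 10 s at 250 Hz
fs = 250.0
t = np.arange(int(10 * fs)) / fs
in_band = np.sin(2 * np.pi * 10 * t)      # 10 Hz: inside the MI band
out_of_band = np.sin(2 * np.pi * 2 * t)   # 2 Hz: below the MI band
raw = np.tile(in_band + out_of_band, (8, 1))
filtered = bandpass(raw, fs, *PARADIGM_BANDS["motor_imagery"])
```

Filtering forward and backward cancels the phase shift, which keeps ERP latencies intact; the price is that it is non-causal, so an online pipeline would need a causal filter instead.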

Step 2: Artifact Removal

Remove physiological and environmental artifacts:
| Artifact Type | Frequency Range | Removal Method |
|---|---|---|
| Eye blinks | 0-4 Hz, high amplitude | ICA (remove components) |
| Muscle | >30 Hz, high amplitude | ICA or thresholding |
| Line noise | 50/60 Hz | Notch filter |
| Movement | 0-10 Hz, transient | Regression or rejection |
Recommended: ICA (Independent Component Analysis), the most versatile of these methods.
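Two of the simpler methods in the table, the line-noise notch filter and amplitude-based rejection, can be sketched with NumPy and SciPy alone. This is a minimal illustration, assuming a notch quality factor of 30 and a peak-to-peak threshold you choose for your amplifier; the function names are hypothetical, and ICA itself is better left to MNE-Python or EEGLAB.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch


def notch(data, fs, line_hz=50.0, quality=30.0):
    """Zero-phase IIR notch at the mains frequency (50 or 60 Hz) along the last axis."""
    b, a = iirnotch(line_hz, quality, fs=fs)
    return filtfilt(b, a, data, axis=-1)


def reject_by_amplitude(epochs, threshold):
    """Drop epochs whose peak-to-peak amplitude exceeds `threshold` on any channel.

    epochs: (n_trials, n_channels, n_samples); threshold is in the data's units
    (e.g. 100e-6 for 100 µV when the data are in volts).
    Returns the kept epochs and a boolean mask over trials.
    """
    ptp = epochs.max(axis=-1) - epochs.min(axis=-1)  # (n_trials, n_channels)
    keep = np.all(ptp <= threshold, axis=1)
    return epochs[keep], keep


# Usage: remove 50 Hz from a 10 Hz signal, then reject a trial with a large transient
fs = 250.0
t = np.arange(int(10 * fs)) / fs
signal = np.sin(2 * np.pi * 10 * t)
cleaned = notch(signal + 0.5 * np.sin(2 * np.pi * 50 * t), fs, line_hz=50.0)

rng = np.random.default_rng(0)
epochs = 0.01 * rng.standard_normal((3, 4, 250))
epochs[1, 2, 100] += 1.0  # simulated blink-like spike on trial 1
kept, keep_mask = reject_by_amplitude(epochs, threshold=0.5)
```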

Step 3: Epoching

Segment continuous EEG into trials aligned to events:
| Paradigm | Time Window | Baseline |
|---|---|---|
| Motor Imagery | 0 to 4 seconds post-cue | -0.5 to 0 seconds |
| P300 | -0.2 to 0.8 seconds post-stimulus | -0.2 to 0 seconds |
| SSVEP | 2 to 4 seconds post-onset | No baseline (steady-state) |
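The windows in the table translate into sample offsets around each event. The sketch below is one way to do this in plain NumPy, assuming event onsets are already available as sample indices; the epoch function is hypothetical (MNE's mne.Epochs does the same job with more options). It handles the Motor Imagery case, where the baseline lies before the epoch window, by taking the baseline from the continuous signal.

```python
import numpy as np


def epoch(continuous, events, fs, tmin, tmax, baseline=None):
    """Cut continuous EEG into trials aligned to event onsets.

    continuous: (n_channels, n_times); events: onset sample indices.
    tmin, tmax: window in seconds relative to each event (tmax exclusive).
    baseline: (start, end) in seconds relative to the event, or None (steady-state SSVEP).
      The baseline may lie outside the epoch window, as in the Motor Imagery row.
    Returns (trials, kept_events); trials is (n_trials, n_channels, n_samples).
    """
    n_times = continuous.shape[1]
    w0, w1 = int(round(tmin * fs)), int(round(tmax * fs))
    if baseline is not None:
        b0, b1 = int(round(baseline[0] * fs)), int(round(baseline[1] * fs))
    else:
        b0, b1 = w0, w1  # no extra samples needed
    trials, kept = [], []
    for e in events:
        # Skip events whose window or baseline would run past either end of the recording
        if e + min(w0, b0) < 0 or e + max(w1, b1) > n_times:
            continue
        seg = continuous[:, e + w0 : e + w1]
        if baseline is not None:
            # Subtract the per-channel baseline mean
            seg = seg - continuous[:, e + b0 : e + b1].mean(axis=1, keepdims=True)
        trials.append(seg)
        kept.append(e)
    return np.stack(trials), np.array(kept)


# Usage on synthetic data with a DC offset: 4 channels, 20 s at 250 Hz
fs = 250.0
rng = np.random.default_rng(0)
continuous = rng.standard_normal((4, 5000)) + 5.0
events = [100, 1000, 2000, 4900]

# P300 row: -0.2 to 0.8 s, baseline -0.2 to 0 s
p300_trials, p300_events = epoch(continuous, events, fs, -0.2, 0.8, baseline=(-0.2, 0.0))
# Motor Imagery row: 0 to 4 s, baseline -0.5 to 0 s (before the epoch window)
mi_trials, mi_events = epoch(continuous, events, fs, 0.0, 4.0, baseline=(-0.5, 0.0))
```

Events too close to the start or end of the recording are dropped rather than padded, so always keep the returned event list aligned with your labels.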

Step 4: Feature Extraction

Convert filtered epochs to discriminative features (see below).

Feature Types

CSP (Common Spatial Patterns) - Motor Imagery

Recommended for Motor Imagery. CSP maximizes the variance ratio between two classes, making it well suited to motor imagery BCI.
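# MNE-Python example
# X_train, X: (n_trials, n_channels, n_samples) bandpass-filtered epochs; y_train: class labels
import numpy as np
from mne.decoding import CSP

# transform_into='csp_space' keeps each component's time course; MNE requires log=None here
csp = CSP(n_components=8, reg=None, log=None, transform_into='csp_space')
csp.fit(X_train, y_train)
X_csp = csp.transform(X)  # (n_trials, n_components, n_samples)

# Extract log-variance (standard temporal aggregation for MI)
logvar_features = np.log(np.var(X_csp, axis=2))  # (n_trials, n_components)

# Transpose the CSP time series for Julia: (n_features, n_samples, n_trials)
X_julia = np.transpose(X_csp, (1, 2, 0))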
Output dimension: n_components log-variance features per trial (e.g., 8 for 8 components)
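To see what the MNE call computes, here is a minimal two-class CSP written from scratch with NumPy and SciPy. It follows the textbook formulation (trace-normalized class covariances, generalized eigenproblem, filters taken from both ends of the eigenvalue spectrum) and is a sketch, not MNE's implementation, which adds regularization and multiclass options. The names csp_filters and log_variance_features are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def mean_cov(trials):
    """Trace-normalized spatial covariance averaged over trials (n_channels, n_channels)."""
    covs = []
    for trial in trials:
        c = trial @ trial.T
        covs.append(c / np.trace(c))
    return np.mean(covs, axis=0)


def csp_filters(X, y, n_components=4):
    """Two-class CSP: solve C_a w = λ (C_a + C_b) w.

    X: (n_trials, n_channels, n_samples) bandpass-filtered epochs; y: two class labels.
    Returns W (n_components, n_channels), taking filters from both ends of the spectrum:
    large λ = high variance for class a, small λ = high variance for class b.
    """
    classes = np.unique(y)
    assert len(classes) == 2, "this sketch handles two classes only"
    c_a = mean_cov(X[y == classes[0]])
    c_b = mean_cov(X[y == classes[1]])
    eigvals, eigvecs = eigh(c_a, c_a + c_b)
    order = np.argsort(eigvals)[::-1]  # descending λ
    half = n_components // 2
    picks = np.r_[order[:half], order[-(n_components - half):]]
    return eigvecs[:, picks].T


def log_variance_features(X, W):
    """Project epochs through the filters, then take log-variance over time."""
    X_csp = np.einsum("kc,tcs->tks", W, X)  # (n_trials, n_components, n_samples)
    return np.log(np.var(X_csp, axis=2))    # (n_trials, n_components)


# Usage on synthetic data: class 0 is strong on channel 0, class 1 on channel 1
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 4, 250))
y = np.repeat([0, 1], 40)
X[y == 0, 0] *= 3.0
X[y == 1, 1] *= 3.0
W = csp_filters(X, y, n_components=2)
feats = log_variance_features(X, W)
```

Each filter yields one log-variance feature, which is why the feature count equals the number of CSP components.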

Bandpower Features - SSVEP

Compute power in specific frequency bands:
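import numpy as np
from scipy.signal import welch

def compute_bandpower(epochs, fs=250, bands=((8, 13), (13, 30))):
    # epochs: (n_channels, n_samples) or (n_trials, n_channels, n_samples); PSD is taken along time
    # For SSVEP, pass narrow bands around each stimulation frequency instead of these example bands
    freqs, psd = welch(epochs, fs=fs, nperseg=min(256, epochs.shape[-1]))  # compute once, reuse per band
    powers = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs <= high)
        powers.append(psd[..., mask].mean(axis=-1))  # mean PSD in band, per channel
    return np.stack(powers, axis=-1)  # (..., n_channels, n_bands)
Output dimension: n_channels × n_bands per epoch (n_trials × n_channels × n_bands for 3-D input)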
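For SSVEP specifically, the bands should sit around the stimulation frequencies rather than the mu/beta example bands. The sketch below scores each candidate frequency by mean PSD in a ±1 Hz window and picks the strongest one; the ±1 Hz width (narrower than the ±2 Hz filter band, so that closely spaced targets do not overlap), the 2-second Welch segment, and the function names are assumptions for illustration.

```python
import numpy as np
from scipy.signal import welch


def ssvep_band_scores(trial, fs, stim_freqs, half_width=1.0):
    """Mean PSD in a narrow band around each stimulation frequency.

    trial: (n_channels, n_samples). Returns (n_channels, n_freqs).
    """
    freqs, psd = welch(trial, fs=fs, nperseg=min(trial.shape[-1], int(2 * fs)))
    cols = []
    for f in stim_freqs:
        mask = (freqs >= f - half_width) & (freqs <= f + half_width)
        cols.append(psd[..., mask].mean(axis=-1))
    return np.stack(cols, axis=-1)


def predict_ssvep(trial, fs, stim_freqs):
    """Pick the stimulation frequency with the highest channel-averaged band power."""
    scores = ssvep_band_scores(trial, fs, stim_freqs).mean(axis=0)
    return stim_freqs[int(np.argmax(scores))]


# Usage: one 4 s epoch at 250 Hz with a 12 Hz response on 8 noisy channels
fs = 250.0
t = np.arange(1000) / fs
rng = np.random.default_rng(0)
trial = np.sin(2 * np.pi * 12.0 * t) + 0.5 * rng.standard_normal((8, 1000))
stim_freqs = [8.0, 10.0, 12.0, 15.0]
predicted = predict_ssvep(trial, fs, stim_freqs)
```

When passing these band powers to NimbusSDK instead of using the argmax rule, the per-channel scores become the feature vector.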

ERP Amplitude - P300

Extract amplitude at specific time windows:
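# Extract mean amplitude in the P300 window (300-500 ms post-stimulus)
# epochs: an mne.Epochs object; sample 0 is tmin (e.g. -0.2 s), not stimulus onset,
# so select the window by time rather than by a raw sample offset
times = epochs.times  # seconds relative to the stimulus
p300_mask = (times >= 0.3) & (times <= 0.5)

# Mean amplitude per channel in the P300 window: (n_trials, n_channels)
p300_amplitude = epochs.get_data()[:, :, p300_mask].mean(axis=2)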

Paradigm-Specific Guidelines

Motor Imagery

Recommended pipeline:
  1. Bandpass: 8-30 Hz (mu + beta)
  2. Artifact removal: ICA (remove eye blinks)
  3. Epoching: 0-4 seconds post-cue
  4. Feature extraction: CSP (8 components → 8 features)
  5. Temporal aggregation: Log-variance
Data requirements:
  • Minimum: 40 trials per class
  • Recommended: 80+ trials per class
Expected accuracy: 70-90% (subject-dependent)
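The trial-count requirements are easy to check before training. A minimal sketch, assuming labels are available as a flat sequence; the function name is hypothetical and the thresholds default to the Motor Imagery figures above.

```python
from collections import Counter


def check_trial_counts(labels, minimum=40, recommended=80):
    """Report per-class trial counts against minimum/recommended thresholds."""
    report = {}
    for cls, n in sorted(Counter(labels).items()):
        if n < minimum:
            status = "below minimum"
        elif n < recommended:
            status = "meets minimum"
        else:
            status = "meets recommended"
        report[cls] = (n, status)
    return report


# Usage: three classes with uneven trial counts
labels = [1] * 85 + [2] * 50 + [3] * 30
report = check_trial_counts(labels)
# {1: (85, 'meets recommended'), 2: (50, 'meets minimum'), 3: (30, 'below minimum')}
```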

P300

Recommended pipeline:
  1. Bandpass: 0.5-10 Hz
  2. Artifact removal: ICA or rejection
  3. Epoching: -0.2 to 0.8 seconds post-stimulus
  4. Feature extraction: ERP amplitude (300-500ms window)
  5. Temporal aggregation: Mean
Data requirements:
  • Minimum: 200 target, 1000 non-target trials
  • Recommended: 400 target, 2000 non-target trials
Expected accuracy: 80-95% (with averaging across stimulus repetitions)
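Averaging feature vectors across repeated presentations of the same stimulus reduces uncorrelated noise by roughly the square root of the number of repetitions, which is what makes single-trial P300 usable. The sketch below assumes each trial is tagged with the ID of the flashed item; the function name, the synthetic data, and the choice to average the first n_reps repetitions are illustrative.

```python
import numpy as np


def average_repetitions(features, stimulus_ids, n_reps):
    """Average ERP feature vectors over the first `n_reps` repetitions of each stimulus.

    features: (n_trials, n_features), e.g. P300-window mean amplitude per channel.
    stimulus_ids: (n_trials,) identifier of the flashed item for each trial.
    Returns (unique_ids, averaged) with averaged of shape (n_unique, n_features).
    """
    ids = np.unique(stimulus_ids)
    averaged = np.stack([features[stimulus_ids == s][:n_reps].mean(axis=0) for s in ids])
    return ids, averaged


# Usage on synthetic data: 6 stimuli x 10 repetitions, 8 features each
rng = np.random.default_rng(0)
true_response = rng.standard_normal((6, 8))   # hypothetical noise-free feature per stimulus
stimulus_ids = np.tile(np.arange(6), 10)      # 60 trials: stimuli 0..5 repeated 10 times
features = true_response[stimulus_ids] + rng.standard_normal((60, 8))  # unit-variance noise
ids, averaged = average_repetitions(features, stimulus_ids, n_reps=10)

single_err = np.std(features - true_response[stimulus_ids])  # about 1
avg_err = np.std(averaged - true_response[ids])              # about 1 / sqrt(10)
```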

SSVEP

Recommended pipeline:
  1. Bandpass: Target frequency ± 2 Hz
  2. Artifact removal: Eye blink rejection
  3. Epoching: 2-4 seconds post-onset
  4. Feature extraction: CCA (canonical correlation analysis) or bandpower
  5. No baseline correction (steady-state)
Data requirements:
  • Minimum: 30 trials per frequency
  • Recommended: 60+ trials per frequency
Expected accuracy: 85-98%
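CCA is the usual alternative to bandpower for SSVEP: each epoch is correlated with sine/cosine references at every candidate frequency, and the largest canonical correlation serves as that frequency's score. The sketch below is a standard NumPy formulation (canonical correlations as singular values of the product of orthonormal bases), assuming two harmonics in the references; the function names are hypothetical.

```python
import numpy as np


def reference_signals(freq, fs, n_samples, n_harmonics=2):
    """Sine/cosine references at `freq` and its harmonics: (n_samples, 2 * n_harmonics)."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(cols)


def max_canonical_corr(X, Y):
    """Largest canonical correlation between X and Y (rows are samples)."""
    qx, _ = np.linalg.qr(X - X.mean(axis=0))
    qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(qx.T @ qy, compute_uv=False)[0]


def cca_classify(trial, fs, stim_freqs, n_harmonics=2):
    """trial: (n_channels, n_samples). Returns the best-matching frequency and all scores."""
    X = trial.T
    scores = [max_canonical_corr(X, reference_signals(f, fs, X.shape[0], n_harmonics))
              for f in stim_freqs]
    return stim_freqs[int(np.argmax(scores))], scores


# Usage: 4 s epoch at 250 Hz, 10 Hz response with a different phase on each channel
fs = 250.0
n_samples = 1000
t = np.arange(n_samples) / fs
rng = np.random.default_rng(0)
phases = rng.uniform(0, 2 * np.pi, size=(8, 1))
trial = np.sin(2 * np.pi * 10.0 * t + phases) + rng.standard_normal((8, n_samples))
stim_freqs = [8.0, 10.0, 12.0, 15.0]
predicted, scores = cca_classify(trial, fs, stim_freqs)
```

The CCA scores (one per stimulation frequency) can also be used as a compact feature vector instead of taking the argmax directly.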

Data Format Requirements

Expected Format

features::Array{Float64, 3}  # (n_features × n_samples × n_trials)
Where:
  • n_features: Number of extracted features (e.g., 16 for CSP)
  • n_samples: Samples per trial (e.g., 1000 for 4 seconds at 250 Hz)
  • n_trials: Number of trials

Converting from Python/NumPy

Python typically uses (n_trials, n_features, n_samples):
import numpy as np
from scipy.io import savemat

# Python shape: (n_trials, n_features, n_samples)
X_csp.shape  # (100, 8, 1000): 100 trials, 8 components, 1000 samples

# Convert for Julia: (n_features, n_samples, n_trials)
X_julia = np.transpose(X_csp, (1, 2, 0))  # (8, 1000, 100)

# Save; y is assumed to be a 0-indexed NumPy integer array
savemat('features.mat', {'features': X_julia, 'labels': y + 1})  # 1-indexed labels!
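Before loading the file in Julia, a quick round trip in Python can confirm the layout. This sketch uses synthetic stand-ins for X_csp and y and a temporary file path (both assumptions). Note that SciPy's savemat stores 1-D arrays as 1 × N rows by default (oned_as='row'), so the labels come back two-dimensional; pass oned_as='column' if you prefer N × 1.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Synthetic stand-ins: 20 trials, 8 CSP components, 250 samples; 0-indexed labels
rng = np.random.default_rng(0)
X_csp = rng.standard_normal((20, 8, 250))
y = rng.integers(0, 2, size=20)

X_julia = np.transpose(X_csp, (1, 2, 0))  # (8, 250, 20)
path = os.path.join(tempfile.mkdtemp(), "features.mat")
savemat(path, {"features": X_julia, "labels": y + 1})

loaded = loadmat(path)
features_back = loaded["features"]  # 3-D shape is preserved
labels_back = loaded["labels"]      # 1-D labels come back as a 1 x N row
```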

Labels

Labels must be 1-indexed integers:
labels = [1, 2, 3, 4, 1, 2, ...]  # ✅ Correct (Julia-style)
labels = [0, 1, 2, 3, 0, 1, ...]  # ❌ Wrong (Python-style)
Convert 0-indexed to 1-indexed:
y_julia = y + 1  # If y ∈ {0, 1, 2, 3}, convert to {1, 2, 3, 4}
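Adding 1 only works when labels are already 0..K-1 integers. For arbitrary labels (strings, {-1, 1}, or gaps in the numbering), a safer sketch maps sorted unique labels to 1..K with np.unique; the helper name is hypothetical.

```python
import numpy as np


def to_one_indexed(labels):
    """Map arbitrary class labels to contiguous 1-indexed integers 1..K.

    Returns (codes, classes) where classes[k - 1] is the original label for code k.
    """
    classes, inverse = np.unique(labels, return_inverse=True)
    return inverse + 1, classes


# Usage: string labels are sorted alphabetically before numbering
codes, classes = to_one_indexed(["left", "right", "left", "feet"])
# codes -> [2, 3, 2, 1], classes -> ['feet', 'left', 'right']
```

Keep the returned classes array so predictions can be mapped back to the original label names.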

Common Pitfalls

Pitfall 1: Using Raw EEG Instead of Features

Symptom: Accuracy near chance level (25% for a 4-class problem).
Fix: Apply feature extraction (CSP, bandpower, etc.).
# WRONG - Raw EEG channels
features = raw_eeg  # (32 channels × 1000 samples × 100 trials)

# CORRECT - CSP features
csp_features = extract_csp(raw_eeg)  # (16 features × 1000 samples × 100 trials)

Pitfall 2: Wrong Frequency Band

Symptom: Low confidence scores, poor separability
| Paradigm | Correct Band | Wrong Band | Result |
|---|---|---|---|
| Motor Imagery | 8-30 Hz | 0.5-10 Hz | Accuracy ↓ 20-30% |
| P300 | 0.5-10 Hz | 8-30 Hz | Amplitude ↓, noise ↑ |

Pitfall 3: Incorrect Data Shape

Symptom: DimensionMismatch error
# WRONG - Transposed dimensions
features = X_csp  # (trials × features × samples) from numpy

# CORRECT - Transpose to NimbusSDK format
features = permutedims(X_csp, (2, 3, 1))  # (features × samples × trials)

Validation Checklist

Before using NimbusSDK, verify:

Data Quality ✅

  • No NaN values: @assert !any(isnan, features)
  • No Inf values: @assert !any(isinf, features)
  • Reasonable range: No extreme outliers, and magnitudes match the feature type rather than raw EEG amplitudes
  • No constant features: Each feature varies

Preprocessing Steps ✅

  • Bandpass filtered: Paradigm-appropriate frequency band
  • Artifacts removed: ICA or equivalent applied
  • Epoched correctly: Proper time windows
  • Features extracted: CSP/bandpower/ERP, not raw EEG

Format Requirements ✅

  • Correct shape: (n_features × n_samples × n_trials)
  • Correct type: Float64 (or convertible)
  • Labels valid: 1-indexed integers
  • Metadata accurate: Sampling rate, paradigm, feature type
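Most of this checklist can be verified in Python before the data ever reach Julia. The sketch below mirrors the data-quality and format items for a NumPy array already transposed to (n_features, n_samples, n_trials); the function name and messages are hypothetical, and a shape check alone cannot tell a correctly ordered array from a transposed one of the same rank, so the label count is the main guard against that.

```python
import numpy as np


def validate_for_nimbus(features, labels):
    """Check a (n_features, n_samples, n_trials) array and its labels before export.

    Returns a list of problem descriptions (empty if all checks pass).
    """
    problems = []
    features = np.asarray(features)
    labels = np.asarray(labels)
    if features.ndim != 3:
        return [f"expected 3-D array, got {features.ndim}-D"]
    if not np.issubdtype(features.dtype, np.floating):
        problems.append(f"expected floating-point data, got {features.dtype}")
    if np.isnan(features).any():
        problems.append("contains NaN values")
    if np.isinf(features).any():
        problems.append("contains Inf values")
    # A feature is constant if it never varies across samples and trials
    flat = features.reshape(features.shape[0], -1)
    constant = np.where(np.ptp(flat, axis=1) == 0)[0]
    if constant.size:
        problems.append(f"constant features at rows {constant.tolist()}")
    if labels.shape != (features.shape[2],):
        problems.append(f"expected {features.shape[2]} labels, got shape {labels.shape}")
    elif not np.issubdtype(labels.dtype, np.integer) or labels.min() < 1:
        problems.append("labels must be 1-indexed integers")
    return problems


# Usage: a clean array passes; a corrupted copy with 0-indexed labels does not
rng = np.random.default_rng(0)
good = rng.standard_normal((8, 250, 20))
ok = validate_for_nimbus(good, np.repeat([1, 2], 10))

bad = good.copy()
bad[0, 0, 0] = np.nan  # one NaN sample
bad[3] = 1.0           # feature 3 is constant
issues = validate_for_nimbus(bad, np.repeat([0, 1], 10))
```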

Preprocessing Diagnostics

NimbusSDK includes built-in diagnostics:
using NimbusSDK

# Run diagnostics
report = diagnose_preprocessing(bci_data)

# Check for issues
if !isempty(report.errors)
    @error "Preprocessing errors detected" report.errors
end

# View recommendations
println("Quality score: $(round(report.quality_score * 100, digits=1))%")
for rec in report.recommendations
    println("  • ", rec)
end
What it checks:
  • Line noise (50/60 Hz components)
  • Amplitude range (detects raw EEG vs features)
  • DC offset
  • Temporal correlation
  • Feature normalization
  • NaN/Inf values
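NimbusSDK's diagnostic internals are not described here, so the following is only an independent approximation of one check in that list, the line-noise test: compare the mean PSD within ±1 Hz of the mains frequency to the mean PSD in flanking bands 3-8 Hz away. The window widths and the function name are assumptions. Run it on data that still contain the 40-60 Hz range; after an 8-30 Hz bandpass, both the peak and the flanks are suppressed and the ratio is not meaningful.

```python
import numpy as np
from scipy.signal import welch


def line_noise_ratio(data, fs, line_hz=50.0, width=1.0, flank=(3.0, 8.0)):
    """Ratio of mean PSD near the line frequency to mean PSD in flanking bands.

    data: (..., n_samples). Values well above 1 suggest residual line noise.
    """
    freqs, psd = welch(data, fs=fs, nperseg=min(data.shape[-1], int(2 * fs)))
    psd = psd.reshape(-1, psd.shape[-1]).mean(axis=0)  # average over channels/trials
    dist = np.abs(freqs - line_hz)
    peak = psd[dist <= width].mean()
    flanks = psd[(dist >= flank[0]) & (dist <= flank[1])].mean()
    return peak / flanks


# Usage: white noise with and without a 50 Hz component (8 channels, 20 s at 250 Hz)
fs = 250.0
rng = np.random.default_rng(0)
noise = rng.standard_normal((8, 5000))
t = np.arange(5000) / fs
contaminated = noise + np.sin(2 * np.pi * 50.0 * t)

clean_ratio = line_noise_ratio(noise, fs)         # close to 1
noisy_ratio = line_noise_ratio(contaminated, fs)  # well above 1
```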

Additional Resources

Papers

  • Ramoser et al. (2000). “Optimal spatial filtering of single trial EEG”
  • Blankertz et al. (2008). “Optimizing spatial filters for robust EEG single-trial analysis”
  • Lotte et al. (2018). “A review of classification algorithms for EEG-based BCI”