# EEG Preprocessing Requirements

> Prepare EEG features for Nimbus with CSP, bandpower, ERP extraction, and data-quality requirements for reliable BCI inference.

**NimbusSDK expects preprocessed features, NOT raw EEG.** The SDK performs Bayesian inference on feature-space data, so you must preprocess your raw EEG signals before using NimbusSDK.

## What NimbusSDK Does

✅ Bayesian inference with NimbusLDA and NimbusQDA (Python + Julia), plus NimbusSoftmax and NimbusSTS (Python) and NimbusProbit (Julia)\
✅ Real-time and batch classification\
✅ Confidence scoring and quality assessment\
✅ Performance metrics (ITR, accuracy)

## What NimbusSDK Does NOT Do

❌ Raw EEG filtering (bandpass, notch)\
❌ Artifact removal (ICA, regression, ASR)\
❌ Spatial filtering (CSP, ICA, Laplacian)\
❌ Feature extraction (bandpower, CSP, ERP)

## Why This Separation?

1. **Preprocessing is paradigm-specific**: Motor Imagery needs CSP at 8-30 Hz; P300 needs ERP at 0.5-10 Hz
2. **Hardware-dependent**: Different amplifiers require different artifact handling
3. **Domain expertise**: Use established tools (MNE-Python, EEGLAB, etc.)
4. **Flexibility**: You can use any preprocessing pipeline (Python, MATLAB, custom)
5. **Focus**: NimbusSDK excels at Bayesian inference, not signal processing

For integration examples with preprocessing tools, see [Real-time Setup](/inference-configuration/real-time-setup).

## Required Preprocessing Pipeline

### Step 1: Bandpass Filtering

Remove frequencies outside the band of interest:

| BCI Paradigm      | Frequency Band | Rationale                                      |
| ----------------- | -------------- | ---------------------------------------------- |
| **Motor Imagery** | 8-30 Hz        | Mu rhythm (8-13 Hz) and beta rhythm (13-30 Hz) |
| **P300**          | 0.5-10 Hz      | Slow event-related potentials                  |
| **SSVEP**         | Target ± 2 Hz  | Narrow band around the stimulation frequency   |

**Tools**: MNE-Python `raw.filter()`, EEGLAB `pop_eegfiltnew()`

### Step 2: Artifact Removal

Remove physiological and environmental artifacts:

| Artifact Type  | Frequency Range / Signature | Removal Method          |
| -------------- | --------------------------- | ----------------------- |
| **Eye blinks** | 0-4 Hz, high amplitude      | ICA (remove components) |
| **Muscle**     | >30 Hz, high amplitude      | ICA or thresholding     |
| **Line noise** | 50/60 Hz                    | Notch filter            |
| **Movement**   | 0-10 Hz, transient          | Regression or rejection |

**Recommended**: ICA (Independent Component Analysis), the most versatile method

### Step 3: Epoching

Segment continuous EEG into trials aligned to events:

| Paradigm          | Time Window                       | Baseline                   |
| ----------------- | --------------------------------- | -------------------------- |
| **Motor Imagery** | 0 to 4 seconds post-cue           | -0.5 to 0 seconds          |
| **P300**          | -0.2 to 0.8 seconds post-stimulus | -0.2 to 0 seconds          |
| **SSVEP**         | 2 to 4 seconds post-onset         | No baseline (steady-state) |

### Step 4: Feature Extraction

Convert filtered epochs into discriminative features (see Feature Types below).

## Feature Types

### CSP (Common Spatial Patterns) - Motor Imagery

**Recommended for Motor Imagery**

CSP learns spatial filters that maximize the variance ratio between two classes, which makes it well suited to motor imagery BCI.
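To make the variance-ratio objective concrete, here is a minimal NumPy/SciPy sketch of two-class CSP written as the generalized eigenproblem C1·w = λ·(C1 + C2)·w, followed by log-variance features. It is illustrative only: the helper names `csp_filters` and `log_variance` are hypothetical, ordering components by largest |λ - 0.5| is one common heuristic, and this is not the implementation used by MNE or NimbusSDK. For real pipelines, use a maintained implementation such as MNE's `CSP` class.

```python theme={null}
import numpy as np
from scipy.linalg import eigh


def csp_filters(X, y, n_components=4):
    """Two-class CSP via a generalized eigenproblem (illustrative sketch).

    X: (n_trials, n_channels, n_samples) band-passed epochs.
    y: (n_trials,) labels with exactly two distinct values.
    Returns W: (n_components, n_channels); each row is a spatial filter.
    """
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    # Average trace-normalized spatial covariance for each class
    covs = []
    for c in classes:
        covs.append(np.mean([t @ t.T / np.trace(t @ t.T) for t in X[y == c]], axis=0))
    C1, C2 = covs
    # Solve C1 w = lambda (C1 + C2) w; lambda is class-1 variance share, in [0, 1]
    eigvals, eigvecs = eigh(C1, C1 + C2)
    # Both ends of the spectrum are discriminative, so rank by distance from 0.5
    order = np.argsort(np.abs(eigvals - 0.5))[::-1]
    return eigvecs[:, order[:n_components]].T


def log_variance(X, W):
    """Project epochs through the filters and take log-variance over time."""
    Z = np.einsum("kc,ncs->nks", W, X)  # (n_trials, n_components, n_samples)
    return np.log(np.var(Z, axis=2))    # (n_trials, n_components)


# Synthetic example: class 0 has extra power on channel 0, class 1 on channel 1
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6, 250))
y = np.repeat([0, 1], 20)
X[y == 0, 0] *= 3.0
X[y == 1, 1] *= 3.0

W = csp_filters(X, y, n_components=2)
feats = log_variance(X, W)
print(feats.shape)  # (40, 2)
```

Filters with λ near 1 yield high variance for the first class and filters with λ near 0 for the second, so a few components taken from both ends of the spectrum already separate the classes once log-variance is applied.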
```python theme={null}
# MNE-Python example
import numpy as np
from mne.decoding import CSP

# X_train: (n_trials, n_channels, n_samples) band-passed epochs; y_train: class labels
# MNE requires log=None when transform_into='csp_space'
csp = CSP(n_components=8, reg=None, log=None, transform_into='csp_space')
csp.fit(X_train, y_train)
X_csp = csp.transform(X)  # (n_trials, n_components, n_samples)

# Extract log-variance (standard for MI)
logvar_features = np.log(np.var(X_csp, axis=2))  # (n_trials, n_components)

# Transpose for Julia: (n_features, n_samples, n_trials)
X_julia = np.transpose(X_csp, (1, 2, 0))
```

**Output dimension**: one log-variance feature per CSP component, i.e. `n_components` features (e.g., 8 for `n_components=8`). The `X_julia` array keeps the full component time series, `(n_components, n_samples, n_trials)`.

### Bandpower Features - SSVEP

Compute power in specific frequency bands:

```python theme={null}
import numpy as np
from scipy.signal import welch

def compute_bandpower(epochs, sfreq=250.0, bands=((8, 13), (13, 30))):
    """Mean Welch PSD per channel and band.

    epochs: (n_trials, n_channels, n_samples) array.
    Returns: (n_trials, n_channels * n_bands) array.
    """
    # PSD along the time axis, computed once: psd is (n_trials, n_channels, n_freqs)
    nperseg = min(256, epochs.shape[-1])
    freqs, psd = welch(epochs, fs=sfreq, nperseg=nperseg, axis=-1)
    powers = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs <= high)
        powers.append(psd[..., mask].mean(axis=-1))  # (n_trials, n_channels)
    # Stack bands last, then flatten channel x band into a single feature axis
    return np.stack(powers, axis=-1).reshape(epochs.shape[0], -1)
```

For SSVEP, pass narrow bands around each stimulation frequency (e.g., target ± 2 Hz) instead of the mu/beta defaults shown here.

**Output dimension**: `n_channels × n_bands` per trial

### ERP Amplitude - P300

Extract the mean amplitude in a specific time window:

```python theme={null}
# Mean amplitude in the P300 window (300-500 ms post-stimulus)
# `epochs` is an mne.Epochs object with tmin=-0.2, so sample 0 is NOT stimulus
# onset; select samples by time instead of int(t * sfreq)
data = epochs.get_data()  # (n_trials, n_channels, n_samples)
window = (epochs.times >= 0.3) & (epochs.times <= 0.5)

# Mean amplitude per channel in the P300 window
p300_amplitude = data[:, :, window].mean(axis=2)  # (n_trials, n_channels)
```

## Paradigm-Specific Guidelines

### Motor Imagery

**Recommended pipeline:**

1. Bandpass: **8-30 Hz** (mu + beta)
2. Artifact removal: ICA (remove eye blinks)
3. Epoching: 0-4 seconds post-cue
4. Feature extraction: **CSP** (e.g., 8 components → 8 log-variance features)
5. Temporal aggregation: **Log-variance**

**Data requirements:**

* Minimum: 40 trials per class
* Recommended: 80+ trials per class

**Expected accuracy**: 70-90% (subject-dependent)

### P300

**Recommended pipeline:**

1. Bandpass: **0.5-10 Hz**
2. Artifact removal: ICA or rejection
3. Epoching: -0.2 to 0.8 seconds post-stimulus
4. Feature extraction: ERP amplitude (300-500 ms window)
5. Temporal aggregation: Mean

**Data requirements:**

* Minimum: 200 target, 1000 non-target trials
* Recommended: 400 target, 2000 non-target trials

**Expected accuracy**: 80-95% (with averaging)

### SSVEP

**Recommended pipeline:**

1. Bandpass: **Target frequency ± 2 Hz**
2. Artifact removal: Eye blink rejection
3. Epoching: 2-4 seconds post-onset
4. Feature extraction: CCA or bandpower
5. No baseline correction (steady-state)

**Data requirements:**

* Minimum: 30 trials per frequency
* Recommended: 60+ trials per frequency

**Expected accuracy**: 85-98%

## Data Format Requirements

### Expected Format

```julia theme={null}
features::Array{Float64, 3}  # (n_features × n_samples × n_trials)
```

Where:

* `n_features`: Number of extracted features (e.g., 16 for CSP)
* `n_samples`: Samples per trial (e.g., 1000 for 4 seconds at 250 Hz)
* `n_trials`: Number of trials

### Converting from Python/NumPy

Python pipelines typically use `(n_trials, n_features, n_samples)`:

```python theme={null}
import numpy as np
from scipy.io import savemat

# Python shape
X_csp.shape  # (100 trials, 8 components, 1000 samples)

# Convert for Julia
X_julia = np.transpose(X_csp, (1, 2, 0))  # (8, 1000, 100)

# Save with 1-indexed labels!
savemat('features.mat', {'features': X_julia, 'labels': np.asarray(y) + 1})
```

### Labels

Labels must be **1-indexed integers**:

```julia theme={null}
labels = [1, 2, 3, 4, 1, 2]  # ✅ Correct (Julia-style, 1-indexed)
labels = [0, 1, 2, 3, 0, 1]  # ❌ Wrong (Python-style, 0-indexed)
```

Convert 0-indexed to 1-indexed labels:

```python theme={null}
import numpy as np

y_julia = np.asarray(y) + 1  # {0, 1, 2, 3} -> {1, 2, 3, 4}
```

## Common Pitfalls

### Pitfall 1: Using Raw EEG Instead of Features

**Symptom**: Accuracy near chance level (25% for 4-class)

**Fix**: Apply feature extraction (CSP, bandpower, etc.)
```julia theme={null}
# WRONG - Raw EEG channels
features = raw_eeg  # (32 channels × 1000 samples × 100 trials)

# CORRECT - CSP features
# `extract_csp` is a placeholder for your own CSP pipeline
csp_features = extract_csp(raw_eeg)  # (16 features × 1000 samples × 100 trials)
```

### Pitfall 2: Wrong Frequency Band

**Symptom**: Low confidence scores, poor separability

| Paradigm      | Correct Band | Wrong Band | Result               |
| ------------- | ------------ | ---------- | -------------------- |
| Motor Imagery | 8-30 Hz      | 0.5-10 Hz  | Accuracy ↓ 20-30%    |
| P300          | 0.5-10 Hz    | 8-30 Hz    | Amplitude ↓, noise ↑ |

### Pitfall 3: Incorrect Data Shape

**Symptom**: `DimensionMismatch` error

```julia theme={null}
# WRONG - Transposed dimensions
features = X_csp  # (trials × features × samples) from NumPy

# CORRECT - Permute to NimbusSDK format
features = permutedims(X_csp, (2, 3, 1))  # (features × samples × trials)
```

## Validation Checklist

Before using NimbusSDK, verify:

### Data Quality ✅

* [ ] **No NaN values**: `@assert !any(isnan, features)`
* [ ] **No Inf values**: `@assert !any(isinf, features)`
* [ ] **Plausible range**: Values match the scale expected for your feature type (raw-EEG-scale amplitudes suggest features were not extracted)
* [ ] **No constant features**: Each feature has non-zero variance

### Preprocessing Steps ✅

* [ ] **Bandpass filtered**: Paradigm-appropriate frequency band
* [ ] **Artifacts removed**: ICA or equivalent applied
* [ ] **Epoched correctly**: Proper time windows
* [ ] **Features extracted**: CSP/bandpower/ERP, not raw EEG

### Format Requirements ✅

* [ ] **Correct shape**: `(n_features × n_samples × n_trials)`
* [ ] **Correct type**: `Float64` (or convertible)
* [ ] **Labels valid**: 1-indexed integers
* [ ] **Metadata accurate**: Sampling rate, paradigm, feature type

## Feature Normalization

**Feature normalization is CRITICAL for cross-session BCI!**

EEG amplitude varies 50-200% across sessions. Proper normalization improves cross-session accuracy by 15-30%.
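To illustrate the train/test protocol, here is a minimal NumPy sketch of train-only z-scoring on arrays in the `(n_features × n_samples × n_trials)` layout. It assumes statistics are computed per feature and pooled over samples and trials; whether NimbusSDK's `method=:zscore` does exactly this is an assumption, and `fit_zscore`/`apply_zscore` are hypothetical names, not SDK functions.

```python theme={null}
import numpy as np


def fit_zscore(train, eps=1e-12):
    """Per-feature mean/std from training data shaped (n_features, n_samples, n_trials)."""
    mean = train.mean(axis=(1, 2), keepdims=True)
    std = train.std(axis=(1, 2), keepdims=True)
    # Guard against division by zero for constant features
    return mean, np.maximum(std, eps)


def apply_zscore(features, params):
    """Scale any data with previously estimated parameters."""
    mean, std = params
    return (features - mean) / std


rng = np.random.default_rng(1)
train = rng.normal(loc=5.0, scale=2.0, size=(8, 100, 30))
test = rng.normal(loc=5.0, scale=2.0, size=(8, 100, 20))

params = fit_zscore(train)               # estimated on training data only
train_norm = apply_zscore(train, params)
test_norm = apply_zscore(test, params)   # same parameters, never refit on test
```

The key design choice is that test data is scaled with statistics from the training set; refitting on the evaluation data would leak information from it.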
For optimal performance, especially when using models across different sessions, estimate normalization parameters on training data and apply the same parameters to test data:

```julia theme={null}
using NimbusSDK
using JLD2  # provides the @save macro

# Training: estimate normalization parameters
norm_params = estimate_normalization_params(train_features; method=:zscore)
train_norm = apply_normalization(train_features, norm_params)

# Testing: apply the SAME parameters (do not re-estimate on test data)
test_norm = apply_normalization(test_features, norm_params)

# Save parameters together with the model
@save "model.jld2" model norm_params
```

**Impact**:

* Same session: +1% accuracy
* Cross-session (next day): +15-25% accuracy
* Multi-subject transfer: +15-20% accuracy

See [Feature Normalization](/inference-configuration/feature-normalization) for the recommended train/test scaling workflow.

## Preprocessing Diagnostics

NimbusSDK includes built-in diagnostics:

```julia theme={null}
using NimbusSDK

# Run diagnostics on your BCI data
report = diagnose_preprocessing(bci_data)

# Check for issues
if !isempty(report.errors)
    @error "Preprocessing errors detected" report.errors
end

# View recommendations
println("Quality score: $(round(report.quality_score * 100, digits=1))%")
for rec in report.recommendations
    println("  • ", rec)
end
```

**What it checks:**

* Line noise (50/60 Hz components)
* Amplitude range (detects raw EEG vs features)
* DC offset
* Temporal correlation
* Feature normalization
* NaN/Inf values

## Next Read

* Integration with EEG acquisition systems
* Process multiple trials efficiently
* Complete SDK reference
* Working preprocessing examples

## Additional Resources

### Tools

* [MNE-Python](https://mne.tools/) - Python EEG processing
* [EEGLAB](https://eeglab.org/) - MATLAB EEG processing
* [BrainFlow](https://brainflow.readthedocs.io/) - Hardware integration

### Papers

* Ramoser et al. (2000). "Optimal spatial filtering of single trial EEG"
* Blankertz et al. (2008). "Optimizing spatial filters for robust EEG single-trial analysis"
* Lotte et al. (2018). "A review of classification algorithms for EEG-based BCI"