# EEG Preprocessing Requirements

> Prepare EEG features for Nimbus with CSP, bandpower, ERP extraction, and data-quality requirements for reliable BCI inference.

<Warning>
  **NimbusSDK expects preprocessed features, NOT raw EEG.**

  The SDK performs Bayesian inference on feature-space data. You must preprocess your raw EEG signals before using NimbusSDK.
</Warning>

## What NimbusSDK Does

✅ Bayesian inference with NimbusLDA and NimbusQDA (Python and Julia), NimbusSoftmax and NimbusSTS (Python), and NimbusProbit (Julia)\
✅ Real-time and batch classification\
✅ Confidence scoring and quality assessment\
✅ Performance metrics (ITR, accuracy)

## What NimbusSDK Does NOT Do

❌ Raw EEG filtering (bandpass, notch)\
❌ Artifact removal (ICA, regression, ASR)\
❌ Spatial filtering (CSP, ICA, Laplacian)\
❌ Feature extraction (bandpower, CSP, ERP)

## Why This Separation?

1. **Preprocessing is paradigm-specific**: Motor Imagery typically uses CSP features computed from 8-30 Hz data, while P300 uses ERP amplitudes from 0.5-10 Hz data
2. **Hardware-dependent**: Different amplifiers require different artifact handling
3. **Domain expertise**: Use established tools (MNE-Python, EEGLAB, etc.)
4. **Flexibility**: You can use any preprocessing pipeline (Python, MATLAB, custom)
5. **Focus**: NimbusSDK concentrates on Bayesian inference rather than signal processing

<Note>
  For integration examples with preprocessing tools, see [Real-time Setup](/inference-configuration/real-time-setup)
</Note>

## Required Preprocessing Pipeline

### Step 1: Bandpass Filtering

Remove frequencies outside the band of interest:

| BCI Paradigm      | Frequency Band | Rationale                                      |
| ----------------- | -------------- | ---------------------------------------------- |
| **Motor Imagery** | 8-30 Hz        | Mu rhythm (8-13 Hz) and Beta rhythm (13-30 Hz) |
| **P300**          | 0.5-10 Hz      | Slow event-related potentials                  |
| **SSVEP**         | Target ± 2 Hz  | Narrow band around stimulation frequency       |

**Tools**: MNE-Python `raw.filter()`, EEGLAB `pop_eegfiltnew()`
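If you are not using MNE or EEGLAB, you can sketch Step 1 with SciPy as a zero-phase Butterworth bandpass. This minimal example assumes data shaped `(n_channels, n_times)` and a 4th-order filter. The `bandpass` helper and the `BANDS` dictionary are illustrative names and are not part of NimbusSDK.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(data, fs, low, high, order=4):
    """Zero-phase Butterworth bandpass along the last (time) axis (hypothetical helper).

    data: array shaped (..., n_times), e.g. (n_channels, n_times)
    fs: sampling rate in Hz; low/high: passband edges in Hz
    """
    # Second-order sections are numerically more stable than (b, a) coefficients
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering cancels the phase shift, so ERP latencies are preserved
    return sosfiltfilt(sos, data, axis=-1)

# Bands from the table above (SSVEP shown for an assumed 12 Hz target, ± 2 Hz)
BANDS = {
    "motor_imagery": (8.0, 30.0),
    "p300": (0.5, 10.0),
    "ssvep_12hz": (10.0, 14.0),
}
```

Call it as `bandpass(raw, 250, *BANDS["motor_imagery"])`. Zero-phase filtering is a common offline choice. A real-time pipeline needs a causal filter instead, which introduces some phase delay.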

### Step 2: Artifact Removal

Remove physiological and environmental artifacts:

| Artifact Type  | Frequency Range        | Removal Method          |
| -------------- | ---------------------- | ----------------------- |
| **Eye blinks** | 0-4 Hz, high amplitude | ICA (remove components) |
| **Muscle**     | >30 Hz, high amplitude | ICA or thresholding     |
| **Line noise** | 50/60 Hz               | Notch filter            |
| **Movement**   | 0-10 Hz, transient     | Regression or rejection |

**Recommended**: ICA (Independent Component Analysis), the most versatile method
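ICA is best run in MNE-Python or EEGLAB. For the simpler rows of the table, here is a minimal SciPy/NumPy sketch of a line-noise notch filter and peak-to-peak epoch rejection. The helper names, the quality factor of 30, and the rejection threshold are assumptions. Tune them to your amplifier, mains frequency, and data units.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch

def notch(data, fs, line_freq=50.0, quality=30.0):
    """Zero-phase notch at the mains frequency (50 or 60 Hz depending on region)."""
    b, a = iirnotch(line_freq, quality, fs=fs)
    return filtfilt(b, a, data, axis=-1)

def reject_by_amplitude(epochs, labels, threshold):
    """Drop epochs whose peak-to-peak amplitude exceeds `threshold` on any channel.

    epochs: (n_trials, n_channels, n_samples); labels: (n_trials,)
    threshold: same units as the data (e.g. 100e-6 for 100 µV when data are in volts)
    """
    ptp = np.ptp(epochs, axis=-1)            # (n_trials, n_channels)
    keep = np.all(ptp <= threshold, axis=1)  # True for clean trials
    return epochs[keep], labels[keep], keep
```

Rejecting a trial removes its label too, so features and labels stay aligned.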

### Step 3: Epoching

Segment continuous EEG into trials aligned to events:

| Paradigm          | Time Window                       | Baseline                   |
| ----------------- | --------------------------------- | -------------------------- |
| **Motor Imagery** | 0 to 4 seconds post-cue           | -0.5 to 0 seconds          |
| **P300**          | -0.2 to 0.8 seconds post-stimulus | -0.2 to 0 seconds          |
| **SSVEP**         | 2 to 4 seconds post-onset         | No baseline (steady-state) |
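Here is a minimal NumPy sketch of Step 3. It assumes continuous data shaped `(n_channels, n_times)` and event onsets given as sample indices. The baseline is read from the continuous signal, so it can lie outside the epoch window, as in the Motor Imagery row (epoch 0 to 4 s, baseline -0.5 to 0 s). `make_epochs` is a hypothetical helper; `mne.Epochs` does the same job with more options.

```python
import numpy as np

def make_epochs(raw, events, fs, tmin, tmax, baseline=None):
    """Cut continuous EEG into event-locked trials (hypothetical helper).

    raw: (n_channels, n_times) continuous, already filtered data
    events: onset sample index of each cue/stimulus
    tmin, tmax: epoch window in seconds relative to each event (tmax exclusive)
    baseline: (start, end) in seconds relative to the event, or None (e.g. SSVEP)
    Returns an array shaped (n_trials, n_channels, n_samples).
    """
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))
    first, last = start, stop
    if baseline is not None:
        b0, b1 = int(round(baseline[0] * fs)), int(round(baseline[1] * fs))
        first, last = min(start, b0), max(stop, b1)
    trials = []
    for ev in events:
        if ev + first < 0 or ev + last > raw.shape[1]:
            raise ValueError(f"event at sample {ev} is too close to the recording edge")
        trial = raw[:, ev + start:ev + stop]
        if baseline is not None:
            # Subtract each channel's mean over the baseline interval of the continuous signal
            trial = trial - raw[:, ev + b0:ev + b1].mean(axis=1, keepdims=True)
        trials.append(trial)
    return np.stack(trials)
```

For the P300 row, call `make_epochs(raw, events, fs, -0.2, 0.8, baseline=(-0.2, 0.0))`. For SSVEP, pass `baseline=None`.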

### Step 4: Feature Extraction

Convert filtered epochs to discriminative features (see below).

## Feature Types

### CSP (Common Spatial Patterns) - Motor Imagery

**Recommended for Motor Imagery**

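CSP finds spatial filters that maximize the variance of one class relative to the other. Motor imagery appears as power (variance) changes in the mu and beta bands over sensorimotor cortex, so CSP is well suited to motor imagery BCI.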

```python theme={null}
# MNE-Python example
import numpy as np
from mne.decoding import CSP

# X_train, X: band-passed epochs shaped (n_trials, n_channels, n_samples); y_train: labels
# With transform_into='csp_space', MNE requires log=None
csp = CSP(n_components=8, reg=None, log=None, transform_into='csp_space')
csp.fit(X_train, y_train)
X_csp = csp.transform(X)  # (n_trials, n_components, n_samples)

# Log-variance per component (the standard MI feature)
logvar_features = np.log(np.var(X_csp, axis=2))  # (n_trials, n_components)

# Reorder the CSP time series for Julia: (n_features, n_samples, n_trials)
X_julia = np.transpose(X_csp, (1, 2, 0))
```

**Output dimension**: `n_components` features per trial (e.g., 8 for `n_components=8`). These are time series in `csp_space` mode, or scalars after log-variance.
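To make "maximizes the variance ratio" concrete, here is a minimal two-class CSP sketch in NumPy/SciPy. It solves the generalized eigenproblem C_a w = λ (C_a + C_b) w. It is for illustration only: MNE's `CSP` adds regularization, multi-class support, and component-ordering options. `csp_filters` and `csp_logvar` are hypothetical helper names.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X_a, X_b, n_components=4):
    """Two-class CSP spatial filters without regularization (hypothetical helper).

    X_a, X_b: band-passed epochs of each class, shaped (n_trials, n_channels, n_samples)
    n_components: even number of filters, taken from both ends of the spectrum
    Returns W shaped (n_components, n_channels).
    """
    def mean_cov(X):
        # np.cov treats rows as variables, giving one (n_channels, n_channels) matrix per trial
        return np.mean([np.cov(trial) for trial in X], axis=0)

    C_a, C_b = mean_cov(X_a), mean_cov(X_b)
    # Generalized eigenvalues lie in [0, 1]: near 1 -> high variance for class A,
    # near 0 -> high variance for class B
    eigvals, eigvecs = eigh(C_a, C_a + C_b)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]  # descending eigenvalue order
    n_channels = eigvecs.shape[0]
    half = n_components // 2
    picks = list(range(half)) + list(range(n_channels - half, n_channels))
    return eigvecs[:, picks].T

def csp_logvar(W, X):
    """Project epochs through W and take the log-variance of each component."""
    Z = np.einsum("kc,tcs->tks", W, X)  # (n_trials, n_components, n_samples)
    return np.log(np.var(Z, axis=2))    # (n_trials, n_components)
```

The first components respond most strongly to class A and the last ones to class B. That is why CSP filters are usually taken from both ends of the eigenvalue spectrum.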

### Bandpower Features - SSVEP

Compute power in specific frequency bands:

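```python theme={null}
import numpy as np
from scipy.signal import welch

def compute_bandpower(epochs, fs=250.0, bands=((8, 13), (13, 30))):
    """Mean Welch PSD per channel in each frequency band.

    epochs: (n_trials, n_channels, n_samples) -> returns (n_trials, n_channels * n_bands)
    For SSVEP, pass narrow bands around each stimulation frequency,
    e.g. [(f - 1, f + 1) for f in stim_freqs].
    """
    # Compute the PSD once along the time axis: psd is (n_trials, n_channels, n_freqs)
    freqs, psd = welch(epochs, fs=fs, nperseg=min(256, epochs.shape[-1]), axis=-1)
    powers = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs <= high)
        powers.append(psd[..., mask].mean(axis=-1))  # (n_trials, n_channels)
    return np.concatenate(powers, axis=-1)
```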

**Output dimension**: `n_channels × n_bands`

### ERP Amplitude - P300

Extract amplitude at specific time windows:

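```python theme={null}
# Mean amplitude in the P300 window (300-500 ms post-stimulus).
# epochs is an mne.Epochs object; with tmin=-0.2, sample 0 is 200 ms BEFORE the
# stimulus, so select the window by time (epochs.times) rather than by sample index.
window = (epochs.times >= 0.3) & (epochs.times <= 0.5)

# Mean amplitude per channel in the P300 window
data = epochs.get_data()  # (n_trials, n_channels, n_times)
p300_amplitude = data[:, :, window].mean(axis=2)  # (n_trials, n_channels)
```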
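If one window is too coarse, a common extension averages several consecutive windows (often called "windowed means"). This minimal NumPy sketch assumes epochs shaped `(n_trials, n_channels, n_times)` and a `times` vector in seconds, such as `epochs.times`. The helper name is hypothetical, and the window boundaries are an assumption to tune per subject.

```python
import numpy as np

def windowed_means(data, times, windows):
    """Mean amplitude per channel in each time window (hypothetical helper).

    data: (n_trials, n_channels, n_times); times: (n_times,) seconds relative to the stimulus
    windows: list of (start, end) in seconds, both ends inclusive
    Returns (n_trials, n_channels * n_windows), grouped by window.
    """
    feats = []
    for start, end in windows:
        mask = (times >= start) & (times <= end)
        feats.append(data[:, :, mask].mean(axis=2))  # (n_trials, n_channels)
    return np.concatenate(feats, axis=1)
```

For example, `windowed_means(data, epochs.times, [(0.2, 0.3), (0.3, 0.4), (0.4, 0.5)])` yields `n_channels × 3` features per trial.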

## Paradigm-Specific Guidelines

### Motor Imagery

**Recommended pipeline:**

1. Bandpass: **8-30 Hz** (mu + beta)
2. Artifact removal: ICA (remove eye blinks)
3. Epoching: 0-4 seconds post-cue
4. Feature extraction: **CSP** (e.g., 8 components → 8 features)
5. Temporal aggregation: **Log-variance**

**Data requirements:**

* Minimum: 40 trials per class
* Recommended: 80+ trials per class

**Expected accuracy**: 70-90% (subject-dependent)

### P300

**Recommended pipeline:**

1. Bandpass: **0.5-10 Hz**
2. Artifact removal: ICA or rejection
3. Epoching: -0.2 to 0.8 seconds post-stimulus
4. Feature extraction: ERP amplitude (300-500ms window)
5. Temporal aggregation: Mean

**Data requirements:**

* Minimum: 200 target, 1000 non-target trials
* Recommended: 400 target, 2000 non-target trials

**Expected accuracy**: 80-95% (with averaging)
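"With averaging" refers to averaging the responses to repeated flashes of the same stimulus before classifying. This minimal NumPy sketch shows why that helps. It assumes noise is independent across repetitions and that repeats of a stimulus are stored consecutively. Averaging k repetitions keeps the ERP and shrinks the noise standard deviation by about √k. The toy waveform and the helper name are illustrative only.

```python
import numpy as np

def average_repetitions(epochs, n_reps):
    """Average consecutive groups of n_reps epochs (assumed repeats of one stimulus).

    epochs: (n_trials, n_channels, n_samples) with n_trials divisible by n_reps
    Returns (n_trials // n_reps, n_channels, n_samples).
    """
    n_trials, n_channels, n_samples = epochs.shape
    if n_trials % n_reps:
        raise ValueError("n_trials must be a multiple of n_reps")
    return epochs.reshape(n_trials // n_reps, n_reps, n_channels, n_samples).mean(axis=1)

rng = np.random.default_rng(0)
erp = np.sin(np.linspace(0, np.pi, 200))                # toy ERP waveform
single = erp + rng.normal(0, 1.0, size=(160, 1, 200))   # 160 noisy single trials
averaged = average_repetitions(single, n_reps=16)       # 10 averaged trials
noise_single = (single - erp).std()                     # about 1.0
noise_avg = (averaged - erp).std()                      # about 1.0 / sqrt(16)
```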

### SSVEP

**Recommended pipeline:**

1. Bandpass: **Target frequency ± 2 Hz**
2. Artifact removal: Eye blink rejection
3. Epoching: 2-4 seconds post-onset
4. Feature extraction: CCA or bandpower
5. No baseline correction (steady-state)

**Data requirements:**

* Minimum: 30 trials per frequency
* Recommended: 60+ trials per frequency

**Expected accuracy**: 85-98%
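The CCA option in step 4 can be sketched in NumPy in three steps:

1. Build sine/cosine reference signals at each stimulation frequency and its harmonics.
2. Compute the largest canonical correlation between the EEG trial and each reference set.
3. Pick the frequency with the highest score.

This is a minimal sketch with hypothetical helper names and an assumed default of 2 harmonics. The per-frequency scores could also serve as features.

```python
import numpy as np

def reference_signals(freq, fs, n_samples, n_harmonics=2):
    """Sine/cosine references at freq and its harmonics: (n_samples, 2 * n_harmonics)."""
    t = np.arange(n_samples) / fs
    refs = []
    for h in range(1, n_harmonics + 1):
        refs += [np.sin(2 * np.pi * h * freq * t), np.cos(2 * np.pi * h * freq * t)]
    return np.column_stack(refs)

def max_canonical_corr(X, Y):
    """Largest canonical correlation between the columns of X and Y (rows = samples)."""
    # The singular values of Qx^T Qy, where Qx and Qy are orthonormal bases of the
    # centered column spaces, are the canonical correlations (Bjorck-Golub method)
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def classify_ssvep(trial, fs, stim_freqs, n_harmonics=2):
    """trial: (n_channels, n_samples). Returns (best frequency, per-frequency scores)."""
    X = trial.T  # samples as rows
    scores = [max_canonical_corr(X, reference_signals(f, fs, X.shape[0], n_harmonics))
              for f in stim_freqs]
    return stim_freqs[int(np.argmax(scores))], scores
```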

## Data Format Requirements

### Expected Format

```julia theme={null}
features::Array{Float64, 3}  # (n_features × n_samples × n_trials)
```

Where:

* `n_features`: Number of extracted features (e.g., 16 for CSP)
* `n_samples`: Samples per trial (e.g., 1000 for 4 seconds at 250 Hz)
* `n_trials`: Number of trials

### Converting from Python/NumPy

Python typically uses `(n_trials, n_features, n_samples)`:

```python theme={null}
import numpy as np
from scipy.io import savemat

# Python shape (X_csp from the CSP step above)
X_csp.shape  # (100 trials, 8 components, 1000 samples)

# Convert for Julia: (n_features, n_samples, n_trials)
X_julia = np.transpose(X_csp, (1, 2, 0))  # (8, 1000, 100)

# Save; y is a NumPy array of 0-indexed integer labels
savemat('features.mat', {'features': X_julia, 'labels': y + 1})  # 1-indexed labels!
```

### Labels

Labels must be **1-indexed integers**:

```julia theme={null}
labels = [1, 2, 3, 4, 1, 2, ...]  # ✅ Correct (Julia-style)
labels = [0, 1, 2, 3, 0, 1, ...]  # ❌ Wrong (Python-style)
```

Convert 0-indexed to 1-indexed:

```python theme={null}
y_julia = y + 1  # If y ∈ {0, 1, 2, 3}, convert to {1, 2, 3, 4}
```
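If your labels are not already `0..K-1` integers (for example strings, or `{-1, 1}`), `y + 1` is not enough. This minimal NumPy sketch maps any label set to `1..K` in sorted order. It keeps the mapping so predictions can be translated back. `to_one_indexed` is a hypothetical helper name.

```python
import numpy as np

def to_one_indexed(y):
    """Map arbitrary 1-D labels to consecutive integers 1..K (sorted order).

    Returns (y_julia, classes), where classes[y_julia - 1] recovers the original labels.
    """
    classes, inverse = np.unique(np.asarray(y), return_inverse=True)
    return inverse.astype(np.int64) + 1, classes
```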

## Common Pitfalls

### Pitfall 1: Using Raw EEG Instead of Features

**Symptom**: Accuracy near chance level (25% for 4-class)

**Fix**: Apply feature extraction (CSP, bandpower, etc.)

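```julia theme={null}
# WRONG - Raw EEG channels
features = raw_eeg  # (32 channels × 1000 samples × 100 trials)

# CORRECT - CSP features (extract_csp is a placeholder for your own CSP step,
# e.g. done in MNE-Python; NimbusSDK does not perform feature extraction)
csp_features = extract_csp(raw_eeg)  # (16 features × 1000 samples × 100 trials)
```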

### Pitfall 2: Wrong Frequency Band

**Symptom**: Low confidence scores, poor separability

| Paradigm      | Correct Band | Wrong Band | Result               |
| ------------- | ------------ | ---------- | -------------------- |
| Motor Imagery | 8-30 Hz      | 0.5-10 Hz  | Accuracy ↓ 20-30%    |
| P300          | 0.5-10 Hz    | 8-30 Hz    | Amplitude ↓, noise ↑ |

### Pitfall 3: Incorrect Data Shape

**Symptom**: `DimensionMismatch` error

```julia theme={null}
# WRONG - Transposed dimensions
features = X_csp  # (trials × features × samples) from numpy

# CORRECT - Transpose to NimbusSDK format
features = permutedims(X_csp, (2, 3, 1))  # (features × samples × trials)
```

## Validation Checklist

Before using NimbusSDK, verify:

### Data Quality ✅

* [ ] **No NaN values**: `@assert !any(isnan, features)`
* [ ] **No Inf values**: `@assert !any(isinf, features)`
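* [ ] **Plausible range**: No extreme outliers; magnitudes match the feature type rather than raw EEG amplitudes
* [ ] **No constant features**: Each feature varies, e.g. `using Statistics; @assert all(std(features; dims=(2, 3)) .> 0)`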

### Preprocessing Steps ✅

* [ ] **Bandpass filtered**: Paradigm-appropriate frequency band
* [ ] **Artifacts removed**: ICA or equivalent applied
* [ ] **Epoched correctly**: Proper time windows
* [ ] **Features extracted**: CSP/bandpower/ERP, not raw EEG

### Format Requirements ✅

* [ ] **Correct shape**: `(n_features × n_samples × n_trials)`
* [ ] **Correct type**: `Float64` (or convertible)
* [ ] **Labels valid**: 1-indexed integers
* [ ] **Metadata accurate**: Sampling rate, paradigm, feature type
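You can run the same checks in Python before exporting. This minimal sketch assumes the `(n_features, n_samples, n_trials)` layout and the 1-indexed labels described above. `validate_features` is a hypothetical helper that returns a list of problems instead of raising.

```python
import numpy as np

def validate_features(features, labels):
    """Return a list of problems in features (n_features, n_samples, n_trials) and labels."""
    problems = []
    features = np.asarray(features, dtype=np.float64)  # must be convertible to Float64
    labels = np.asarray(labels)
    if features.ndim != 3:
        return [f"expected 3-D (n_features, n_samples, n_trials), got shape {features.shape}"]
    if np.isnan(features).any():
        problems.append("features contain NaN")
    if np.isinf(features).any():
        problems.append("features contain Inf")
    # A feature is constant if it never varies across samples and trials
    flat = features.reshape(features.shape[0], -1)
    constant = np.flatnonzero(np.nanstd(flat, axis=1) == 0)
    if constant.size:
        problems.append(f"constant features at indices {constant.tolist()}")
    if labels.shape != (features.shape[2],):
        problems.append(f"expected {features.shape[2]} labels, got shape {labels.shape}")
    elif labels.size and (not np.issubdtype(labels.dtype, np.integer) or labels.min() < 1):
        problems.append("labels must be integers starting at 1")
    return problems
```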

## Feature Normalization

<Warning>
  **Feature normalization is CRITICAL for cross-session BCI!**

  EEG amplitude varies 50-200% across sessions. Proper normalization improves cross-session accuracy by 15-30%.
</Warning>

For optimal performance, especially when using models across different sessions:

```julia theme={null}
using NimbusSDK
using JLD2  # provides the @save macro

# Training: estimate normalization parameters from training data only
norm_params = estimate_normalization_params(train_features; method=:zscore)
train_norm = apply_normalization(train_features, norm_params)

# Testing: apply the SAME parameters (never re-estimate them on test data)
test_norm = apply_normalization(test_features, norm_params)

# Save the parameters alongside the model trained on train_norm
@save "model.jld2" model norm_params
```
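For Python pipelines, or to see what `:zscore` normalization means, here is a minimal NumPy sketch of the same fit-on-train, apply-to-test pattern. It assumes z-scoring per feature, pooled over all samples and trials of the training set. Whether NimbusSDK's `estimate_normalization_params` pools exactly these axes is not stated here, so treat this as an illustration rather than a drop-in replacement. The helper names are hypothetical.

```python
import numpy as np

def estimate_zscore_params(train):
    """Per-feature mean/std from training data shaped (n_features, n_samples, n_trials)."""
    mean = train.mean(axis=(1, 2), keepdims=True)
    std = train.std(axis=(1, 2), keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return {"mean": mean, "std": std}

def apply_zscore(features, params):
    """Apply previously estimated parameters; they are never re-estimated on test data."""
    return (features - params["mean"]) / params["std"]
```

Reusing the training parameters means a genuine shift in the test session stays visible to the model instead of being silently removed.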

**Impact**:

* Same session: +1% accuracy
* Cross-session (next day): +15-25% accuracy
* Multi-subject transfer: +15-20% accuracy

<Tip>
  See [Feature Normalization](/inference-configuration/feature-normalization) for the recommended train/test scaling workflow.
</Tip>

## Preprocessing Diagnostics

NimbusSDK includes built-in diagnostics:

```julia theme={null}
using NimbusSDK

# Run diagnostics
report = diagnose_preprocessing(bci_data)

# Check for issues
if !isempty(report.errors)
    @error "Preprocessing errors detected" report.errors
end

# View recommendations
println("Quality score: $(round(report.quality_score * 100, digits=1))%")
for rec in report.recommendations
    println("  • ", rec)
end
```

**What it checks:**

* Line noise (50/60 Hz components)
* Amplitude range (detects raw EEG vs features)
* DC offset
* Temporal correlation
* Feature normalization
* NaN/Inf values
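As a rough Python pre-check of the line-noise item, before data reach the SDK, you can measure what fraction of spectral power sits near the mains frequency. This is a hypothetical heuristic, not the algorithm `diagnose_preprocessing` uses. The ±1 Hz window, the helper name, and any threshold you apply to the result are assumptions.

```python
import numpy as np
from scipy.signal import welch

def line_noise_ratio(data, fs, line_freq=50.0, half_width=1.0):
    """Fraction of total power within line_freq ± half_width Hz along the last axis.

    data: (..., n_times). Values near 0 mean little line noise remains.
    """
    freqs, psd = welch(data, fs=fs, nperseg=min(256, data.shape[-1]), axis=-1)
    band = np.abs(freqs - line_freq) <= half_width
    return psd[..., band].sum(axis=-1) / psd.sum(axis=-1)
```

Pass `line_freq=60.0` where the mains frequency is 60 Hz.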

## Next Read

<CardGroup cols={2}>
  <Card title="Real-time Setup" icon="gauge" href="/inference-configuration/real-time-setup">
    Integration with EEG acquisition systems
  </Card>

  <Card title="Batch Processing" icon="list" href="/inference-configuration/batch-processing">
    Process multiple trials efficiently
  </Card>

  <Card title="Julia SDK" icon="code" href="/julia-sdk/api-reference">
    Complete SDK reference
  </Card>

  <Card title="Code Examples" icon="braces" href="/examples/basic-examples">
    Working preprocessing examples
  </Card>
</CardGroup>

## Additional Resources

### Tools

* [MNE-Python](https://mne.tools/) - Python EEG processing
* [EEGLAB](https://eeglab.org/) - MATLAB EEG processing
* [BrainFlow](https://brainflow.readthedocs.io/) - Hardware integration

### Papers

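* Ramoser et al. (2000). "Optimal spatial filtering of single trial EEG during imagined hand movement." *IEEE Transactions on Rehabilitation Engineering*
* Blankertz et al. (2008). "Optimizing spatial filters for robust EEG single-trial analysis." *IEEE Signal Processing Magazine*
* Lotte et al. (2018). "A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update." *Journal of Neural Engineering*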
