NeurIPT

Electroencephalography (EEG) has wide-ranging applications, from clinical diagnosis to brain-computer interfaces (BCIs). With the increasing volume and variety of EEG data, there has been growing interest in establishing foundation models (FMs) to scale up and generalize neural decoding. Despite showing early potential, applying FMs to EEG remains challenging due to substantial inter-subject, inter-task, and inter-condition variability, as well as diverse electrode configurations across recording setups. To tackle these open challenges, we propose NeurIPT, a foundation model developed for diverse EEG-based Neural Interfaces with a Pre-trained Transformer that captures both homogeneous and heterogeneous spatio-temporal characteristics inherent in EEG signals. Temporally, we introduce Amplitude-Aware Masked Pretraining (AAMP), which masks based on signal amplitude rather than random intervals, to learn robust representations across varying signal intensities beyond local interpolation. Moreover, this temporal representation is enhanced by a Progressive Mixture-of-Experts (PMoE) architecture, where specialized expert subnetworks are progressively introduced at deeper layers, adapting effectively to the diverse temporal characteristics of EEG signals. Spatially, NeurIPT leverages the 3D physical coordinates of electrodes, enabling effective transfer of embeddings across varying EEG settings, and develops Intra-Inter Lobe Pooling (IILP) during fine-tuning to efficiently exploit regional brain features. Empirical evaluations across eight downstream BCI datasets, via fine-tuning, demonstrated that NeurIPT consistently achieved state-of-the-art performance, highlighting its broad applicability and robust generalization. Our work pushes forward the state of FMs in EEG and offers insights into scalable and generalizable neural information processing systems.
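To make the idea of amplitude-based masking concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a single-channel signal given as a list of floats and a hypothetical scheme in which a randomly placed band of amplitude ranks is masked. The names `amplitude_mask`, `apply_mask`, and `mask_ratio` are illustrative and do not come from the paper.

```python
import random


def amplitude_mask(signal, mask_ratio=0.5, rng=None):
    """Return a boolean mask (True = masked) chosen by amplitude, not by time.

    Hypothetical scheme: rank samples by amplitude, pick a random contiguous
    window of ranks covering `mask_ratio` of the samples, and mask those
    samples wherever they occur in time.
    """
    rng = rng or random.Random()
    n = len(signal)
    n_masked = max(0, min(n, int(round(mask_ratio * n))))
    # Sample indices ordered from lowest to highest amplitude.
    order = sorted(range(n), key=lambda i: signal[i])
    # Random start rank, so the masked amplitude band varies between calls.
    start = rng.randint(0, n - n_masked)
    mask = [False] * n
    for idx in order[start:start + n_masked]:
        mask[idx] = True
    return mask


def apply_mask(signal, mask, fill=0.0):
    """Replace masked samples with `fill`, leaving the others unchanged."""
    return [fill if m else x for x, m in zip(signal, mask)]


signal = [0.1, -2.0, 3.0, 0.5, -0.3, 1.2, -1.1, 2.2]
mask = amplitude_mask(signal, mask_ratio=0.5)
print(apply_mask(signal, mask))
```

Because the masked samples share an amplitude band rather than a time interval, they are typically scattered across the signal, which is one plausible reading of how such masking discourages reconstruction by local interpolation alone.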

Spectrograms of the nine EEG datasets reveal both homogeneous and heterogeneous spectral patterns, with some datasets showing higher power spectral density (PSD) in specific EEG frequency bands. Thus, they demand neural representations capable of adapting to input variability.

Overview of our NeurIPT, which comprises Amplitude-Aware Masked Pretraining (AAMP), 3D Electrode Embedding, Progressive Mixture-of-Experts (PMoE), and Intra-Inter Lobe Pooling (IILP) for fine-tuning. See below for details on the IILP module.
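The benefit of embedding electrodes by their 3D coordinates can be illustrated with a small sketch. It assumes a shared linear projection from `(x, y, z)` to an embedding vector; the projection, the function names, and the coordinate values are hypothetical placeholders, not the model's actual embedding layer or a real montage.

```python
import random


def make_projection(dim, seed=0):
    """Create a shared 3 -> dim linear map (random here, learned in practice)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


def embed_electrode(xyz, weights, bias):
    """Project one electrode's (x, y, z) position to an embedding vector."""
    return [sum(w * c for w, c in zip(row, xyz)) + b
            for row, b in zip(weights, bias)]


def embed_montage(montage, weights, bias):
    """Embed every electrode of a montage given as name -> (x, y, z)."""
    return {name: embed_electrode(xyz, weights, bias)
            for name, xyz in montage.items()}


weights, bias = make_projection(dim=4)
# Illustrative coordinates only, not measured electrode positions.
montage_a = {"Cz": (0.0, 0.0, 1.0), "C3": (-0.7, 0.0, 0.7)}
montage_b = {"Fz": (0.0, 0.7, 0.7), "Cz": (0.0, 0.0, 1.0), "Pz": (0.0, -0.7, 0.7)}
print(embed_montage(montage_a, weights, bias)["Cz"]
      == embed_montage(montage_b, weights, bias)["Cz"])
```

Since the embedding depends only on position, an electrode receives the same vector regardless of how many channels a recording setup has or where the electrode sits in the channel order.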

(Left) Intra-Inter Lobe Pooling (IILP) leverages regional brain features during fine-tuning. (Right) Visualization of attention scores from the temporal attention module and analysis of Pearson correlation between class logits and channel perturbation using Gaussian multiplicative noise. Note that colors in the right panel correspond to the brain regions depicted on the left.
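The channel perturbation analysis can be sketched as follows: one channel is scaled by Gaussian multiplicative noise, and the Pearson correlation between the scale factor and a class logit is measured. This is an assumption-laden illustration. `toy_logit` is a hypothetical stand-in for a trained model, and the noise level `sigma` and trial count are arbitrary choices, not values from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def perturb_channel(eeg, channel, scale):
    """Multiply one channel by `scale`; the other channels are copied as-is."""
    return [[v * scale for v in row] if c == channel else list(row)
            for c, row in enumerate(eeg)]


def toy_logit(eeg, weights):
    """Hypothetical model: weighted sum of each channel's mean absolute amplitude."""
    return sum(w * sum(abs(v) for v in row) / len(row)
               for w, row in zip(weights, eeg))


def channel_sensitivity(eeg, weights, channel, sigma=0.1, trials=200, seed=0):
    """Correlate the multiplicative noise on one channel with the resulting logit."""
    rng = random.Random(seed)
    scales, logits = [], []
    for _ in range(trials):
        scale = 1.0 + sigma * rng.gauss(0.0, 1.0)  # Gaussian multiplicative noise
        scales.append(scale)
        logits.append(toy_logit(perturb_channel(eeg, channel, scale), weights))
    return pearson(scales, logits)


eeg = [[1.0, -2.0, 0.5], [0.3, 0.8, -1.1]]  # 2 channels x 3 samples, made up
print(channel_sensitivity(eeg, weights=[1.0, -0.5], channel=0))
```

A correlation near +1 or -1 indicates that the logit responds strongly to that channel's amplitude, while a value near 0 suggests the channel has little influence on that class.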

Analysis of expert participation (temporal and spatial) when EEG data from different classes of the BCIC-IV-2A dataset is input to the model.
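The abstract describes experts being introduced progressively at deeper layers, and this figure analyzes how much each expert participates. A minimal sketch of both ideas follows, under stated assumptions: the linear growth schedule, the top-1 routing rule, and all function names are hypothetical and are not taken from the paper.

```python
def progressive_expert_counts(num_layers, max_experts):
    """Hypothetical schedule: 1 expert in the first layer, growing linearly
    (with integer rounding down) to `max_experts` in the last layer."""
    if num_layers == 1:
        return [max_experts]
    return [1 + (layer * (max_experts - 1)) // (num_layers - 1)
            for layer in range(num_layers)]


def route_top1(gate_scores):
    """For each token, return the index of its highest-scoring expert."""
    return [max(range(len(scores)), key=lambda e: scores[e])
            for scores in gate_scores]


def participation(assignments, num_experts):
    """Fraction of tokens routed to each expert."""
    counts = [0] * num_experts
    for expert in assignments:
        counts[expert] += 1
    total = len(assignments)
    return [c / total for c in counts]


print(progressive_expert_counts(num_layers=6, max_experts=4))
# Made-up gate scores for 4 tokens and 2 experts.
gate_scores = [[0.1, 0.9], [0.7, 0.3], [0.2, 0.8], [0.4, 0.6]]
print(participation(route_top1(gate_scores), num_experts=2))
```

Computing `participation` separately for inputs of each class would give per-class expert usage of the kind this figure reports.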


@inproceedings{fang2025neuript,
  title={NeurIPT: Foundation Model for Neural Interfaces},
  author={Fang, Zitao and Li, Chenxuan and Zhou, Hongting and Yu, Shuyang and Du, Guodong and Qasem, Ashwaq and Lu, Yang and Li, Jing and Zhang, Junsong and Goh, Sim Kuan},
  booktitle={Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025},
  url={https://arxiv.org/abs/2510.16548}
}

NeurIPT: Foundation Model for Neural Interfaces

Abstract

**Illustration of our proposed NeurIPT (part A).**

Overview of our NeurIPT, which comprises Amplitude-Aware Masked Pretraining (AAMP), 3D Electrode Embedding, Progressive Mixture-of-Experts (PMoE), and Intra-Inter Lobe Pooling (IILP) for fine-tuning. See below for details on the IILP module.

Model performance on various downstream BCI tasks.

**Analysis of Progressive Mixture-of-Experts (PMoE).**

Analysis of expert participation (temporal and spatial) when EEG data from different classes of the BCIC-IV-2A dataset is input to the model.

**Ablation study on each individual component in NeurIPT.**

Different MoE strategies across various datasets.

**Different PMoE configurations across various datasets.**

Different pooling strategies across various datasets.
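As a rough illustration of the pooling idea behind IILP, the sketch below averages channel features within each lobe and then averages the resulting lobe summaries. The source does not specify the pooling operators or the channel-to-lobe assignment, so mean pooling, the lobe grouping, and the name `intra_inter_lobe_pool` are assumptions.

```python
def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def intra_inter_lobe_pool(features, lobes):
    """Pool within lobes, then across lobes.

    features: channel name -> feature vector
    lobes:    lobe name -> list of channel names (hypothetical grouping)
    Channels missing from `features` are skipped, as are empty lobes.
    """
    lobe_vectors = {}
    for lobe, channels in lobes.items():
        present = [features[ch] for ch in channels if ch in features]
        if present:
            lobe_vectors[lobe] = mean_pool(present)      # intra-lobe pooling
    pooled = mean_pool(list(lobe_vectors.values()))      # inter-lobe pooling
    return lobe_vectors, pooled


# Made-up 2-dimensional features for three recorded channels.
features = {"F3": [1.0, 2.0], "F4": [3.0, 4.0], "O1": [5.0, 6.0]}
lobes = {"frontal": ["F3", "F4"], "occipital": ["O1", "O2"]}
lobe_vectors, pooled = intra_inter_lobe_pool(features, lobes)
print(lobe_vectors, pooled)
```

In this toy case the two-stage result `[3.5, 4.5]` differs from a flat mean over all channels (`[3.0, 4.0]`), because each lobe contributes equally regardless of how many of its electrodes were recorded.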

Different activation functions for EEG FMs (trained from scratch).

Low resource scenario. Metrics are shown as percentages relative to the full-data baseline.

Comparison between different masking strategies across diverse downstream tasks.

Different positional encoding strategies across various tasks. The alternative encoding strategies compared are trigonometric functions and 1D learnable embeddings, as employed by the vanilla Transformer, and 2D learnable embeddings, as introduced in LaBraM.
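For reference, the trigonometric baseline is the standard sinusoidal encoding of the vanilla Transformer, sketched below. Treating the channel or token index as `position` is an assumption about how the baseline is applied here; the learnable 1D and 2D alternatives are not shown.

```python
import math


def sinusoidal_encoding(position, dim):
    """Vanilla Transformer positional encoding for one position.

    Even indices use sine and odd indices use cosine, with each sin/cos pair
    sharing the frequency 1 / 10000^(2k / dim) for pair index k.
    """
    encoding = []
    for i in range(dim):
        freq = 1.0 / (10000 ** ((2 * (i // 2)) / dim))
        angle = position * freq
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


print(sinusoidal_encoding(position=0, dim=4))  # [0.0, 1.0, 0.0, 1.0]
```

This encoding depends only on an integer index, so it carries no information about where an electrode physically sits on the scalp.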

BibTeX
