To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging

1 Xiamen University Malaysia   2 The Hong Kong Polytechnic University
3 Columbia University   4 Duke University   5 Harbin Institute of Technology (Shenzhen)
*Corresponding authors

Work conducted while at Xiamen University Malaysia
Published at EMNLP 2025.
Results of NeuroMerging

Abstract

Fine-tuning pre-trained models on targeted datasets enhances task-specific performance but often comes at the expense of generalization. Model merging techniques, which integrate multiple fine-tuned models into a single multi-task model through task arithmetic, offer a promising solution. However, task interference remains a fundamental challenge, leading to performance degradation and suboptimal merged models. Existing approaches have largely overlooked the fundamental roles of neurons, their connectivity, and their activation, resulting in a merging process and a merged model that do not account for how neurons relay and process information. In this work, we present the first study that grounds model merging in neuronal mechanisms. Specifically, we decompose task-specific representations into two complementary neuronal subspaces that regulate input sensitivity and task adaptability. Leveraging this decomposition, we introduce NeuroMerging, a novel merging framework that mitigates task interference within neuronal subspaces, enabling training-free model fusion across diverse tasks. Through extensive experiments, we demonstrate that NeuroMerging outperforms existing methods on multi-task benchmarks in both the natural language and vision domains. Our findings highlight the importance of aligning neuronal mechanisms in model merging, offering new insights into mitigating task interference and improving knowledge fusion.

Main Figure

Illustration of our proposed NeuroMerging framework.

(A) Our approach explicitly considers neuronal activation mechanisms through neuronal task vectors $\tau_t^k \in \mathbb{R}^d$, defined as the difference between fine-tuned and pre-trained neurons for task $t$. (B) These vectors are decomposed into parallel and orthogonal subspaces relative to pre-trained neurons, corresponding to input sensitivity and task adaptability. SVD constructs a coordinate system in the previously unstructured orthogonal subspace. (C) Our NeuroMerging method merges models efficiently within these low-dimensional complementary subspaces, rather than in the original high-dimensional weight space.
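To make panel (B) concrete, the following is a minimal sketch of the per-neuron decomposition, assuming each neuron is represented by its vector of incoming weights and that the parallel/orthogonal split is taken with respect to the pre-trained weight vector; the function name and tensor shapes are illustrative and not the paper's implementation.

    import torch

    def decompose_neuronal_task_vector(w_pre: torch.Tensor, w_ft: torch.Tensor):
        """Split a neuronal task vector into components parallel and
        orthogonal to the pre-trained neuron's incoming weights.

        w_pre, w_ft: shape (d,), the incoming weights of one neuron
        before and after fine-tuning; tau = w_ft - w_pre.
        """
        tau = w_ft - w_pre                             # neuronal task vector
        direction = w_pre / (w_pre.norm() + 1e-12)     # unit vector along the pre-trained neuron
        tau_parallel = (tau @ direction) * direction   # projection onto the pre-trained direction
        tau_orthogonal = tau - tau_parallel            # residual, orthogonal to w_pre
        return tau_parallel, tau_orthogonal

Merging can then treat tau_parallel (input sensitivity) and tau_orthogonal (task adaptability) separately, as in panel (C).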

Algorithm

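The paper's full procedure operates within these subspaces (including the SVD step from panel (B)) to resolve interference; the sketch below only illustrates the overall per-layer flow under the simplifying assumption that components are combined by plain averaging within each subspace. The names, the averaging rule, and the scaling coefficient lam are illustrative and do not reproduce the published algorithm.

    import torch

    def neuromerge_layer(W_pre: torch.Tensor, W_fts: list[torch.Tensor],
                         lam: float = 1.0) -> torch.Tensor:
        """Illustrative per-layer merge: decompose each neuron's task vectors
        into subspaces parallel/orthogonal to the pre-trained neuron, combine
        tasks within each subspace, and add the result back to the pre-trained
        weights. W_pre and each W_fts[t] have shape (n_neurons, d)."""
        merged = W_pre.clone()
        for i in range(W_pre.shape[0]):                  # one neuron (row) at a time
            w_pre = W_pre[i]
            u = w_pre / (w_pre.norm() + 1e-12)           # pre-trained neuron direction
            par_parts, orth_parts = [], []
            for W_ft in W_fts:
                tau = W_ft[i] - w_pre                    # neuronal task vector for task t
                tau_par = (tau @ u) * u                  # input-sensitivity component
                par_parts.append(tau_par)
                orth_parts.append(tau - tau_par)         # task-adaptability component
            # Stand-in for the paper's subspace-wise interference handling:
            tau_par_merged = torch.stack(par_parts).mean(dim=0)
            tau_orth_merged = torch.stack(orth_parts).mean(dim=0)
            merged[i] = w_pre + lam * (tau_par_merged + tau_orth_merged)
        return merged

Because the decomposition and combination are done per neuron in low-dimensional subspaces, the merge remains training-free and requires only the pre-trained and fine-tuned weights.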

Video Presentation

Poster

BibTeX


@inproceedings{fang2025neuromerging,
  title={To See a World in a Spark of Neuron: Disentangling Multi-task Interference for Training-free Model Merging},
  author={Fang, Zitao and Du, Guodong and Yu, Shuyang and Guo, Yifei and Zhang, Yiwei and Cao, Yiyao and Li, Jing and Tang, Ho-Kin and Goh, Sim Kuan},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2025},
  url={https://arxiv.org/abs/2503.05320}
}