Explore Model Kinship For Merging Large Language Models

Oct 2, 2024·
Yedi Hu
,
Yunzhi Yao
,
Ningyu Zhang
,
Shumin Deng
,
Huajun Chen
· 3 min read
Main Figure
Abstract
Model merging has become one of the key technologies for enhancing the capabilities and efficiency of Large Language Models (LLMs). However, our understanding of the expected performance gains and principles when merging any two models remains limited. In this work, we introduce model kinship, the degree of similarity or relatedness between LLMs, analogous to biological evolution. With comprehensive empirical analysis, we find that there is a certain relationship between model kinship and the performance gains after model merging, which can help guide our selection of candidate models. Inspired by this, we propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets. Specifically, we discover that using model kinship as a criterion can assist us in continuously performing model merging, alleviating the degradation (local optima) in model evolution, whereas model kinship can serve as a guide to escape these traps.
Type
Publication
Under Review

NoteBookLM Speech

This speech is generated by NotebookLM, and some points may not be explained clearly.

Method

Motivated by the parallel drawn between biological reproduction and the process of model evolution, we propose that the concept of kinship, which is central to evolutionary biology for understanding breeding relationships and human genealogies, can be conceptually extended to the field of model merging, specifically to describe the kinship between models.

To be noted, Ilarco et al., 2023 has proposed Task Arithmetic and offered new insights into how the merging process preserves and blends task information from the original models. A task vector is defined as the differences in weight space between fine-tuned models and a pre-trained model, and they demonstrate that arithmetic operations on these task vectors can serve as an effective method for model merging.

Inspired by this idea, we introduce the concept of Model Kinship, a metric designed to assess the degree of similarity or relatedness between large language models (LLMs) based on their “genetic” information (a.k.a. the changes in weights during model evolution).

Method

Results

The results are divided into two parts. The first part is a statistical analysis of community merging evolution experiments, explaining the impact of model kinship on model merging and issues in the evolutionary process. The second part presents the main experiments, which reproduce the optimization issues caused by using a greedy strategy for iterative model merging and how model kinship can mitigate this problem.

In the preliminary analysis, we interpret the community experiments from different perspectives. The results in the article lead to two main conclusions:

  1. There is a correlation between model kinship and the results of model merging. This finding suggests that kinship can potentially be used to predict merging outcomes, avoiding unnecessary merging without the need for costly evaluations.
  2. As iterative model evolution progresses, the kinship between the top models in the model repository gradually increases. At the same time, performance improvements among the top models tend to stagnate once the model kinship reaches a certain threshold.

We reproduced the second discovery in main experiments and pointed out that the rise in model kinship is due to the convergence of local optima generated by a certain generation’s fusion. Subsequent merges abandon the search for better models and instead converge toward previously found optimal models. As an improvement, we integrate an exploration process based on model kinship into the conventional Top-k Greedy Merging, aiming to find better models by merging low-kinship models.

Reuslts

The final experimental results prove the effectiveness of this method.

References

[1] Gabriel Ilharco, Marco T´ulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. In The Eleventh International Confer- ence on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/forum?id=6t0Kwf8-jrj.