Group Preference Collapse in Personalized Multimodal Large Language Models

Lyu, Fan; Zhang, Wenqi; van de Weijer, Joost

Abstract

Personalized multimodal large language models (MLLMs) aim to generate user-specific responses, but existing methods rely mainly on profile-level information and overlook the diversity of user preferences. We identify group preference collapse: multi-user personalized MLLMs become insensitive to individual preferences and drift toward dominant population-level choices, because preference signals are suppressed and preferences are used unreliably during generation. We propose PrefMoE, a preference-centric framework that separates stable profile information from preference-related representations. PrefMoE decomposes preferences into shared prototypes and personalized residuals. It preserves the individualized residuals through imbalance-aware learning, counterfactual pseudo-user augmentation, and residual decorrelation, and it routes profile and preference factors through separate LoRA adaptation paths. Experiments across multiple MLLM backbones show that PrefMoE improves preference-sensitive personalization while substantially reducing preference collapse.
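
As a rough illustration of the prototype-plus-residual idea, the sketch below splits one preference vector into a shared component and a personalized residual. The dot-product softmax assignment, the function name decompose_preference, and the toy vectors are assumptions made for illustration; the abstract does not specify how PrefMoE computes the decomposition.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decompose_preference(pref, prototypes):
    """Split a preference vector into a shared part and a personalized residual.

    Hypothetical scheme: weight each shared prototype by a softmax over its
    dot-product similarity to the preference vector, then keep whatever the
    weighted prototype mix does not explain as the residual.
    """
    scores = [sum(p * q for p, q in zip(pref, proto)) for proto in prototypes]
    weights = softmax(scores)
    shared = [
        sum(w * proto[d] for w, proto in zip(weights, prototypes))
        for d in range(len(pref))
    ]
    residual = [p - s for p, s in zip(pref, shared)]
    return weights, shared, residual


# Toy example: two prototypes in a 2-dimensional preference space.
toy_prototypes = [[1.0, 0.0], [0.0, 1.0]]
toy_weights, toy_shared, toy_residual = decompose_preference([1.0, 0.0], toy_prototypes)
print(toy_weights, toy_shared, toy_residual)
```

By construction the shared part and the residual add back up to the original vector, so the residual carries exactly the user-specific signal that the population-level prototypes miss.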

Illustration of group preference collapse in personalized MLLMs

Overview of PrefMoE. PrefMoE separates profile and preference representations, models preferences with shared prototypes and personalized residuals, and regularizes the residuals with contrastive learning and decorrelation. A hierarchical MoE router then activates profile- and preference-aware LoRA experts for query-dependent personalized reasoning.
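
The routing step in the caption can be pictured with the following minimal sketch: a top-level gate weights a profile group and a preference group, and a second gate inside each group weights that group's LoRA experts. Everything here is an assumption for illustration (dense softmax gating at both levels, linear gates, the names hierarchical_lora and lora_delta, and the toy matrices); the caption does not describe the actual router design.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_delta(A, B, x):
    """Low-rank update B @ (A @ x); A is r x d_in, B is d_out x r."""
    return matvec(B, matvec(A, x))


def hierarchical_lora(x, base_W, groups, group_gate, expert_gates):
    """Frozen base layer plus a two-level mixture of LoRA experts.

    groups:       {"profile": [(A, B), ...], "preference": [(A, B), ...]}
    group_gate:   matrix with one row of gate weights per group
    expert_gates: {group name: matrix with one row per expert in that group}
    """
    group_weights = softmax(matvec(group_gate, x))  # level 1: which factor
    out = matvec(base_W, x)
    for g_w, name in zip(group_weights, groups):
        expert_weights = softmax(matvec(expert_gates[name], x))  # level 2
        for e_w, (A, B) in zip(expert_weights, groups[name]):
            delta = lora_delta(A, B, x)
            out = [o + g_w * e_w * d for o, d in zip(out, delta)]
    return out, group_weights


# Toy setup: 2-d input, one rank-1 expert per group, uniform gates.
x = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
groups = {
    "profile": [(A, [[0.0], [0.0]])],      # zero B: expert contributes nothing
    "preference": [(A, [[1.0], [0.0]])],
}
zero_gate = [[0.0, 0.0], [0.0, 0.0]]
gates = {"profile": [[0.0, 0.0]], "preference": [[0.0, 0.0]]}
print(hierarchical_lora(x, identity, groups, zero_gate, gates))
```

Keeping the profile and preference experts in separate groups means the top-level gate can shift weight toward preference experts for preference-sensitive queries without disturbing the profile path.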

MMPB-clean Dataset Statistics

Main Results

		0-turn				10-turn
Method	Type	Overall↑	Preference↑	Profile↑	Collapse↓	Overall↑	Preference↑	Profile↑	Collapse↓
Non-tuned Models
LLaVA-1.5-7B	NT	0.3564	0.3387	0.3716	0.6228	0.3167	0.3067	0.3337	0.6164
LLaVA-1.5-13B	NT	0.4023	0.3653	0.4222	0.6027	0.3716	0.3333	0.3928	0.6119
LLaVA-OV-72B	NT	0.5100	0.4800	0.5262	0.5297	0.4620	0.4333	0.4774	0.5525
DeepSeek-VL2-Tiny	NT	0.3683	0.3613	0.3720	0.7169	0.3497	0.3440	0.3527	0.6895
DeepSeek-VL2-Small	NT	0.3866	0.3853	0.3871	0.6895	0.3642	0.3667	0.3669	0.6712
DeepSeek-VL2	NT	0.4620	0.4533	0.4667	0.5890	0.4103	0.4067	0.4122	0.5982
Qwen2.5-VL-7B	NT	0.4276	0.3587	0.5653	0.6484	0.3885	0.3433	0.4315	0.6530
Qwen2.5-VL-32B	NT	0.4914	0.4333	0.5226	0.5708	0.4410	0.4027	0.4616	0.5982
Qwen2.5-VL-72B	NT	0.5301	0.4600	0.5677	0.5023	0.5016	0.4453	0.5319	0.5525
Fine-tuned and PEFT Baselines
LLaVA-1.5-7B	FFT	0.5072	0.4413	0.5427	0.3425	0.5026	0.4413	0.5355	0.3470
LLaVA-1.5-13B	FFT	0.5301	0.4600	0.5677	0.3105	0.5100	0.4400	0.5477	0.3196
LLaVA-OV-72B	FFT	0.6200	0.5947	0.6337	0.3151	0.5613	0.5320	0.5771	0.3379
DeepSeek-VL2-Tiny	FFT	0.4711	0.4213	0.4975	0.5845	0.4681	0.4187	0.4946	0.5936
DeepSeek-VL2-Small	FFT	0.4862	0.3840	0.5412	0.6530	0.4834	0.3747	0.5419	0.6530
DeepSeek-VL2	FFT	0.5814	0.5093	0.6201	0.4521	0.5748	0.4987	0.6158	0.4566
Qwen2.5-VL-7B	FFT	0.4452	0.3013	0.5226	0.6530	0.4807	0.3707	0.5398	0.7854
Qwen2.5-VL-32B	FFT	0.5718	0.5480	0.5849	0.4795	0.4914	0.4333	0.5226	0.5114
Qwen2.5-VL-72B	FFT	0.6014	0.5947	0.6057	0.4018	0.5217	0.5053	0.5305	0.3927
Yo'LLaVA	PEFT	0.5040	0.4880	0.5125	0.2075	0.4840	0.4680	0.4925	0.2146
LLaVA-NeXT-34B	PEFT	0.5599	0.6200	0.5276	0.2466	0.5299	0.5900	0.4976	0.2054
LOVA3	PEFT	0.5329	0.5680	0.5140	0.4292	0.4979	0.5330	0.4790	0.4247
TG-LLaVA	PEFT	0.5413	0.5800	0.5204	0.2329	0.5013	0.5400	0.4804	0.1506
PrefMoE
PrefMoE (LLaVA-1.5-7B)	PEFT	0.6751	0.6733	0.6760	0.1233	0.5986	0.5840	0.6065	0.1553
PrefMoE (LLaVA-1.5-13B)	PEFT	0.7012	0.6800	0.7125	0.1416	0.6601	0.6267	0.6781	0.1370
PrefMoE (LLaVA-OV-72B)	PEFT	0.7893	0.7613	0.8043	0.1142	0.6914	0.6627	0.7068	0.1187
PrefMoE (DeepSeek-VL2-Tiny)	PEFT	0.6601	0.6467	0.6674	0.2511	0.6033	0.5933	0.6086	0.2740
PrefMoE (DeepSeek-VL2-Small)	PEFT	0.7305	0.5813	0.8108	0.2740	0.6382	0.5200	0.7018	0.2694
PrefMoE (DeepSeek-VL2)	PEFT	0.7991	0.7520	0.8244	0.1324	0.7012	0.6800	0.7125	0.1826
PrefMoE (Qwen2.5-VL-7B)	PEFT	0.7613	0.6880	0.8007	0.1781	0.6503	0.6253	0.6638	0.1826
PrefMoE (Qwen2.5-VL-32B)	PEFT	0.7902	0.7640	0.8043	0.1416	0.6900	0.6653	0.7032	0.2100
PrefMoE (Qwen2.5-VL-72B)	PEFT	0.8112	0.7893	0.8237	0.1096	0.7301	0.5813	0.8100	0.1279

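Table 1. Main comparison with state-of-the-art methods under the 0-turn and 10-turn settings. NT, FFT, and PEFT denote non-tuned, fully fine-tuned, and parameter-efficient fine-tuned models, respectively. ↑: higher is better; ↓: lower is better.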

Ablation Study

						0-turn				10-turn
E	P	I	C	D	M	Overall↑	Preference↑	Profile↑	Collapse↓	Overall↑	Preference↑	Profile↑	Collapse↓
✓						0.5562	0.4120	0.6337	0.6027	0.4928	0.3584	0.5651	0.6347
✓	✓					0.6000	0.5700	0.6159	0.3333	0.5266	0.4902	0.5461	0.3653
✓	✓	✓				0.6279	0.6133	0.6358	0.3288	0.5506	0.5275	0.5631	0.3607
✓	✓	✓	✓			0.6303	0.6253	0.6330	0.2283	0.5571	0.5393	0.5667	0.2603
✓	✓	✓	✓	✓		0.6382	0.6467	0.6337	0.1457	0.5688	0.5622	0.5723	0.1781
✓	✓	✓	✓	✓	✓	0.6751	0.6733	0.6760	0.1233	0.5986	0.5840	0.6065	0.1553

Table 2. Component-wise ablation study of the proposed method. E, P, I, C, D, and M denote the basic user embedding module, profile factor learning, imbalance-aware residual preservation, counterfactual user augmentation, preference decorrelation, and hierarchical MoE router, respectively.
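
To read the ablation at a glance, the following short script computes how much each added component changes the 0-turn Collapse score, plus the overall relative reduction from the first row to the last. The numbers are copied from Table 2; the row labels and helper names (stepwise_changes, relative_reduction) are introduced here only for illustration.

```python
# 0-turn Collapse values from Table 2, in the order components are added.
collapse_0turn = [
    ("E", 0.6027),
    ("E+P", 0.3333),
    ("E+P+I", 0.3288),
    ("E+P+I+C", 0.2283),
    ("E+P+I+C+D", 0.1457),
    ("E+P+I+C+D+M", 0.1233),
]


def stepwise_changes(rows):
    """Change in the metric when each new component is added."""
    return [
        (name, round(value - prev_value, 4))
        for (_, prev_value), (name, value) in zip(rows, rows[1:])
    ]


def relative_reduction(rows):
    """Fractional drop from the first configuration to the last."""
    first, last = rows[0][1], rows[-1][1]
    return (first - last) / first


for name, delta in stepwise_changes(collapse_0turn):
    print(f"{name:<12} {delta:+.4f}")
print(f"relative reduction: {relative_reduction(collapse_0turn):.1%}")
```

On these numbers, the largest single drops in Collapse come from adding P, C, and D, while I and M change it only slightly.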

Qualitative Examples

Figure 3. Qualitative examples. <SKS> denotes the target personalized identity or concept. Green boxes indicate correct predictions, while red marks indicate incorrect predictions.

BibTeX

@misc{lyu2026group,
  title={Group Preference Collapse in Personalized Multimodal Large Language Models},
  author={Lyu, Fan and Zhang, Wenqi and van de Weijer, Joost},
  year={2026}
}