Group Preference Collapse in Personalized Multimodal Large Language Models

Computer Vision Center, Universitat Autònoma de Barcelona

Abstract

Personalized multimodal large language models (MLLMs) aim to generate user-specific responses, but existing methods mainly rely on profile-level information and overlook diverse user preferences. We identify group preference collapse, where multi-user personalized MLLMs become insensitive to individual preferences and drift toward dominant population-level choices due to suppressed preference signals and unreliable preference use during generation. We propose PrefMoE, a preference-centric framework that separates stable profile information from preference-related representations. PrefMoE decomposes preferences into shared prototypes and personalized residuals, preserves individualized residuals with imbalance-aware learning, counterfactual pseudo-user augmentation, and residual decorrelation, and routes profile and preference factors through separate LoRA adaptation paths. Experiments across multiple MLLM backbones show that PrefMoE improves preference-sensitive personalization while substantially reducing preference collapse.

[Figure: Illustration of group preference collapse in personalized MLLMs]

Overview of PrefMoE

Overview of PrefMoE. PrefMoE separates profile and preference representations, models preferences with shared prototypes and personalized residuals, and regularizes the residuals with contrastive learning and decorrelation. A hierarchical MoE router then activates profile- and preference-aware LoRA experts for query-dependent personalized reasoning.
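The pipeline described above can be illustrated with a small NumPy sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the soft prototype assignment, the off-diagonal covariance penalty used as a stand-in for "decorrelation", the two-level softmax gate, and all names (`decompose_preference`, `decorrelation_penalty`, `hierarchical_gates`, `routed_lora_delta`) and shapes are hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decompose_preference(pref, prototypes):
    """Split preference embeddings (n_users, d) into a shared part built from
    prototypes (k, d) and a personalized residual. Assumed form, not from the paper."""
    weights = softmax(pref @ prototypes.T, axis=-1)  # soft assignment, (n_users, k)
    shared = weights @ prototypes                    # shared-prototype part, (n_users, d)
    return shared, pref - shared, weights

def decorrelation_penalty(residual):
    """Sum of squared off-diagonal covariance entries of the residuals
    (one common decorrelation regularizer, assumed here)."""
    centered = residual - residual.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(residual.shape[0] - 1, 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2))

def hierarchical_gates(query, group_w, expert_w):
    """Two-level gate: first over the two groups (profile, preference),
    then over the experts inside each group. Returns weights of shape (2, n_experts)."""
    group = softmax(group_w @ query)              # (2,)
    within = softmax(expert_w @ query, axis=-1)   # (2, n_experts)
    return group[:, None] * within

def routed_lora_delta(h, gates, lora_a, lora_b):
    """Gate-weighted sum of low-rank updates B_e @ (A_e @ h) over all experts."""
    low = lora_a @ h                               # (2, n_experts, rank)
    up = np.einsum("gndr,gnr->gnd", lora_b, low)   # (2, n_experts, d)
    return np.einsum("gn,gnd->d", gates, up)       # (d,)

rng = np.random.default_rng(0)
d, k, n_users, n_experts, rank = 16, 4, 8, 3, 2

pref = rng.normal(size=(n_users, d))
prototypes = rng.normal(size=(k, d))
shared, residual, weights = decompose_preference(pref, prototypes)
penalty = decorrelation_penalty(residual)

query, h = rng.normal(size=d), rng.normal(size=d)
group_w = rng.normal(size=(2, d))
expert_w = rng.normal(size=(2, n_experts, d))
lora_a = rng.normal(size=(2, n_experts, rank, d)) * 0.01
lora_b = np.zeros((2, n_experts, d, rank))  # zero-initialized B, as in standard LoRA
gates = hierarchical_gates(query, group_w, expert_w)
delta = routed_lora_delta(h, gates, lora_a, lora_b)
```

Because the `B` matrices start at zero, the routed update is exactly zero before training, so the adapted model initially matches the frozen backbone. The gate weights sum to one across all experts in both groups, which keeps the profile and preference paths on a common scale.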

MMPB-clean Dataset Statistics

[Figure: MMPB-clean dataset statistics]

Main Results

| Method | Type | 0-turn Overall↑ | 0-turn Preference↑ | 0-turn Profile↑ | 0-turn Collapse↓ | 10-turn Overall↑ | 10-turn Preference↑ | 10-turn Profile↑ | 10-turn Collapse↓ |
|---|---|---|---|---|---|---|---|---|---|
| Non-tuned Models | | | | | | | | | |
| LLaVA-1.5-7B | NT | 0.3564 | 0.3387 | 0.3716 | 0.6228 | 0.3167 | 0.3067 | 0.3337 | 0.6164 |
| LLaVA-1.5-13B | NT | 0.4023 | 0.3653 | 0.4222 | 0.6027 | 0.3716 | 0.3333 | 0.3928 | 0.6119 |
| LLaVA-OV-72B | NT | 0.5100 | 0.4800 | 0.5262 | 0.5297 | 0.4620 | 0.4333 | 0.4774 | 0.5525 |
| DeepSeek-VL2-Tiny | NT | 0.3683 | 0.3613 | 0.3720 | 0.7169 | 0.3497 | 0.3440 | 0.3527 | 0.6895 |
| DeepSeek-VL2-Small | NT | 0.3866 | 0.3853 | 0.3871 | 0.6895 | 0.3642 | 0.3667 | 0.3669 | 0.6712 |
| DeepSeek-VL2 | NT | 0.4620 | 0.4533 | 0.4667 | 0.5890 | 0.4103 | 0.4067 | 0.4122 | 0.5982 |
| Qwen2.5-VL-7B | NT | 0.4276 | 0.3587 | 0.5653 | 0.6484 | 0.3885 | 0.3433 | 0.4315 | 0.6530 |
| Qwen2.5-VL-32B | NT | 0.4914 | 0.4333 | 0.5226 | 0.5708 | 0.4410 | 0.4027 | 0.4616 | 0.5982 |
| Qwen2.5-VL-72B | NT | 0.5301 | 0.4600 | 0.5677 | 0.5023 | 0.5016 | 0.4453 | 0.5319 | 0.5525 |
| Fine-tuned and PEFT Baselines | | | | | | | | | |
| LLaVA-1.5-7B | FFT | 0.5072 | 0.4413 | 0.5427 | 0.3425 | 0.5026 | 0.4413 | 0.5355 | 0.3470 |
| LLaVA-1.5-13B | FFT | 0.5301 | 0.4600 | 0.5677 | 0.3105 | 0.5100 | 0.4400 | 0.5477 | 0.3196 |
| LLaVA-OV-72B | FFT | 0.6200 | 0.5947 | 0.6337 | 0.3151 | 0.5613 | 0.5320 | 0.5771 | 0.3379 |
| DeepSeek-VL2-Tiny | FFT | 0.4711 | 0.4213 | 0.4975 | 0.5845 | 0.4681 | 0.4187 | 0.4946 | 0.5936 |
| DeepSeek-VL2-Small | FFT | 0.4862 | 0.3840 | 0.5412 | 0.6530 | 0.4834 | 0.3747 | 0.5419 | 0.6530 |
| DeepSeek-VL2 | FFT | 0.5814 | 0.5093 | 0.6201 | 0.4521 | 0.5748 | 0.4987 | 0.6158 | 0.4566 |
| Qwen2.5-VL-7B | FFT | 0.4452 | 0.3013 | 0.5226 | 0.6530 | 0.4807 | 0.3707 | 0.5398 | 0.7854 |
| Qwen2.5-VL-32B | FFT | 0.5718 | 0.5480 | 0.5849 | 0.4795 | 0.4914 | 0.4333 | 0.5226 | 0.5114 |
| Qwen2.5-VL-72B | FFT | 0.6014 | 0.5947 | 0.6057 | 0.4018 | 0.5217 | 0.5053 | 0.5305 | 0.3927 |
| Yo'LLaVA | PEFT | 0.5040 | 0.4880 | 0.5125 | 0.2075 | 0.4840 | 0.4680 | 0.4925 | 0.2146 |
| LLaVA-NeXT-34B | PEFT | 0.5599 | 0.6200 | 0.5276 | 0.2466 | 0.5299 | 0.5900 | 0.4976 | 0.2054 |
| LOVA3 | PEFT | 0.5329 | 0.5680 | 0.5140 | 0.4292 | 0.4979 | 0.5330 | 0.4790 | 0.4247 |
| TG-LLaVA | PEFT | 0.5413 | 0.5800 | 0.5204 | 0.2329 | 0.5013 | 0.5400 | 0.4804 | 0.1506 |
| PrefMoE | | | | | | | | | |
| PrefMoE (LLaVA-1.5-7B) | PEFT | 0.6751 | 0.6733 | 0.6760 | 0.1233 | 0.5986 | 0.5840 | 0.6065 | 0.1553 |
| PrefMoE (LLaVA-1.5-13B) | PEFT | 0.7012 | 0.6800 | 0.7125 | 0.1416 | 0.6601 | 0.6267 | 0.6781 | 0.1370 |
| PrefMoE (LLaVA-OV-72B) | PEFT | 0.7893 | 0.7613 | 0.8043 | 0.1142 | 0.6914 | 0.6627 | 0.7068 | 0.1187 |
| PrefMoE (DeepSeek-VL2-Tiny) | PEFT | 0.6601 | 0.6467 | 0.6674 | 0.2511 | 0.6033 | 0.5933 | 0.6086 | 0.2740 |
| PrefMoE (DeepSeek-VL2-Small) | PEFT | 0.7305 | 0.5813 | 0.8108 | 0.2740 | 0.6382 | 0.5200 | 0.7018 | 0.2694 |
| PrefMoE (DeepSeek-VL2) | PEFT | 0.7991 | 0.7520 | 0.8244 | 0.1324 | 0.7012 | 0.6800 | 0.7125 | 0.1826 |
| PrefMoE (Qwen2.5-VL-7B) | PEFT | 0.7613 | 0.6880 | 0.8007 | 0.1781 | 0.6503 | 0.6253 | 0.6638 | 0.1826 |
| PrefMoE (Qwen2.5-VL-32B) | PEFT | 0.7902 | 0.7640 | 0.8043 | 0.1416 | 0.6900 | 0.6653 | 0.7032 | 0.2100 |
| PrefMoE (Qwen2.5-VL-72B) | PEFT | 0.8112 | 0.7893 | 0.8237 | 0.1096 | 0.7301 | 0.5813 | 0.8100 | 0.1279 |

Table 1. Main comparison with state-of-the-art methods under the 0-turn and 10-turn settings. ↑: higher is better; ↓: lower is better.
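To make the per-backbone effect in Table 1 easier to read, the short script below pairs each FFT baseline with the PrefMoE row for the same backbone and computes the 0-turn change in Preference and Collapse. The numbers are copied from the table; the dictionary layout and the helper name `compare` are illustrative choices, not part of the original work.

```python
# 0-turn (Preference, Collapse) values copied from Table 1.
fft = {
    "LLaVA-1.5-7B": (0.4413, 0.3425),
    "LLaVA-1.5-13B": (0.4600, 0.3105),
    "LLaVA-OV-72B": (0.5947, 0.3151),
    "DeepSeek-VL2-Tiny": (0.4213, 0.5845),
    "DeepSeek-VL2-Small": (0.3840, 0.6530),
    "DeepSeek-VL2": (0.5093, 0.4521),
    "Qwen2.5-VL-7B": (0.3013, 0.6530),
    "Qwen2.5-VL-32B": (0.5480, 0.4795),
    "Qwen2.5-VL-72B": (0.5947, 0.4018),
}
prefmoe = {
    "LLaVA-1.5-7B": (0.6733, 0.1233),
    "LLaVA-1.5-13B": (0.6800, 0.1416),
    "LLaVA-OV-72B": (0.7613, 0.1142),
    "DeepSeek-VL2-Tiny": (0.6467, 0.2511),
    "DeepSeek-VL2-Small": (0.5813, 0.2740),
    "DeepSeek-VL2": (0.7520, 0.1324),
    "Qwen2.5-VL-7B": (0.6880, 0.1781),
    "Qwen2.5-VL-32B": (0.7640, 0.1416),
    "Qwen2.5-VL-72B": (0.7893, 0.1096),
}

def compare(baseline, ours):
    """Return {backbone: (preference gain, collapse reduction)} for matching rows."""
    out = {}
    for name, (base_pref, base_collapse) in baseline.items():
        our_pref, our_collapse = ours[name]
        out[name] = (round(our_pref - base_pref, 4),
                     round(base_collapse - our_collapse, 4))
    return out

deltas = compare(fft, prefmoe)
for name, (gain, drop) in deltas.items():
    print(f"{name:20s} preference +{gain:.4f}   collapse -{drop:.4f}")
```

With these values, every backbone shows a higher Preference score and a lower Collapse score for PrefMoE than for its FFT counterpart in the 0-turn setting.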

Ablation Study

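| E | P | I | C | D | M | 0-turn Overall↑ | 0-turn Preference↑ | 0-turn Profile↑ | 0-turn Collapse↓ | 10-turn Overall↑ | 10-turn Preference↑ | 10-turn Profile↑ | 10-turn Collapse↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | 0.5562 | 0.4120 | 0.6337 | 0.6027 | 0.4928 | 0.3584 | 0.5651 | 0.6347 |
| | | | | | | 0.6000 | 0.5700 | 0.6159 | 0.3333 | 0.5266 | 0.4902 | 0.5461 | 0.3653 |
| | | | | | | 0.6279 | 0.6133 | 0.6358 | 0.3288 | 0.5506 | 0.5275 | 0.5631 | 0.3607 |
| | | | | | | 0.6303 | 0.6253 | 0.6330 | 0.2283 | 0.5571 | 0.5393 | 0.5667 | 0.2603 |
| | | | | | | 0.6382 | 0.6467 | 0.6337 | 0.1457 | 0.5688 | 0.5622 | 0.5723 | 0.1781 |
| | | | | | | 0.6751 | 0.6733 | 0.6760 | 0.1233 | 0.5986 | 0.5840 | 0.6065 | 0.1553 |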

Table 2. Component-wise ablation study of the proposed method. E, P, I, C, D, and M denote the basic user embedding module, profile factor learning, imbalance-aware residual preservation, counterfactual user augmentation, preference decorrelation, and hierarchical MoE router, respectively.
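As a reading aid for Table 2, the snippet below computes the row-to-row change in the 0-turn Preference and Collapse scores. Rows are indexed only by their order in the table; no assignment of components to rows is assumed, and the variable names are illustrative.

```python
# 0-turn (Preference, Collapse) values copied from Table 2, in table order.
rows = [
    (0.4120, 0.6027),
    (0.5700, 0.3333),
    (0.6133, 0.3288),
    (0.6253, 0.2283),
    (0.6467, 0.1457),
    (0.6733, 0.1233),
]

# Change from each row to the next: (preference gain, collapse reduction).
steps = [
    (round(nxt[0] - cur[0], 4), round(cur[1] - nxt[1], 4))
    for cur, nxt in zip(rows, rows[1:])
]
total_collapse_drop = round(rows[0][1] - rows[-1][1], 4)

for i, (gain, drop) in enumerate(steps, start=1):
    print(f"row {i} -> row {i + 1}: preference +{gain:.4f}, collapse -{drop:.4f}")
print(f"first row -> last row: collapse -{total_collapse_drop:.4f}")
```

Each successive row improves both metrics in the 0-turn setting, and the largest single reduction in Collapse occurs between the first and second rows.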

Qualitative Examples

Figure 3. Qualitative examples. <SKS> denotes the target personalized identity or concept. Green boxes indicate correct predictions, while red marks indicate incorrect predictions.

BibTeX

@misc{lyu2026group,
  title={Group Preference Collapse in Personalized Multimodal Large Language Models},
  author={Lyu, Fan and Zhang, Wenqi and van de Weijer, Joost},
  year={2026}
}