Learning to Improve Out-of-Distribution Generalization via Self-adaptive Language Masking

Aug 29, 2024·
Junhong Liu
· 1 min read

Abstract

LLMs can overfit to lexical biases when fine-tuned for downstream tasks, which degrades performance, especially on out-of-distribution (OOD) data. To address this issue, the paper proposes a self-adaptive language masking (AdaLMask) paradigm applied when fine-tuning the pre-trained LLM.

Introduction

Normally, LLMs are trained under the closed-world assumption that the test data follow the same distribution as the training data, known as in-distribution (ID) data. But in the real world (an open-world scenario), the test data often exhibit a distribution shift from the training data, i.e., out-of-distribution (OOD) data.

  • The AdaLMask paradigm can improve robustness to lexical biases and OOD generalization on downstream tasks.
  • A representation-invariant (RInv) fine-tuning objective ensures that the words AdaLMask masks are semantically lossless.
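To make the idea concrete, here is a minimal sketch of adaptive masking: tokens with a high estimated lexical-bias score (e.g., strongly correlated with one label) are replaced by a mask symbol before fine-tuning. The scoring dictionary, threshold, and `adaptive_mask` helper are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of the AdaLMask idea: mask tokens the model
# might exploit as lexical shortcuts, so fine-tuning depends on them less.
MASK = "[MASK]"

def adaptive_mask(tokens, bias_scores, threshold=0.5):
    """Replace tokens whose assumed lexical-bias score exceeds
    the threshold with a mask symbol."""
    return [MASK if bias_scores.get(t, 0.0) > threshold else t
            for t in tokens]

tokens = ["the", "movie", "was", "absolutely", "terrible"]
# Toy scores: a high value means the token is strongly tied to one label.
bias = {"absolutely": 0.9, "terrible": 0.8}
print(adaptive_mask(tokens, bias))
# ['the', 'movie', 'was', '[MASK]', '[MASK]']
```

The RInv objective would then constrain the masked sentence's representation to stay close to the original sentence's representation, so that masking does not destroy semantics.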

Background