An Investigation of Prompt Variations for Zero-shot LLM-based Rankers
This paper provides a systematic understanding of how the specific components and wordings used in prompts affect the effectiveness of rankers based on zero-shot Large Language Models (LLMs). Several zero-shot LLM-based ranking methods exist; they mainly differ in (1) the ranking algorithm they implement, (2) the backbone LLM they use, and (3) the components and wordings of their prompts. Before this work, it was unclear whether performance differences between methods stem from the underlying ranking algorithm or from spurious factors such as a better choice of words in the prompts. The large-scale experimentation and analysis in this paper show that ranking algorithms do contribute to differences between zero-shot LLM ranking methods, as do the LLM backbones; more importantly, however, the choice of prompt components and wordings also affects ranking effectiveness. In fact, at times these latter elements have more impact on the ranker's effectiveness than the ranking algorithm itself, and differences among ranking methods become more blurred once prompt variations are taken into account.
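To make concrete what is meant by prompt components and wordings, the sketch below shows a hypothetical pointwise zero-shot ranking prompt assembled from interchangeable pieces (role instruction, evidence ordering, output format). All component names, wordings, and the helper function are illustrative assumptions for exposition, not the actual prompts or code used in the paper.

```python
# Illustrative sketch (not the paper's prompts): a pointwise zero-shot ranking
# prompt built from interchangeable components, so the same ranking algorithm
# can be evaluated under different prompt variations.

from itertools import product

# Hypothetical prompt components; each dimension is one "component" whose
# wording can be varied independently of the ranking algorithm.
ROLE_WORDINGS = [
    "You are a search engine that judges relevance.",
    "",  # no role instruction at all
]
EVIDENCE_ORDERINGS = ["query_first", "passage_first"]
OUTPUT_FORMATS = [
    "Answer with 'Yes' or 'No'.",
    "Rate the relevance from 0 to 3.",
]


def build_pointwise_prompt(query: str, passage: str,
                           role: str, ordering: str, output: str) -> str:
    """Assemble one prompt variant for a pointwise relevance judgement."""
    query_block = f"Query: {query}"
    passage_block = f"Passage: {passage}"
    evidence = (f"{query_block}\n{passage_block}"
                if ordering == "query_first"
                else f"{passage_block}\n{query_block}")
    task = "Does the passage answer the query?"
    return "\n".join(part for part in (role, evidence, task, output) if part)


if __name__ == "__main__":
    # Enumerate all prompt variants for one query-passage pair; in the setting
    # studied by the paper, each variant would be scored by the backbone LLM
    # and the resulting rankings compared across variants.
    for role, ordering, output in product(ROLE_WORDINGS,
                                          EVIDENCE_ORDERINGS,
                                          OUTPUT_FORMATS):
        prompt = build_pointwise_prompt(
            "how do solar panels work",
            "Solar panels convert sunlight into electricity ...",
            role, ordering, output)
        print("---\n" + prompt)
```

In this toy setup, eight prompt variants arise from varying just three components while the ranking algorithm (pointwise scoring) and the backbone LLM stay fixed, which mirrors the kind of controlled prompt variation the paper analyses at scale.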