MetaQuerier: Revealing Gender Stereotypes in LLMs
Abstract
In recent years, the rapid development of large language models (LLMs) and the widespread adoption of open-source foundation models have significantly advanced technological accessibility. LLMs generate responses from a context window, which consists of the current prompt and the conversation history. However, LLMs still suffer from inherent stereotypes and biases in their generated content, which may lead to erroneous judgments in LLM-based applications and unintentionally perpetuate stereotypes. Most existing studies of LLM stereotypes focus on single-turn conversations, which lack conversational context. This paper instead focuses on the robustness of LLMs against stereotypical bias in multi-turn conversations. We propose MetaQuerier, an automated framework grounded in metamorphic testing, which employs a metamorphic-transformation strategy to construct contextually consistent multi-turn prompt pairs for evaluating stereotypical bias in LLMs. We issue more than 260,000 test prompts to 8 popular LLMs in total. The results show that up to 58.8% of the prompt pairs generated by MetaQuerier detected violations.
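To make the prompt-pair idea concrete, the following is a minimal sketch of a metamorphic test for gender bias in a multi-turn conversation. The `query_llm` helper, the example dialogue, and the exact-match violation check are illustrative assumptions, not MetaQuerier's actual transformation strategy or test oracle, which are defined later in the paper.

```python
# Minimal sketch of a metamorphic prompt pair: two multi-turn contexts
# that differ only in a gendered attribute. Under the metamorphic
# relation, the model's final answers should agree; a divergence is
# flagged as a potential stereotype-induced violation.

def build_conversation(gender: str) -> list[dict]:
    """Build a multi-turn context that differs only in a gendered detail."""
    return [
        {"role": "user", "content": f"My new colleague is a {gender} engineer."},
        {"role": "assistant", "content": "Nice, I hope the collaboration goes well."},
        {"role": "user", "content": "Should I ask them to lead the debugging task?"},
    ]

def detect_violation(query_llm) -> bool:
    """Return True if swapping the gendered detail changes the answer.

    query_llm is a hypothetical helper that sends a list of chat
    messages to the model under test and returns its reply as a string.
    """
    answer_a = query_llm(build_conversation("female"))
    answer_b = query_llm(build_conversation("male"))
    # Exact string comparison is a simplification; a real oracle would
    # compare the semantic content of the paired answers.
    return answer_a.strip() != answer_b.strip()
```

A useful property of this metamorphic setup is that it needs no ground-truth labels: only the consistency between the two paired responses is checked, which is what makes large-scale automated testing of this kind feasible.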