Large Language Models (LLMs) demonstrate persuasive capabilities that rival those of humans. While these capabilities can be used for social good, they also present risks of misuse. Moreover, LLMs' susceptibility to persuasion raises concerns about alignment with ethical principles. To study these dynamics, we introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasion through multi-agent interactions. Here, Persuader agents engage in multi-turn conversations with Persuadee agents, allowing us to measure LLMs' persuasive effectiveness and their susceptibility to persuasion. We conduct comprehensive evaluations across diverse LLMs, ensuring each model is assessed against others in both subjective and misinformation contexts. We validate the efficacy of our framework through human evaluations and show alignment with prior work. PMIYC offers a scalable alternative to human annotation for studying persuasion in LLMs. Through PMIYC, we find that Llama-3.3-70B and GPT-4o exhibit similar persuasive effectiveness, outperforming Claude 3 Haiku by 30%. However, GPT-4o demonstrates over 50% greater resistance to persuasion toward misinformation than Llama-3.3-70B. These findings provide empirical insights into the persuasive dynamics of LLMs and contribute to the development of safer AI systems.
PMIYC Framework Overview. A t-turn interaction between a PERSUADER and a PERSUADEE in PMIYC. The PERSUADER can address the PERSUADEE's concerns, while the PERSUADEE reports its agreement scores.
Large Language Models (LLMs) have become increasingly persuasive, demonstrating human-level abilities in influencing opinions and shaping discourse. While these capabilities have been leveraged for social good—such as promoting public health and prosocial behaviors—they also pose significant risks, including the spread of misinformation and undue manipulation. Additionally, LLMs themselves can be persuaded, making them vulnerable to harmful influences that may compromise alignment and ethical safeguards.
Despite the growing importance of persuasion in AI, existing evaluation methods rely heavily on human assessments, which are costly, time-intensive, and difficult to scale. To address this gap, we introduce Persuade Me If You Can (PMIYC), an automated framework designed to evaluate both the persuasive effectiveness of LLMs (how well they persuade) and their susceptibility to persuasion (how easily they change their stance) in dynamic conversational settings. Our key contributions are: (1) PMIYC, an automated multi-agent framework for measuring both persuasive effectiveness and susceptibility to persuasion; (2) comprehensive evaluations of diverse LLMs against one another in subjective and misinformation contexts, in both single-turn and multi-turn settings; and (3) human evaluations that validate the framework and show alignment with prior work.
By providing a scalable and automated evaluation framework, PMIYC enables a deeper understanding of persuasion in LLMs, helping to identify vulnerabilities and promote the development of safer, more transparent, and ethically responsible AI systems.
PMIYC is a framework for evaluating both the persuasive effectiveness of large language models (LLMs) and their susceptibility to persuasion. It simulates structured conversations between two AI agents: a Persuader, whose goal is to convince the other, and a Persuadee, who starts with an initial stance on a claim and may adjust their position throughout the exchange. The Persuadee's stance is measured using an agreement score, which ranges from 1 (Completely Oppose) to 5 (Completely Support). To assess persuasion dynamics, we track the normalized change in agreement, ensuring fair comparisons across different starting positions.
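To make the agreement-score bookkeeping concrete, below is a minimal Python sketch of one plausible way to compute the normalized change: the direction-dependent denominator is our assumption about how "fair comparisons across different starting positions" could be achieved, and the paper's exact formula may differ.

```python
def normalized_change(initial: int, final: int) -> float:
    """Normalized change in agreement on the 1-5 scale (sketch).

    Assumption: NC = observed change / maximum possible change in that
    direction, so a Persuadee moving from 2 to 5 scores 1.0, and one
    moving from 4 to 5 also scores 1.0.
    """
    if final > initial:                       # moved toward Completely Support (5)
        return (final - initial) / (5 - initial)
    if final < initial:                       # moved toward Completely Oppose (1)
        return (final - initial) / (initial - 1)
    return 0.0                                # stance unchanged
```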
Through PMIYC, we evaluate both how effectively LLMs persuade and how readily they are persuaded, across single-turn and multi-turn conversations and across subjective and misinformation claims.
Average normalized change (NC) in Persuadee's agreement for different Persuader-Persuadee pairs in subjective single-turn conversations.
Our first set of experiments focuses on single-turn persuasion, where the Persuader has only one opportunity to present an argument and influence the Persuadee's stance. We evaluate model interactions on 961 subjective claims spanning political, ethical, and social topics, ensuring a diverse range of opinions. The heatmap above presents the average normalized change (NC) in agreement scores across different Persuader-Persuadee pairings. Examining each column reveals which Persuaders exert the greatest influence over a given Persuadee, while each row highlights which models are more susceptible to persuasion. We find that larger models tend to be more effective persuaders, likely due to their superior reasoning capabilities. Notably, GPT-4o-mini achieves persuasion levels comparable to much larger models, while Claude 2 demonstrates unexpectedly weak persuasiveness. Llama-3.3-70B-Instruct emerges as the most susceptible, showing significant shifts in agreement, whereas GPT-4o appears to be more resistant to persuasion.
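As an illustration of how such a heatmap could be assembled, here is a hedged Python sketch of a single-turn evaluation loop. `persuader_argument` and `persuadee_agreement` are hypothetical stand-ins for the underlying LLM calls (not the paper's API), and `normalized_change` is the sketch given earlier.

```python
from itertools import product
from statistics import mean

def persuader_argument(model: str, claim: str) -> str:
    """Hypothetical stand-in: ask `model` for one persuasive argument for `claim`."""
    raise NotImplementedError

def persuadee_agreement(model: str, claim: str, argument: str | None = None) -> int:
    """Hypothetical stand-in: ask `model` for a 1-5 agreement score,
    optionally after it reads a Persuader argument."""
    raise NotImplementedError

def single_turn_nc_matrix(persuaders, persuadees, claims):
    """Average normalized change for every Persuader-Persuadee pair."""
    matrix = {}
    for p_er, p_ee in product(persuaders, persuadees):
        changes = []
        for claim in claims:
            initial = persuadee_agreement(p_ee, claim)           # stance before any argument
            argument = persuader_argument(p_er, claim)           # the single persuasive turn
            final = persuadee_agreement(p_ee, claim, argument)   # stance after reading it
            changes.append(normalized_change(initial, final))
        matrix[(p_er, p_ee)] = mean(changes)                     # one heatmap cell
    return matrix
```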
(a) Average effectiveness of the Persuader across single-turn vs. multi-turn and subjective vs. misinformation interactions. (b) Average susceptibility of the Persuadee under the same conditions.
Average NC in Persuadee's agreement for different Persuader-Persuadee pairs in subjective multi-turn conversations.
Average NC in Persuadee's agreement for different Persuader-Persuadee pairs in misinformation multi-turn conversations.
We extend our experiments to multi-turn conversations, where the Persuader has more opportunities to present arguments. As expected, we observe that persuasive effectiveness increases in multi-turn settings while maintaining the same relative ranking among Persuaders. Additionally, models generally show a greater susceptibility to persuasion when engaged in longer interactions.
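For concreteness, the single-turn loop above could be extended to multi-turn conversations roughly as follows. The helper names and the fixed turn budget are assumptions rather than the paper's exact protocol; `persuadee_agreement` and `normalized_change` are the stand-ins sketched earlier.

```python
def persuader_followup(model: str, claim: str, history: list[str]) -> str:
    """Hypothetical stand-in: a persuasive message that addresses the
    Persuadee's latest concerns recorded in `history`."""
    raise NotImplementedError

def persuadee_turn(model: str, claim: str, history: list[str]) -> tuple[str, int]:
    """Hypothetical stand-in: the Persuadee replies (e.g., raising concerns)
    and reports its current 1-5 agreement score."""
    raise NotImplementedError

def multi_turn_nc(persuader: str, persuadee: str, claim: str, turns: int = 4) -> float:
    """Run one multi-turn conversation and return the normalized change in agreement."""
    history: list[str] = []
    initial = persuadee_agreement(persuadee, claim)      # stance before the conversation
    score = initial
    for _ in range(turns):
        argument = persuader_followup(persuader, claim, history)
        reply, score = persuadee_turn(persuadee, claim, history + [argument])
        history += [argument, reply]
    return normalized_change(initial, score)             # change after the last turn
```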
We also examine persuasion across different domains, comparing subjective claims with misinformation contexts. GPT models demonstrate strong resistance to persuasion when misinformation is involved, with GPT-4o showing over 50% greater resistance than the next most resilient Persuadee. Interestingly, while susceptibility varies significantly between domains, the effectiveness of Persuaders remains relatively stable, indicating that persuasion strategies generalize well across different contexts.
Average Persuadee agreement at each Persuadee turn of the conversation for a given Persuader. Solid lines indicate performance with subjective claims, while dashed lines represent performance with misinformation claims. After the Persuadee's fourth turn (orange line), it is prompted to make a final decision (green line).
In multi-turn conversations, agreement tends to increase as the exchange progresses, with Persuaders exerting the most influence during the first two turns before their effectiveness diminishes. However, in misinformation settings, a different pattern emerges—Persuadee agreement often declines in the final decision step. When prompted to finalize their stance and reminded of the misinformation claim, Persuadees frequently reassess their position, leading to a drop in agreement. This suggests that critical reflection at the end of a conversation can help mitigate susceptibility to misinformation.
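One hypothetical way to implement that final reflection step is a short prompt that restates the claim and asks for a final score; the wording below is our assumption, not the paper's actual prompt.

```python
# Hypothetical final-decision prompt; the paper's exact wording may differ.
FINAL_DECISION_PROMPT = (
    "The conversation has ended. The original claim was:\n"
    '"{claim}"\n\n'
    "Considering the discussion above, report your final agreement with this claim on a "
    "scale from 1 (Completely Oppose) to 5 (Completely Support). Reply with a single number."
)

def persuadee_final_decision(model: str, claim: str, history: list[str]) -> int:
    """Hypothetical stand-in: ask the Persuadee for its final 1-5 score
    after restating the claim (the reflection step discussed above)."""
    raise NotImplementedError
```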
Average NC in Persuadee's agreement based on Persuader's agreement with the claim in multi-turn subjective (blue), and misinformation (red) settings. Persuaders are classified as Opposing (agreement score: 1–2), Neutral (3), or Supporting (4–5).
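A minimal sketch of that grouping, using the thresholds stated in the caption above (the function name is ours):

```python
def persuader_stance(agreement: int) -> str:
    """Classify a Persuader by its own 1-5 agreement with the claim."""
    if agreement <= 2:
        return "Opposing"      # agreement score 1-2
    if agreement == 3:
        return "Neutral"
    return "Supporting"        # agreement score 4-5
```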
@misc{bozdag2025persuadecanframeworkevaluating,
  title={Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models},
  author={Nimet Beyza Bozdag and Shuhaib Mehri and Gokhan Tur and Dilek Hakkani-Tür},
  year={2025},
  eprint={2503.01829},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.01829},
}