Evaluating Large Language Models for Cyberbullying Behavior
Penn Today
March 26, 2025
Penn researchers developed “evaluator agents” — specialized AI systems that test large language models for potential cyberbullying behavior by generating nuanced, demographically diverse prompts. The study uncovered reasoning blind spots in some leading language models that could lead to harmful cyberbullying-adjacent outputs, with implications for responsible AI deployment. Featured: Shreya Havaldar, Eric Wong, Lyle Ungar.