Evaluating Large Language Models for Cyberbullying Behavior
Penn Today
March 26, 2025
Penn researchers developed “evaluator agents” — specialized AI systems that test large language models for potential cyberbullying behavior by generating nuanced, demographically diverse prompts. The study uncovered reasoning blind spots in some leading language models that could lead to harmful cyberbullying-adjacent outputs, with implications for responsible AI deployment. Featured: Shreya Havaldar, Eric Wong, Lyle Ungar.