Baran Peters

Core Team


AI Evaluation Frameworks

Bachelor Student

About

Baran Peters is a software engineer and researcher working at the intersection of language models, democratic systems, and real-world AI governance. Originally from Berlin, he has worked across fast-moving startup environments, fintech product teams, and independent AI consulting, developing a strong instinct for turning technical ideas into practical systems. His background spans early hands-on software building, studies at CODE in Berlin, and time at Stanford, giving him both an entrepreneurial and a research-oriented perspective.

At ASL, Baran focuses on how large language models shape political understanding in high-stakes public contexts. His work examines whether these systems communicate political information in ways that are accurate, balanced, and fair across parties, languages, and national settings. He is especially interested in building evaluation frameworks that move beyond raw model performance to address deeper questions of neutrality, reasoning quality, and public trust. His profile combines strong technical execution with a clear interest in how AI systems affect democratic decision-making.

Research Areas

01 Political AI evaluation
02 LLM fairness benchmarking
03 Democratic AI systems

Project

PluralBench: Benchmarking Political Fairness in LLM Voting Advisors

PluralBench investigates how large language models communicate political information in Voting Advice Applications, digital tools that help citizens understand party positions before elections. As these systems increasingly adopt conversational LLM interfaces, they begin to shape how users interpret policy issues, compare political actors, and form judgments in democratic settings. Baran’s project builds a benchmark to systematically evaluate whether these models represent political information fairly across parties, countries, languages, and retrieval setups. Rather than only asking whether a model gives the correct answer, the project examines whether its explanations are grounded in evidence, whether they remain neutral in tone, and whether they apply consistent reasoning standards across the political spectrum. The result is an evaluation framework designed for realistic deployment conditions, where trustworthiness depends not only on prediction quality but also on how political information is framed.

Scientifically, the project pushes beyond conventional NLP evaluation by focusing on the epistemic quality of model outputs. PluralBench introduces a dual-fairness framework that captures both predictive competence and representational fairness, allowing researchers to study whether models perform equally well across parties while also assessing whether generated language remains objective and balanced. This creates a structured foundation for research on political bias in LLMs under real-world use conditions. By comparing different retrieval regimes and multilingual settings, the benchmark also opens the door to more detailed investigations into where bias emerges and how it propagates through the full system stack. In that sense, the project contributes not just a dataset or scorecard, but an infrastructure for a broader scientific agenda around interpretability, fairness, and AI-mediated civic communication.

The commercial relevance is especially strong because political AI systems are moving into a more regulated environment. Under frameworks such as the EU AI Act, applications that influence political decision-making are likely to face much stricter scrutiny, yet there is currently no widely accepted standard for auditing fairness in these systems. PluralBench addresses exactly that gap. Its model-agnostic testing pipeline could support civic technology platforms, LLM providers, and organizations deploying AI in regulated public-interest domains that need defensible, research-backed evaluation methods. In practice, this creates a clear path toward an external benchmarking and assurance layer for politically sensitive AI products, helping organizations demonstrate reliability, transparency, and responsible deployment in settings where public trust is essential.
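To make the dual-fairness idea concrete, the Python sketch below shows one way a metric pair of this kind could be computed. It is an illustrative assumption rather than PluralBench’s actual implementation: the EvalRecord fields, the LOADED_TERMS word list, and the neutrality_score heuristic are hypothetical stand-ins, and a real benchmark would use calibrated classifiers or human ratings instead of a word list.

    from collections import defaultdict
    from dataclasses import dataclass

    # Hypothetical record type: one model answer about one party's stance
    # on one policy statement. Field names are illustrative only.
    @dataclass
    class EvalRecord:
        party: str
        predicted_stance: str   # e.g. "agree" / "disagree" / "neutral"
        gold_stance: str        # stance taken from the party's own manifesto
        explanation: str        # model-generated justification text

    # Toy stand-in for a neutrality scorer; a real benchmark would replace
    # this word list with a calibrated classifier or human annotation.
    LOADED_TERMS = {"radical", "extremist", "sensible", "dangerous", "reasonable"}

    def neutrality_score(text: str) -> float:
        """Fraction of tokens that are NOT emotionally loaded (1.0 = fully neutral)."""
        tokens = text.lower().split()
        if not tokens:
            return 1.0
        loaded = sum(t.strip(".,") in LOADED_TERMS for t in tokens)
        return 1.0 - loaded / len(tokens)

    def dual_fairness_report(records: list[EvalRecord]) -> dict:
        """Per-party accuracy (predictive side) and mean explanation
        neutrality (representational side), plus worst-case gaps."""
        acc, neut = defaultdict(list), defaultdict(list)
        for r in records:
            acc[r.party].append(r.predicted_stance == r.gold_stance)
            neut[r.party].append(neutrality_score(r.explanation))
        per_party = {
            p: {"accuracy": sum(acc[p]) / len(acc[p]),
                "neutrality": sum(neut[p]) / len(neut[p])}
            for p in acc
        }
        accs = [v["accuracy"] for v in per_party.values()]
        neuts = [v["neutrality"] for v in per_party.values()]
        return {
            "per_party": per_party,
            "accuracy_gap": max(accs) - min(accs),      # predictive fairness gap
            "neutrality_gap": max(neuts) - min(neuts),  # representational gap
        }

    # Tiny usage example with invented data:
    report = dual_fairness_report([
        EvalRecord("Party A", "agree", "agree", "Matches the manifesto position."),
        EvalRecord("Party B", "agree", "disagree", "A radical proposal."),
    ])
    # report["accuracy_gap"] == 1.0; report["neutrality_gap"] is about 0.33

Reporting the worst-case gap across parties, rather than an average, mirrors the framing above: a model can look strong overall while systematically underperforming on, or editorializing about, one part of the political spectrum.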


Interested in collaborating?

We are always looking for talented students, researchers, and industry partners.

Get in Touch