
I am a member of technical staff at the U.S. Center for AI Standards and Innovation (CAISI), where I lead the Agent Security team, supported by a TechCongress AI Safety fellowship. I’m based in Princeton, NJ.
Last spring, I completed my PhD at Harvard as a member of the Machine Learning Foundations and Theory of Computation groups. I was advised by Sham Kakade and Leslie Valiant (and was also mentored by Boaz Barak), and received an NSF Graduate Research Fellowship. In the summer of 2021, I interned with the ML group at Microsoft Research NYC, where I worked with Cyril Zhang and Surbhi Goel. Before that, I completed my undergraduate degree in math at Princeton.
My PhD research focused on the scientific study of deep learning, motivated by the following claims:
- If we understand AI systems better, we will have a better shot at making them safer, foreseeing future technological developments, and designing well-informed policies.
- It is crucial that we build understanding of cutting-edge methods, and that our insights generalize across changes in algorithms and scale.
- The shortest path to scientific understanding involves a blend of both theory and empirics, on both clean toy models and real messy systems.
I’ve also worked on game theory and computational complexity theory.
benedelman100@gmail.com | Google Scholar | LinkedIn
Research
Transcendence: Generative Models Can Outperform The Experts That Train Them 
Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, BE, Milind Tambe, Sham M. Kakade, and Eran Malach
NeurIPS 2024 | Blog post
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains 
BE, Ezra Edelman, Surbhi Goel, Eran Malach, and Nikolaos Tsilivis 
NeurIPS 2024 | Blog post
Foundational Challenges in Assuring Alignment and Safety of Large Language Models 
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, BE, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, and 27 others 
TMLR, 2024 | Webpage
Distinguishing the Knowable from the Unknowable with Language Models 
Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, and BE 
ICML 2024 | Blog post
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models 
Hanlin Zhang, BE, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak 
ICML 2024, and Secure & Trustworthy LLMs Workshop @ ICLR 2024 | Blog post
Feature Emergence via Margin Maximization: Case Studies in Algebraic Tasks 
Depen Morwani, BE, Costin-Andrei Oncescu, Rosie Zhao, and Sham Kakade 
ICLR 2024 (spotlight) | Blog post
Pareto Frontiers in Deep Feature Learning: Data, Compute, Width, and Luck 
BE, Surbhi Goel, Sham Kakade, Eran Malach, and Cyril Zhang 
NeurIPS 2023 (spotlight)
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit 
Boaz Barak, BE, Surbhi Goel, Sham Kakade, Eran Malach, and Cyril Zhang 
NeurIPS 2022
Inductive Biases and Variable Creation in Self-Attention Mechanisms 
BE, Surbhi Goel, Sham Kakade, and Cyril Zhang 
ICML 2022
The Multiplayer Colonel Blotto Game 
Enric Boix-Adserà, BE, and Siddhartha Jayanti 
Games and Economic Behavior (full version), EC 2020 (extended abstract)
Causal Strategic Linear Regression 
Yonadav Shavit, BE, and Brian Axelrod 
ICML 2020
SGD on Neural Networks Learns Functions of Increasing Complexity 
Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, BE, Fred Zhang, and Boaz Barak 
NeurIPS 2019 (spotlight)
Matrix Rigidity and the Croot-Lev-Pach Lemma 
BE and Zeev Dvir 
Theory of Computing, 2019
Theses
Combinatorial Tasks as Model Systems of Deep Learning 
PhD Thesis
A Proof of Strassen’s Degree Bound for Homogeneous Arithmetic Circuits 
Undergraduate Senior Thesis
Teaching
Spring 2021 Teaching fellow for CS 229br: Biology and Complexity 
 Received Certificate of Distinction in Teaching from Harvard University
Spring 2020 Teaching fellow for CS 228: Computational Learning Theory 
Gave three lectures on “Mysteries of Generalization in Deep Learning”
Tutorials
How to Achieve Both Transparency and Accuracy in Predictive Decision Making: An Introduction to Strategic Prediction 
with Chara Podimata and Yonadav Shavit 
FAccT 2021
Recent talks
January & March 2024 Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models 
NYC Crypto Day, Boston Crypto Day
February 2023 Studies in feature learning through the lens of sparse Boolean functions 
Seminar in Mathematics, Physics and Machine Learning, University of Lisbon
November 2022 Hidden progress in deep learning 
Statistical Learning Theory and Applications, MIT course
September 2022 Sparse feature emergence in deep learning 
Alg-ML seminar, Princeton University
May 2022 Towards demystifying the inductive bias of attention mechanisms 
Collaboration on the Theoretical Foundations of Deep Learning
February 2022 Towards demystifying transformers & attention 
New Technologies in Mathematics Seminar, Harvard Center of Mathematical Sciences and Applications