I work on the theoretical foundations of machine learning, particularly on deep neural networks and their optimization.
More precisely, during my PhD I studied (1) where optimization algorithms converge on general landscapes, (2) how hyperparameters affect that, and (3) whether we can link this to changes in the performance of the resulting models.
I'm also becoming interested in the certification of neural networks and in several other topics in machine learning theory.
If you're interested, you can watch these talks of mine:
Please hit me up if you want to chat or work together! Bonus points if you're in Boston! I have a number of projects on my list and I'm always open to new ideas and collaborations!
We worked out in what sense mini-batch SGD also trains at the Edge of Stability: the largest eigenvalue of the mini-batch Hessians stabilizes, on average, at 2/(learning rate), pushing the full-batch sharpness down. This explains SGD's implicit regularization toward flatter minima and provides a theoretical basis for key empirical phenomena observed in small-batch training.
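To make the threshold concrete, here is a minimal numerical sketch of my own (not the paper's code; the model, data, and hyperparameters are placeholders): it estimates the top eigenvalue of a mini-batch Hessian by power iteration on Hessian-vector products, the quantity that, averaged over mini-batches along an SGD run, stabilizes around 2/(learning rate).

import torch

torch.manual_seed(0)
# Placeholder model and mini-batch, for illustration only.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(model.parameters())
x, y = torch.randn(64, 10), torch.randn(64, 1)
lr = 0.01

def top_hessian_eigenvalue(loss, params, iters=30):
    # Power iteration with Hessian-vector products (Pearlmutter's trick).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v = v / v.norm()
    eig = 0.0
    for _ in range(iters):
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = float(v @ hv)  # Rayleigh quotient along the current direction
        v = hv / (hv.norm() + 1e-12)
    return eig

loss = torch.nn.functional.mse_loss(model(x), y)
print(top_hessian_eigenvalue(loss, params), "vs threshold 2/lr =", 2 / lr)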
We found that neural networks often implicitly identify the relevant input variables in their first layer. We proved that this happens when training with SGD (faster for smaller batch sizes or larger step sizes), but not when training with vanilla GD.
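As a toy illustration (my own sketch, not the experiments from the paper; the architecture, data, and hyperparameters are placeholders): train a small network with mini-batch SGD on data where only a few input coordinates matter, then inspect the column norms of the first-layer weight matrix. The coordinates the network treats as relevant are the ones whose incoming weights keep a large norm.

import torch

torch.manual_seed(0)
d, n = 20, 512
X = torch.randn(n, d)
y = torch.tanh(X[:, :3].sum(dim=1, keepdim=True))  # only coordinates 0-2 are relevant

model = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(2000):  # small batches, where the effect is strongest
    idx = torch.randint(0, n, (16,))
    loss = torch.nn.functional.mse_loss(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

relevance = model[0].weight.norm(dim=0)  # per-input-coordinate norm of first-layer weights
print(relevance)  # the first three entries should dominate after training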
We characterized the trajectories taken by the most commonly used training algorithm and explained why certain phenomena are so frequently observed empirically.
We study the capacity of neural networks to approximate high-dimensional functions without suffering from the curse of dimensionality. We prove that they can, on any compact domain, for a broad, newly identified class of functions.
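To spell out what "without the curse of dimensionality" means here (this is the standard contrast, not the statement of our theorems): for classical smoothness classes, the best error achievable by a network with n parameters degrades with the input dimension d, while for dimension-free classes such as Barron's it does not:

\inf_{\theta} \| f - \mathrm{NN}_{\theta} \|_{L^2} \lesssim n^{-s/d} \quad \text{(smoothness-}s\text{ classes: the rate collapses as } d \text{ grows)},
\qquad
\inf_{\theta} \| f - \mathrm{NN}_{\theta} \|_{L^2} \lesssim C_f\, n^{-1/2} \quad \text{(Barron-type classes: the rate is independent of } d\text{)}.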
We develop new machinery to study the capacity of neural networks to approximate high-dimensional functions without suffering from the curse of dimensionality. We prove that this is the case, for example, for a certain family of PDEs.
Besides that...
At some point during my PhD I became interested in the ethical aspects of my job. I helped organize multiple events to raise the community's awareness of the responsibilities that modelers and statisticians bear for the high-stakes decisions and policies based on their work.
See the CEST-UCL Seminar series on responsible modelling and check out our conference.
I served on the committee of the Princeton AI Club (PAIC), where we hosted many exciting talks by speakers including Yoshua Bengio, Max Welling, Chelsea Finn, and Tamara Broderick.
I have spoken about the advent of AI and its impact on society on Italian public radio, including on Zapping on Rai Radio 1.