Research
TLDR: Agentic systems, fundamental limits of AI systems, and theorems about training neural networks.
Recently, I have been exploring the fundamental limits of agentic systems and the mathematical theory one can develop around them.
For instance, with the project PoggioAI I want to understand to what extent agentic systems can automate high-quality research, and to identify, approach, and possibly attain that upper bound on automation.
In parallel, I continue to work on the theoretical foundations of machine learning, particularly on deep neural networks and their optimization, as I did during my PhD.
In this direction, I'm interested in (1) where optimization algorithms converge on general landscapes, (2) how hyperparameters affect where they converge, and (3) whether we can link this to changes in the performance of the resulting models.
Selected talks
- Edge of Stochastic Stability, One World ML 2025 - talk · paper
- Implicit Regularization, MIBR Summer School (MIT) 2024 - talk
Articles (Agents)
This work is collected in a more accessible and continuously updated form at
PoggioAI.github.io.
Articles (Training Dynamics)
Same Error, Different Function: The Optimizer as an Implicit Prior in Financial Time Series.
Federico Vittorio Cortesi*,
Giuseppe Iannone*,
Giulia Crippa,
Prof. Tomaso Poggio,
Pierfrancesco Beneventano.
We study an underspecified regime for neural networks in financial time series, where models with the same test error can still learn qualitatively different functions.
The main point is that the optimizer acts as a meaningful implicit prior, shaping the learned response profiles and the decision-level behavior of the resulting models.
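A minimal sketch of how one can probe this kind of underspecification, on synthetic data and with stand-in hyperparameters rather than the paper's financial setup: train the same architecture with two optimizers to comparable test error, then compare the learned functions directly on probe points.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hedged toy probe (synthetic data, not the paper's financial setting): fit the same
# architecture with two optimizers, check that test errors roughly match, then compare
# the two learned functions directly on a probe set.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = np.tanh(X[:, 0]) + 0.1 * rng.normal(size=2000)
Xtr, Xte, ytr, yte = X[:1500], X[1500:], y[:1500], y[1500:]

models = {}
for solver in ("sgd", "adam"):
    m = MLPRegressor(hidden_layer_sizes=(64, 64), solver=solver,
                     max_iter=2000, random_state=0).fit(Xtr, ytr)
    models[solver] = m
    print(solver, "test MSE:", np.mean((m.predict(Xte) - yte) ** 2))

# Models with similar test error can still disagree as functions:
probe = rng.normal(size=(5000, 10))
gap = np.mean((models["sgd"].predict(probe) - models["adam"].predict(probe)) ** 2)
print("mean squared disagreement on probe points:", gap)
```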
Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD.
Arseniy Andreyev* and
Pierfrancesco Beneventano*.
Link to our repository.
We provide a framework for describing what the Edge of Stability is for a general optimizer (here, the Edge of Stochastic Stability, EoSS), and we show that mini-batch SGD trains at the Edge of Stochastic Stability: the batch sharpness, i.e. the expected directional mini-batch curvature, equilibrates at 2/(learning rate).
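In symbols (my notation, paraphrasing the abstract rather than quoting the paper):

\[
\text{Batch Sharpness}(\theta) \;=\; \mathbb{E}_{B}\!\left[\frac{\nabla L_B(\theta)^{\top}\, \nabla^2 L_B(\theta)\, \nabla L_B(\theta)}{\lVert \nabla L_B(\theta)\rVert^{2}}\right] \;\approx\; \frac{2}{\eta},
\]

where \(L_B\) is the loss on mini-batch \(B\) and \(\eta\) is the learning rate; this plays the role that the largest Hessian eigenvalue \(\lambda_{\max}\approx 2/\eta\) plays for full-batch GD at the Edge of Stability.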
Does SGD Seek Flatness or Sharpness? An Exactly Solvable Model.
Yizhou Xu,
Pierfrancesco Beneventano,
Isaac Chuang,
Liu Ziyin.
Does SGD really “seek flat minima”? We show that SGD has no intrinsic preference for flatness, even for stable linear networks—going against ~10 years of folklore.
Flatness emerges iff label noise is isotropic; anisotropic noise drives SGD to arbitrarily sharp solutions.
This reveals a new flattening-sharpening mechanism in late training, unrelated to standard progressive sharpening or Edge-of-Stability effects.
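To fix terminology (my notation, a schematic rather than the paper's exact statement): writing the targets as \(y = f^*(x) + \varepsilon\), the dichotomy is

\[
\operatorname{Cov}(\varepsilon) = \sigma^2 I \ \ \text{(isotropic: SGD drifts toward flat minima)}
\qquad\text{vs.}\qquad
\operatorname{Cov}(\varepsilon) = \Sigma \neq \sigma^2 I \ \ \text{(anisotropic: SGD can drift toward arbitrarily sharp ones).}
\]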
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks.
Pierfrancesco Beneventano,
Blake Woodworth.
We prove that GD converges linearly for a family of shallow linear networks even with large step sizes, and we characterize where it converges for any initialization; with larger step sizes, GD lands at flatter minima than gradient flow.
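As a toy illustration of the phenomenon (my own sketch, not the paper's model or proof): for the scalar two-layer network f(x) = u·v·x fit to a single target, every global minimum satisfies uv = y and has sharpness u² + v², so one can compare where GD with a large step size lands against a tiny step size used as a proxy for gradient flow.

```python
import numpy as np

# Toy probe (not the paper's setting): scalar two-layer linear "network" u*v
# fit to target y = 1 with loss L(u, v) = 0.5 * (u*v - 1)**2.
# At any global minimum (u*v = 1) the Hessian eigenvalues are 0 and u**2 + v**2,
# so "sharpness" at convergence is u**2 + v**2 (minimized by balanced |u| = |v|).

def run_gd(lr, steps=50_000, u0=3.0, v0=0.2):
    u, v = u0, v0
    for _ in range(steps):
        r = u * v - 1.0                       # residual
        u, v = u - lr * r * v, v - lr * r * u  # simultaneous GD update
    return u, v

for lr in (1e-4, 0.3):  # tiny step size ~ gradient flow; large step size = plain GD
    u, v = run_gd(lr)
    print(f"lr={lr:g}  u*v={u*v:.4f}  sharpness u^2+v^2 = {u*u + v*v:.4f}")

# Gradient flow preserves u^2 - v^2, so the small-lr run stays unbalanced (sharper);
# the large-lr run typically ends closer to |u| = |v| = 1, i.e. a flatter minimum.
```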
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD.
Pierfrancesco Beneventano*,
Andrea Pinto*, and
Prof. Tomaso Poggio.
Neural networks often implicitly identify, in the first layer, which input variables are relevant.
We prove that this happens when training with SGD (faster for smaller batches or larger step sizes), but not when training with vanilla GD.
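A hedged toy probe of this effect (my construction with made-up sizes, not the paper's experiment): the target below depends only on the first k input coordinates, and one can compare how concentrated the first-layer weights end up on those coordinates after mini-batch SGD versus full-batch GD.

```python
import torch

# Hedged toy probe: the target uses only the first k of d input coordinates;
# compare per-coordinate first-layer weight norms after SGD vs. full-batch GD.
torch.manual_seed(0)
d, k, n = 30, 3, 512
X = torch.randn(n, d)
y = torch.sin(X[:, :k].sum(dim=1, keepdim=True))  # support = first k coordinates

def train(batch_size, lr=0.05, epochs=500):
    model = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(n)
        for i in range(0, n, batch_size):
            idx = perm[i:i + batch_size]
            loss = torch.mean((model(X[idx]) - y[idx]) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()
    col_norms = model[0].weight.detach().norm(dim=0)  # one norm per input coordinate
    return col_norms[:k].mean().item(), col_norms[k:].mean().item()

for bs, name in [(16, "mini-batch SGD"), (n, "full-batch GD")]:
    on_support, off_support = train(bs)
    print(f"{name:>15}: mean |W_col| on support = {on_support:.3f}, "
          f"off support = {off_support:.3f}")
```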
On the Trajectories of SGD Without Replacement.
Pierfrancesco Beneventano.
We characterize the trajectories taken by SGD without replacement, the variant most commonly used in practice, and explain why several frequently observed empirical phenomena arise.
Deep neural network approximation theory for high-dimensional functions.
Pierfrancesco Beneventano,
Prof. Patrick Cheridito,
Robin Graeber,
Prof. Arnulf Jentzen,
and Benno Kuckuck.
We study the capacity of neural networks to approximate high-dimensional functions without suffering from the curse of dimensionality.
We prove that they can, on any compact domain, for a broad and newly identified class of functions.
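Schematically, and not in the paper's exact formulation, a "no curse of dimensionality" statement has the form: for every dimension d and accuracy ε there is a network \(\Phi_{d,\varepsilon}\) with

\[
\sup_{x \in [0,1]^d} \bigl| f(x) - \Phi_{d,\varepsilon}(x) \bigr| \le \varepsilon
\qquad\text{and}\qquad
\mathrm{size}(\Phi_{d,\varepsilon}) \le C\, d^{p}\, \varepsilon^{-q},
\]

with constants C, p, q independent of d and ε, i.e. the required network size grows only polynomially, not exponentially, in the dimension.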
High-dimensional approximation spaces of artificial neural networks and applications to partial differential equations.
Pierfrancesco Beneventano,
Prof. Patrick Cheridito,
Prof. Arnulf Jentzen,
and
Philippe von Wurstemberger.
We develop new machinery to study the capacity of neural networks to approximate high-dimensional functions without suffering from the curse of dimensionality.
We prove that this is the case, for example, for the solutions of a certain family of PDEs.
Besides that...
At MIT, I am also helping organize the Poggio Lab seminar series AI: Foundations -- for Academia (and Startups).
At some point during my PhD I became interested in the ethical aspects of my job. I helped organize several events to raise the community's awareness of the responsibilities that modelers and statisticians bear for the high-stakes decisions and policies based on their work.
See
CEST-UCL Seminar series on responsible modelling and check out our conference.
I served on the committee of the Princeton AI Club (PAIC), where we hosted talks by Yoshua Bengio, Max Welling, Chelsea Finn, Tamara Broderick, and others.
I have spoken about the advent of AI and its impact on society on Italian public radio, for example on Zapping on Rai Radio 1.