I work on the theoretical foundations of machine learning, particularly on deep neural networks and their optimization.
More precisely, during my PhD I studied (1) where optimization algorithms converge on general landscapes, (2) how hyperparameters affect that, and (3) whether we can link this to changes in the performance of the resulting models.
I'm also becoming interested in the certification of neural networks and in several other topics in machine learning theory.
If you're interested, you can watch these talks of mine:
Please hit me up if you want to chat or work together! Bonus points if you're in Boston! I have a number of projects on my list and I'm always open to new ideas and collaborations!
We worked out in what sense mini-batch SGD also trains at the Edge of Stability: the largest eigenvalue of the mini-batch Hessians stabilizes, on average, at 2/(learning rate), pushing the full-batch sharpness down. This explains SGD's implicit regularization toward flatter minima and provides a theoretical basis for key empirical phenomena observed in small-batch training.
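To make the threshold concrete, here is a minimal numerical sketch of my own (not the paper's code; the model, data, and hyperparameters are placeholders): it estimates the top eigenvalue of a mini-batch Hessian by power iteration on Hessian-vector products, the quantity that, averaged over mini-batches along an SGD run, stabilizes around 2/(learning rate).

import torch

torch.manual_seed(0)
# Placeholder model and mini-batch, for illustration only.
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(model.parameters())
x, y = torch.randn(64, 10), torch.randn(64, 1)
lr = 0.01

def top_hessian_eigenvalue(loss, params, iters=30):
    # Power iteration with Hessian-vector products (Pearlmutter's trick).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat)
    v = v / v.norm()
    eig = 0.0
    for _ in range(iters):
        hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = float(v @ hv)  # Rayleigh quotient along the current direction
        v = hv / (hv.norm() + 1e-12)
    return eig

loss = torch.nn.functional.mse_loss(model(x), y)
print(top_hessian_eigenvalue(loss, params), "vs threshold 2/lr =", 2 / lr)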
We found that neural networks often implicitly identify the relevant input variables in their first layer. We proved that this happens when training with SGD (faster for smaller batch sizes or larger step sizes), but not when training with vanilla GD.
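As a toy illustration (my own sketch, not the experiments from the paper; the architecture, data, and hyperparameters are placeholders): train a small network with mini-batch SGD on data where only a few input coordinates matter, then inspect the column norms of the first-layer weight matrix. The coordinates the network treats as relevant are the ones whose incoming weights keep a large norm.

import torch

torch.manual_seed(0)
d, n = 20, 512
X = torch.randn(n, d)
y = torch.tanh(X[:, :3].sum(dim=1, keepdim=True))  # only coordinates 0-2 are relevant

model = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(2000):  # small batches, where the effect is strongest
    idx = torch.randint(0, n, (16,))
    loss = torch.nn.functional.mse_loss(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

relevance = model[0].weight.norm(dim=0)  # per-input-coordinate norm of first-layer weights
print(relevance)  # the first three entries should dominate after training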
We characterized the trajectories taken by the most commonly used training algorithm and explained why certain phenomena are so frequently observed empirically.
We study the capacity of neural networks to approximate high-dimensional functions without suffering from the curse of dimensionality. We prove that they can, on any compact domain, for a broad, newly identified class of functions.
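To spell out what "without the curse of dimensionality" means here (this is the standard contrast, not the statement of our theorems): for classical smoothness classes, the best error achievable by a network with n parameters degrades with the input dimension d, while for dimension-free classes such as Barron's it does not:

\inf_{\theta} \| f - \mathrm{NN}_{\theta} \|_{L^2} \lesssim n^{-s/d} \quad \text{(smoothness-}s\text{ classes: the rate collapses as } d \text{ grows)},
\qquad
\inf_{\theta} \| f - \mathrm{NN}_{\theta} \|_{L^2} \lesssim C_f\, n^{-1/2} \quad \text{(Barron-type classes: the rate is independent of } d\text{)}.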
We develop new machinery to study the capacity of neural networks to approximate high-dimensional functions without suffering from the curse of dimensionality. We prove that this is the case, for example, for a certain family of PDEs.
Besides that...
At some point during my PhD I became interested in the ethical aspects of my job. I helped organize multiple events to raise the community's awareness of the responsibilities that modelers and statisticians bear for the high-stakes decisions and policies based on their work.
See the CEST-UCL Seminar series on responsible modelling and check out our conference.
I served on the committee of the Princeton AI Club (PAIC), where we hosted many exciting talks by speakers including Yoshua Bengio, Max Welling, Chelsea Finn, and Tamara Broderick.
I have spoken about the advent of AI and its impact on society on Italian public radio, including on Zapping on Rai Radio 1.