The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent

Zhu, Zhanxing; Wu, Jingfeng; Yu, Bing; Wu, Lei; Ma, Jinwen

arXiv:1803.00195v3 (stat)

[Submitted on 1 Mar 2018 (v1), revised 7 Mar 2018 (this version, v3), latest version 10 Jun 2019 (v5)]

Abstract: Understanding the generalization of deep learning has attracted considerable attention recently, and learning algorithms such as stochastic gradient descent (SGD) play an important role in generalization performance. Along this line, we study the anisotropic noise introduced by SGD and investigate its importance for generalization in deep neural networks. Through a thorough empirical analysis, we show that the anisotropic diffusion of SGD tends to follow the curvature information of the loss landscape, and is thus beneficial for escaping from sharp and poor minima effectively, towards more stable and flat minima. We verify this understanding by comparing the anisotropic diffusion with full gradient descent plus isotropic diffusion (i.e., Langevin dynamics) and with other types of position-dependent noise.
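The claim that SGD noise "follows the curvature" can be illustrated on a toy problem. The sketch below is not taken from the paper. It assumes a two-dimensional least-squares loss, and the helper names (`sgd_noise_covariance`, `alignment_ratio`) are hypothetical. It compares Tr(H Σ) for the per-sample gradient covariance Σ against an isotropic covariance with the same trace, which is one possible way to quantify alignment with the Hessian H and not necessarily the metric the authors use.

```python
import numpy as np


def sgd_noise_covariance(X, y, w):
    """Covariance of per-sample gradients of 0.5 * (x @ w - y)**2 at w."""
    residuals = X @ w - y                       # shape (N,)
    per_sample_grads = X * residuals[:, None]   # shape (N, d)
    full_grad = per_sample_grads.mean(axis=0)   # full-batch gradient
    centered = per_sample_grads - full_grad
    return centered.T @ centered / X.shape[0]


def alignment_ratio(hessian, cov):
    """Tr(H @ cov) divided by the same quantity for isotropic noise
    whose covariance has the same trace as cov."""
    d = hessian.shape[0]
    iso_cov = np.trace(cov) / d * np.eye(d)
    return np.trace(hessian @ cov) / np.trace(hessian @ iso_cov)


# Assumed toy data: one feature direction is far more curved than the other.
rng = np.random.default_rng(0)
n_samples = 2000
X = rng.normal(size=(n_samples, 2)) * np.array([3.0, 0.3])
w_true = np.array([1.0, -2.0])
y = X @ w_true + rng.normal(scale=0.5, size=n_samples)

hessian = X.T @ X / n_samples                   # Hessian of the mean loss
cov = sgd_noise_covariance(X, y, w_true)        # evaluated near the minimum
ratio = alignment_ratio(hessian, cov)
print(f"Tr(H @ Sigma_sgd) / Tr(H @ Sigma_iso) = {ratio:.2f}")
```

A ratio above 1 means the gradient noise puts more of its variance along high-curvature directions than isotropic (Langevin-style) noise of equal total magnitude would. In this linear model that follows from the noise covariance near the minimum being roughly proportional to the Hessian; whether and how strongly it holds for deep networks is what the paper investigates empirically.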

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1803.00195 [stat.ML]
	(or arXiv:1803.00195v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1803.00195

Submission history

From: Zhanxing Zhu
[v1] Thu, 1 Mar 2018 03:46:36 UTC (828 KB)
[v2] Fri, 2 Mar 2018 10:19:15 UTC (827 KB)
[v3] Wed, 7 Mar 2018 08:16:16 UTC (827 KB)
[v4] Mon, 21 May 2018 15:01:58 UTC (873 KB)
[v5] Mon, 10 Jun 2019 05:19:32 UTC (8,783 KB)
