
Researchers challenge the necessity of normalization layers in neural networks with a simple yet effective technique called Dynamic Tanh, which matches or exceeds the performance of normalized Transformer models.
Normalization layers have been a staple in modern neural networks, often considered essential for maintaining stable training and improving performance. However, a new paper by Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, and Zhuang Liu challenges this conventional wisdom. They introduce Dynamic Tanh (DyT), an element-wise operation that can replace normalization layers in Transformers while achieving the same or better results.
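The key innovation is DyT, defined as \( \text{DyT}(x) = \tanh(\alpha x) \), where \( \alpha \) is a learnable scalar that sets how quickly the input saturates. This simple function acts as a drop-in replacement for normalization layers in Transformers. As in those layers, the paper keeps a learnable per-channel scale \( \gamma \) and shift \( \beta \), so the full layer computes \( \gamma \tanh(\alpha x) + \beta \). The authors observed that layer normalization often produces tanh-like, S-shaped input-output mappings, which inspired the use of DyT.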
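To make the contrast concrete, here is a minimal pure-Python sketch of LayerNorm next to DyT for a single token's feature vector. It assumes the full form \( \gamma \tanh(\alpha x) + \beta \), with one scalar \( \alpha \) and per-channel \( \gamma \), \( \beta \). The function `layer_norm`, the class `DyT`, and the initial value `alpha_init=0.5` are illustrative assumptions, not the authors' code. A real model would implement this in a deep learning framework with trainable parameters.

```python
import math
from typing import List


def layer_norm(x: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[float]:
    """LayerNorm over one token's features: requires the mean and variance of x."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]


class DyT:
    """Sketch of Dynamic Tanh: gamma * tanh(alpha * x) + beta, element-wise.

    alpha is a single scalar; gamma and beta are per-channel, mirroring
    LayerNorm's affine parameters. Here they are plain Python values; in a
    trained model all three would be learnable.
    """

    def __init__(self, num_features: int, alpha_init: float = 0.5) -> None:
        self.alpha = alpha_init             # assumed initial value for the scalar
        self.gamma = [1.0] * num_features   # per-channel scale, starts at identity
        self.beta = [0.0] * num_features    # per-channel shift, starts at zero

    def __call__(self, x: List[float]) -> List[float]:
        # No mean/variance reduction: each element is transformed independently,
        # and large-magnitude inputs are squashed toward +/-1 before the affine step.
        return [g * math.tanh(self.alpha * v) + b
                for v, g, b in zip(x, self.gamma, self.beta)]


if __name__ == "__main__":
    token = [4.0, -1.0, 0.5, 10.0]
    print("LayerNorm:", layer_norm(token, [1.0] * 4, [0.0] * 4))
    print("DyT:      ", DyT(num_features=4)(token))
```

The design difference is the point: LayerNorm has to compute statistics across the whole feature vector before it can emit any output, while DyT only needs the element itself plus a few parameters. The outlier `10.0` is squashed to just under 1 by the tanh, which is the S-shaped behavior the authors observed in trained LayerNorm layers.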
This work has several implications for practitioners and researchers:

The authors validated DyT across a diverse set of tasks:
The paper provides several benchmarks to support their claims:
The introduction of Dynamic Tanh (DyT) offers a compelling alternative to traditional normalization layers in Transformers. By simplifying the model architecture and maintaining or improving performance, this technique has the potential to influence future research and practical applications in deep learning.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
17 March 2025