
Researchers unveil a two-stage Transformer model that enhances emotional depth in piano performances by separately handling valence through lead sheet composition and arousal via performance attributes like tempo and articulation.
In a recent paper, researchers have introduced a novel two-stage Transformer-based model designed to generate emotion-driven piano performances. This approach addresses the limitations of previous end-to-end models that struggled with accurate emotion modeling. The first stage focuses on valence modeling using lead sheet composition (melody + chord), while the second stage tackles arousal modeling by incorporating performance attributes like articulation, tempo, and velocity.
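The model is divided into two stages:
Valence Modeling (Stage 1): generates a lead sheet (melody and chords) that reflects the target valence.
Arousal Modeling (Stage 2): turns that lead sheet into a full piano performance, using performance attributes such as articulation, tempo, and velocity to convey the target arousal.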
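The data flow between the two stages can be sketched in a few lines of Python. This is only an illustrative toy, not the researchers' implementation: both stages are Transformers in the paper, whereas the rule-based functions below are stand-ins. Every name (`LeadSheet`, `Performance`, `stage1_valence`, `stage2_arousal`, `generate`), the major/minor mapping, and the numeric values are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LeadSheet:
    key: str            # e.g. "C major"
    chords: List[str]   # Roman numerals, one per bar
    melody: List[int]   # scale degrees 1..7, one per bar (toy simplification)


@dataclass
class Performance:
    lead_sheet: LeadSheet
    tempo_bpm: int
    velocity: int       # MIDI velocity, 1..127
    articulation: str   # "legato" or "staccato"


def stage1_valence(valence: str) -> LeadSheet:
    """Hypothetical stand-in for Stage 1: valence decides the lead sheet."""
    if valence == "positive":
        return LeadSheet("C major", ["I", "V", "vi", "IV"], [1, 5, 3, 4])
    if valence == "negative":
        return LeadSheet("A minor", ["i", "iv", "VI", "V"], [1, 4, 3, 5])
    raise ValueError(f"unknown valence: {valence!r}")


def stage2_arousal(lead_sheet: LeadSheet, arousal: str) -> Performance:
    """Hypothetical stand-in for Stage 2: arousal decides how it is played."""
    if arousal == "high":
        return Performance(lead_sheet, tempo_bpm=140, velocity=100, articulation="staccato")
    if arousal == "low":
        return Performance(lead_sheet, tempo_bpm=70, velocity=50, articulation="legato")
    raise ValueError(f"unknown arousal: {arousal!r}")


def generate(valence: str, arousal: str) -> Performance:
    # Stage 2 only ever sees the lead sheet, never the valence label itself.
    return stage2_arousal(stage1_valence(valence), arousal)


calm = generate("positive", "low")
excited = generate("positive", "high")
print(calm.lead_sheet == excited.lead_sheet)   # same composition
print(calm.tempo_bpm, excited.tempo_bpm)       # different delivery
```

The point of the split is visible in the last lines: changing only the arousal input leaves the composition untouched and alters just the performance attributes.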
The researchers also propose a functional representation of symbolic music as an alternative to the popular REMI (REvamped MIDI-derived events) representation. It encodes both melody and chords relative to the musical key, using Roman numerals, which helps capture the interactions among notes, chords, and tonalities.
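To see why a key-relative encoding helps, consider a minimal Python sketch that converts chord names to Roman numerals. It is a simplification written for this article, not the paper's tokenizer: it handles only major and minor triads in major keys, and the names `to_roman` and `encode_progression` are hypothetical.

```python
# Pitch classes for note names (sharps and flats share a number).
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}

# Semitone distance from the tonic -> scale-degree numeral (major keys only).
MAJOR_DEGREES = {0: "I", 2: "II", 4: "III", 5: "IV", 7: "V", 9: "VI", 11: "VII"}


def to_roman(chord: str, key_tonic: str) -> str:
    """Encode a triad such as 'Am' or 'G' relative to a major key.

    Assumption for this sketch: a trailing 'm' marks a minor triad, and
    roots outside the key's scale are out of scope (they raise KeyError).
    """
    is_minor = chord.endswith("m")
    root = chord[:-1] if is_minor else chord
    offset = (NOTE_TO_PC[root] - NOTE_TO_PC[key_tonic]) % 12
    numeral = MAJOR_DEGREES[offset]
    return numeral.lower() if is_minor else numeral


def encode_progression(chords, key_tonic):
    return [to_roman(chord, key_tonic) for chord in chords]


in_c = encode_progression(["C", "G", "Am", "F"], "C")
in_g = encode_progression(["G", "D", "Em", "C"], "G")
print(in_c)  # ['I', 'V', 'vi', 'IV']
print(in_g)  # ['I', 'V', 'vi', 'IV']
```

Two progressions that share no chord names except C map to the same token sequence once the key is factored out, so a model trained on such tokens sees the harmonic function rather than the absolute pitches.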
The researchers conducted experiments to evaluate the effectiveness of their framework and functional representation. The results, reported as Mean Opinion Scores (MOS) and confusion matrices, demonstrated significant improvements in emotion modeling.
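For readers unfamiliar with the two metrics, the short Python sketch below shows how each is computed. The numbers are made up purely for illustration and are not results from the paper; the 1-to-5 rating scale, the "low"/"high" labels, and the function names are likewise assumptions, as is the reading of the confusion matrix as intended emotion versus perceived emotion.

```python
from collections import Counter


def mean_opinion_score(ratings):
    """MOS is the arithmetic mean of the listeners' ratings."""
    return sum(ratings) / len(ratings)


def confusion_matrix(intended, perceived, labels):
    """Rows: intended label. Columns: perceived label. Cells: counts."""
    counts = Counter(zip(intended, perceived))
    return [[counts[(i, p)] for p in labels] for i in labels]


# Toy data, NOT from the paper: five ratings on an assumed 1-to-5 scale.
toy_ratings = [4, 5, 3, 4, 4]

# Toy data, NOT from the paper: intended vs. perceived arousal for five clips.
labels = ["low", "high"]
intended = ["low", "low", "high", "high", "high"]
perceived = ["low", "high", "high", "high", "low"]

mos = mean_opinion_score(toy_ratings)
matrix = confusion_matrix(intended, perceived, labels)
print(mos)     # 4.0
print(matrix)  # [[1, 1], [1, 2]]
```

A model that conveys emotion well concentrates its counts on the diagonal of the matrix, where the perceived label matches the intended one.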
To illustrate the capabilities of the proposed framework, the researchers provide generation samples from three models on the project page. Among them are piano performances generated from the same lead sheet but with different arousal levels.
These samples highlight the model's ability to generate diverse emotional expressions while maintaining the same musical structure.
The two-stage Transformer-based model and functional representation offer significant advancements in emotion-driven piano performance generation. By separating valence and arousal modeling and considering musical keys, this approach provides more accurate and flexible control over emotional expression in generated performances.
Original Sources
↗ https://emo-disentanger.github.io/?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
1 August 2024