PhD Defense

Modeling Symbolic Music with Natural Language Processing Approaches


Abstract
Music is often described as a language because of its similarities to natural language, among them the fact that both can be written down: music through symbolic notation, language through text. The field of Music Information Retrieval (MIR) has therefore often borrowed tools from the Natural Language Processing (NLP) field and adapted them to process symbolic music data. This practice has become increasingly common since the breakthrough of Transformer models in NLP.

This thesis first provides a structured overview of the NLP methods adapted in the MIR field for symbolic music processing. They are presented along three axes, each addressing a different level at which symbolic music is represented. First, treating symbolic music as sequential data has led to the development of numerous tokenization strategies, which we propose to organize within a unified taxonomy. Second, these representations are processed by models, such as recurrent or attention-based architectures initially developed for text, giving rise to multiple adaptations for symbolic music. Finally, the resulting abstract representations are used to perform tasks, where both parallels and distinctive characteristics emerge between MIR and NLP.

These three aspects then structure the three technical contributions of this thesis. First, we study the expressiveness of sequential representations of music by developing interval-based tokenization strategies and by analyzing Byte-Pair Encoding (BPE), a subword tokenization strategy, applied to symbolic music tokens. We then propose a framework for model explainability, which we use to analyze the attention mechanism of a Transformer-based model trained for functional harmony analysis. Finally, we develop a model adapted from NLP tools for a re-orchestration task, framed as a case of multi-track music generation.
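As a rough illustration of the two tokenization ideas named in the first contribution, the following Python sketch encodes a melody as a starting pitch followed by signed pitch intervals, then runs a generic Byte-Pair Encoding loop that repeatedly merges the most frequent adjacent token pair. The token names (`Pitch_60`, `Interval_+2`), the `|` separator for merged tokens and all function names are hypothetical; this is a minimal sketch of the general techniques, not the tokenization used in the thesis.

```python
from collections import Counter


def interval_tokens(pitches):
    """Encode a melody (MIDI pitch numbers) as its first pitch followed by
    the signed interval, in semitones, between consecutive notes."""
    if not pitches:
        return []
    tokens = [f"Pitch_{pitches[0]}"]
    for prev, curr in zip(pitches, pitches[1:]):
        tokens.append(f"Interval_{curr - prev:+d}")
    return tokens


def merge_pair(tokens, pair, sep="|"):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + sep + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(tokens, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair,
    stopping early once no pair occurs at least twice."""
    merges = []
    for _ in range(num_merges):
        counts = Counter(zip(tokens, tokens[1:]))
        if not counts:
            break
        pair, count = counts.most_common(1)[0]
        if count < 2:
            break
        merges.append(pair)
        tokens = merge_pair(tokens, pair)
    return tokens, merges


melody = [60, 62, 64, 60, 62, 64, 65]
base = interval_tokens(melody)
compressed, merges = learn_bpe(base, num_merges=10)
print(base)
print(compressed)
print(merges)
```

On this toy melody, the repeated rising pattern becomes the merged token `Interval_+2|Interval_+2`, shortening the sequence from 7 to 5 tokens. Apart from the first token, the interval encoding is transposition-invariant: the same melody started on another pitch yields identical `Interval_` tokens, which is one reason interval-based representations are of interest.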

Ultimately, this thesis argues that NLP methods remain, first and foremost, a toolbox from which MIR studies can draw. Beyond the analogies between music and natural language, the main motivation guiding MIR studies should be musical questions.

Jury

Reviewers
Mr. Xavier HINAUT (Inria Bordeaux), Reviewer
Ms. Cheng-Zhi Anna HUANG (Massachusetts Institute of Technology), Reviewer

Examiners
Ms. Chloé BRAUD (Institut de Recherche en Informatique de Toulouse), Examiner
Mr. Emmanouil BENETOS (Queen Mary University of London), Examiner
Mr. Marius BILASCO (Université de Lille), Examiner
Mr. Patrick BAS (Université de Lille), Examiner; Jury President

Thesis advisors
Mr. Marc TOMMASI (Université de Lille), Co-director
Mr. Louis BIGO (Université de Bordeaux), Co-director
Ms. Mikaela KELLER (Université de Lille), Co-advisor