notation.com. The resulting music samples are in my opinion quite pleasant.
The ABC folk model & dataset are available for download, and I provide for listening selected music samples as well as medleys of random samples from throughout training.
The ABC folk model was followed by a second model trained on ABC pieces converted from MIDI pieces, which fit into GPT-2-117M with an expanded context window when trained on TPUs. The MIDI pieces are far more diverse and challenging, and GPT-2 underfits and struggles to produce valid samples, but when sampling succeeds, it can generate even better musical samples.
Despite its simplicity, the ABC format supports many complex features, and it has been adopted widely by folk musicians, with hundreds of thousands of pieces written or transcribed in it.
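For concreteness, here is a minimal illustrative transcription (a made-up example for this post, not a tune from The Session): the colon-prefixed header fields give the reference number (X), title (T), meter (M), default note length (L), and key (K), and the body spells out the melody, bar by bar, in plain ASCII.

```
X: 1
T: Example Jig
M: 6/8
L: 1/8
K: Gmaj
|: G2G BAB | d2d dBA | G2G BAB | AGF G3 :|
```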
Background: Folk-RNN

Sturm et al scraped tens of thousands of ABC files from The Session and trained a Theano char-RNN ("folk-RNN") on them. That was a long time ago, however, and DL has seen a paradigm shift in sequence modeling away from char-RNNs to CNNs and attention-based Transformer models, most famously GPT-2. DL music generation has advanced as well: OpenAI's Sparse Transformers and MuseNet (https://openai.com/blog/musenet/) have both demonstrated excellent results in music composition at various timescales/formats, and interesting features like mixing genres.
I had been focusing on my poetry generation instead, because I assumed OpenAI would be doing a MuseNet followup; but months later they'd done nothing further, and when I inquired, I got the impression that their music projects were over. So why not?
As for why repeat Sturm's project, there were two possible advantages to using GPT-2-117M:
1. Improved global coherency: I thought the Transformer might work particularly well on the ABC format, because RNNs suffer from persistent 'forgetting' issues: it is difficult for the RNN to persist its memory of past generated sequences, making it hard for an RNN to repeat a theme with variants. A GPT-2 Transformer, in contrast, has a context window of 1024 BPEs, much longer than almost every ABC piece, and so is able to 'see' the entire piece simultaneously while generating the next note.
2. English metadata understanding: The English pretraining could potentially help by providing semantic understanding of e.g. the ABC metadata, such as the difference between two pieces titled a 'jig' versus a 'waltz', or the pseudo-natural-language-ness of the ABC format as a whole.
Sturm commented on some early samples, with a mixed evaluation, concluding: "So of the five transcriptions above, two are plausible. The polka is actually pretty good! All titles by GPT-2 are plagiarized, but I haven't found much plagiarism in the tunes themselves."
I had been worried about plagiarism and thought the loss it had reached would be a safe point to stop at, but the music itself still seemed far from being copied, so I considered further training.
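One crude way to spot-check for this kind of copying (my own illustration, not necessarily the procedure actually used) is to strip the metadata and look for long runs of notes from a generated sample appearing verbatim in the training corpus; the file names below are hypothetical.

```python
# Crude plagiarism spot-check: does any long chunk of a generated tune's
# note text appear verbatim in the training corpus?
# (Illustrative sketch only; file names are hypothetical.)

def music_body(abc_text):
    """Drop header/metadata lines (e.g. 'T:...', 'K:...') and spaces,
    keeping only the note text."""
    lines = [l for l in abc_text.splitlines()
             if not (len(l) > 1 and l[0].isalpha() and l[1] == ':')]
    return ''.join(lines).replace(' ', '')

def longest_copied_run(sample, corpus, min_len=20):
    """Longest substring of `sample` of length >= min_len found in `corpus`."""
    best = ''
    for i in range(len(sample) - min_len + 1):
        for j in range(i + min_len, len(sample) + 1):
            chunk = sample[i:j]
            if chunk in corpus:
                if len(chunk) > len(best):
                    best = chunk
            else:
                break  # longer chunks starting at i cannot match either
    return best

corpus = music_body(open('training_corpus.abc').read())
sample = music_body(open('generated_sample.abc').read())
hit = longest_copied_run(sample, corpus)
print(f'longest copied run: {len(hit)} characters: {hit[:60]!r}')
```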
Some datasets are invalid ABC.
The additional processed versions of The Session that Sturm et al had made seemed like a target, but caused problems when I simply concatenated them in, and I soon discovered why abc2midi now thought all the samples were broken:
allabcwrepeats_parsed_wot: This is version 3 of the dataset from thesession.org. In this version, we transpose all tunes to have the root C, transpose them all to have the root C#, remove the titles, and make new mode tokens: K:maj, K:min, K:dor, and K:mix. There are tens of thousands of transcriptions here.
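One way to catch this kind of breakage early is to run every piece through abc2midi and keep only those it converts cleanly. A rough sketch, assuming abc2midi (from the abcMIDI tools) is on the PATH and treating any reported error as a failure:

```python
# Filter ABC samples by whether abc2midi can still convert them to MIDI.
# Sketch only: assumes abc2midi (abcMIDI package) is installed and on $PATH,
# and treats any "Error" message in its output as a failed sample.
import subprocess, tempfile, os

def abc_is_valid(abc_text):
    with tempfile.NamedTemporaryFile('w', suffix='.abc', delete=False) as f:
        f.write(abc_text)
        path = f.name
    try:
        out = subprocess.run(['abc2midi', path, '-o', os.devnull],
                             capture_output=True, text=True)
        return out.returncode == 0 and 'Error' not in (out.stdout + out.stderr)
    finally:
        os.remove(path)

# Hypothetical dump of generated samples separated by blank lines:
samples = open('samples.txt').read().split('\n\n')
valid = [s for s in samples if s.strip() and abc_is_valid(s)]
print(f'{len(valid)}/{len(samples)} samples still parse')
```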
Shawn Presser pointed out that some people, like Nostalgebraist, had run into frustrating problems with the standard GPT-2 BPE encoding.
To explain what BPE is and why it might be a bad thing for ABC notation: GPT-2 does not just feed in raw characters like a char-RNN does, because that would make every input extremely long. Instead, it operates on space-delimited word fragments: it tries to 'chunk' characters into something in between character-sized and word-sized, to get the best of both worlds, a way of writing text where common words are a single symbol but rare words can still be expressed as a couple of symbols rather than deleted entirely as word-based encodings must do. However, since the default model is trained on English text, the chunking is done assuming normal English whitespace, like spaces between words.
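To see the mismatch concretely, one can run a line of ABC through the stock GPT-2 tokenizer and inspect how it gets chunked. The sketch below uses the HuggingFace transformers package purely for illustration; it was not part of the original project, which used the standard OpenAI GPT-2 encoding.

```python
# Inspect how the stock GPT-2 BPE vocabulary chunks ABC text vs. English.
# (Illustration via the HuggingFace `transformers` GPT-2 tokenizer.)
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained('gpt2')

english = "The quick brown fox jumps over the lazy dog"
abc_line = "|: G2G BAB | d2d dBA | G2G BAB | AGF G3 :|"

for text in (english, abc_line):
    pieces = tok.tokenize(text)
    print(f'{len(text):3d} chars -> {len(pieces):3d} BPEs: {pieces}')

# Ordinary English compresses into roughly word-sized chunks, while the
# ABC body text tends to shatter into many short, awkward fragments,
# since the BPE vocabulary was fit to English prose.
```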
Nostalgebraist notes that the actual BPE implementation used is weird and does not act as you'd expect, especially when spaces are involved. So Presser wondered if GPT-2 could do without the spaces entirely, and it turns out that ABC does not require spaces.
They are only there for the convenience of humans reading & writing ABC. Aside from the metadata fields, if you delete all spaces, the music should be the same. I was surprised, but this seemed to be true. (Presser did some experiments with creating a brand-new BPE.)
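As a concrete sketch of the space-stripping idea (my illustration, not the exact preprocessing script used), one can leave the colon-prefixed information fields alone and delete spaces everywhere else:

```python
# Strip spaces from the music body of an ABC transcription while leaving
# the metadata/header fields (lines like "T:..." or "K:...") untouched.
# Sketch of the idea only, not the exact preprocessing script used.
import re

HEADER = re.compile(r'^[A-Za-z]:')   # ABC information fields, e.g. X:, T:, M:, K:

def strip_spaces(abc_text):
    out = []
    for line in abc_text.splitlines():
        if HEADER.match(line):
            out.append(line)                    # keep metadata as-is
        else:
            out.append(line.replace(' ', ''))   # spaces in the tune body are cosmetic
    return '\n'.join(out)

print(strip_spaces("X:1\nT:Example Jig\nM:6/8\nK:Gmaj\n|: G2G BAB | d2d dBA :|"))
```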