We introduce Fine-Grained Guidance (FGG), an efficient approach for symbolic music generation using diffusion models. Our method enhances guidance through:
(1) Fine-grained conditioning during training,
(2) Fine-grained control during the diffusion sampling process.
In particular, sampling control ensures tonal accuracy in every generated sample, allowing our model to produce music with high precision, consistent rhythmic patterns, and even stylistic variations that align with user intent.
We provide the model with a melody and chord progression as inputs, and the model generates the accompaniment accordingly.
In each example, the left column displays the melody provided as input to the model, and the right column showcases music samples generated by the model. The scores for the melody and accompaniment are provided in Section 4.
Our approach enables controllable stylization in music generation. Sampling control ensures that all generated notes strictly adhere to the scale of the target musical style. This allows the model to generate music in specific styles, even those that were not present in the training data.
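To make the idea of a scale constraint concrete, the following is a minimal sketch, not the authors' implementation. It assumes the music is a piano roll of shape (time steps, 128 MIDI pitches) and that a style is described by a set of scale degrees. The `SCALES` table and the function names `in_scale_mask` and `constrain_piano_roll` are hypothetical. This page does not specify how or at which diffusion step FGG applies its control, so the sketch only shows the masking itself.

```python
import numpy as np

# Hypothetical scale table: semitone offsets from the tonic.
SCALES = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "major_pentatonic": [0, 2, 4, 7, 9],
}

def in_scale_mask(tonic: int, scale: str, n_pitches: int = 128) -> np.ndarray:
    """Boolean mask over MIDI pitches: True where the pitch lies in the scale."""
    allowed = {(tonic + step) % 12 for step in SCALES[scale]}
    return np.array([p % 12 in allowed for p in range(n_pitches)])

def constrain_piano_roll(roll: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out every out-of-scale pitch column of a (time, pitch) piano roll."""
    return roll * mask[np.newaxis, :]

# Usage: keep only C-major pitches in a toy piano roll.
rng = np.random.default_rng(0)
roll = rng.random((16, 128))
constrained = constrain_piano_roll(roll, in_scale_mask(tonic=0, scale="major"))
```

Swapping the scale name, for example to `"major_pentatonic"`, changes which pitches survive. This is one simple way to picture how a scale constraint could steer output toward a style that is absent from the training data.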
Below, we demonstrate several examples of style-controlled music generation for:
The following are two examples generated by our method
We demonstrate the impact of sampling control in an accompaniment generation task, given a melody and chord progression. Within each example, all accompaniments are generated with the same random seed under different ablative conditions, so the results are directly comparable.
The ablative conditions are as follows:
Training Control + Sampling Control (our proposed method)
Only Training Control
Training Control + Remove Out-of-Key Notes
Training Control + Round Out-of-Key Notes to Nearest
Inpainting Method
Comparison of the results indicates that sampling control not only eliminates out-of-key notes but also enhances the overall coherence and harmonic consistency of the accompaniments. This highlights the effectiveness of our approach in maintaining musical coherence.
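The two post-processing baselines among the ablative conditions, removing out-of-key notes and rounding them to the nearest in-key pitch, can be sketched as follows. This is an illustrative reading under stated assumptions, not the exact procedure used for the demo: notes are reduced to bare MIDI pitch numbers, the key is a set of pitch classes, and ties in rounding are resolved downward. All function names are hypothetical.

```python
from typing import Iterable, List, Set

MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)

def key_pitch_classes(tonic: int, steps: Iterable[int] = MAJOR_STEPS) -> Set[int]:
    """Pitch classes (0 = C) that belong to the key."""
    return {(tonic + s) % 12 for s in steps}

def remove_out_of_key(pitches: List[int], allowed: Set[int]) -> List[int]:
    """Baseline 1: drop every note whose pitch class is not in the key."""
    return [p for p in pitches if p % 12 in allowed]

def round_to_nearest_in_key(pitches: List[int], allowed: Set[int]) -> List[int]:
    """Baseline 2: move each out-of-key note to the closest in-key pitch.

    Ties are resolved downward, which is an arbitrary choice for this sketch.
    """
    if not allowed:
        raise ValueError("the key must contain at least one pitch class")
    result = []
    for p in pitches:
        for offset in range(12):
            if (p - offset) % 12 in allowed:
                result.append(p - offset)
                break
            if (p + offset) % 12 in allowed:
                result.append(p + offset)
                break
    return result

# Usage: C major, with C#4 (61) and D#4 (63) out of key.
c_major = key_pitch_classes(0)
print(remove_out_of_key([60, 61, 63, 64], c_major))        # [60, 64]
print(round_to_nearest_in_key([60, 61, 63, 64], c_major))  # [60, 60, 62, 64]
```

Both baselines edit notes only after generation has finished, so the rest of the sample cannot adapt to the change. That is one plausible reason why they fare worse on coherence than control applied during sampling.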
Try our interactive music generation tool where you can generate new accompaniments for given melody and chord conditions! Visit our Hugging Face demo to experiment with the model.