Kilian Schulze-Forster

Unsupervised Audio Source Separation
Using Differentiable Parametric Source Models

Welcome to the demo website of the paper

Schulze-Forster, K., Doire, C., Richard, G., & Badeau, R. "Unsupervised Musical Source Separation Using Differentiable Parametric Source Models." Currently under review at IEEE/ACM Transactions on Audio, Speech, and Language Processing.

All example mixtures are taken from the test set.

Audio examples for mixtures of 2 sources (J=2)

Example mixture 1:

Method | Comment | Soprano | Alto | Re-synthesized mix^*
True sources |
US-F (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-F | Signals s̃_j(t) generated with source models
US-S (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-S | Signals s̃_j(t) generated with source models
SV-F | Estimates ŝ_j(t) obtained by Wiener filtering
SV-F | Signals s̃_j(t) generated with source models
SV-S | Estimates ŝ_j(t) obtained by Wiener filtering
SV-S | Signals s̃_j(t) generated with source models
NMF1 | Wiener filtering
NMF2 | Wiener filtering
Unet-F | Wiener filtering
Unet-S | Wiener filtering

^*This is the sum of the signals s̃_j(t) generated with the parametric source models. It is denoted m̃(t) in the paper. This signal can be manipulated by changing the source model parameters.
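
The "Wiener filtering" named in the Comment column can be illustrated with a minimal sketch. It assumes the common soft-mask form in which each source's squared magnitude spectrogram is divided by the sum over all sources and the resulting mask is applied to the complex mixture STFT. The exact formulation used in the paper is not given on this page, so the function names, array shapes, and the use of squared magnitudes below are assumptions made for illustration only.

```python
import numpy as np

def wiener_masks(source_mags, eps=1e-10):
    """Soft masks from per-source magnitude spectrograms of shape (J, F, T)."""
    power = np.asarray(source_mags, dtype=float) ** 2
    # Each mask is the source's share of the total power in every bin.
    return power / (power.sum(axis=0, keepdims=True) + eps)

def wiener_separate(mix_stft, source_mags):
    """Apply the masks to a complex mixture STFT of shape (F, T)."""
    return wiener_masks(source_mags) * mix_stft[None, :, :]

# Hypothetical toy data standing in for model outputs and a real mixture.
rng = np.random.default_rng(0)
J, F, T = 2, 5, 4
source_mags = rng.random((J, F, T)) + 0.1
mix_stft = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))

estimates = wiener_separate(mix_stft, source_mags)  # shape (J, F, T)
```

Because the masks sum to one in every time-frequency bin, the estimates add up to the mixture STFT. The re-synthesized mix, in contrast, is the sum of the model-generated signals and need not equal the original mixture.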

Example mixture 2:

Method | Comment | Tenor | Bass | Re-synthesized mix^*
True sources |
US-F (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-F | Signals s̃_j(t) generated with source models
US-S (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-S | Signals s̃_j(t) generated with source models
SV-F | Estimates ŝ_j(t) obtained by Wiener filtering
SV-F | Signals s̃_j(t) generated with source models
SV-S | Estimates ŝ_j(t) obtained by Wiener filtering
SV-S | Signals s̃_j(t) generated with source models
NMF1 | Wiener filtering
NMF2 | Wiener filtering
Unet-F | Wiener filtering
Unet-S | Wiener filtering

^*This is the sum of the signals s̃_j(t) generated with the parametric source models. It is denoted m̃(t) in the paper. This signal can be manipulated by changing the source model parameters.

Example mixture 3:

Method | Comment | Soprano | Alto | Re-synthesized mix^*
True sources |
US-F (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-F | Signals s̃_j(t) generated with source models
US-S (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-S | Signals s̃_j(t) generated with source models
SV-F | Estimates ŝ_j(t) obtained by Wiener filtering
SV-F | Signals s̃_j(t) generated with source models
SV-S | Estimates ŝ_j(t) obtained by Wiener filtering
SV-S | Signals s̃_j(t) generated with source models
NMF1 | Wiener filtering
NMF2 | Wiener filtering
Unet-F | Wiener filtering
Unet-S | Wiener filtering

^*This is the sum of the signals s̃_j(t) generated with the parametric source models. It is denoted m̃(t) in the paper. This signal can be manipulated by changing the source model parameters.

Example mixture 4:

Method | Comment | Tenor | Bass | Re-synthesized mix^*
True sources |
US-F (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-F | Signals s̃_j(t) generated with source models
US-S (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-S | Signals s̃_j(t) generated with source models
SV-F | Estimates ŝ_j(t) obtained by Wiener filtering
SV-F | Signals s̃_j(t) generated with source models
SV-S | Estimates ŝ_j(t) obtained by Wiener filtering
SV-S | Signals s̃_j(t) generated with source models
NMF1 | Wiener filtering
NMF2 | Wiener filtering
Unet-F | Wiener filtering
Unet-S | Wiener filtering

^*This is the sum of the signals s̃_j(t) generated with the parametric source models. It is denoted m̃(t) in the paper. This signal can be manipulated by changing the source model parameters.

Audio examples for mixtures of 4 sources (J=4)

Example mixture 1:

Method | Comment | Soprano | Alto | Tenor | Bass | Re-synthesized mix^*
True sources |
US-F (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-F | Signals s̃_j(t) generated with source models
US-S (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-S | Signals s̃_j(t) generated with source models
SV-F | Estimates ŝ_j(t) obtained by Wiener filtering
SV-F | Signals s̃_j(t) generated with source models
SV-S | Estimates ŝ_j(t) obtained by Wiener filtering
SV-S | Signals s̃_j(t) generated with source models
NMF1 | Wiener filtering
NMF2 | Wiener filtering
Unet-F | Wiener filtering
Unet-S | Wiener filtering

^*This is the sum of the signals s̃_j(t) generated with the parametric source models. It is denoted m̃(t) in the paper. This signal can be manipulated by changing the source model parameters.

Example mixture 2:

Method | Comment | Soprano | Alto | Tenor | Bass | Re-synthesized mix^*
True sources |
US-F (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-F | Signals s̃_j(t) generated with source models
US-S (proposed) | Estimates ŝ_j(t) obtained by Wiener filtering
US-S | Signals s̃_j(t) generated with source models
SV-F | Estimates ŝ_j(t) obtained by Wiener filtering
SV-F | Signals s̃_j(t) generated with source models
SV-S | Estimates ŝ_j(t) obtained by Wiener filtering
SV-S | Signals s̃_j(t) generated with source models
NMF1 | Wiener filtering
NMF2 | Wiener filtering
Unet-F | Wiener filtering
Unet-S | Wiener filtering

^*This is the sum of the signals s̃_j(t) generated with the parametric source models. It is denoted m̃(t) in the paper. This signal can be manipulated by changing the source model parameters.

Audio examples for melody editing

The mixture is parameterized using the proposed method US-F. One use case of this parameterization is melody editing: the fundamental frequencies of individual sources are modified, and the sources are then mixed together to produce a modified mixture. The parameterization can also be exploited for other tasks such as timbre transfer, style transfer, transposition, or changing the sung text.
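
The editing idea can be sketched in a few lines of code. The example below is a toy illustration, not the source model from the paper: it swaps in a plain additive harmonic synthesizer driven by a frame-wise fundamental frequency, and the function names, sample rate, hop size, and number of harmonics are all hypothetical. It shows only the principle of changing one source's fundamental frequency and summing the sources again.

```python
import numpy as np

def synth_harmonic(f0_hz, sr=16000, hop=256, n_harmonics=5):
    """Toy additive synthesizer: frame-wise f0 (Hz) -> waveform."""
    # Hold each frame's f0 for `hop` samples to get a sample-rate contour.
    f0 = np.repeat(np.asarray(f0_hz, dtype=float), hop)
    phase = 2 * np.pi * np.cumsum(f0) / sr
    out = np.zeros_like(f0)
    for k in range(1, n_harmonics + 1):
        audible = (k * f0) < sr / 2  # drop harmonics above Nyquist
        out += audible * np.sin(k * phase) / k
    return out / n_harmonics

def shift_semitones(f0_hz, semitones):
    """Scale an f0 contour by a number of equal-tempered semitones."""
    return np.asarray(f0_hz, dtype=float) * 2.0 ** (semitones / 12.0)

# Hypothetical constant f0 contours for two voices (50 frames each).
n_frames = 50
f0_soprano = np.full(n_frames, 440.0)
f0_alto = np.full(n_frames, 330.0)

# Re-synthesized mixture: sum of the signals generated from the parameters.
resynth_mix = synth_harmonic(f0_soprano) + synth_harmonic(f0_alto)

# Melody edit: raise the soprano by two semitones, keep the alto, re-mix.
f0_soprano_edit = shift_semitones(f0_soprano, 2)
modified_mix = synth_harmonic(f0_soprano_edit) + synth_harmonic(f0_alto)
```

In this sketch only the fundamental frequency contour is edited. All other synthesis parameters stay fixed, which is what keeps the unmodified voice identical in both mixtures.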

Original mixture

Re-synthesized mixture

Modified mixture