Unsupervised acoustic word segmentation and clustering using the embedded segmental K-means (ES-KMeans) algorithm. The algorithm is described in:
H. Kamper, K. Livescu, and S. J. Goldwater, "An embedded segmental K-means model for unsupervised segmentation and clustering of speech," in Proc. ASRU, 2017. [arXiv]
Please cite this paper if you use the code.
Dependencies can be installed in a conda environment:
conda env create -f environment.yml
conda activate eskmeans
Run the unit tests:
nosetests -v
A number of example notebooks are given in examples/. The examples/eskmeans_example.ipynb notebook provides a step-by-step example of the ES-KMeans algorithm. This notebook can also be opened directly in Colab.
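For a rough feel of the idea before opening the notebook, the toy sketch below alternates a dynamic-programming segmentation step with a K-means-style update of the cluster means. It is not the code in this repository: the function names (`embed`, `segment`, `es_kmeans`), the subsampling embedding, the duration-weighted cost and mean update, and all parameters are illustrative assumptions.

```python
import numpy as np


def embed(frames, n=4):
    """Hypothetical embedding: take n evenly spaced frames and flatten them."""
    idx = np.linspace(0, len(frames) - 1, n).round().astype(int)
    return frames[idx].flatten()


def segment(frames, means, max_len=6):
    """Pick boundaries minimising the summed duration-weighted squared
    distance of each segment embedding to its nearest mean (assumed cost)."""
    T = len(frames)
    best = np.full(T + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            x = embed(frames[s:t])
            cost = (t - s) * np.min(np.sum((means - x) ** 2, axis=1))
            if best[s] + cost < best[t]:
                best[t] = best[s] + cost
                back[t] = s
    # Trace back from the end of the utterance to recover (start, end) pairs.
    bounds, t = [], T
    while t > 0:
        bounds.append((int(back[t]), t))
        t = int(back[t])
    return bounds[::-1], float(best[T])


def es_kmeans(utterances, K=2, n_iter=5, max_len=6, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise the means from randomly chosen fixed-length chunks.
    chunks = [embed(u[s:s + max_len]) for u in utterances
              for s in range(0, len(u), max_len)]
    means = np.array(chunks)[rng.choice(len(chunks), K, replace=False)]
    history = []
    for _ in range(n_iter):
        total, segs, embs, durs = 0.0, [], [], []
        # Segmentation step: re-segment every utterance under the current means.
        for u in utterances:
            bounds, cost = segment(u, means, max_len)
            total += cost
            segs.append(bounds)
            for s, t in bounds:
                embs.append(embed(u[s:t]))
                durs.append(t - s)
        history.append(total)
        # K-means step: assign segments to the nearest mean, then update means
        # as duration-weighted averages (empty clusters keep their old mean).
        embs, durs = np.array(embs), np.array(durs, dtype=float)
        dists = ((embs[:, None, :] - means[None]) ** 2).sum(-1)
        assign = np.argmin(dists, axis=1)
        for k in range(K):
            mask = assign == k
            if mask.any():
                means[k] = np.average(embs[mask], axis=0, weights=durs[mask])
    return segs, means, history
```

Each utterance is assumed to be a NumPy array of shape (frames, feature dimension). The returned `history` holds the total segmentation cost per iteration, which can be inspected to see whether the alternation has settled.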
The code here only provides the main algorithm and some supporting utilities. A complete recipe where ES-KMeans is applied to the Buckeye English and NCHLT Xitsonga datasets is available here.
Copyright (C) 2021 Herman Kamper
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
The module utils/theta_oscillator.py is derived from Adriana Stan's Python implementation available at https://github.com/speech-utcluj/thetaOscillator-syllable-segmentation, which was released under the same license. The specific changes to Adriana's code are listed in the documentation at the top of the module. I also list Adriana as a contributor below.