tttslab/spolacq

Spoken Language Acquisition From Conversation Based On Reinforcement Learning

Overview

This repository contains code for the papers:

Unsupervised syllable boundary detection

We use the unsupervised syllable boundary detection algorithm described in:

  • O. J. Räsänen, G. Doyle, and M. C. Frank, "Unsupervised word discovery from speech using automatic segmentation into syllable-like units," in Proc. Interspeech, 2015.

Unsupervised segmentation and clustering

Word segmentation and clustering are performed using the ES-KMeans package.

Unsupervised sound-image correspondence learning

We use the unsupervised sound-image correspondence learning described in:

  • D. Harwath, A. Torralba, and J. Glass, "Unsupervised learning of spoken language with visual context," in Proc. NIPS, 2016.

Self-supervised pronunciation adaptation

As the agent's speech organ, we use the neural vocoder described in:

  • N. Chen, Y. Zhang, H. Zen, R. J. Weiss, M. Norouzi, and W. Chan, "WaveGrad: Estimating Gradients for Waveform Generation," in Proc. ICLR, 2021.

For implementation, we use the wavegrad package.

Dependencies

About Author

Usage

Create your combined_sounds.wav from your own data (optional)

cd scripts
python prep_combine_wavs.py --path=PATH_TO_RAW_AUDIOS_FOLDER
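
For readers who want a feel for what this combining step involves, here is a minimal sketch that concatenates a folder of WAV files into a single combined_sounds.wav using only the Python standard library. It is not the repository's prep_combine_wavs.py, whose actual behavior may differ; the function name combine_wavs, the sorted file order, and the requirement that all inputs share one audio format are assumptions made for illustration.

```python
import wave
from pathlib import Path


def combine_wavs(input_dir, output_path="combined_sounds.wav"):
    """Concatenate every .wav file in input_dir (sorted by name) into one file.

    Hypothetical helper for illustration; returns the number of frames written.
    """
    output_path = Path(output_path)
    # Skip a previous output file if it happens to live in the input folder.
    wav_paths = [
        p for p in sorted(Path(input_dir).glob("*.wav"))
        if p.resolve() != output_path.resolve()
    ]
    if not wav_paths:
        raise ValueError(f"no .wav files found in {input_dir}")

    params = None  # (channels, sample width in bytes, frame rate) of the first file
    chunks = []
    for path in wav_paths:
        with wave.open(str(path), "rb") as reader:
            current = (
                reader.getnchannels(),
                reader.getsampwidth(),
                reader.getframerate(),
            )
            if params is None:
                params = current
            elif current != params:
                # Assumption: inputs must match; this sketch does not resample.
                raise ValueError(f"{path} has format {current}, expected {params}")
            chunks.append(reader.readframes(reader.getnframes()))

    with wave.open(str(output_path), "wb") as writer:
        writer.setnchannels(params[0])
        writer.setsampwidth(params[1])
        writer.setframerate(params[2])
        for chunk in chunks:
            writer.writeframes(chunk)

    bytes_per_frame = params[0] * params[1]
    return sum(len(chunk) for chunk in chunks) // bytes_per_frame
```

Calling combine_wavs("PATH_TO_RAW_AUDIOS_FOLDER") would write combined_sounds.wav to the current directory. The sketch refuses mixed sample rates or channel counts rather than resampling, since silently mixing formats would corrupt the output.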

Run main script for 3D homing task

cd egs
bash run.bash DATA_NAME(arbitrary) MAX_K

Run main script for food task

conda env create -f=spolacq.yml
conda activate spolacq
cd egs
sh runfood.sh DATA_DIR_NAME(arbitrary)

Run main script for food task with WaveGrad speech organ-based agent

conda env create -f=spolacq.yml
conda activate spolacq
cd egs
sh runwavegrad.sh DATA_DIR_NAME(arbitrary)

If you have trouble building box2d-py, try installing these system packages:

sudo apt install xvfb xorg-dev libsdl2-dev swig cmake
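
To see whether the build problem is already resolved, a quick check along the following lines may help. It is a sketch, not part of this repository: the function name check_box2d_build_prereqs is hypothetical, it assumes box2d-py is importable under the module name Box2D, and it only looks for the swig and cmake executables on PATH (the other packages above are libraries and are not checked here).

```python
import importlib.util
import shutil


def check_box2d_build_prereqs(tools=("swig", "cmake")):
    """Report whether the Box2D module and some build tools can be found.

    Hypothetical helper; returns a dict mapping each name to True or False.
    """
    # Assumption: box2d-py installs a top-level module named "Box2D".
    report = {"Box2D": importlib.util.find_spec("Box2D") is not None}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, found in check_box2d_build_prereqs().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If Box2D is reported as missing while swig and cmake are found, recreating the conda environment is a reasonable next step.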

License

Copyright (C) 2020, 2021 Shinozaki Lab Tokyo Tech

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see https://www.gnu.org/licenses/.

The following programs are APIs for ES-KMeans.
We have copied and modified some of the code available at https://github.com/kamperh/eskmeans,
which is released under GNU General Public License version 3.
