Implementation with PyTorch.
Base model
- LSTM using MFCC audio features (a minimal sketch of this variant follows the requirements list below)
- CNN (a simplified NvidiaNet) with LPC features
Requirements
- Python 3
- PyTorch v0.3.0
- numpy
- librosa & audiolazy
- scipy
- etc.
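For orientation, here is a minimal sketch of what the LSTM variant could look like in PyTorch; the class name `MFCCToBlendshapeLSTM` and the feature/blendshape dimensions are illustrative assumptions, not the actual classes in `models.py`.

```python
import torch.nn as nn

class MFCCToBlendshapeLSTM(nn.Module):
    # A minimal sketch, not the repository's actual model; dimensions are assumptions.
    def __init__(self, mfcc_dim=39, hidden_dim=128, blendshape_dim=51, num_layers=2):
        super(MFCCToBlendshapeLSTM, self).__init__()
        # Sequence model over per-frame MFCC features
        self.lstm = nn.LSTM(mfcc_dim, hidden_dim, num_layers, batch_first=True)
        # Per-frame regression to blendshape weights
        self.fc = nn.Linear(hidden_dim, blendshape_dim)

    def forward(self, x):
        # x: (batch, frames, mfcc_dim)
        out, _ = self.lstm(x)
        return self.fc(out)  # (batch, frames, blendshape_dim)
```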
Scripts to run
- `main.py`: change the net name and set the checkpoints folder to train different models (a minimal training-loop sketch follows this list)
- `test_model.py`: generate blendshape sequences given extracted audio features (needs audio features as input)
- `synthesis.py`: generate blendshapes directly from an input wav (needs the input audio path as an argument)
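A hedged sketch of the kind of training loop `main.py` might drive, reusing the `MFCCToBlendshapeLSTM` class sketched above; the file names, loss, optimizer settings, and checkpoint path are assumptions, not the script's actual configuration.

```python
import numpy as np
import torch
import torch.nn as nn

# Assumed combined feature/blendshape files (see the preprocessing section below)
features = torch.from_numpy(np.load("train_features.npy")).float().unsqueeze(0)
targets = torch.from_numpy(np.load("train_blendshapes.npy")).float().unsqueeze(0)

model = MFCCToBlendshapeLSTM()              # class sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()                    # per-frame regression loss

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(features), targets)
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "checkpoints/lstm_mfcc/best.pth")
```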
Classes
- `models.py`: classes for the LSTM and CNN (simplified NvidiaNet) models
- `models_testae.py`: advanced models with an autoencoder design
- `dataset.py`: class for loading the dataset
Input preprocessing
- `misc/audio_mfcc.py`: extract MFCC features from input wav files (a minimal extraction sketch follows this list)
- `misc/audio_lpc.py`: extract LPC features
- `misc/combine.py`: combine individual audio feature/blendshape files into a single file for data loading
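A minimal sketch of MFCC extraction with librosa, in the spirit of `misc/audio_mfcc.py`; the number of coefficients and the file names are illustrative assumptions, not the script's actual parameters.

```python
import librosa
import numpy as np

def extract_mfcc(wav_path, n_mfcc=13):
    # Load the waveform at its native sample rate
    y, sr = librosa.load(wav_path, sr=None)
    # One MFCC vector per analysis frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # (frames, n_mfcc)

np.save("example_mfcc.npy", extract_mfcc("example.wav"))
```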
To build your own dataset, preprocess your wav/blendshape pairs with `misc/audio_mfcc.py` or `misc/audio_lpc.py`, then combine the resulting feature/blendshape files with `misc/combine.py` into a single feature/blendshape file (a minimal sketch follows).
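A minimal sketch of the combine step, assuming per-clip `.npy` feature files; the file names and layout are assumptions, not what `misc/combine.py` actually expects.

```python
import numpy as np

# Hypothetical per-clip feature files produced by the extraction step
feature_files = ["clip1_mfcc.npy", "clip2_mfcc.npy"]
combined = np.concatenate([np.load(f) for f in feature_files], axis=0)
np.save("train_features.npy", combined)  # single file for the data loader
```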
Modify `main.py`: set the model to the one you need and specify the checkpoint folder.
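For example, the switch might look like the following; the variable names are assumptions for illustration, not `main.py`'s actual ones.

```python
# Hypothetical configuration inside main.py
net_name = "LSTM"                         # or "CNN" for the LPC-based model
checkpoint_dir = "checkpoints/lstm_mfcc"  # separate folder per experiment
```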
- Both `test_model.py` and `synthesis.py` can be used to generate blendshape sequences. `test_model.py` accepts extracted audio features (MFCC/LPC); `synthesis.py` takes a raw wav file as input (an inference sketch follows this list).
- State the arguments and it will produce a blendshape test file.
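A hedged sketch of offline inference from extracted features, in the spirit of `test_model.py`, reusing the `MFCCToBlendshapeLSTM` class sketched earlier; the checkpoint and file paths are assumptions. (Modern PyTorch API is shown; v0.3.0 would wrap tensors in `Variable` instead of using `torch.no_grad()`.)

```python
import numpy as np
import torch

model = MFCCToBlendshapeLSTM()            # class sketched earlier
model.load_state_dict(torch.load("checkpoints/lstm_mfcc/best.pth"))
model.eval()

features = np.load("example_mfcc.npy")    # (frames, mfcc_dim)
with torch.no_grad():
    x = torch.from_numpy(features).float().unsqueeze(0)  # add batch dimension
    blendshapes = model(x).squeeze(0).numpy()            # (frames, blendshape_dim)

np.save("example_blendshapes.npy", blendshapes)
```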