Implementation of layer-norm in the training script #3256
Conversation
No Taskcluster jobs started for this pull request. The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the default branch) does not allow starting tasks for this pull request.
Master moved to r2.3
Pretty sure we already have those.
Ok, so you just have a few files to add? Please make sure you:
Yes, these are new files that I need to check against r2.3. But I believe they will work just fine.
Do you mind sharing more on that?
The experiments we have done are a little old right now. I'm performing a new training benchmark; I'll let you know soon.
reuben left a comment:
The code changes look good to me. Thanks Bernardo!
Everything is working in the binaries; I've already created a PR for mozilla's tensorflow: mozilla/tensorflow#124
For training with already existing checkpoints you have to initialize the new layers first. This worked for me. I also ran a short test transfer-learning the English checkpoint to German with a small dataset (~32h), but layer-norm didn't help here.
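In case it helps others, a minimal sketch of that initialization step, assuming a TF1-style graph like the training script used at the time; the checkpoint path and the variable partitioning below are illustrative, not the exact code referenced above:

```python
import tensorflow.compat.v1 as tf

# Partition the graph's variables into those present in the old
# (pre-layer-norm) checkpoint and the new LayerNorm ones; restore the
# former and initialize the latter.
reader = tf.train.NewCheckpointReader('old_checkpoint/best_dev')  # illustrative path
ckpt_names = set(reader.get_variable_to_shape_map().keys())

old_vars = [v for v in tf.global_variables() if v.op.name in ckpt_names]
new_vars = [v for v in tf.global_variables() if v.op.name not in ckpt_names]

saver = tf.train.Saver(var_list=old_vars)
with tf.Session() as session:
    session.run(tf.variables_initializer(new_vars))  # fresh gamma/beta etc.
    saver.restore(session, 'old_checkpoint/best_dev')
```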
Maybe training only the reinitialized LayerNorm tensors, freezing the rest of the network (see #3247), and then training the complete network would help here.
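A rough sketch of that freezing idea, assuming a TF1-style graph (built below as a toy stand-in for the acoustic model) and that the LayerNorm variables can be selected by name; both are assumptions:

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Toy stand-in for the acoustic model; in the real script `loss` would be
# the CTC loss over the network's outputs.
x = tf.placeholder(tf.float32, [None, 2048])
h = tf.layers.dense(x, 2048, name='dense1')
with tf.variable_scope('layer_norm_1'):
    gamma = tf.get_variable('gamma', [2048], initializer=tf.ones_initializer())
    beta = tf.get_variable('beta', [2048], initializer=tf.zeros_initializer())
    mean, var = tf.nn.moments(h, axes=[-1], keep_dims=True)
    h = gamma * (h - mean) / tf.sqrt(var + 1e-6) + beta
loss = tf.reduce_mean(tf.square(h))

# Train only the LayerNorm tensors by passing an explicit var_list;
# every other variable stays frozen. The name filter is an assumption.
ln_vars = [v for v in tf.trainable_variables() if 'layer_norm' in v.name]
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=ln_vars)
```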
@DanBmh you cannot take the weights (checkpoint) from a model trained without layer-norm and use them to finetune/transfer-learn a model that uses layer-norm. The models' architectures are just different. For instance, while the 2nd dense layer from the current checkpoint (without LN) was trained to process tensors in a certain range, the 2nd dense layer in an architecture with LN will process tensors with a completely different range (and a different mean and variance as well). Also, that's why I've put the argument
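To illustrate the point with a quick check (a toy example, not the model's actual activations): layer-norm forces each feature vector to zero mean and unit variance before the learned scale/shift, so a downstream layer tuned to the un-normalized range sees very different inputs:

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-6):
    # Normalize each feature vector to zero mean / unit variance,
    # then apply the learned scale (gamma) and shift (beta).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

# Activations a pre-LN checkpoint's next layer might have been tuned for:
h = np.random.uniform(5.0, 15.0, size=(4, 2048))
print(h.mean(), h.std())            # roughly 10 and 2.9
ln = layer_norm(h)
print(ln.mean(), ln.std())          # roughly 0 and 1
```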
@bernardohenz you're right. I was just hoping that transfer-learning performance wasn't that bad, so I did a short test run.
Thanks @bernardohenz ! Can you send a PR against
This is required for us to be able to run your PR with all your changes and ensure nothing regresses.
@bernardohenz This is not complete, you have not updated
Yes, I was about to post asking for some help with this. What am I supposed to do? Just replace the old sha references ('4336a5b49fa6d650e24dbdba55bcef9581535244') with the new one ('6dc2a1becfd1316eb4d77240133a548e93dbff63')?
Just replace, that is the purpose of those references: our CI will check if the taskcluster index exists, and if not, it will build it.
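For anyone doing the same replacement, a small sketch of the swap; the `taskcluster/` directory and the `*.yml` glob are assumptions about where those references live:

```python
import pathlib

OLD_SHA = '4336a5b49fa6d650e24dbdba55bcef9581535244'
NEW_SHA = '6dc2a1becfd1316eb4d77240133a548e93dbff63'

# Rewrite every reference in the taskcluster definitions in one pass.
for path in pathlib.Path('taskcluster').rglob('*.yml'):
    text = path.read_text()
    if OLD_SHA in text:
        path.write_text(text.replace(OLD_SHA, NEW_SHA))
```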
@bernardohenz Please don't merge but rebase.
@bernardohenz Can you please clean the history? No merge, no "revert" of the previous commit. Force-push is fine, no worries.
Force-pushed from 5e78032 to cdcbfdf
Replacing old sha references ('4336a5b49fa6d650e24dbdba55bcef9581535244') with the new one ('6dc2a1becfd1316eb4d77240133a548e93dbff63')
Force-pushed from cdcbfdf to 77e6c93
I believe this is ok now. Sorry for the mess.
Thanks; one thing I forgot: can you please update
You can see the progress on the Community-TC link 😊 |
Nice :D |
macOS CI was a bit of a burden (it always is), but it's green in the end. I'm going to merge your TensorFlow part, then re-run this PR with the new sha1 and take care of the rest.
PR that allows the use of layer-norm in the model. For our experiments, it allows for training more epochs without overfitting.
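As an illustration of the kind of change involved (the actual diff is in the PR; the `use_layer_norm` switch below is a hypothetical stand-in for the command-line argument mentioned above):

```python
import tensorflow.compat.v1 as tf

def dense_block(x, units, use_layer_norm=False, name=None):
    # Dense layer optionally followed by layer normalization;
    # `use_layer_norm` mirrors the PR's opt-in flag so old configurations
    # keep the original architecture.
    h = tf.keras.layers.Dense(units, name=name)(x)
    if use_layer_norm:
        h = tf.keras.layers.LayerNormalization()(h)
    return tf.nn.relu(h)
```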
PS: I'll be creating a PR in your tensorflow repository, as layer-norm uses some dependencies that are not included in your build rule. Nonetheless, I'm sending the new rule here (which was found in `tensorflow/core/kernels/BUILD`). When compiling the 0.6.1 version, the following rule worked just fine. I'll be compiling the binaries on `master` to check if it still works.