
Conversation

@wonjoo-wj (Collaborator) commented Jan 22, 2024

Fixes #5896, fixes #5867, fixes #5884, fixes #5889

@wonjoo-wj wonjoo-wj requested a review from ManfeiBai January 22, 2024 06:43
@ManfeiBai (Collaborator) left a comment


LGTM

@wonjoo-wj changed the title from "Fix sore core aten ops" to "Fix some more core aten ops" on Jan 22, 2024
@wonjoo-wj force-pushed the wonjoo/core-aten-ops-week-6 branch from ae77bfa to 34786e8 on January 22, 2024 20:23
@wonjoo-wj force-pushed the wonjoo/core-aten-ops-week-6 branch from 34786e8 to 0478d2f on January 23, 2024 08:32
@wonjoo-wj merged commit 99a1341 into master on Jan 23, 2024
@cota (Collaborator) commented Jan 24, 2024

I've bisected a large number of failures (all torchbench inference on XLA:GPU) down to this commit.

Some example failures:

```
INFO:__main__:Run with --model-config={"model_name": "BERT_pytorch"} --experiment-config={"accelerator": "cuda", "xla": "PJRT", "xla_flags": null, "dynamo": "openxla", "test": "train"}
ERROR:torchbench_model:Cannot load benchmark model
Traceback (most recent call last):
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 288, in default_precision_flag
    benchmark = self.load_benchmark()
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 267, in load_benchmark
    return benchmark_cls(
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/BERT_pytorch/__init__.py", line 148, in __init__
    trainer = BERTTrainer(bert, len(vocab), train_dataloader=train_data_loader, test_dataloader=test_data_loader,
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/BERT_pytorch/bert_pytorch/trainer/pretrain.py", line 38, in __init__
    self.device = torch.device(device)
TypeError: device() received an invalid combination of arguments - got (bool), but expected one of:
 * (torch.device device)
      didn't match because some of the arguments have invalid types: (!bool!)
 * (str type, int index)
 * 
INFO:__main__:Run with --model-config={"model_name": "Background_Matting"} --experiment-config={"accelerator": "cuda", "xla": "PJRT", "xla_flags": null, "dynamo": "openxla", "test": "eval"}
ERROR:torchbench_model:Cannot load benchmark model
Traceback (most recent call last):
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 288, in default_precision_flag
    benchmark = self.load_benchmark()
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 267, in load_benchmark
    return benchmark_cls(
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/Background_Matting/__init__.py", line 72, in __init__
    netB.to(self.device)
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/torch/nn/modules/module.py", line 1137, in to
    raise TypeError('nn.Module.to only accepts floating point or complex '
TypeError: nn.Module.to only accepts floating point or complex dtypes, but got desired dtype=torch.bool
```

Does this ring any bells?
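
For reference, both TypeErrors in the logs reproduce in isolation on plain CPU PyTorch; a minimal sketch (no torch_xla needed) showing that each error fires whenever a bool reaches a device or dtype argument:

```python
import torch

# Passing a bool where a device spec is expected raises the first TypeError.
try:
    torch.device(True)
except TypeError as e:
    print(e)  # device() received an invalid combination of arguments - got (bool) ...

# Passing a bool dtype to nn.Module.to raises the second TypeError.
try:
    torch.nn.Linear(2, 2).to(torch.bool)
except TypeError as e:
    print(e)  # nn.Module.to only accepts floating point or complex dtypes ...
```

This suggests some upstream value (a device or dtype) is being computed as a boolean before it reaches these call sites.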

@wonjoo-wj (Collaborator, Author) commented
Thanks for catching this. It's hard to identify the offending op just by looking at the trace, but this PR basically only touches two ops -- aten::reciprocal and aten::sigmoid. Let me revert this PR's changes to these two ops for now and investigate.
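
A minimal check of the two touched ops on an XLA device could look like the sketch below (assuming a working torch_xla install; this is illustrative, not part of the PR):

```python
import torch
import torch_xla.core.xla_model as xm

x = torch.rand(4, device=xm.xla_device())

# If either lowering regressed, the result dtype (or values) would change here.
assert torch.reciprocal(x).dtype == torch.float32
assert torch.sigmoid(x).dtype == torch.float32
```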

wonjoo-wj added a commit that referenced this pull request Jan 25, 2024
@wonjoo-wj (Collaborator, Author) commented Jan 25, 2024

Reading the errors, they complain that a boolean is being passed to the .device() and .to() methods. At a quick look, the errors seem unrelated to this PR's changes, but let me continue investigating.

@cota, is there something that describes the setup I would need to repro this (running torchbench) on GPU?
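
From the INFO lines in the report above, one failing configuration can be reconstructed roughly as follows (a sketch: the runner script path is an assumption, while the --model-config/--experiment-config values are copied verbatim from the log):

```python
import subprocess

subprocess.run([
    "python", "xla/benchmarks/experiment_runner.py",  # assumed runner script
    '--model-config={"model_name": "BERT_pytorch"}',
    '--experiment-config={"accelerator": "cuda", "xla": "PJRT", '
    '"xla_flags": null, "dynamo": "openxla", "test": "train"}',
], check=True)
```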

cota added a commit to cota/pytorch-xla that referenced this pull request Jan 26, 2024
This reverts commit 4ab7a24.

Turns out that the revert was unnecessary; things broke from a different commit. This reverts the revert, i.e. it reinstates pytorch#6342.
@cota (Collaborator) commented Jan 26, 2024

@wonjoolee95 I redid the bisection, paying more attention this time. It turns out that the problem was introduced in a prior commit, not in this PR. My apologies! :(
Things are now working on master, and I have confirmed that reinstating this PR still works.
I've sent #6387 to reapply this change.

bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
ysiraichi added a commit that referenced this pull request May 22, 2025
- `SgnOp` and `SignOp`
    - Full codegen migration: #3577
    - Mistakenly re-introduced: #3572
- `LogSigmoid`
    - Introduced: #3539
    - Full codegen migration: #3743
- `SiLU`
    - Introduced: #2721
    - Full codegen migration: #3780
- `SiLUBackward`
    - Introduced: #3195
    - Full codegen migration: #3780
- `SeLU`
    - Introduced: #3547
    - Full codegen migration: #3780
- `Sigmoid`
    - Introduced: 6a73deb (no PR record)
    - Full codegen migration: #6342
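
A hedged smoke test covering the ops listed above might look like the sketch below (assuming a torch_xla install; it is not the project's actual test suite):

```python
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(8, device=device, requires_grad=True)

# Forward lowerings: sgn, sign, log_sigmoid, silu, selu, sigmoid.
for fn in (torch.sgn, torch.sign, F.logsigmoid, F.silu, F.selu, torch.sigmoid):
    out = fn(x)
    assert out.dtype == x.dtype, fn.__name__

# SiLUBackward is exercised through autograd.
F.silu(x).sum().backward()
assert x.grad is not None
```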