
Conversation

@wonjoo-wj (Collaborator) commented Jan 22, 2024

Fixes #5896, fixes #5867, fixes #5884, fixes #5889

@wonjoo-wj wonjoo-wj requested a review from ManfeiBai January 22, 2024 06:43
@ManfeiBai (Collaborator) left a comment


LGTM

@wonjoo-wj changed the title from "Fix sore core aten ops" to "Fix some more core aten ops" on Jan 22, 2024
@wonjoo-wj force-pushed the wonjoo/core-aten-ops-week-6 branch from ae77bfa to 34786e8 on January 22, 2024 20:23
@wonjoo-wj force-pushed the wonjoo/core-aten-ops-week-6 branch from 34786e8 to 0478d2f on January 23, 2024 08:32
@wonjoo-wj merged commit 99a1341 into master on Jan 23, 2024
@cota (Collaborator) commented Jan 24, 2024

I've bisected a large number of failures (all torchbench inference on XLA:GPU) down to this commit.

Some example failures:

```
INFO:__main__:Run with --model-config={"model_name": "BERT_pytorch"} --experiment-config={"accelerator": "cuda", "xla": "PJRT", "xla_flags": null, "dynamo": "openxla", "test": "train"}
ERROR:torchbench_model:Cannot load benchmark model
Traceback (most recent call last):
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 288, in default_precision_flag
    benchmark = self.load_benchmark()
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 267, in load_benchmark
    return benchmark_cls(
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/BERT_pytorch/__init__.py", line 148, in __init__
    trainer = BERTTrainer(bert, len(vocab), train_dataloader=train_data_loader, test_dataloader=test_data_loader,
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/BERT_pytorch/bert_pytorch/trainer/pretrain.py", line 38, in __init__
    self.device = torch.device(device)
TypeError: device() received an invalid combination of arguments - got (bool), but expected one of:
 * (torch.device device)
      didn't match because some of the arguments have invalid types: (!bool!)
 * (str type, int index)
 * 
INFO:__main__:Run with --model-config={"model_name": "Background_Matting"} --experiment-config={"accelerator": "cuda", "xla": "PJRT", "xla_flags": null, "dynamo": "openxla", "test": "eval"}
ERROR:torchbench_model:Cannot load benchmark model
Traceback (most recent call last):
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 288, in default_precision_flag
    benchmark = self.load_benchmark()
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/xla/benchmarks/torchbench_model.py", line 267, in load_benchmark
    return benchmark_cls(
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/ecg/nightly_runs/2024-01-24/benchmark/torchbenchmark/models/Background_Matting/__init__.py", line 72, in __init__
    netB.to(self.device)
  File "/home/ecg/nightly_runs/2024-01-24/pytorch/torch/nn/modules/module.py", line 1137, in to
    raise TypeError('nn.Module.to only accepts floating point or complex '
TypeError: nn.Module.to only accepts floating point or complex dtypes, but got desired dtype=torch.bool
```

Does this ring any bells?
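
For reference, both TypeErrors in the logs reproduce in isolation on plain CPU PyTorch; a minimal sketch (no torch_xla needed) showing that each error fires whenever a bool reaches a device or dtype argument:

```python
import torch

# Passing a bool where a device spec is expected raises the first TypeError.
try:
    torch.device(True)
except TypeError as e:
    print(e)  # device() received an invalid combination of arguments - got (bool) ...

# Passing a bool dtype to nn.Module.to raises the second TypeError.
try:
    torch.nn.Linear(2, 2).to(torch.bool)
except TypeError as e:
    print(e)  # nn.Module.to only accepts floating point or complex dtypes ...
```

This suggests some upstream value (a device or dtype) is being computed as a boolean before it reaches these call sites.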

@wonjoo-wj (Collaborator, Author) commented
Thanks for catching this. It's hard to identify the offending op just by looking at the trace, but this PR basically only touches two ops -- aten::reciprocal and aten::sigmoid. Let me revert this PR's changes to these two ops for now and investigate.
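
A minimal check of the two touched ops on an XLA device could look like the sketch below (assuming a working torch_xla install; this is illustrative, not part of the PR):

```python
import torch
import torch_xla.core.xla_model as xm

x = torch.rand(4, device=xm.xla_device())

# If either lowering regressed, the result dtype (or values) would change here.
assert torch.reciprocal(x).dtype == torch.float32
assert torch.sigmoid(x).dtype == torch.float32
```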

wonjoo-wj added a commit that referenced this pull request Jan 25, 2024
@wonjoo-wj (Collaborator, Author) commented Jan 25, 2024

Reading the errors, they complain that a boolean is being passed to the .device() and .to() methods. At a quick look, the errors seem unrelated to this PR's changes, but let me continue investigating.

@cota, is there something that describes the setup I would need to repro this (running torchbench) on GPU?
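
From the INFO lines in the report above, one failing configuration can be reconstructed roughly as follows (a sketch: the runner script path is an assumption, while the --model-config/--experiment-config values are copied verbatim from the log):

```python
import subprocess

subprocess.run([
    "python", "xla/benchmarks/experiment_runner.py",  # assumed runner script
    '--model-config={"model_name": "BERT_pytorch"}',
    '--experiment-config={"accelerator": "cuda", "xla": "PJRT", '
    '"xla_flags": null, "dynamo": "openxla", "test": "train"}',
], check=True)
```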

cota added a commit to cota/pytorch-xla that referenced this pull request Jan 26, 2024
This reverts commit 4ab7a24.

Turns out that the revert was unnecessary; things broke from a different commit. This reverts the revert, i.e. it reinstates pytorch#6342.
@cota (Collaborator) commented Jan 26, 2024

@wonjoolee95 I redid the bisection, paying more attention this time. It turns out that the problem was introduced in a prior commit, not in this PR. My apologies! :(
Things are now working on master, and I have confirmed that reinstating this PR still works.
I've sent #6387 to reapply this change.

bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
ysiraichi added a commit that referenced this pull request May 22, 2025
- `SgnOp` and `SignOp`
    - Full codegen migration: #3577
    - Mistakenly re-introduced: #3572
- `LogSigmoid`
    - Introduced: #3539
    - Full codegen migration: #3743
- `SiLU`
    - Introduced: #2721
    - Full codegen migration: #3780
- `SiLUBackward`
    - Introduced: #3195
    - Full codegen migration: #3780
- `SeLU`
    - Introduced: #3547
    - Full codegen migration: #3780
- `Sigmoid`
    - Introduced: 6a73deb (no PR record)
    - Full codegen migration: #6342
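
A hedged smoke test covering the ops listed above might look like the sketch below (assuming a torch_xla install; it is not the project's actual test suite):

```python
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(8, device=device, requires_grad=True)

# Forward lowerings: sgn, sign, log_sigmoid, silu, selu, sigmoid.
for fn in (torch.sgn, torch.sign, F.logsigmoid, F.silu, F.selu, torch.sigmoid):
    out = fn(x)
    assert out.dtype == x.dtype, fn.__name__

# SiLUBackward is exercised through autograd.
F.silu(x).sum().backward()
assert x.grad is not None
```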