add cpu|mps support #88
Conversation
…s. I also discovered what I think is a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
…rors still, so wip
ok but it sounds like it's unrelated to this PR or running on mac. the only thing that is mostly preventing me from merging this branch to master is the toml issue i think. looking...
Add mps and cpu dependency management
@burtenshaw thanks but i get an error with this (doing uv sync on my macbook) I think it's time I spend some quality time with uv docs.
Co-authored-by: Tancrède Lepoint <[email protected]>
@karpathy Sorry! Then it could just be as simple as limiting the cuda index to only linux:
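A sketch of what that could look like in pyproject.toml, using uv's documented support for environment markers on sources; the index name and CUDA version here are illustrative, not necessarily what nanochat pins:

# Pull torch from the CUDA index only on Linux; other platforms
# fall back to the default (CPU/MPS) wheels from PyPI.
[tool.uv.sources]
torch = [
    { index = "pytorch-cuda", marker = "sys_platform == 'linux'" },
]

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true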
This should still be checked with gpu and cpu configs to see if it throws errors.
for i in range(needed_tokens):
    scratch[i] = token_buffer.popleft()
# Create the inputs/targets as 1D tensors
inputs_cpu = scratch[:-1].to(dtype=torch.int32)
FYI: I'm getting the following error at L42
!self.is_mps() INTERNAL ASSERT FAILED at "/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp":1414, please report a bug to PyTorch. as_strided_tensorimpl does not work with MPS; call self.as_strided(...) instead
To fix it, change L41-42 to
tokens = [token_buffer.popleft() for _ in range(needed_tokens)]
scratch = torch.tensor(tokens, dtype=torch.int64, pin_memory=device.type == "cuda")
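For reference, a self-contained sketch of the fixed pattern; the deque stand-in, the device line, and the surrounding setup are my additions for illustration, and only the two lines above come from the suggestion:

from collections import deque
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
token_buffer = deque(range(100))  # stand-in for the real token stream
needed_tokens = 9

# Build the scratch tensor in one shot on the CPU; pinned memory is a
# CUDA-only concept, so it is only requested when targeting CUDA.
tokens = [token_buffer.popleft() for _ in range(needed_tokens)]
scratch = torch.tensor(tokens, dtype=torch.int64, pin_memory=device.type == "cuda")

# Create the inputs/targets as 1D tensors
inputs_cpu = scratch[:-1].to(dtype=torch.int32)

This avoids preallocating and filling the tensor element by element, which appears to be the path that tripped the MPS as_strided assert above.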
@karpathy - I successfully got this running on a Windows machine (CPU-only). I encountered a few issues during setup, which I have detailed below along with their resolutions. Getting the nanochat-d32 model running on Windows CPU wasn't straightforward, but the journey was incredibly educational! Issues Encountered:
Attaching a screenshot of the nanochat application running:
yes, this is working on macOS with an M2
…various minor other changes, e.g. changing max_iterations to num_iterations in the sft script for consistency in naming
@karpathy Instead of assuming GPU support for Linux users, a cleaner way would be to make the device-specific package index for Torch an "extra", switching to the extra as my PR showed. Adds index: if the device-specific indexes are all autodetected, it could "just work".
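A hedged sketch of that extras approach, following uv's documented PyTorch workflow; the extra names and index URLs are illustrative, not the exact lines from that PR:

[project.optional-dependencies]
cpu = ["torch"]
gpu = ["torch"]

[tool.uv]
# The two flavors are mutually exclusive
conflicts = [[{ extra = "cpu" }, { extra = "gpu" }]]

[tool.uv.sources]
torch = [
    { index = "pytorch-cpu", extra = "cpu" },
    { index = "pytorch-cuda", extra = "gpu" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

Users would then pick a flavor explicitly, e.g. uv sync --extra cpu or uv sync --extra gpu.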
… flagship build which is linux. sorry to pollute the repo history...


WIP, allowing people to run the code either on CPU (any potato) or MPS (Macbook GPUs)
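On the runtime side, the device can be autodetected rather than assumed; a minimal sketch using standard torch APIs (the helper name is mine, not necessarily what this PR uses):

import torch

def autodetect_device() -> torch.device:
    # Prefer CUDA, then Apple-silicon MPS, then fall back to plain CPU
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = autodetect_device()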
Atm struggling a bit to figure out how to adjust the pyproject.toml to switch pytorch to the basic version on demand. Current workaround is to delete these lines from the pyproject.toml: