intermediate_source/FSDP_tutorial.rst (1 addition, 1 deletion)
@@ -73,7 +73,7 @@ Model Initialization
# )

We can inspect the nested wrapping with ``print(model)``. ``FSDPTransformer`` is a joint class of `Transformer <https://github.com/pytorch/examples/blob/70922969e70218458d2a945bf86fd8cc967fc6ea/distributed/FSDP2/model.py#L100>`_ and `FSDPModule
- <https://docs.pytorch.org/docs/main/distributed.fsdp.fully_shard.html#torch.distributed.fsdp.FSDPModule>`_. The same thing happens to `FSDPTransformerBlock <https://github.com/pytorch/examples/blob/70922969e70218458d2a945bf86fd8cc967fc6ea/distributed/FSDP2/model.py#L76C7-L76C18>`_. All FSDP2 public APIs are exposed through ``FSDPModule``. For example, users can call ``model.unshard()`` to manually control all-gather schedules. See "explicit prefetching" below for details.
+ <https://docs.pytorch.org/docs/main/distributed.fsdp.fully_shard.html#torch.distributed.fsdp.FSDPModule>`_. The same thing happens to `FSDPTransformerBlock <https://github.com/pytorch/examples/blob/70922969e70218458d2a945bf86fd8cc967fc6ea/distributed/FSDP2/model.py#L76C7-L76C18>`_. All FSDP2 public APIs are exposed through ``FSDPModule``. For example, users can call ``model.unshard()`` to manually control all-gather schedules. See "explicit prefetching" below for details.
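As a rough sketch of how the ``FSDPModule`` APIs surface on the wrapped model (assuming ``model`` is the sharded ``Transformer`` from the example above and the process group is already initialized; ``reshard()`` is the companion call to ``unshard()``):

.. code-block:: python

    from torch.distributed.fsdp import FSDPModule

    # fully_shard swaps the module's class for a joint subclass that also
    # inherits from FSDPModule, so both checks below hold.
    print(model)                          # shows FSDPTransformer with nested FSDPTransformerBlock modules
    assert isinstance(model, FSDPModule)

    # Manually trigger the all-gather for this module's parameters
    # instead of waiting for the default schedule during forward.
    model.unshard()
    # ... work with the unsharded parameters ...
    model.reshard()                       # free the unsharded parameters, back to shards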
**model.parameters() as DTensor**: ``fully_shard`` shards parameters across ranks and converts ``model.parameters()`` from plain ``torch.Tensor`` to ``DTensor`` to represent sharded parameters. FSDP2 shards on dim-0 by default, so the DTensor placements are ``Shard(dim=0)``. Say we have N ranks and a parameter with N rows before sharding; after sharding, each rank holds 1 row of the parameter. We can inspect sharded parameters using ``param.to_local()``.
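A minimal sketch of inspecting the sharded parameters, assuming ``model`` has already been wrapped with ``fully_shard`` across N ranks and a recent PyTorch where ``torch.distributed.tensor`` is the public DTensor module:

.. code-block:: python

    from torch.distributed.tensor import DTensor, Shard

    for name, param in model.named_parameters():
        assert isinstance(param, DTensor)
        print(name, param.placements)        # e.g. (Shard(dim=0),)
        local = param.to_local()             # this rank's shard as a plain torch.Tensor
        print(name, tuple(param.shape), "->", tuple(local.shape))  # global shape vs. per-rank shard shape
        break                                # inspect just the first parameter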