-
Notifications
You must be signed in to change notification settings - Fork 132
Patch for loading config locally #126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThe update changes how model configurations and weights are loaded in the Changes
Poem
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
detoxify/detoxify.py (1)
22-22: Slightly exceed the line length limit.The line exceeds the 120-character limit (129 > 120) as flagged by the static analysis tool.
Consider breaking this line into multiple lines for better readability:
- config = model_class.config_class.from_pretrained(huggingface_config_path, num_labels=num_classes, local_files_only=True) + config = model_class.config_class.from_pretrained( + huggingface_config_path, + num_labels=num_classes, + local_files_only=True + )🧰 Tools
🪛 Ruff (0.8.2)
22-22: Line too long (129 > 120)
(E501)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
detoxify/detoxify.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
detoxify/detoxify.py
22-22: Line too long (129 > 120)
(E501)
🔇 Additional comments (2)
detoxify/detoxify.py (2)
21-24: LGTM - Good implementation to solve the network restriction issue.This change correctly implements the solution for users in restricted network environments (like China) by loading the configuration directly from the specified local directory when
huggingface_config_pathis provided. This prevents the timeout issues caused by network restrictions.🧰 Tools
🪛 Ruff (0.8.2)
22-22: Line too long (129 > 120)
(E501)
27-27: LGTM - Consistent with the new configuration loading approach.This change correctly uses the explicitly loaded configuration object instead of the previous conditional approach, which maintains consistency with the new configuration loading strategy.
| config=huggingface_config_path or config, | ||
| config=config, | ||
| state_dict=state_dict, | ||
| local_files_only=huggingface_config_path is not None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify this works in restricted network environments.
The implementation correctly sets local_files_only=True based on the presence of huggingface_config_path for both model and tokenizer loading. This should theoretically solve the issue in China. Let's verify that these changes completely prevent any network requests to Hugging Face servers when a local config path is provided.
Also applies to: 32-34
🏁 Script executed:
#!/bin/bash
# Search for any other potential network requests in the codebase that might still occur
# even with local_files_only=True
echo "Checking for other potential network requests in transformers usage:"
rg -n "from_pretrained" --no-filename | grep -v "local_files_only" | sort
echo "Checking if there are any direct URLs being accessed elsewhere in the code:"
rg -n "https://" --no-filename | sortLength of output: 4937
Add local_files_only to all from_pretrained calls
The following calls still omit the local_files_only flag and may trigger network requests even when a local config path is provided. Please update them to pass
local_files_only=(huggingface_config_path is not None):
• detoxify/detoxify.py
- Line 24:
config = model_class.config_class.from_pretrained( model_type, num_labels=num_classes,
-
)
local_files_only=(huggingface_config_path is not None),- Line 25–27: ```python
- model = model_class.from_pretrained(
-
model_type, -
num_labels=num_classes, - )
- model = model_class.from_pretrained(
-
model_type, -
num_labels=num_classes, -
local_files_only=(huggingface_config_path is not None), - )
```python
- tokenizer = getattr(transformers, tokenizer_name).from_pretrained(model_type)
- tokenizer = getattr(transformers, tokenizer_name).from_pretrained(
-
model_type, -
local_files_only=(huggingface_config_path is not None), - )
- Line 31–34 (second tokenizer load): apply the same change.
Once these calls include local_files_only, no network requests will be made if huggingface_config_path is set.
Issue:
We were facing an issue when using detoxify in China where it would try to download the config from Hugging Face (and running into timeout issues with the firewall) even when the huggingface_config_path was specified. This fix should address that.
Solution
Load config locally from huggingface_config_path (local directory) if provided.
Summary by CodeRabbit