-
Notifications
You must be signed in to change notification settings - Fork 548
chore(types): Type-clean embeddings/ (25 errors) #1383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
nemoguardrails/embeddings/basic.py
Outdated
|
||
def _init_model(self): | ||
"""Initialize the model used for computing the embeddings.""" | ||
# Provide defaults if not specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move the defaults to constructor? at line 52
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's cleaner, good idea
nemoguardrails/embeddings/basic.py
Outdated
if not self._current_batch_finished_event: | ||
raise Exception("self._current_batch_finished_event not initialized") | ||
|
||
assert self._current_batch_finished_event is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this redundant? also it must not use assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, removed this line
nemoguardrails/embeddings/cache.py
Outdated
return EmbeddingsCacheConfig( | ||
key_generator=self._key_generator.name, | ||
store=self._cache_store.name, | ||
key_generator=self._key_generator.name if self._key_generator else "sha256", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i find defining defaults here problematic. What Pyright did not like about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already is a default here in EmbeddingsCacheConfig
:
key_generator: str = Field(
default="sha256",
description="The method to use for generating the cache keys.",
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree this isn't good. The EmbeddingsCacheConfig Pydantic model already has default values for key_generator
, store
, and store_config
. I added an Optional
qualifier to all three and pass None in the constructor to pick up these defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some redundancies in default values defined in different files, and also redundant checks for None
. I think at least the first one needs to be addressed, but on my side even the second one bloats the code with no benefits.
nemoguardrails/embeddings/basic.py
Outdated
def _init_model(self): | ||
"""Initialize the model used for computing the embeddings.""" | ||
# Provide defaults if not specified | ||
model = self.embedding_model or "sentence-transformers/all-MiniLM-L6-v2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some defaults in llmrails.py
therefore in "normal" usage these are never None
. We should at least the same defaults (e.g. FastEmbed
):
On my side, there are some type checking errors that might happen (for example, having None
here) by just using static analysis tools, but the actual "normal" usage flow in Guardrails makes it never happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed these defaults
nemoguardrails/embeddings/basic.py
Outdated
if self._model is None: | ||
self._init_model() | ||
|
||
if not self._model: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already throwing an ValueError
in _init_model
if an error when initializing the model. Does this make sense to throw another one here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, removed this check and added a cast() to make it clear to Pyright that self._model can't be None at this point
|
||
# We check if we reached the max batch size | ||
if len(self._req_queue) >= self.max_batch_size: | ||
if not self._current_batch_full_event: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this is redundant, self._current_batch_full_event
cannot be None
here as per earlier check and assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code statically, self._current_batch_full_event can be None if self._current_batch_finished_event is not None. Is there something about the code that prevents this from happeneing?
nemoguardrails/embeddings/cache.py
Outdated
return EmbeddingsCacheConfig( | ||
key_generator=self._key_generator.name, | ||
store=self._cache_store.name, | ||
key_generator=self._key_generator.name if self._key_generator else "sha256", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already is a default here in EmbeddingsCacheConfig
:
key_generator: str = Field(
default="sha256",
description="The method to use for generating the cache keys.",
)
Converting to draft while I rebase on the latest changes to develop. |
7fa0c6f
to
1b6eaab
Compare
Documentation preview |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Requesting review after rebasing on top of develop and addressing feedback in comments from Pouyan and Traian. cc @Pouyanpi , @trebedea , @cparisien |
Description
Cleaned the nemoguardrails/embeddings directory using Pyright.
This report summarizes the type-safety fixes applied to the
nemoguardrails
embeddings module. The changes are categorized into medium and low-risk buckets based on their potential to impact existing functionality.🟡 Medium-Risk Changes
These changes involve refactoring class initialization and adding explicit null-safety checks in critical code paths. While they improve robustness, they alter error handling and make assumptions about default behavior, which introduces a moderate risk of unintended side effects.
1. Refactoring
BasicEmbeddingsIndex
for Type SafetyThe
BasicEmbeddingsIndex
class was refactored to properly initialize instance attributes and handle potentially uninitialized async components, preventingAttributeError
at runtime.nemoguardrails/embeddings/basic.py
AttributeError
if async methods like_run_batch
were called before certain event objects were created. Class attributes were also implicitly treated as instance attributes, leading to poor type inference.__init__
. In critical async methods, explicit checks were added to ensure event loop objects (_current_batch_full_event
, etc.) are notNone
before they are accessed, raising a descriptiveException
instead of a genericAttributeError
. The_init_model
method now provides sensible defaults if a model or engine is not specified.None
when accessed, it represents an unrecoverable state error, justifying an exception.Exception
, a custom exception type could have been used for more specific error handling. However, the current fix is sufficient to prevent the runtime crash and clearly signals the programming error.2. Making
EmbeddingsCache
Robust AgainstNone
The
EmbeddingsCache
class was hardened to preventAttributeError
when its core components (_key_generator
,_cache_store
) are not configured.nemoguardrails/embeddings/cache.py
TypeError
orAttributeError
would occur ifget
orset
were called on a cache instance that was not fully configured (e.g.,_cache_store
wasNone
).get
andset
methods now begin with a guard clause that checks if the key generator and cache store have been initialized. If not, they exit early, withget
returningNone
andset
doing nothing. This makes the cache's behavior more predictable when it's disabled or misconfigured.ConfigurationError
if the methods are called on an uninitialized cache. The current implementation prioritizes graceful degradation over strictness, which is a reasonable choice for an optional component like a cache.🟢 Low-Risk Changes
These fixes are minor, defensive additions that silence type-checker warnings and prevent simple errors without altering program logic. They are highly unlikely to introduce any new bugs.
1. Ignoring Missing Type Stubs in Third-Party Libraries
Type checkers were reporting errors for popular libraries that do not ship with type stubs. These have been silenced using
# type: ignore
.embeddings/basic.py
,embeddings/providers/fastembed.py
,embeddings/providers/openai.py
,embeddings/providers/sentence_transformers.py
annoy
,fastembed
,openai
,torch
, andsentence_transformers
as missing type information.# type: ignore
to the import statements instructs static type checkers to skip validation for these lines. This has no effect on the runtime behavior of the code and is the standard way to handle dependencies that are not yet fully typed..pyi
) for these libraries, which would be a significant and unnecessary effort.2. Adding
Optional
Types and DefensiveNone
ChecksFunction signatures were updated with
Optional
to accurately reflect thatNone
is a valid input. Defensivehasattr
andis None
checks were added to prevent errors.nemoguardrails/embeddings/cache.py
AttributeError
when accessing the.name
attribute on a subclass that might not have defined it, orTypeError
when passingNone
to functions not expecting it.from_name
factory methods now usehasattr(subclass, 'name')
before attempting to access thename
attribute, making them more robust to incorrectly defined subclasses. Additionally, theredis
import is now handled lazily, with a check inRedisCacheStore
that provides a clear error message if the library is not installed.Test Plan
Type-checking
$ poetry run pre-commit run --all-files check yaml...............................................................Passed fix end of files.........................................................Passed trim trailing whitespace.................................................Passed isort (python)...........................................................Passed black....................................................................Passed Insert license in comments...............................................Passed pyright..................................................................Passed
Unit-tests
Local CLI check
Checklist