Commit 5357735

Add sensitive language availability privacy consideration
1 parent 74630d6 commit 5357735

1 file changed, +11 -1 lines changed

index.bs

Lines changed: 11 additions & 1 deletion
@@ -992,6 +992,8 @@ enum RewriterLength { "as-is", "shorter", "longer" };
 <p>This prevents the web developer-perceived progress from suddenly jumping from 0% to 90%, and then taking a long time to go from 90% to 100%. It also provides some protection against the (admittedly not very powerful) fingerprinting vector of measuring the current download progress across multiple sites.
 </div>
 
+If the actual number of bytes necessary to download is 0, but the user agent is faking a download for the reasons described in [[#privacy]], then set this number to an [=implementation-defined=] value that helps with the download faking.
+
 1. Let |lastProgressFraction| be 0.
 
 1. Let |lastProgressTime| be the [=monotonic clock=]'s [=monotonic clock/unsafe current time=].
@@ -1006,7 +1008,7 @@ enum RewriterLength { "as-is", "shorter", "longer" };
 
 1. Abort these steps.
 
-1. Let |bytesSoFar| be the number of bytes downloaded so far.
+1. Let |bytesSoFar| be the number of bytes downloaded so far. (Or the number of bytes fake-downloaded so far, if the user agent is faking the download.)
 
 1. [=Assert=]: |bytesSoFar| is greater than or equal to 0, and less than or equal to |totalBytes|.
 
@@ -1566,6 +1568,14 @@ A slight variant of this is to re-download the model every time it is requested
 
 Going further, a user agent could attempt to fake the download for new [=storage keys=] by just waiting for a similar amount of time as the real download originally took. This then only spends the user's time, sparing their bandwidth and disk space. However, this is less private than the above alternatives, due to the presence of network side channels. For example, a web page could attempt to detect the fake downloads by issuing network requests concurrent to the `create()` call, and noting that there is no change to network throughput. The scheme of remembering the time the real download originally took can also be dangerous, as the first site to initiate the download could attempt to artificially inflate this time (using concurrent network requests) in order to communicate information to other sites that will initiate a fake download in the future, from which they can read the time taken. Nevertheless, something along these lines might be useful in some cases, implemented with caution and combined with other mitigations.
 
+<h3 id="privacy-language-availability">Sensitive language availability</h3>
+
+Even if the user agent mitigates most of the fingerprinting risks associated with the availability of AI models per [[#privacy-availability]], such that probing availability requires a destructive action per [[#privacy-availability-creation]], the information about download availabilities for different languages can still be a privacy risk beyond fingerprinting. This is most obvious in the case of the translator API, where, for example, knowing that the user has downloaded a translator from English to a minority language might be sensitive information. But it can apply just as well to other APIs, via options such as their expected input languages, which might be implemented using downloadable fine-tunings with variable availability.
+
+For this reason, on top of the creation-time mitigations discussed in [[#privacy-availability-creation]], <strong>user agents may artificially fake a download if they believe it would be helpful for privacy reasons</strong>, instead of instantly creating the model. This is *not* a fingerprinting mitigation, but instead provides some degree of plausible deniability for the user, such that web pages cannot be certain of the user's demographic information. If the web page sees model object creation taking 2–3 seconds and emitting {{CreateMonitor/downloadprogress}} events, then perhaps this is a fake download due to the user previously downloading a translator for that minority language, or perhaps it is a real download that completed quickly.
+
+As discussed in [[#privacy-availability-alternatives]], such fake downloads are not foolproof, and a determined web page could attempt to detect them. However, they do provide some privacy benefit, and can be combined with other mitigations (such as prompts) to provide a more robust defense, and to make such demographic probing impractically unreliable for attackers.
+
 <h3 id="privacy-model-version">Model version</h3>
 
 Separate from the availability of a model, the specific version or behavior of a model can also be a fingerprinting vector.
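
Editor's note, not part of the commit: the fake-download bookkeeping this change describes can be illustrated with a minimal sketch. It assumes a user agent that picks an implementation-defined fake byte total and advances a fake `bytesSoFar` counter in fixed steps. The function name `fakeProgressFractions` and its parameters are hypothetical and do not come from the specification. The sketch only computes the sequence of progress fractions and says nothing about timing or event dispatch.

```typescript
// Hypothetical helper, NOT defined by the specification: simulates the
// progress fractions a user agent might derive while faking a download.
// `fakeTotalBytes` stands in for the implementation-defined total used when
// the real number of bytes to download is 0; `bytesPerTick` is an assumed
// fixed amount of fake progress per reporting step.
function fakeProgressFractions(
  fakeTotalBytes: number,
  bytesPerTick: number,
): number[] {
  if (!Number.isFinite(fakeTotalBytes) || !(fakeTotalBytes > 0)) {
    throw new RangeError("fakeTotalBytes must be a positive finite number");
  }
  if (!(bytesPerTick > 0)) {
    throw new RangeError("bytesPerTick must be positive");
  }

  const fractions: number[] = [0]; // progress always starts at 0
  let bytesSoFar = 0;
  while (bytesSoFar < fakeTotalBytes) {
    // Clamp so that 0 <= bytesSoFar <= fakeTotalBytes always holds,
    // mirroring the assertion in the progress-monitoring steps.
    bytesSoFar = Math.min(bytesSoFar + bytesPerTick, fakeTotalBytes);
    fractions.push(bytesSoFar / fakeTotalBytes);
  }
  return fractions; // non-decreasing, ends at exactly 1
}
```

A real implementation would also need to pace these steps over a plausible duration. As the privacy text above notes, a determined page may still be able to detect the fake through network side channels.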
