should allow a 3PID to be mapped to a Matrix user identity, but not in the other
direction (i.e. one should not be able to get all 3PIDs associated with a Matrix
user ID, or get all 3PIDs associated with a 3PID).

Version 1 API deprecation
-------------------------

.. TODO: Remove this section when the v1 API is removed.

As described on each of the version 1 endpoints, the v1 API is deprecated in
favour of the v2 API described here. The major difference, with the exception
of a few isolated cases, is that the v2 API requires authentication to ensure
the user has given permission for the identity server to operate on their data.

The v1 API is planned to be removed from the specification in a future version.

Clients SHOULD attempt the v2 endpoints first, and if they receive a ``404``,
``400``, or similar error, they should either try the v1 endpoint or fail the
operation. Clients are strongly encouraged to warn the user of the risks of using
the v1 API if they plan to use it.

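The fallback rule above can be sketched as follows. This is a minimal illustration, not a reference client, and it rests on several assumptions. The ``send`` transport and the ``IdentityRequestError`` type are hypothetical. Treating exactly ``400`` and ``404`` as "no v2 support" is also an assumption, since the text allows "or similar" errors.

```python
from typing import Callable, Tuple

# Hypothetical transport supplied by the caller: sends a request to `path`
# and returns (HTTP status code, decoded JSON body). Not a real library API.
Send = Callable[[str], Tuple[int, dict]]

# Statuses treated as "this server does not offer the v2 endpoint". The text
# says "404, 400, or similar", so this exact set is an assumption.
FALLBACK_STATUSES = frozenset({400, 404})


class IdentityRequestError(Exception):
    """Raised when neither API version produced a successful response."""


def call_with_v1_fallback(
    send: Send,
    v2_path: str,
    v1_path: str,
    allow_v1: bool,
    warn: Callable[[str], None],
) -> Tuple[str, dict]:
    """Try the v2 endpoint first; optionally fall back to v1."""
    status, body = send(v2_path)
    if status < 400:
        return "v2", body
    if status not in FALLBACK_STATUSES or not allow_v1:
        # Either a genuine v2 error, or the client chose to fail the operation.
        raise IdentityRequestError(f"v2 request failed with HTTP {status}")
    # The v1 API is deprecated: warn the user before using it.
    warn("This identity server only supports the deprecated v1 API.")
    status, body = send(v1_path)
    if status < 400:
        return "v1", body
    raise IdentityRequestError(f"v1 request failed with HTTP {status}")
```

Passing ``allow_v1=False`` gives the "fail the operation" branch. The ``warn`` callback is where a client would surface the risk warning to the user.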
Web browser clients
-------------------

Association lookup
------------------

{{lookup_is_http_api}}

{{v2_lookup_is_http_api}}

Client behaviour
~~~~~~~~~~~~~~~~

.. TODO: Remove this note when v1 is removed completely
.. Note::
   This section only covers the v2 lookup endpoint. The v1 endpoint is described
   in isolation above.

Prior to performing a lookup, clients SHOULD make a request to the ``/hash_details``
endpoint to determine which algorithms the server supports (described in more detail
below). The client then uses this information to form a ``/lookup`` request and
receive known bindings from the server.

Clients MUST support at least the ``sha256`` algorithm.

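The ``/hash_details`` then ``/lookup`` flow can be sketched as a small negotiation step. The field names ``algorithms`` and ``lookup_pepper`` (in the ``/hash_details`` response) and ``algorithm`` (in the ``/lookup`` body) are assumptions here. The text above only names ``addresses`` and ``pepper``. Preferring ``sha256`` follows the text, and gating ``none`` behind user consent is an illustrative choice.

```python
from typing import Dict, List

# Assumed field names: `algorithms` and `lookup_pepper` in the /hash_details
# response, `algorithm` in the /lookup body. Only `addresses` and `pepper`
# are named in the surrounding text.


def choose_algorithm(hash_details: Dict, allow_plaintext: bool = False) -> str:
    """Pick a lookup algorithm both sides support, preferring sha256."""
    offered: List[str] = hash_details.get("algorithms", [])
    if "sha256" in offered:
        return "sha256"
    if "none" in offered and allow_plaintext:
        # Plaintext lookups expose identifiers; only use them if the user agreed.
        return "none"
    raise ValueError(f"no supported lookup algorithm in {offered!r}")


def lookup_body(formatted_addresses: List[str], hash_details: Dict, algorithm: str) -> Dict:
    """Assemble a /lookup request body from already-formatted addresses."""
    return {
        "addresses": formatted_addresses,
        "algorithm": algorithm,
        # The pepper from /hash_details is always sent back, even for "none".
        "pepper": hash_details["lookup_pepper"],
    }
```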
Server behaviour
~~~~~~~~~~~~~~~~

.. TODO: Remove this note when v1 is removed completely
.. Note::
   This section only covers the v2 lookup endpoint. The v1 endpoint is described
   in isolation above.

Servers, upon receipt of a ``/lookup`` request, will compare the query against the
bindings they know about, hashing those identifiers as needed to verify exact
matches to the request.

Servers MUST support at least the ``sha256`` algorithm.

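A server-side sketch of that matching step for ``sha256`` follows, under stated assumptions. Bindings are held in a plain dict keyed by ``(medium, address)``. The result maps each matched queried hash to a Matrix user ID, and that mapping shape is assumed here. Rejecting a request whose ``pepper`` differs from the server's current one is an illustrative policy, not something the text above specifies.

```python
import base64
import hashlib
from typing import Dict, List, Tuple


def sha256_lookup_hash(address: str, medium: str, pepper: str) -> str:
    """Hash "<address> <medium> <pepper>" as unpadded URL-safe base64."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def answer_lookup(
    bindings: Dict[Tuple[str, str], str],  # (medium, address) -> Matrix user ID
    current_pepper: str,
    request: Dict,
) -> Dict[str, str]:
    """Return {queried hash: Matrix user ID} for every exact match."""
    if request.get("pepper") != current_pepper:
        # Assumed policy: reject lookups hashed with a stale or unknown pepper.
        raise ValueError("pepper does not match the server's current pepper")
    # Hash every known binding the same way the client hashed its query.
    known = {
        sha256_lookup_hash(address, medium, current_pepper): mxid
        for (medium, address), mxid in bindings.items()
    }
    queried: List[str] = request.get("addresses", [])
    return {h: known[h] for h in queried if h in known}
```

Rebuilding ``known`` on every request keeps the sketch short. A real server would more plausibly cache the hashed table and rebuild it whenever the pepper rotates.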
Algorithms
~~~~~~~~~~

Some algorithms are defined as part of the specification; however, other formats
can be negotiated between the client and server using ``/hash_details``.

``sha256``
++++++++++

This algorithm MUST be supported by clients and servers at a minimum. It is
also the preferred algorithm for lookups.

When using this algorithm, the client first converts each 3PID in the query into a
string of space-separated fields in the format ``<address> <medium> <pepper>``. The
``<pepper>`` is retrieved from ``/hash_details``, the ``<medium>`` is typically
``email`` or ``msisdn`` (both lowercase), and the ``<address>`` is the 3PID to search
for. For example, if the client wanted to know about ``[email protected]``'s
bindings, it would first format the query as ``[email protected] email ThePepperGoesHere``.

.. admonition:: Rationale

   Mediums and peppers are appended to the address to prevent a common prefix
   for each 3PID, helping prevent attackers from pre-computing the internal state
   of the hash function.

After formatting each query, the string is run through SHA-256 as defined by
`RFC 4634 <https://tools.ietf.org/html/rfc4634>`_. The resulting bytes are then
encoded using URL-Safe `Unpadded Base64`_ (similar to `room version 4's
event ID format <../../rooms/v4.html#event-ids>`_).

An example set of queries when using the pepper ``matrixrocks`` would be::

   "[email protected] email matrixrocks" -> "4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc"
   "[email protected] email matrixrocks" -> "LJwSazmv46n0hlMlsb_iYxI0_HXEqy_yj6Jm636cdT8"
   "18005552067 msisdn matrixrocks" -> "nlo35_T5fzSGZzJApqu8lgIudJvmOQtDaHtr-I4rU7I"

The set of hashes is then given as the ``addresses`` array in ``/lookup``. Note
that the pepper used MUST be supplied as ``pepper`` in the ``/lookup`` request.

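The three ``sha256`` steps (format, hash, encode) can be sketched with the Python standard library. Encoding the formatted string as UTF-8 before hashing is an assumption, because the text does not name a character encoding. Lowercasing the medium simply enforces the "both lowercase" note. The helper names are hypothetical.

```python
import base64
import hashlib
from typing import List, Tuple


def sha256_lookup_hash(address: str, medium: str, pepper: str) -> str:
    # 1. Format as "<address> <medium> <pepper>"; mediums are lowercase.
    query = f"{address} {medium.lower()} {pepper}"
    # 2. SHA-256 over the string (UTF-8 is assumed; the text does not say).
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    # 3. URL-safe base64 with the "=" padding removed (32 bytes -> 43 chars).
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def hashed_addresses(threepids: List[Tuple[str, str]], pepper: str) -> List[str]:
    """Hash (address, medium) pairs for the `addresses` array of /lookup."""
    return [sha256_lookup_hash(address, medium, pepper) for address, medium in threepids]
```

Calling ``sha256_lookup_hash("18005552067", "msisdn", "matrixrocks")`` is a convenient way to compare an implementation against the msisdn example listed above.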
``none``
++++++++

This algorithm performs plaintext lookups on the identity server. Typically this
algorithm should not be used due to the security concerns of unhashed identifiers;
however, some scenarios (such as LDAP-backed identity servers) prevent the use of
hashed identifiers. Identity servers (and optionally clients) can use this algorithm
to perform those kinds of lookups.

Similar to the ``sha256`` algorithm, the client converts the queries into strings
separated by spaces in the format ``<address> <medium>``; note the lack of ``<pepper>``.
For example, if the client wanted to know about ``[email protected]``'s bindings,
it would format the query as ``[email protected] email``.

The formatted strings are then given as the ``addresses`` array in ``/lookup``. Note
that ``pepper`` is still required and must be provided, to ensure the client has made
an appropriate request to ``/hash_details`` first.

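The ``none`` variant can be sketched the same way. The key difference is that the pepper is absent from each formatted string but is still sent in the request body. The ``algorithm`` field name and the helper name are assumptions, not names taken from the text above, and the address in the usage is hypothetical.

```python
from typing import Dict, List, Tuple


def plaintext_lookup_body(threepids: List[Tuple[str, str]], pepper: str) -> Dict:
    """Build a /lookup body for the "none" algorithm (no hashing at all)."""
    return {
        # Each entry is "<address> <medium>": no pepper inside the string...
        "addresses": [f"{address} {medium}" for address, medium in threepids],
        "algorithm": "none",  # assumed field name, not named in the text
        # ...but the pepper from /hash_details is still sent alongside.
        "pepper": pepper,
    }
```

For a hypothetical ``("user@example.org", "email")`` pair, the ``addresses`` array would contain ``"user@example.org email"`` in plain text, which is exactly what the security considerations below warn about.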
Security considerations
~~~~~~~~~~~~~~~~~~~~~~~

.. Note::
   `MSC2134 <https://github.com/matrix-org/matrix-doc/pull/2134>`_ has much more
   information about the security considerations made for this section of the
   specification. This section covers the high-level details for why the specification
   is the way it is.

Typically the lookup endpoint is used when a client has an unknown 3PID it wants to
find a Matrix User ID for. Clients normally do this kind of lookup when inviting new
users to a room or searching a user's address book to find any Matrix users they may
not have discovered yet. Rogue or malicious identity servers could harvest these
unknown 3PIDs and misuse them if they were sent in plain text. In order to protect
the privacy of users who might not have a Matrix identifier bound to their 3PID
addresses, the specification attempts to make it difficult to harvest 3PIDs.

.. admonition:: Rationale

   Hashing identifiers, while not perfect, makes the effort required to harvest
   identifiers significantly higher. Phone numbers in particular are still difficult
   to protect with hashing, but hashing them is still better than sending them in
   plain text.

   An alternative would be a deliberately slow hash such as bcrypt with many rounds;
   however, because the solution needs to serve mobile clients and clients on limited
   hardware, it needs to be kept relatively lightweight.

Clients should be cautious of servers that do not rotate their pepper very often, and
potentially of servers which use a weak pepper: these servers may be attempting to
brute force the identifiers or use rainbow tables to mine the addresses. Similarly,
clients which support the ``none`` algorithm should consider at least warning the user
of the risks in sending identifiers in plain text to the identity server.

Addresses are still potentially reversible using a precomputed rainbow table, because
some identifiers (such as phone numbers, addresses at common email domains, and leaked
addresses) are easy to enumerate. For example, phone numbers have roughly 12 digits,
making them an easier target for attack than email addresses.

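The enumeration risk can be illustrated with a short sketch. It is a plain precomputed lookup table rather than a true rainbow table, which uses hash chains to trade memory for computation. Even so, it shows why a small, enumerable identifier space is the weak point once the pepper is known. The 10,000-number range is hypothetical and deliberately tiny; a full numbering plan is far larger but can be enumerated the same way.

```python
import base64
import hashlib
from typing import Dict, Iterable


def sha256_lookup_hash(address: str, medium: str, pepper: str) -> str:
    """Hash "<address> <medium> <pepper>" as unpadded URL-safe base64."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def build_reverse_table(candidates: Iterable[str], medium: str, pepper: str) -> Dict[str, str]:
    """Map hash -> address for every candidate in an enumerable space."""
    return {sha256_lookup_hash(c, medium, pepper): c for c in candidates}


# Hypothetical slice of a phone number range: 18005550000 .. 18005559999.
pepper = "matrixrocks"
table = build_reverse_table((f"1800555{n:04d}" for n in range(10_000)), "msisdn", pepper)

# A hash seen in a /lookup request made with the same pepper is now reversible.
observed = sha256_lookup_hash("18005552067", "msisdn", pepper)
print(table.get(observed))  # prints 18005552067
```

The table is only valid for one pepper. This is why a server that rarely rotates its pepper, or uses a weak one, deserves suspicion: rotation forces an attacker to rebuild the whole table.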

Establishing associations
-------------------------