Commit 6cfd761

Spec the v2 lookup API
Spec for [MSC2134](#2134)
1 parent a24bcc2 commit 6cfd761

3 files changed: 291 additions, 2 deletions


api/identity/lookup.yaml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
1616
# limitations under the License.
1717
swagger: '2.0'
1818
info:
19-
title: "Matrix Identity Service Lookup API"
19+
title: "Matrix Identity Service Lookup API"
2020
version: "1.0.0"
2121
host: localhost:8090
2222
schemes:

api/identity/v2_lookup.yaml

Lines changed: 145 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,145 @@
1+
# Copyright 2016 OpenMarket Ltd
2+
# Copyright 2017 Kamax.io
3+
# Copyright 2017 New Vector Ltd
4+
# Copyright 2018 New Vector Ltd
5+
# Copyright 2019 The Matrix.org Foundation C.I.C.
6+
#
7+
# Licensed under the Apache License, Version 2.0 (the "License");
8+
# you may not use this file except in compliance with the License.
9+
# You may obtain a copy of the License at
10+
#
11+
# http://www.apache.org/licenses/LICENSE-2.0
12+
#
13+
# Unless required by applicable law or agreed to in writing, software
14+
# distributed under the License is distributed on an "AS IS" BASIS,
15+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16+
# See the License for the specific language governing permissions and
17+
# limitations under the License.
18+
swagger: '2.0'
19+
info:
20+
title: "Matrix Identity Service Lookup API"
21+
version: "2.0.0"
22+
host: localhost:8090
23+
schemes:
24+
- https
25+
basePath: /_matrix/identity/v2
26+
consumes:
27+
- application/json
28+
produces:
29+
- application/json
30+
securityDefinitions:
31+
$ref: definitions/security.yaml
32+
paths:
33+
"/hash_details":
34+
get:
35+
summary: Gets hash function information from the server.
36+
description: |-
37+
Gets parameters for hashing identifiers from the server. This can include
38+
any of the algorithms defined in this specification.
39+
operationId: getHashDetails
40+
security:
41+
- accessToken: []
42+
parameters: []
43+
responses:
44+
200:
45+
description: The hash function information.
46+
examples:
47+
application/json: {
48+
"lookup_pepper": "matrixrocks",
49+
"algorithms": ["none", "sha256"]
50+
}
51+
schema:
52+
type: object
53+
properties:
54+
lookup_pepper:
55+
type: string
56+
description: |-
57+
The pepper the client MUST use in hashing identifiers, and MUST
58+
supply to the ``/lookup`` endpoint when performing lookups.
59+
60+
Servers SHOULD rotate this string often.
61+
algorithms:
62+
type: array
63+
items:
64+
type: string
65+
description: |-
66+
The algorithms the server supports. Must contain at least ``sha256``.
67+
required: ['lookup_pepper', 'algorithms']
68+
"/lookup":
69+
post:
70+
summary: Look up Matrix User IDs for a set of 3PIDs.
71+
description: |-
72+
Looks up the set of Matrix User IDs which have bound the 3PIDs given, if
73+
bindings are available. Note that the format of the addresses is defined
74+
later in this specification.
75+
operationId: lookupUsersV2
76+
security:
77+
- accessToken: []
78+
parameters:
79+
- in: body
80+
name: body
81+
schema:
82+
type: object
83+
properties:
84+
algorithm:
85+
type: string
86+
description: |-
87+
The algorithm the client is using to encode the ``addresses``. This
88+
should be one of the available options from ``/hash_details``.
89+
example: "sha256"
90+
pepper:
91+
type: string
92+
description: |-
93+
The pepper from ``/hash_details``. This is required even when the
94+
``algorithm`` does not make use of it.
95+
example: "matrixrocks"
96+
addresses:
97+
type: array
98+
items:
99+
type: string
100+
description: |-
101+
The addresses to look up. The format of the entries here depends on
102+
the ``algorithm`` used. Note that queries which have been incorrectly
103+
hashed or formatted will lead to no matches.
104+
example: [
105+
"4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc",
106+
"nlo35_T5fzSGZzJApqu8lgIudJvmOQtDaHtr-I4rU7I"
107+
]
108+
required: ['algorithm', 'pepper', 'addresses']
109+
responses:
110+
200:
111+
description:
112+
The associations for any matched ``addresses``.
113+
examples:
114+
application/json: {
115+
"mappings": {
116+
"4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc": "@alice:example.org"
117+
}
118+
}
119+
schema:
120+
type: object
121+
properties:
122+
mappings:
123+
type: object
124+
description: |-
125+
Any applicable mappings of ``addresses`` to Matrix User IDs. Addresses
126+
which do not have associations will not be included, which can make
127+
this property be an empty object.
128+
title: AssociatedMappings
129+
additionalProperties:
130+
type: string
131+
required: ['mappings']
132+
400:
133+
description:
134+
The client's request was invalid in some way. One possible problem could
135+
be the ``pepper`` being invalid after the server has rotated it - this is
136+
presented with the ``M_INVALID_PEPPER`` error code. Clients SHOULD make
137+
a call to ``/hash_details`` to get a new pepper in this scenario, being
138+
careful to avoid retry loops.
139+
examples:
140+
application/json: {
141+
"errcode": "M_INVALID_PEPPER",
142+
"error": "Unknown or invalid pepper - has it been rotated?"
143+
}
144+
schema:
145+
$ref: "../client-server/definitions/errors/error.yaml"
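Reviewer sketch (not part of the commit): the two endpoints above imply a small client flow, namely fetch ``/hash_details``, build the ``/lookup`` body with ``algorithm``, ``pepper`` and ``addresses``, and on ``M_INVALID_PEPPER`` fetch the details again a bounded number of times. The Python below is a minimal illustration under stated assumptions. The names ``InvalidPepper``, ``choose_algorithm``, ``lookup_with_refresh`` and the injected callables ``get_hash_details``, ``post_lookup`` and ``encode`` are hypothetical and do not come from the spec or any SDK; the transport is left to the caller.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ThreePid = Tuple[str, str]  # (address, medium), e.g. ("alice@example.com", "email")


class InvalidPepper(Exception):
    """Hypothetical: raised by the caller's transport on an M_INVALID_PEPPER error."""


def choose_algorithm(offered: Sequence[str],
                     preferred: Sequence[str] = ("sha256",)) -> str:
    """Pick the first client-preferred algorithm that the server also offers."""
    for name in preferred:
        if name in offered:
            return name
    raise ValueError("no mutually supported lookup algorithm")


def lookup_with_refresh(
    get_hash_details: Callable[[], Dict],
    post_lookup: Callable[[Dict], Dict],
    encode: Callable[[str, List[ThreePid], str], List[str]],
    threepids: List[ThreePid],
    max_attempts: int = 2,
) -> Dict[str, str]:
    """Run /hash_details then /lookup, refreshing the pepper a bounded number of times."""
    for _ in range(max_attempts):
        details = get_hash_details()
        pepper = details["lookup_pepper"]
        algorithm = choose_algorithm(details["algorithms"])
        body = {
            "algorithm": algorithm,
            # The pepper is sent even when the algorithm does not use it.
            "pepper": pepper,
            "addresses": encode(algorithm, threepids, pepper),
        }
        try:
            return post_lookup(body)["mappings"]
        except InvalidPepper:
            # The server rotated its pepper; fetch fresh details and try again.
            continue
    # Bounded attempts keep the client out of a retry loop.
    raise RuntimeError("pepper was rejected on every attempt; giving up")
```

Capping ``max_attempts`` is one way to honour the "avoid retry loops" wording in the 400 response description; a real client might also back off between attempts.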

specification/identity_service_api.rst

Lines changed: 145 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -155,6 +155,23 @@ should allow a 3PID to be mapped to a Matrix user identity, but not in the other
155155
direction (i.e. one should not be able to get all 3PIDs associated with a Matrix
156156
user ID, or get all 3PIDs associated with a 3PID).
157157

158+
Version 1 API deprecation
159+
-------------------------
160+
161+
.. TODO: Remove this section when the v1 API is removed.
162+
163+
As described on each of the version 1 endpoints, the v1 API is deprecated in
164+
favour of the v2 API described here. The major difference, with the exception
165+
of a few isolated cases, is that the v2 API requires authentication to ensure
166+
the user has given permission for the identity server to operate on their data.
167+
168+
The v1 API is planned to be removed from the specification in a future version.
169+
170+
Clients SHOULD attempt the v2 endpoints first, and if they receive a ``404``,
171+
``400``, or similar error they should try the v1 endpoint or fail the operation.
172+
Clients are strongly encouraged to warn the user of the risks in using the v1 API,
173+
if they are planning on using it.
174+
158175
Web browser clients
159176
-------------------
160177

@@ -258,7 +275,134 @@ Association lookup
258275

259276
{{lookup_is_http_api}}
260277

261-
.. TODO: TravisR - Add v2 lookup API in future PR
278+
{{v2_lookup_is_http_api}}
279+
280+
Client behaviour
281+
~~~~~~~~~~~~~~~~
282+
283+
.. TODO: Remove this note when v1 is removed completely
284+
.. Note::
285+
This section only covers the v2 lookup endpoint. The v1 endpoint is described
286+
in isolation above.
287+
288+
Prior to performing a lookup clients SHOULD make a request to the ``/hash_details``
289+
endpoint to determine what algorithms the server supports (described in more detail
290+
below). The client then uses this information to form a ``/lookup`` request and
291+
receive known bindings from the server.
292+
293+
Clients MUST support at least the ``sha256`` algorithm.
294+
295+
Server behaviour
296+
~~~~~~~~~~~~~~~~
297+
298+
.. TODO: Remove this note when v1 is removed completely
299+
.. Note::
300+
This section only covers the v2 lookup endpoint. The v1 endpoint is described
301+
in isolation above.
302+
303+
Servers, upon receipt of a ``/lookup`` request, will compare the query against
304+
known bindings they have, hashing the identifiers they know about as needed to
305+
verify exact matches to the request.
306+
307+
Servers MUST support at least the ``sha256`` algorithm.
308+
309+
Algorithms
310+
~~~~~~~~~~
311+
312+
Some algorithms are defined as part of the specification, however other formats
313+
can be negotiated between the client and server using ``/hash_details``.
314+
315+
``sha256``
316+
++++++++++
317+
318+
This algorithm MUST be supported by clients and servers at a minimum. It is
319+
additionally the preferred algorithm for lookups.
320+
321+
When using this algorithm, the client converts the query first into strings
322+
separated by spaces in the format ``<address> <medium> <pepper>``. The ``<pepper>``
323+
is retrieved from ``/hash_details``, the ``<medium>`` is typically ``email`` or
324+
``msisdn`` (both lowercase), and the ``<address>`` is the 3PID to search for.
325+
For example, if the client wanted to know about ``[email protected]``'s bindings,
326+
it would first format the query as ``[email protected] email ThePepperGoesHere``.
327+
328+
.. admonition:: Rationale
329+
330+
Mediums and peppers are appended to the address to prevent a common prefix
331+
for each 3PID, helping prevent attackers from pre-computing the internal state
332+
of the hash function.
333+
334+
After formatting each query, the string is run through SHA-256 as defined by
335+
`RFC 4634 <https://tools.ietf.org/html/rfc4634>`_. The resulting bytes are then
336+
encoded using URL-Safe `Unpadded Base64`_ (similar to `room version 4's
337+
event ID format <../../rooms/v4.html#event-ids>`_).
338+
339+
An example set of queries when using the pepper ``matrixrocks`` would be::
340+
341+
"[email protected] email matrixrocks" -> "4kenr7N9drpCJ4AfalmlGQVsOn3o2RHjkADUpXJWZUc"
342+
"[email protected] email matrixrocks" -> "LJwSazmv46n0hlMlsb_iYxI0_HXEqy_yj6Jm636cdT8"
343+
"18005552067 msisdn matrixrocks" -> "nlo35_T5fzSGZzJApqu8lgIudJvmOQtDaHtr-I4rU7I"
344+
345+
346+
The set of hashes is then given as the ``addresses`` array in ``/lookup``. Note
347+
that the pepper used MUST be supplied as ``pepper`` in the ``/lookup`` request.
348+
349+
``none``
350+
++++++++
351+
352+
This algorithm performs plaintext lookups on the identity server. Typically this
353+
algorithm should not be used due to the security concerns of unhashed identifiers,
354+
however some scenarios (such as LDAP-backed identity servers) prevent the use of
355+
hashed identifiers. Identity servers (and optionally clients) can use this algorithm
356+
to perform those kinds of lookups.
357+
358+
Similar to the ``sha256`` algorithm, the client converts the queries into strings
359+
separated by spaces in the format ``<address> <medium>`` - note the lack of ``<pepper>``.
360+
For example, if the client wanted to know about ``[email protected]``'s bindings,
361+
it would format the query as ``[email protected] email``.
362+
363+
The formatted strings are then given as the ``addresses`` in ``/lookup``. Note that
364+
the ``pepper`` is still required, and must be provided to ensure the client has made
365+
an appropriate request to ``/hash_details`` first.
366+
367+
Security considerations
368+
~~~~~~~~~~~~~~~~~~~~~~~
369+
370+
.. Note::
371+
`MSC2134 <https://github.com/matrix-org/matrix-doc/pull/2134>`_ has much more
372+
information about the security considerations made for this section of the
373+
specification. This section covers the high-level details for why the specification
374+
is the way it is.
375+
376+
Typically the lookup endpoint is used when a client has an unknown 3PID it wants to
377+
find a Matrix User ID for. Clients normally do this kind of lookup when inviting new
378+
users to a room or searching a user's address book to find any Matrix users they may
379+
not have discovered yet. Rogue or malicious identity servers could harvest this
380+
unknown information and do nefarious things with it if it were sent in plain text.
381+
In order to protect the privacy of users who might not have a Matrix identifier bound
382+
to their 3PID addresses, the specification attempts to make it difficult to harvest
383+
3PIDs.
384+
385+
.. admonition:: Rationale
386+
387+
Hashing identifiers, while not perfect, helps make the effort required to harvest
388+
identifiers significantly higher. Phone numbers in particular are still difficult
389+
to protect with hashing, however hashing is objectively better than not hashing.
390+
391+
An alternative to hashing would be using bcrypt or similar with many rounds, however
392+
by nature of needing to serve mobile clients and clients on limited hardware the
393+
solution needs to be kept relatively lightweight.
394+
395+
Clients should be cautious of servers not rotating their pepper very often, and
396+
potentially of servers which use a weak pepper - these servers may be attempting to
397+
brute force the identifiers or use rainbow tables to mine the addresses. Similarly,
398+
clients which support the ``none`` algorithm should consider at least warning the user
399+
of the risks in sending identifiers in plain text to the identity server.
400+
401+
Addresses are still potentially reversible using a calculated rainbow table given
402+
some identifiers, such as phone numbers, common email address domains, and leaked
403+
addresses are easily calculated. For example, phone numbers can have roughly 12
404+
digits to them, making them an easier target for attack than email addresses.
405+
262406

263407
Establishing associations
264408
-------------------------
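
Reviewer sketch (not part of the commit): the ``sha256`` and ``none`` encodings described in the Algorithms section can be illustrated in a few lines of Python. The function names ``format_query``, ``hash_sha256`` and ``encode_addresses`` are hypothetical, and encoding the formatted string as UTF-8 before hashing is an assumption, since the text in this diff does not name a byte encoding. The example vectors listed in the diff could serve as a check, though this sketch has not been verified against them.

```python
import base64
import hashlib
from typing import List, Optional, Tuple


def format_query(address: str, medium: str, pepper: Optional[str] = None) -> str:
    """Join the parts with single spaces: "<address> <medium>[ <pepper>]"."""
    parts = [address, medium]
    if pepper is not None:
        parts.append(pepper)
    return " ".join(parts)


def hash_sha256(address: str, medium: str, pepper: str) -> str:
    """SHA-256 the formatted query, then encode as URL-safe unpadded Base64."""
    query = format_query(address, medium, pepper)
    # Assumption: the string is hashed as UTF-8 bytes.
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    # Strip the trailing "=" to obtain the unpadded form.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def encode_addresses(algorithm: str,
                     threepids: List[Tuple[str, str]],
                     pepper: str) -> List[str]:
    """Build the ``addresses`` array for /lookup from (address, medium) pairs."""
    if algorithm == "sha256":
        return [hash_sha256(addr, medium, pepper) for addr, medium in threepids]
    if algorithm == "none":
        # Plaintext lookups omit the pepper from the formatted string.
        return [format_query(addr, medium) for addr, medium in threepids]
    raise ValueError("unsupported algorithm: %s" % algorithm)
```

For instance, ``encode_addresses("none", [("18005552067", "msisdn")], "matrixrocks")`` yields the plaintext entry ``18005552067 msisdn``, while the ``sha256`` branch yields a 43-character URL-safe string per identifier.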
