48  %% +----------+ +-------+--------------+-----------...etc...
49  %% | | |
50  %% V V V
51- %% amqqueue_process---+ slave-----+ slave-----+ ...etc...
51+ %% amqqueue_process---+ mirror-----+ mirror-----+ ...etc...
52  %% | BQ = master----+ | | BQ = vq | | BQ = vq |
53  %% | | BQ = vq | | +-+-------+ +-+-------+
54  %% | +-+-------+ | | |
63  %% consumers
64  %%
65  %% The master is merely an implementation of bq, and thus is invoked
66- %% through the normal bq interface by the amqqueue_process. The slaves
66+ %% through the normal bq interface by the amqqueue_process. The mirrors
67  %% meanwhile are processes in their own right (as is the
68- %% coordinator). The coordinator and all slaves belong to the same gm
68+ %% coordinator). The coordinator and all mirrors belong to the same gm
69  %% group. Every member of a gm group receives messages sent to the gm
70  %% group. Because the master is the bq of amqqueue_process, it doesn't
71  %% have sole control over its mailbox, and as a result, the master
72  %% itself cannot be passed messages directly (well, it could be, via
73  %% the amqqueue:run_backing_queue callback but that would induce
74  %% additional unnecessary loading on the master queue process), yet it
75- %% needs to react to gm events, such as the death of slaves. Thus the
75+ %% needs to react to gm events, such as the death of mirrors. Thus the
76  %% master creates the coordinator, and it is the coordinator that is
77  %% the gm callback module and event handler for the master.
78  %%
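To make those relationships concrete, here is a deliberately simplified, hypothetical sketch (the module name, messages and callback fun are invented for illustration; the real coordinator is a full gen_server2 and gm callback module): a bare process that receives group events and relays them to the master, which has no mailbox of its own on which to receive them.

    %% Editor's illustration -- not the real gm or coordinator code.
    -module(coord_sketch).
    -export([start_link/1, gm_event/2]).

    %% The master is represented only by a callback fun, because in the
    %% real system it is a bq embedded in the amqqueue process.
    start_link(MasterCallback) when is_function(MasterCallback, 1) ->
        {ok, spawn_link(fun () -> loop(MasterCallback) end)}.

    %% Stand-in for the gm group delivering an event to this member.
    gm_event(Coord, Event) ->
        Coord ! {gm, Event},
        ok.

    loop(MasterCallback) ->
        receive
            {gm, Event} -> MasterCallback(Event),
                           loop(MasterCallback);
            stop        -> ok
        end.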
79  %% Consumers are only attached to the master. Thus the master is
80- %% responsible for informing all slaves when messages are fetched from
80+ %% responsible for informing all mirrors when messages are fetched from
81  %% the bq, when they're acked, and when they're requeued.
82  %%
83- %% The basic goal is to ensure that all slaves perform actions on
83+ %% The basic goal is to ensure that all mirrors perform actions on
84  %% their bqs in the same order as the master. Thus the master
85  %% intercepts all events going to its bq, and suitably broadcasts
86- %% these events on the gm. The slaves thus receive two streams of
86+ %% these events on the gm. The mirrors thus receive two streams of
87  %% events: one stream is via the gm, and one stream is from channels
88  %% directly. Whilst the stream via gm is guaranteed to be consistently
89- %% seen by all slaves, the same is not true of the stream via
89+ %% seen by all mirrors, the same is not true of the stream via
90  %% channels. For example, in the event of an unexpected death of a
91  %% channel during a publish, only some of the mirrors may receive that
92  %% publish. As a result of this problem, the messages broadcast over
93- %% the gm contain published content, and thus slaves can operate
93+ %% the gm contain published content, and thus mirrors can operate
94  %% successfully on messages that they only receive via the gm.
95  %%
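As a rough illustration of that interception (a sketch only: a plain list stands in for the backing queue, a fun stands in for the gm broadcast, and none of these names exist in the real code):

    -module(master_sketch).
    -export([publish/3, fetch/2]).

    %% Apply the operation to the local queue and broadcast the same
    %% event, content included, so every mirror applies it in the same
    %% order and can enqueue a message it never heard about from the
    %% channel.
    publish(Msg, Broadcast, BQ) when is_function(Broadcast, 1) ->
        Broadcast({publish, Msg}),
        BQ ++ [Msg].

    %% Fetches are driven by consumers, which only the master has, so
    %% they too are broadcast for the mirrors to replay.
    fetch(Broadcast, [Msg | Rest]) when is_function(Broadcast, 1) ->
        Broadcast({fetch, Msg}),
        {Msg, Rest};
    fetch(_Broadcast, []) ->
        {empty, []}.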
96  %% The key purpose of also sending messages directly from the channels
97- %% to the slaves is that without this, in the event of the death of
97+ %% to the mirrors is that without this, in the event of the death of
98  %% the master, messages could be lost until a suitable slave is
99  %% promoted. However, that is not the only reason. A slave cannot send
100  %% confirms for a message until it has seen it from the
101  %% channel. Otherwise, it might send a confirm to a channel for a
102  %% message that it might *never* receive from that channel. This can
103- %% happen because new slaves join the gm ring (and thus receive
103+ %% happen because new mirrors join the gm ring (and thus receive
104  %% messages from the master) before inserting themselves in the
105  %% queue's mnesia record (which is what channels look at for routing).
106  %% As it turns out, channels will simply ignore such bogus confirms,
107  %% but relying on that would introduce a dangerously tight coupling.
108  %%
109- %% Hence the slaves have to wait until they've seen both the publish
109+ %% Hence the mirrors have to wait until they've seen both the publish
110  %% via gm, and the publish via the channel before they issue the
111  %% confirm. Either form of publish can arrive first, and a slave can
112  %% be upgraded to the master at any point during this
116  %% amqqueue API. However, it does not need to implement all parts: for
117  %% example, no ack or consumer-related message can arrive directly at
118  %% a slave from a channel: it is only publishes that pass both
119- %% directly to the slaves and go via gm.
119+ %% directly to the mirrors and go via gm.
120  %%
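A hedged sketch of that rule (invented module and function names; in the real code the pending-confirm state lives in the mirror process and confirms go back to the channel):

    -module(confirm_sketch).
    -export([new/0, seen_from_channel/2, seen_from_gm/2]).

    new() -> #{}.

    seen_from_channel(MsgId, Pending) -> seen(channel, MsgId, Pending).
    seen_from_gm(MsgId, Pending)      -> seen(gm, MsgId, Pending).

    %% A confirm may only be issued once the same msg_id has been seen
    %% both directly from the channel and via the gm broadcast, in
    %% either order.
    seen(Route, MsgId, Pending) ->
        Routes = lists:usort([Route | maps:get(MsgId, Pending, [])]),
        case Routes of
            [channel, gm] -> {confirm, maps:remove(MsgId, Pending)};
            _             -> {wait, Pending#{MsgId => Routes}}
        end.

Whichever copy arrives first, the first call returns {wait, _} and the second returns {confirm, _}, which is the behaviour described above.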
121  %% Slaves can be added dynamically. When this occurs, there is no
122  %% attempt made to sync the current contents of the master with the
144  %% the master queue but can't go back in the slave, since we don't
145  %% want "holes" in the slave queue. Note that the depth, and the
146  %% length likewise, must always be shorter on the slave - we assert
147- %% that in various places. In case slaves are joined to an empty queue
147+ %% that in various places. In case mirrors are joined to an empty queue
148  %% which only goes on to receive publishes, they start by asking the
149- %% master to broadcast its depth. This is enough for slaves to always
149+ %% master to broadcast its depth. This is enough for mirrors to always
150  %% be able to work out when their head does not differ from the master
151  %% (and is much simpler and cheaper than getting the master to hang on
152  %% to the guid of the msg at the head of its queue). When a slave is
153  %% promoted to a master, it unilaterally broadcasts its depth, in
154- %% order to solve the problem of depth requests from new slaves being
154+ %% order to solve the problem of depth requests from new mirrors being
155  %% unanswered by a dead master.
156  %%
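A minimal sketch of the depth arithmetic (hypothetical names; depth here is just queue length plus messages awaiting acknowledgement, as described above):

    -module(depth_sketch).
    -export([depth/2, head_matches_master/2]).

    %% Depth counts messages still queued plus messages delivered but
    %% not yet acked.
    depth(QueueLen, PendingAcks) -> QueueLen + PendingAcks.

    %% The mirror's depth never exceeds the master's; once the two
    %% depths agree, the mirror knows its head no longer differs from
    %% the master's, without anyone tracking the msg_id at the head.
    head_matches_master(MasterDepth, MirrorDepth)
      when MirrorDepth =< MasterDepth ->
        MirrorDepth =:= MasterDepth.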
157  %% Obviously, due to the async nature of communication across gm, the
158- %% slaves can fall behind. This does not matter from a sync pov: if
158+ %% mirrors can fall behind. This does not matter from a sync pov: if
159  %% they fall behind and the master dies then a) no publishes are lost
160  %% because all publishes go to all mirrors anyway; b) the worst that
161  %% happens is that acks get lost and so messages come back to
164  %% but close enough for jazz).
165  %%
166  %% Because acktags are issued by the bq independently, and because
167- %% there is no requirement for the master and all slaves to use the
167+ %% there is no requirement for the master and all mirrors to use the
168  %% same bq, all references to msgs going over gm are by msg_id. Thus
169  %% upon acking, the master must convert the acktags back to msg_ids
170  %% (which happens to be what bq:ack returns), then send the msg_ids
171- %% over gm; the slaves must convert the msg_ids to acktags (a mapping
172- %% the slaves themselves must maintain).
171+ %% over gm; the mirrors must convert the msg_ids to acktags (a mapping
172+ %% the mirrors themselves must maintain).
173  %%
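A sketch of that translation on the mirror side (hypothetical helpers, using a map from msg_id to this mirror's own acktag):

    -module(ack_sketch).
    -export([record_delivery/3, acks_for/2]).

    %% When this mirror's bq issues an acktag for a message, remember
    %% which msg_id it belongs to.
    record_delivery(MsgId, AckTag, MsgIdToAck) ->
        MsgIdToAck#{MsgId => AckTag}.

    %% When the master broadcasts acked msg_ids over gm, translate them
    %% back into this mirror's acktags, ignoring ids it has never seen.
    acks_for(MsgIds, MsgIdToAck) ->
        AckTags = [maps:get(Id, MsgIdToAck) || Id <- MsgIds,
                                               maps:is_key(Id, MsgIdToAck)],
        {AckTags, maps:without(MsgIds, MsgIdToAck)}.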
174  %% When the master dies, a slave gets promoted. This will be the
175  %% eldest slave, and thus the hope is that that slave is most likely
196  %% mirrors to be able to detect this and tidy up as necessary to avoid
197  %% leaks. If we just had the master monitoring all senders then we
198  %% would have the possibility that a sender appears and only sends the
199- %% message to a few of the slaves before dying. Those slaves would
199+ %% message to a few of the mirrors before dying. Those mirrors would
200  %% then hold on to the message, assuming they'll receive some
201- %% instruction eventually from the master. Thus we have both slaves
201+ %% instruction eventually from the master. Thus we have both mirrors
202  %% and the master monitor all senders they become aware of. But there
203  %% is a race: if the slave receives a DOWN of a sender, how does it
204  %% know whether or not the master is going to send it instructions
209  %% coordinator receives a DOWN message from a sender, it informs the
210  %% master via a callback. This allows the master to do any tidying
211  %% necessary, but more importantly allows the master to broadcast a
212- %% sender_death message to all the slaves, saying the sender has
213- %% died. Once the slaves receive the sender_death message, they know
212+ %% sender_death message to all the mirrors, saying the sender has
213+ %% died. Once the mirrors receive the sender_death message, they know
214  %% that they're not going to receive any more instructions from the gm
215  %% regarding that sender. However, it is possible that the coordinator
216  %% receives the DOWN and communicates that to the master before the
230  %% received the sender_death message from the master via gm already,
231  %% then it will wait 20 seconds before broadcasting a request for
232  %% confirmation from the master that the sender really has died.
233- %% Should a sender have only sent a publish to slaves, this allows
234- %% slaves to inform the master of the previous existence of the
233+ %% Should a sender have only sent a publish to mirrors, this allows
234+ %% mirrors to inform the master of the previous existence of the
235  %% sender. The master will thus monitor the sender, receive the DOWN,
236  %% and subsequently broadcast the sender_death message, allowing the
237- %% slaves to tidy up. This process can repeat for the same sender:
237+ %% mirrors to tidy up. This process can repeat for the same sender:
238  %% consider one slave receives the publication, then the DOWN, then
239  %% asks for confirmation of death, then the master broadcasts the
240  %% sender_death message. Only then does another slave receive the
248  %% When the 20 second timer expires, the slave first checks to see
249  %% whether it still needs confirmation of the death before requesting
250  %% it. This prevents unnecessary traffic on gm as it allows one
251- %% broadcast of the sender_death message to satisfy many slaves.
251+ %% broadcast of the sender_death message to satisfy many mirrors.
252  %%
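A sketch of the delayed confirmation request (the 20 second figure comes from the text; the module, messages and broadcast stub are invented):

    -module(sender_death_sketch).
    -export([sender_down/2, handle_info/2]).

    -define(DEATH_CONFIRM_DELAY, 20000).

    %% A mirror saw a DOWN for Sender: start the timer rather than
    %% acting at once, in case the master's sender_death arrives first.
    sender_down(Sender, KnownDead) ->
        _ = erlang:send_after(?DEATH_CONFIRM_DELAY, self(),
                              {check_sender_death, Sender}),
        KnownDead.

    %% On expiry, only ask the master for confirmation if no
    %% sender_death has been received meanwhile; one broadcast from the
    %% master can thereby satisfy many waiting mirrors.
    handle_info({check_sender_death, Sender}, KnownDead) ->
        case sets:is_element(Sender, KnownDead) of
            true  -> KnownDead;
            false -> broadcast_death_confirmation_request(Sender),
                     %% Re-arm the timer: the master may itself die
                     %% before answering (see the promotion cases below).
                     sender_down(Sender, KnownDead)
        end.

    broadcast_death_confirmation_request(_Sender) ->
        ok.  %% stand-in for the real gm broadcast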
253  %% If we consider the promotion of a slave at this point, we have two
254  %% possibilities: that of the slave that has received the DOWN and is
257  %% DOWN. In the first case, in the act of promotion to master, the new
258  %% master will monitor again the dead sender, and after it has
259  %% finished promoting itself, it should find another DOWN waiting,
260- %% which it will then broadcast. This will allow slaves to tidy up as
260+ %% which it will then broadcast. This will allow mirrors to tidy up as
261  %% normal. In the second case, we have the possibility that
262  %% a confirmation-of-sender-death request has been broadcast, but that
263  %% it was broadcast before the master failed, and that the slave being
264  %% promoted does not know anything about that sender, and so will not
265  %% monitor it on promotion. Thus a slave that broadcasts such a
266  %% request, at the point of broadcasting it, recurses, setting another
267- %% 20 second timer. As before, on expiry of the timer, the slave
267+ %% 20 second timer. As before, on expiry of the timer, the mirror
268  %% checks to see whether it still has not received a sender_death
269  %% message for the dead sender, and if not, broadcasts a death
270  %% confirmation request. Thus this ensures that even when a master
273  %% dead sender, receive the DOWN and broadcast the sender_death
274  %% message.
275  %%
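To illustrate the promotion step these two cases rely on, a hypothetical promotion might look like this (a sketch only, with a fun standing in for the gm broadcast):

    -module(promotion_sketch).
    -export([promote/3]).

    %% KnownSenders: channel pids this (former) mirror has seen
    %% publishes from. Re-monitoring them means a sender that is
    %% already dead produces a fresh DOWN here, which the new master
    %% can turn into a sender_death broadcast for the other mirrors.
    promote(KnownSenders, Depth, Broadcast) when is_function(Broadcast, 1) ->
        lists:foreach(fun (Sender) -> erlang:monitor(process, Sender) end,
                      KnownSenders),
        %% Announce depth unprompted, so depth requests addressed to the
        %% dead master do not go unanswered (see the depth notes above).
        Broadcast({depth, Depth}),
        ok.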
276- %% The preceding commentary deals with the possibility of slaves
276+ %% The preceding commentary deals with the possibility of mirrors
277  %% receiving publications from senders which the master does not, and
278  %% the need to prevent memory leaks in such scenarios. The inverse is
279  %% also possible: a partial publication may cause only the master to
280  %% receive a publication. It will then publish the message via gm. The
281- %% slaves will receive it via gm, will publish it to their BQ and will
281+ %% mirrors will receive it via gm, will publish it to their BQ and will
282  %% set up monitoring on the sender. They will then receive the DOWN
283  %% message and the master will eventually publish the corresponding
284  %% sender_death message. The slave will then be able to tidy up its
@@ -419,7 +419,7 @@ handle_pre_hibernate(State = #state { gm = GM }) ->
419      %% timely notification of slave death if policy changes when
420      %% everything is idle. So cause some activity just before we
421      %% sleep. This won't cause us to go into perpetual motion as the
422-     %% heartbeat does not wake up coordinator or slaves.
422+     %% heartbeat does not wake up coordinator or mirrors.
423      gm:broadcast(GM, hibernate_heartbeat),
424      {hibernate, State}.
425
@@ -446,7 +446,7 @@ handle_msg([_CPid], _From, {delete_and_terminate, _Reason}) ->
446      %% actually delivered. Then it calls handle_terminate/2 below so the
447      %% coordinator is stopped.
448      %%
449-     %% If we stop the coordinator right now, remote slaves could see the
449+     %% If we stop the coordinator right now, remote mirrors could see the
450      %% coordinator DOWN before delete_and_terminate was delivered to all
451      %% GMs. One of those GMs would be promoted as the master, and this GM
452      %% would hang forever, waiting for other GMs to stop.