|
95 | 95 | %% |
96 | 96 | %% The key purpose of also sending messages directly from the channels |
97 | 97 | %% to the mirrors is that without this, in the event of the death of |
98 | | -%% the master, messages could be lost until a suitable slave is |
99 | | -%% promoted. However, that is not the only reason. A slave cannot send |
| 98 | +%% the master, messages could be lost until a suitable mirror is |
| 99 | +%% promoted. However, that is not the only reason. A mirror cannot send |
100 | 100 | %% confirms for a message until it has seen it from the |
101 | 101 | %% channel. Otherwise, it might send a confirm to a channel for a |
102 | 102 | %% message that it might *never* receive from that channel. This can |
|
108 | 108 | %% |
109 | 109 | %% Hence the mirrors have to wait until they've seen both the publish |
110 | 110 | %% via gm, and the publish via the channel before they issue the |
111 | | -%% confirm. Either form of publish can arrive first, and a slave can |
| 111 | +%% confirm. Either form of publish can arrive first, and a mirror can |
112 | 112 | %% be upgraded to the master at any point during this |
113 | 113 | %% process. Confirms continue to be issued correctly, however. |
114 | 114 | %% |
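As a rough Erlang sketch of the rule above, a mirror could track which path delivered each message first and confirm only once the other path has caught up (the module, function names and map shape are invented for illustration, not the real rabbit_mirror_queue_slave state):

%% Sketch only: remember which path delivered a message first, and issue
%% the confirm only once the other path has delivered it as well.
-module(mirror_confirm_sketch).
-export([new/0, seen_via_gm/2, seen_via_channel/2]).

new() -> #{}.    %% MsgId => gm | channel (whichever path was seen first)

seen_via_gm(MsgId, Pending)      -> seen(MsgId, gm, channel, Pending).
seen_via_channel(MsgId, Pending) -> seen(MsgId, channel, gm, Pending).

seen(MsgId, ThisPath, OtherPath, Pending) ->
    case maps:find(MsgId, Pending) of
        {ok, OtherPath} ->                  %% other path already arrived
            {confirm, maps:remove(MsgId, Pending)};
        _ ->                                %% first sighting: keep waiting
            {wait, maps:put(MsgId, ThisPath, Pending)}
    end.

Either seen_* call can happen first, matching the point above that either form of publish can arrive first.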
115 | | -%% Because the slave is a full process, it impersonates parts of the |
| 115 | +%% Because the mirror is a full process, it impersonates parts of the |
116 | 116 | %% amqqueue API. However, it does not need to implement all parts: for |
117 | 117 | %% example, no ack or consumer-related message can arrive directly at |
118 | | -%% a slave from a channel: it is only publishes that pass both |
| 118 | +%% a mirror from a channel: it is only publishes that pass both |
119 | 119 | %% directly to the mirrors and go via gm. |
120 | 120 | %% |
121 | 121 | %% Slaves can be added dynamically. When this occurs, there is no |
122 | 122 | %% attempt made to sync the current contents of the master with the |
123 | | -%% new slave, thus the slave will start empty, regardless of the state |
124 | | -%% of the master. Thus the slave needs to be able to detect and ignore |
| 123 | +%% new mirror, thus the mirror will start empty, regardless of the state
| 124 | +%% of the master. Thus the mirror needs to be able to detect and ignore |
125 | 125 | %% operations which are for messages it has not received: because of |
126 | 126 | %% the strict FIFO nature of queues in general, this is |
127 | | -%% straightforward - all new publishes that the new slave receives via |
| 127 | +%% straightforward - all new publishes that the new mirror receives via |
128 | 128 | %% gm should be processed as normal, but fetches which are for |
129 | | -%% messages the slave has never seen should be ignored. Similarly, |
130 | | -%% acks for messages the slave never fetched should be |
| 129 | +%% messages the mirror has never seen should be ignored. Similarly, |
| 130 | +%% acks for messages the mirror never fetched should be |
131 | 131 | %% ignored. Similarly, we don't republish rejected messages that we |
132 | 132 | %% haven't seen. Eventually, as the master is consumed from, the |
133 | 133 | %% messages at the head of the queue which were there before the slave |
134 | | -%% joined will disappear, and the slave will become fully synced with |
| 134 | +%% joined will disappear, and the mirror will become fully synced with |
135 | 135 | %% the state of the master. |
136 | 136 | %% |
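The "ignore anything that predates us" rule can be sketched as follows (the instruction tuples and the Known set are illustrative only, not the module's real message formats):

%% Sketch only: publishes arriving via gm are always processed; fetches,
%% acks and requeues for messages this mirror never saw are dropped.
-module(mirror_filter_sketch).
-export([filter/2]).

%% Known is the set of msg_ids this mirror has seen published via gm.
filter({publish, MsgId}, Known) ->
    {process, sets:add_element(MsgId, Known)};
filter({Op, MsgId}, Known) when Op =:= fetch; Op =:= ack; Op =:= requeue ->
    case sets:is_element(MsgId, Known) of
        true  -> {process, Known};
        false -> {ignore, Known}            %% predates this mirror: drop it
    end.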
137 | 137 | %% The detection of the sync-status is based on the depth of the BQs, |
138 | 138 | %% where the depth is defined as the sum of the length of the BQ (as |
139 | 139 | %% per BQ:len) and the messages pending an acknowledgement. When the |
140 | | -%% depth of the slave is equal to the master's, then the slave is |
| 140 | +%% depth of the mirror is equal to the master's, then the mirror is |
141 | 141 | %% synchronised. We only store the difference between the two for |
142 | 142 | %% simplicity. Comparing the length is not enough since we need to |
143 | 143 | %% take into account rejected messages which will make it back into |
144 | 144 | %% the master queue but can't go back in the slave, since we don't |
145 | | -%% want "holes" in the slave queue. Note that the depth, and the |
146 | | -%% length likewise, must always be shorter on the slave - we assert |
| 145 | +%% want "holes" in the mirror queue. Note that the depth, and the |
| 146 | +%% length likewise, must always be shorter on the mirror - we assert |
147 | 147 | %% that in various places. In case mirrors are joined to an empty queue |
148 | 148 | %% which only goes on to receive publishes, they start by asking the |
149 | 149 | %% master to broadcast its depth. This is enough for mirrors to always |
150 | 150 | %% be able to work out when their head does not differ from the master |
151 | 151 | %% (and is much simpler and cheaper than getting the master to hang on |
152 | | -%% to the guid of the msg at the head of its queue). When a slave is |
| 152 | +%% to the guid of the msg at the head of its queue). When a mirror is |
153 | 153 | %% promoted to a master, it unilaterally broadcasts its depth, in |
154 | 154 | %% order to solve the problem of depth requests from new mirrors being |
155 | 155 | %% unanswered by a dead master. |
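The depth arithmetic above amounts to roughly the following (invented module and function names; the real code only stores the delta in its state):

%% Sketch only: depth = BQ length + messages pending acknowledgement. The
%% mirror is synchronised when its depth equals the master's, and its depth
%% must never exceed the master's, which is asserted here too.
-module(mirror_depth_sketch).
-export([update_delta/4]).

depth(BQ, BQS, MsgsPendingAck) ->
    BQ:len(BQS) + maps:size(MsgsPendingAck).

update_delta(MasterDepth, BQ, BQS, MsgsPendingAck) ->
    Delta = MasterDepth - depth(BQ, BQS, MsgsPendingAck),
    true = Delta >= 0,                      %% mirror never deeper than master
    case Delta of
        0 -> synchronised;
        _ -> {behind_by, Delta}
    end.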
|
171 | 171 | %% over gm, the mirrors must convert the msg_ids to acktags (a mapping |
172 | 172 | %% the mirrors themselves must maintain). |
173 | 173 | %% |
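Since acks travel over gm as msg_ids, the translation a mirror performs looks roughly like this maps-based sketch (names invented; the real state keeps an equivalent msg_id-to-acktag mapping):

%% Sketch only: translate msg_ids broadcast by the master into this
%% mirror's own ack tags, silently skipping ids it never fetched.
-module(mirror_acktag_sketch).
-export([msg_ids_to_acktags/2]).

msg_ids_to_acktags(MsgIds, MsgIdToAckTag) ->
    lists:foldl(
      fun (MsgId, {AckTags, Map}) ->
              case maps:take(MsgId, Map) of
                  {AckTag, Map1} -> {[AckTag | AckTags], Map1};
                  error          -> {AckTags, Map}  %% never fetched here: skip
              end
      end, {[], MsgIdToAckTag}, MsgIds).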
174 | | -%% When the master dies, a slave gets promoted. This will be the |
175 | | -%% eldest slave, and thus the hope is that that slave is most likely |
| 174 | +%% When the master dies, a mirror gets promoted. This will be the |
| 175 | +%% eldest mirror, and thus the hope is that that mirror is most likely
176 | 176 | %% to be sync'd with the master. The design of gm is that the |
177 | 177 | %% notification of the death of the master will only appear once all |
178 | 178 | %% messages in-flight from the master have been fully delivered to all |
179 | | -%% members of the gm group. Thus at this point, the slave that gets |
| 179 | +%% members of the gm group. Thus at this point, the mirror that gets |
180 | 180 | %% promoted cannot broadcast different events in a different order |
181 | 181 | %% than the master for the same msgs: there is no possibility for the |
182 | 182 | %% same msg to be processed by the old master and the new master - if |
183 | 183 | %% it was processed by the old master then it will have been processed |
184 | | -%% by the slave before the slave was promoted, and vice versa. |
| 184 | +%% by the mirror before the mirror was promoted, and vice versa. |
185 | 185 | %% |
186 | 186 | %% Upon promotion, all msgs pending acks are requeued as normal, the |
187 | | -%% slave constructs state suitable for use in the master module, and |
| 187 | +%% mirror constructs state suitable for use in the master module, and |
188 | 188 | %% then dynamically changes into an amqqueue_process with the master |
189 | 189 | %% as the bq, and the slave's bq as the master's bq. Thus the very |
190 | | -%% same process that was the slave is now a full amqqueue_process. |
| 190 | +%% same process that was the mirror is now a full amqqueue_process. |
191 | 191 | %% |
192 | 192 | %% It is important that we avoid memory leaks due to the death of |
193 | 193 | %% senders (i.e. channels) and partial publications. A sender |
|
200 | 200 | %% then hold on to the message, assuming they'll receive some |
201 | 201 | %% instruction eventually from the master. Thus we have both mirrors |
202 | 202 | %% and the master monitor all senders they become aware of. But there |
203 | | -%% is a race: if the slave receives a DOWN of a sender, how does it |
| 203 | +%% is a race: if the mirror receives a DOWN of a sender, how does it |
204 | 204 | %% know whether or not the master is going to send it instructions |
205 | 205 | %% regarding those messages? |
206 | 206 | %% |
|
221 | 221 | %% master will ask the coordinator to set up a new monitor, and |
222 | 222 | %% will continue to process the messages normally. Slaves may thus |
223 | 223 | %% receive publishes via gm from previously declared "dead" senders, |
224 | | -%% but again, this is fine: should the slave have just thrown out the |
| 224 | +%% but again, this is fine: should the mirror have just thrown out the |
225 | 225 | %% message it had received directly from the sender (due to receiving |
226 | 226 | %% a sender_death message via gm), it will be able to cope with the |
227 | 227 | %% publication purely from the master via gm. |
228 | 228 | %% |
229 | | -%% When a slave receives a DOWN message for a sender, if it has not |
| 229 | +%% When a mirror receives a DOWN message for a sender, if it has not |
230 | 230 | %% received the sender_death message from the master via gm already, |
231 | 231 | %% then it will wait 20 seconds before broadcasting a request for |
232 | 232 | %% confirmation from the master that the sender really has died. |
|
235 | 235 | %% sender. The master will thus monitor the sender, receive the DOWN, |
236 | 236 | %% and subsequently broadcast the sender_death message, allowing the |
237 | 237 | %% mirrors to tidy up. This process can repeat for the same sender: |
238 | | -%% consider one slave receives the publication, then the DOWN, then |
| 238 | +%% consider one mirror receives the publication, then the DOWN, then |
239 | 239 | %% asks for confirmation of death, then the master broadcasts the |
240 | | -%% sender_death message. Only then does another slave receive the |
| 240 | +%% sender_death message. Only then does another mirror receive the |
241 | 241 | %% publication and thus set up its monitoring. Eventually that slave |
242 | 242 | %% too will receive the DOWN, ask for confirmation and the master will |
243 | 243 | %% monitor the sender again, receive another DOWN, and send out |
244 | 244 | %% another sender_death message. Given the 20 second delay before |
245 | 245 | %% requesting death confirmation, this is highly unlikely, but it is a |
246 | 246 | %% possibility. |
247 | 247 | %% |
248 | | -%% When the 20 second timer expires, the slave first checks to see |
| 248 | +%% When the 20 second timer expires, the mirror first checks to see |
249 | 249 | %% whether it still needs confirmation of the death before requesting |
250 | 250 | %% it. This prevents unnecessary traffic on gm as it allows one |
251 | 251 | %% broadcast of the sender_death message to satisfy many mirrors. |
252 | 252 | %% |
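A sketch of the DOWN-then-wait behaviour (the 20 second constant follows the text; the state arguments and the confirm_sender_death term broadcast over gm are invented for illustration, assuming gm:broadcast/2):

%% Sketch only: on a sender DOWN start a 20 second timer; on expiry, ask
%% the master (via gm) for confirmation only if no sender_death for that
%% channel has been seen in the meantime, re-arming the timer when asking.
-module(sender_death_sketch).
-export([on_sender_down/1, on_timer_expiry/3]).

-define(DEATH_TIMEOUT, 20000).

on_sender_down(ChPid) ->
    erlang:send_after(?DEATH_TIMEOUT, self(), {check_sender_death, ChPid}).

on_timer_expiry(GM, ChPid, KnownDeadSenders) ->
    case sets:is_element(ChPid, KnownDeadSenders) of
        true  -> ok;                        %% sender_death already seen: done
        false -> gm:broadcast(GM, {confirm_sender_death, ChPid}),
                 on_sender_down(ChPid)      %% recurse: set another 20s timer
    end.

The re-armed timer on broadcast is what the next paragraph relies on when a newly promoted master knows nothing about the dead sender.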
253 | | -%% If we consider the promotion of a slave at this point, we have two |
254 | | -%% possibilities: that of the slave that has received the DOWN and is |
| 253 | +%% If we consider the promotion of a mirror at this point, we have two |
| 254 | +%% possibilities: that of the mirror that has received the DOWN and is |
255 | 255 | %% thus waiting for confirmation from the master that the sender |
256 | | -%% really is down; and that of the slave that has not received the |
| 256 | +%% really is down; and that of the mirror that has not received the |
257 | 257 | %% DOWN. In the first case, in the act of promotion to master, the new |
258 | 258 | %% master will monitor again the dead sender, and after it has |
259 | 259 | %% finished promoting itself, it should find another DOWN waiting, |
260 | 260 | %% which it will then broadcast. This will allow mirrors to tidy up as |
261 | 261 | %% normal. In the second case, we have the possibility that |
262 | 262 | %% a confirmation-of-sender-death request has been broadcast, but that
263 | | -%% it was broadcast before the master failed, and that the slave being |
| 263 | +%% it was broadcast before the master failed, and that the mirror being |
264 | 264 | %% promoted does not know anything about that sender, and so will not |
265 | | -%% monitor it on promotion. Thus a slave that broadcasts such a |
| 265 | +%% monitor it on promotion. Thus a mirror that broadcasts such a |
266 | 266 | %% request, at the point of broadcasting it, recurses, setting another |
267 | 267 | %% 20 second timer. As before, on expiry of the timer, the mirror
268 | 268 | %% checks to see whether it still has not received a sender_death |
269 | 269 | %% message for the dead sender, and if not, broadcasts a death |
270 | 270 | %% confirmation request. Thus this ensures that even when a master |
271 | | -%% dies and the new slave has no knowledge of the dead sender, it will |
| 271 | +%% dies and the new mirror has no knowledge of the dead sender, it will |
272 | 272 | %% eventually receive a death confirmation request, shall monitor the |
273 | 273 | %% dead sender, receive the DOWN and broadcast the sender_death |
274 | 274 | %% message. |
|
281 | 281 | %% mirrors will receive it via gm, will publish it to their BQ and will |
282 | 282 | %% set up monitoring on the sender. They will then receive the DOWN |
283 | 283 | %% message and the master will eventually publish the corresponding |
284 | | -%% sender_death message. The slave will then be able to tidy up its |
| 284 | +%% sender_death message. The mirror will then be able to tidy up its |
285 | 285 | %% state as normal. |
286 | 286 | %% |
287 | 287 | %% Recovery of mirrored queues is straightforward: as nodes die, the |
288 | 288 | %% remaining nodes record this, and eventually a situation is reached |
289 | 289 | %% in which only one node is alive, which is the master. This is the |
290 | 290 | %% only node which, upon recovery, will resurrect a mirrored queue: |
291 | | -%% nodes which die and then rejoin as a slave will start off empty as |
| 291 | +%% nodes which die and then rejoin as a mirror will start off empty as |
292 | 292 | %% if they have no mirrored content at all. This is not surprising: to |
293 | 293 | %% achieve anything more sophisticated would require the master and |
294 | | -%% recovering slave to be able to check to see whether they agree on |
| 294 | +%% recovering mirror to be able to check to see whether they agree on |
295 | 295 | %% the last seen state of the queue: checking depth alone is not |
296 | 296 | %% sufficient in this case. |
297 | 297 | %% |
@@ -361,8 +361,8 @@ handle_cast({gm_deaths, DeadGMPids}, State = #state{q = Q}) when ?amqqueue_pid_r |
361 | 361 | noreply(State); |
362 | 362 | {ok, _MPid0, DeadPids, _ExtraNodes} -> |
363 | 363 | %% see rabbitmq-server#914; |
364 | | - %% Different slave is now master, stop current coordinator normally. |
365 | | - %% Initiating queue is now slave and the least we could do is report |
| 364 | + %% Different mirror is now master, stop current coordinator normally. |
| 365 | + %% Initiating queue is now mirror and the least we could do is report |
366 | 366 | %% deaths which we 'think' we saw. |
367 | 367 | %% NOTE: Reported deaths here, could be inconsistent. |
368 | 368 | rabbit_mirror_queue_misc:report_deaths(MPid, false, QueueName, |
@@ -416,7 +416,7 @@ code_change(_OldVsn, State, _Extra) -> |
416 | 416 |
|
417 | 417 | handle_pre_hibernate(State = #state { gm = GM }) -> |
418 | 418 | %% Since GM notifications of deaths are lazy we might not get a |
419 | | - %% timely notification of slave death if policy changes when |
| 419 | + %% timely notification of mirror death if policy changes when |
420 | 420 | %% everything is idle. So cause some activity just before we |
421 | 421 | %% sleep. This won't cause us to go into perpetual motion as the |
422 | 422 | %% heartbeat does not wake up coordinator or mirrors. |
|