-
Hello!

Versions:

Setup: We're currently using a 3-server setup, with all three servers serving metadata and storage, and one management node (for now). Everything appears to be working perfectly fine.

However, when we run a "beegfs health net" from any client, node1's "node_meta_1" is always connected via RDMA (as expected), but node2 and node3 seem to randomly drop out of the health net check. I can bring them back instantly by running a "beegfs node ping" (which successfully pings all nodes very quickly), but this is only temporary: at some point the node2 and node3 meta nodes drop out of the health net output again.

This doesn't appear to be impacting performance, clients, or anything else, but it is somewhat concerning to see them missing all the time. I've checked the logs and don't see any errors or any information about why these nodes are dropping out of the health net check. All other health checks appear to be fine as well.

Does anyone have any idea why this is happening? Is it a known issue? Is it an issue at all? Is there some way to track down why this is happening? Any push in the right direction would be greatly appreciated!

Here's an example of nodes gone:
Then, I run a
About an hour later, the node2 and node3 metas are gone again.
Replies: 1 comment 1 reply
-
No idea what the problem is, but if that bothered me I'd create a node ping cron job on the management node and schedule it every 1800s.
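For anyone who wants to try that workaround, here is a minimal sketch of what such a cron entry might look like. It assumes that "beegfs node ping" can be run without arguments (as it is used in the original post), that the `beegfs` binary is on cron's PATH, and that the tool is already configured to reach the management service on that host. None of these were confirmed in this thread, so adjust as needed.

```crontab
# Hypothetical crontab entry on the management node (edit with `crontab -e`).
# Runs every 30 minutes, i.e. every 1800 s.
# Assumption: `beegfs` is on cron's PATH; otherwise use the full path to the binary.
*/30 * * * * beegfs node ping >/dev/null 2>&1
```

Output is discarded here; redirect it to a log file instead if you want to see when pings fail.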
@scaleoutsean
I appreciate the reply! This was my first thought as well, so that's what I did. But after speaking to BeeGFS tech support, they informed me that this is completely normal when no jobs are running that need the meta nodes. So I removed the cron job, and we'll see what happens when actual jobs start running. Apparently, the meta nodes are simply smart enough to sync up when they need to. Otherwise, they just rest.