Description
Search before asking
- I searched in the issues and found nothing similar.
Fluss version
0.7.0 (latest release)
Please describe the bug 🐞
When a table is dropped on the CoordinatorServer, its metadata is removed from ZooKeeper, and the TabletServers then clean up their replicas asynchronously. If a TabletServer restarts before it has finished cleaning up its local bucket data, it tries to recover table logs from the residual data left on local disk. Because the table has already been dropped, its schema no longer exists in ZooKeeper, so the TabletServer throws a fatal exception and cannot restart.
The error messages are as follows:
2025-08-06 16:31:31,185 ERROR com.alibaba.fluss.server.ServerBase [] - Could not start TabletServer.
com.alibaba.fluss.exception.FlussException: Failed to start the TabletServer.
at com.alibaba.fluss.server.ServerBase.start(ServerBase.java:136) ~[fluss-server-0.8-SNAPSHOT.jar:0.8-SNAPSHOT]
at com.alibaba.fluss.server.ServerBase.startServer(ServerBase.java:93) [fluss-server-0.8-SNAPSHOT.jar:0.8-SNAPSHOT]
at com.alibaba.fluss.server.tablet.TabletServer.main(TabletServer.java:172) [fluss-server-0.8-SNAPSHOT.jar:0.8-SNAPSHOT]
Caused by: com.alibaba.fluss.exception.FlussRuntimeException: Failed to recovery log
at com.alibaba.fluss.server.log.LogManager.loadLogs(LogManager.java:213) ~[fluss-server-0.8-SNAPSHOT.jar:0.8-SNAPSHOT]
at com.alibaba.fluss.server.log.LogManager.startup(LogManager.java:129) ~[fluss-server-0.8-SNAPSHOT.jar:0.8-SNAPSHOT]
at com.alibaba.fluss.server.tablet.TabletServer.startServices(TabletServer.java:200) ~[fluss-server-0.8-SNAPSHOT.jar:0.8-SNAPSHOT]
at com.alibaba.fluss.server.ServerBase.start(ServerBase.java:123) ~[fluss-server-0.8-SNAPSHOT.jar:0.8-SNAPSHOT]
... 2 more
Solution
When a TabletServer tries to recover table logs from local data whose table has no schema in ZooKeeper, it can conclude that the table has already been dropped. In that case it can skip recovery for that data and safely remove the residual files.
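The proposed behavior can be sketched roughly as follows. This is only an illustration, not the actual `LogManager` code: the class `DroppedTableRecoverySketch`, the methods `loadLogs` and `deleteRecursively`, the `schemaExists` predicate (standing in for a ZooKeeper schema lookup), and the one-directory-per-table layout are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: skip and clean up local data of dropped tables on startup. */
final class DroppedTableRecoverySketch {

    /**
     * Scans dataDir, assuming one sub-directory per table (a simplification).
     * schemaExists is a stand-in for "does this table still have a schema in ZooKeeper?".
     * Returns the names of the tables whose logs would be recovered.
     */
    static List<String> loadLogs(Path dataDir, Predicate<String> schemaExists) {
        List<Path> tableDirs;
        try (Stream<Path> children = Files.list(dataDir)) {
            // Materialize the listing first so deletions do not disturb the iteration.
            tableDirs = children.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        List<String> recovered = new ArrayList<>();
        for (Path tableDir : tableDirs) {
            String tableName = tableDir.getFileName().toString();
            if (schemaExists.test(tableName)) {
                // The table is still known: real log recovery would run here.
                recovered.add(tableName);
            } else {
                // No schema means the table was dropped: skip recovery, remove residual data.
                deleteRecursively(tableDir);
            }
        }
        return recovered;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> deepestFirst = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path path : deepestFirst) {
                Files.delete(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One design point to watch in a real fix: only a definite "schema not found" answer should trigger deletion. A failed or timed-out ZooKeeper lookup should probably still fail the startup (or be retried), otherwise a transient outage could cause data of live tables to be removed.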
Are you willing to submit a PR?
- I'm willing to submit a PR!