Consuming Stopped in Pinot 1.2.0 #11613

@donghun-cho

Problem Description

After upgrading Pinot from version 1.0.0 to 1.2.0 and running it for several weeks, consumption from one partition stopped.
Later, consumption stopped for all partitions handled by a single Pinot server.

How to Check

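In the Pinot server log, the thread pool stats printed by HelixTaskExecutor show that the completed tasks value is no longer increasing while the queued tasks value keeps growing, with every thread in the pool busy (pool size = 40, active threads = 40).

Log from the stopped server: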

[2024-10-10 17:38:06.394] INFO [HelixTaskExecutor] [ZkClient-EventThread-108-real-zookeeper:2181] Scheduling message b2f4a6ab-1ce2-42f8-9207-48020206c2f5: inspectorStatAgent00_REALTIME:inspectorStatAgent00__24__144__20241008T0216Z, ONLINE->OFFLINE
[2024-10-10 17:38:06.394] INFO [HelixTaskExecutor] [ZkClient-EventThread-108-real-zookeeper:2181] Submit task: b2f4a6ab-1ce2-42f8-9207-48020206c2f5 to pool: java.util.concurrent.ThreadPoolExecutor@7fa15569[Running, pool size = 40, active threads = 40, queued tasks = 8721, completed tasks = 52682]
[2024-10-10 17:38:06.394] INFO [HelixTaskExecutor] [ZkClient-EventThread-108-real-zookeeper:2181] Message: b2f4a6ab-1ce2-42f8-9207-48020206c2f5 handling task scheduled
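The check described above can be automated against the server log. The following is a minimal sketch, assuming the "Submit task" lines keep the format shown in the log excerpt. The function names parse_pool_stats and looks_stuck are hypothetical helpers for illustration, not part of Pinot or Helix.

```python
import re

# Matches the thread pool stats fragment that appears in the "Submit task" log line.
POOL_STATS = re.compile(
    r"pool size = (?P<pool>\d+), active threads = (?P<active>\d+), "
    r"queued tasks = (?P<queued>\d+), completed tasks = (?P<completed>\d+)"
)


def parse_pool_stats(line):
    """Return the pool counters from one log line as ints, or None if absent."""
    match = POOL_STATS.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def looks_stuck(samples):
    """Heuristic (an assumption, not an official rule): the pool looks stuck when
    every sample is saturated, completed tasks did not move between the first and
    last sample, and the queue grew over the same window."""
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    saturated = all(s["active"] == s["pool"] for s in samples)
    return (
        saturated
        and last["completed"] == first["completed"]
        and last["queued"] > first["queued"]
    )
```

Feeding it the stats from "Submit task" lines collected a few minutes apart gives a rough signal; how long a window is meaningful depends on how often state transitions normally complete on the cluster.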

What to Do!

Restarting the Pinot server restores consumption, but it does not prevent the problem from recurring.
To avoid data loss in the meantime, increase the Kafka retention period and the replicasPerPartition value of the realtime table.

Pinot already has a pull request for this issue: apache/pinot#13632


More Details on the Cause of the Problem

For a single partition, the following occur in order:

  1. A Helix task (OFFLINE->CONSUMING) is scheduled for the new segment.
  2. A request to /segmentConsumed returns a KEEP response.
    • This makes the consuming thread build (complete) the segment.

The thread dump shows that the consuming thread is waiting for the segment lock, which is held by the (OFFLINE->CONSUMING) task thread.
The (OFFLINE->CONSUMING) task, in turn, needs to acquire the _partitionGroupConsumerSemaphore, which the consuming thread has not released.
Each thread is waiting on a resource the other holds, so neither can proceed and the Helix task never completes.
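The circular wait can be reproduced in a few lines. This is a simplified model, not Pinot's code: a plain lock stands in for the segment lock, a one-permit semaphore stands in for _partitionGroupConsumerSemaphore, and run_demo is a hypothetical name. Timeouts are used only so that the sketch terminates instead of hanging.

```python
import threading


def run_demo(timeout=0.2):
    """Model the two threads from the thread dump; return what each managed to acquire."""
    segment_lock = threading.Lock()               # stand-in for the segment lock
    partition_semaphore = threading.Semaphore(1)  # stand-in for _partitionGroupConsumerSemaphore
    both_holding = threading.Barrier(2)           # both threads hold their first resource
    both_tried = threading.Barrier(2)             # both threads finished their second attempt
    results = {}

    def consuming_thread():
        # Holds the semaphore while consuming, then needs the segment lock to build the segment.
        partition_semaphore.acquire()
        try:
            both_holding.wait()
            got = segment_lock.acquire(timeout=timeout)
            results["consumer_got_segment_lock"] = got
            both_tried.wait()  # keep holding the semaphore until the other attempt is over
            if got:
                segment_lock.release()
        finally:
            partition_semaphore.release()

    def transition_thread():
        # Holds the segment lock for the OFFLINE->CONSUMING task, then needs the semaphore.
        segment_lock.acquire()
        try:
            both_holding.wait()
            got = partition_semaphore.acquire(timeout=timeout)
            results["transition_got_semaphore"] = got
            both_tried.wait()  # keep holding the lock until the other attempt is over
            if got:
                partition_semaphore.release()
        finally:
            segment_lock.release()

    threads = [
        threading.Thread(target=consuming_thread),
        threading.Thread(target=transition_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this model both attempts time out, which corresponds to the two threads blocking forever when no timeout is used. The usual remedies are to acquire the two resources in a consistent order or to release one before waiting on the other; which approach the linked pull request takes is not described here.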
