-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
Labels
bugSomething isn't workingSomething isn't workingregressionSomething that used to work no longer doesSomething that used to work no longer does
Description
Describe the bug
Wrong result when there are 2 count distinct in the select clause,
To Reproduce
I will share there file with @alamb because I cannot attached .parquet file here
Bug
-- BUG
SELECT COUNT(DISTINCT host) AS servers_count, count(distinct pool) as pool_count, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY server_role, os, env, datacenter;
+---------------+------------+-------------+---------+------------+------------+
| servers_count | pool_count | server_role | os | env | datacenter |
+---------------+------------+-------------+---------+------------+------------+
| 1 | 1 | mesg | windows | production | mn |
| 2 | 2 | mesg | windows | production | va | -- should have 4 servers_count and 3 pool_count
+---------------+------------+-------------+---------+------------+------------+Here are other queries running in DF CLI that tell me the right results
SELECT DISTINCT host, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' order by datacenter;
+---------------+-------------+---------+------------+------------+
| host | server_role | os | env | datacenter |
+---------------+-------------+---------+------------+------------+
| mpm-mesg1002c | mesg | windows | production | mn |
| vpm-mesg1001c | mesg | windows | production | va |
| vpm-mesg1005b | mesg | windows | production | va |
| vpm-mesg1008d | mesg | windows | production | va |
| vpm-mesg1007b | mesg | windows | production | va |
+---------------+-------------+---------+------------+------------+
SELECT DISTINCT pool, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' order by datacenter;
+------+-------------+---------+------------+------------+
| pool | server_role | os | env | datacenter |
+------+-------------+---------+------------+------------+
| c | mesg | windows | production | mn |
| b | mesg | windows | production | va |
| c | mesg | windows | production | va |
| d | mesg | windows | production | va |
+------+-------------+---------+------------+------------+
SELECT COUNT(DISTINCT host) AS servers_count, "server_role", "os", "env", "datacenter" from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY "server_role", "os", "env", "datacenter";
+---------------+-------------+---------+------------+------------+
| servers_count | server_role | os | env | datacenter |
+---------------+-------------+---------+------------+------------+
| 4 | mesg | windows | production | va |
| 1 | mesg | windows | production | mn |
+---------------+-------------+---------+------------+------------+
SELECT count(distinct pool) as pool_count, "server_role", "os", "env", "datacenter" from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY "server_role", "os", "env", "datacenter";
+------------+-------------+---------+------------+------------+
| pool_count | server_role | os | env | datacenter |
+------------+-------------+---------+------------+------------+
| 1 | mesg | windows | production | mn |
| 3 | mesg | windows | production | va |
+------------+-------------+---------+------------+------------+Expected behavior
SELECT COUNT(DISTINCT host) AS servers_count, count(distinct pool) as pool_count, server_role, os, env, datacenter from '/tmp/file.parquet' WHERE time >= '2024-02-25T00:00:00Z' and time < '2024-02-25T00:00:01Z' and server_role = 'mesg' GROUP BY server_role, os, env, datacenter;
+---------------+------------+-------------+---------+------------+------------+
| servers_count | pool_count | server_role | os | env | datacenter |
+---------------+------------+-------------+---------+------------+------------+
| 1 | 1 | mesg | windows | production | mn |
| 4 | 3 | mesg | windows | production | va |
+---------------+------------+-------------+---------+------------+------------+
### Additional context
_No response_
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workingregressionSomething that used to work no longer doesSomething that used to work no longer does