DuckDB Marisa Extension by Query.Farm
The Marisa extension, developed by Query.Farm, adds MARISA (Matching Algorithm with Recursively Implemented StorAge) trie functionality for DuckDB. MARISA is a static and space-efficient trie data structure that enables fast string lookups, prefix searches, and predictive text operations.
MARISA tries are particularly useful for:
- Autocomplete/Type-ahead functionality: Use
marisa_predictive()
to find all completions for a partial string - Spell checking: Use
marisa_lookup()
to verify if words exist in a dictionary - URL routing: Efficiently match URL patterns and extract parameters
- IP address prefix matching: Network routing and firewall rules
- String deduplication: Compact storage of large sets of strings with common prefixes
marisa
is a DuckDB Community Extension.
You can now use this by using this SQL:
install marisa from community;
load marisa;
The marisa
extension provides several functions for working with MARISA tries:
Use the marisa_trie()
aggregate function to create a trie from string data:
CREATE TABLE employees(name TEXT);
INSERT INTO employees VALUES('Alice'), ('Bob'), ('Charlie'), ('David'), ('Eve'), ('Frank'), ('Mallory'), ('Megan'), ('Oscar'), ('Melissa');
-- Create a trie from the employee names
CREATE TABLE employees_trie AS SELECT marisa_trie(name) AS trie FROM employees;
SELECT trie, octet_length(trie) FROM employees_trie;
┌─────────────────────────────────────────────────────────────────┬────────────────────┐
│ trie │ octet_length(trie) │
│ blob │ int64 │
├─────────────────────────────────────────────────────────────────┼────────────────────┤
│ We love Marisa.\x00\x08\x00\x00\x00\x00\x00\x00\x00\xFD\x1B`\… │ 4160 │
└─────────────────────────────────────────────────────────────────┴────────────────────┘
Check if a string exists in the trie using marisa_lookup()
:
-- Check if 'Alice' exists in the trie (returns true)
SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;
┌──────────────────────────────┐
│ marisa_lookup(trie, 'Alice') │
│ boolean │
├──────────────────────────────┤
│ true │
└──────────────────────────────┘
-- Check if 'Unknown' exists in the trie (returns false)
SELECT marisa_lookup(trie, 'Unknown') FROM employees_trie;
┌────────────────────────────────┐
│ marisa_lookup(trie, 'Unknown') │
│ boolean │
├────────────────────────────────┤
│ false │
└────────────────────────────────┘
Find all strings in the trie that are prefixes of a given string using marisa_common_prefix()
:
CREATE TABLE countries(name TEXT);
INSERT INTO countries VALUES ('U'), ('US'), ('USA');
CREATE TABLE countries_trie AS SELECT marisa_trie(name) AS trie FROM countries;
-- Find all prefixes of 'USA' (returns ['U', 'US', 'USA'])
SELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;
┌───────────────────────────────────────┐
│ marisa_common_prefix(trie, 'USA', 10) │
│ varchar[] │
├───────────────────────────────────────┤
│ [U, US, USA] │
└───────────────────────────────────────┘
Find all strings in the trie that start with a given prefix using marisa_predictive()
:
-- Find all names starting with 'Me' (returns ['Megan', 'Melissa'])
SELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;
┌───────────────────────────────────┐
│ marisa_predictive(trie, 'Me', 10) │
│ varchar[] │
├───────────────────────────────────┤
│ [Megan, Melissa] │
└───────────────────────────────────┘
Type: Aggregate Function Description: Creates a MARISA trie from string values in a column. Parameters:
column
(VARCHAR): Column containing strings to build the trie from
Example:
SELECT marisa_trie(name) FROM employees;
Type: Scalar Function Description: Checks if a string exists in the trie. Parameters:
trie
(BLOB): The trie created bymarisa_trie()
search_string
(VARCHAR): String to search for
Returns: BOOLEAN (true if found, false otherwise)
Example:
SELECT marisa_lookup(trie, 'Alice') FROM employees_trie;
Type: Scalar Function Description: Finds all strings in the trie that are prefixes of the search string. Parameters:
trie
(BLOB): The trie created bymarisa_trie()
search_string
(VARCHAR): String to find prefixes formax_results
(INTEGER): Maximum number of results to return
Returns: LIST(VARCHAR) - List of prefix matches
Example:
SELECT marisa_common_prefix(trie, 'USA', 10) FROM countries_trie;
-- Returns: ['U', 'US', 'USA']
Type: Scalar Function Description: Finds all strings in the trie that start with the given prefix. Parameters:
trie
(BLOB): The trie created bymarisa_trie()
prefix
(VARCHAR): Prefix to search formax_results
(INTEGER): Maximum number of results to return
Returns: LIST(VARCHAR) - List of strings starting with the prefix
Example:
SELECT marisa_predictive(trie, 'Me', 10) FROM employees_trie;
-- Returns: ['Megan', 'Melissa']