diff --git a/docs/user/ppl/cmd/ad.rst b/docs/user/ppl/cmd/ad.rst index 938e6e79918..26502dea682 100644 --- a/docs/user/ppl/cmd/ad.rst +++ b/docs/user/ppl/cmd/ad.rst @@ -10,41 +10,43 @@ ad (deprecated by ml command) Description -============ +=========== | The ``ad`` command applies the Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search result returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed in time RCF for processing time-series data, and batch RCF for processing non-time-series data. -Fixed In Time RCF For Time-series Data Command Syntax -===================================================== -ad +Syntax +====== -* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30. -* shingle_size(integer): optional. A shingle is a consecutive sequence of the most recent records. The default value is 8. -* sample_size(integer): optional. The sample size used by stream samplers in this forest. The default value is 256. -* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32. -* time_decay(double): optional. The decay factor used by stream samplers in this forest. The default value is 0.0001. -* anomaly_rate(double): optional. The anomaly rate. The default value is 0.005. -* time_field(string): mandatory. It specifies the time field for RCF to use as time-series data. -* date_format(string): optional. It's used for formatting time_field field. The default formatting is "yyyy-MM-dd HH:mm:ss". -* time_zone(string): optional. It's used for setting time zone for time_field filed. The default time zone is UTC. -* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted. +Fixed In Time RCF For Time-series Data +-------------------------------------- +ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field] +* number_of_trees: optional. Number of trees in the forest. **Default:** 30. +* shingle_size: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8. +* sample_size: optional. The sample size used by stream samplers in this forest. **Default:** 256. +* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32. +* time_decay: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001. +* anomaly_rate: optional. The anomaly rate. **Default:** 0.005. +* time_field: mandatory. Specifies the time field for RCF to use as time-series data. +* date_format: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss". +* time_zone: optional. Used for setting time zone for time_field. **Default:** "UTC". +* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted. -Batch RCF for Non-time-series Data Command Syntax -================================================= -ad +Batch RCF For Non-time-series Data +---------------------------------- +ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field] -* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30. -* sample_size(integer): optional. Number of random samples given to each tree from the training data set. The default value is 256.
-* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32. -* training_data_size(integer): optional. The default value is the size of your training data set. -* anomaly_score_threshold(double): optional. The threshold of anomaly score. The default value is 1.0. -* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted. +* number_of_trees: optional. Number of trees in the forest. **Default:** 30. +* sample_size: optional. Number of random samples given to each tree from the training data set. **Default:** 256. +* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32. +* training_data_size: optional. **Default:** size of your training data set. +* anomaly_score_threshold: optional. The threshold of anomaly score. **Default:** 1.0. +* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted. Example 1: Detecting events in New York City from taxi ridership data with time-series data =========================================================================================== -The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data. +This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data. PPL query:: @@ -59,7 +61,7 @@ PPL query:: Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category ============================================================================================================================ -The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values. +This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values. PPL query:: @@ -76,7 +78,7 @@ PPL query:: Example 3: Detecting events in New York City from taxi ridership data with non-time-series data =============================================================================================== -The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data. +This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data. PPL query:: @@ -91,7 +93,7 @@ PPL query:: Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category ================================================================================================================================ -The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values. +This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values. PPL query:: @@ -108,4 +110,3 @@ PPL query:: Limitations =========== The ``ad`` command can only work with ``plugins.calcite.enabled=false``. -It means ``ad`` command cannot work together with new PPL commands/functions introduced in 3.0.0 and above. 
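+
+The ``ad`` command therefore requires Calcite to be switched off before use. A sketch of the settings call, mirroring the one used for enabling Calcite elsewhere in these docs (the local endpoint is an assumption)::
+
+    >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{
+      "transient" : {
+        "plugins.calcite.enabled" : false
+      }
+    }'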
diff --git a/docs/user/ppl/cmd/append.rst b/docs/user/ppl/cmd/append.rst index 25303aeb87b..4b8f461b5ac 100644 --- a/docs/user/ppl/cmd/append.rst +++ b/docs/user/ppl/cmd/append.rst @@ -1,6 +1,6 @@ -========= +====== append -========= +====== .. rubric:: Table of contents @@ -10,16 +10,12 @@ append Description -============ -| Using ``append`` command to append the result of a sub-search and attach it as additional rows to the bottom of the input search results (The main search). -The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows. - -Version -======= -3.3.0 +=========== +| The ``append`` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (the main search). +| The command aligns columns with the same field names and types. For columns that differ between the main search and sub-search, NULL values are filled in the respective rows. Syntax -============ +====== append <sub-search> * sub-search: mandatory. Executes PPL commands as a secondary search. @@ -30,7 +26,7 @@ Limitations * **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with ``eval`` or using ``fields`` to select non-conflicting columns). Example 1: Append rows from a count aggregation to existing search result -=============================================================== +========================================================================= This example appends rows from "count by gender" to "sum by gender, state". @@ -50,7 +46,7 @@ PPL query:: +----------+--------+-------+------------+ Example 2: Append rows with merged column names -==================================================================================== +=============================================== This example appends rows from "sum by gender" to "sum by gender, state", merging columns with the same field name and type. @@ -69,3 +65,21 @@ PPL query:: | 101 | M | null | +-----+--------+-------+ + +Example 3: Append rows with conflicting column types +==================================================== + +This example shows how column type conflicts are handled when appending results. Columns with the same name but different types generate two separate columns in the appended result. + +PPL query:: + + os> source=accounts | stats sum(age) as sum by gender, state | sort -sum | head 5 | append [ source=accounts | stats sum(age) as sum by gender | eval sum = cast(sum as double) ]; + fetched rows / total rows = 6/6 + +------+--------+-------+-------+ + | sum | gender | state | sum0 | + |------+--------+-------+-------| + | 36 | M | TN | null | + | 33 | M | MD | null | + | 32 | M | IL | null | + | 28 | F | VA | null | + | null | F | null | 28.0 | + | null | M | null | 101.0 | + +------+--------+-------+-------+ + diff --git a/docs/user/ppl/cmd/appendcol.rst b/docs/user/ppl/cmd/appendcol.rst index b9eeeae83b8..a9cb714256b 100644 --- a/docs/user/ppl/cmd/appendcol.rst +++ b/docs/user/ppl/cmd/appendcol.rst @@ -11,47 +11,15 @@ appendcol Description ============ -| (Experimental) -| (From 3.1.0) -| Using ``appendcol`` command to append the result of a sub-search and attach it alongside with the input search results (The main search).
- -Version -======= -3.1.0 +The ``appendcol`` command appends the result of a sub-search and attaches it alongside the input search results (the main search). Syntax -============ +====== appendcol [override=<boolean>] <sub-search> -* override=: optional. Boolean field to specify should result from main-result be overwritten in the case of column name conflict. +* override=<boolean>: optional. Boolean field to specify whether the result from the main search should be overwritten in the case of a column name conflict. **Default:** false. * sub-search: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input. -Configuration -============= -This command requires Calcite enabled. - -Enable Calcite:: - - >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{ - "transient" : { - "plugins.calcite.enabled" : true - } - }' - -Result set:: - - { - "acknowledged": true, - "persistent": { - "plugins": { - "calcite": { - "enabled": "true" - } - } - }, - "transient": {} - } - Example 1: Append a count aggregation to existing search result =============================================================== @@ -103,6 +71,8 @@ PPL query:: Example 3: Append multiple sub-search results ============================================= +This example shows how to chain multiple appendcol commands to add columns from different sub-searches. + PPL query:: PPL> source=employees | fields name, dept, age | appendcol [ stats avg(age) as avg_age ] | appendcol [ stats max(age) as max_age ]; @@ -124,6 +94,8 @@ PPL query:: Example 4: Override case of column name conflict ================================================ +This example demonstrates the override option when column names conflict between the main search and sub-search. + PPL query:: PPL> source=employees | stats avg(age) as agg by dept | appendcol override=true [ stats max(age) as agg by dept ]; diff --git a/docs/user/ppl/cmd/bin.rst b/docs/user/ppl/cmd/bin.rst index 1ebdc3f897e..d35a89b83ab 100644 --- a/docs/user/ppl/cmd/bin.rst +++ b/docs/user/ppl/cmd/bin.rst @@ -1,6 +1,6 @@ -============= +=== bin -============= +=== .. rubric:: Table of contents @@ -9,234 +9,48 @@ bin :depth: 2 -.. note:: - - Available since version 3.3 - - Description ============ -| The ``bin`` command groups numeric values into buckets of equal intervals, making it useful for creating histograms and analyzing data distribution. It takes a numeric field and generates a new field with values that represent the lower bound of each bucket. +The ``bin`` command groups numeric values into buckets of equal intervals, making it useful for creating histograms and analyzing data distribution. It takes a numeric field and generates a new field with values that represent the lower bound of each bucket. Syntax -============ +====== bin <field> [span=<interval>] [minspan=<interval>] [bins=<count>] [aligntime=(earliest | latest | <time-specifier>)] [start=<num>] [end=<num>] * field: mandatory. The numeric field to bin. * span: optional. The interval size for each bin. Cannot be used with bins or minspan parameters. + * Supports numeric (e.g., ``1000``), logarithmic (e.g., ``log10``, ``2log10``), and time intervals + * Available time units: + * microsecond (us) + * millisecond (ms) + * centisecond (cs) + * decisecond (ds) + * second (s, sec, secs, second, seconds) + * minute (m, min, mins, minute, minutes) + * hour (h, hr, hrs, hour, hours) + * day (d, day, days) + * month (mon, month, months) * minspan: optional. The minimum interval size for automatic span calculation.
Cannot be used with span or bins parameters. -* bins: optional. The maximum number of equal-width bins to create. Cannot be used with span or minspan parameters. +* bins: optional. The maximum number of equal-width bins to create. Cannot be used with span or minspan parameters. The bins parameter must be between 2 and 50000 (inclusive). * aligntime: optional. Align the bin times for time-based fields. Valid only for time-based discretization. Options: - - earliest: Align bins to the earliest timestamp in the data - - latest: Align bins to the latest timestamp in the data - - : Align bins to a specific epoch time value or time modifier expression -* start: optional. The starting value for binning range. If not specified, uses the minimum field value. -* end: optional. The ending value for binning range. If not specified, uses the maximum field value. - -Parameter Priority Order -======================== -When multiple parameters are specified, the bin command follows this priority order: - -1. **span** (highest priority) - Set the interval for binning -2. **minspan** (second priority) - Set the Minimum span for binning -3. **bins** (third priority) - Sets the maximum amount of bins -4. **start/end** (fourth priority) - Expand the range for binning -5. **default** (lowest priority) - Automatic magnitude-based binning - -**Note**: The **aligntime** parameter is a modifier that only applies to span-based binning (when using **span**) for time-based fields. It does not affect the priority order for bin type selection. - -Parameters -============ - -span Parameter --------------- -Specifies the width of each bin interval with support for multiple span types: - -**1. Numeric Span ** -- ``span=1000`` - Creates bins of width 1000 for numeric fields -- Calculation: ``floor(field / span) * span`` -- Dynamic binning: No artificial limits on number of bins, no "Other" category - -**2. Log-based Span (logarithmic binning)** -- **Syntax**: ``[]log[]`` or ``logN`` where N is the base -- **Examples**: - - ``span=log10`` - Base 10 logarithmic bins (coefficient=1) - - ``span=2log10`` - Base 10 with coefficient 2 - - ``span=log2`` - Base 2 logarithmic bins - - ``span=log3`` - Base 3 logarithmic bins (arbitrary base) - - ``span=1.5log3`` - Base 3 with coefficient 1.5 -- **Algorithm**: - - For each value: ``bin_number = floor(log_base(value/coefficient))`` - - Bin boundaries: ``[coefficient * base^n, coefficient * base^(n+1))`` - - Only creates bins where data exists (data-driven approach) -- **Rules**: - - Coefficient: Real number ≥ 1.0 and < base (optional, defaults to 1) - - Base: Real number > 1.0 (required) - - Creates logarithmic bin boundaries instead of linear - -**3. 
Time Scale Span (comprehensive time units)** -- **Subseconds**: ``us`` (microseconds), ``ms`` (milliseconds), ``cs`` (centiseconds), ``ds`` (deciseconds) -- **Seconds**: ``s``, ``sec``, ``secs``, ``second``, ``seconds`` -- **Minutes**: ``m``, ``min``, ``mins``, ``minute``, ``minutes`` -- **Hours**: ``h``, ``hr``, ``hrs``, ``hour``, ``hours`` -- **Days**: ``d``, ``day``, ``days`` - **Uses precise daily binning algorithm** -- **Months**: ``mon``, ``month``, ``months`` - **Uses precise monthly binning algorithm** -- **Examples**: - - ``span=30seconds`` - - ``span=15minutes`` - - ``span=2hours`` - - ``span=7days`` - - ``span=4months`` - - ``span=500ms`` - - ``span=100us`` - - ``span=50cs`` (centiseconds) - - ``span=2ds`` (deciseconds) - -**Daily Binning Algorithm (for day-based spans)** - -For daily spans (``1days``, ``7days``, ``30days``), the implementation uses a **precise daily binning algorithm** with Unix epoch reference: - -1. **Unix Epoch Reference**: Uses January 1, 1970 as the fixed reference point for all daily calculations -2. **Modular Arithmetic**: Calculates ``days_since_epoch % span_days`` to find position within span cycle -3. **Consistent Alignment**: Ensures identical input dates always produce identical bin start dates -4. **Date String Output**: Returns formatted date strings (``YYYY-MM-DD``) instead of timestamps - -**Algorithm Example**: For July 28, 2025 (day 20,297 since Unix epoch): -- ``span=6days``: 20,297 % 6 = 5 → bin starts July 23, 2025 (``"2025-07-23"``) -- ``span=7days``: 20,297 % 7 = 4 → bin starts July 24, 2025 (``"2025-07-24"``) - -**Monthly Binning Algorithm (for month-based spans)** - -For monthly spans (``1months``, ``4months``, ``6months``), the implementation uses a **precise monthly binning algorithm** with Unix epoch reference: - -1. **Unix Epoch Reference**: Uses January 1970 as the fixed reference point for all monthly calculations -2. **Modular Arithmetic**: Calculates ``months_since_epoch % span_months`` to find position within span cycle -3. **Consistent Alignment**: Ensures identical input dates always produce identical bin start months -4. **Month String Output**: Returns formatted month strings (``YYYY-MM``) instead of timestamps - -**Algorithm Example**: For July 2025 (666 months since Unix epoch): -- ``span=4months``: 666 % 4 = 2 → bin starts at month 664 = May 2025 (``"2025-05"``) -- ``span=6months``: 666 % 6 = 0 → bin starts at month 666 = July 2025 (``"2025-07"``) - -This ensures precise and consistent behavior for both daily and monthly binning operations. - -minspan Parameter ------------------ -Specifies the minimum allowed interval size using a magnitude-based algorithm. The algorithm works as follows: - -1. **Calculate default width**: ``10^FLOOR(LOG10(data_range))`` - the largest power of 10 that fits within the data range -2. **Apply minspan constraint**: - - If ``default_width >= minspan``: use the default width - - If ``default_width < minspan``: use ``10^CEIL(LOG10(minspan))`` - -This ensures bins use human-readable widths (powers of 10) while respecting the minimum span requirement. - -**Example**: For age data with range 20-40 (range=20) and minspan=11: -- Default width = 10^FLOOR(LOG10(20)) = 10^1 = 10 -- Since minspan=11 > 10, use 10^CEIL(LOG10(11)) = 10^2 = 100 -- Result: Single bin "0-100" covering all age values - -aligntime Parameter -------------------- -For time-based fields, aligntime allows you to specify how bins should be aligned. 
This parameter is essential for creating consistent time-based bins that align to meaningful boundaries like start of day, hour, etc. - -**Alignment Options:** - -* ``earliest``: Aligns bins to the earliest timestamp in the dataset -* ``latest``: Aligns bins to the latest timestamp in the dataset -* ````: Aligns bins to a specific epoch timestamp (e.g., 1640995200) -* ````: Aligns bins using time modifier expressions (standard-compatible) - -**Time Modifier Expressions:** - -Time modifiers provide a flexible way to align bins to specific time boundaries: - -* ``@d``: Align to start of day (00:00:00) -* ``@d+``: Align to start of day plus offset (e.g., ``@d+3h`` = 03:00:00) -* ``@d-``: Align to start of day minus offset (e.g., ``@d-1h`` = 23:00:00 previous day) - -**Supported Time Spans:** - -**Aligntime applies to:** -* ``us``, ``ms``, ``cs``, ``ds``: Subsecond units (microseconds, milliseconds, centiseconds, deciseconds) -* ``s``, ``sec``, ``secs``, ``seconds``: Seconds -* ``m``, ``min``, ``mins``, ``minutes``: Minutes -* ``h``, ``hr``, ``hrs``, ``hours``: Hours - -**Aligntime ignored for:** -* ``d``, ``days``: Days - automatically aligns to midnight using daily binning algorithm -* ``M``, ``months``: Months - automatically aligns to month start using monthly binning algorithm - -**How Aligntime Works:** - -The aligntime parameter modifies the binning calculation: -* **Without aligntime**: ``floor(timestamp / span) * span`` -* **With aligntime**: ``floor((timestamp - aligntime) / span) * span + aligntime`` -* **With day/month spans**: Aligntime is ignored, natural boundaries used via specialized algorithms - -This ensures that bins are aligned to meaningful time boundaries rather than arbitrary epoch-based intervals. - -bins Parameter --------------- -Automatically calculates the span using a mathematical O(1) algorithm to create human-readable bin widths based on powers of 10. - -**Validation**: The bins parameter must be between 2 and 50000 (inclusive). Values outside this range will result in an error. - -The algorithm uses **mathematical optimization** instead of iteration for O(1) performance: - -1. **Validate bins**: Ensure ``2 ≤ bins ≤ 50000`` -2. **Calculate data range**: ``data_range = max_value - min_value`` -3. **Calculate target width**: ``target_width = data_range / requested_bins`` -4. **Find optimal starting point**: ``exponent = CEIL(LOG10(target_width))`` -5. **Select optimal width**: ``optimal_width = 10^exponent`` -6. **Account for boundaries**: If ``max_value % optimal_width == 0``, add one extra bin -7. **Adjust if needed**: If ``actual_bins > requested_bins``, use ``10^(exponent + 1)`` - -**Mathematical Formula**: -- ``optimal_width = 10^CEIL(LOG10(data_range / requested_bins))`` -- **Boundary condition**: ``actual_bins = CEIL(data_range / optimal_width) + (max_value % optimal_width == 0 ? 1 : 0)`` - -**Example**: For age data with range 20-50 (range=30) and bins=3: -- ``target_width = 30 / 3 = 10`` -- ``exponent = CEIL(LOG10(10)) = CEIL(1.0) = 1`` -- ``optimal_width = 10^1 = 10`` -- ``actual_bins = CEIL(30/10) = 3`` ≤ 3 -- Result: Use width=10, creating bins "20-30", "30-40", "40-50" - -start and end Parameters -------------------------- -Define the range for binning using an effective range expansion algorithm. The key insight is that start/end parameters affect the **width calculation**, not just the binning boundaries. - -**Algorithm:** -1. 
**Calculate effective range**: Only expand, never shrink the data range - - ``effective_min = MIN(start, data_min)`` if start specified - - ``effective_max = MAX(end, data_max)`` if end specified - - ``effective_range = effective_max - effective_min`` - -2. **Apply magnitude-based width calculation** with boundary handling: - - If ``effective_range`` is exactly a power of 10: ``width = 10^(FLOOR(LOG10(effective_range)) - 1)`` - - Otherwise: ``width = 10^FLOOR(LOG10(effective_range))`` - -3. **Create bins** using the calculated width - -**Examples**: + * earliest: Align bins to the earliest timestamp in the data + * latest: Align bins to the latest timestamp in the data + * <time-specifier>: Align bins to a specific epoch time value or time modifier expression +* start: optional. The starting value for binning range. **Default:** minimum field value. +* end: optional. The ending value for binning range. **Default:** maximum field value. -- **end=100000**: effective_range = 100,000 (exact power of 10) - - Width = 10^(5-1) = 10^4 = 10,000 - - Result: 5 bins "0-10000", "10000-20000", ..., "40000-50000" +**Parameter Behavior** -- **end=100001**: effective_range = 100,001 (not exact power of 10) - - Width = 10^FLOOR(LOG10(100,001)) = 10^5 = 100,000 - - Result: Single bin "0-100000" with count 1000 +When multiple parameters are specified, priority order is: span > minspan > bins > start/end > default. -Examples -======== - -Span Parameter Examples -======================= +**Special Behaviors:** +* Logarithmic span (``log10``, ``2log10``, etc.) creates logarithmic bin boundaries instead of linear +* Daily/monthly spans automatically align to calendar boundaries and return date strings (``YYYY-MM-DD`` for daily spans, ``YYYY-MM`` for monthly spans) instead of timestamps +* The aligntime parameter only applies to subsecond, second, minute, and hour spans; it is ignored for day and month spans +* start/end parameters expand the range (never shrink) and affect bin width calculation Example 1: Basic numeric span -============================== +============================= PPL query:: @@ -266,7 +80,7 @@ Example 3: Logarithmic span (log10) -==================================== +=================================== PPL query:: @@ -280,7 +94,7 @@ +------------------+ Example 4: Logarithmic span with coefficient -============================================= +============================================ PPL query:: @@ -294,11 +108,8 @@ | 20000.0-200000.0 | +------------------+ -Bins Parameter Examples -======================= - Example 5: Basic bins parameter -================================ +=============================== PPL query:: @@ -313,7 +124,7 @@ +------------+ Example 6: Low bin count -========================= +======================== PPL query:: @@ -326,7 +137,7 @@ +-------+ Example 7: High bin count -========================== +========================= PPL query:: @@ -340,11 +151,8 @@ | 28-29 | 13 | +-------+----------------+ -Minspan Parameter Examples -========================== - Example 8: Basic minspan -========================= +======================== PPL query:: @@ -359,7 +167,7 @@ +-------+----------------+ Example 9: Large minspan -========================== +======================== PPL query:: @@ -371,11 +179,8 @@ | 0-1000 | +--------+ -Start/End Parameter Examples -============================ - Example 10: Start and end range -================================ +=============================== PPL query:: @@ -388,7 +193,7 @@ +-------+ Example 11: Large end range
-============================ +=========================== PPL query:: @@ -401,7 +206,7 @@ +----------+ Example 12: Span with start/end -================================ +=============================== PPL query:: @@ -416,11 +221,8 @@ | 33-34 | +-------+ -Time-based Examples -=================== - Example 13: Hour span -====================== +===================== PPL query:: @@ -435,7 +237,7 @@ +---------------------+-------+ Example 14: Minute span -======================== +======================= PPL query:: @@ -450,7 +252,7 @@ +---------------------+-------+ Example 15: Second span -======================== +======================= PPL query:: @@ -465,7 +267,7 @@ +---------------------+-------+ Example 16: Daily span -======================= +====================== PPL query:: @@ -479,11 +281,8 @@ | 2025-07-24 00:00:00 | 9187 | +---------------------+-------+ -Aligntime Parameter Examples -============================ - Example 17: Aligntime with time modifier -========================================= +======================================== PPL query:: @@ -498,7 +297,7 @@ +---------------------+-------+ Example 18: Aligntime with epoch timestamp -=========================================== +========================================== PPL query:: @@ -512,11 +311,8 @@ | 2025-07-28 00:40:00 | 9187 | +---------------------+-------+ -Default Binning Example -======================= - Example 19: Default behavior (no parameters) -============================================== +============================================ PPL query:: diff --git a/docs/user/ppl/cmd/dedup.rst b/docs/user/ppl/cmd/dedup.rst index 264d3b3c9b8..bc3e9a48ca5 100644 --- a/docs/user/ppl/cmd/dedup.rst +++ b/docs/user/ppl/cmd/dedup.rst @@ -1,6 +1,6 @@ -============= +===== dedup -============= +===== .. rubric:: Table of contents @@ -10,25 +10,22 @@ dedup Description -============ -| Using ``dedup`` command to remove identical document defined by field from the search result. - +=========== +The ``dedup`` command removes duplicate documents defined by specified fields from the search result. Syntax -============ +====== dedup [int] <field-list> [keepempty=<bool>] [consecutive=<bool>] - -* int: optional. The ``dedup`` command retains multiple events for each combination when you specify . The number for must be greater than 0. If you do not specify a number, only the first occurring event is kept. All other duplicates are removed from the results. **Default:** 1 -* keepempty: optional. if true, keep the document if the any field in the field-list has NULL value or field is MISSING. **Default:** false. +* int: optional. The ``dedup`` command retains multiple events for each combination when you specify <int>. The number for <int> must be greater than 0. All other duplicates are removed from the results. **Default:** 1 +* keepempty: optional. If set to true, keep the document if any field in the field-list has a NULL value or the field is MISSING. **Default:** false. * consecutive: optional. If set to true, removes only events with duplicate combinations of values that are consecutive. **Default:** false. * field-list: mandatory. The comma-delimited field list. At least one field is required. - Example 1: Dedup by one field ============================= -The example show dedup the document with gender field. +This example shows deduplicating documents by the gender field.
PPL query:: @@ -44,7 +41,7 @@ Example 2: Keep 2 duplicates documents ====================================== -The example show dedup the document with gender field keep 2 duplication. +This example shows deduplicating documents by the gender field while keeping 2 duplicates. PPL query:: @@ -59,9 +56,9 @@ +----------------+--------+ Example 3: Keep or Ignore the empty field by default -============================================ +==================================================== -The example show dedup the document by keep null value field. +This example shows deduplicating documents while keeping null values. PPL query:: @@ -77,7 +74,7 @@ +----------------+-----------------------+ -The example show dedup the document by ignore the empty value field. +This example shows deduplicating documents while ignoring null values. PPL query:: @@ -93,9 +90,9 @@ Example 4: Dedup in consecutive document -========================================= +======================================== -The example show dedup the consecutive document. +This example shows deduplicating consecutive documents. PPL query:: @@ -112,4 +109,3 @@ Limitations =========== The ``dedup`` with ``consecutive=true`` command can only work with ``plugins.calcite.enabled=false``. -It means ``dedup`` with ``consecutive=true`` command cannot work together with new PPL commands/functions introduced in 3.0.0 and above. diff --git a/docs/user/ppl/cmd/describe.rst b/docs/user/ppl/cmd/describe.rst index c732480e328..2fbb4003414 100644 --- a/docs/user/ppl/cmd/describe.rst +++ b/docs/user/ppl/cmd/describe.rst @@ -1,6 +1,6 @@ -============= +======== describe -============= +======== .. rubric:: Table of contents @@ -10,24 +10,21 @@ describe Description -============ -| Using ``describe`` command to query metadata of the index. ``describe`` command could be only used as the first command in the PPL query. - +=========== +The ``describe`` command queries metadata of the index. It can only be used as the first command in the PPL query. Syntax -============ -describe .. +====== +describe [dataSource.][schema.]<tablename> * dataSource: optional. If dataSource is not provided, it resolves to opensearch dataSource. -* schema: optional. If schema is not provided, it resolves to default schema. +* schema: optional. If schema is not provided, it resolves to default schema. * tablename: mandatory. describe command must specify which tablename to query from. - - Example 1: Fetch all the metadata ================================= -The example describes accounts index. +This example describes the accounts index. PPL query:: @@ -52,7 +49,7 @@ Example 2: Fetch metadata with condition and filter =================================================== -The example retrieves columns with type long in accounts index. +This example retrieves columns with type bigint in the accounts index. PPL query:: diff --git a/docs/user/ppl/cmd/eval.rst b/docs/user/ppl/cmd/eval.rst index 187f4e3f7cc..bd0cd735af9 100644 --- a/docs/user/ppl/cmd/eval.rst +++ b/docs/user/ppl/cmd/eval.rst @@ -1,6 +1,6 @@ -============= +==== eval -============= +==== .. rubric:: Table of contents @@ -10,21 +10,21 @@ eval Description -============ -| The ``eval`` command evaluate the expression and append the result to the search result. +=========== +The ``eval`` command evaluates the expression and appends the result to the search result. Syntax -============ +====== eval <field>=<expression> ["," <field>=<expression> ]... -* field: mandatory.
If the field name not exist, a new field is added. If the field name already exists, it will be overrided. -* expression: mandatory. Any expression support by the system. +* field: mandatory. If the field name does not exist, a new field is added. If the field name already exists, it will be overridden. +* expression: mandatory. Any expression supported by the system. -Example 1: Create the new field -=============================== +Example 1: Create a new field +============================= -The example show to create new field doubleAge for each document. The new doubleAge is the evaluation result of age multiply by 2. +This example shows creating a new field doubleAge for each document. The new doubleAge field is the result of multiplying age by 2. PPL query:: @@ -40,10 +40,10 @@ PPL query:: +-----+-----------+ -Example 2: Override the existing field -====================================== +Example 2: Override an existing field +===================================== -The example show to override the exist age field with age plus 1. +This example shows overriding the existing age field by adding 1 to it. PPL query:: @@ -58,10 +58,10 @@ PPL query:: | 34 | +-----+ -Example 3: Create the new field with field defined in eval -========================================================== +Example 3: Create a new field with field defined in eval +======================================================== -The example show to create a new field ddAge with field defined in eval command. The new field ddAge is the evaluation result of doubleAge multiply by 2, the doubleAge is defined in the eval command. +This example shows creating a new field ddAge using a field defined in the same eval command. The new field ddAge is the result of multiplying doubleAge by 2, where doubleAge is defined in the same eval command. PPL query:: @@ -76,12 +76,12 @@ PPL query:: | 33 | 66 | 132 | +-----+-----------+-------+ -Example 4: String concatenation with + operator(need to enable calcite) -=============================================== +Example 4: String concatenation +=============================== -The example shows how to use the + operator for string concatenation in eval command. You can concatenate string literals and field values. +This example shows using the + operator for string concatenation. You can concatenate string literals and field values. -PPL query example 1 - Concatenating a literal with a field:: +PPL query:: source=accounts | eval greeting = 'Hello ' + firstname | fields firstname, greeting @@ -96,7 +96,12 @@ Expected result:: | Dale | Hello Dale | +---------------+---------------------+ -PPL query example 2 - Multiple concatenations with type casting:: +Example 5: Multiple string concatenation with type casting +========================================================== + +This example shows multiple concatenations with type casting from numeric to string. + +PPL query:: source=accounts | eval full_info = 'Name: ' + firstname + ', Age: ' + CAST(age AS STRING) | fields firstname, age, full_info diff --git a/docs/user/ppl/cmd/eventstats.rst b/docs/user/ppl/cmd/eventstats.rst index 958b28e606b..d51ba852303 100644 --- a/docs/user/ppl/cmd/eventstats.rst +++ b/docs/user/ppl/cmd/eventstats.rst @@ -1,6 +1,6 @@ -============= +========== eventstats -============= +========== .. rubric:: Table of contents @@ -10,10 +10,8 @@ eventstats Description -============ -| (Experimental) -| (From 3.1.0) -| Using ``eventstats`` command to enriches your event data with calculated summary statistics. 
It operates by analyzing specified fields within your events, computing various statistical measures, and then appending these results as new fields to each original event. +=========== +| The ``eventstats`` command enriches your event data with calculated summary statistics. It operates by analyzing specified fields within your events, computing various statistical measures, and then appending these results as new fields to each original event. | Key aspects of `eventstats`: @@ -35,392 +33,43 @@ The ``stats`` and ``eventstats`` commands are both used for calculating statisti * ``eventstats``: Useful when you need to enrich events with statistical context for further analysis or filtering. Can be used mid-search to add statistics that can be used in subsequent commands. -Version -======= -3.1.0 - - Syntax ====== eventstats <function>... [by-clause] - -* function: mandatory. A aggregation function or window function. - -* by-clause: optional. - - * Syntax: by [span-expression,] [field,]... - * Description: The by clause could be the fields and expressions like scalar functions and aggregation functions. Besides, the span clause can be used to split specific field into buckets in the same interval, the stats then does the aggregation by these span buckets. - * Default: If no is specified, the stats command returns only one row, which is the aggregation over the entire result set. - -* span-expression: optional, at most one. - - * Syntax: span(field_expr, interval_expr) - * Description: The unit of the interval expression is the natural unit by default. If the field is a date and time type field, and the interval is in date/time units, you will need to specify the unit in the interval expression. For example, to split the field ``age`` into buckets by 10 years, it looks like ``span(age, 10)``. And here is another example of time span, the span to split a ``timestamp`` field into hourly intervals, it looks like ``span(timestamp, 1h)``. - -* Available time unit: -+----------------------------+ -| Span Interval Units | -+============================+ -| millisecond (ms) | -+----------------------------+ -| second (s) | -+----------------------------+ -| minute (m, case sensitive) | -+----------------------------+ -| hour (h) | -+----------------------------+ -| day (d) | -+----------------------------+ -| week (w) | -+----------------------------+ -| month (M, case sensitive) | -+----------------------------+ -| quarter (q) | -+----------------------------+ -| year (y) | -+----------------------------+ +* function: mandatory. An aggregation function or window function. +* by-clause: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... **Default:** aggregation over the entire result set. +* span-expression: optional, at most one. Splits field into buckets by intervals. Syntax: span(field_expr, interval_expr). For example, ``span(age, 10)`` creates 10-year age buckets, ``span(timestamp, 1h)`` creates hourly buckets. + * Available time units: + * millisecond (ms) + * second (s) + * minute (m, case sensitive) + * hour (h) + * day (d) + * week (w) + * month (M, case sensitive) + * quarter (q) + * year (y) Aggregation Functions ===================== -COUNT ------ - -Description ->>>>>>>>>>> - -Usage: Returns a count of the number of expr in the rows retrieved by a SELECT statement.
- -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats count() | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+---------+ - | account_number | gender | age | count() | - |----------------+--------+-----+---------| - | 1 | M | 32 | 4 | - | 6 | M | 36 | 4 | - | 13 | F | 28 | 4 | - | 18 | M | 33 | 4 | - +----------------+--------+-----+---------+ - -SUM ---- - -Description ->>>>>>>>>>> - -Usage: SUM(expr). Returns the sum of expr. - -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats sum(age) by gender | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+----------+ - | account_number | gender | age | sum(age) | - |----------------+--------+-----+----------| - | 1 | M | 32 | 101 | - | 6 | M | 36 | 101 | - | 13 | F | 28 | 28 | - | 18 | M | 33 | 101 | - +----------------+--------+-----+----------+ - -AVG ---- - -Description ->>>>>>>>>>> - -Usage: AVG(expr). Returns the average value of expr. - -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats avg(age) by gender | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+--------------------+ - | account_number | gender | age | avg(age) | - |----------------+--------+-----+--------------------| - | 1 | M | 32 | 33.666666666666664 | - | 6 | M | 36 | 33.666666666666664 | - | 13 | F | 28 | 28.0 | - | 18 | M | 33 | 33.666666666666664 | - +----------------+--------+-----+--------------------+ - -MAX ---- - -Description ->>>>>>>>>>> - -Usage: MAX(expr). Returns the maximum value of expr. - -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats max(age) | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+----------+ - | account_number | gender | age | max(age) | - |----------------+--------+-----+----------| - | 1 | M | 32 | 36 | - | 6 | M | 36 | 36 | - | 13 | F | 28 | 36 | - | 18 | M | 33 | 36 | - +----------------+--------+-----+----------+ - -MIN ---- - -Description ->>>>>>>>>>> - -Usage: MIN(expr). Returns the minimum value of expr. - -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats min(age) by gender | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+----------+ - | account_number | gender | age | min(age) | - |----------------+--------+-----+----------| - | 1 | M | 32 | 32 | - | 6 | M | 36 | 32 | - | 13 | F | 28 | 28 | - | 18 | M | 33 | 32 | - +----------------+--------+-----+----------+ - - -VAR_SAMP --------- - -Description ->>>>>>>>>>> - -Usage: VAR_SAMP(expr). Returns the sample variance of expr. - -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats var_samp(age) | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+--------------------+ - | account_number | gender | age | var_samp(age) | - |----------------+--------+-----+--------------------| - | 1 | M | 32 | 10.916666666666666 | - | 6 | M | 36 | 10.916666666666666 | - | 13 | F | 28 | 10.916666666666666 | - | 18 | M | 33 | 10.916666666666666 | - +----------------+--------+-----+--------------------+ - - -VAR_POP -------- - -Description ->>>>>>>>>>> - -Usage: VAR_POP(expr). Returns the population standard variance of expr. 
- -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats var_pop(age) | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+--------------+ - | account_number | gender | age | var_pop(age) | - |----------------+--------+-----+--------------| - | 1 | M | 32 | 8.1875 | - | 6 | M | 36 | 8.1875 | - | 13 | F | 28 | 8.1875 | - | 18 | M | 33 | 8.1875 | - +----------------+--------+-----+--------------+ - -STDDEV_SAMP ------------ - -Description ->>>>>>>>>>> - -Usage: STDDEV_SAMP(expr). Return the sample standard deviation of expr. - -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats stddev_samp(age) | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+-------------------+ - | account_number | gender | age | stddev_samp(age) | - |----------------+--------+-----+-------------------| - | 1 | M | 32 | 3.304037933599835 | - | 6 | M | 36 | 3.304037933599835 | - | 13 | F | 28 | 3.304037933599835 | - | 18 | M | 33 | 3.304037933599835 | - +----------------+--------+-----+-------------------+ - - -STDDEV_POP ----------- - -Description ->>>>>>>>>>> - -Usage: STDDEV_POP(expr). Return the population standard deviation of expr. - -Example:: - - os> source=accounts | fields account_number, gender, age | eventstats stddev_pop(age) | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-----+--------------------+ - | account_number | gender | age | stddev_pop(age) | - |----------------+--------+-----+--------------------| - | 1 | M | 32 | 2.8613807855648994 | - | 6 | M | 36 | 2.8613807855648994 | - | 13 | F | 28 | 2.8613807855648994 | - | 18 | M | 33 | 2.8613807855648994 | - +----------------+--------+-----+--------------------+ - - -DISTINCT_COUNT, DC(Since 3.3) ------------------- - -Description ->>>>>>>>>>> - -Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values using the HyperLogLog++ algorithm. Both functions are equivalent. - -For details on algorithm accuracy and precision control, see the `OpenSearch Cardinality Aggregation documentation `_. - - -Example:: - - os> source=accounts | fields account_number, gender, state, age | eventstats dc(state) as distinct_states, distinct_count(state) as dc_states_alt by gender | sort account_number; - fetched rows / total rows = 4/4 - +----------------+--------+-------+-----+-----------------+---------------+ - | account_number | gender | state | age | distinct_states | dc_states_alt | - |----------------+--------+-------+-----+-----------------+---------------| - | 1 | M | IL | 32 | 3 | 3 | - | 6 | M | TN | 36 | 3 | 3 | - | 13 | F | VA | 28 | 1 | 1 | - | 18 | M | MD | 33 | 3 | 3 | - +----------------+--------+-------+-----+-----------------+---------------+ - -EARLIEST (Since 3.3) ---------------------- - -Description ->>>>>>>>>>> - -Usage: EARLIEST(field [, time_field]). Return the earliest value of a field based on timestamp ordering. This function enriches each event with the earliest value found within the specified grouping. - -* field: mandatory. The field to return the earliest value for. -* time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified. - -Note: This function requires Calcite to be enabled (see `Configuration`_ section above). 
- -Example:: - - os> source=events | fields @timestamp, host, message | eventstats earliest(message) by host | sort @timestamp; - fetched rows / total rows = 8/8 - +---------------------+---------+----------------------+-------------------+ - | @timestamp | host | message | earliest(message) | - |---------------------+---------+----------------------+-------------------| - | 2023-01-01 10:00:00 | server1 | Starting up | Starting up | - | 2023-01-01 10:05:00 | server2 | Initializing | Initializing | - | 2023-01-01 10:10:00 | server1 | Ready to serve | Starting up | - | 2023-01-01 10:15:00 | server2 | Ready | Initializing | - | 2023-01-01 10:20:00 | server1 | Processing requests | Starting up | - | 2023-01-01 10:25:00 | server2 | Handling connections | Initializing | - | 2023-01-01 10:30:00 | server1 | Shutting down | Starting up | - | 2023-01-01 10:35:00 | server2 | Maintenance mode | Initializing | - +---------------------+---------+----------------------+-------------------+ - -Example with custom time field:: - - os> source=events | fields event_time, status, category | eventstats earliest(status, event_time) by category | sort event_time; - fetched rows / total rows = 8/8 - +---------------------+------------+----------+------------------------------+ - | event_time | status | category | earliest(status, event_time) | - |---------------------+------------+----------+------------------------------| - | 2023-01-01 09:55:00 | pending | orders | pending | - | 2023-01-01 10:00:00 | active | users | active | - | 2023-01-01 10:05:00 | processing | orders | pending | - | 2023-01-01 10:10:00 | inactive | users | active | - | 2023-01-01 10:15:00 | completed | orders | pending | - | 2023-01-01 10:20:00 | pending | users | active | - | 2023-01-01 10:25:00 | cancelled | orders | pending | - | 2023-01-01 10:30:00 | inactive | users | active | - +---------------------+------------+----------+------------------------------+ - - -LATEST (Since 3.3) -------------------- - -Description ->>>>>>>>>>> - -Usage: LATEST(field [, time_field]). Return the latest value of a field based on timestamp ordering. This function enriches each event with the latest value found within the specified grouping. - -* field: mandatory. The field to return the latest value for. -* time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified. - -Note: This function requires Calcite to be enabled (see `Configuration`_ section above). 
- -Example:: - - os> source=events | fields @timestamp, host, message | eventstats latest(message) by host | sort @timestamp; - fetched rows / total rows = 8/8 - +---------------------+---------+----------------------+------------------+ - | @timestamp | host | message | latest(message) | - |---------------------+---------+----------------------+------------------| - | 2023-01-01 10:00:00 | server1 | Starting up | Shutting down | - | 2023-01-01 10:05:00 | server2 | Initializing | Maintenance mode | - | 2023-01-01 10:10:00 | server1 | Ready to serve | Shutting down | - | 2023-01-01 10:15:00 | server2 | Ready | Maintenance mode | - | 2023-01-01 10:20:00 | server1 | Processing requests | Shutting down | - | 2023-01-01 10:25:00 | server2 | Handling connections | Maintenance mode | - | 2023-01-01 10:30:00 | server1 | Shutting down | Shutting down | - | 2023-01-01 10:35:00 | server2 | Maintenance mode | Maintenance mode | - +---------------------+---------+----------------------+------------------+ - -Example with custom time field:: - - os> source=events | fields event_time, status message, category | eventstats latest(status, event_time) by category | sort event_time; - fetched rows / total rows = 8/8 - +---------------------+------------+----------------------+----------+----------------------------+ - | event_time | status | message | category | latest(status, event_time) | - |---------------------+------------+----------------------+----------+----------------------------| - | 2023-01-01 09:55:00 | pending | Starting up | orders | cancelled | - | 2023-01-01 10:00:00 | active | Initializing | users | inactive | - | 2023-01-01 10:05:00 | processing | Ready to serve | orders | cancelled | - | 2023-01-01 10:10:00 | inactive | Ready | users | inactive | - | 2023-01-01 10:15:00 | completed | Processing requests | orders | cancelled | - | 2023-01-01 10:20:00 | pending | Handling connections | users | inactive | - | 2023-01-01 10:25:00 | cancelled | Shutting down | orders | cancelled | - | 2023-01-01 10:30:00 | inactive | Maintenance mode | users | inactive | - +---------------------+------------+----------------------+----------+----------------------------+ - - -Configuration -============= -This command requires Calcite enabled. - -Enable Calcite:: - >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{ - "transient" : { - "plugins.calcite.enabled" : true - } - }' +The eventstats command supports the following aggregation functions: -Result set:: +* COUNT: Count of values +* SUM: Sum of numeric values +* AVG: Average of numeric values +* MAX: Maximum value +* MIN: Minimum value +* VAR_SAMP: Sample variance +* VAR_POP: Population variance +* STDDEV_SAMP: Sample standard deviation +* STDDEV_POP: Population standard deviation +* DISTINCT_COUNT/DC: Distinct count of values +* EARLIEST: Earliest value by timestamp +* LATEST: Latest value by timestamp - { - "acknowledged": true, - "persistent": { - "plugins": { - "calcite": { - "enabled": "true" - } - } - }, - "transient": {} - } +For detailed documentation of each function, see `Aggregation Functions <../functions/aggregation.rst>`_. Usage ===== @@ -436,9 +85,9 @@ Eventstats:: Example 1: Calculate the average, sum and count of a field by group -================================================================== +=================================================================== -The example show calculate the average age, sum age and count of events of all the accounts group by gender. 
+This example shows calculating the average age, sum of age, and count of events for all accounts grouped by gender. PPL query:: @@ -456,7 +105,7 @@ Example 2: Calculate the count by a gender and span =================================================== -The example gets the count of age by the interval of 10 years and group by gender. +This example shows counting events by age intervals of 5 years, grouped by gender. PPL query:: diff --git a/docs/user/ppl/cmd/expand.rst b/docs/user/ppl/cmd/expand.rst index 77061385478..c8065a2da0f 100644 --- a/docs/user/ppl/cmd/expand.rst +++ b/docs/user/ppl/cmd/expand.rst @@ -1,6 +1,6 @@ -============= +====== expand -============= +====== .. rubric:: Table of contents @@ -10,38 +10,27 @@ expand Description -============ -| (Experimental) - -Use the ``expand`` command on a nested array field to transform a single -document into multiple documents—each containing one element from the array. -All other fields in the original document are duplicated across the resulting -documents. +=========== +| The ``expand`` command transforms a single document with a nested array field into multiple documents—each containing one element from the array. All other fields in the original document are duplicated across the resulting documents. -The expand command generates one row per element in the specified array field: +| Key aspects of ``expand``: +* It generates one row per element in the specified array field. * The specified array field is converted into individual rows. -* If an alias is provided, the expanded values appear under the alias instead - of the original field name. -* If the specified field is an empty array, the row is retained with the - expanded field set to null. - -Version -======= -Since 3.1.0 +* If an alias is provided, the expanded values appear under the alias instead of the original field name. +* If the specified field is an empty array, the row is retained with the expanded field set to null. Syntax ====== expand <field> [as alias] -* field: The field to be expanded (exploded). Currently only nested arrays are - supported. -* alias: (Optional) The name to use instead of the original field name. +* field: mandatory. The field to be expanded (exploded). Currently only nested arrays are supported. +* alias: optional. The name to use instead of the original field name. -Example: expand address field with an alias -=========================================== +Example 1: Expand address field with an alias +============================================= Given a dataset ``migration`` with the following data: @@ -65,19 +54,8 @@ PPL query:: +-------+-----+-------------------------------------------------------------------------------------------+ Limitations -============ +=========== * The ``expand`` command currently only supports nested arrays. Primitive fields storing arrays are not supported. E.g. a string field storing an array of strings cannot be expanded with the current implementation. -* The command works only with Calcite enabled. This can be set with the - following command: - - .. code-block:: - - PUT /_cluster/settings - { - "persistent":{ - "plugins.calcite.enabled": true - } - } diff --git a/docs/user/ppl/cmd/explain.rst b/docs/user/ppl/cmd/explain.rst index c06025022e6..dbb810dd814 100644 --- a/docs/user/ppl/cmd/explain.rst +++ b/docs/user/ppl/cmd/explain.rst @@ -1,6 +1,6 @@ -============= +======= explain -============= +======= ..
 
@@ -10,25 +10,24 @@ explain
 
 Description
-============
-| Using ``explain`` command to explain the plan of query which is used very often for query translation and troubleshooting. ``explain`` command could be only used as the first command in the PPL query.
-
+===========
+The ``explain`` command shows the execution plan of a query and is often used for query translation and troubleshooting. The ``explain`` command can only be used as the first command in a PPL query.
 
 Syntax
-============
+======
 explain [mode] queryStatement
 
-* mode: optional. There are 4 explain modes: "simple", "standard", "cost", "extended". If mode is not provided, "standard" will be set by default.
-    * standard: The default mode. Display logical and physical plan with pushdown information (DSL).
-    * simple: Display the logical plan tree without attributes. Only works with Calcite.
-    * cost: Display the standard information plus plan cost attributes. Only works with Calcite.
-    * extended: Display the standard information plus generated code. Only works with Calcite.
+* mode: optional. There are 4 explain modes: "simple", "standard", "cost", "extended". **Default:** standard.
+    * standard: The default mode. Display logical and physical plan with pushdown information (DSL).
+    * simple: Display the logical plan tree without attributes. Only works with Calcite.
+    * cost: Display the standard information plus plan cost attributes. Only works with Calcite.
+    * extended: Display the standard information plus generated code. Only works with Calcite.
 * queryStatement: mandatory. A PPL query to explain.
 
 Example 1: Explain a PPL query in v2 engine
-==============================
+===========================================
 
 When Calcite is disabled (plugins.calcite.enabled=false), explaining a PPL query will get its physical plan of the v2 engine and pushdown information.
 
 PPL query::
 
@@ -56,7 +55,7 @@ Explain::
     }
 
 Example 2: Explain a PPL query in v3 engine
-===================================================
+===========================================
 
 When Calcite is enabled (plugins.calcite.enabled=true), explaining a PPL query will get its logical and physical plan of the v3 engine and pushdown information.
 
@@ -81,9 +80,9 @@ Explain::
 
 Example 3: Explain a PPL query with simple mode
-=========================================================
+===============================================
 
-When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query will the "simple" mode.
+When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query with the "simple" mode.
 
 PPL query::
 
@@ -102,9 +101,9 @@ Explain::
     }
 
 Example 4: Explain a PPL query with cost mode
-=========================================================
+=============================================
 
-When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query will the "cost" mode.
+When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query with the "cost" mode.
 
 PPL query::
 
@@ -126,9 +125,7 @@ Explain::
     }
 
 Example 5: Explain a PPL query with extended mode
-=========================================================
-
-When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query will the "extended" mode.
+=================================================
+
+When Calcite is enabled (plugins.calcite.enabled=true), you can explain a PPL query with the "extended" mode.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/fields.rst b/docs/user/ppl/cmd/fields.rst
index 18aeaff4d06..81ccff71b80 100644
--- a/docs/user/ppl/cmd/fields.rst
+++ b/docs/user/ppl/cmd/fields.rst
@@ -1,6 +1,6 @@
-=============
+======
 fields
-=============
+======
 
 .. rubric:: Table of contents
 
@@ -10,26 +10,20 @@ fields
 
 Description
-============
-Using ``field`` command to keep or remove fields from the search result.
-
-Enhanced field features are available when the Calcite engine is enabled with 3.3+ version. When Calcite is disabled, only basic comma-delimited field selection is supported.
+===========
+The ``fields`` command keeps or removes fields from the search result.
 
 Syntax
-============
-field [+|-] <field-list>
-
-* index: optional. if the plus (+) is used, only the fields specified in the field list will be keep. if the minus (-) is used, all the fields specified in the field list will be removed. **Default** +
-* field list: mandatory. comma-delimited keep or remove fields.
-
+======
+fields [+|-] <field-list>
 
-Basic Examples
-==============
+* +|-: optional. If the plus (+) is used, only the fields specified in the field list will be kept. If the minus (-) is used, all the fields specified in the field list will be removed. **Default:** +.
+* field-list: mandatory. Comma-delimited or space-delimited list of fields to keep or remove. Supports wildcard patterns.
 
 Example 1: Select specified fields from result
-----------------------------------------------
+==============================================
 
-The example show fetch account_number, firstname and lastname fields from search results.
+This example shows selecting account_number, firstname and lastname fields from search results.
 
 PPL query::
 
@@ -45,9 +39,9 @@ PPL query::
 +----------------+-----------+----------+
 
 Example 2: Remove specified fields from result
-----------------------------------------------
+==============================================
 
-The example show fetch remove account_number field from search results.
+This example shows removing the account_number field from search results.
 
 PPL query::
 
@@ -62,13 +56,8 @@ PPL query::
 | Dale      | Adams    |
 +-----------+----------+
 
-Enhanced Features (Version 3.3.0)
-===========================================
-
-All features in this section require the Calcite engine to be enabled. When Calcite is disabled, only basic comma-delimited field selection is supported.
-
 Example 3: Space-delimited field selection
-------------------------------------------
+==========================================
 
 Fields can be specified using spaces instead of commas, providing a more concise syntax.
 
@@ -88,7 +77,7 @@ PPL query::
 +-----------+----------+-----+
 
 Example 4: Prefix wildcard pattern
-----------------------------------
+==================================
 
 Select fields starting with a pattern using prefix wildcards.
 
@@ -106,7 +95,7 @@ PPL query::
 +----------------+
 
 Example 5: Suffix wildcard pattern
-----------------------------------
+==================================
 
 Select fields ending with a pattern using suffix wildcards.
 
@@ -124,7 +113,7 @@ PPL query::
 +-----------+----------+
 
 Example 6: Contains wildcard pattern
------------------------------------
+====================================
 
 Select fields containing a pattern using contains wildcards.
@@ -139,7 +128,7 @@ PPL query::
 +----------------+-----------+-----------------+---------+-------+-----+----------------------+----------+
 
 Example 7: Mixed delimiter syntax
----------------------------------
+=================================
 
 Combine spaces and commas for flexible field specification.
 
@@ -157,7 +146,7 @@ PPL query::
 +-----------+----------------+----------+
 
 Example 8: Field deduplication
------------------------------
+==============================
 
 Automatically prevents duplicate columns when wildcards expand to already specified fields.
 
@@ -177,7 +166,7 @@ PPL query::
 
 Note: Even though ``firstname`` is explicitly specified and would also match ``*name``, it appears only once due to automatic deduplication.
 
 Example 9: Full wildcard selection
-----------------------------------
+==================================
 
 Select all available fields using ``*`` or ```*```. This selects all fields defined in the index schema, including fields that may contain null values.
 
@@ -194,7 +183,7 @@ PPL query::
 
 Note: The ``*`` wildcard selects fields based on the index schema, not on data content. Fields with null values are included in the result set. Use backticks ```*``` if the plain ``*`` doesn't return all expected fields.
 
 Example 10: Wildcard exclusion
------------------------------
+==============================
 
 Remove fields using wildcard patterns with the minus (-) operator.
 
@@ -211,11 +200,6 @@ PPL query::
 | 18             | 467 Hutchinson Court | 4180    | M      | Orick  | null     | MD    | 33  | daleadams@boink.com   |
 +----------------+----------------------+---------+--------+--------+----------+-------+-----+-----------------------+
 
-Requirements
-============
-- **Calcite Engine**: All enhanced features require the Calcite engine to be enabled
-- **Backward Compatibility**: Basic comma-delimited syntax continues to work when Calcite is disabled
-- **Error Handling**: Attempting to use enhanced features without Calcite will result in an ``UnsupportedOperationException``
 
 See Also
 ========
diff --git a/docs/user/ppl/cmd/fillnull.rst b/docs/user/ppl/cmd/fillnull.rst
index 483755f723f..4bf1c882b5c 100644
--- a/docs/user/ppl/cmd/fillnull.rst
+++ b/docs/user/ppl/cmd/fillnull.rst
@@ -1,6 +1,6 @@
-=============
+========
 fillnull
-=============
+========
 
 .. rubric:: Table of contents
 
@@ -10,39 +10,30 @@ fillnull
 
 Description
-============
-Using ``fillnull`` command to fill null with provided value in one or more fields in the search result.
+===========
+| The ``fillnull`` command fills null values with the provided value in one or more fields in the search result.
 
 Syntax
-============
+======
 
-fillnull with <replacement> [in <field-list>]
+| fillnull with <replacement> [in <field-list>]
+| fillnull using <field> = <replacement> [, <field> = <replacement>]
+| fillnull value=<replacement> [<field-list>]
 
-fillnull using <field> = <replacement> [, <field> = <replacement>]
+* replacement: mandatory. The value used to replace null values.
+* field-list: optional. List of fields to apply the replacement to. Can be comma-delimited (with ``with`` or ``using`` syntax) or space-delimited (with ``value=`` syntax). **Default:** all fields.
+* field: mandatory when using ``using`` syntax. Individual field name to assign a specific replacement value.
 
-fillnull value=<replacement> [<field-list>]
-
-
-Parameters
-============
-
-* replacement: Mandatory. The value used to replace `null`s.
-
-* field-list: Optional. Comma-delimited (when using ``with`` or ``using``) or space-delimited (when using ``value=``) list of fields. The `null` values in the field will be replaced with the values from the replacement. **Default:** If no field specified, the replacement is applied to all fields.
-
-**Syntax Variations:**
-
-* ``with <replacement> in <field-list>`` - Apply same value to specified fields
-* ``using <field>=<replacement>, ...`` - Apply different values to different fields
-* ``value=<replacement> [<field-list>]`` - Alternative syntax with optional space-delimited field list
-
-
-Examples
-============
+* **Syntax variations:**
+    * ``with <replacement> in <field-list>`` - Apply the same value to the specified fields
+    * ``using <field>=<replacement>, ...`` - Apply different values to different fields
+    * ``value=<replacement> [<field-list>]`` - Alternative syntax with an optional space-delimited field list
 
 Example 1: Replace null values with a specified value on one field
-------------------------------------------------------------------
+==================================================================
+
+This example shows replacing null values in the email field with '<not found>'.
 
 PPL query::
 
@@ -58,7 +49,9 @@ PPL query::
 +-----------------------+----------+
 
 Example 2: Replace null values with a specified value on multiple fields
-------------------------------------------------------------------------
+========================================================================
+
+This example shows replacing null values in both email and employer fields with the same replacement value '<not found>'.
 
 PPL query::
 
@@ -74,7 +67,9 @@ PPL query::
 +-----------------------+-------------+
 
 Example 3: Replace null values with a specified value on all fields
--------------------------------------------------------------------
+===================================================================
+
+This example shows replacing null values in all fields when no field list is specified.
 
 PPL query::
 
@@ -90,7 +85,9 @@ PPL query::
 +-----------------------+-------------+
 
 Example 4: Replace null values with multiple specified values on multiple fields
---------------------------------------------------------------------------------
+================================================================================
+
+This example shows using different replacement values for different fields using the ``using`` syntax.
 
 PPL query::
 
@@ -107,7 +104,9 @@ PPL query::
 
 Example 5: Replace null with specified value on specific fields (value= syntax)
-------------------------------------------------------------------------------
+===============================================================================
+
+This example shows using the alternative ``value=`` syntax to replace null values in specific fields.
 
 PPL query::
 
@@ -123,7 +122,7 @@ PPL query::
 +-----------------------+-------------+
 
 Example 6: Replace null with specified value on all fields (value= syntax)
---------------------------------------------------------------------------
+==========================================================================
 
 When no field list is specified, the replacement applies to all fields in the result.
 
@@ -141,7 +140,7 @@ PPL query::
 +-----------------------+-------------+
 
 Limitations
-============
+===========
 * The ``fillnull`` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node.
 * When applying the same value to all fields without specifying field names, all fields must be the same type. For mixed types, use separate fillnull commands or explicitly specify fields.
 * The replacement value type must match ALL field types in the field list. When applying the same value to multiple fields, all fields must be the same type (all strings or all numeric).
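+
+As a minimal sketch of the mixed-type workaround (reusing the email and balance fields from the examples above), chain one ``fillnull`` per type::
+
+    source=accounts | fields email, balance | fillnull with '<not found>' in email | fillnull with 0 in balance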
diff --git a/docs/user/ppl/cmd/flatten.rst b/docs/user/ppl/cmd/flatten.rst
index 3c1780531f1..e366fe32daa 100644
--- a/docs/user/ppl/cmd/flatten.rst
+++ b/docs/user/ppl/cmd/flatten.rst
@@ -1,6 +1,6 @@
-=============
+=======
 flatten
-=============
+=======
 
 .. rubric:: Table of contents
 
@@ -10,42 +10,25 @@ flatten
 
 Description
 ===========
+| The ``flatten`` command flattens a struct or an object field into separate fields in a document.
 
-Use ``flatten`` command to flatten a struct or an object field into separate
-fields in a document.
-
-The flattened fields will be ordered **lexicographically** by their original
-key names in the struct. I.e. if the struct has keys ``b``, ``c`` and ``Z``,
-the flattened fields will be ordered as ``Z``, ``b``, ``c``.
+| The flattened fields will be ordered **lexicographically** by their original key names in the struct. For example, if the struct has keys ``b``, ``c`` and ``Z``, the flattened fields will be ordered as ``Z``, ``b``, ``c``.
 
-Note that ``flatten`` should not be applied to arrays. Please use ``expand``
-command to expand an array field into multiple rows instead. However, since
-an array can be stored in a non-array field in OpenSearch, when expanding a
-field storing a nested array, only the first element of the array will be
-flattened.
-
-Version
-=======
-3.1.0
+| Note that ``flatten`` should not be applied to arrays. Use the ``expand`` command to expand an array field into multiple rows instead. However, since an array can be stored in a non-array field in OpenSearch, when flattening a field storing a nested array, only the first element of the array will be flattened.
 
 Syntax
 ======
 flatten <field> [as (<alias-list>)]
 
-* field: The field to be flattened. Only object and nested fields are
-  supported.
-* alias-list: (Optional) The names to use instead of the original key names.
-  Names are separated by commas. It is advised to put the alias-list in
-  parentheses if there is more than one alias. E.g. both
-  ``country, state, city`` and ``(country, state, city)`` are supported,
-  but the latter is advised. Its length must match the number of keys in the
-  struct field. Please note that the provided alias names **must** follow
-  the lexicographical order of the corresponding original keys in the struct.
+* field: mandatory. The field to be flattened. Only object and nested fields are supported.
+* alias-list: optional. The names to use instead of the original key names. Names are separated by commas. It is advised to put the alias-list in parentheses if there is more than one alias. The length must match the number of keys in the struct field. The provided alias names **must** follow the lexicographical order of the corresponding original keys in the struct.
 
 Example: flatten an object field with aliases
 =============================================
 
+This example shows flattening a message object field and using aliases to rename the flattened fields.
+
 Given the following index ``my-index``
 
 .. code-block::
 
@@ -116,15 +99,3 @@ Limitations
   invisible. As an alternative, you can change to
   ``source=my-index | flatten message``.
-
-* The command works only with Calcite enabled. This can be set with the
-  following command:
-
-  .. code-block::
-
-     PUT /_cluster/settings
-     {
-       "persistent":{
-         "plugins.calcite.enabled": true
-       }
-     }
diff --git a/docs/user/ppl/cmd/grok.rst b/docs/user/ppl/cmd/grok.rst
index 35f3b0c8461..836d01b6a89 100644
--- a/docs/user/ppl/cmd/grok.rst
+++ b/docs/user/ppl/cmd/grok.rst
@@ -1,6 +1,6 @@
-=============
+====
 grok
-=============
+====
 
 .. rubric:: Table of contents
 
@@ -10,26 +10,20 @@ grok
 
 Description
-============
-| The ``grok`` command parses a text field with a grok pattern and appends the results to the search result.
-
+===========
+The ``grok`` command parses a text field with a grok pattern and appends the results to the search result.
 
 Syntax
-============
+======
 grok <field> <pattern>
 
 * field: mandatory. The field must be a text field.
-* pattern: mandatory string. The grok pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field.
-
-Grok Pattern
-============
-
-The grok pattern is used to match the text field of each document to extract new fields.
+* pattern: mandatory. The grok pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field.
 
 Example 1: Create the new field
 ===============================
 
-The example shows how to create new field ``host`` for each document. ``host`` will be the host name after ``@`` in ``email`` field. Parsing a null field will return an empty string.
+This example shows how to create a new field ``host`` for each document. ``host`` will be the host name after ``@`` in the ``email`` field. Parsing a null field will return an empty string.
 
 PPL query::
 
@@ -48,7 +42,7 @@ PPL query::
 Example 2: Override the existing field
 ======================================
 
-The example shows how to override the existing ``address`` field with street number removed.
+This example shows how to override the existing ``address`` field with the street number removed.
 
 PPL query::
 
@@ -66,7 +60,7 @@ PPL query::
 Example 3: Using grok to parse logs
 ===================================
 
-The example shows how to use grok to parse raw logs.
+This example shows how to use grok to parse raw logs.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/head.rst b/docs/user/ppl/cmd/head.rst
index c13a495a77f..a17f283026d 100644
--- a/docs/user/ppl/cmd/head.rst
+++ b/docs/user/ppl/cmd/head.rst
@@ -1,6 +1,6 @@
-=============
+====
 head
-=============
+====
 
 .. rubric:: Table of contents
 
@@ -10,21 +10,20 @@ head
 
 Description
-============
-| The ``head`` command returns the first N number of specified results after an optional offset in search order.
-
+===========
+The ``head`` command returns the first N results after an optional offset, in search order.
 
 Syntax
-============
+======
 head [<size>] [from <offset>]
 
-* <size>: optional integer. number of results to return. **Default:** 10
-* <offset>: integer after optional ``from``. number of results to skip. **Default:** 0
+* size: optional integer. Number of results to return. **Default:** 10
+* offset: optional integer after ``from``. Number of results to skip. **Default:** 0
 
 Example 1: Get first 10 results
-===========================================
+===============================
 
-The example show maximum 10 results from accounts index.
+This example shows getting a maximum of 10 results from the accounts index.
 
 PPL query::
 
@@ -40,9 +39,9 @@ PPL query::
 +-----------+-----+
 
 Example 2: Get first N results
-===========================================
+==============================
 
-The example show first N results from accounts index.
+This example shows getting the first 3 results from the accounts index.
 
 PPL query::
 
@@ -59,7 +58,7 @@ PPL query::
 
 Example 3: Get first N results after offset M
 =============================================
 
-The example show first N results after offset M from accounts index.
+This example shows getting the first 3 results after offset 1 from the accounts index.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/join.rst b/docs/user/ppl/cmd/join.rst
index 3b986071261..61dfc31042d 100644
--- a/docs/user/ppl/cmd/join.rst
+++ b/docs/user/ppl/cmd/join.rst
@@ -1,6 +1,6 @@
-=============
+====
 join
-=============
+====
 
 .. rubric:: Table of contents
 
@@ -11,63 +11,39 @@ join
 
 Description
 ===========
-| Using ``join`` command to combines two datasets together. The left side could be an index or results from a piped commands, the right side could be either an index or a subsearch.
+| The ``join`` command combines two datasets together. The left side could be an index or results from piped commands; the right side could be either an index or a subsearch.
 
-Version
-=======
-3.0.0
+Syntax
+======
 
-Basic syntax in 3.0.0
-=====================
-| [joinType] join [leftAlias] [rightAlias] (on | where) <joinCriteria> <right-dataset>
+Basic syntax:
+-------------
 
-* joinType: optional. The type of join to perform. The default is ``inner`` if not specified. Other option is ``left``, ``semi``, ``anti`` and performance sensitive types ``right``, ``full`` and ``cross``.
-* leftAlias: optional. The subsearch alias to use with the left join side, to avoid ambiguous naming. Fixed pattern: ``left = <leftAlias>``
-* rightAlias: optional. The subsearch alias to use with the right join side, to avoid ambiguous naming. Fixed pattern: ``right = <rightAlias>``
-* joinCriteria: mandatory. It could be any comparison expression. Must follow with ``on`` (since 3.0.0) or ``where`` (since 3.3.0) keyword.
+[joinType] join [leftAlias] [rightAlias] (on | where) <joinCriteria> <right-dataset>
+
+* joinType: optional. The type of join to perform. Options: ``left``, ``semi``, ``anti``, and performance sensitive types ``right``, ``full``, ``cross``. **Default:** ``inner``.
+* leftAlias: optional. The subsearch alias to use with the left join side, to avoid ambiguous naming. Pattern: ``left = <leftAlias>``
+* rightAlias: optional. The subsearch alias to use with the right join side, to avoid ambiguous naming. Pattern: ``right = <rightAlias>``
+* joinCriteria: mandatory. Any comparison expression. Must follow the ``on`` or ``where`` keyword.
 * right-dataset: mandatory. Right dataset could be either an ``index`` or a ``subsearch`` with/without alias.
 
-Extended syntax since 3.3.0
-===========================
-| join [type=<joinType>] [overwrite=<bool>] [max=n] (<join-field-list> | [leftAlias] [rightAlias] (on | where) <joinCriteria>) <right-dataset>
-| From 3.3.0, the join syntax is enhanced to support more join options and join with field list.
+Extended syntax:
+----------------
 
-* type=<joinType>: optional. The type of join to perform. The default is ``inner`` if not specified. Other option is ``left``, ``outer``(alias of ``left``), ``semi``, ``anti`` and performance sensitive types ``right``, ``full`` and ``cross``.
-* overwrite=<bool>: optional. Only works with ``join-field-list``. Specifies whether duplicate-named fields from <right-dataset> (subsearch results) should replace corresponding fields in the main search results. The default value is ``true``.
-* max=n: optional. Controls how many subsearch results could be joined against to each row in main search. The default value is 0, means unlimited.
-* join-field-list: optional. The fields used to build the join criteria. The join field list must exist on both sides. If no join field list is specified, all fields common to both sides will be used as join keys. The comma is optional.
+join [type=<joinType>] [overwrite=<bool>] [max=n] (<join-field-list> | [leftAlias] [rightAlias] (on | where) <joinCriteria>) <right-dataset>
+
+* type: optional. Join type using extended syntax. 
Options: ``left``, ``outer`` (alias of ``left``), ``semi``, ``anti``, and performance sensitive types ``right``, ``full``, ``cross``. **Default:** ``inner``. +* overwrite: optional boolean. Only works with ``join-field-list``. Specifies whether duplicate-named fields from right-dataset should replace corresponding fields in the main search results. **Default:** ``true``. +* max: optional integer. Controls how many subsearch results could be joined against each row in main search. **Default:** 0 (unlimited). +* join-field-list: optional. The fields used to build the join criteria. The join field list must exist on both sides. If not specified, all fields common to both sides will be used as join keys. +* leftAlias: optional. Same as basic syntax when used with extended syntax. +* rightAlias: optional. Same as basic syntax when used with extended syntax. +* joinCriteria: mandatory. Same as basic syntax when used with extended syntax. +* right-dataset: mandatory. Same as basic syntax. Configuration ============= -plugins.calcite.enabled ------------------------ - -This command requires Calcite enabled. In 3.0.0, as an experimental the Calcite configuration is disabled by default. - -Enable Calcite:: - - >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{ - "transient" : { - "plugins.calcite.enabled" : true - } - }' - -Result set:: - - { - "acknowledged": true, - "persistent": { - "plugins": { - "calcite": { - "enabled": "true" - } - } - }, - "transient": {} - } - - plugins.ppl.join.subsearch_maxout --------------------------------- @@ -96,7 +72,7 @@ Change the join.subsearch_maxout to 5000:: Usage ===== -Join on criteria (in 3.0.0):: +Basic join syntax:: source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c source = table1 | inner join left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c @@ -113,7 +89,7 @@ Join on criteria (in 3.0.0):: source = table1 as t1 | join left = l right = r on l.a = r.a table2 as t2 | fields t1.a, t2.a source = table1 | join left = l right = r on l.a = r.a [ source = table2 ] as s | fields l.a, s.a -Extended syntax and option supported (since 3.3.0):: +Extended syntax with options:: source = table1 | join type=outer left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c source = table1 | join type=left left = l right = r where l.a = r.a table2 | fields l.a, r.a, b, c @@ -127,6 +103,8 @@ Extended syntax and option supported (since 3.3.0):: Example 1: Two indices join =========================== +This example shows joining two indices using the basic join syntax. + PPL query:: os> source = state_country | inner join left=a right=b ON a.name = b.name occupation | stats avg(salary) by span(age, 10) as age_span, b.country; @@ -144,6 +122,8 @@ PPL query:: Example 2: Join with subsearch ============================== +This example shows joining with a subsearch using the basic join syntax. + PPL query:: PPL> source = state_country as a | where country = 'USA' OR country = 'England' | left join ON a.name = b.name [ source = occupation | where salary > 0 | fields name, country, salary | sort salary | head 3 ] as b | stats avg(salary) by span(age, 10) as age_span, b.country; @@ -159,6 +139,8 @@ PPL query:: Example 3: Join with field list =============================== +This example shows joining using the extended syntax with field list. 
+
 PPL query::
 
     PPL> source = state_country | where country = 'USA' OR country = 'England' | join type=left overwrite=true name [ source = occupation | where salary > 0 | fields name, country, salary | sort salary | head 3 ] | stats avg(salary) by span(age, 10) as age_span, country;
@@ -174,6 +156,8 @@ PPL query::
 
 Example 4: Join with options
 ============================
 
+This example shows joining using the extended syntax with additional options.
+
 PPL query::
 
     os> source = state_country | join type=inner overwrite=false max=1 name occupation | stats avg(salary) by span(age, 10) as age_span, country;
@@ -189,7 +173,7 @@ PPL query::
 
 Limitations
 ===========
-For basic syntax in 3.0.0, if fields in the left outputs and right outputs have the same name. Typically, in the join criteria
+For basic syntax, fields in the left and right outputs may have the same name. Typically, in the join criteria
 ``ON t1.id = t2.id``, the names ``id`` in output are ambiguous. To avoid ambiguity, the ambiguous fields in output are renamed to
 ``<alias>.id``, or else ``<tableName>.id`` if no alias exists.
 
@@ -210,6 +194,5 @@ Assume table1 and table2 only contain field ``id``, following PPL queries and th
 * - source=table1 | join right=tt on table1.id=t2.id [ source=table2 as t2 | eval b = id ] | eval a = 1
   - table1.id, tt.id, tt.b, a
 
-For extended syntax (join with field list) in 3.3.0, when duplicate-named fields in output results are deduplicated, the fields in output determined by the value of 'overwrite' option.
-
-Since 3.3.0, join types ``inner``, ``left``, ``outer`` (alias of ``left``), ``semi`` and ``anti`` are supported by default. ``right``, ``full``, ``cross`` are performance sensitive join types which are disabled by default. Set config ``plugins.calcite.all_join_types.allowed = true`` to enable.
+| For extended syntax (join with field list), when duplicate-named fields in the output results are deduplicated, the surviving fields are determined by the value of the ``overwrite`` option.
+| Join types ``inner``, ``left``, ``outer`` (alias of ``left``), ``semi`` and ``anti`` are supported by default. ``right``, ``full``, ``cross`` are performance sensitive join types which are disabled by default. Set config ``plugins.calcite.all_join_types.allowed = true`` to enable them.
diff --git a/docs/user/ppl/cmd/kmeans.rst b/docs/user/ppl/cmd/kmeans.rst
index 6d558248ee4..ca4ba255c7e 100644
--- a/docs/user/ppl/cmd/kmeans.rst
+++ b/docs/user/ppl/cmd/kmeans.rst
@@ -13,20 +13,19 @@ Description
 ===========
 | The ``kmeans`` command applies the kmeans algorithm in the ml-commons plugin on the search result returned by a PPL command.
 
-
 Syntax
 ======
 
 kmeans [centroids] [iterations] [distance_type]
 
-* centroids: optional. The number of clusters you want to group your data points into. The default value is 2.
-* iterations: optional. Number of iterations. The default value is 10.
-* distance_type: optional. The distance type can be COSINE, L1, or EUCLIDEAN, The default type is EUCLIDEAN.
+* centroids: optional. The number of clusters you want to group your data points into. **Default:** 2.
+* iterations: optional. Number of iterations. **Default:** 10.
+* distance_type: optional. The distance type can be COSINE, L1, or EUCLIDEAN. **Default:** EUCLIDEAN.
 
 Example: Clustering of Iris Dataset
 ===================================
 
-The example shows how to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
+This example shows how to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
 
 PPL query::
 
@@ -42,5 +41,4 @@ PPL query::
 
 Limitations
 ===========
-The ``kmeans`` command can only work with ``plugins.calcite.enabled=false``.
-It means ``kmeans`` command cannot work together with new PPL commands/functions introduced in 3.0.0 and above.
\ No newline at end of file
+The ``kmeans`` command can only work with ``plugins.calcite.enabled=false``.
\ No newline at end of file
diff --git a/docs/user/ppl/cmd/lookup.rst b/docs/user/ppl/cmd/lookup.rst
index dfa093c117b..4d4cf84a48b 100644
--- a/docs/user/ppl/cmd/lookup.rst
+++ b/docs/user/ppl/cmd/lookup.rst
@@ -1,6 +1,6 @@
-=============
+======
 lookup
-=============
+======
 
 .. rubric:: Table of contents
 
@@ -10,54 +10,19 @@ lookup
 
 Description
-============
-| (Experimental)
-| (From 3.0.0)
-| Lookup command enriches your search data by adding or replacing data from a lookup index (dimension table).
-You can extend fields of an index with values from a dimension table, append or replace values when lookup condition is matched.
-As an alternative of join command, lookup command is more suitable for enriching the source data with a static dataset.
-
-Version
-=======
-3.0.0
+===========
+| The ``lookup`` command enriches your search data by adding or replacing data from a lookup index (dimension table). You can extend the fields of an index with values from a dimension table, and append or replace values when the lookup condition is matched. As an alternative to the ``join`` command, the ``lookup`` command is more suitable for enriching the source data with a static dataset.
 
 Syntax
 ======
 
-LOOKUP <lookupIndex> (<lookupMappingField> [AS <sourceMappingField>])... [(REPLACE | APPEND) (<inputField> [AS <outputField>])...]
+lookup <lookupIndex> (<lookupMappingField> [as <sourceMappingField>])... [(replace | append) (<inputField> [as <outputField>])...]
 
 * lookupIndex: mandatory. The name of lookup index (dimension table).
-* lookupMappingField: mandatory. A mapping key in \<lookupIndex\>, analogy to a join key from right table. You can specify multiple \<lookupMappingField\> with comma-delimited.
-* sourceMappingField: optional. A mapping key from source (left side), analogy to a join key from left side. If you don't specify any \<sourceMappingField\>, its default value is \<lookupMappingField\>.
-* inputField: optional. A field in \<lookupIndex\> where matched values are applied to result output. You can specify multiple \<inputField\> with comma-delimited. If you don't specify any \<inputField\>, all fields expect \<lookupMappingField\> from \<lookupIndex\> where matched values are applied to result output.
-* outputField: optional. A field of output. You can specify zero or multiple \<outputField\>. If you specify \<outputField\> with an existing field name in source query, its values will be replaced or appended by matched values from \<inputField\>. If the field specified in \<outputField\> is a new field, in REPLACE strategy, an extended new field will be applied to the results, but fail in APPEND strategy.
-* REPLACE | APPEND: optional. The output strategies. Default is REPLACE. If you specify REPLACE, matched values in \<inputField\> field overwrite the values in result. If you specify APPEND, matched values in \<inputField\> field only append to the missing values in result.
-
-Configuration
-=============
-This command requires Calcite enabled. In 3.0.0-beta, as an experimental the Calcite configuration is disabled by default.
-
-Enable Calcite::
-
-    >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{
-      "transient" : {
-        "plugins.calcite.enabled" : true
-      }
-    }'
-
-Result set::
-
-    {
-        "acknowledged": true,
-        "persistent": {
-            "plugins": {
-                "calcite": {
-                    "enabled": "true"
-                }
-            }
-        },
-        "transient": {}
-    }
-
+* lookupMappingField: mandatory. A mapping key in ``lookupIndex``, analogous to a join key from the right table. You can specify multiple ``lookupMappingField`` values, comma-delimited.
+* sourceMappingField: optional. A mapping key from the source (left side), analogous to a join key from the left side. If not specified, defaults to ``lookupMappingField``.
+* inputField: optional. A field in ``lookupIndex`` whose matched values are applied to the result output. You can specify multiple ``inputField`` values, comma-delimited. If not specified, all fields except ``lookupMappingField`` from ``lookupIndex`` are applied to the result output.
+* outputField: optional. A field of the output. You can specify zero or multiple ``outputField`` values. If ``outputField`` names an existing field in the source query, its values will be replaced or appended with matched values from ``inputField``. If the field specified in ``outputField`` is a new field, with the replace strategy an extended new field is added to the results, but with the append strategy the command fails.
+* replace | append: optional. The output strategies. With replace, matched values of ``inputField`` overwrite the values in the result. With append, matched values of ``inputField`` only fill in the missing values in the result. **Default:** replace.
 
 Usage
 =====
@@ -73,8 +38,10 @@ Lookup::
 
     source = table1 | lookup table2 id as cid, name append dept as department, city as location
 
-Example 1: replace
-==================
+Example 1: Replace strategy
+===========================
+
+This example shows using the lookup command with the replace strategy to overwrite existing values.
 
 PPL query::
 
@@ -169,8 +136,10 @@ Result set::
         "size": 6
     }
 
-Example 2: append
-=================
+Example 2: Append strategy
+==========================
+
+This example shows using the lookup command with the append strategy to fill missing values only.
 
 PPL query::
 
@@ -183,8 +152,10 @@ PPL query::
     }'
 
 
-Example 3: no inputField
-========================
+Example 3: No inputField specified
+==================================
+
+This example shows using the lookup command without specifying inputField, which applies all fields from the lookup index.
 
 PPL query::
 
@@ -279,9 +250,11 @@ Result set::
         "size": 6
     }
 
-Example 4: outputField as a new field
+Example 4: OutputField as a new field
 =====================================
 
+This example shows using the lookup command with outputField as a new field name.
+
 PPL query::
 
     >> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{
diff --git a/docs/user/ppl/cmd/ml.rst b/docs/user/ppl/cmd/ml.rst
index f38697adbbc..371df4de880 100644
--- a/docs/user/ppl/cmd/ml.rst
+++ b/docs/user/ppl/cmd/ml.rst
@@ -10,47 +10,53 @@ ml
 
 Description
-============
-| The ``ml`` command is to train/predict/trainandpredict on any algorithm in the ml-commons plugin on the search result returned by a PPL command.
-
+===========
+| Use the ``ml`` command to train, predict, or train-and-predict with any algorithm in the ml-commons plugin on the search result returned by a PPL command.
-List of algorithms supported -============ -AD(RCF) -KMEANS +Syntax +====== +AD - Fixed In Time RCF For Time-series Data: +-------------------------------------------- -AD - Fixed In Time RCF For Time-series Data Command Syntax -===================================================== ml action='train' algorithm='rcf' -* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30. -* shingle_size(integer): optional. A shingle is a consecutive sequence of the most recent records. The default value is 8. -* sample_size(integer): optional. The sample size used by stream samplers in this forest. The default value is 256. -* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32. -* time_decay(double): optional. The decay factor used by stream samplers in this forest. The default value is 0.0001. -* anomaly_rate(double): optional. The anomaly rate. The default value is 0.005. -* time_field(string): mandatory. It specifies the time field for RCF to use as time-series data. -* date_format(string): optional. It's used for formatting time_field field. The default formatting is "yyyy-MM-dd HH:mm:ss". -* time_zone(string): optional. It's used for setting time zone for time_field filed. The default time zone is UTC. -* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted. +* number_of_trees: optional integer. Number of trees in the forest. **Default:** 30. +* shingle_size: optional integer. A shingle is a consecutive sequence of the most recent records. **Default:** 8. +* sample_size: optional integer. The sample size used by stream samplers in this forest. **Default:** 256. +* output_after: optional integer. The number of points required by stream samplers before results are returned. **Default:** 32. +* time_decay: optional double. The decay factor used by stream samplers in this forest. **Default:** 0.0001. +* anomaly_rate: optional double. The anomaly rate. **Default:** 0.005. +* time_field: mandatory string. It specifies the time field for RCF to use as time-series data. +* date_format: optional string. It's used for formatting time_field field. **Default:** "yyyy-MM-dd HH:mm:ss". +* time_zone: optional string. It's used for setting time zone for time_field field. **Default:** UTC. +* category_field: optional string. It specifies the category field used to group inputs. Each category will be independently predicted. +AD - Batch RCF for Non-time-series Data: +---------------------------------------- -AD - Batch RCF for Non-time-series Data Command Syntax -================================================= ml action='train' algorithm='rcf' -* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30. -* sample_size(integer): optional. Number of random samples given to each tree from the training data set. The default value is 256. -* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32. -* training_data_size(integer): optional. The default value is the size of your training data set. -* anomaly_score_threshold(double): optional. The threshold of anomaly score. The default value is 1.0. -* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted. +* number_of_trees: optional integer. Number of trees in the forest. 
**Default:** 30. +* sample_size: optional integer. Number of random samples given to each tree from the training data set. **Default:** 256. +* output_after: optional integer. The number of points required by stream samplers before results are returned. **Default:** 32. +* training_data_size: optional integer. **Default:** size of your training data set. +* anomaly_score_threshold: optional double. The threshold of anomaly score. **Default:** 1.0. +* category_field: optional string. It specifies the category field used to group inputs. Each category will be independently predicted. + +KMEANS: +------- + +ml action='train' algorithm='kmeans' + +* centroids: optional integer. The number of clusters you want to group your data points into. **Default:** 2. +* iterations: optional integer. Number of iterations. **Default:** 10. +* distance_type: optional string. The distance type can be COSINE, L1, or EUCLIDEAN. **Default:** EUCLIDEAN. Example 1: Detecting events in New York City from taxi ridership data with time-series data =========================================================================================== -The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data. +This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data. PPL query:: @@ -65,7 +71,7 @@ PPL query:: Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category ============================================================================================================================ -The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values. +This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values. PPL query:: @@ -82,7 +88,7 @@ PPL query:: Example 3: Detecting events in New York City from taxi ridership data with non-time-series data =============================================================================================== -The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data. +This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data. PPL query:: @@ -97,7 +103,7 @@ PPL query:: Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category ================================================================================================================================ -The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values. +This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values. PPL query:: @@ -110,19 +116,10 @@ PPL query:: | day | 6526.0 | 0.0 | False | +----------+---------+-------+-----------+ -KMEANS -====== -ml action='train' algorithm='kmeans' - -* centroids: optional. The number of clusters you want to group your data points into. The default value is 2. -* iterations: optional. Number of iterations. The default value is 10. -* distance_type: optional. The distance type can be COSINE, L1, or EUCLIDEAN, The default type is EUCLIDEAN. 
-
-
-Example: Clustering of Iris Dataset
-===================================
+Example 5: KMEANS - Clustering of Iris Dataset
+==============================================
 
-The example shows how to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
+This example shows how to use KMEANS to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.
 
 PPL query::
 
@@ -139,4 +136,3 @@ PPL query::
 Limitations
 ===========
 The ``ml`` command can only work with ``plugins.calcite.enabled=false``.
-It means ``ml`` command cannot work together with new PPL commands/functions introduced in 3.0.0 and above.
diff --git a/docs/user/ppl/cmd/multisearch.rst b/docs/user/ppl/cmd/multisearch.rst
index 2bac577ef23..02285b5e82f 100644
--- a/docs/user/ppl/cmd/multisearch.rst
+++ b/docs/user/ppl/cmd/multisearch.rst
@@ -1,6 +1,6 @@
-=============
+===========
 multisearch
-=============
+===========
 
 .. rubric:: Table of contents
 
@@ -10,9 +10,8 @@ multisearch
 
 Description
-============
-| (Experimental)
-| Using ``multisearch`` command to run multiple search subsearches and merge their results together. The command allows you to combine data from different queries on the same or different sources, and optionally apply subsequent processing to the combined result set.
+===========
+| Use the ``multisearch`` command to run multiple subsearches and merge their results together. The command allows you to combine data from different queries on the same or different sources, and optionally apply subsequent processing to the combined result set.
 
 | Key aspects of ``multisearch``:
 
@@ -32,30 +31,10 @@ Description
 
 Syntax
 ======
-| multisearch <subsearch1> <subsearch2> ...
-
-**Requirements:**
-
-* **Minimum 2 subsearches required** - multisearch must contain at least two subsearch blocks
-* **Maximum unlimited** - you can specify as many subsearches as needed
-
-**Subsearch Format:**
-
-* Each subsearch must be enclosed in square brackets: ``[search ...]``
-* Each subsearch must start with the ``search`` keyword
-* Syntax: ``[search source=index | commands...]``
-* Description: Each subsearch is a complete search pipeline enclosed in square brackets
-    * Supported commands in subsearches: All PPL commands are supported (``where``, ``eval``, ``fields``, ``head``, ``rename``, ``stats``, ``sort``, ``dedup``, etc.)
-
-* result-processing: optional. Commands applied to the merged results.
-
-    * Description: After the multisearch operation, you can apply any PPL command to process the combined results, such as ``stats``, ``sort``, ``head``, etc.
-
-Limitations
-===========
-
-* **Minimum Subsearches**: At least two subsearches must be specified
-* **Schema Compatibility**: When fields with the same name exist across subsearches but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type across all subsearches, or use different field names (e.g., by renaming with ``eval`` or using ``fields`` to select non-conflicting columns).
+multisearch <subsearch1> <subsearch2> ...
+
+* subsearch1, subsearch2, ...: mandatory. At least two subsearches are required. Each subsearch must be enclosed in square brackets and start with the ``search`` keyword. Format: ``[search source=index | commands...]``. 
All PPL commands are supported within subsearches.
+* result-processing: optional. Commands applied to the merged results after the multisearch operation, such as ``stats``, ``sort``, ``head``, etc.
 
 Usage
 =====
@@ -69,7 +48,7 @@ Basic multisearch::
 
 Example 1: Basic Age Group Analysis
 ===================================
 
-Combine young and adult customers into a single result set for further analysis.
+This example combines young and adult customers into a single result set for further analysis.
 
 PPL query::
 
@@ -87,7 +66,7 @@ PPL query::
 
 Example 2: Success Rate Pattern
 ===============================
 
-Combine high-balance and all valid accounts for comparison analysis.
+This example combines high-balance and all valid accounts for comparison analysis.
 
 PPL query::
 
@@ -103,14 +82,31 @@ PPL query::
 +-----------+---------+--------------+
 
 Example 3: Timestamp Interleaving
-==================================
+=================================
 
-Combine time-series data from multiple sources with automatic timestamp-based ordering.
+This example combines time-series data from multiple sources with automatic timestamp-based ordering.
 
 PPL query::
 
     os> | multisearch [search source=time_data | where category IN ("A", "B")] [search source=time_data2 | where category IN ("E", "F")] | fields @timestamp, category, value, timestamp | head 5;
     fetched rows / total rows = 5/5
     +---------------------+----------+-------+---------------------+
     | @timestamp          | category | value | timestamp           |
     |---------------------+----------+-------+---------------------|
@@ -122,9 +118,8 @@ PPL query::
     +---------------------+----------+-------+---------------------+
 
+Example 4: Empty Subsearch Results
+==================================
+
+This example shows how multisearch gracefully handles cases where some subsearches return no results.
+
+PPL query::
+
+    os> | multisearch [search source=accounts | where age > 25 | fields firstname, age] [search source=accounts | where age > 200 | eval impossible = "yes" | fields firstname, age, impossible] | head 5;
+    fetched rows / total rows = 4/4
+    +-----------+-----+------------+
+    | firstname | age | impossible |
+    |-----------+-----+------------|
+    | Nanette   | 28  | null       |
+    | Amber     | 32  | null       |
+    | Hattie    | 36  | null       |
+    | Dale      | 37  | null       |
+    +-----------+-----+------------+
+
-Example 4: Type Compatibility - Missing Fields
-=================================================
+Example 5: Type Compatibility - Missing Fields
+==============================================
 
-Demonstrate how missing fields are handled with NULL insertion.
+This example demonstrates how missing fields are handled with NULL insertion.
 
 PPL query::
 
@@ -139,3 +134,25 @@ PPL query::
     | Hattie    | 36  | null       |
     +-----------+-----+------------+
 
+
+Example 6: Type Compatibility - Conflicting Types
+=================================================
+
+This example shows that when the same field name has incompatible types across subsearches, the system automatically renames conflicting fields with numeric suffixes.
+
+PPL query::
+
+    os> | multisearch [search source=accounts | fields firstname, age, balance | head 2] [search source=locations | fields description, age, place_id | head 2];
+    fetched rows / total rows = 4/4
+    +-----------+------+---------+--------------+--------+----------+
+    | firstname | age  | balance | description  | age0   | place_id |
+    |-----------+------+---------+--------------+--------+----------|
+    | Amber     | 32   | 39225   | null         | null   | null     |
+    | Hattie    | 36   | 5686    | null         | null   | null     |
+    | null      | null | null    | Central Park | old    | 1001     |
+    | null      | null | null    | Times Square | modern | 1002     |
+    +-----------+------+---------+--------------+--------+----------+
+
+In this example, the ``age`` field has type ``bigint`` in accounts but type ``string`` in locations. The system keeps the first occurrence as ``age`` (bigint) and renames the second occurrence to ``age0`` (string), preserving all data while avoiding type conflicts.
+
+Limitations
+===========
+
+* **Minimum Subsearches**: At least two subsearches must be specified.
+* **Schema Compatibility**: When fields with the same name exist across subsearches but have incompatible types, conflicting fields are automatically renamed with numeric suffixes (for example, ``age0``), as shown above. To keep matching values in a single column, ensure that fields with the same name have the same data type across all subsearches, or use different field names (e.g., by renaming with ``eval`` or using ``fields`` to select non-conflicting columns).
\ No newline at end of file
diff --git a/docs/user/ppl/cmd/parse.rst b/docs/user/ppl/cmd/parse.rst
index 8e0dc7da080..833736238b9 100644
--- a/docs/user/ppl/cmd/parse.rst
+++ b/docs/user/ppl/cmd/parse.rst
@@ -1,6 +1,6 @@
-=============
+=====
 parse
-=============
+=====
 
 .. rubric:: Table of contents
 
@@ -10,26 +10,25 @@ parse
 
 Description
-============
+===========
 | The ``parse`` command parses a text field with a regular expression and appends the result to the search result.
 
 Syntax
-============
+======
 parse <field> <pattern>
 
 * field: mandatory. The field must be a text field.
-* pattern: mandatory string. The regular expression pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field.
+* pattern: mandatory. The regular expression pattern used to extract new fields from the given text field. If a new field name already exists, it will replace the original field.
 
 Regular Expression
 ==================
-
 The regular expression pattern is used to match the whole text field of each document with the Java regex engine. Each named capture group in the expression will become a new ``STRING`` field.
 
 Example 1: Create a new field
 =============================
 
-The example shows how to create a new field ``host`` for each document. ``host`` will be the host name after ``@`` in ``email`` field. Parsing a null field will return an empty string.
+This example shows how to create a new field ``host`` for each document. ``host`` will be the host name after ``@`` in the ``email`` field. Parsing a null field will return an empty string.
 
 PPL query::
 
@@ -48,7 +47,7 @@ PPL query::
 Example 2: Override an existing field
 =====================================
 
-The example shows how to override the existing ``address`` field with street number removed.
+This example shows how to override the existing ``address`` field with the street number removed.
 
 PPL query::
 
@@ -66,7 +65,7 @@ PPL query::
 Example 3: Filter and sort by casted parsed field
 =================================================
 
-The example shows how to sort street numbers that are higher than 500 in ``address`` field.
+This example shows how to sort street numbers that are higher than 500 in the ``address`` field.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/patterns.rst b/docs/user/ppl/cmd/patterns.rst
index c3a785ce274..eb187de4fab 100644
--- a/docs/user/ppl/cmd/patterns.rst
+++ b/docs/user/ppl/cmd/patterns.rst
@@ -1,6 +1,6 @@
-=============
+========
 patterns
-=============
+========
 
 .. rubric:: Table of contents
 
@@ -10,39 +10,39 @@ patterns
 
 Description
-============
-* The ``patterns`` command extracts log patterns from a text field and appends the results to the search result. Grouping logs by their patterns makes it easier to aggregate stats from large volumes of log data for analysis and troubleshooting.
-* ``patterns`` command now allows users to select different log parsing algorithms to get high log pattern grouping accuracy. Two pattern methods are supported, aka ``simple_pattern`` and ``brain``.
-* ``simple_pattern`` algorithm is basically a regex parsing method vs ``brain`` algorithm is an automatic log grouping algorithm with high grouping accuracy and keeps semantic meaning.
-(From 3.1.0)
+===========
+| The ``patterns`` command extracts log patterns from a text field and appends the results to the search result. Grouping logs by their patterns makes it easier to aggregate stats from large volumes of log data for analysis and troubleshooting.
 
-* ``patterns`` command supports two modes, aka ``label`` and ``aggregation``. ``label`` mode is similar to previous 3.0.0 output. ``aggregation`` mode returns aggregated results on target field.
-* V2 Engine engine still have the same output in ``label`` mode as before. In ``aggregation`` mode, it returns aggregated pattern count on labeled pattern as well as sample logs (sample count is configurable) per pattern.
-* Calcite engine by default labels the variables with '<*>' placeholder.
-* If ``show_numbered_token`` option is turned on, Calcite engine's ``label`` mode not only labels pattern of text but also labels variable tokens in map. In ``aggregation`` mode, it will also output labeled pattern as well as variable tokens per pattern. The variable placeholder is in the format of '<token1>' instead of '<*>'.
+| The ``patterns`` command allows users to select different log parsing algorithms to get high log pattern grouping accuracy. Two pattern methods are supported: ``simple_pattern`` and ``brain``.
+
+| The ``simple_pattern`` algorithm is basically a regex parsing method, while the ``brain`` algorithm is an automatic log grouping algorithm with high grouping accuracy that keeps semantic meaning.
+
+| The ``patterns`` command supports two modes: ``label`` and ``aggregation``. ``label`` mode returns individual pattern labels. ``aggregation`` mode returns aggregated results on the target field.
+
+| The Calcite engine by default labels the variables with the '<*>' placeholder. If the ``show_numbered_token`` option is turned on, the Calcite engine's ``label`` mode not only labels the pattern of the text but also labels variable tokens in a map. In ``aggregation`` mode, it will also output the labeled pattern as well as variable tokens per pattern. The variable placeholder is in the format of '<token1>', '<token2>', ... instead of '<*>'.
 
 Syntax
-============
+======
 patterns <field> [by byClause...] [method=simple_pattern | brain] [mode=label | aggregation] [max_sample_count=integer] [buffer_limit=integer] [show_numbered_token=boolean] [new_field=<new-field-name>] (algorithm parameters...)
 
-* field: mandatory. The text(string) field to analyze for patterns.
+* field: mandatory. The text field to analyze for patterns.
 * byClause: optional. 
Fields or scalar functions used to group logs for labeling/aggregation. -* method: optional. Algorithm choice: ``simple_pattern`` (default) or ``brain``. The method is configured by the setting ``plugins.ppl.pattern.method``. -* mode: optional. Output mode: ``label`` (default) or ``aggregation``. The mode is configured by the setting ``plugins.ppl.pattern.mode``. -* max_sample_count: optional. Max sample logs returned per pattern in aggregation mode (default: 10). The max_sample_count is configured by the setting ``plugins.ppl.pattern.max.sample.count``. -* buffer_limit: optional. Safeguard parameter for ``brain`` algorithm to limit internal temporary buffer size (default: 100,000, min: 50,000). The buffer_limit is configured by the setting ``plugins.ppl.pattern.buffer.limit``. -* show_numbered_token: optional. The flag to turn on numbered token output format (default: false). The show_numbered_token is configured by the setting ``plugins.ppl.pattern.show.numbered.token``. -* new_field: Alias of the output pattern field. (default: "patterns_field"). +* method: optional. Algorithm choice: ``simple_pattern`` or ``brain``. **Default:** ``simple_pattern``. +* mode: optional. Output mode: ``label`` or ``aggregation``. **Default:** ``label``. +* max_sample_count: optional. Max sample logs returned per pattern in aggregation mode. **Default:** 10. +* buffer_limit: optional. Safeguard parameter for ``brain`` algorithm to limit internal temporary buffer size (min: 50,000). **Default:** 100,000. +* show_numbered_token: optional. The flag to turn on numbered token output format. **Default:** false. +* new_field: optional. Alias of the output pattern field. **Default:** "patterns_field". * algorithm parameters: optional. Algorithm-specific tuning: - - ``simple_pattern`` : Define regex via "pattern". - - ``brain`` : Adjust sensitivity with variable_count_threshold (int > 0) and frequency_threshold_percentage (double 0.0 - 1.0). + - ``simple_pattern``: Define regex via "pattern". + - ``brain``: Adjust sensitivity with variable_count_threshold and frequency_threshold_percentage. - - ``variable_count_threshold``: Optional integer(Default value is 5). Words(or we say tokens) are split by space. Algorithm will count how many distinct words are at specific position in initial log groups. Same log group's constant word ideally should be distinct at its position but it's not guaranteed because some words could be enums. Adjusting this threshold can primarily determine the sensitivity of constant words. - - ``frequency_threshold_percentage``: Optional double(Default value is 0.3). Brain's log pattern is selected based on longest word combination. A word combination is words with same frequency per message. To select longest word combination frequency, it needs a lower bound of frequency to ignore too low frequency words. The representative frequency of longest word combination should be >= highest token frequency of log * threshold percentage. Adjusting this threshold could prune some low frequency words. + - ``variable_count_threshold``: optional integer. Words are split by space. Algorithm counts how many distinct words are at specific position in initial log groups. Adjusting this threshold can determine the sensitivity of constant words. **Default:** 5. + - ``frequency_threshold_percentage``: optional double. Brain's log pattern is selected based on longest word combination. This sets the lower bound of frequency to ignore low frequency words. **Default:** 0.3. 
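+
+As a quick end-to-end illustration of combining the options above (a minimal sketch; the ``apache_logs`` index and its ``message`` field are hypothetical names), a query could look like::
+
+    source=apache_logs | patterns message method=brain mode=aggregation max_sample_count=5
+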
-Change default pattern method
-============
+Change the default pattern method
+=================================
 To override default pattern parameters, users can run following command
 
 .. code-block::
 
@@ -59,9 +59,9 @@ To override default pattern parameters, users can run following command
 }
 
 Simple Pattern Example 1: Create the new field
-===============================
+==============================================
 
-The example shows how to extract patterns in ``email`` for each document. Parsing a null field will return an empty string.
+This example shows how to extract patterns from the ``email`` field for each document. Parsing a null field will return an empty string.
 
 PPL query::
 
@@ -77,9 +77,9 @@ PPL query::
 +-----------------------+----------------+
 
 Simple Pattern Example 2: Extract log patterns
-===============================
+==============================================
 
-The example shows how to extract patterns from a raw log field using the default patterns.
+This example shows how to extract patterns from a raw log field using the default patterns.
 
 PPL query::
 
@@ -95,9 +95,9 @@ PPL query::
 +-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+
 
 Simple Pattern Example 3: Extract log patterns with custom regex pattern
-=========================================================
+========================================================================
 
-The example shows how to extract patterns from a raw log field using user defined patterns.
+This example shows how to extract patterns from a raw log field using user-defined patterns.
 
 PPL query::
 
@@ -113,9 +113,9 @@ PPL query::
 +-----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 
 Simple Pattern Example 4: Return log patterns aggregation result
-=========================================================
+================================================================
 
-Starting 3.1.0, patterns command support aggregation mode. The example shows how to get aggregated results from a raw log field.
+This example shows how to get aggregated results from a raw log field.
 
 PPL query::
 
@@ -131,13 +131,13 @@ PPL query::
 +---------------------------------------------------------------------------------------------------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
 
 Simple Pattern Example 5: Return log patterns aggregation result with detected variable tokens
-=========================================================
+==============================================================================================
 
-Starting 3.1.0, patterns command support aggregation mode.
+This example shows how to get aggregated results with detected variable tokens.
 
 Configuration
 -------------
-With Calcite specific option ``show_numbered_token`` enabled, the output can detect numbered variable tokens from the pattern field.
+With the ``show_numbered_token`` option enabled, the output can detect numbered variable tokens from the pattern field.
 PPL query::
 
@@ -150,9 +150,9 @@ PPL query::
 +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 
 Brain Example 1: Extract log patterns
-===============================
+=====================================
 
-The example shows how to extract semantic meaningful log patterns from a raw log field using the brain algorithm. The default variable count threshold is 5.
+This example shows how to extract semantically meaningful log patterns from a raw log field using the brain algorithm. The default variable count threshold is 5.
 
 PPL query::
 
@@ -168,9 +168,9 @@ PPL query::
 +-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
 
 Brain Example 2: Extract log patterns with custom parameters
-===============================
+============================================================
 
-The example shows how to extract semantic meaningful log patterns from a raw log field using defined parameter of brain algorithm.
+This example shows how to extract semantically meaningful log patterns from a raw log field using custom parameters of the brain algorithm.
 
 PPL query::
 
@@ -186,9 +186,9 @@ PPL query::
 +-----------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------+
 
 Brain Example 3: Return log patterns aggregation result
-===============================
+=======================================================
 
-Starting 3.1.0, patterns command support aggregation mode.
+This example shows how to get aggregated results from a raw log field using the brain algorithm.
 
 PPL query::
 
@@ -201,13 +201,13 @@ PPL query::
 +----------------------------------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 
 Brain Example 4: Return log patterns aggregation result with detected variable tokens
-=========================================================
+=====================================================================================
 
-Starting 3.1.0, patterns command support aggregation mode.
+This example shows how to get aggregated results with detected variable tokens using the brain algorithm.
 
 Configuration
 -------------
-With Calcite specific option ``show_numbered_token`` enabled, the output can detect numbered variable tokens from the pattern field.
+With the ``show_numbered_token`` option enabled, the output can detect numbered variable tokens from the pattern field.
 PPL query::
 
@@ -220,6 +220,6 @@ PPL query::
 +----------------------------------------------------------------------------------------------------------------------------------------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 
 Limitations
-==========
+===========
 
 - Patterns command is not pushed down to OpenSearch data node for now. It will only group log patterns on log messages returned to coordinator node.
diff --git a/docs/user/ppl/cmd/rare.rst b/docs/user/ppl/cmd/rare.rst
index 8d2011cc1b2..95ad7c96cab 100644
--- a/docs/user/ppl/cmd/rare.rst
+++ b/docs/user/ppl/cmd/rare.rst
@@ -1,6 +1,6 @@
-=============
+====
 rare
-=============
+====
 
 .. rubric:: Table of contents
 
@@ -10,28 +10,26 @@ rare
 
 
 Description
-============
-| Using ``rare`` command to find the least common tuple of values of all fields in the field list.
+===========
+| The ``rare`` command finds the least common tuple of values of all fields in the field list.
 
-**Note**: A maximum of 10 results is returned for each distinct tuple of values of the group-by fields.
+| **Note**: A maximum of 10 results is returned for each distinct tuple of values of the group-by fields.
 
 Syntax
-============
-rare <field-list> [by-clause]
-
-rare <field-list> [rare-options] [by-clause] ``(available from 3.1.0+)``
+======
+rare <field-list> [rare-options] [by-clause]
 
-* field-list: mandatory. comma-delimited list of field names.
-* by-clause: optional. one or more fields to group the results by.
-* rare-options: optional. options for the rare command. Supported syntax is [countfield=<string>] [showcount=<bool>].
-* showcount=<bool>: optional. whether to create a field in output that represent a count of the tuple of values. Default value is ``true``.
-* countfield=<string>: optional. the name of the field that contains count. Default value is ``'count'``.
+* field-list: mandatory. Comma-delimited list of field names.
+* by-clause: optional. One or more fields to group the results by.
+* rare-options: optional. Options for the rare command. Supported syntax is [countfield=<string>] [showcount=<bool>].
+* showcount=<bool>: optional. Whether to create a field in the output that represents the count of the tuple of values. **Default:** ``true``.
+* countfield=<string>: optional. The name of the field that contains the count. **Default:** ``'count'``.
 
 Example 1: Find the least common values in a field
-===========================================
+==================================================
 
-The example finds least common gender of all the accounts.
+This example shows how to find the least common gender of all the accounts.
 
 PPL query::
 
@@ -46,9 +44,9 @@ PPL query::
 Example 2: Find the least common values organized by gender
-====================================================
+===========================================================
 
-The example finds least common age of all the accounts group by gender.
+This example shows how to find the least common age of all the accounts grouped by gender.
 PPL query::
 
@@ -63,10 +61,10 @@ PPL query::
 | M      | 36  |
 +--------+-----+
 
-Example 3: Rare command with Calcite enabled
-============================================
+Example 3: Rare command
+=======================
 
-The example finds least common gender of all the accounts when ``plugins.calcite.enabled`` is true.
+This example shows how to find the least common gender of all the accounts.
 
 PPL query::
 
@@ -83,7 +81,7 @@ PPL query::
 Example 4: Specify the count field option
 =========================================
 
-The example specifies the count field when ``plugins.calcite.enabled`` is true.
+This example shows how to specify the count field.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/regex.rst b/docs/user/ppl/cmd/regex.rst
index 307aa0129d1..154949ba133 100644
--- a/docs/user/ppl/cmd/regex.rst
+++ b/docs/user/ppl/cmd/regex.rst
@@ -1,6 +1,6 @@
-=============
+=====
 regex
-=============
+=====
 
 .. rubric:: Table of contents
 
@@ -10,15 +10,11 @@ regex
 
 
 Description
-============
+===========
 | The ``regex`` command filters search results by matching field values against a regular expression pattern. Only documents where the specified field matches the pattern are included in the results.
 
-Version
-=======
-3.3.0
-
 Syntax
-============
+======
 regex <field> = <pattern>
 regex <field> != <pattern>
 
@@ -28,7 +24,7 @@ regex <field> != <pattern>
 * != : operator for negative matching (exclude matches)
 
 Regular Expression Engine
-==========================
+=========================
 
 The regex command uses Java's built-in regular expression engine, which supports:
 
@@ -42,7 +38,7 @@ For complete documentation of Java regex patterns and available modes, see the `
 Example 1: Basic pattern matching
 =================================
 
-The example shows how to filter documents where the ``lastname`` field matches names starting with uppercase letters.
+This example shows how to filter documents where the ``lastname`` field matches names starting with uppercase letters.
 
 PPL query::
 
@@ -61,7 +57,7 @@ PPL query::
 Example 2: Negative matching
 ============================
 
-The example shows how to exclude documents where the ``lastname`` field ends with "son".
+This example shows how to exclude documents where the ``lastname`` field ends with "son".
 
 PPL query::
 
@@ -80,7 +76,7 @@ PPL query::
 Example 3: Email domain matching
 ================================
 
-The example shows how to filter documents by email domain patterns.
+This example shows how to filter documents by email domain patterns.
 
 PPL query::
 
@@ -96,7 +92,7 @@ PPL query::
 Example 4: Complex patterns with character classes
 ==================================================
 
-The example shows how to use complex regex patterns with character classes and quantifiers.
+This example shows how to use complex regex patterns with character classes and quantifiers.
 
 PPL query::
 
@@ -115,7 +111,7 @@ PPL query::
 Example 5: Case-sensitive matching
 ==================================
 
-The example demonstrates that regex matching is case-sensitive by default.
+This example demonstrates that regex matching is case-sensitive by default.
 
 PPL query::
 
@@ -140,5 +136,5 @@ PPL query::
 
 Limitations
 ===========
-* **Field specification required**: A field name must be specified in the regex command. Pattern-only syntax (e.g., ``regex "pattern"``) is not currently supported
-* **String fields only**: The regex command currently only supports string fields. Using it on numeric or boolean fields will result in an error
+* **Field specification required**: A field name must be specified in the regex command. Pattern-only syntax (e.g., ``regex "pattern"``) is not currently supported.
+* **String fields only**: The regex command currently only supports string fields. Using it on numeric or boolean fields will result in an error.
diff --git a/docs/user/ppl/cmd/rename.rst b/docs/user/ppl/cmd/rename.rst
index ed7f806aad1..eb92a45b8cb 100644
--- a/docs/user/ppl/cmd/rename.rst
+++ b/docs/user/ppl/cmd/rename.rst
@@ -1,6 +1,6 @@
-=============
+======
 rename
-=============
+======
 
 .. rubric:: Table of contents
 
@@ -10,19 +10,18 @@ rename
 
 
 Description
-============
-| Using ``rename`` command to rename one or more fields in the search result.
-
+===========
+| The ``rename`` command renames one or more fields in the search result.
 
 Syntax
-============
+======
 rename <source-field> AS <target-field> ["," <source-field> AS <target-field>]...
 
-* source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns since version 3.3 using ``*``.
+* source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns using ``*``.
 * target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source.
 
-Field Rename Behavior (Since version 3.3)
-==========================================
+Behavior
+========
 
 The rename command handles non-existent fields as follows:
 
@@ -30,17 +29,10 @@ The rename command handles non-existent fields as follows:
 * **Renaming a non-existent field to an existing field**: The existing target field is removed from the result set.
 * **Renaming an existing field to an existing field**: The existing target field is removed and the source field is renamed to the target.
 
-
-**Notes:**
-
-* Literal asterisk (*) characters in field names cannot be replaced as asterisk is used for wildcard matching.
-* Wildcards are only supported when the Calcite query engine is enabled.
-
-
 Example 1: Rename one field
 ===========================
 
-The example show rename one field.
+This example shows how to rename one field.
 
 PPL query::
 
@@ -59,7 +51,7 @@ PPL query::
 Example 2: Rename multiple fields
 =================================
 
-The example show rename multiple fields.
+This example shows how to rename multiple fields.
 
 PPL query::
 
@@ -76,9 +68,9 @@ PPL query::
 Example 3: Rename with wildcards
-=================================
+================================
 
-The example shows renaming multiple fields using wildcard patterns. (Requires Calcite query engine)
+This example shows how to rename multiple fields using wildcard patterns.
 
 PPL query::
 
@@ -95,9 +87,9 @@ PPL query::
 Example 4: Rename with multiple wildcard patterns
-==================================================
+=================================================
 
-The example shows renaming multiple fields using multiple wildcard patterns. (Requires Calcite query engine)
+This example shows how to rename multiple fields using multiple wildcard patterns.
 
 PPL query::
 
@@ -113,9 +105,9 @@ PPL query::
 +------------+-----------+---------------+
 
 Example 5: Rename existing field to existing field
-====================================
+==================================================
 
-The example shows renaming an existing field to an existing field. The target field gets removed and the source field is renamed to the target field.
+This example shows how to rename an existing field to an existing field. The target field gets removed and the source field is renamed to the target field.
 PPL query::
 
@@ -134,4 +126,5 @@ PPL query::
 
 Limitations
 ===========
-The ``rename`` command is not rewritten to OpenSearch DSL, it is only executed on the coordination node.
+| The ``rename`` command is not rewritten to OpenSearch DSL; it is executed only on the coordination node.
+| Literal asterisk (*) characters in field names cannot be replaced, as the asterisk is used for wildcard matching.
diff --git a/docs/user/ppl/cmd/replace.rst b/docs/user/ppl/cmd/replace.rst
index bcb0d57e677..6ca1e81cfd4 100644
--- a/docs/user/ppl/cmd/replace.rst
+++ b/docs/user/ppl/cmd/replace.rst
@@ -1,6 +1,6 @@
-=============
+=======
 replace
-=============
+=======
 
 .. rubric:: Table of contents
 
@@ -10,31 +10,22 @@ replace
 
 
 Description
-============
-Using ``replace`` command to replace text in one or more fields in the search result.
-
-Note: This command is only available when Calcite engine is enabled.
+===========
+The ``replace`` command replaces text in one or more fields in the search result.
 
 Syntax
-============
+======
 replace '<pattern>' WITH '<replacement>' [, '<pattern>' WITH '<replacement>']... IN <field-name> [, <field-name>]...
 
-
-Parameters
-==========
-* **pattern**: mandatory. The text pattern you want to replace. Currently supports only plain text literals (no wildcards or regular expressions).
-* **replacement**: mandatory. The text you want to replace with.
-* **field-name**: mandatory. One or more field names where the replacement should occur.
-
-
-Examples
-========
+* pattern: mandatory. The text pattern you want to replace. Currently supports only plain text literals (no wildcards or regular expressions).
+* replacement: mandatory. The text you want to replace with.
+* field-name: mandatory. One or more field names where the replacement should occur.
 
 Example 1: Replace text in one field
-------------------------------------
+====================================
 
-The example shows replacing text in one field.
+This example shows replacing text in one field.
 
 PPL query::
 
@@ -51,9 +42,9 @@ PPL query::
 
 Example 2: Replace text in multiple fields
-------------------------------------
+==========================================
 
-The example shows replacing text in multiple fields.
+This example shows replacing text in multiple fields.
 
 PPL query::
 
@@ -70,9 +61,9 @@ PPL query::
 
 Example 3: Replace with other commands in a pipeline
-------------------------------------
+====================================================
 
-The example shows using replace with other commands in a query pipeline.
+This example shows using replace with other commands in a query pipeline.
 
 PPL query::
 
@@ -87,9 +78,9 @@ PPL query::
 +----------+-----+
 
 Example 4: Replace with multiple pattern/replacement pairs
-------------------------------------
+==========================================================
 
-The example shows using multiple pattern/replacement pairs in a single replace command. The replacements are applied sequentially.
+This example shows using multiple pattern/replacement pairs in a single replace command. The replacements are applied sequentially.
 
 PPL query::
 
@@ -105,7 +96,7 @@ PPL query::
 +-----------+
 
 Example 5: Pattern matching with LIKE and replace
-------------------------------------
+=================================================
 
 Since replace command only supports plain string literals, you can use LIKE command with replace for pattern matching needs.
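+
+For instance, a minimal sketch of this combination (the address values are illustrative): ``like`` narrows the rows first, and the plain-text replacement is then applied::
+
+    source=accounts | where like(address, '%Street%') | replace 'Street' WITH 'St.' IN address | fields address
+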
diff --git a/docs/user/ppl/cmd/reverse.rst b/docs/user/ppl/cmd/reverse.rst
index 2efe833855f..d839a687bf9 100644
--- a/docs/user/ppl/cmd/reverse.rst
+++ b/docs/user/ppl/cmd/reverse.rst
@@ -1,6 +1,6 @@
-=============
+=======
 reverse
-=============
+=======
 
 .. rubric:: Table of contents
 
@@ -10,28 +10,23 @@ reverse
 
 
 Description
-============
-| Using ``reverse`` command to reverse the display order of search results. The same results are returned, but in reverse order.
-
-Version
-=======
-3.2.0
+===========
+| The ``reverse`` command reverses the display order of search results. The same results are returned, but in reverse order.
 
 Syntax
-============
+======
 reverse
-
 * No parameters: The reverse command takes no arguments or options.
 
 Note
-=====
-The `reverse` command processes the entire dataset. If applied directly to millions of records, it will consume significant memory resources on the coordinating node. Users should only apply the `reverse` command to smaller datasets, typically after aggregation operations.
+====
+| The ``reverse`` command processes the entire dataset. If applied directly to millions of records, it will consume significant memory resources on the coordinating node. Users should only apply the ``reverse`` command to smaller datasets, typically after aggregation operations.
 
 Example 1: Basic reverse operation
 ==================================
 
-The example shows reversing the order of all documents.
+This example shows reversing the order of all documents.
 
 PPL query::
 
@@ -50,7 +45,7 @@ PPL query::
 Example 2: Reverse with sort
 ============================
 
-The example shows reversing results after sorting by age in ascending order, effectively giving descending order.
+This example shows reversing results after sorting by age in ascending order, effectively giving descending order.
 
 PPL query::
 
@@ -69,7 +64,7 @@ PPL query::
 Example 3: Reverse with head
 ============================
 
-The example shows using reverse with head to get the last 2 records from the original order.
+This example shows using reverse with head to get the last 2 records from the original order.
 
 PPL query::
 
@@ -86,7 +81,7 @@ PPL query::
 Example 4: Double reverse
 =========================
 
-The example shows that applying reverse twice returns to the original order.
+This example shows that applying reverse twice returns to the original order.
 
 PPL query::
 
@@ -103,9 +98,9 @@ PPL query::
 Example 5: Reverse with complex pipeline
-=======================================
+========================================
 
-The example shows reverse working with filtering and field selection.
+This example shows reverse working with filtering and field selection.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/rex.rst b/docs/user/ppl/cmd/rex.rst
index 28839247194..5a22d2965ae 100644
--- a/docs/user/ppl/cmd/rex.rst
+++ b/docs/user/ppl/cmd/rex.rst
@@ -1,6 +1,6 @@
-=============
+===
 rex
-=============
+===
 
 .. rubric:: Table of contents
 
@@ -10,37 +10,31 @@ rex
 
 
 Description
-============
+===========
 | The ``rex`` command extracts fields from a raw text field using regular expression named capture groups.
 
-Version
-=======
-3.3.0
-
 Syntax
-============
+======
 rex [mode=<extract|sed>] field=<field> <pattern> [max_match=<int>] [offset_field=<string>]
 
 * field: mandatory. The field must be a string field to extract data from.
 * pattern: mandatory string. The regular expression pattern with named capture groups used to extract new fields. Pattern must contain at least one named capture group using ``(?<name>pattern)`` syntax.
-* mode: optional. 
Either ``extract`` (default) or ``sed``.
-
-  - **extract mode** (default): Creates new fields from regular expression named capture groups. This is the standard field extraction behavior.
-  - **sed mode**: Performs text substitution on the field using sed-style patterns:
-
-    - ``s/pattern/replacement/`` - Replace first occurrence
-    - ``s/pattern/replacement/g`` - Replace all occurrences (global)
-    - ``s/pattern/replacement/n`` - Replace only the nth occurrence (where n is a number)
-    - ``y/from_chars/to_chars/`` - Character-by-character transliteration
-
-  - Backreferences: ``\1``, ``\2``, etc. reference captured groups in replacement
+* mode: optional. Either ``extract`` or ``sed``. **Default:** ``extract``.
+
+  * **extract mode** (default): Creates new fields from regular expression named capture groups. This is the standard field extraction behavior.
+  * **sed mode**: Performs text substitution on the field using sed-style patterns:
+
+    * ``s/pattern/replacement/`` - Replace first occurrence
+    * ``s/pattern/replacement/g`` - Replace all occurrences (global)
+    * ``s/pattern/replacement/n`` - Replace only the nth occurrence (where n is a number)
+    * ``y/from_chars/to_chars/`` - Character-by-character transliteration
+    * Backreferences: ``\1``, ``\2``, etc. reference captured groups in replacement
+
 * max_match: optional integer (default=1). Maximum number of matches to extract. If greater than 1, extracted fields become arrays. The value 0 means unlimited matches, but is automatically capped to the configured limit (default: 10, configurable via ``plugins.ppl.rex.max_match.limit``).
 * offset_field: optional string. Field name to store the character offset positions of matches. Only available in extract mode.
 
 Example 1: Basic Field Extraction
-==================================
+=================================
 
-Extract username and domain from email addresses using named capture groups. Both extracted fields are returned as string type.
+This example shows extracting username and domain from email addresses using named capture groups. Both extracted fields are returned as string type.
 
 PPL query::
 
@@ -55,9 +49,9 @@ PPL query::
 
 Example 2: Handling Non-matching Patterns
-==========================================
+=========================================
 
-The rex command returns all events, setting extracted fields to null for non-matching patterns. Extracted fields would be string type when matches are found.
+This example shows the rex command returning all events, setting extracted fields to null for non-matching patterns. Extracted fields are returned as string type when matches are found.
 
 PPL query::
 
@@ -72,9 +66,9 @@ PPL query::
 
 Example 3: Multiple Matches with max_match
-===========================================
+==========================================
 
-Extract multiple words from address field using max_match parameter. The extracted field is returned as an array type containing string elements.
+This example shows extracting multiple words from the address field using the max_match parameter. The extracted field is returned as an array type containing string elements.
 
 PPL query::
 
@@ -90,9 +84,9 @@ PPL query::
 
 Example 4: Text Replacement with mode=sed
-==========================================
+=========================================
 
-Replace email domains using sed mode for text substitution. The extracted field is returned as string type.
+This example shows replacing email domains using sed mode for text substitution. The extracted field is returned as string type.
PPL query:: @@ -107,9 +101,9 @@ PPL query:: Example 5: Using offset_field -============================== +============================= -Track the character positions where matches occur. Extracted fields are string type, and the offset_field is also string type. +This example shows tracking the character positions where matches occur. Extracted fields are string type, and the offset_field is also string type. PPL query:: @@ -124,9 +118,9 @@ PPL query:: Example 6: Complex Email Pattern -================================= +================================ -Extract comprehensive email components including top-level domain. All extracted fields are returned as string type. +This example shows extracting comprehensive email components including top-level domain. All extracted fields are returned as string type. PPL query:: @@ -141,9 +135,9 @@ PPL query:: Example 7: Chaining Multiple rex Commands -========================================== +========================================= -Extract initial letters from both first and last names. All extracted fields are returned as string type. +This example shows extracting initial letters from both first and last names. All extracted fields are returned as string type. PPL query:: @@ -159,9 +153,9 @@ PPL query:: Example 8: Named Capture Group Limitations -============================================ +========================================== -Demonstrates naming restrictions for capture groups. Group names cannot contain underscores due to Java regex limitations. +This example demonstrates naming restrictions for capture groups. Group names cannot contain underscores due to Java regex limitations. Invalid PPL query with underscores:: @@ -182,9 +176,9 @@ Correct PPL query without underscores:: Example 9: Max Match Limit Protection -====================================== +===================================== -Demonstrates the max_match limit protection mechanism. When max_match=0 (unlimited) is specified, the system automatically caps it to prevent memory exhaustion. +This example demonstrates the max_match limit protection mechanism. When max_match=0 (unlimited) is specified, the system automatically caps it to prevent memory exhaustion. PPL query with max_match=0 automatically capped to default limit of 10:: @@ -221,22 +215,19 @@ Special Characters in Group Names No No Limitations =========== - -There are several important limitations with the rex command: - **Named Capture Group Naming:** -- Group names must start with a letter and contain only letters and digits -- For detailed Java regex pattern syntax and usage, refer to the `official Java Pattern documentation `_ +* Group names must start with a letter and contain only letters and digits +* For detailed Java regex pattern syntax and usage, refer to the `official Java Pattern documentation `_ **Pattern Requirements:** -- Pattern must contain at least one named capture group -- Regular capture groups ``(...)`` without names are not allowed +* Pattern must contain at least one named capture group +* Regular capture groups ``(...)`` without names are not allowed **Max Match Limit:** - -- The ``max_match`` parameter is subject to a configurable system limit to prevent memory exhaustion -- When ``max_match=0`` (unlimited) is specified, it is automatically capped at the configured limit (default: 10) -- User-specified values exceeding the configured limit will result in an error -- Users can adjust the limit via the ``plugins.ppl.rex.max_match.limit`` cluster setting. 
Setting this limit to a large value is not recommended as it can lead to excessive memory consumption, especially with patterns that match empty strings (e.g., ``\d*``, ``\w*``)
\ No newline at end of file
+
+* The ``max_match`` parameter is subject to a configurable system limit to prevent memory exhaustion
+* When ``max_match=0`` (unlimited) is specified, it is automatically capped at the configured limit (default: 10)
+* User-specified values exceeding the configured limit will result in an error
+* Users can adjust the limit via the ``plugins.ppl.rex.max_match.limit`` cluster setting. Setting this limit to a large value is not recommended as it can lead to excessive memory consumption, especially with patterns that match empty strings (e.g., ``\d*``, ``\w*``).
\ No newline at end of file
diff --git a/docs/user/ppl/cmd/search.rst b/docs/user/ppl/cmd/search.rst
index 44cd2377ce0..463aa57c343 100644
--- a/docs/user/ppl/cmd/search.rst
+++ b/docs/user/ppl/cmd/search.rst
@@ -1,6 +1,6 @@
-=============
+======
 search
-=============
+======
 
 .. rubric:: Table of contents
 
@@ -10,12 +10,12 @@ search
 
 
 Description
-============
-| Using ``search`` command to retrieve document from the index. ``search`` command could be only used as the first command in the PPL query.
+===========
+| The ``search`` command retrieves documents from the index. The ``search`` command can only be used as the first command in the PPL query.
 
 Syntax
-============
+======
 search source=[<remote-cluster>:]<index> [search-expression]
 
 * search: search keyword, which could be ignored.
@@ -88,7 +88,7 @@ You can check or modify the default field setting::
 }
 
 Field Types and Search Behavior
-================================
+===============================
 
 **Text Fields**: Full-text search, phrase search
 
@@ -135,11 +135,8 @@ Cross-Cluster Search
 ====================
 Cross-cluster search lets any node in a cluster execute search requests against other clusters. Refer to `Cross-Cluster Search `_ for configuration.
 
-Examples
-========
-
 Example 1: Text Search
------------------------------------
+======================
 
 **Basic Text Search** (unquoted single term)::
 
@@ -194,7 +191,7 @@ Note: ``search user email`` is equivalent to ``search user AND email``. Multiple
 +----------------------------------------------------------------------------------------------------------+
 
 Example 2: Boolean Logic and Operator Precedence
--------------------------------------------------
+================================================
 
 **Boolean Operators**::
 
@@ -230,7 +227,7 @@ Example 2: Boolean Logic and Operator Precedence
 The above evaluates as ``(severityText="ERROR" OR severityText="WARN") AND severityNumber>15``
 
 Example 3: NOT vs != Semantics
-------------------------------
+==============================
 
 **!= operator** (field must exist and not equal the value)::
 
@@ -260,7 +257,7 @@ Example 3: NOT vs != Semantics
 Dale Adams (account 18) has ``employer=null``. He appears in ``NOT employer="Quility"`` but not in ``employer!="Quility"``.
 
 Example 4: Wildcards
---------------------
+====================
 
 **Wildcard Patterns**::
 
@@ -302,7 +299,7 @@ Example 4: Wildcards
 
 Example 5: Range Queries
--------------------------
+========================
 
 Use comparison operators (>, <, >=, <=) to filter numeric and date fields within specific ranges. Range queries are particularly useful for filtering by age, price, timestamps, or any numeric metrics.
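+
+For instance, a minimal sketch of a range query on the ``accounts`` index (the two bounds on ``age`` are combined with an implicit AND)::
+
+    source=accounts age>=30 age<36 | fields firstname, age
+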
@@ -327,7 +324,7 @@ Use comparison operators (>, <, >=, <=) to filter numeric and date fields within +---------------------------------------------------------+ Example 6: Field Search with Wildcards ---------------------------------------- +====================================== When searching in text or keyword fields, wildcards enable partial matching. This is particularly useful for finding records where you only know part of the value. Note that wildcards work best with keyword fields, while text fields may produce unexpected results due to tokenization. @@ -359,7 +356,7 @@ When searching in text or keyword fields, wildcards enable partial matching. Thi * **Case sensitivity**: Keyword field wildcards are case-sensitive unless normalized during indexing Example 7: IN Operator and Field Comparisons ---------------------------------------------- +============================================ The IN operator efficiently checks if a field matches any value from a list. This is cleaner and more performant than chaining multiple OR conditions for the same field. @@ -394,7 +391,7 @@ The IN operator efficiently checks if a field matches any value from a list. Thi +---------------------------------------------------------+ Example 8: Complex Expressions -------------------------------- +============================== Combine multiple conditions using boolean operators and parentheses to create sophisticated search queries. @@ -419,7 +416,7 @@ Combine multiple conditions using boolean operators and parentheses to create so +---------------------------------------------------------+ Example 9: Time Modifiers --------------------------- +========================= Time modifiers filter search results by time range using the implicit ``@timestamp`` field. They support various time formats for precise temporal filtering. @@ -476,7 +473,7 @@ Time modifiers filter search results by time range using the implicit ``@timesta +-------------------------------+--------------+ Example 10: Special Characters and Escaping -------------------------------------------- +=========================================== Understand when and how to escape special characters in your search queries. There are two categories of characters that need escaping: @@ -541,7 +538,7 @@ Note: Each backslash in the search value needs to be escaped with another backsl +--------------------------------------------------------------------------------------------------------------------------------------------------------+ Example 11: Fetch All Data ----------------------------- +========================== Retrieve all documents from an index by specifying only the source without any search conditions. This is useful for exploring small datasets or verifying data ingestion. diff --git a/docs/user/ppl/cmd/showdatasources.rst b/docs/user/ppl/cmd/showdatasources.rst index f12622f54da..c60136d979e 100644 --- a/docs/user/ppl/cmd/showdatasources.rst +++ b/docs/user/ppl/cmd/showdatasources.rst @@ -10,19 +10,19 @@ show datasources Description -============ -| Using ``show datasources`` command to query datasources configured in the PPL engine. ``show datasources`` command could be only used as the first command in the PPL query. +=========== +| Use the ``show datasources`` command to query datasources configured in the PPL engine. The ``show datasources`` command can only be used as the first command in the PPL query. 
 Syntax
-============
+======
 show datasources
 
 
 Example 1: Fetch all PROMETHEUS datasources
 ===========================================
 
-The example fetches all the datasources of type prometheus.
+This example shows fetching all the datasources of type prometheus.
 
 PPL query for all PROMETHEUS DATASOURCES::
 
@@ -38,4 +38,3 @@ PPL query for all PROMETHEUS DATASOURCES::
 Limitations
 ===========
 The ``show datasources`` command can only work with ``plugins.calcite.enabled=false``.
-It means ``show datasources`` command cannot work together with new PPL commands/functions introduced in 3.0.0 and above.
diff --git a/docs/user/ppl/cmd/sort.rst b/docs/user/ppl/cmd/sort.rst
index e02a8fdae8d..579fe0aab19 100644
--- a/docs/user/ppl/cmd/sort.rst
+++ b/docs/user/ppl/cmd/sort.rst
@@ -1,6 +1,6 @@
-=============
+====
 sort
-=============
+====
 
 .. rubric:: Table of contents
 
@@ -10,16 +10,14 @@ sort
 
 
 Description
-============
-| Using ``sort`` command to sorts all the search result by the specified fields.
-
+===========
+| The ``sort`` command sorts all the search results by the specified fields.
 
 Syntax
-============
+======
 sort [count] <[+|-] sort-field | sort-field [asc|a|desc|d]>...
 
-* count (Since 3.3): optional. The number of results to return. **Default:** returns all results. Specifying a count of 0 or less than 0 also returns all results.
+* count: optional. The number of results to return. Specifying a count of 0 or less than 0 returns all results. **Default:** 0.
 * [+|-]: optional. The plus [+] stands for ascending order and NULL/MISSING first and a minus [-] stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first.
 * [asc|a|desc|d]: optional. asc/a stands for ascending order and NULL/MISSING first. desc/d stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first.
 * sort-field: mandatory. The field used to sort. Can use ``auto(field)``, ``str(field)``, ``ip(field)``, or ``num(field)`` to specify how to interpret field values.
@@ -29,9 +27,9 @@ sort [count] <[+|-] sort-field | sort-field [asc|a|desc|d]>...
 
 Example 1: Sort by one field
-=============================
+============================
 
-The example show sort all the document with age field in ascending order.
+This example shows sorting all documents by the age field in ascending order.
 
 PPL query::
 
@@ -50,7 +48,7 @@ PPL query::
 Example 2: Sort by one field return all the result
 ==================================================
 
-The example show sort all the document with age field in ascending order.
+This example shows sorting all documents by the age field in ascending order and returning all results.
 
 PPL query::
 
@@ -69,7 +67,7 @@ PPL query::
 Example 3: Sort by one field in descending order (using -)
 ==========================================================
 
-The example show sort all the document with age field in descending order using the - operator.
+This example shows sorting all documents by the age field in descending order using the - operator.
 
 PPL query::
 
@@ -85,9 +83,9 @@ PPL query::
 +----------------+-----+
 
 Example 4: Sort by one field in descending order (using desc)
-==============================================================
+=============================================================
 
-The example show sort all the document with age field in descending order using the desc keyword.
+This example shows sorting all documents by the age field in descending order using the desc keyword.
 PPL query::
 
@@ -105,7 +103,7 @@ PPL query::
 
 Example 5: Sort by multiple fields (using +/-)
 ==============================================
 
-The example show sort all the document with gender field in ascending order and age field in descending using +/- operators.
+This example shows sorting all documents by the gender field in ascending order and the age field in descending order using the +/- operators.
 
 PPL query::
 
@@ -121,9 +119,9 @@ PPL query::
 +----------------+--------+-----+
 
 Example 6: Sort by multiple fields (using asc/desc)
-====================================================
+===================================================
 
-The example show sort all the document with gender field in ascending order and age field in descending using asc/desc keywords.
+This example shows sorting all documents by the gender field in ascending order and the age field in descending order using the asc/desc keywords.
 
 PPL query::
 
@@ -141,7 +139,7 @@ PPL query::
 Example 7: Sort by field include null value
 ===========================================
 
-The example shows sorting the employer field by the default option (ascending order and null first), the result shows that the null value is in the first row.
+This example shows sorting the employer field by the default option (ascending order and null first). The result shows that the null value is in the first row.
 
 PPL query::
 
@@ -157,9 +155,9 @@ PPL query::
 +----------+
 
 Example 8: Specify the number of sorted documents to return
-============================================================
+===========================================================
 
-The example shows sorting all the document and returning 2 documents.
+This example shows sorting all the documents and returning 2 documents.
 
 PPL query::
 
@@ -173,9 +171,9 @@ PPL query::
 +----------------+-----+
 
 Example 9: Sort with desc modifier
-===================================
+==================================
 
-The example shows sorting with the desc modifier to reverse sort order.
+This example shows sorting with the desc modifier to reverse the sort order.
 
 PPL query::
 
@@ -191,9 +189,9 @@ PPL query::
 +----------------+-----+
 
 Example 10: Sort with specifying field type
-==================================
+===========================================
 
-The example shows sorting with str() to sort numeric values lexicographically.
+This example shows sorting with str() to sort numeric values lexicographically.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/spath.rst b/docs/user/ppl/cmd/spath.rst
index 7defb4437f2..b4ab6157803 100644
--- a/docs/user/ppl/cmd/spath.rst
+++ b/docs/user/ppl/cmd/spath.rst
@@ -1,6 +1,6 @@
-=============
+=====
 spath
-=============
+=====
 
 .. rubric:: Table of contents
 
@@ -10,20 +10,15 @@ spath
 
 
 Description
-============
+===========
 | The `spath` command allows extracting fields from structured text data. It currently allows selecting from JSON data with JSON paths.
 
-Version
-=======
-3.3.0
-
 Syntax
-============
+======
 spath input=<field> [output=<field>] [path=<path>]
 
-
 * input: mandatory. The field to scan for JSON data.
-* output: optional. The destination field that the data will be loaded to. Defaults to the value of `path`.
+* output: optional. The destination field that the data will be loaded to. **Default:** value of `path`.
 * path: mandatory. The path of the data to load for the object. For more information on path syntax, see `json_extract <../functions/json.rst#json_extract>`_.
 
 Note
@@ -33,7 +28,7 @@ The `spath` command currently does not support pushdown behavior for extraction.
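+Because extraction is not pushed down, narrowing the result set before calling `spath` can reduce the work done on the coordinating node. A minimal sketch (the ``app_logs`` index and its ``status`` and ``payload`` fields are hypothetical names)::
+
+    source=app_logs | where status = 200 | spath input=payload path=user.id | fields user.id
+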
 Example 1: Simple Field Extraction
 ==================================
 
-The simplest spath is to extract a single field. This extracts `n` from the `doc` field of type `text`.
+The simplest spath is to extract a single field. This example extracts `n` from the `doc` field of type `text`.
 
 PPL query::
 
@@ -48,9 +43,9 @@ PPL query::
 +----------+---+
 
 Example 2: Lists & Nesting
-============================
+==========================
 
-These queries demonstrate more JSON path uses, like traversing nested fields and extracting list elements.
+This example demonstrates more JSON path uses, like traversing nested fields and extracting list elements.
 
 PPL query::
 
@@ -65,9 +60,9 @@ PPL query::
 +------------------------------------------------------+---------------+--------------+--------+
 
 Example 3: Sum of inner elements
-============================
+================================
 
-The example shows extracting an inner field and doing statistics on it, using the docs from example 1. It also demonstrates that `spath` always returns strings for inner types.
+This example shows extracting an inner field and doing statistics on it, using the docs from example 1. It also demonstrates that `spath` always returns strings for inner types.
 
 PPL query::
 
diff --git a/docs/user/ppl/cmd/stats.rst b/docs/user/ppl/cmd/stats.rst
index 24b80d4675b..9d19b57f3e1 100644
--- a/docs/user/ppl/cmd/stats.rst
+++ b/docs/user/ppl/cmd/stats.rst
@@ -1,6 +1,6 @@
-=============
+=====
 stats
-=============
+=====
 
 .. rubric:: Table of contents
 
@@ -10,652 +10,36 @@ stats
 
 
 Description
-============
-| Using ``stats`` command to calculate the aggregation from search result.
-
-The following table dataSources the aggregation functions and also indicates how the NULL/MISSING values is handled:
-
-+----------+-------------+-------------+
-| Function | NULL        | MISSING     |
-+----------+-------------+-------------+
-| COUNT    | Not counted | Not counted |
-+----------+-------------+-------------+
-| SUM      | Ignore      | Ignore      |
-+----------+-------------+-------------+
-| AVG      | Ignore      | Ignore      |
-+----------+-------------+-------------+
-| MAX      | Ignore      | Ignore      |
-+----------+-------------+-------------+
-| MIN      | Ignore      | Ignore      |
-+----------+-------------+-------------+
-| FIRST    | Ignore      | Ignore      |
-+----------+-------------+-------------+
-| LAST     | Ignore      | Ignore      |
-+----------+-------------+-------------+
-| LIST     | Ignore      | Ignore      |
-+----------+-------------+-------------+
-| VALUES   | Ignore      | Ignore      |
-+----------+-------------+-------------+
+===========
+| The ``stats`` command calculates aggregations from the search result.
+
 
 Syntax
-============
+======
 stats [bucket_nullable=bool] <aggregation>... [by-clause]
-
-* aggregation: mandatory. A aggregation function. The argument of aggregation must be field.
-
-* bucket_nullable: optional (since 3.3.0). Controls whether the stats command includes null buckets in group-by aggregations. When set to ``false``, the aggregation ignores records where the group-by field is null, resulting in faster performance by excluding null bucket. The default value of ``bucket_nullable`` is determined by ``plugins.ppl.syntax.legacy.preferred``:
-
-  * When ``plugins.ppl.syntax.legacy.preferred=true``, ``bucket_nullable`` defaults to ``true``
-  * When ``plugins.ppl.syntax.legacy.preferred=false``, ``bucket_nullable`` defaults to ``false``
-
-* by-clause: optional.
-
-  * Syntax: by [span-expression,] [field,]...
-  * Description: The by clause could be the fields and expressions like scalar functions and aggregation functions.
Besides, the span clause can be used to split specific field into buckets in the same interval, the stats then does the aggregation by these span buckets. - * Default: If no is specified, the stats command returns only one row, which is the aggregation over the entire result set. - -* span-expression: optional, at most one. - - * Syntax: span([field_expr,] interval_expr) - * Description: The unit of the interval expression is the natural unit by default. If ``field_expr`` is omitted, span will use the implicit ``@timestamp`` field. An error will be thrown if this field doesn't exist. **If the field is a date/time type field, the aggregation results always ignore null bucket**. And the interval is in date/time units, you will need to specify the unit in the interval expression. For example, to split the field ``age`` into buckets by 10 years, it looks like ``span(age, 10)``. And here is another example of time span, the span to split a ``timestamp`` field into hourly intervals, it looks like ``span(timestamp, 1h)``. -* Available time unit: - -+----------------------------+ -| Span Interval Units | -+============================+ -| millisecond (ms) | -+----------------------------+ -| second (s) | -+----------------------------+ -| minute (m, case sensitive) | -+----------------------------+ -| hour (h) | -+----------------------------+ -| day (d) | -+----------------------------+ -| week (w) | -+----------------------------+ -| month (M, case sensitive) | -+----------------------------+ -| quarter (q) | -+----------------------------+ -| year (y) | -+----------------------------+ - -Configuration -============= -Some aggregation functions require Calcite to be enabled for proper functionality. To enable Calcite, use the following command: - -Enable Calcite:: - - >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{ - "persistent" : { - "plugins.calcite.enabled" : true - } - }' - -Aggregation Functions -===================== - -COUNT ------ - -Description ->>>>>>>>>>> - -Usage: Returns a count of the number of expr in the rows retrieved. The ``C()`` function, ``c``, and ``count`` can be used as abbreviations for ``COUNT()``. To perform a filtered counting, wrap the condition to satisfy in an `eval` expression. - -Example:: - - os> source=accounts | stats count(), c(), count, c; - fetched rows / total rows = 1/1 - +---------+-----+-------+---+ - | count() | c() | count | c | - |---------+-----+-------+---| - | 4 | 4 | 4 | 4 | - +---------+-----+-------+---+ - -Example of filtered counting:: - - os> source=accounts | stats count(eval(age > 30)) as mature_users; - fetched rows / total rows = 1/1 - +--------------+ - | mature_users | - |--------------| - | 3 | - +--------------+ - -Example of filtered counting with complex conditions:: - - os> source=accounts | stats count(eval(age > 30 and balance > 25000)) as high_value_users; - fetched rows / total rows = 1/1 - +------------------+ - | high_value_users | - |------------------| - | 1 | - +------------------+ - -SUM ---- - -Description ->>>>>>>>>>> - -Usage: SUM(expr). Returns the sum of expr. - -Example:: - - os> source=accounts | stats sum(age) by gender; - fetched rows / total rows = 2/2 - +----------+--------+ - | sum(age) | gender | - |----------+--------| - | 28 | F | - | 101 | M | - +----------+--------+ - -AVG ---- - -Description ->>>>>>>>>>> - -Usage: AVG(expr). Returns the average value of expr. 
- -Example:: - - os> source=accounts | stats avg(age) by gender; - fetched rows / total rows = 2/2 - +--------------------+--------+ - | avg(age) | gender | - |--------------------+--------| - | 28.0 | F | - | 33.666666666666664 | M | - +--------------------+--------+ - -MAX ---- - -Description ->>>>>>>>>>> - -Usage: MAX(expr). Returns the maximum value of expr. - -For non-numeric fields, values are sorted lexicographically. - -Note: Non-numeric field support requires Calcite to be enabled (see `Configuration`_ section above). Available since version 3.3.0. - -Example:: - - os> source=accounts | stats max(age); - fetched rows / total rows = 1/1 - +----------+ - | max(age) | - |----------| - | 36 | - +----------+ - -Example with text field:: - - os> source=accounts | stats max(firstname); - fetched rows / total rows = 1/1 - +----------------+ - | max(firstname) | - |----------------| - | Nanette | - +----------------+ - -MIN ---- - -Description ->>>>>>>>>>> - -Usage: MIN(expr). Returns the minimum value of expr. - -For non-numeric fields, values are sorted lexicographically. - -Note: Non-numeric field support requires Calcite to be enabled (see `Configuration`_ section above). Available since version 3.3.0. - -Example:: - - os> source=accounts | stats min(age); - fetched rows / total rows = 1/1 - +----------+ - | min(age) | - |----------| - | 28 | - +----------+ - -Example with text field:: - - os> source=accounts | stats min(firstname); - fetched rows / total rows = 1/1 - +----------------+ - | min(firstname) | - |----------------| - | Amber | - +----------------+ - -VAR_SAMP --------- - -Description ->>>>>>>>>>> - -Usage: VAR_SAMP(expr). Returns the sample variance of expr. - -Example:: - - os> source=accounts | stats var_samp(age); - fetched rows / total rows = 1/1 - +--------------------+ - | var_samp(age) | - |--------------------| - | 10.916666666666666 | - +--------------------+ - -VAR_POP -------- - -Description ->>>>>>>>>>> - -Usage: VAR_POP(expr). Returns the population standard variance of expr. - -Example:: - - os> source=accounts | stats var_pop(age); - fetched rows / total rows = 1/1 - +--------------+ - | var_pop(age) | - |--------------| - | 8.1875 | - +--------------+ - -STDDEV_SAMP ------------ - -Description ->>>>>>>>>>> - -Usage: STDDEV_SAMP(expr). Return the sample standard deviation of expr. - -Example:: - - os> source=accounts | stats stddev_samp(age); - fetched rows / total rows = 1/1 - +-------------------+ - | stddev_samp(age) | - |-------------------| - | 3.304037933599835 | - +-------------------+ - -STDDEV_POP ----------- - -Description ->>>>>>>>>>> - -Usage: STDDEV_POP(expr). Return the population standard deviation of expr. - -Example:: - - os> source=accounts | stats stddev_pop(age); - fetched rows / total rows = 1/1 - +--------------------+ - | stddev_pop(age) | - |--------------------| - | 2.8613807855648994 | - +--------------------+ - -DISTINCT_COUNT_APPROX ---------------------- - -Description ->>>>>>>>>>> - -Version: 3.1.0 - -Usage: DISTINCT_COUNT_APPROX(expr). Return the approximate distinct count value of the expr, using the hyperloglog++ algorithm. - -Example:: - - PPL> source=accounts | stats distinct_count_approx(gender); - fetched rows / total rows = 1/1 - +-------------------------------+ - | distinct_count_approx(gender) | - |-------------------------------| - | 2 | - +-------------------------------+ - -TAKE ----- - -Description ->>>>>>>>>>> - -Usage: TAKE(field [, size]). Return original values of a field. 
It does not guarantee on the order of values. - -* field: mandatory. The field must be a text field. -* size: optional integer. The number of values should be returned. Default is 10. - -Example:: - - os> source=accounts | stats take(firstname); - fetched rows / total rows = 1/1 - +-----------------------------+ - | take(firstname) | - |-----------------------------| - | [Amber,Hattie,Nanette,Dale] | - +-----------------------------+ - -PERCENTILE or PERCENTILE_APPROX -------------------------------- - -Description ->>>>>>>>>>> - -Usage: PERCENTILE(expr, percent) or PERCENTILE_APPROX(expr, percent). Return the approximate percentile value of expr at the specified percentage. - -* percent: The number must be a constant between 0 and 100. - -Note: From 3.1.0, the percentile implementation is switched to MergingDigest from AVLTreeDigest. Ref `issue link `_. - -Example:: - - os> source=accounts | stats percentile(age, 90) by gender; - fetched rows / total rows = 2/2 - +---------------------+--------+ - | percentile(age, 90) | gender | - |---------------------+--------| - | 28 | F | - | 36 | M | - +---------------------+--------+ - -Percentile Shortcut Functions ->>>>>>>>>>>>>>>>>>>>>>>>>>>>> - -Version: 3.3.0 - -For convenience, OpenSearch PPL provides shortcut functions for common percentiles: - -- ``PERC(expr)`` - Equivalent to ``PERCENTILE(expr, )`` -- ``P(expr)`` - Equivalent to ``PERCENTILE(expr, )`` - -Both integer and decimal percentiles from 0 to 100 are supported (e.g., ``PERC95``, ``P99.5``). - -Example:: - - ppl> source=accounts | stats perc99.5(age); - fetched rows / total rows = 1/1 - +---------------+ - | perc99.5(age) | - |---------------| - | 36 | - +---------------+ - - ppl> source=accounts | stats p50(age); - fetched rows / total rows = 1/1 - +---------+ - | p50(age) | - |---------| - | 32 | - +---------+ - -MEDIAN ------- - -Description ->>>>>>>>>>> - -Version: 3.3.0 - -Usage: MEDIAN(expr). Returns the median (50th percentile) value of `expr`. This is equivalent to ``PERCENTILE(expr, 50)``. - -Note: This function requires Calcite to be enabled (see `Configuration`_ section above). - -Example:: - - os> source=accounts | stats median(age); - fetched rows / total rows = 1/1 - +-------------+ - | median(age) | - |-------------| - | 33 | - +-------------+ - -EARLIEST --------- - -Description ->>>>>>>>>>> - -Version: 3.3.0 - -Usage: EARLIEST(field [, time_field]). Return the earliest value of a field based on timestamp ordering. - -* field: mandatory. The field to return the earliest value for. -* time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified. - -Note: This function requires Calcite to be enabled (see `Configuration`_ section above). - -Example:: - - os> source=events | stats earliest(message) by host | sort host; - fetched rows / total rows = 2/2 - +-------------------+---------+ - | earliest(message) | host | - |-------------------+---------| - | Starting up | server1 | - | Initializing | server2 | - +-------------------+---------+ - -Example with custom time field:: - - os> source=events | stats earliest(status, event_time) by category | sort category; - fetched rows / total rows = 2/2 - +------------------------------+----------+ - | earliest(status, event_time) | category | - |------------------------------+----------| - | pending | orders | - | active | users | - +------------------------------+----------+ - -LATEST ------- - -Description ->>>>>>>>>>> - -Version: 3.3.0 - -Usage: LATEST(field [, time_field]). 
Return the latest value of a field based on timestamp ordering. - -* field: mandatory. The field to return the latest value for. -* time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified. - -Note: This function requires Calcite to be enabled (see `Configuration`_ section above). - -Example:: - - os> source=events | stats latest(message) by host | sort host; - fetched rows / total rows = 2/2 - +------------------+---------+ - | latest(message) | host | - |------------------+---------| - | Shutting down | server1 | - | Maintenance mode | server2 | - +------------------+---------+ - -Example with custom time field:: - - os> source=events | stats latest(status, event_time) by category | sort category; - fetched rows / total rows = 2/2 - +----------------------------+----------+ - | latest(status, event_time) | category | - |----------------------------+----------| - | cancelled | orders | - | inactive | users | - +----------------------------+----------+ - -FIRST ------ - -Description ->>>>>>>>>>> - -Version: 3.3.0 - -Usage: FIRST(field). Return the first non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field. - -* field: mandatory. The field to return the first value for. - -Note: This function requires Calcite to be enabled (see `Configuration`_ section above). - -Example:: - - os> source=accounts | stats first(firstname) by gender; - fetched rows / total rows = 2/2 - +------------------+--------+ - | first(firstname) | gender | - |------------------+--------| - | Nanette | F | - | Amber | M | - +------------------+--------+ - -Example with count aggregation:: - - os> source=accounts | stats first(firstname), count() by gender; - fetched rows / total rows = 2/2 - +------------------+---------+--------+ - | first(firstname) | count() | gender | - |------------------+---------+--------| - | Nanette | 1 | F | - | Amber | 3 | M | - +------------------+---------+--------+ - -LAST ----- - -Description ->>>>>>>>>>> - -Version: 3.3.0 - -Usage: LAST(field). Return the last non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field. - -* field: mandatory. The field to return the last value for. - -Note: This function requires Calcite to be enabled (see `Configuration`_ section above). - -Example:: - - os> source=accounts | stats last(firstname) by gender; - fetched rows / total rows = 2/2 - +-----------------+--------+ - | last(firstname) | gender | - |-----------------+--------| - | Nanette | F | - | Dale | M | - +-----------------+--------+ - -Example with different fields:: - - os> source=accounts | stats first(account_number), last(balance), first(age); - fetched rows / total rows = 1/1 - +-----------------------+---------------+------------+ - | first(account_number) | last(balance) | first(age) | - |-----------------------+---------------+------------| - | 1 | 4180 | 32 | - +-----------------------+---------------+------------+ - -LIST ----- - -Description ->>>>>>>>>>> - -Version: 3.3.0 (Calcite engine only) - -Usage: LIST(expr). Collects all values from the specified expression into an array. Values are converted to strings, nulls are filtered, and duplicates are preserved. -The function returns up to 100 values with no guaranteed ordering. - -* expr: The field expression to collect values from. -* This aggregation function doesn't support Array, Struct, Object field types. 
-
-Example with string fields::
-
-    PPL> source=accounts | stats list(firstname);
-    fetched rows / total rows = 1/1
-    +-------------------------------------+
-    | list(firstname)                     |
-    |-------------------------------------|
-    | ["Amber","Hattie","Nanette","Dale"] |
-    +-------------------------------------+
-
-Example with result field rename::
-
-    PPL> source=accounts | stats list(firstname) as names;
-    fetched rows / total rows = 1/1
-    +-------------------------------------+
-    | names                               |
-    |-------------------------------------|
-    | ["Amber","Hattie","Nanette","Dale"] |
-    +-------------------------------------+
-
-VALUES
-------
-
-Description
->>>>>>>>>>>
-
-Version: 3.3.0 (Calcite engine only)
-
-Usage: VALUES(expr). Collects all unique values from the specified expression into a sorted array. Values are converted to strings, nulls are filtered, and duplicates are removed.
-
-The maximum number of unique values returned is controlled by the ``plugins.ppl.values.max.limit`` setting:
-
-* Default value is 0, which means unlimited values are returned
-* Can be configured to any positive integer to limit the number of unique values
-* See the `PPL Settings <../admin/settings.rst#plugins-ppl-values-max-limit>`_ documentation for more details
-
-Example with string fields::
-
-    PPL> source=accounts | stats values(firstname);
-    fetched rows / total rows = 1/1
-    +-------------------------------------+
-    | values(firstname)                   |
-    |-------------------------------------|
-    | ["Amber","Dale","Hattie","Nanette"] |
-    +-------------------------------------+
-
-Example with numeric fields (sorted as strings)::
-
-    PPL> source=accounts | stats values(age);
-    fetched rows / total rows = 1/1
-    +----------------------------+
-    | values(age)                |
-    |----------------------------|
-    | ["28","32","33","36","39"] |
-    +----------------------------+
-
-Example with result field rename::
-
-    PPL> source=accounts | stats values(firstname) as unique_names;
-    fetched rows / total rows = 1/1
-    +-------------------------------------+
-    | unique_names                        |
-    |-------------------------------------|
-    | ["Amber","Dale","Hattie","Nanette"] |
-    +-------------------------------------+
+* aggregation: mandatory. An aggregation function.
+* bucket_nullable: optional. Controls whether the stats command includes null buckets in group-by aggregations. When set to ``false``, the aggregation ignores records where the group-by field is null, resulting in faster performance by excluding null buckets. **Default:** determined by ``plugins.ppl.syntax.legacy.preferred``:
+  * When ``plugins.ppl.syntax.legacy.preferred=true``, ``bucket_nullable`` defaults to ``true``
+  * When ``plugins.ppl.syntax.legacy.preferred=false``, ``bucket_nullable`` defaults to ``false``
+* by-clause: optional. Groups results by specified fields or expressions. Syntax: by [span-expression,] [field,]... **Default:** If no by-clause is specified, the stats command returns only one row, which is the aggregation over the entire result set.
+* span-expression: optional, at most one. Splits a field into buckets by intervals. Syntax: span([field_expr,] interval_expr). The unit of the interval expression is the natural unit by default. If ``field_expr`` is omitted, span will use the implicit ``@timestamp`` field. An error will be thrown if this field doesn't exist. If the field is a date/time type field, the aggregation results always ignore the null bucket. For example, ``span(age, 10)`` creates 10-year age buckets, and ``span(timestamp, 1h)`` creates hourly buckets.
+  * Available time units:
+    * millisecond (ms)
+    * second (s)
+    * minute (m, case sensitive)
+    * hour (h)
+    * day (d)
+    * week (w)
+    * month (M, case sensitive)
+    * quarter (q)
+    * year (y)

 Example 1: Calculate the count of events
 ========================================
-The example show calculate the count of events in the accounts.
+This example shows calculating the count of events in the accounts.

 PPL query::

@@ -671,7 +55,7 @@ PPL query::
 Example 2: Calculate the average of a field
 ===========================================
-The example show calculate the average age of all the accounts.
+This example shows calculating the average age of all the accounts.

 PPL query::

@@ -687,7 +71,7 @@ PPL query::
 Example 3: Calculate the average of a field by group
 ====================================================
-The example show calculate the average age of all the accounts group by gender.
+This example shows calculating the average age of all the accounts grouped by gender.

 PPL query::

@@ -704,7 +88,7 @@ PPL query::
 Example 4: Calculate the average, sum and count of a field by group
 ===================================================================
-The example show calculate the average age, sum age and count of events of all the accounts group by gender.
+This example shows calculating the average age, sum of ages, and count of events of all the accounts grouped by gender.

 PPL query::

@@ -830,7 +214,7 @@ PPL query::
 Example 11: Calculate the percentile of a field
 ===============================================
-The example show calculate the percentile 90th age of all the accounts.
+This example shows calculating the 90th percentile of age of all the accounts.

 PPL query::

@@ -846,7 +230,7 @@ PPL query::
 Example 12: Calculate the percentile of a field by group
 ========================================================
-The example show calculate the percentile 90th age of all the accounts group by gender.
+This example shows calculating the 90th percentile of age of all the accounts grouped by gender.

 PPL query::

@@ -894,7 +278,6 @@ PPL query::
 Example 15: Ignore null bucket
 ==============================
-Note: This argument requires version 3.3.0 or above.

 PPL query::
diff --git a/docs/user/ppl/cmd/subquery.rst b/docs/user/ppl/cmd/subquery.rst
index f7883202c70..48491db22e2 100644
--- a/docs/user/ppl/cmd/subquery.rst
+++ b/docs/user/ppl/cmd/subquery.rst
@@ -1,6 +1,6 @@
-=============
-subquery (aka subsearch)
-=============
+========
+subquery
+========

 .. rubric:: Table of contents

@@ -10,66 +10,45 @@ subquery (aka subsearch)

 Description
-============
-| (Experimental)
-| (From 3.0.0)
-| The subquery (aka subsearch) commands contain 4 types: ``InSubquery``, ``ExistsSubquery``, ``ScalarSubquery`` and ``RelationSubquery``. The first three are expressions, they are used in WHERE clause (``where <boolean expression>``) and search filter (``search source=* <boolean expression>``). ``RelationSubquery`` is not an expression, it is a statement.
+===========
+| The ``subquery`` command allows you to embed one PPL query inside another, enabling complex filtering and data retrieval operations. A subquery is a nested query that executes first and returns results that are used by the outer query for filtering, comparison, or joining operations.

-Version
-=======
-3.0.0
+| Subqueries are useful for:
+
+1. Filtering data based on results from another query
+2. Checking for the existence of related data
+3. Performing calculations that depend on aggregated values from other tables
+4. Creating complex joins with dynamic conditions

 Syntax
 ======
-Subquery (aka subsearch) has the same syntax with search command, except that it must be enclosed in square brackets.
+subquery: [ source=... | ... | ... ]

-InSubquery::
+Subqueries use the same syntax as regular PPL queries but must be enclosed in square brackets. There are four main types of subqueries:
+
+**IN Subquery**
+Tests whether a field value exists in the results of a subquery::

     where <field> [not] in [ source=... | ... | ... ]

-ExistsSubquery::
+**EXISTS Subquery**
+Tests whether a subquery returns any results::

     where [not] exists [ source=... | ... | ... ]

-ScalarSubquery::
+**Scalar Subquery**
+Returns a single value that can be used in comparisons or calculations::

     where <field> = [ source=... | ... | ... ]

-RelationSubquery::
+**Relation Subquery**
+Used in join operations to provide dynamic right-side data::

     | join ON condition [ source=... | ... | ... ]
-
 Configuration
 =============
-plugins.calcite.enabled
------------------------
-
-This command requires Calcite enabled. In 3.0.0-beta, as an experimental the Calcite configuration is disabled by default.
-
-Enable Calcite::
-
-    >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{
-      "transient" : {
-        "plugins.calcite.enabled" : true
-      }
-    }'
-
-Result set::
-
-    {
-      "acknowledged": true,
-      "persistent": {
-        "plugins": {
-          "calcite": {
-            "enabled": "true"
-          }
-        }
-      },
-      "transient": {}
-    }
-
 plugins.ppl.subsearch.maxout
 ----------------------------

@@ -94,7 +73,6 @@ Change the subsearch.maxout to unlimited::

       "transient": {}
     }
-
 Usage
 =====

@@ -162,11 +140,11 @@ RelationSubquery::
     source = table1 | join left = l right = r on condition [ source = table2 | where d > 10 | head 5 ] //subquery in join right side
     source = [ source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ] | stats count(a) by b ] as outer | head 1
-
-
 Example 1: TPC-H q20
 ====================

+This example shows a complex TPC-H query 20 implementation using nested subqueries.
+
 PPL query::

     >> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{
@@ -199,6 +177,8 @@ PPL query::
 Example 2: TPC-H q22
 ====================

+This example shows a TPC-H query 22 implementation using EXISTS and scalar subqueries.
+
 PPL query::

     >> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{
diff --git a/docs/user/ppl/cmd/syntax.rst b/docs/user/ppl/cmd/syntax.rst
index 45ffea8ff67..c15aad68e15 100644
--- a/docs/user/ppl/cmd/syntax.rst
+++ b/docs/user/ppl/cmd/syntax.rst
@@ -1,6 +1,6 @@
-=============
+======
 Syntax
-=============
+======

 .. rubric:: Table of contents

diff --git a/docs/user/ppl/cmd/table.rst b/docs/user/ppl/cmd/table.rst
index f8ecbb11f18..3512a648a1c 100644
--- a/docs/user/ppl/cmd/table.rst
+++ b/docs/user/ppl/cmd/table.rst
@@ -1,6 +1,6 @@
-=============
+=====
 table
-=============
+=====

 .. rubric:: Table of contents

@@ -10,26 +10,20 @@ table

 Description
-============
+===========
 The ``table`` command is an alias for the `fields <fields.rst>`_ command and provides the same field selection capabilities. It allows you to keep or remove fields from the search result using enhanced syntax options.

-Note: The ``table`` command requires the Calcite to be enabled. All enhanced field features are available through this command. For detailed examples and documentation of all enhanced features, see the `fields command documentation <fields.rst>`_.
- -Version -======= -3.3.0 - Syntax -============ +====== table [+|-] -* index: optional. if the plus (+) is used, only the fields specified in the field list will be keep. if the minus (-) is used, all the fields specified in the field list will be removed. **Default** + -* field list: mandatory. Fields can be specified using various enhanced syntax options. +* [+|-]: optional. If the plus (+) is used, only the fields specified in the field list will be kept. If the minus (-) is used, all the fields specified in the field list will be removed. **Default:** +. +* field-list: mandatory. Comma-delimited or space-delimited list of fields to keep or remove. Supports wildcard patterns. Example 1: Basic table command usage -------------------------------------- +==================================== -The ``table`` command works identically to the ``fields`` command. This example shows basic field selection. +This example shows basic field selection using the table command. PPL query:: @@ -44,20 +38,7 @@ PPL query:: | Dale | Adams | 33 | +-----------+----------+-----+ -Enhanced Features -================= - -The ``table`` command supports all enhanced features available in the ``fields`` command, including: - -- Space-delimited syntax -- Wildcard pattern matching (prefix, suffix, contains) -- Mixed delimiters -- Field deduplication -- Full wildcard selection -- Wildcard exclusion -Requirements -============ -- **Calcite Engine**: The ``table`` command requires the Calcite engine to be enabled -- **Feature Parity**: All enhanced features available in ``fields`` are also available in ``table`` -- **Error Handling**: Attempting to use the ``table`` command without Calcite will result in an ``UnsupportedOperationException`` \ No newline at end of file +See Also +======== +- `fields `_ - Alias command with identical functionality \ No newline at end of file diff --git a/docs/user/ppl/cmd/timechart.rst b/docs/user/ppl/cmd/timechart.rst index 512fa76370c..c5184d1ce5f 100644 --- a/docs/user/ppl/cmd/timechart.rst +++ b/docs/user/ppl/cmd/timechart.rst @@ -1,6 +1,6 @@ -============= +========= timechart -============= +========= .. rubric:: Table of contents @@ -10,61 +10,39 @@ timechart Description -============ +=========== | The ``timechart`` command creates a time-based aggregation of data. It groups data by time intervals and optionally by a field, then applies an aggregation function to each group. The results are returned in an unpivoted format with separate rows for each time-field combination. -Version -======= -3.3.0 - Syntax -============ - -.. code-block:: text - - timechart [span=] [limit=] [useother=] [by ] - -**Parameters:** - -* **span**: optional. Specifies the time interval for grouping data. - - * Default: 1m (1 minute) - * Available time units: - - * millisecond (ms) - * second (s) - * minute (m, case sensitive) - * hour (h) - * day (d) - * week (w) - * month (M, case sensitive) - * quarter (q) - * year (y) - -* **limit**: optional. Specifies the maximum number of distinct values to display when using the "by" clause. - - * Default: 10 - * When there are more distinct values than the limit, the additional values are grouped into an "OTHER" category if useother is not set to false. - * The "most distinct" values are determined by calculating the sum of the aggregation values across all time intervals for each distinct field value. The top N values with the highest sums are displayed individually, while the rest are grouped into the "OTHER" category. 
-  * Set to 0 to show all distinct values without any limit (when limit=0, useother is automatically set to false).
-  * The parameters can be specified in any order before the aggregation function.
-  * Only applies when using the "by" clause to group results.
-
-* **useother**: optional. Controls whether to create an "OTHER" category for values beyond the limit.
-
-  * Default: true
-  * When set to false, only the top N values (based on limit) are shown without an "OTHER" column.
-  * When set to true, values beyond the limit are grouped into an "OTHER" category.
-  * Only applies when using the "by" clause and when there are more distinct values than the limit.
-
-* **by**: optional. Groups the results by the specified field in addition to time intervals.
-
-  * If not specified, the aggregation is performed across all documents in each time interval.
-
-* **aggregation_function**: mandatory. The aggregation function to apply to each time bucket.
-
-  * Currently, only a single aggregation function is supported.
-  * Available functions: All aggregation functions supported by the :doc:`stats ` command, as well as the timechart-specific aggregations listed below.
+======
+
+timechart [span=<interval>] [limit=<number>] [useother=<bool>] <aggregation_function> [by <field>]
+
+* span: optional. Specifies the time interval for grouping data. **Default:** 1m (1 minute).
+  * Available time units:
+    * millisecond (ms)
+    * second (s)
+    * minute (m, case sensitive)
+    * hour (h)
+    * day (d)
+    * week (w)
+    * month (M, case sensitive)
+    * quarter (q)
+    * year (y)
+* limit: optional. Specifies the maximum number of distinct values to display when using the "by" clause. **Default:** 10.
+  * When there are more distinct values than the limit, the additional values are grouped into an "OTHER" category if useother is not set to false.
+  * The "most distinct" values are determined by calculating the sum of the aggregation values across all time intervals for each distinct field value. The top N values with the highest sums are displayed individually, while the rest are grouped into the "OTHER" category.
+  * Set to 0 to show all distinct values without any limit (when limit=0, useother is automatically set to false).
+  * The parameters can be specified in any order before the aggregation function.
+  * Only applies when using the "by" clause to group results.
+* useother: optional. Controls whether to create an "OTHER" category for values beyond the limit. **Default:** true.
+  * When set to false, only the top N values (based on limit) are shown without an "OTHER" column.
+  * When set to true, values beyond the limit are grouped into an "OTHER" category.
+  * Only applies when using the "by" clause and when there are more distinct values than the limit.
+* aggregation_function: mandatory. The aggregation function to apply to each time bucket.
+  * Currently, only a single aggregation function is supported.
+  * Available functions: All aggregation functions supported by the :doc:`stats ` command, as well as the timechart-specific aggregations listed below.
+* by: optional. Groups the results by the specified field in addition to time intervals. If not specified, the aggregation is performed across all documents in each time interval.

 PER_SECOND
 ----------

@@ -114,15 +92,6 @@ Notes

 * **Null values**: Documents with null values in the "by" field are treated as a separate category and appear as null in the results.

-Limitations
-============
-* Only a single aggregation function is supported per timechart command.
-* The ``bins`` parameter and other bin options are not supported since the ``bin`` command is not implemented yet. Use the ``span`` parameter to control time intervals.
-
-
-Examples
-========
-
 Example 1: Count events by hour
 ===============================

@@ -385,3 +354,9 @@ PPL query::
     | 2023-01-01 10:30:00 | server1 | 0.1                 |
     | 2023-01-01 10:30:00 | server2 | 0.05                |
     +---------------------+---------+---------------------+
+
+Limitations
+===========
+* Only a single aggregation function is supported per timechart command.
+* The ``bins`` parameter and other bin options are not supported since the ``bin`` command is not implemented yet. Use the ``span`` parameter to control time intervals.
+
diff --git a/docs/user/ppl/cmd/top.rst b/docs/user/ppl/cmd/top.rst
index 5f4bfb9b4b6..929a52a163c 100644
--- a/docs/user/ppl/cmd/top.rst
+++ b/docs/user/ppl/cmd/top.rst
@@ -1,6 +1,6 @@
-=============
+===
 top
-=============
+===

 .. rubric:: Table of contents

@@ -10,28 +10,24 @@ top

 Description
-============
-| Using ``top`` command to find the most common tuple of values of all fields in the field list.
-
+===========
+| The ``top`` command finds the most common tuple of values of all fields in the field list.

 Syntax
-============
-top [N] [by-clause]
-
-top [N] [top-options] [by-clause] ``(available from 3.1.0+)``
+======
+top [N] [top-options] [by-clause]

-* N: number of results to return. **Default**: 10
+* N: optional. Number of results to return. **Default:** 10.
+* top-options: optional. Options for the top command. Supported syntax is [countfield=<string>] [showcount=<bool>].
+  * showcount=<bool>: optional. Whether to create a field in output that represents a count of the tuple of values. **Default:** true.
+  * countfield=<string>: optional. The name of the field that contains count. **Default:** 'count'.
 * field-list: mandatory. comma-delimited list of field names.
 * by-clause: optional. one or more fields to group the results by.
-* top-options: optional. options for the top command. Supported syntax is [countfield=<string>] [showcount=<bool>].
-* showcount=<bool>: optional. whether to create a field in output that represent a count of the tuple of values. Default value is ``true``.
-* countfield=<string>: optional. the name of the field that contains count. Default value is ``'count'``.
-

 Example 1: Find the most common values in a field
-===========================================
+=================================================

-The example finds most common gender of all the accounts.
+This example finds the most common gender of all the accounts.

 PPL query::

@@ -44,10 +40,10 @@ PPL query::
     | F      |
     +--------+

-Example 2: Find the most common values in a field
-===========================================
+Example 2: Limit results to top N values
+========================================

-The example finds most common gender of all the accounts.
+This example finds the most common gender and limits results to 1 value.

 PPL query::

@@ -59,10 +55,10 @@ PPL query::
     | M      |
     +--------+

-Example 2: Find the most common values organized by gender
-====================================================
+Example 3: Find the most common values grouped by field
+=======================================================

-The example finds most common age of all the accounts group by gender.
+This example finds the most common age of all the accounts grouped by gender.

 PPL query::

@@ -75,10 +71,10 @@ PPL query::
     | M      | 32  |
     +--------+-----+

-Example 3: Top command with Calcite enabled
-===========================================
+Example 4: Top command with count field
+=======================================

-The example finds most common gender of all the accounts when ``plugins.calcite.enabled`` is true.
+This example finds the most common gender of all the accounts and includes the count.

 PPL query::

@@ -92,10 +88,10 @@ PPL query::
     +--------+-------+

-Example 4: Specify the count field option
+Example 5: Specify the count field option
 =========================================

-The example specifies the count field when ``plugins.calcite.enabled`` is true.
+This example specifies a custom name for the count field.

 PPL query::

diff --git a/docs/user/ppl/cmd/trendline.rst b/docs/user/ppl/cmd/trendline.rst
index d7fb6544ae6..e2fd067d262 100644
--- a/docs/user/ppl/cmd/trendline.rst
+++ b/docs/user/ppl/cmd/trendline.rst
@@ -1,6 +1,6 @@
-=============
+=========
 trendline
-=============
+=========

 .. rubric:: Table of contents

@@ -10,41 +10,25 @@ trendline

 Description
-============
-| Using ``trendline`` command to calculate moving averages of fields.
+===========
+| The ``trendline`` command calculates moving averages of fields.


 Syntax
-============
-`TRENDLINE [sort <[+|-] sort-field>] [SMA|WMA](number-of-datapoints, field) [AS alias] [[SMA|WMA](number-of-datapoints, field) [AS alias]]...`
+======
+trendline [sort <[+|-] sort-field>] [sma|wma](number-of-datapoints, field) [as <alias>] [[sma|wma](number-of-datapoints, field) [as <alias>]]...

 * [+|-]: optional. The plus [+] stands for ascending order and NULL/MISSING first and a minus [-] stands for descending order and NULL/MISSING last. **Default:** ascending order and NULL/MISSING first.
 * sort-field: mandatory when sorting is used. The field used to sort.
+* sma|wma: mandatory. Simple Moving Average (sma) applies equal weighting to all values; Weighted Moving Average (wma) applies greater weight to more recent values.
 * number-of-datapoints: mandatory. The number of datapoints to calculate the moving average (must be greater than zero).
 * field: mandatory. The name of the field the moving average should be calculated for.
-* alias: optional. The name of the resulting column containing the moving average (defaults to the field name with "_trendline").
-
-Starting with version 3.1.0, two trendline algorithms are supported, aka Simple Moving Average (SMA) and Weighted Moving Average (WMA).
-
-Suppose:
-
-* f[i]: The value of field 'f' in the i-th data-point
-* n: The number of data-points in the moving window (period)
-* t: The current time index
-
-SMA is calculated like
-
-    SMA(t) = (1/n) * Σ(f[i]), where i = t-n+1 to t
-
-WMA places more weights on recent values compared to equal-weighted SMA algorithm
-
-    WMA(t) = (1/(1 + 2 + ... + n)) * Σ(1 * f[i-n+1] + 2 * f[t-n+2] + ... + n * f[t])
-           = (2/(n * (n + 1))) * Σ((i - t + n) * f[i]), where i = t-n+1 to t
+* alias: optional. The name of the resulting column containing the moving average. **Default:** field name with "_trendline".

 Example 1: Calculate the simple moving average on one field.
-=====================================================
+============================================================

-The example shows how to calculate the simple moving average on one field.
+This example shows how to calculate the simple moving average on one field.

 PPL query::

@@ -61,9 +45,9 @@ PPL query::

 Example 2: Calculate the simple moving average on multiple fields.
-=========================================================== +================================================================== -The example shows how to calculate the simple moving average on multiple fields. +This example shows how to calculate the simple moving average on multiple fields. PPL query:: @@ -79,9 +63,9 @@ PPL query:: +------+-----------+ Example 3: Calculate the simple moving average on one field without specifying an alias. -================================================================================= +======================================================================================== -The example shows how to calculate the simple moving average on one field. +This example shows how to calculate the simple moving average on one field. PPL query:: @@ -97,25 +81,9 @@ PPL query:: +--------------------------+ Example 4: Calculate the weighted moving average on one field. -================================================================================= - -Version -------- -3.1.0 - -Configuration -------------- -wma algorithm requires Calcite enabled. - -Enable Calcite: - - >> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{ - "persistent" : { - "plugins.calcite.enabled" : true - } - }' +============================================================== -The example shows how to calculate the weighted moving average on one field. +This example shows how to calculate the weighted moving average on one field. PPL query:: @@ -132,4 +100,4 @@ PPL query:: Limitations =========== -Starting with version 3.1.0, the ``trendline`` command requires all values in the specified ``field`` to be non-null. Any rows with null values present in the calculation field will be automatically excluded from the command's output. \ No newline at end of file +The ``trendline`` command requires all values in the specified ``field`` to be non-null. Any rows with null values present in the calculation field will be automatically excluded from the command's output. \ No newline at end of file diff --git a/docs/user/ppl/cmd/where.rst b/docs/user/ppl/cmd/where.rst index 9bdb8a75aa3..324af4dcb54 100644 --- a/docs/user/ppl/cmd/where.rst +++ b/docs/user/ppl/cmd/where.rst @@ -1,6 +1,6 @@ -============= +===== where -============= +===== .. rubric:: Table of contents @@ -11,22 +11,18 @@ where Description =========== -| The ``where`` command bool-expression to filter the search result. The ``where`` command only return the result when bool-expression evaluated to true. - +| The ``where`` command filters the search result. The ``where`` command only returns the result when the bool-expression evaluates to true. Syntax ====== where -* bool-expression: optional. any expression which could be evaluated to boolean value. - -Examples -======== +* bool-expression: optional. Any expression which could be evaluated to boolean value. Example 1: Filter result set with condition --------------------------------------------- +=========================================== -The example show fetch all the document from accounts index with . +This example shows fetching all the documents from the accounts index where account_number is 1 or gender is "F". PPL query:: @@ -40,7 +36,7 @@ PPL query:: +----------------+--------+ Example 2: Basic Field Comparison ----------------------------------- +================================= The example shows how to filter accounts with balance greater than 30000. 
@@ -56,7 +52,7 @@ PPL query:: +----------------+---------+ Example 3: Pattern Matching with LIKE --------------------------------------- +===================================== Pattern Matching with Underscore (_) @@ -87,7 +83,7 @@ PPL query:: +----------------+-------+ Example 4: Multiple Conditions -------------------------------- +============================== The example shows how to combine multiple conditions using AND operator. @@ -104,7 +100,7 @@ PPL query:: +----------------+-----+--------+ Example 5: Using IN Operator ------------------------------ +============================ The example demonstrates using IN operator to match multiple values. @@ -120,7 +116,7 @@ PPL query:: +----------------+-------+ Example 6: NULL Checks ----------------------- +====================== The example shows how to filter records with NULL values. @@ -135,7 +131,7 @@ PPL query:: +----------------+----------+ Example 7: Complex Conditions ------------------------------- +============================= The example demonstrates combining multiple conditions with parentheses and logical operators. @@ -150,7 +146,7 @@ PPL query:: +----------------+---------+-----+--------+ Example 8: NOT Conditions --------------------------- +========================= The example shows how to use NOT operator to exclude matching records. diff --git a/docs/user/ppl/functions/aggregations.rst b/docs/user/ppl/functions/aggregations.rst new file mode 100644 index 00000000000..6605bda0765 --- /dev/null +++ b/docs/user/ppl/functions/aggregations.rst @@ -0,0 +1,522 @@ +===================== +Aggregation Functions +===================== + +.. rubric:: Table of contents + +.. contents:: + :local: + :depth: 2 + + +Description +============ +| Aggregation functions perform calculations across multiple rows to return a single result value. These functions are used with ``stats`` and ``eventstats`` commands to analyze and summarize data. + +| The following table shows how NULL/MISSING values are handled by aggregation functions: + ++----------+-------------+-------------+ +| Function | NULL | MISSING | ++----------+-------------+-------------+ +| COUNT | Not counted | Not counted | ++----------+-------------+-------------+ +| SUM | Ignore | Ignore | ++----------+-------------+-------------+ +| AVG | Ignore | Ignore | ++----------+-------------+-------------+ +| MAX | Ignore | Ignore | ++----------+-------------+-------------+ +| MIN | Ignore | Ignore | ++----------+-------------+-------------+ +| FIRST | Ignore | Ignore | ++----------+-------------+-------------+ +| LAST | Ignore | Ignore | ++----------+-------------+-------------+ +| LIST | Ignore | Ignore | ++----------+-------------+-------------+ +| VALUES | Ignore | Ignore | ++----------+-------------+-------------+ + +Functions +========= + +COUNT +----- + +Description +>>>>>>>>>>> + +Usage: Returns a count of the number of expr in the rows retrieved. The ``C()`` function, ``c``, and ``count`` can be used as abbreviations for ``COUNT()``. To perform a filtered counting, wrap the condition to satisfy in an `eval` expression. 
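+
+In addition to the plain and filtered counts shown below, the condition inside ``eval`` can combine multiple predicates with boolean operators. The following sketch is carried over from the earlier stats command documentation and assumes the same sample ``accounts`` index::
+
+    os> source=accounts | stats count(eval(age > 30 and balance > 25000)) as high_value_users;
+    fetched rows / total rows = 1/1
+    +------------------+
+    | high_value_users |
+    |------------------|
+    | 1                |
+    +------------------+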
+
+Example::
+
+    os> source=accounts | stats count(), c(), count, c;
+    fetched rows / total rows = 1/1
+    +---------+-----+-------+---+
+    | count() | c() | count | c |
+    |---------+-----+-------+---|
+    | 4       | 4   | 4     | 4 |
+    +---------+-----+-------+---+
+
+Example of filtered counting::
+
+    os> source=accounts | stats count(eval(age > 30)) as mature_users;
+    fetched rows / total rows = 1/1
+    +--------------+
+    | mature_users |
+    |--------------|
+    | 3            |
+    +--------------+
+
+SUM
+---
+
+Description
+>>>>>>>>>>>
+
+Usage: SUM(expr). Returns the sum of expr.
+
+Example::
+
+    os> source=accounts | stats sum(age) by gender;
+    fetched rows / total rows = 2/2
+    +----------+--------+
+    | sum(age) | gender |
+    |----------+--------|
+    | 28       | F      |
+    | 101      | M      |
+    +----------+--------+
+
+AVG
+---
+
+Description
+>>>>>>>>>>>
+
+Usage: AVG(expr). Returns the average value of expr.
+
+Example::
+
+    os> source=accounts | stats avg(age) by gender;
+    fetched rows / total rows = 2/2
+    +--------------------+--------+
+    | avg(age)           | gender |
+    |--------------------+--------|
+    | 28.0               | F      |
+    | 33.666666666666664 | M      |
+    +--------------------+--------+
+
+MAX
+---
+
+Description
+>>>>>>>>>>>
+
+Usage: MAX(expr). Returns the maximum value of expr.
+
+For non-numeric fields, values are sorted lexicographically.
+
+Example::
+
+    os> source=accounts | stats max(age);
+    fetched rows / total rows = 1/1
+    +----------+
+    | max(age) |
+    |----------|
+    | 36       |
+    +----------+
+
+Example with text field::
+
+    os> source=accounts | stats max(firstname);
+    fetched rows / total rows = 1/1
+    +----------------+
+    | max(firstname) |
+    |----------------|
+    | Nanette        |
+    +----------------+
+
+MIN
+---
+
+Description
+>>>>>>>>>>>
+
+Usage: MIN(expr). Returns the minimum value of expr.
+
+For non-numeric fields, values are sorted lexicographically.
+
+Example::
+
+    os> source=accounts | stats min(age);
+    fetched rows / total rows = 1/1
+    +----------+
+    | min(age) |
+    |----------|
+    | 28       |
+    +----------+
+
+Example with text field::
+
+    os> source=accounts | stats min(firstname);
+    fetched rows / total rows = 1/1
+    +----------------+
+    | min(firstname) |
+    |----------------|
+    | Amber          |
+    +----------------+
+
+VAR_SAMP
+--------
+
+Description
+>>>>>>>>>>>
+
+Usage: VAR_SAMP(expr). Returns the sample variance of expr.
+
+Example::
+
+    os> source=accounts | stats var_samp(age);
+    fetched rows / total rows = 1/1
+    +--------------------+
+    | var_samp(age)      |
+    |--------------------|
+    | 10.916666666666666 |
+    +--------------------+
+
+VAR_POP
+-------
+
+Description
+>>>>>>>>>>>
+
+Usage: VAR_POP(expr). Returns the population variance of expr.
+
+Example::
+
+    os> source=accounts | stats var_pop(age);
+    fetched rows / total rows = 1/1
+    +--------------+
+    | var_pop(age) |
+    |--------------|
+    | 8.1875       |
+    +--------------+
+
+STDDEV_SAMP
+-----------
+
+Description
+>>>>>>>>>>>
+
+Usage: STDDEV_SAMP(expr). Returns the sample standard deviation of expr.
+
+Example::
+
+    os> source=accounts | stats stddev_samp(age);
+    fetched rows / total rows = 1/1
+    +-------------------+
+    | stddev_samp(age)  |
+    |-------------------|
+    | 3.304037933599835 |
+    +-------------------+
+
+STDDEV_POP
+----------
+
+Description
+>>>>>>>>>>>
+
+Usage: STDDEV_POP(expr). Returns the population standard deviation of expr.
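+
+Note: the population standard deviation is the square root of the population variance (``VAR_POP``); with the sample accounts data, sqrt(8.1875) ≈ 2.8614, which matches the result below.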
+
+Example::
+
+    os> source=accounts | stats stddev_pop(age);
+    fetched rows / total rows = 1/1
+    +--------------------+
+    | stddev_pop(age)    |
+    |--------------------|
+    | 2.8613807855648994 |
+    +--------------------+
+
+DISTINCT_COUNT, DC
+------------------
+
+Description
+>>>>>>>>>>>
+
+Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values using the HyperLogLog++ algorithm. Both functions are equivalent.
+
+For details on algorithm accuracy and precision control, see the `OpenSearch Cardinality Aggregation documentation `_.
+
+Example::
+
+    os> source=accounts | stats dc(state) as distinct_states, distinct_count(state) as dc_states_alt by gender;
+    fetched rows / total rows = 2/2
+    +-----------------+---------------+--------+
+    | distinct_states | dc_states_alt | gender |
+    |-----------------+---------------+--------|
+    | 3               | 3             | M      |
+    | 1               | 1             | F      |
+    +-----------------+---------------+--------+
+
+DISTINCT_COUNT_APPROX
+---------------------
+
+Description
+>>>>>>>>>>>
+
+Usage: DISTINCT_COUNT_APPROX(expr). Returns the approximate distinct count value of the expr, using the HyperLogLog++ algorithm.
+
+Example::
+
+    PPL> source=accounts | stats distinct_count_approx(gender);
+    fetched rows / total rows = 1/1
+    +-------------------------------+
+    | distinct_count_approx(gender) |
+    |-------------------------------|
+    | 2                             |
+    +-------------------------------+
+
+EARLIEST
+--------
+
+Description
+>>>>>>>>>>>
+
+Usage: EARLIEST(field [, time_field]). Returns the earliest value of a field based on timestamp ordering.
+
+* field: mandatory. The field to return the earliest value for.
+* time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified.
+
+Example::
+
+    os> source=events | stats earliest(message) by host | sort host;
+    fetched rows / total rows = 2/2
+    +-------------------+---------+
+    | earliest(message) | host    |
+    |-------------------+---------|
+    | Starting up       | server1 |
+    | Initializing      | server2 |
+    +-------------------+---------+
+
+Example with custom time field::
+
+    os> source=events | stats earliest(status, event_time) by category | sort category;
+    fetched rows / total rows = 2/2
+    +------------------------------+----------+
+    | earliest(status, event_time) | category |
+    |------------------------------+----------|
+    | pending                      | orders   |
+    | active                       | users    |
+    +------------------------------+----------+
+
+LATEST
+------
+
+Description
+>>>>>>>>>>>
+
+Usage: LATEST(field [, time_field]). Returns the latest value of a field based on timestamp ordering.
+
+* field: mandatory. The field to return the latest value for.
+* time_field: optional. The field to use for time-based ordering. Defaults to @timestamp if not specified.
+
+Example::
+
+    os> source=events | stats latest(message) by host | sort host;
+    fetched rows / total rows = 2/2
+    +------------------+---------+
+    | latest(message)  | host    |
+    |------------------+---------|
+    | Shutting down    | server1 |
+    | Maintenance mode | server2 |
+    +------------------+---------+
+
+Example with custom time field::
+
+    os> source=events | stats latest(status, event_time) by category | sort category;
+    fetched rows / total rows = 2/2
+    +----------------------------+----------+
+    | latest(status, event_time) | category |
+    |----------------------------+----------|
+    | cancelled                  | orders   |
+    | inactive                   | users    |
+    +----------------------------+----------+
+
+TAKE
+----
+
+Description
+>>>>>>>>>>>
+
+Usage: TAKE(field [, size]). Returns the original values of a field. It does not guarantee the order of values.
+
+* field: mandatory. The field must be a text field.
+* size: optional integer. The number of values to return. Default is 10.
+
+Example::
+
+    os> source=accounts | stats take(firstname);
+    fetched rows / total rows = 1/1
+    +-----------------------------+
+    | take(firstname)             |
+    |-----------------------------|
+    | [Amber,Hattie,Nanette,Dale] |
+    +-----------------------------+
+
+PERCENTILE or PERCENTILE_APPROX
+-------------------------------
+
+Description
+>>>>>>>>>>>
+
+Usage: PERCENTILE(expr, percent) or PERCENTILE_APPROX(expr, percent). Returns the approximate percentile value of expr at the specified percentage.
+
+* percent: The number must be a constant between 0 and 100.
+
+Note: From 3.1.0, the percentile implementation was switched to MergingDigest from AVLTreeDigest. Ref `issue link `_.
+
+Example::
+
+    os> source=accounts | stats percentile(age, 90) by gender;
+    fetched rows / total rows = 2/2
+    +---------------------+--------+
+    | percentile(age, 90) | gender |
+    |---------------------+--------|
+    | 28                  | F      |
+    | 36                  | M      |
+    +---------------------+--------+
+
+Percentile Shortcut Functions
+>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
+
+For convenience, OpenSearch PPL provides shortcut functions for common percentiles:
+
+- ``PERC<percent>(expr)`` - Equivalent to ``PERCENTILE(expr, <percent>)``
+- ``P<percent>(expr)`` - Equivalent to ``PERCENTILE(expr, <percent>)``
+
+Both integer and decimal percentiles from 0 to 100 are supported (e.g., ``PERC95``, ``P99.5``).
+
+Example::
+
+    ppl> source=accounts | stats perc99.5(age);
+    fetched rows / total rows = 1/1
+    +---------------+
+    | perc99.5(age) |
+    |---------------|
+    | 36            |
+    +---------------+
+
+    ppl> source=accounts | stats p50(age);
+    fetched rows / total rows = 1/1
+    +----------+
+    | p50(age) |
+    |----------|
+    | 32       |
+    +----------+
+
+MEDIAN
+------
+
+Description
+>>>>>>>>>>>
+
+Usage: MEDIAN(expr). Returns the median (50th percentile) value of `expr`. This is equivalent to ``PERCENTILE(expr, 50)``.
+
+Example::
+
+    os> source=accounts | stats median(age);
+    fetched rows / total rows = 1/1
+    +-------------+
+    | median(age) |
+    |-------------|
+    | 33          |
+    +-------------+
+
+FIRST
+-----
+
+Description
+>>>>>>>>>>>
+
+Usage: FIRST(field). Returns the first non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field.
+
+* field: mandatory. The field to return the first value for.
+
+Example::
+
+    os> source=accounts | stats first(firstname) by gender;
+    fetched rows / total rows = 2/2
+    +------------------+--------+
+    | first(firstname) | gender |
+    |------------------+--------|
+    | Nanette          | F      |
+    | Amber            | M      |
+    +------------------+--------+
+
+LAST
+----
+
+Description
+>>>>>>>>>>>
+
+Usage: LAST(field). Returns the last non-null value of a field based on natural document order. Returns NULL if no records exist, or if all records have NULL values for the field.
+
+* field: mandatory. The field to return the last value for.
+
+Example::
+
+    os> source=accounts | stats last(firstname) by gender;
+    fetched rows / total rows = 2/2
+    +-----------------+--------+
+    | last(firstname) | gender |
+    |-----------------+--------|
+    | Nanette         | F      |
+    | Dale            | M      |
+    +-----------------+--------+
+
+LIST
+----
+
+Description
+>>>>>>>>>>>
+
+Usage: LIST(expr). Collects all values from the specified expression into an array. Values are converted to strings, nulls are filtered, and duplicates are preserved.
+The function returns up to 100 values with no guaranteed ordering. + +* expr: The field expression to collect values from. +* This aggregation function doesn't support Array, Struct, Object field types. + +Example with string fields:: + + PPL> source=accounts | stats list(firstname); + fetched rows / total rows = 1/1 + +-------------------------------------+ + | list(firstname) | + |-------------------------------------| + | ["Amber","Hattie","Nanette","Dale"] | + +-------------------------------------+ + +VALUES +------ + +Description +>>>>>>>>>>> + +Usage: VALUES(expr). Collects all unique values from the specified expression into a sorted array. Values are converted to strings, nulls are filtered, and duplicates are removed. + +The maximum number of unique values returned is controlled by the ``plugins.ppl.values.max.limit`` setting: + +* Default value is 0, which means unlimited values are returned +* Can be configured to any positive integer to limit the number of unique values +* See the `PPL Settings <../admin/settings.rst#plugins-ppl-values-max-limit>`_ documentation for more details + +Example with string fields:: + + PPL> source=accounts | stats values(firstname); + fetched rows / total rows = 1/1 + +-------------------------------------+ + | values(firstname) | + |-------------------------------------| + | ["Amber","Dale","Hattie","Nanette"] | + +-------------------------------------+ \ No newline at end of file diff --git a/docs/user/ppl/functions/condition.rst b/docs/user/ppl/functions/condition.rst index c4d52f74913..b4ede99a3a7 100644 --- a/docs/user/ppl/functions/condition.rst +++ b/docs/user/ppl/functions/condition.rst @@ -14,9 +14,9 @@ ISNULL Description >>>>>>>>>>> -Usage: isnull(field) return true if field is null. +Usage: isnull(field) returns true if field is null. -Argument type: all the supported data type. +Argument type: all the supported data types. Return type: BOOLEAN @@ -39,9 +39,9 @@ ISNOTNULL Description >>>>>>>>>>> -Usage: isnotnull(field) return true if field is not null. +Usage: isnotnull(field) returns true if field is not null. -Argument type: all the supported data type. +Argument type: all the supported data types. Return type: BOOLEAN @@ -60,7 +60,7 @@ Example:: EXISTS ------ -`Because OpenSearch doesn't differentiate null and missing `_. so we can't provide function like ismissing/isnotmissing to test field exist or not. But you can still use isnull/isnotnull for such purpose. +`Since OpenSearch doesn't differentiate null and missing `_, we can't provide functions like ismissing/isnotmissing to test if a field exists or not. But you can still use isnull/isnotnull for such purpose. Example, the account 13 doesn't have email field:: @@ -78,9 +78,9 @@ IFNULL Description >>>>>>>>>>> -Usage: ifnull(field1, field2) return field2 if field1 is null. +Usage: ifnull(field1, field2) returns field2 if field1 is null. -Argument type: all the supported data type, (NOTE : if two parameters has different type, you will fail semantic check.) +Argument type: all the supported data types (NOTE : if two parameters have different types, you will fail semantic check). Return type: any @@ -123,9 +123,9 @@ NULLIF Description >>>>>>>>>>> -Usage: nullif(field1, field2) return null if two parameters are same, otherwise return field1. +Usage: nullif(field1, field2) returns null if two parameters are same, otherwise returns field1. 
-Argument type: all the supported data type, (NOTE : if two parameters has different type, if two parameters has different type, you will fail semantic check) +Argument type: all the supported data types (NOTE : if two parameters have different types, you will fail semantic check). Return type: any @@ -149,9 +149,9 @@ ISNULL Description >>>>>>>>>>> -Usage: isnull(field1, field2) return null if two parameters are same, otherwise return field1. +Usage: isnull(field1, field2) returns null if two parameters are same, otherwise returns field1. -Argument type: all the supported data type +Argument type: all the supported data types. Return type: any @@ -174,9 +174,9 @@ IF Description >>>>>>>>>>> -Usage: if(condition, expr1, expr2) return expr1 if condition is true, otherwise return expr2. +Usage: if(condition, expr1, expr2) returns expr1 if condition is true, otherwise returns expr2. -Argument type: all the supported data type, (NOTE : if expr1 and expr2 are different type, you will fail semantic check +Argument type: all the supported data types (NOTE : if expr1 and expr2 are different types, you will fail semantic check). Return type: any @@ -221,9 +221,9 @@ CASE Description >>>>>>>>>>> -Usage: case(condition1, expr1, condition2, expr2, ... conditionN, exprN else default) return expr1 if condition1 is true, or return expr2 if condition2 is true, ... if no condition is true, then return the value of ELSE clause. If the ELSE clause is not defined, it returns NULL. +Usage: case(condition1, expr1, condition2, expr2, ... conditionN, exprN else default) returns expr1 if condition1 is true, or returns expr2 if condition2 is true, ... if no condition is true, then returns the value of ELSE clause. If the ELSE clause is not defined, returns NULL. -Argument type: all the supported data type, (NOTE : there is no comma before "else") +Argument type: all the supported data types (NOTE : there is no comma before "else"). Return type: any @@ -274,11 +274,9 @@ COALESCE Description >>>>>>>>>>> -Version: 3.1.0 +Usage: coalesce(field1, field2, ...) returns the first non-null, non-missing value in the argument list. -Usage: coalesce(field1, field2, ...) return the first non-null, non-missing value in the argument list. - -Argument type: all the supported data type. Supports mixed data types with automatic type coercion. +Argument type: all the supported data types. Supports mixed data types with automatic type coercion. Return type: determined by the least restrictive common type among all arguments, with fallback to string if no common type can be determined @@ -372,11 +370,9 @@ ISPRESENT Description >>>>>>>>>>> -Version: 3.1.0 - -Usage: ispresent(field) return true if the field exists. +Usage: ispresent(field) returns true if the field exists. -Argument type: all the supported data type. +Argument type: all the supported data types. Return type: BOOLEAN @@ -400,11 +396,9 @@ ISBLANK Description >>>>>>>>>>> -Version: 3.1.0 - Usage: isblank(field) returns true if the field is null, an empty string, or contains only white space. -Argument type: all the supported data type. +Argument type: all the supported data types. Return type: BOOLEAN @@ -428,11 +422,9 @@ ISEMPTY Description >>>>>>>>>>> -Version: 3.1.0 - Usage: isempty(field) returns true if the field is null or is an empty string. -Argument type: all the supported data type. +Argument type: all the supported data types. 
Return type: BOOLEAN @@ -455,9 +447,7 @@ EARLIEST Description >>>>>>>>>>> -Version: 3.1.0 - -Usage: earliest(relative_string, field) returns true if the value of field is after the timestamp derived from relative_string relative to the current time. Otherwise, return false. +Usage: earliest(relative_string, field) returns true if the value of field is after the timestamp derived from relative_string relative to the current time. Otherwise, returns false. relative_string: The relative string can be one of the following formats: @@ -511,9 +501,7 @@ LATEST Description >>>>>>>>>>> -Version: 3.1.0 - -Usage: latest(relative_string, field) returns true if the value of field is before the timestamp derived from relative_string relative to the current time. Otherwise, return false. +Usage: latest(relative_string, field) returns true if the value of field is before the timestamp derived from relative_string relative to the current time. Otherwise, returns false. Argument type: relative_string:STRING, field: TIMESTAMP @@ -543,8 +531,6 @@ REGEX_MATCH Description >>>>>>>>>>> -Version: 3.3.0 - Usage: regex_match(string, pattern) returns true if the regular expression pattern finds a match against any substring of the string value, otherwise returns false. The function uses Java regular expression syntax for the pattern. diff --git a/docs/user/ppl/functions/json.rst b/docs/user/ppl/functions/json.rst index 0114cf5ef9b..7686f624f98 100644 --- a/docs/user/ppl/functions/json.rst +++ b/docs/user/ppl/functions/json.rst @@ -39,10 +39,6 @@ JSON Description >>>>>>>>>>> -Version: 3.1.0 - -Limitation: Only works when plugins.calcite.enabled=true - Usage: `json(value)` Evaluates whether a string can be parsed as a json-encoded string. Returns the value if valid, null otherwise. Argument type: STRING @@ -68,10 +64,6 @@ JSON_OBJECT Description >>>>>>>>>>> -Version: 3.1.0 - -Limitation: Only works when plugins.calcite.enabled=true - Usage: `json_object(key1, value1, key2, value2...)` create a json object string with key value pairs. The key must be string. Argument type: key1: STRING, value1: ANY, key2: STRING, value2: ANY ... @@ -94,10 +86,6 @@ JSON_ARRAY Description >>>>>>>>>>> -Version: 3.1.0 - -Limitation: Only works when plugins.calcite.enabled=true - Usage: `json_array(element1, element2, ...)` create a json array string with elements. Argument type: element1: ANY, element2: ANY ... @@ -120,10 +108,6 @@ JSON_ARRAY_LENGTH Description >>>>>>>>>>> -Version: 3.1.0 - -Limitation: Only works when plugins.calcite.enabled=true - Usage: `json_array_length(value)` parse the string to json array and return size,, null is returned in case of any other valid JSON string, null or an invalid JSON. Argument type: value: A JSON STRING @@ -154,10 +138,6 @@ JSON_EXTRACT Description >>>>>>>>>>> -Version: 3.1.0 - -Limitation: Only works when plugins.calcite.enabled=true - Usage: `json_extract(json_string, path1, path2, ...)` Extracts values using the specified JSON paths. If only one path is provided, it returns a single value. If multiple paths are provided, it returns a JSON Array in the order of the paths. If one path cannot find value, return null as the result for this path. The path use "{}" to represent index for array, "{}" means "{*}". Argument type: json_string: STRING, path1: STRING, path2: STRING ... 
@@ -188,10 +168,6 @@ JSON_DELETE
 
 Description
 >>>>>>>>>>>
 
-Version: 3.1.0
-
-Limitation: Only works when plugins.calcite.enabled=true
-
 Usage: `json_delete(json_string, path1, path2, ...)` Delete values using the specified JSON paths. Return the json string after deleting. If one path cannot find value, do nothing.
 
 Argument type: json_string: STRING, path1: STRING, path2: STRING ...
 
@@ -230,10 +206,6 @@ JSON_SET
 
 Description
 >>>>>>>>>>>
 
-Version: 3.1.0
-
-Limitation: Only works when plugins.calcite.enabled=true
-
 Usage: `json_set(json_string, path1, value1, path2, value2...)` Set values to corresponding paths using the specified JSON paths. If one path's parent node is not a json object, skip the path. Return the json string after setting.
 
 Argument type: json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ...
 
@@ -264,10 +236,6 @@ JSON_APPEND
 
 Description
 >>>>>>>>>>>
 
-Version: 3.1.0
-
-Limitation: Only works when plugins.calcite.enabled=true
-
 Usage: `json_append(json_string, path1, value1, path2, value2...)` Append values to corresponding paths using the specified JSON paths. If one path's target node is not an array, skip the path. Return the json string after setting.
 
 Argument type: json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ...
 
@@ -306,10 +274,6 @@ JSON_EXTEND
 
 Description
 >>>>>>>>>>>
 
-Version: 3.1.0
-
-Limitation: Only works when plugins.calcite.enabled=true
-
 Usage: `json_extend(json_string, path1, value1, path2, value2...)` Extend values to corresponding paths using the specified JSON paths. If one path's target node is not an array, skip the path. The function will try to parse the value as an array. If it can be parsed, extend it to the target array. Otherwise, regard the value a single one. Return the json string after setting.
 
 Argument type: json_string: STRING, path1: STRING, value1: ANY, path2: STRING, value2: ANY ...
 
@@ -348,10 +312,6 @@ JSON_KEYS
 
 Description
 >>>>>>>>>>>
 
-Version: 3.1.0
-
-Limitation: Only works when plugins.calcite.enabled=true
-
 Usage: `json_keys(json_string)` Return the key list of the Json object as a Json array. Otherwise, return null.
 
 Argument type: json_string: A JSON STRING
 
diff --git a/docs/user/ppl/functions/statistical.rst b/docs/user/ppl/functions/statistical.rst
index f87cc104872..3729c1991ca 100644
--- a/docs/user/ppl/functions/statistical.rst
+++ b/docs/user/ppl/functions/statistical.rst
@@ -17,7 +17,7 @@ Description
 
 Usage: max(x, y, ...) returns the maximum value from all provided arguments. Strings are treated as greater than numbers, so if provided both strings and numbers, it will return the maximum string value (lexicographically ordered)
 
-Note: This function is only available in the eval command context and requires Calcite engine to be enabled.
+Note: This function is only available in the eval command context.
 
 Argument type: Variable number of INTEGER/LONG/FLOAT/DOUBLE/STRING arguments
 
@@ -67,7 +67,7 @@ Description
 
 Usage: min(x, y, ...) returns the minimum value from all provided arguments. Strings are treated as greater than numbers, so if provided both strings and numbers, it will return the minimum numeric value.
 
-Note: This function is only available in the eval command context and requires Calcite engine to be enabled.
+Note: This function is only available in the eval command context.
 
 Argument type: Variable number of INTEGER/LONG/FLOAT/DOUBLE/STRING arguments
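+
+For illustration, a minimal sketch of ``min`` and ``max`` inside ``eval`` (the ``accounts`` index and its ``balance`` field are illustrative placeholders)::
+
+    source=accounts | eval capped = min(balance, 10000), floored = max(balance, 0) | fields balance, capped, floored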
diff --git a/docs/user/ppl/index.rst b/docs/user/ppl/index.rst
index 17b4797df39..c042764f590 100644
--- a/docs/user/ppl/index.rst
+++ b/docs/user/ppl/index.rst
@@ -48,111 +48,83 @@ The query start with search command and then flowing a set of command delimited
 
 * **Commands**
 
-  - `Syntax `_
+  The following commands are available in PPL:
+
+  **Note:** Experimental commands are ready for use, but specific parameters may change based on feedback.
+
+  ============================ ================== ======================== ==============================================================================================
+  Command Name                 Version Introduced Current Status           Command Description
+  ============================ ================== ======================== ==============================================================================================
+  `search command `_           1.0                stable (since 1.0)       Retrieve documents from the index.
+  `where command `_            1.0                stable (since 1.0)       Filter the search result using boolean expressions.
+  `subquery command `_         3.0                experimental (since 3.0) Embed one PPL query inside another for complex filtering and data retrieval operations.
+  `fields command `_           1.0                stable (since 1.0)       Keep or remove fields from the search result.
+  `rename command `_           1.0                stable (since 1.0)       Rename one or more fields in the search result.
+  `eval command `_             1.0                stable (since 1.0)       Evaluate an expression and append the result to the search result.
+  `replace command `_          3.4                experimental (since 3.4) Replace text in one or more fields in the search result.
+  `fillnull command `_         3.0                experimental (since 3.0) Fill null with provided value in one or more fields in the search result.
+  `expand command `_           3.1                experimental (since 3.1) Transform a single document into multiple documents by expanding a nested array field.
+  `flatten command `_          3.1                experimental (since 3.1) Flatten a struct or an object field into separate fields in a document.
+  `table command `_            3.3                experimental (since 3.3) Keep or remove fields from the search result using enhanced syntax options.
+  `stats command `_            1.0                stable (since 1.0)       Calculate aggregation from search results.
+  `eventstats command `_       3.1                experimental (since 3.1) Calculate aggregation statistics and add them as new fields to each event.
+  `bin command `_              3.3                experimental (since 3.3) Group numeric values into buckets of equal intervals.
+  `timechart command `_        3.3                experimental (since 3.3) Create time-based charts and visualizations.
+  `trendline command `_        3.0                experimental (since 3.0) Calculate moving averages of fields.
+  `sort command `_             1.0                stable (since 1.0)       Sort all the search results by the specified fields.
+  `reverse command `_          3.2                experimental (since 3.2) Reverse the display order of search results.
+  `head command `_             1.0                stable (since 1.0)       Return the first N number of specified results after an optional offset in search order.
+  `dedup command `_            1.0                stable (since 1.0)       Remove identical documents defined by the field from the search result.
+  `top command `_              1.0                stable (since 1.0)       Find the most common tuple of values of all fields in the field list.
+  `rare command `_             1.0                stable (since 1.0)       Find the least common tuple of values of all fields in the field list.
+  `parse command `_            1.3                stable (since 1.3)       Parse a text field with a regular expression and append the result to the search result.
+  `grok command `_             2.4                stable (since 2.4)       Parse a text field with a grok pattern and append the results to the search result.
+  `rex command `_              3.3                experimental (since 3.3) Extract fields from a raw text field using regular expression named capture groups.
+  `regex command `_            3.3                experimental (since 3.3) Filter search results by matching field values against a regular expression pattern.
+  `spath command `_            3.3                experimental (since 3.3) Extract fields from structured text data.
+  `patterns command `_         2.4                stable (since 2.4)       Extract log patterns from a text field and append the results to the search result.
+  `join command `_             3.0                stable (since 3.0)       Combine two datasets together.
+  `append command `_           3.3                experimental (since 3.3) Append the result of a sub-search to the bottom of the input search results.
+  `appendcol command `_        3.1                experimental (since 3.1) Append the result of a sub-search and attach it alongside the input search results.
+  `lookup command `_           3.0                experimental (since 3.0) Add or replace data from a lookup index.
+  `multisearch command `_      3.4                experimental (since 3.4) Execute multiple search queries and combine their results.
+  `ml command `_               2.5                stable (since 2.5)       Apply machine learning algorithms to analyze data.
+  `kmeans command `_           1.3                stable (since 1.3)       Apply the kmeans algorithm on the search result returned by a PPL command.
+  `ad command `_               1.3                deprecated (since 2.5)   Apply Random Cut Forest algorithm on the search result returned by a PPL command.
+  `describe command `_         2.1                stable (since 2.1)       Query the metadata of an index.
+  `explain command `_          3.1                stable (since 3.1)       Explain the plan of query.
+  `show datasources command `_ 2.4                stable (since 2.4)       Query datasources configured in the PPL engine.
+  ============================ ================== ======================== ==============================================================================================
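+
+  For illustration, a minimal sketch of a pipeline that chains several of these commands (the ``accounts`` index and its fields are illustrative placeholders)::
+
+      source=accounts | where age > 30 | stats count() by gender | sort gender | head 10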
+
+  - `Syntax `_ - PPL query structure and command syntax formatting
 
-  - `ad command `_
-
-  - `append command `_
-
-  - `appendcol command `_
-
-  - `bin command `_
-
-  - `dedup command `_
-
-  - `describe command `_
-
-  - `eval command `_
-
-  - `eventstats command `_
-
-  - `expand command `_
-
-  - `explain command `_
-
-  - `fields command `_
-
-  - `fillnull command `_
-
-  - `flatten command `_
-
-  - `grok command `_
-
-  - `head command `_
-
-  - `join command `_
-
-  - `kmeans command `_
-
-  - `lookup command `_
-
-  - `ml command `_
-
-  - `multisearch command `_
-
-  - `parse command `_
-
-  - `patterns command `_
-
-  - `rare command `_
-
-  - `rename command `_
-
-  - `regex command `_
-
-  - `rex command `_
-
-  - `search command `_
-
-  - `show datasources command `_
-
-  - `sort command `_
-
-  - `spath command `_
-
-  - `stats command `_
-
-  - `subquery (aka subsearch) command `_
-
-  - `reverse command `_
-
-  - `table command `_
-
-  - `timechart command `_
+* **Functions**
 
-  - `top command `_
+  - `Aggregation Functions `_
 
-  - `trendline command `_
+  - `Collection Functions `_
 
-  - `replace command `_
+  - `Condition Functions `_
 
-  - `where command `_
+  - `Cryptographic Functions `_
 
-* **Functions**
+  - `Date and Time Functions `_
 
   - `Expressions `_
 
-  - `Math Functions `_
-
-  - `Date and Time Functions `_
+  - `IP Address Functions `_
 
-  - `String Functions `_
+  - `JSON Functions `_
 
-  - `Condition Functions `_
+  - `Math Functions `_
 
   - `Relevance Functions `_
 
-  - `Type Conversion Functions `_
+  - `String Functions `_
 
   - `System Functions `_
 
-  - `IP Address Functions `_
-
-  - `Collection Functions `_
`_ - - - `Cryptographic Functions `_ - - - `JSON Functions `_ + - `Type Conversion Functions `_ * **Optimization**