[FEATURE] Automatic Type Conversion for REX/SPATH/PARSE Command Extractions #4356

@penghuo

Is your feature request related to a problem?
Yes. When using the REX command to extract numeric values and then performing arithmetic operations on them, the system fails because the extracted values are treated as strings rather than numbers.

For example, the following query fails:

source=log00001 | rex field=v "value=(?<digits>\d*)" | eval multi=digits * 10

With the error:

{
  "error": {
    "reason": "Invalid Query",
    "details": "MULTIPLY function expects {[INTEGER,INTEGER]|[INTEGER,DOUBLE]|[DOUBLE,INTEGER]|[DOUBLE,DOUBLE]|[INTERVAL,INTEGER]|[INTERVAL,DOUBLE]|[INTEGER,INTERVAL]|[DOUBLE,INTERVAL]}, but got [STRING,INTEGER]",
    "type": "ExpressionEvaluationException"
  },
  "status": 400
}

The issue is that the rex command extracts patterns as strings, but users often need to perform numeric operations on these extracted values.

What solution would you like?
Automatic type inference: the system should attempt to convert extracted values to the appropriate type when they are used in an operation. For example, if an extracted string can be parsed as a number and is used in a numeric operation, it should be converted automatically.
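To make the request concrete, here is a minimal Python sketch of the coercion rule described above. It is not the engine's implementation: `coerce_numeric` and `multiply` are hypothetical names, and the choice to try an integer parse before a floating-point parse is an assumption. Note that `\d*` can match an empty string, so the sketch treats an empty capture as "no value" instead of raising a parse error.

```python
import re
from typing import Optional, Union

Number = Union[int, float]

# Strict ASCII-only numeric shapes, so that "1_000", "nan" or "inf" are not accepted.
_INT = re.compile(r"[+-]?\d+", re.ASCII)
_FLOAT = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?", re.ASCII)


def coerce_numeric(value: object) -> Optional[Number]:
    """Return value as int or float if it is numeric or looks numeric, else None."""
    if isinstance(value, bool):
        return None  # booleans are not treated as numbers in this sketch
    if isinstance(value, (int, float)):
        return value
    if not isinstance(value, str):
        return None
    text = value.strip()
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return None  # includes the empty string that \d* may capture


def multiply(left: object, right: object) -> Number:
    """Hypothetical MULTIPLY that coerces string operands before multiplying."""
    a, b = coerce_numeric(left), coerce_numeric(right)
    if a is None or b is None:
        raise TypeError(
            f"MULTIPLY expects numeric operands, got {left!r} and {right!r}"
        )
    return a * b


print(multiply("1000", 10))  # 10000
```

Under this rule, a value such as `"abc"` still produces a type error, so only strings that parse cleanly as numbers change behavior.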

What alternatives have you considered?

  1. Explicit conversion function - Current workaround

    source=log00001 | rex field=v "value=(?<digits>\d*)" | eval multi=cast(digits as int) * 10
    
  2. Type specification in rex command: Allow users to specify the expected type of the extracted field directly in the rex command:

    source=log00001 | rex field=v "value=(?<digits:int>\d*)" | eval multi=digits * 10
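One way alternative 2 could work is sketched below in Python, purely as an illustration. The `name:type` group syntax is the proposal from this issue, not an existing feature; the helper names (`compile_typed`, `extract`) and the set of supported type names (`int`, `double`, `string`) are assumptions. The sketch strips the type annotation from each named group, remembers the requested cast, and applies it after matching. It does not handle escaped parentheses in the pattern.

```python
import re
from typing import Any, Callable, Dict, Optional, Pattern, Tuple

# Assumed type names; the issue only shows "int".
CASTS: Dict[str, Callable[[str], Any]] = {"int": int, "double": float, "string": str}

# Matches the proposed typed group opener, e.g. "(?<digits:int>".
_TYPED_GROUP = re.compile(r"\(\?<([A-Za-z_][A-Za-z0-9_]*):([A-Za-z]+)>")


def compile_typed(pattern: str) -> Tuple[Pattern[str], Dict[str, Callable[[str], Any]]]:
    """Split a typed pattern into a plain regex and a group-name -> cast map."""
    types: Dict[str, Callable[[str], Any]] = {}

    def strip_type(m: "re.Match[str]") -> str:
        name, type_name = m.group(1), m.group(2)
        if type_name not in CASTS:
            raise ValueError(f"unsupported type {type_name!r} for group {name!r}")
        types[name] = CASTS[type_name]
        return f"(?P<{name}>"  # Python spells named groups as (?P<name>...)

    plain = _TYPED_GROUP.sub(strip_type, pattern)
    # Translate remaining untyped (?<name>...) groups, leaving lookbehinds alone.
    plain = re.sub(r"\(\?<(?![=!])", "(?P<", plain)
    return re.compile(plain), types


def extract(pattern: str, text: str) -> Optional[Dict[str, Any]]:
    """Return the named groups of the first match, cast to their declared types."""
    regex, types = compile_typed(pattern)
    m = regex.search(text)
    if m is None:
        return None
    out: Dict[str, Any] = {}
    for name, raw in m.groupdict().items():
        cast = types.get(name)
        if raw is None or cast is None:
            out[name] = raw  # untyped groups stay strings
        elif raw == "":
            out[name] = None  # \d* may match nothing; avoid int("")
        else:
            out[name] = cast(raw)
    return out


fields = extract(r"value=(?<digits:int>\d*)", "value=1000")
print(fields["digits"] * 10)  # 10000
```

Compared with automatic inference, this keeps the conversion explicit and local to the extraction, at the cost of a new pattern syntax.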
    

Do you have any additional context?
This feature would significantly improve the user experience when working with log data, where numeric values are often embedded within text strings. Similar functionality exists in other log processing languages, which automatically handle type conversions in many contexts.

The ability to seamlessly extract and perform operations on numeric values would reduce query complexity and make the system more intuitive for users coming from other log analysis platforms.

Test dataset

###
PUT {{baseUrl}}/log00001
Content-Type: application/json

{
  "mappings": {
    "properties": {
      "v": {
        "type": "text"
      }
    }
  }
}

###
POST {{baseUrl}}/log00001/_bulk
Content-Type: application/x-ndjson

{"index": { "_id": 1 }}
{"v": "value=1000"}
{"index": { "_id": 2 }}
{"v": "value=2000"}
{"index": { "_id": 3 }}
{"v": "value=3000"}
{"index": { "_id": 4 }}
{"v": "value=4000"}
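
For reference, the values of `multi` that the failing query would be expected to produce on this dataset can be computed outside the engine. The Python snippet below is only a stand-in for the requested behavior: it applies the same pattern to the four documents above and multiplies the captured digits by 10. The `expected_multi` helper is hypothetical.

```python
import re
from typing import Dict, Optional

# The four documents from the bulk request above.
DOCS = [
    {"v": "value=1000"},
    {"v": "value=2000"},
    {"v": "value=3000"},
    {"v": "value=4000"},
]

# Same pattern as the rex command, written with Python's (?P<name>...) syntax.
PATTERN = re.compile(r"value=(?P<digits>\d*)")


def expected_multi(doc: Dict[str, str]) -> Optional[int]:
    """Extract digits from the v field and return digits * 10, or None if absent."""
    m = PATTERN.search(doc["v"])
    if m is None or m.group("digits") == "":
        return None
    return int(m.group("digits")) * 10


print([expected_multi(doc) for doc in DOCS])  # [10000, 20000, 30000, 40000]
```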

Labels

PPL (Piped processing language), enhancement (New feature or request)