36 changes: 30 additions & 6 deletions Cargo.lock

42 changes: 39 additions & 3 deletions docs/configuration/index-config.md
@@ -84,11 +84,11 @@ Today, only the s3 storage is available when running several searcher nodes.

## Doc mapping

The doc mapping defines how a document and the fields it contains are stored and indexed for a given index. A document is a collection of named fields, each having its own data type (text, binary, i64, u64, f64).
The doc mapping defines how a document and the fields it contains are stored and indexed for a given index. A document is a collection of named fields, each having its own data type (text, binary, datetime, i64, u64, f64).

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `field_mappings` | Collection of field mapping, each having its own data type (text, binary, i64, u64, f64). | [] |
| `field_mappings` | Collection of field mappings, each having its own data type (text, binary, datetime, i64, u64, f64). | [] |
| `mode` | Defines how quickwit should handle document fields that are not present in the `field_mappings`. In particular, the "dynamic" mode makes it possible to use quickwit in a schemaless manner. (See [mode](#mode)) | `lenient`
| `dynamic_mapping` | This parameter is only allowed when `mode` is set to `dynamic`. It then defines whether dynamically mapped fields should be indexed, stored, etc. | (See [mode](#mode))
| `tag_fields` | Collection of fields already defined in `field_mappings` whose values will be stored in a dedicated `tags` (1) | [] |
@@ -99,7 +99,7 @@ The doc mapping defines how a document and the fields it contains are stored and
### Field types

Each field has a type that indicates the kind of data it contains, such as integer on 64 bits or text.
Quickwit supports the following raw types `text`, `i64`, `u64`, `f64`, and `bytes`, and also supports composite types such as array and object. Behind the scenes, Quickwit is using tantivy field types, don't hesitate to look at [tantivy documentation](https://github.com/tantivy-search/tantivy) if you want to go into the details.
Quickwit supports the following raw types: `text`, `i64`, `u64`, `f64`, `datetime`, and `bytes`. It also supports composite types such as array and object. Behind the scenes, Quickwit uses tantivy field types; see the [tantivy documentation](https://github.com/tantivy-search/tantivy) for more details.

### Raw types

@@ -172,6 +172,42 @@ fast: true
| `indexed` | Whether value is indexed | `true` |
| `fast` | Whether value is stored in a fast field | `false` |

#### `datetime` type

The `datetime` type accepts multiple input formats and a storage precision. The following formats are supported, but they must be explicitly listed in the configuration:
- `rfc3339`, `rfc2822`, `iso8601`: parse dates using the standard format of the same name.
- `strftime`: parse dates using a Unix [strftime](https://man7.org/linux/man-pages/man3/strftime.3.html) format string, such as `"%Y %m %d %H:%M:%S %z"` in the example below.
- `unix_ts_secs`, `unix_ts_millis`, `unix_ts_micros`: parse dates from numeric Unix timestamps in seconds, milliseconds, or microseconds. Only one of these can be used in a configuration. If none is specified, `unix_ts_secs` is added to the list by default.
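
A minimal sketch of the two Unix timestamp rules, assuming "if none is specified" means "if no `unix_ts_*` format appears in `input_formats`". The `InputFormat` enum and the `validate_input_formats` function are hypothetical names chosen for illustration, not Quickwit's actual types.

```rust
/// Hypothetical names for the documented `input_formats` values.
#[derive(Debug, Clone, PartialEq, Eq)]
enum InputFormat {
    Rfc3339,
    Rfc2822,
    Iso8601,
    /// A strftime-style pattern such as "%Y %m %d %H:%M:%S %z".
    Strftime(String),
    UnixTsSecs,
    UnixTsMillis,
    UnixTsMicros,
}

impl InputFormat {
    fn is_unix_ts(&self) -> bool {
        matches!(
            self,
            InputFormat::UnixTsSecs | InputFormat::UnixTsMillis | InputFormat::UnixTsMicros
        )
    }
}

/// Applies the documented rules (under the reading stated above):
/// reject more than one `unix_ts_*` format, and append `unix_ts_secs`
/// when no `unix_ts_*` format was given.
fn validate_input_formats(mut formats: Vec<InputFormat>) -> Result<Vec<InputFormat>, String> {
    let num_unix = formats.iter().filter(|format| format.is_unix_ts()).count();
    if num_unix > 1 {
        return Err(format!("expected at most one unix timestamp format, got {num_unix}"));
    }
    if num_unix == 0 {
        formats.push(InputFormat::UnixTsSecs);
    }
    Ok(formats)
}
```

For instance, `[rfc3339]` becomes `[rfc3339, unix_ts_secs]`, which matches the `input_formats` default listed in the parameters table, while `[unix_ts_secs, unix_ts_millis]` is rejected.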

:::info
When accepting multiple formats, the corresponding parsers are tried in order they are declared.
Suggested change
When specifying multiple input formats, the corresponding parsers are tried in the order they are declared.

:::
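
The ordered fallback described in the note above can be sketched as follows. This is a simplified, dependency-free illustration: `InputFormat`, `InputValue`, and `parse_datetime` are hypothetical names; only the UTC `YYYY-MM-DDTHH:MM:SSZ` shape of RFC 3339 is handled; and results are expressed as microseconds since the Unix epoch by assumption. A real implementation would rely on a date library for the full formats.

```rust
/// A trimmed, hypothetical subset of the documented `input_formats`.
#[derive(Debug, Clone, Copy)]
enum InputFormat {
    /// Only the UTC `YYYY-MM-DDTHH:MM:SSZ` shape of RFC 3339 (simplification).
    Rfc3339Utc,
    UnixTsSecs,
    UnixTsMillis,
}

/// A JSON-like input value: datetimes arrive either as strings or as numbers.
enum InputValue<'a> {
    Str(&'a str),
    Int(i64),
}

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let doy = (153 * ((month + 9) % 12) + 2) / 5 + day - 1; // day of year, starting in March
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parses `YYYY-MM-DDTHH:MM:SSZ` into microseconds since the Unix epoch.
fn parse_rfc3339_utc(s: &str) -> Option<i64> {
    let b = s.as_bytes();
    if b.len() != 20
        || b[4] != b'-'
        || b[7] != b'-'
        || b[10] != b'T'
        || b[13] != b':'
        || b[16] != b':'
        || b[19] != b'Z'
    {
        return None;
    }
    // Reads a fixed-width, digits-only field.
    let field = |start: usize, end: usize| -> Option<i64> {
        let part = s.get(start..end)?;
        if part.bytes().all(|c| c.is_ascii_digit()) { part.parse().ok() } else { None }
    };
    let (year, month, day) = (field(0, 4)?, field(5, 7)?, field(8, 10)?);
    let (hour, minute, second) = (field(11, 13)?, field(14, 16)?, field(17, 19)?);
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) || hour > 23 || minute > 59 || second > 59 {
        return None;
    }
    let secs = days_from_civil(year, month, day) * 86_400 + hour * 3_600 + minute * 60 + second;
    Some(secs * 1_000_000)
}

/// Tries each format in declaration order and returns the first successful parse.
fn parse_datetime(value: &InputValue<'_>, formats: &[InputFormat]) -> Option<i64> {
    formats.iter().find_map(|format| match (format, value) {
        (InputFormat::Rfc3339Utc, InputValue::Str(s)) => parse_rfc3339_utc(s),
        (InputFormat::UnixTsSecs, InputValue::Int(n)) => n.checked_mul(1_000_000),
        (InputFormat::UnixTsMillis, InputValue::Int(n)) => n.checked_mul(1_000),
        _ => None,
    })
}
```

With formats `[Rfc3339Utc, UnixTsMillis]`, a string such as `"2024-01-01T00:00:00Z"` is handled by the first parser, while a number such as `1500` falls through to the millisecond parser.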

Example of a mapping for a datetime field:

```yaml
name: timestamp
type: datetime
input_formats:
- "rfc3339"
- "unix_ts_millis"
- "%Y %m %d %H:%M:%S %z"
precision: "milliseconds"
stored: true
indexed: true
fast: true
```

**Parameters for datetime field**

| Variable | Description | Default value |
| ------------- | ------------- | ------------- |
| `input_formats` | Formats used to parse input document datetime fields | [`rfc3339`, `unix_ts_secs`] |
| `precision` | The precision used to store the underlying fast value | `seconds` |
| `stored` | Whether value is stored in the document store | `true` |
| `indexed` | Whether value is indexed | `true` |
| `fast` | Whether value is stored in a fast field | `false` |
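
To illustrate what the `precision` parameter does to the stored fast value, here is a small sketch. It assumes that values are truncated rather than rounded and that timestamps are held as microseconds since the Unix epoch. `seconds` and `milliseconds` appear in the text above; `microseconds` is an assumption. `DatePrecision` and `truncate_micros` are hypothetical names.

```rust
/// Hypothetical mirror of the `precision` values.
#[derive(Debug, Clone, Copy)]
enum DatePrecision {
    Seconds,
    Milliseconds,
    Microseconds,
}

/// Truncates a timestamp (microseconds since the Unix epoch) to `precision`.
/// `div_euclid` rounds toward negative infinity, so pre-1970 timestamps are
/// truncated to the start of their second or millisecond as well.
fn truncate_micros(timestamp_micros: i64, precision: DatePrecision) -> i64 {
    let unit = match precision {
        DatePrecision::Seconds => 1_000_000,
        DatePrecision::Milliseconds => 1_000,
        DatePrecision::Microseconds => 1,
    };
    timestamp_micros.div_euclid(unit) * unit
}
```

With the default `seconds` precision, two events 300 ms apart within the same second end up with the same fast-field value, which is the trade-off to keep in mind when choosing a coarser precision.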

#### `bytes` type
The `bytes` type accepts a binary value as a `Base64` encoded string.

2 changes: 1 addition & 1 deletion quickwit-core/Cargo.toml
@@ -26,7 +26,7 @@ quickwit-storage = { version = "0.3.1", path = "../quickwit-storage" }
rand = "0.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "2406d92", default-features = false, features = [
tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "d24f31f", default-features = false, features = [
"mmap",
"lz4-compression",
"zstd-compression",
2 changes: 1 addition & 1 deletion quickwit-directories/Cargo.toml
@@ -19,7 +19,7 @@ quickwit-storage = { version = "0.3.1", path = "../quickwit-storage" }
serde = "1"
serde_cbor = "0.11"
serde_json = "1"
tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "2406d92", default-features = false, features = [
tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "d24f31f", default-features = false, features = [
"mmap",
"lz4-compression",
"zstd-compression",
8 changes: 4 additions & 4 deletions quickwit-directories/src/bundle_directory.rs
@@ -86,15 +86,15 @@ fn split_footer(file_slice: FileSlice) -> io::Result<(FileSlice, FileSlice)> {

/// Return two slices for given split: `[body and bundle meta data] [hotcache]`
pub fn get_hotcache_from_split(data: OwnedBytes) -> io::Result<OwnedBytes> {
let split_file = FileSlice::new(Box::new(data));
let split_file = FileSlice::new(Arc::new(data));
let (_, hotcache) = split_footer(split_file)?;
hotcache.read_bytes()
}

impl BundleDirectory {
/// Get files and their sizes in a split.
pub fn get_stats_split(data: OwnedBytes) -> io::Result<Vec<(PathBuf, u64)>> {
let split_file = FileSlice::new(Box::new(data));
let split_file = FileSlice::new(Arc::new(data));
let (body_and_bundle_metadata, hot_cache) = split_footer(split_file)?;
let file_offsets = BundleStorageFileOffsets::open(body_and_bundle_metadata)?;

@@ -128,9 +128,9 @@ impl BundleDirectory {
}

impl Directory for BundleDirectory {
fn get_file_handle(&self, path: &Path) -> Result<Box<dyn FileHandle>, OpenReadError> {
fn get_file_handle(&self, path: &Path) -> Result<Arc<dyn FileHandle>, OpenReadError> {
let file_slice = self.open_read(path)?;
Ok(Box::new(file_slice))
Ok(Arc::new(file_slice))
}

fn open_read(&self, path: &Path) -> Result<FileSlice, OpenReadError> {
6 changes: 3 additions & 3 deletions quickwit-directories/src/caching_directory.rs
@@ -85,7 +85,7 @@ impl fmt::Debug for CachingDirectory {
struct CachingFileHandle {
path: PathBuf,
cache: Arc<MemorySizedCache>,
underlying_filehandle: Box<dyn FileHandle>,
underlying_filehandle: Arc<dyn FileHandle>,
}

impl fmt::Debug for CachingFileHandle {
@@ -139,14 +139,14 @@ impl Directory for CachingDirectory {
fn get_file_handle(
&self,
path: &Path,
) -> std::result::Result<Box<dyn FileHandle>, OpenReadError> {
) -> std::result::Result<Arc<dyn FileHandle>, OpenReadError> {
let underlying_filehandle = self.underlying.get_file_handle(path)?;
let caching_file_handle = CachingFileHandle {
path: path.to_path_buf(),
cache: self.cache.clone(),
underlying_filehandle,
};
Ok(Box::new(caching_file_handle))
Ok(Arc::new(caching_file_handle))
}

fn atomic_read(&self, path: &Path) -> std::result::Result<Vec<u8>, OpenReadError> {
6 changes: 3 additions & 3 deletions quickwit-directories/src/debug_proxy_directory.rs
@@ -158,7 +158,7 @@ impl<D: Directory> DebugProxyDirectory<D> {

struct DebugProxyFileHandle<D: Directory> {
directory: DebugProxyDirectory<D>,
underlying: Box<dyn FileHandle>,
underlying: Arc<dyn FileHandle>,
path: PathBuf,
}

@@ -199,9 +199,9 @@ impl<D: Directory> HasLen for DebugProxyFileHandle<D> {
}

impl<D: Directory> Directory for DebugProxyDirectory<D> {
fn get_file_handle(&self, path: &Path) -> Result<Box<dyn FileHandle>, OpenReadError> {
fn get_file_handle(&self, path: &Path) -> Result<Arc<dyn FileHandle>, OpenReadError> {
let underlying = self.underlying.get_file_handle(path)?;
Ok(Box::new(DebugProxyFileHandle {
Ok(Arc::new(DebugProxyFileHandle {
underlying,
directory: self.clone(),
path: path.to_owned(),
5 changes: 2 additions & 3 deletions quickwit-directories/src/hot_directory.rs
@@ -414,7 +414,7 @@ impl fmt::Debug for HotDirectory {
}

impl Directory for HotDirectory {
fn get_file_handle(&self, path: &Path) -> Result<Box<dyn FileHandle>, OpenReadError> {
fn get_file_handle(&self, path: &Path) -> Result<Arc<dyn FileHandle>, OpenReadError> {
let file_length = self
.inner
.cache
@@ -427,7 +427,7 @@ impl Directory for HotDirectory {
static_cache: self.inner.cache.get_slice(path),
file_length,
};
Ok(Box::new(file_slice_with_cache))
Ok(Arc::new(file_slice_with_cache))
}

fn exists(&self, path: &std::path::Path) -> Result<bool, OpenReadError> {
@@ -474,7 +474,6 @@ pub fn write_hotcache<D: Directory>(
let schema = index.schema();
let reader: IndexReader = index
.reader_builder()
.num_searchers(1)
.reload_policy(ReloadPolicy::Manual)
.try_into()?;
let searcher = reader.searcher();
4 changes: 2 additions & 2 deletions quickwit-directories/src/storage_directory.rs
@@ -123,8 +123,8 @@ fn unsupported_operation(path: &Path) -> io::Error {
}

impl Directory for StorageDirectory {
fn get_file_handle(&self, path: &Path) -> Result<Box<dyn FileHandle>, OpenReadError> {
Ok(Box::new(StorageDirectoryFileHandle {
fn get_file_handle(&self, path: &Path) -> Result<Arc<dyn FileHandle>, OpenReadError> {
Ok(Arc::new(StorageDirectoryFileHandle {
storage_directory: self.clone(),
path: path.to_path_buf(),
}))
2 changes: 1 addition & 1 deletion quickwit-directories/src/union_directory.rs
@@ -71,7 +71,7 @@ fn convert_open_to_delete_error(open_err: OpenReadError) -> DeleteError {
}

impl Directory for UnionDirectory {
fn get_file_handle(&self, path: &Path) -> Result<Box<dyn FileHandle>, OpenReadError> {
fn get_file_handle(&self, path: &Path) -> Result<Arc<dyn FileHandle>, OpenReadError> {
let directory = self.find_directory_for_path(path)?;
directory.get_file_handle(path)
}
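
The hunks above switch every `get_file_handle` implementation from `Box<dyn FileHandle>` to `Arc<dyn FileHandle>`, following the tantivy revision bump. The sketch below does not claim to explain the upstream motivation. It only shows what `Arc` permits that `Box` does not: a wrapper in the spirit of `CachingFileHandle` can hold the underlying handle, and the resulting handle can be cloned cheaply and shared across threads. The `FileHandle` trait here is a simplified, hypothetical stand-in for tantivy's trait, whose real signature differs. `InMemoryHandle`, `WrappingHandle`, and `read_concurrently` are illustrative names.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical, simplified stand-in for tantivy's `FileHandle` trait
/// (the real trait has a different, richer signature).
trait FileHandle: Send + Sync {
    fn read_bytes(&self, start: usize, end: usize) -> Vec<u8>;
}

/// An in-memory handle used only for this illustration.
struct InMemoryHandle {
    data: Vec<u8>,
}

impl FileHandle for InMemoryHandle {
    fn read_bytes(&self, start: usize, end: usize) -> Vec<u8> {
        self.data[start..end].to_vec()
    }
}

/// A wrapper in the spirit of `CachingFileHandle`: it holds the underlying
/// handle as `Arc<dyn FileHandle>` and delegates reads to it.
struct WrappingHandle {
    underlying: Arc<dyn FileHandle>,
}

impl FileHandle for WrappingHandle {
    fn read_bytes(&self, start: usize, end: usize) -> Vec<u8> {
        self.underlying.read_bytes(start, end)
    }
}

/// Clones the handle (a reference-count bump, no data copy) into several
/// threads; with `Box<dyn FileHandle>` there would be a single owner.
fn read_concurrently(handle: Arc<dyn FileHandle>) -> Vec<Vec<u8>> {
    let workers: Vec<_> = (0..3)
        .map(|i| {
            let handle = Arc::clone(&handle);
            thread::spawn(move || handle.read_bytes(i, i + 2))
        })
        .collect();
    workers.into_iter().map(|worker| worker.join().unwrap()).collect()
}
```

Wrapping an `InMemoryHandle` over `b"abcdef"` and calling `read_concurrently` returns the three overlapping two-byte reads `ab`, `bc`, and `cd`.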
15 changes: 11 additions & 4 deletions quickwit-doc-mapper/Cargo.toml
@@ -12,24 +12,31 @@ documentation = "https://quickwit.io/docs/"
[dependencies]
anyhow = "1"
base64 = "0.13"
chrono = "0.4.19"
derivative = "2.2.0"
dyn-clone = "1.0.4"
fnv = "1"
indexmap = { version = "1.9.1", features = ["serde"] }
itertools = "0.10"
mockall = { version = "0.11", optional = true }
once_cell = "1.13"
quickwit-proto = { version = "0.3.1", path = "../quickwit-proto" }
regex = "1"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "2406d92", default-features = false, features = [
tantivy = { git = "https://github.com/quickwit-oss/tantivy/", rev = "d24f31f", default-features = false, features = [
"mmap",
"lz4-compression",
"zstd-compression",
"quickwit"
"quickwit",
] }
tantivy-query-grammar = { git = "https://github.com/quickwit-oss/tantivy/", rev = "2406d92" }
tantivy-query-grammar = { git = "https://github.com/quickwit-oss/tantivy/", rev = "d24f31f" }
thiserror = "1.0"
time = { version = "0.3.10", features = ["std", "macros"] }
tracing = "0.1.29"
typetag = "0.2"
quickwit-proto = { version = "0.3", path = "../quickwit-proto" }
unwrap-infallible = "0.1.5"


[dev-dependencies]
criterion = "0.3"