+ * This method infers the field type by examining the _source documents. For a given value, type inference is similar to the dynamic mapping type guessing logic.
+ * Instead of guessing the type based on the first document, it generates a random sample of documents to make a more accurate inference.
+ * This approach is particularly useful when dealing with missing fields, which are common in nested fields within derived fields of object types.
+ *
+ * <p>The sample size should be selected carefully to ensure a high probability of selecting at least one document where the field is present.
+ * However, a large sample size can hurt performance, because the _source field is loaded and examined for every sampled document.
+ *
+ * <p>The problem of determining the sample size (S) is akin to deciding how many balls to draw from a bin,
+ * ensuring a high probability (>=P) of drawing at least one green ball (documents with the field) from a mixture of
+ * R red balls (documents without the field) and G green balls:
+ * <pre>
+ * 1 - C(R, S) / C(R + G, S) >= P
+ * </pre>
+ * Where C() is the binomial coefficient.
+ * For high confidence, we want P >= 0.95.
+ */
+
 public class FieldTypeInference {
     private final IndexReader indexReader;
     private final String indexName;
     private final MapperService mapperService;
     // TODO expose using an index setting?
     private int sampleSize;
-
-    // this will lead to the probability of more than 0.95 to select on the document containing this field,
-    // when at least 5% of the overall documents contain the field
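
The sample-size condition described in the Javadoc can be explored numerically. The following is a minimal sketch, not code from this change: the class `SampleSizeSketch` and its method `minSampleSize` are hypothetical names, and it assumes documents are sampled uniformly without replacement. It uses the identity C(R, S) / C(R + G, S) = ((R - 0) / (R + G - 0)) * ... * ((R - S + 1) / (R + G - S + 1)) so that no large binomial coefficients have to be computed.

```java
// Hypothetical helper for illustration only; it is not part of FieldTypeInference.
final class SampleSizeSketch {

    /**
     * Returns the smallest sample size S such that
     * 1 - C(R, S) / C(R + G, S) >= P,
     * where R documents lack the field and G documents contain it.
     * Assumes uniform sampling without replacement.
     */
    static long minSampleSize(long red, long green, double p) {
        if (red < 0 || green <= 0) {
            throw new IllegalArgumentException("need red >= 0 and green > 0");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        // miss = probability that none of the first s draws is green.
        double miss = 1.0;
        for (long s = 1; s <= red + green; s++) {
            long drawn = s - 1; // balls already drawn, all red so far
            // Probability that the next draw is red as well.
            miss *= (double) Math.max(red - drawn, 0) / (double) (red + green - drawn);
            if (1.0 - miss >= p) {
                return s;
            }
        }
        // Drawing every ball always includes a green one.
        return red + green;
    }

    public static void main(String[] args) {
        // Example inputs: 5% of 1000 documents contain the field, target P = 0.95.
        System.out.println(minSampleSize(950, 50, 0.95));
    }
}
```

The loop stops as soon as the target probability is reached, so the cost is proportional to the resulting sample size rather than to the number of documents.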