dnhatn (Member) commented Dec 5, 2018

This change adds rolling-upgrade and full-cluster-restart tests with soft-deletes enabled.

Note that these tests currently fail because we do not allow updating DocValues of old segments (see https://issues.apache.org/jira/browse/LUCENE-8461). I think we need to remove this restriction.

dnhatn added the >test, :Distributed Indexing/Recovery, and v7.0.0 labels Dec 5, 2018
dnhatn requested review from bleskes, jpountz and s1monw December 5, 2018 23:29
elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed

dnhatn (Member, Author) commented Dec 5, 2018

Below is the stack trace from the failing full-cluster-restart test.

|    [2018-12-06T00:26:32,420][WARN ][o.e.c.r.a.AllocationService] [node-1] failing shard [failed shard, shard [testsoftdeletes][0], node[RgRgfC1OQeCSThD4uugJwg], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=zZzX_JD4SmSW8qUgd2Beqg], unassigned_info[[reason=ALLOCATION_FAILED], at[2018-12-05T23:26:32.330Z], failed_attempts[4], delayed=false, details[failed shard on node [7XyQZMeFSYmCqmCUl62NBQ]: shard failure, reason [lucene commit failed], failure IllegalStateException[This codec should only be used for reading, not writing]], allocation_status[fetching_shard_data]], message [shard failure, reason [lucene commit failed]], failure [IllegalStateException[This codec should only be used for reading, not writing]], markAsStale [true]]
|    java.lang.IllegalStateException: This codec should only be used for reading, not writing
|       at org.apache.lucene.codecs.lucene70.Lucene70Codec$2.getDocValuesFormatForField(Lucene70Codec.java:69) ~[lucene-backward-codecs-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.getInstance(PerFieldDocValuesFormat.java:168) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:109) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.index.ReadersAndUpdates.handleDVUpdates(ReadersAndUpdates.java:368) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:570) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.index.ReaderPool.commit(ReaderPool.java:325) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3328) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3239) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3466) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3431) ~[lucene-core-8.0.0-snapshot-c78429a554.jar:8.0.0-snapshot-c78429a554 c78429a554d28611dacd90c388e6c34039b228d1 - romseygeek - 2018-12-04 10:17:44]
|       at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2324) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslogInternal(InternalEngine.java:451) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:411) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.engine.InternalEngine.recoverFromTranslog(InternalEngine.java:109) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1348) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:424) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1622) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$6(IndexShard.java:2115) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:660) ~[elasticsearch-7.0.0-SNAPSHOT.jar:7.0.0-SNAPSHOT]
|       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
|       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
|       at java.lang.Thread.run(Thread.java:834) [?:?]

s1monw (Contributor) commented Dec 6, 2018

The test LGTM. We need to fix this in Lucene; updatable DVs need this behavior too. @jpountz any objections?

jpountz (Contributor) commented Dec 6, 2018

Agreed, this needs fixing. Simon and I just had a discussion: I was surprised that this fails even though we have tests checking that DV updates work on old indices, but those tests only seem to cover pre-existing fields.
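
To make the gap concrete, here is a minimal Java sketch of the failure mode, written as a simplified model and not as Lucene's actual code. It assumes that an update to a field already present in an old segment reuses the format name recorded with that field, while an update to a field the segment has never seen asks the codec for a format, and a read-only backward-compatibility codec rejects that request. The class, method, and field names (`ReadOnlyCodecSketch`, `resolveFormat`, `price`, `__soft_deletes`) are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadOnlyCodecSketch {

    /** Hypothetical stand-in for a backward-compatibility codec: it can read old segments but refuses to write. */
    public static String formatForNewField(String field) {
        throw new IllegalStateException("This codec should only be used for reading, not writing");
    }

    /**
     * Picks the doc-values format for an update against an old segment (assumed behavior).
     * recordedFormats maps each field already present in the segment to the format name stored with it.
     */
    public static String resolveFormat(Map<String, String> recordedFormats, String field) {
        String recorded = recordedFormats.get(field);
        if (recorded != null) {
            return recorded; // pre-existing field: reuse the format recorded in the segment
        }
        return formatForNewField(field); // new field: falls through to the read-only codec
    }

    /** Returns the resolved format name, or the failure message if resolution throws. */
    public static String tryUpdate(Map<String, String> recordedFormats, String field) {
        try {
            return resolveFormat(recordedFormats, field);
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Map<String, String> oldSegment = new HashMap<>();
        oldSegment.put("price", "Lucene70"); // field that existed when the segment was written

        System.out.println(tryUpdate(oldSegment, "price"));          // update on a pre-existing field
        System.out.println(tryUpdate(oldSegment, "__soft_deletes")); // update on a field new to the segment
    }
}
```

Under those assumptions, a test that only updates `price` passes, while the first update to a field that is new to the same old segment fails with the message seen in the stack trace above.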

s1monw (Contributor) commented Dec 6, 2018

@dnhatn I fixed the issue in Lucene. Can you add a new snapshot build?

s1monw (Contributor) left a comment

LGTM once the test passes

dnhatn added a commit that referenced this pull request Dec 7, 2018
Includes:

LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates

Relates #36286

dnhatn (Member, Author) commented Dec 7, 2018

Thanks @s1monw and @jpountz.

dnhatn merged commit 968b0b1 into elastic:master Dec 7, 2018
dnhatn deleted the soft-deletes-upgrade-test branch December 7, 2018 08:01
dnhatn added a commit that referenced this pull request Dec 9, 2018
This change adds a rolling-upgrade and full-cluster-restart test with
jasontedor added a commit to liketic/elasticsearch that referenced this pull request Dec 9, 2018
* elastic/6.x: (37 commits)
  [HLRC] Added support for Follow Stats API (elastic#36253)
  Exposed engine must have all ops below gcp during rollback (elastic#36159)
  TEST: Always enable soft-deletes in ShardChangesTests
  Use delCount of SegmentInfos to calculate numDocs (elastic#36323)
  Add soft-deletes upgrade tests (elastic#36286)
  Remove LocalCheckpointTracker#resetCheckpoint (elastic#34667)
  Option to use endpoints starting with _security (elastic#36379)
  [CCR] Restructured QA modules (elastic#36404)
  RestClient: on retry timeout add root exception (elastic#25576)
  [HLRC] Add support for put privileges API (elastic#35679)
  HLRC: Add rollup search (elastic#36334)
  Explicitly recommend to forceMerge before freezing (elastic#36376)
  Rename internal repository actions to be internal (elastic#36377)
  Core: Remove parseDefaulting from DateFormatter (elastic#36386)
  [ML] Prevent stack overflow while copying ML jobs and datafeeds (elastic#36370)
  Docs: Fix Jackson reference (elastic#36366)
  [ILM] Fix issue where index may not yet be in 'hot' phase (elastic#35716)
  Undeprecate /_watcher endpoints (elastic#36269)
  Docs: Fix typo in bool query (elastic#36350)
  HLRC: Add delete template API (elastic#36320)
  ...

Labels

:Distributed Indexing/Recovery, >test, v6.6.0, v7.0.0-beta1

5 participants