Skip to content

Conversation

@herkolategan
Copy link
Collaborator

@herkolategan herkolategan commented Oct 21, 2023

This PR contain several commits that ultimately enables the probabilistic selection of a shared process test virtual cluster that tests run against, iff the test also probabilistically decided to run with a test tenant. Previously only an external process virtual cluster would be available as an option when deciding to run with a test virtual cluster.

Additionally some API changes have been made to serverutils to encourage the use of the test interface layers when working with a test server or cluster. Tests that use other utility methods to connect or create connection URLs may not be aware of a test virtual cluster and lack required information when interacting with the cluster resulting in flakiness and failures.

Finally this change also disables selecting a shared process virtual cluster instead of an external process virtual cluster for tests which fail with a shared process virtual cluster. An issue has been created to track these tests: #112857
Issues linked to this issue will contain more detail relating to the failure. And the issue has been referenced in the code itself when disabling a shared process virtual cluster for a specific test.

Resolves: #111134

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@herkolategan herkolategan added the do-not-merge bors won't merge a PR with this label. label Oct 21, 2023
@herkolategan herkolategan force-pushed the hbl/vc-prob-shared-process-ext branch 9 times, most recently from d2cddbd to 3181940 Compare October 23, 2023 12:06
@herkolategan herkolategan changed the title serverutils: probabilistic shared mode process serverutils: probabilistic shared process virtual cluster Oct 23, 2023
@herkolategan herkolategan force-pushed the hbl/vc-prob-shared-process-ext branch from 2c67f9a to 53ceef5 Compare October 23, 2023 12:39
@herkolategan herkolategan force-pushed the hbl/vc-prob-shared-process-ext branch from ae31890 to 8796cd9 Compare October 24, 2023 13:05
@herkolategan herkolategan removed the do-not-merge bors won't merge a PR with this label. label Oct 24, 2023
@herkolategan herkolategan force-pushed the hbl/vc-prob-shared-process-ext branch 2 times, most recently from b5b461b to a03af6d Compare October 24, 2023 16:31
@herkolategan herkolategan marked this pull request as ready for review October 24, 2023 17:08
@herkolategan herkolategan requested a review from a team October 24, 2023 17:08
@herkolategan herkolategan requested review from a team as code owners October 24, 2023 17:08
@herkolategan herkolategan requested a review from a team October 24, 2023 17:08
@herkolategan herkolategan requested review from a team as code owners October 24, 2023 17:08
@herkolategan herkolategan force-pushed the hbl/vc-prob-shared-process-ext branch from a03af6d to 30f8feb Compare October 26, 2023 15:41
@herkolategan herkolategan requested a review from a team as a code owner October 27, 2023 12:51
Copy link
Collaborator Author

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @DarrylWong, @dhartunian, @jayshrivastava, @srosenberg, and @yuzefovich)


pkg/server/testserver.go line 743 at r3 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

I agree and I had the same thought, my only rotational was that I didn't fully comprehend the differences in capabilities between external and shared. But it's clear now that it's the same, except for I think can debug process works on external without explicitly setting it here. Busy merging these two, as well as taking the comment around selective capabilities into account.

Merged the two, but there is still a bit of work left I intend to complete in a follow-up PR: #107695, #113187


pkg/server/testserver.go line 763 at r3 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Will be done in addition to removing the duplication of process capabilities. I'll just ask for a second look once I'm done.

I've added the check you have mentioned as well.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few more comments, but it :lgtm: It might still be worth it to wait for @stevendanna to take a look at least at the testserver changes.

Reviewed 48 of 48 files at r6, 4 of 4 files at r7, 1 of 1 files at r8, 25 of 25 files at r9, 21 of 21 files at r10, 1 of 1 files at r11, 2 of 2 files at r12, 2 of 2 files at r13, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @DarrylWong, @dhartunian, @herkolategan, @jayshrivastava, and @srosenberg)


pkg/server/testserver.go line 762 at r12 (raw file):

	}

	// Waiting for capabilities can time To avoid paying this cost in all

nit: s/can time/can time./.


pkg/testutils/serverutils/test_server_shim.go line 105 at r6 (raw file):

		shared = false
	default:
		shared = util.ConstantWithMetamorphicTestBoolWithoutLogging("test-tenant-shared-process", false)

nit: what's the rationale for using metamorphic helper here? I'd have expected that here we'd give each mode 50% probability, and the current way will pick the external process more frequently.


pkg/testutils/serverutils/test_server_shim.go line 115 at r6 (raw file):

	// Obey the env override if present.
	if str, present := envutil.EnvString("COCKROACH_TEST_TENANT", 0); present {

It might be nice to have a way to force the default test tenant with a particular process mode. We could either adjust the existing env var to take in a string or something, or introduce another bool env var that controls whether the shared process is picked (I'd probably prefer the latter option slightly), or perhaps in some other fashion. This doesn't have to be in this PR, but please leave a TODO for this if it's not done.


pkg/sql/tests/show_commit_timestamp_test.go line 80 at r9 (raw file):

	testutils.RunTrueAndFalse(t, "pgx batch; simple", func(t *testing.T, simple bool) {
		resetTable(t)
		pgURL, cleanup := s.PGUrl(t)

nit: add .ApplicationLayer() here and below.

@dhartunian dhartunian removed their request for review October 30, 2023 04:40
@blathers-crl
Copy link

blathers-crl bot commented Oct 30, 2023

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link
Collaborator Author

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @DarrylWong, @jayshrivastava, @srosenberg, and @yuzefovich)


pkg/server/testserver.go line 762 at r12 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/can time/can time./.

done.


pkg/testutils/serverutils/test_server_shim.go line 105 at r6 (raw file):

metamorphicBuildProbability = 0.8

Ah right, metamorphic eligible builds don't always run metamorphic, so the 50% bool selection is slanted ~40% for the default case.
I'll update this to use the randutil.


pkg/testutils/serverutils/test_server_shim.go line 115 at r6 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It might be nice to have a way to force the default test tenant with a particular process mode. We could either adjust the existing env var to take in a string or something, or introduce another bool env var that controls whether the shared process is picked (I'd probably prefer the latter option slightly), or perhaps in some other fashion. This doesn't have to be in this PR, but please leave a TODO for this if it's not done.

Good idea; adding a TODO.


pkg/sql/tests/show_commit_timestamp_test.go line 80 at r9 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: add .ApplicationLayer() here and below.

good catch, done.

Previously probabilistic test behaviour args did not discern if a virtual
cluster should be started as an external or shared process.

This change updates `testBehaviour` to account for permutations of the various
possible scenarios. It has been updated to a bitmask to allow probabilistic
enablement and probabilistic process modes. Any bit groups not set are assumed
to be probabilistic.

Explicit modes will have both an enablement bit (`ttEnabled`, `ttDisabled`) and
process mode (`ttSharedProcess`, `ttExternalProcess`) bit set. If the process
mode bit is left unset it will be selected probabilistically in
`ShouldStartDefaultTestTenant`.

Additionally, functions have been added to disable probabilistically selecting a
shared process virtual cluster for tests that do not work with a shared process
virtual cluster yet.

Epic: CRDB-31933
Release note: None
Previously some tests used `sqlutils.PGUrl` as a means to form a connection.
This worked well before the introduction of probabilistically testing with
shared virtual clusters. The URL for a shared process requires a cluster option
to be passed in this case.

In order to avoid branching logic in tests when a test virtual cluster is
present, this change adds a `PGUrl` method to the `ApplicationLayerInterface` so
that a decision can be made to add the cluster option to the URL without the
test having to specify it. A new option is added to be able to specify the dir
prefix for the certs directory as well.

Epic: CRDB-31933
Release note: None
Previously `maybeStartDefaultTestTenant` was only able to start an external
process virtual cluster. This change updates the method to be able to start a
shared process virtual cluster if the test params specify it.

The shared process test virtual cluster is granted default capabilities, similar
to the external process test virtual cluster.

Epic: CRDB-31933
Release note: None
Tests using `sqlutils.PGUrl` should be updated to rather use `PGUrl` or
`SQLConn` on the `ApplicationLayerInterface` to take advantage of the URL
automatically including the cluster option in the event the connection is to a
shared process virtual cluster.

Previously these tests would have failed if a shared process virtual cluster is
probabilistically started, since the URL would not contain the required cluster
option, when trying to connect.

Epic: CRDB-31933
Release note: None
This change disables probabilistically selecting a shared process test virtual
cluster for tests which fail when running against a shared process virtual
cluster.

See also: cockroachdb#112857

Epic: CRDB-26687
Release note: None
Previously a CCL license was required for the default test tenant to be started.
This requirement no longer holds true, hence the comment has been updated.

See also: cockroachdb#108762

Epic: CRDB-31933
Release note: None
Previously shared and external process tenants had separate sections for tenant
capabilities and cluster settings set during initialization. This change
combines the code to avoid duplication.

Epic: CRDB-31933
Release note: None
…est tenant

Disable tests which require specific capabilities for the default shared process
test tenant. The tests disabled in this change specifically require the
`can_debug_process` capability that is available with an external process
tenant, but not with a shared process tenant.

The shared process default test tenant can be re-enabled after the issue
mentioned below is addressed.
See also: cockroachdb#113187

Epic: CRDB-31933
Release note: None
Previously `StartSharedProcessTenant` did not have any default cluster settings
or capabilities. It now does, hence we need to remove this expectation from the
test as it uses the same logic to start a shared process virtual cluster.

See also: cockroachdb#113187

Epic: CRDB-31933
Release note: None
Use `randutil` to select a process mode probabilistically instead of a
metamorphic constant, as the latter will not yield an even selection when the
test virtual cluster is enabled.

Additionally, a comment has been added to note that an env var should be added
to be able to override the process mode if required.

See also: cockroachdb#113294

Epic: CRDB-31933
Release note: None
Epic: CRDB-31933
Release note: None
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this landed. Thanks for pushing this forward and working to integrate it nicely into the existing test-server setup.

@herkolategan
Copy link
Collaborator Author

TFTRs!

bors r=yuzefovich,stevendanna

@craig
Copy link
Contributor

craig bot commented Nov 1, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

multitenant: Randomly test over all three tenancy configurations

4 participants