@bmtcril bmtcril commented Dec 11, 2023

Apologies for the massive PR, this has been getting heavy iteration while I've been load testing for the last several weeks. The culmination of this work is a much faster, more reproducible, and more realistic set of data that exercises all of the existing Aspects Instructor Dashboard reports.

The changes fall into a few buckets:

  • Move settings to a config file format; there were so many that they had become impossible to manage on the command line
  • Remove schema management; you must now pre-create the schema using Aspects
  • Refactor settings to use explicit numbers of actors/enrollments/courses/blocks instead of randomly assigning them, and make them configurable for explicitly testing different load factors
  • Refactor actor usage; actors now have enrollment dates for each course, are shared across courses more effectively, and have fake PII / external IDs
  • Update course block data; blocks now contain a fair approximation of the section/subsection/unit structure Aspects expects
  • Update some xAPI statements to match current ERB output
  • Add the ability to write CSVs directly to S3 (1B events is ~213GB!)
  • Add the ability to import CSVs directly from S3 to ClickHouse for faster and more stable iteration on schema changes
  • Add a forum post statement type
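A config file covering the settings described above might look something like this. This is a hedged illustration only: every key name below is hypothetical, not the tool's actual schema.

```yaml
# Hypothetical sketch of a settings file -- key names are illustrative,
# not the project's real config schema.
backend: clickhouse            # or: csv, ralph
num_actors: 10000              # explicit counts instead of random assignment
num_courses: 100
num_batches: 10
batch_size: 10000
csv_output_destination: s3://my-bucket/xapi-load/   # direct-to-S3 CSV output
```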

Works in ClickHouse and CSV, fails in Ralph right now. The learners-in-course size markup is not respected yet.
Actors are now assigned to courses with a registration date less than the course end date. Their events should all be between their registration date and the course end date. All actors are registered as part of setup.
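The date constraints described above can be sketched as follows. This is a minimal illustration of the sampling logic, not the project's actual code; function names are invented for the example.

```python
import random
from datetime import datetime, timedelta

def sample_registration_date(course_start: datetime, course_end: datetime) -> datetime:
    """Pick a registration date for an actor, strictly before the course end."""
    span_days = (course_end - course_start).days
    offset = random.randint(0, max(span_days - 1, 0))
    return course_start + timedelta(days=offset)

def sample_event_time(registration: datetime, course_end: datetime) -> datetime:
    """Pick an event timestamp between registration and the course end."""
    span_seconds = (course_end - registration).total_seconds()
    return registration + timedelta(seconds=random.uniform(0, span_seconds))
```

Generating all registrations up front, as the commit does, means every later event emission only needs the per-actor window above.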
Small updates to existing events to be more realistic and match what is actually being produced. Also refactors to remove "known_" prefixes, since everything is generated up front now.
Still works for local files, but also allows direct writes to (for example) S3, so we don't write gigabytes of files locally and then have to upload them. This seems to take about 30% longer than just writing locally, but if you include the upload time it saves about 50%.
Given a location and credentials, this can do a direct import from S3 to ClickHouse, which is very fast.
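ClickHouse supports this kind of bulk load natively through its `s3()` table function. A sketch, with placeholder table name, bucket path, and credentials:

```sql
-- Sketch: bulk-load CSVs straight from S3 into ClickHouse.
-- Table name, bucket URL, and credentials are placeholders.
INSERT INTO xapi.events
SELECT *
FROM s3(
    'https://my-bucket.s3.amazonaws.com/xapi-load/*.csv',
    '<access_key_id>', '<secret_access_key>',
    'CSVWithNames'
);
```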
That wasn't used in most cases, and since we now explicitly define the distributions we no longer need it.
@bmtcril force-pushed the bmtcril/config_from_file branch from 460bf74 to a70b0fb on December 11, 2023 21:18
@Ian2012 Ian2012 left a comment

Overall looks good to me; still pending testing with S3 and k8s.

Ian2012 commented Dec 11, 2023

@bmtcril To avoid issues with updates to ERB, should we make it obligatory to update this repo whenever a contribution or change is made there?

bmtcril commented Dec 19, 2023

@Ian2012 I don't think we need to block ERB on this project. It's generally OK for it to fall a little behind, since it's not a validation tool, just a best-approximation load testing tool.

@Ian2012 Ian2012 left a comment

Tested locally with the S3, CSV, ClickHouse, and Ralph backends.

@bmtcril merged commit 69d1c9f into main on December 20, 2023, and deleted the bmtcril/config_from_file branch at 17:43.