|
1 | | -Scripts for generating and loading test xAPI events |
2 | | -*************************************************** |
3 | | - |
4 | | -|pypi-badge| |ci-badge| |codecov-badge| |doc-badge| |pyversions-badge| |
5 | | -|license-badge| |status-badge| |
6 | | - |
| 1 | +Scripts for generating Aspects xAPI events |
| 2 | +****************************************** |
7 | 3 |
|
8 | 4 | Purpose |
9 | 5 | ======= |
| 6 | +This package generates a variety of test data used for integration and |
| 7 | +performance testing of Open edX Aspects. Currently it populates the following |
| 8 | +datasets: |
10 | 9 |
|
11 | | -Some test scripts to help make apples-to-apples comparisons of different |
12 | | -database backends for xAPI events. Supports direct database connections to |
13 | | -ClickHouse, and batch loading data to the Ralph Learning Record Store with the |
14 | | -ClickHouse backend. It also can create gzipped CSV files for bulk import to |
15 | | -other databases. |
| 10 | +- xAPI statements, simulating those generated by event-routing-backends |
| 11 | +- Course and learner data, simulating that generated by event-sink-clickhouse |
16 | 12 |
|
17 | | -xAPI events generated match the specifications of the Open edX |
| 13 | +The xAPI events generated match the current specifications of the Open edX |
18 | 14 | event-routing-backends package, but are not yet maintained to advance alongside |
19 | | -them. |
| 15 | +them, so they may fall out of sync over time. Almost all current
| 16 | +statements are simulated, but statements that are not yet used in Aspects
| 17 | +reporting have been skipped.
| 18 | + |
| 19 | +Features |
| 20 | +======== |
| 21 | +Once an appropriate database has been created using Aspects, data can be |
| 22 | +generated in the following ways: |
| 23 | + |
| 24 | +Ralph to ClickHouse |
| 25 | +------------------- |
| 26 | +Useful for testing configuration, integration, and permissions. This uses batch
| 27 | +POSTs to Ralph for xAPI statements, but still writes directly to ClickHouse for |
| 28 | +course and actor data. This is the slowest method, but exercises the largest |
| 29 | +surface area of the project. |
| 30 | + |
| 31 | +Direct to ClickHouse |
| 32 | +-------------------- |
| 33 | +Useful for getting a medium to large amount of data into the database to test |
| 34 | +configuration and view reports. xAPI statements are batched; other data is
| 35 | +currently inserted one row at a time. |
| 36 | + |
| 37 | +CSV files |
| 38 | +--------- |
| 39 | +Useful for creating datasets that can be reused for checking performance |
| 40 | +changes with the exact same data, and for extremely large tests. The files can |
| 41 | +be generated locally or on any service supported by smart_open. They can then |
| 42 | +optionally be imported to ClickHouse if written locally or to S3. They can also |
| 43 | +be directly imported from S3 to ClickHouse at any time using the |
| 44 | +``load-db-from-s3`` subcommand. This is by far the fastest method for large |
| 45 | +scale tests. |
| 46 | + |
20 | 47 |
|
21 | 48 | Getting Started |
22 | 49 | =============== |
23 | 50 |
|
24 | 51 | Usage |
25 | 52 | ----- |
26 | 53 |
|
27 | | -Details of how to run the current version of the script can be found by |
28 | | -executing: |
| 54 | +A configuration file is required to run a test. If no file is given, a small |
| 55 | +test will be run using the ``default_config.yaml`` included in the project:
29 | 56 |
|
30 | 57 | :: |
31 | 58 |
|
32 | | - ❯ xapi-db-load --help |
| 59 | + ❯ xapi-db-load load-db |
| 60 | + |
| 61 | +To specify a config file: |
| 62 | + |
| 63 | +:: |
| 64 | + |
| 65 | + ❯ xapi-db-load load-db --config_file private_configs/my_huge_test.yaml |
| 66 | + |
| 67 | +There is also a subcommand that only loads previously generated CSV data
| 68 | +from S3:
33 | 69 |
|
| 70 | +:: |
| 71 | + |
| 72 | + ❯ xapi-db-load load-db-from-s3 --config_file private_configs/my_s3_test.yaml |
| 73 | + |
| 74 | + |
| 75 | +Configuration Format |
| 76 | +-------------------- |
| 77 | +There are a number of different configuration options for tuning the output. |
| 78 | +In addition to the documentation below, there are example settings files to |
| 79 | +review in the ``example_configs`` directory. |
| 80 | + |
| 81 | +Common Settings |
| 82 | +^^^^^^^^^^^^^^^ |
| 83 | +These settings apply to all backends, and determine the size and makeup of the |
| 84 | +test:: |
| 85 | + |
| 86 | + # Location where timing logs will be saved |
| 87 | + log_dir: logs |
| 88 | + |
| 89 | +    # xAPI statements will be generated in batches; the total number of
| 90 | + # statements is ``num_batches * batch_size``. The batch size is the number |
| 91 | + # of statements sent to the backend (Ralph POST, ClickHouse insert, etc.) |
| 92 | + num_batches: 3 |
| 93 | + batch_size: 100 |
| 94 | + |
| 95 | + # Overall start and end date for the entire run. All xAPI statements |
| 96 | + # will fall within these dates. Different courses will have different start |
| 97 | +    # and end dates within this range, based on course_length_days below.
| 98 | + start_date: 2014-01-01 |
| 99 | + end_date: 2023-11-27 |
| 100 | + |
| 101 | +    # All courses will be this long and will be placed between start_date
| 102 | +    # and end_date, so this must be less than (end_date - start_date) days.
| 103 | + course_length_days: 120 |
| 104 | + |
| 105 | +    # The number of organizations; courses will be evenly spread among these
| 106 | + num_organizations: 3 |
| 107 | + |
| 108 | + # The number of learners to create, random subsets of these will be |
| 109 | + # "registered" for each course and have statements generated for them |
| 110 | + # between their registration date and the end of the course |
| 111 | + num_actors: 10 |
| 112 | + |
| 113 | + # How many of each size course to create. The sum of these is the total |
| 114 | +    # number of courses created for the test. The keys are arbitrary: you can
| 115 | + # name them whatever you like and have as many or few sizes as you like. |
| 116 | + # The keys must exactly match the definitions in course_size_makeup below. |
| 117 | + num_course_sizes: |
| 118 | + small: 1 |
| 119 | + medium: 1 |
| 120 | + ... |
| 121 | + |
| 122 | +    # Course type configurations: how many of each type of object are created
| 123 | + # for each course of this size. "actors" must be less than or equal to |
| 124 | + # "num_actors". Keys here must exactly match the keys in num_course_sizes. |
| 125 | + course_size_makeup: |
| 126 | + small: |
| 127 | + actors: 5 |
| 128 | + problems: 20 |
| 129 | + videos: 10 |
| 130 | + chapters: 3 |
| 131 | + sequences: 10 |
| 132 | + verticals: 20 |
| 133 | + forum_posts: 20 |
| 134 | + medium: |
| 135 | + actors: 7 |
| 136 | + problems: 40 |
| 137 | + videos: 20 |
| 138 | + chapters: 4 |
| 139 | + sequences: 20 |
| 140 | + verticals: 30 |
| 141 | + forum_posts: 40 |
| 142 | + ... |
| 143 | + |
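The constraints described in the Common Settings comments can be checked before a run starts. The following is a minimal sketch, not part of xapi-db-load: ``validate_config`` is a hypothetical helper, and it assumes the YAML has already been loaded into a dict (PyYAML's ``yaml.safe_load`` returns unquoted values such as ``2014-01-01`` as ``datetime.date`` objects).

```python
from datetime import date


def validate_config(config: dict) -> dict:
    """Check the Common Settings constraints and return derived totals.

    Hypothetical helper for illustration; not part of xapi-db-load.
    """
    # The total number of statements is num_batches * batch_size.
    total_statements = config["num_batches"] * config["batch_size"]

    # Every course must fit between start_date and end_date.
    span_days = (config["end_date"] - config["start_date"]).days
    if config["course_length_days"] >= span_days:
        raise ValueError(f"course_length_days must be less than {span_days}")

    # The keys of the two course size mappings must match exactly.
    sizes = config["num_course_sizes"]
    makeup = config["course_size_makeup"]
    if set(sizes) != set(makeup):
        raise ValueError("num_course_sizes and course_size_makeup keys differ")

    # A course size cannot register more learners than num_actors.
    for name, parts in makeup.items():
        if parts["actors"] > config["num_actors"]:
            raise ValueError(f"{name}: actors must be <= num_actors")

    return {
        "total_statements": total_statements,
        "total_courses": sum(sizes.values()),
    }


example = {
    "num_batches": 3,
    "batch_size": 100,
    "start_date": date(2014, 1, 1),
    "end_date": date(2023, 11, 27),
    "course_length_days": 120,
    "num_actors": 10,
    "num_course_sizes": {"small": 1, "medium": 1},
    # Only the "actors" entry is checked here; other keys are omitted.
    "course_size_makeup": {"small": {"actors": 5}, "medium": {"actors": 7}},
}
print(validate_config(example))  # {'total_statements': 300, 'total_courses': 2}
```

With the example values above, the run produces 3 * 100 = 300 statements across 2 courses.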
| 144 | +CSV Backend, Local Files |
| 145 | +^^^^^^^^^^^^^^^^^^^^^^^^ |
| 146 | +Generates gzipped CSV files to a local directory:: |
| 147 | + |
| 148 | + backend: csv_file |
| 149 | + csv_output_destination: logs/ |
| 150 | + |
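For readers unfamiliar with the output format, a gzipped CSV file can be produced with the Python standard library alone. This is a sketch, not the tool's own writer: the file name and row contents are hypothetical, and the tool itself is described as using smart_open, whose ``open()`` can write to remote destinations and infers gzip compression from a ``.gz`` extension.

```python
import csv
import gzip
import os


def write_gzipped_csv(directory: str, filename: str, rows) -> str:
    """Write rows to directory/filename as a gzip-compressed CSV file."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    # "wt" opens the gzip stream in text mode; csv expects newline="".
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return path


# Hypothetical file name and columns, for illustration only.
write_gzipped_csv("logs/", "example_events.csv.gz", [["event-1", "2014-01-01"]])
```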
| 151 | +CSV Backend, S3 Compatible Destination |
| 152 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 153 | +Generates gzipped CSV files in a remote location::
| 154 | + |
| 155 | + backend: csv_file |
| 156 | +    # This can be anything smart_open can handle (ex. a local directory or
| 157 | + # an S3 bucket etc.) but importing to ClickHouse using this tool only |
| 158 | + # supports S3 or compatible services like MinIO right now. |
| 159 | + # Note that this *must* be an s3:// link, https links will not work |
| 160 | + # https://pypi.org/project/smart-open/ |
| 161 | + csv_output_destination: s3://openedx-aspects-loadtest/logs/large_test/ |
| 162 | + |
| 163 | + # These settings are shared with the ClickHouse backend |
| 164 | + s3_key: |
| 165 | + s3_secret: |
| 166 | + |
| 167 | +CSV Backend, S3 Compatible Destination, Load to ClickHouse |
| 168 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 169 | +Generates gzipped CSV files to a remote location, then automatically loads |
| 170 | +them into ClickHouse::
| 171 | + |
| 172 | + backend: csv_file |
| 173 | + # csv_output_destination can be anything smart_open can handle, a local |
| 174 | + # directory or an S3 bucket etc., but importing to ClickHouse using this |
| 175 | + # tool only supports S3 or compatible services (ex: MinIO) right now |
| 176 | + # https://pypi.org/project/smart-open/ |
| 177 | + csv_output_destination: s3://openedx-aspects-loadtest/logs/large_test/ |
| 178 | + csv_load_from_s3_after: true |
| 179 | + |
| 180 | + # Note that this *must* be an https link, s3:// links will not work, |
| 181 | + # this must point to the same location as csv_output_destination. |
| 182 | + s3_source_location: https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/ |
| 183 | + |
| 184 | + # This also requires all of the ClickHouse backend variables! |
| 185 | + |
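Because ``s3_source_location`` must be an https link pointing at the same place as the ``s3://`` ``csv_output_destination``, one way to derive it is shown below. This is a hypothetical helper, and it assumes AWS S3's global virtual-hosted-style hostname (``<bucket>.s3.amazonaws.com``), which matches the example values above; MinIO and region-specific endpoints use different hostnames.

```python
from urllib.parse import urlparse


def s3_to_https(s3_url: str) -> str:
    """Convert s3://bucket/key/ into an AWS virtual-hosted-style https URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {s3_url!r}")
    # netloc holds the bucket name; path keeps its leading slash.
    return f"https://{parsed.netloc}.s3.amazonaws.com{parsed.path}"


print(s3_to_https("s3://openedx-aspects-loadtest/logs/large_test/"))
# https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/
```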
| 186 | +ClickHouse Backend |
| 187 | +^^^^^^^^^^^^^^^^^^ |
| 188 | +This ``backend`` value is only needed when writing directly to ClickHouse;
| 189 | +for Ralph or CSV integrations, use their ``backend`` values instead::
| 190 | + |
| 191 | + backend: clickhouse |
| 192 | + |
| 193 | +Variables necessary to connect to ClickHouse, whether directly, through Ralph, or |
| 194 | +as part of loading CSV files:: |
| 195 | + |
| 196 | + # ClickHouse connection variables |
| 197 | + db_host: localhost |
| 198 | + # db_port is also used to determine the "secure" parameter. If the port |
| 199 | + # ends in 443 or 440, the "secure" flag will be set on the connection. |
| 200 | + db_port: 8443 |
| 201 | + db_username: ch_admin |
| 202 | + db_password: secret |
| 203 | + |
| 204 | + # Schema name for the xAPI schema |
| 205 | + db_name: xapi |
| 206 | + |
| 207 | + # Schema name for the event sink schema |
| 208 | + db_event_sink_name: event_sink |
| 209 | + |
| 210 | + # These S3 settings are shared with the CSV backend, but passed to |
| 211 | + # ClickHouse when loading files from S3 |
| 212 | + s3_key: <...> |
| 213 | + s3_secret: <...> |
| 214 | + |
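The port rule described in the ``db_port`` comment can be expressed directly. Both helpers below are hypothetical; the keyword names in ``clickhouse_client_kwargs`` are chosen to match ``clickhouse_connect.get_client``, but which ClickHouse driver xapi-db-load actually uses is an assumption here.

```python
def is_secure_port(db_port) -> bool:
    """Apply the documented rule: ports ending in 443 or 440 are secure."""
    return str(db_port).endswith(("443", "440"))


def clickhouse_client_kwargs(config: dict) -> dict:
    """Map the config keys above to connection keyword arguments."""
    port = int(config["db_port"])
    return {
        "host": config["db_host"],
        "port": port,
        "username": config["db_username"],
        "password": config["db_password"],
        "database": config["db_name"],
        "secure": is_secure_port(port),
    }


print(is_secure_port(8443), is_secure_port(9440), is_secure_port(8123))
# True True False
```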
| 215 | +Ralph / ClickHouse Backend |
| 216 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 217 | +Variables necessary to send xAPI statements via Ralph:: |
| 218 | + |
| 219 | + backend: ralph_clickhouse |
| 220 | + lrs_url: http://ralph.tutor-nightly-local.orb.local/xAPI/statements |
| 221 | + lrs_username: ralph |
| 222 | + lrs_password: secret |
| 223 | + |
| 224 | + # This also requires all of the ClickHouse backend variables! |
| 225 | + |
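The Ralph backend sends statements as batched POSTs, with each batch holding ``batch_size`` statements. The sketch below builds, but does not send, such requests using only the standard library. It assumes HTTP Basic authentication with ``lrs_username`` / ``lrs_password`` and a JSON array of statements as the body; the helper names and the placeholder statements are hypothetical.

```python
import base64
import json
import urllib.request


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def build_ralph_request(lrs_url, username, password, statements):
    """Build (but do not send) one POST of a statement batch to an LRS."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        lrs_url,
        data=json.dumps(statements).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            # The xAPI specification expects this header on LRS requests.
            "X-Experience-API-Version": "1.0.3",
        },
        method="POST",
    )


# 250 placeholder statements with batch_size 100 become 3 POSTs.
statements = [{"id": str(i)} for i in range(250)]
requests_to_send = [
    build_ralph_request("http://localhost/xAPI/statements", "ralph", "secret", b)
    for b in batched(statements, 100)
]
```

Sending each request (for example with ``urllib.request.urlopen``) is left out so the sketch runs without a live LRS.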
| 226 | +Load from S3 configuration |
| 227 | +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 228 | +Variables necessary to run ``xapi-db-load load-db-from-s3``, which skips the |
| 229 | +event generation process and just loads pre-existing CSV files from S3:: |
| 230 | + |
| 231 | + # Note that this must be an https link, s3:// links will not work |
| 232 | + s3_source_location: https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/ |
| 233 | + |
| 234 | + # This also requires all of the ClickHouse backend variables! |
34 | 235 |
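ClickHouse can read files straight from S3 through its ``s3()`` table function, which is one plausible way a load from ``s3_source_location`` could work. The query builder below is a sketch under stated assumptions: the table and file names are hypothetical, the files are assumed to have no header row (use ``CSVWithNames`` otherwise), and the tool's actual queries may differ. ClickHouse detects gzip compression from the ``.gz`` suffix by default.

```python
def build_s3_insert_query(table: str, s3_source_location: str, file_name: str,
                          s3_key: str, s3_secret: str) -> str:
    """Return an INSERT ... SELECT using ClickHouse's s3() table function."""
    if not s3_source_location.startswith("https://"):
        raise ValueError("s3_source_location must be an https:// URL")
    url = s3_source_location.rstrip("/") + "/" + file_name
    # Values are interpolated without escaping: illustration only.
    return (
        f"INSERT INTO {table} "
        f"SELECT * FROM s3('{url}', '{s3_key}', '{s3_secret}', 'CSV')"
    )


print(build_s3_insert_query(
    "xapi.example_table",
    "https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/",
    "example_events.csv.gz",
    "<key>",
    "<secret>",
))
```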
|
35 | 236 | Developing |
36 | 237 | ---------- |
@@ -162,29 +363,3 @@ Reporting Security Issues |
162 | 363 | ************************* |
163 | 364 |
|
164 | 365 | Please do not report security issues in public. Please email [email protected]. |
165 | | - |
166 | | -.. |pypi-badge| image:: https://img.shields.io/pypi/v/xapi-db-load.svg |
167 | | - :target: https://pypi.python.org/pypi/xapi-db-load/ |
168 | | - :alt: PyPI |
169 | | - |
170 | | -.. |ci-badge| image:: https://github.com/openedx/xapi-db-load/workflows/Python%20CI/badge.svg?branch=main |
171 | | - :target: https://github.com/openedx/xapi-db-load/actions |
172 | | - :alt: CI |
173 | | - |
174 | | -.. |codecov-badge| image:: https://codecov.io/github/openedx/xapi-db-load/coverage.svg?branch=main |
175 | | - :target: https://codecov.io/github/openedx/xapi-db-load?branch=main |
176 | | - :alt: Codecov |
177 | | - |
178 | | -.. |doc-badge| image:: https://readthedocs.org/projects/xapi-db-load/badge/?version=latest |
179 | | - :target: https://xapi-db-load.readthedocs.io/en/latest/ |
180 | | - :alt: Documentation |
181 | | - |
182 | | -.. |pyversions-badge| image:: https://img.shields.io/pypi/pyversions/xapi-db-load.svg |
183 | | - :target: https://pypi.python.org/pypi/xapi-db-load/ |
184 | | - :alt: Supported Python versions |
185 | | - |
186 | | -.. |license-badge| image:: https://img.shields.io/github/license/openedx/xapi-db-load.svg |
187 | | - :target: https://github.com/openedx/xapi-db-load/blob/main/LICENSE.txt |
188 | | - :alt: License |
189 | | - |
190 | | -.. |status-badge| image:: https://img.shields.io/badge/Status-Experimental-yellow |
|