Commit 460bf74

docs: Update docs to match current state
1 parent edc8a93 commit 460bf74

File tree

1 file changed

+217
-42
lines changed

README.rst

Lines changed: 217 additions & 42 deletions
@@ -1,36 +1,237 @@
1-
Scripts for generating and loading test xAPI events
2-
***************************************************
3-
4-
|pypi-badge| |ci-badge| |codecov-badge| |doc-badge| |pyversions-badge|
5-
|license-badge| |status-badge|
6-
1+
Scripts for generating Aspects xAPI events
2+
******************************************
73

84
Purpose
95
=======
6+
This package generates a variety of test data used for integration and
7+
performance testing of Open edX Aspects. Currently it populates the following
8+
datasets:
109

11-
Some test scripts to help make apples-to-apples comparisons of different
12-
database backends for xAPI events. Supports direct database connections to
13-
ClickHouse, and batch loading data to the Ralph Learning Record Store with the
14-
ClickHouse backend. It also can create gzipped CSV files for bulk import to
15-
other databases.
10+
- xAPI statements, simulating those generated by event-routing-backends
11+
- Course and learner data, simulating that generated by event-sink-clickhouse
1612

17-
xAPI events generated match the specifications of the Open edX
13+
The xAPI events generated match the current specifications of the Open edX
1814
event-routing-backends package, but are not yet maintained to advance alongside
19-
them.
15+
them, so they may be expected to fall out of sync over time. Almost all current
16+
statements are simulated, but statements that are not yet used in Aspects reporting
17+
have been skipped.
18+
19+
Features
20+
========
21+
Once an appropriate database has been created using Aspects, data can be
22+
generated in the following ways:
23+
24+
Ralph to ClickHouse
25+
-------------------
26+
Useful for testing configuration, integration, and permissions, this uses batch
27+
POSTs to Ralph for xAPI statements, but still writes directly to ClickHouse for
28+
course and actor data. This is the slowest method, but exercises the largest
29+
surface area of the project.
30+
31+
Direct to ClickHouse
32+
--------------------
33+
Useful for getting a medium to large amount of data into the database to test
34+
configuration and view reports. xAPI statements are batched; other data is
35+
currently inserted one row at a time.
36+
37+
CSV files
38+
---------
39+
Useful for creating datasets that can be reused for checking performance
40+
changes with the exact same data, and for extremely large tests. The files can
41+
be generated locally or on any service supported by smart_open. They can then
42+
optionally be imported to ClickHouse if written locally or to S3. They can also
43+
be directly imported from S3 to ClickHouse at any time using the
44+
``load-db-from-s3`` subcommand. This is by far the fastest method for large
45+
scale tests.
46+
2047

2148
Getting Started
2249
===============
2350

2451
Usage
2552
-----
2653

27-
Details of how to run the current version of the script can be found by
28-
executing:
54+
A configuration file is required to run a test. If no file is given, a small
55+
test will be run using the ``default_config.yaml`` included in the project:
2956

3057
::
3158

32-
❯ xapi-db-load --help
59+
❯ xapi-db-load load-db
60+
61+
To specify a config file:
62+
63+
::
64+
65+
❯ xapi-db-load load-db --config_file private_configs/my_huge_test.yaml
66+
67+
There is also a sub-command for just performing a load of previously generated
68+
CSV data from S3:
3369

70+
::
71+
72+
❯ xapi-db-load load-db-from-s3 --config_file private_configs/my_s3_test.yaml
73+
74+
75+
Configuration Format
76+
--------------------
77+
There are a number of different configuration options for tuning the output.
78+
In addition to the documentation below, there are example settings files to
79+
review in the ``example_configs`` directory.
80+
81+
Common Settings
82+
^^^^^^^^^^^^^^^
83+
These settings apply to all backends, and determine the size and makeup of the
84+
test::
85+
86+
# Location where timing logs will be saved
87+
log_dir: logs
88+
89+
# xAPI statements will be generated in batches, the total number of
90+
# statements is ``num_batches * batch_size``. The batch size is the number
91+
# of statements sent to the backend (Ralph POST, ClickHouse insert, etc.)
92+
num_batches: 3
93+
batch_size: 100
94+
95+
# Overall start and end date for the entire run. All xAPI statements
96+
# will fall within these dates. Different courses will have different start
97+
# and end dates between these days, based on course_length_days below.
98+
start_date: 2014-01-01
99+
end_date: 2023-11-27
100+
101+
# All courses will be this long, they will be fit between start_date and
102+
# end_date, therefore this must be less than end_date - start_date days.
103+
course_length_days: 120
104+
105+
# The number of organizations, courses will be evenly spread among these
106+
num_organizations: 3
107+
108+
# The number of learners to create, random subsets of these will be
109+
# "registered" for each course and have statements generated for them
110+
# between their registration date and the end of the course
111+
num_actors: 10
112+
113+
# How many of each size course to create. The sum of these is the total
114+
# number of courses created for the test. The keys are arbitrary, you can
115+
# name them whatever you like and have as many or few sizes as you like.
116+
# The keys must exactly match the definitions in course_size_makeup below.
117+
num_course_sizes:
118+
small: 1
119+
medium: 1
120+
...
121+
122+
# Course type configurations, how many of each type of object are created
123+
# for each course of this size. "actors" must be less than or equal to
124+
# "num_actors". Keys here must exactly match the keys in num_course_sizes.
125+
course_size_makeup:
126+
small:
127+
actors: 5
128+
problems: 20
129+
videos: 10
130+
chapters: 3
131+
sequences: 10
132+
verticals: 20
133+
forum_posts: 20
134+
medium:
135+
actors: 7
136+
problems: 40
137+
videos: 20
138+
chapters: 4
139+
sequences: 20
140+
verticals: 30
141+
forum_posts: 40
142+
...
143+
144+
CSV Backend, Local Files
145+
^^^^^^^^^^^^^^^^^^^^^^^^
146+
Generates gzipped CSV files to a local directory::
147+
148+
backend: csv_file
149+
csv_output_destination: logs/
150+
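For readers unfamiliar with the output format, here is a rough sketch of writing and reading back a gzipped CSV file using only the Python standard library. It is an illustration under assumptions: the tool itself writes through smart_open, and the function names, file name, and column values below are hypothetical rather than the tool's real schema.

```python
import csv
import gzip
import os
import tempfile


def write_gzipped_csv(path: str, rows: list) -> None:
    """Write rows to a gzip-compressed CSV file in text mode."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def read_gzipped_csv(path: str) -> list:
    """Read a gzip-compressed CSV file back into a list of rows."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Hypothetical rows; the real files contain generated xAPI and course data.
sample_rows = [
    ["event-1", "2014-01-01T00:00:00"],
    ["event-2", "2014-01-02T00:00:00"],
]

output_dir = tempfile.mkdtemp()
output_path = os.path.join(output_dir, "example.csv.gz")
write_gzipped_csv(output_path, sample_rows)
round_trip = read_gzipped_csv(output_path)
```

Because the files are plain gzipped CSV, the same dataset can be reloaded repeatedly to compare performance on identical data.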
151+
CSV Backend, S3 Compatible Destination
152+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
153+
Generates gzipped CSV files to a remote location::
154+
155+
backend: csv_file
156+
# This can be anything smart-open can handle (ex. a local directory or
157+
# an S3 bucket etc.) but importing to ClickHouse using this tool only
158+
# supports S3 or compatible services like MinIO right now.
159+
# Note that this *must* be an s3:// link, https links will not work
160+
# https://pypi.org/project/smart-open/
161+
csv_output_destination: s3://openedx-aspects-loadtest/logs/large_test/
162+
163+
# These settings are shared with the ClickHouse backend
164+
s3_key:
165+
s3_secret:
166+
167+
CSV Backend, S3 Compatible Destination, Load to ClickHouse
168+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
169+
Generates gzipped CSV files to a remote location, then automatically loads
170+
them to ClickHouse::
171+
172+
backend: csv_file
173+
# csv_output_destination can be anything smart_open can handle, a local
174+
# directory or an S3 bucket etc., but importing to ClickHouse using this
175+
# tool only supports S3 or compatible services (ex: MinIO) right now
176+
# https://pypi.org/project/smart-open/
177+
csv_output_destination: s3://openedx-aspects-loadtest/logs/large_test/
178+
csv_load_from_s3_after: true
179+
180+
# Note that this *must* be an https link, s3:// links will not work,
181+
# this must point to the same location as csv_output_destination.
182+
s3_source_location: https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/
183+
184+
# This also requires all of the ClickHouse backend variables!
185+
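Since ``csv_output_destination`` needs an ``s3://`` link while ``s3_source_location`` needs an ``https`` link to the same place, it can help to derive one from the other. The sketch below is a hypothetical helper, not part of the tool. It assumes AWS virtual-hosted-style addressing (``<bucket>.s3.amazonaws.com``); MinIO and other S3-compatible services use different hostnames.

```python
from urllib.parse import urlparse


def s3_to_https(s3_url: str) -> str:
    """Convert an s3://bucket/key URL to an AWS virtual-hosted-style https URL.

    Hypothetical helper; assumes the bucket is hosted on AWS S3.
    """
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url}")
    # netloc is the bucket name, path is the key prefix (with leading slash).
    return f"https://{parsed.netloc}.s3.amazonaws.com{parsed.path}"


source_location = s3_to_https("s3://openedx-aspects-loadtest/logs/large_test/")
print(source_location)
```

For the example destination above, this yields the same ``https`` location used for ``s3_source_location``.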
186+
ClickHouse Backend
187+
^^^^^^^^^^^^^^^^^^
188+
This ``backend`` value is only necessary if you are writing directly to ClickHouse. For
189+
integrations with Ralph or CSV, use their ``backend`` instead::
190+
191+
backend: clickhouse
192+
193+
Variables necessary to connect to ClickHouse, whether directly, through Ralph, or
194+
as part of loading CSV files::
195+
196+
# ClickHouse connection variables
197+
db_host: localhost
198+
# db_port is also used to determine the "secure" parameter. If the port
199+
# ends in 443 or 440, the "secure" flag will be set on the connection.
200+
db_port: 8443
201+
db_username: ch_admin
202+
db_password: secret
203+
204+
# Schema name for the xAPI schema
205+
db_name: xapi
206+
207+
# Schema name for the event sink schema
208+
db_event_sink_name: event_sink
209+
210+
# These S3 settings are shared with the CSV backend, but passed to
211+
# ClickHouse when loading files from S3
212+
s3_key: <...>
213+
s3_secret: <...>
214+
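The ``db_port`` comment above describes a simple rule for the "secure" flag. A minimal sketch of that rule as stated follows. The function name is hypothetical and the tool's actual implementation may differ.

```python
def port_implies_secure(db_port: int) -> bool:
    """Return True when the port ends in 443 or 440, per the rule described above."""
    return str(db_port).endswith(("443", "440"))


for port in (8443, 9440, 8123, 9000):
    print(port, port_implies_secure(port))
```

Under this rule, ports such as 8443 and 9440 would set the flag, while 8123 and 9000 would not.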
215+
Ralph / ClickHouse Backend
216+
^^^^^^^^^^^^^^^^^^^^^^^^^^
217+
Variables necessary to send xAPI statements via Ralph::
218+
219+
backend: ralph_clickhouse
220+
lrs_url: http://ralph.tutor-nightly-local.orb.local/xAPI/statements
221+
lrs_username: ralph
222+
lrs_password: secret
223+
224+
# This also requires all of the ClickHouse backend variables!
225+
226+
Load from S3 configuration
227+
^^^^^^^^^^^^^^^^^^^^^^^^^^
228+
Variables necessary to run ``xapi-db-load load-db-from-s3``, which skips the
229+
event generation process and just loads pre-existing CSV files from S3::
230+
231+
# Note that this must be an https link, s3:// links will not work
232+
s3_source_location: https://openedx-aspects-loadtest.s3.amazonaws.com/logs/large_test/
233+
234+
# This also requires all of the ClickHouse backend variables!
34235

35236
Developing
36237
----------
@@ -162,29 +363,3 @@ Reporting Security Issues
162363
*************************
163364

164365
Please do not report security issues in public. Please email [email protected].
165-
166-
.. |pypi-badge| image:: https://img.shields.io/pypi/v/xapi-db-load.svg
167-
:target: https://pypi.python.org/pypi/xapi-db-load/
168-
:alt: PyPI
169-
170-
.. |ci-badge| image:: https://github.com/openedx/xapi-db-load/workflows/Python%20CI/badge.svg?branch=main
171-
:target: https://github.com/openedx/xapi-db-load/actions
172-
:alt: CI
173-
174-
.. |codecov-badge| image:: https://codecov.io/github/openedx/xapi-db-load/coverage.svg?branch=main
175-
:target: https://codecov.io/github/openedx/xapi-db-load?branch=main
176-
:alt: Codecov
177-
178-
.. |doc-badge| image:: https://readthedocs.org/projects/xapi-db-load/badge/?version=latest
179-
:target: https://xapi-db-load.readthedocs.io/en/latest/
180-
:alt: Documentation
181-
182-
.. |pyversions-badge| image:: https://img.shields.io/pypi/pyversions/xapi-db-load.svg
183-
:target: https://pypi.python.org/pypi/xapi-db-load/
184-
:alt: Supported Python versions
185-
186-
.. |license-badge| image:: https://img.shields.io/github/license/openedx/xapi-db-load.svg
187-
:target: https://github.com/openedx/xapi-db-load/blob/main/LICENSE.txt
188-
:alt: License
189-
190-
.. |status-badge| image:: https://img.shields.io/badge/Status-Experimental-yellow
