Commit 70e29f5

fixes core setup documentation
1 parent 8f99426 commit 70e29f5

File tree: 2 files changed (+46 −21 lines)

- build.sbt
- docs/installation/core-setup.md

build.sbt

Lines changed: 2 additions & 0 deletions
```diff
@@ -78,6 +78,7 @@ lazy val core = (project in file("dq-core"))
     assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = true),
     test in assembly := {},
     mappings in Universal += {
+      // TODO: Add paths to the application configuration files
       val confFile = buildEnv.value match {
         case BuildEnv.Dev => "path to application.conf"
         case BuildEnv.Test => "path to application.conf"
@@ -86,6 +87,7 @@ lazy val core = (project in file("dq-core"))
       ((resourceDirectory in Compile).value / confFile) -> "conf/application.conf"
     },
     mappings in Universal ++= {
+      // TODO: Add paths to the application integration files
       val integrationFolder = integrationEnv.value match {
         case IntegrationEnv.local => "path to integration directory"
       }
```
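The TODO placeholders above are meant to be replaced with real resource paths per build environment. A minimal sketch of a completed mapping, assuming the Dev configuration lives at `conf/dev.conf` under the module's resources (that path matches the example cited later in this commit; the Test file name is an assumption):

```scala
// Sketch: the first mapping with its TODO resolved. File names under
// src/main/resources are assumptions, except conf/dev.conf.
mappings in Universal += {
  val confFile = buildEnv.value match {
    case BuildEnv.Dev  => "conf/dev.conf"
    case BuildEnv.Test => "conf/test.conf" // assumed name
  }
  // Package the selected file into the archive as conf/application.conf
  ((resourceDirectory in Compile).value / confFile) -> "conf/application.conf"
}
```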

docs/installation/core-setup.md

Lines changed: 44 additions & 21 deletions
````diff
@@ -1,30 +1,53 @@
-Using DQ
-------------
+## Data Quality core module setup and usage

-DQ is written in Scala, and the build is managed with SBT.
+The DQ main application is written in Scala, and the build is managed with SBT.

-Before starting:
-- Install JDK
-- Install SBT
-- Install Git
+> **Before starting:** Install JDK, Scala, sbt and Git.

-The steps to getting DQ up and running for development are pretty simple:
+First of all, clone this repository:
+```
+git clone https://github.com/agile-lab-dev/DataQuality.git
+```

-- Clone this repository:
+Then you have two options:
+- Run DQ locally
+- Create an archive with the setup to run in your distributed environment

-`git clone https://github.com/agile-lab-dev/DataQuality.git`
+#### Local run

-- Start DQ. You can either run DQ in local or cluster mode:
+Simply run the `DQMasterBatch` class from your IDE or with standard Java tools, passing the following arguments:

-    - local: default setting
-    - cluster: set isLocal = false calling makeSparkContext() in `DQ/utils/DQMainClass`
+- __-a__: Path to the application configuration file.
+  > **Example:** ./Agile.DataQuality/dq-core/src/main/resources/conf/dev.conf

-- Run DQ. You can either run DQ via scheduled or provided mode (shell):
+- __-c__: Path to the run configuration file.
+  > **Example:** ./Agile.DataQuality/docs/examples/conf/full-prostprocess-example.conf

-    - `run.sh`, takes parameters from command line:
-      **-n**, Spark job name
-      **-c**, Path to configuration file
-      **-r**, Indicates the date at which the DataQuality checks will be performed
-      **-d**, Specifies whether the application is operating under debugging conditions
-      **-h**, Path to hadoop configuration
----
+- __-d__: Run date.
+  > **Example:** 2019-01-01
+
+- __-l__: _Optional._ Flag to run in local mode.
+
+- __-r__: _Optional._ Flag to repartition sources after reading.
````
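For reference, a local invocation assembled from these flags might look like the sketch below. The assembly jar name and the class's package are assumptions, not confirmed by this commit; the configuration paths reuse the examples above.

```bash
# Sketch: local run of DQMasterBatch (jar name and class package are
# assumed; substitute the fully qualified class name from dq-core)
java -cp dq-core-assembly.jar DQMasterBatch \
  -a ./Agile.DataQuality/dq-core/src/main/resources/conf/dev.conf \
  -c ./Agile.DataQuality/docs/examples/conf/full-prostprocess-example.conf \
  -d 2019-01-01 \
  -l
```

The new document continues with the distributed-environment instructions: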
```diff
+#### Distributed environment
+
+##### Deployment
+First, you'll need to deploy the application to the cluster. You can assemble the jar on your own using sbt,
+or you can use our predefined utilities.
+
+To use our `deploy.sh` script, follow these steps:
+- Set REMOTE_HOST and REMOTE_USERNAME in `deploy.sh`.
+- Create an `application.conf` for your environment.
+- Create a directory with the internal directories `bin` and `conf`, and put your run scripts and
+configuration files in them (see the layout sketch after this diff).
+  > **Tip:** You can use `run-default.sh` as a base for your run script.
+- Point the corresponding parameters in `build.sbt` at your `application.conf` file and at the directory
+with the run scripts and confs.
+- Run `deploy.sh` with your parameters.
```
```diff
+##### Submitting
+In a distributed environment, the Data Quality application is treated as a standard Spark job, submitted
+by the `submit.sh` script.
+
+You can also submit the job manually or wrap the submission in a run script of your own; this is entirely up to you.
```
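Since the job is a standard Spark job, a manual submission would follow the usual `spark-submit` shape. A sketch, assuming a YARN cluster; the class name, jar name, and configuration paths are all illustrative, and only the -a/-c/-d arguments come from this document:

```bash
# Sketch: manual submission (master, class name, jar and paths are assumed;
# use the fully qualified class name from dq-core)
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class DQMasterBatch \
  dq-core-assembly.jar \
  -a conf/application.conf \
  -c conf/run.conf \
  -d 2019-01-01
```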
