## Data Quality core module setup and usage

The DQ main application is written in Scala, and the build is managed with SBT.

> **Before starting:** Install a JDK, Scala, sbt and Git.

First of all, clone this repository:
```
git clone https://github.com/agile-lab-dev/DataQuality.git
```
Then you have two options:
- Run DQ locally
- Create an archive with the setup to run in your distributed environment

#### Local run

Simply run the `DQMasterBatch` class from your IDE or with the usual Java tools, passing the following arguments:

- __-a__: Path to the application configuration file.
> **Example:** ./Agile.DataQuality/dq-core/src/main/resources/conf/dev.conf

- __-c__: Path to the run configuration file.
> **Example:** ./Agile.DataQuality/docs/examples/conf/full-prostprocess-example.conf

- __-d__: Run date.
> **Example:** 2019-01-01

- __-l__: _Optional._ Flag to run in local mode.

- __-r__: _Optional._ Flag to repartition sources after reading.
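> **Note:** As an illustration, the flags above combine as in the sketch below. The main-class package, the assembly jar name and the use of `spark-submit` in local mode are assumptions, so check the dq-core sources and your own build output for the exact values.

```
# Hypothetical local run: <package> and the jar path are placeholders,
# substitute the real main class and the assembly produced by your build.
spark-submit \
  --master "local[*]" \
  --class <package>.DQMasterBatch \
  dq-core-assembly.jar \
  -a ./Agile.DataQuality/dq-core/src/main/resources/conf/dev.conf \
  -c ./Agile.DataQuality/docs/examples/conf/full-prostprocess-example.conf \
  -d 2019-01-01 \
  -l
```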
#### Distributed environment

##### Deployment
First, you will need to deploy your application to the cluster. You can assemble the jar on your own using sbt, or you can use one of our predefined utilities.
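> **Note:** If you assemble the jar yourself, a typical sbt invocation is sketched below. The module name and the use of the sbt-assembly plugin are assumptions; adjust them to whatever `build.sbt` actually defines.

```
# Hypothetical: build the fat jar for the core module from the repository root
# (assumes the sbt-assembly plugin is configured for it in build.sbt).
sbt "dq-core/assembly"
```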
To use our `deploy.sh` script, follow these steps:
- Set REMOTE_HOST and REMOTE_USERNAME in `deploy.sh`.
- Create an `application.conf` for your environment.
- Create a directory with the internal directories `bin` and `conf`, and put your run scripts and configuration files in the corresponding subdirectories (a layout sketch follows this list).
  > **Tip:** You can use `run-default.sh` as a base for your run script.
- Link the `application.conf` file and the directory with run scripts and confs to the corresponding parameter values in `build.sbt`.
- Run `deploy.sh` with your parameters.
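> **Note:** As a sketch, the directory you link in `build.sbt` could be prepared as below; the directory and file names are purely illustrative.

```
# Illustrative only: create the bin/conf layout expected by deploy.sh and
# copy your run script and configuration files into it.
mkdir -p my-dq-env/bin my-dq-env/conf
cp run-default.sh my-dq-env/bin/run-myproject.sh   # adapt the base run script
cp application.conf my-dq-env/conf/                # environment-specific settings
```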
##### Submitting
In a distributed environment the Data Quality application is treated as a standard Spark job, submitted by the `submit.sh` script.

You can also submit the job manually or wrap the call in your own run script; this is completely up to you.
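> **Note:** If you submit manually instead of using `submit.sh`, the call has the general shape sketched below. The master, main class, jar and configuration paths are placeholders; mirror whatever your deployed run script passes.

```
# Hypothetical manual submission; replace <package>, the jar and the paths
# with the values used by your deployment.
spark-submit \
  --master yarn \
  --class <package>.DQMasterBatch \
  dq-core-assembly.jar \
  -a conf/application.conf \
  -c conf/my-run.conf \
  -d 2019-01-01
```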