Docker image from the LIPN CNRS UMR 7030 Machine Learning team (https://github.com/Spark-clustering-notebook/coliseum/wiki), for demonstration purposes.
See https://hub.docker.com/r/spartakus/coliseum/ for the online image.
First you need to pull the docker image locally.
docker pull spartakus/coliseum:0.3.2
Then you can run the container with these parameters:
- --rm removes the container at shutdown
- -it starts the container in interactive mode
- -m 8g allocates 8 GB of memory to the container
- --net=host means that the container doesn't use a dedicated (virtualized) network but the host's own (on Mac, this is the network of the virtual machine)
docker run --rm -it -m 8g --net=host spartakus/coliseum:0.3.2 bash
Note for developers: check the Development section below.
In the started shell, use the following commands
source var.sh
source start.sh
source create.sh
For Mac and Windows users:
You're running Docker via a VM, so you need to replace localhost with the VM's IP, which can be retrieved this way:
boot2docker ip
Or, with Docker Machine (see https://docs.docker.com/machine/migrate-to-machine/):
docker-machine start docker-vm
docker-machine env docker-vm
eval $(docker-machine env docker-vm)
docker-machine ip docker-vm
Open browser at http://localhost:9000/tree/coliseum.
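As a small convenience, the URL can be built for either case (localhost, or a VM's IP). This is only a sketch: the function name notebook_url is hypothetical and not part of the image, and the VM name docker-vm is taken from the Docker Machine commands in this README.

```shell
# Hypothetical helper: print the notebook URL for a given host.
# With no argument it falls back to localhost.
notebook_url() {
  host="${1:-localhost}"
  printf 'http://%s:9000/tree/coliseum\n' "$host"
}

# Container sharing the host network: localhost works.
notebook_url

# Docker running in a VM: pass the VM's IP instead, for example
#   notebook_url "$(docker-machine ip docker-vm)"
```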
docker build -t spartakus/coliseum:0.3.2 .
Until the libraries are deployed publicly, we have to build them locally in:
- ivy2 if scala 2.10
- m2 if scala 2.11
Then refer to them from the notebooks using the artifact ID in the metadata, and mount the local repository in the Docker container (see the section below).
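The mapping between Scala version and local repository can be written down as a small helper. This is a sketch under the assumption stated in the list above (Scala 2.10 libraries in ivy2, Scala 2.11 libraries in m2); the function name publish_task is hypothetical. The two sbt tasks are the standard ones: publishLocal writes to the local Ivy repository and publishM2 to the local Maven repository.

```shell
# Hypothetical helper: choose the sbt publish task for a Scala version.
publish_task() {
  case "$1" in
    2.10*) echo publishLocal ;;  # local Ivy repository (~/.ivy2/local)
    2.11*) echo publishM2 ;;     # local Maven repository (~/.m2/repository)
    *) echo "unsupported Scala version: $1" >&2; return 1 ;;
  esac
}

# Example: pick the task for a Scala 2.11 library.
publish_task 2.11
```

The commands in this README simply run both tasks, which avoids having to know the Scala version of each library.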
On the host machine (that runs docker), we need to deploy the dependency locally, then we'll make it available in docker (using folder mounting).
git clone https://github.com/Spark-clustering-notebook/Mean-Shift-LSH.git
cd Mean-Shift-LSH
sbt publishM2
sbt publishLocal
When ready to release on Bintray, use publish instead.
git clone https://github.com/Spark-clustering-notebook/G-stream.git
cd G-stream
sbt publishM2
sbt publishLocal
git clone https://github.com/Spark-clustering-notebook/SOM-MR-2.git
cd SOM-MR-2
sbt publishM2
sbt publishLocal
When ready to release on Bintray, use publish instead.
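The three clone-and-publish sequences above follow the same pattern, so they could be wrapped in a loop. The following is a minimal sketch, not a script shipped with the project: the names REPOS, run and publish_all, and the DRY_RUN switch, are illustrative assumptions. With DRY_RUN=1 the commands are only printed, which lets you check them before anything is cloned or built.

```shell
# Repositories listed in this README (all under Spark-clustering-notebook).
REPOS="Mean-Shift-LSH G-stream SOM-MR-2"

# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone each library, publish it to both local repositories, then go back up.
publish_all() {
  for repo in $REPOS; do
    run git clone "https://github.com/Spark-clustering-notebook/$repo.git" &&
    run cd "$repo" &&
    run sbt publishM2 &&
    run sbt publishLocal &&
    run cd .. || return 1
  done
}

# Print the commands without executing them.
DRY_RUN=1 publish_all
```

Calling publish_all without DRY_RUN runs the commands for real and stops at the first failure.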
The command below uses a variable LOCAL_NOTEBOOKS, which refers to a local directory containing the notebooks you want to include and keep up to date during the session.
Another folder you might want to sync is the data directory, which is set through LOCAL_DATA.
It's also recommended to use your own ivy repository, especially because some libraries aren't available online (like Mean-Shift-LSH): you can publishLocal any library on the host machine, then mount your .ivy2 directory as the container's one. This uses HOST_REPO.
export LOCAL_NOTEBOOKS=<path to local notebooks dir>
export LOCAL_DATA=<path to local data dir>
export HOST_REPO=$(realpath $HOME/.ivy2)
docker run \
-v $LOCAL_NOTEBOOKS:/root/spark-notebook/notebooks/coliseum \
-v $HOST_REPO:/root/.ivy2 \
-v $LOCAL_DATA:/root/data/coliseum \
--rm -it -m 8g \
-p 19000:9000 \
-p 14040:4040 -p 14041:4041 -p 14042:4042 -p 14043:4043 \
spartakus/coliseum:0.3.2 \
bash
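Because an unset variable would leave a -v option with an empty host path, it can help to verify the three directories before starting the container. This is a minimal sketch: the function names check_dir and check_mounts are hypothetical and not part of the image.

```shell
# Hypothetical pre-flight check.
# $1: variable name (used in the message), $2: its value.
check_dir() {
  if [ -z "$2" ]; then
    echo "$1 is not set" >&2
    return 1
  elif [ ! -d "$2" ]; then
    echo "$1 does not point to a directory: $2" >&2
    return 1
  fi
}

# Check the three variables used by the docker run command above.
check_mounts() {
  check_dir LOCAL_NOTEBOOKS "${LOCAL_NOTEBOOKS:-}" &&
  check_dir LOCAL_DATA "${LOCAL_DATA:-}" &&
  check_dir HOST_REPO "${HOST_REPO:-}"
}

# Usage: run the check first, and start the container only if it passes:
#   check_mounts && docker run ...
```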