Collaborator

@romainx romainx commented May 28, 2020

Some changes to the Spark documentation, following PR #911, for the local and standalone use cases. The main goals are:

  • Simplify some of the examples (removing options, etc.)
  • Use the same code as much as possible in each example for consistency (only the R example is kept different from the others)
  • Add sparklyr as an option for R
  • Add some notes about prerequisites (same version of Python, R installed on workers)

Best

@romainx romainx requested a review from parente May 28, 2020 10:29
@romainx romainx added the tag:Documentation Related to user, developer, and maintainer documentation label May 28, 2020
Member

@parente parente left a comment

Thanks for doing this cleanup, @romainx. The restructuring and simplification look good to me.

Were you able to prove that all the samples are functional by running them in the latest image?

Separate from this PR, we should consider putting all these snippets into tests for the all-spark-notebook image.

Collaborator Author

romainx commented May 29, 2020

Thank you @parente

Yes, all the examples are working. I've reworked them a bit so that they all share the same use case: "sum of the first 100 whole numbers".

  • Test added for all kernels
  • Same examples as provided in the documentation (specifics.md)
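
For readers who have not seen the examples, here is a minimal PySpark sketch of what such a use case could look like. It is not copied from `specifics.md`: the function names `closed_form_sum` and `spark_sum` are hypothetical, and it assumes "the first 100 whole numbers" means 0 through 100 inclusive and that `pyspark` is installed, as in the Spark images.

```python
def closed_form_sum(n=100):
    """Reference value for 0 + 1 + ... + n, used to check the Spark result."""
    return n * (n + 1) // 2


def spark_sum(master="local", n=100):
    """Sum the integers 0..n with Spark (hypothetical helper, not from the docs).

    `master` is "local" for the local use case; for a standalone cluster it
    would be a "spark://host:port" URL (placeholder, adjust to your setup).
    """
    # Imported lazily so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master(master).getOrCreate()
    try:
        # Distribute the numbers as an RDD and add them up on the executors.
        rdd = spark.sparkContext.parallelize(range(n + 1))
        return rdd.sum()
    finally:
        # Always release the Spark resources, even if the job fails.
        spark.stop()
```

In a notebook, `spark_sum() == closed_form_sum()` would then be a quick sanity check that the kernel can reach Spark.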

Note: I have not automatically tested local_sparklyr.ipynb because, by default, it creates the metastore_db directory and the derby.log file in the working directory. Since I mount that directory read-only (RO) in the container, the notebook fails (it works when mounted read-write, but that is not the solution). I'm struggling to set the location elsewhere...

I'll let you decide, but I think we could merge it and fix this afterwards, since it's not mandatory and all kernels are tested (sparklyr is just another way to use the R kernel).

Best

Member

parente commented May 29, 2020

It's great that you got most of the tests working already. Let's merge it! 🎉

Labels

tag:Documentation Related to user, developer, and maintainer documentation

2 participants