In this tutorial, we will study the basics of cluster computing using the Scala and Python programming languages
and the Spark engine. Spark has steadily become the state of the art in cluster computing and big-data processing and analytics,
thanks to the excellent support it provides for several domains: SQL processing, streaming, machine learning, and graphs.
In addition, Spark supports four programming languages: Scala, Java, Python, and R.
|
To gain as much as possible from this lecture, you are advised to install the required software components on your machine.
The provided notes contain guidelines for installing Spark and Scala/Python so that you will be able to
create Spark applications. If you do not feel comfortable with this, you should arrange access
to a machine where Spark, Scala, and Python are already installed and properly configured.
|
In this lecture, we are going to discuss general issues related to Spark and its basic architecture, the supported libraries, and
the most important topics in designing and implementing efficient applications. Knowledge of Scala/Python is not required,
but it is certainly helpful, and programming experience in any language is a plus. In any case, this lecture is about application development,
so you WILL get your hands dirty with code.
|