Spark in Action Video Edition

$ cd ~
$ git clone https://github.com/spark-in-action/first-edition.git
$ mkdir spark_project
$ cd spark_project
$ wget https://raw.githubusercontent.com/spark-in-action/first-edition/master/spark-in-action-box.json

$ vagrant box add spark-in-action-box.json

$ vagrant init manning/spark-in-action

$ vagrant up


$ vagrant ssh

password: vagrant

// To stop the VM
$ vagrant halt


$ git clone https://github.com/spark-in-action/first-edition


$ /usr/local/hadoop/sbin/start-dfs.sh
$ /usr/local/hadoop/sbin/stop-dfs.sh
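
A small guard can catch a mistyped install path before the script is invoked. This is a minimal sketch, not part of the original notes: hdfs_ctl is a hypothetical helper name, and the default directory /usr/local/hadoop is an assumption (pass a different directory as the second argument).

```shell
# Hypothetical helper: run start-dfs.sh or stop-dfs.sh from a Hadoop install directory.
# Assumption: Hadoop is installed under /usr/local/hadoop unless a directory is given.
hdfs_ctl() {
  action="$1"                            # "start" or "stop"
  hadoop_home="${2:-/usr/local/hadoop}"  # assumed default location
  script="$hadoop_home/sbin/$action-dfs.sh"
  if [ ! -x "$script" ]; then
    # Fail early with the full path so a typo is easy to spot.
    echo "not found or not executable: $script" >&2
    return 1
  fi
  "$script"
}
```

Usage: hdfs_ctl start, then hdfs_ctl stop when finished.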


Eclipse

Install JDK 8 and Scala

Download: Eclipse Java EE (Enterprise) edition

Help –> Install New Software



Add

scala-ide

http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site

Install only: Scala IDE for Eclipse


Add

m2eclipse-scala

http://alchim31.free.fr/m2e-scala/update-site



Import Maven Remote Archetype Catalogs in Eclipse

Step 1: Open the Maven preferences in Eclipse. Go to Window -> Preferences -> Maven -> Archetypes.
Step 2: Add the remote catalog file. Click the Add Remote Catalog button. ...
Step 3: Verify the remote archetypes. To confirm that you can now access all archetypes, create a new Maven project.


File –> New –> Project –> Maven –> Maven Project


Catalog File: https://raw.githubusercontent.com/spark-in-action/scala-archetype-sparkinaction/master/archetype-catalog.xml

Description: Spark in Action


Preparing the GitHub archive dataset (1.1 GB)

$ mkdir -p ~/sia/github-archive
$ cd ~/sia/github-archive
$ wget http://data.githubarchive.org/2015-03-01-{0..23}.json.gz
$ gunzip *
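
Since the download covers 24 hourly files, it can be worth confirming that none are missing after unpacking. This is a minimal sketch, not part of the original notes: missing_hours is a hypothetical helper name, and it assumes the unpacked files follow the DAY-HOUR.json naming produced by the commands above.

```shell
# Hypothetical helper: list which of the 24 hourly files for one day are absent.
# Assumption: unpacked files are named <day>-<hour>.json, with hour running 0..23.
missing_hours() {
  dir="$1"   # directory holding the unpacked files
  day="$2"   # date prefix, e.g. 2015-03-01
  hour=0
  while [ "$hour" -le 23 ]; do
    # Print the name of every expected file that does not exist.
    [ -f "$dir/$day-$hour.json" ] || echo "$day-$hour.json"
    hour=$((hour + 1))
  done
}
```

Usage: missing_hours ~/sia/github-archive 2015-03-01 prints nothing when all files are present.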

$ sudo apt install -y jq
$ head -n 1 2015-03-01-0.json | jq '.'
