Spark in Action Video Edition
$ cd ~
$ git clone https://github.com/spark-in-action/first-edition.git
$ mkdir spark_project
$ cd spark_project
$ wget https://raw.githubusercontent.com/spark-in-action/first-edition/master/spark-in-action-box.json
$ vagrant box add spark-in-action-box.json
$ vagrant init manning/spark-in-action
$ vagrant up
$ vagrant ssh
password: vagrant
// To stop the VM
$ vagrant halt
$ git clone https://github.com/spark-in-action/first-edition
$ /usr/local/hadoop/sbin/start-dfs.sh
$ /usr/local/hadoop/sbin/stop-dfs.sh
Eclipse
Install JDK 8 and Scala
Download: Eclipse (Java EE / enterprise package)
Help –> Install new Software
Add
scala-ide
http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site
Install only: Scala IDE for Eclipse
Add
m2eclipse-scala
http://alchim31.free.fr/m2e-scala/update-site
Import Maven Remote Archetype Catalogs in Eclipse
Step 1 : Open the Maven preferences in Eclipse. Go to Window -> Preferences -> Maven -> Archetypes.
Step 2 : Add the remote catalog file. Click the Add Remote Catalog button. ...
Step 3 : Verify the remote archetypes. To verify that you can now access all archetypes, create a new Maven project.
File –> New –> Project –> Maven –> Maven Project
Catalog File: https://raw.githubusercontent.com/spark-in-action/scala-archetype-sparkinaction/master/archetype-catalog.xml
Description: Spark in Action
Preparing the GitHub archive dataset (1.1 GB)
$ mkdir -p ~/sia/github-archive
$ cd ~/sia/github-archive
$ wget http://data.githubarchive.org/2015-03-01-{0..23}.json.gz
$ gunzip *
$ sudo apt install -y jq
$ head -n 1 2015-03-01-0.json | jq '.'
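Note: the {0..23} in the wget line above relies on bash brace expansion, which expands to one URL per hour of the day (24 files). A minimal sketch of an equivalent loop for shells without brace expansion, assuming `seq` is available; `archive_urls` is a hypothetical helper name, not something from the book or its repository:

```shell
# Print the hourly archive URLs for one day (hours 0 through 23).
# archive_urls is a hypothetical helper, introduced here for illustration only.
archive_urls() {
  day="$1"
  for hour in $(seq 0 23); do
    echo "http://data.githubarchive.org/${day}-${hour}.json.gz"
  done
}

# Preview the first few URLs without downloading anything.
archive_urls 2015-03-01 | head -n 3
```

To download, the list can be piped to wget, which reads URLs from standard input with `-i -`: archive_urls 2015-03-01 | wget -i -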