Scala and Spark for Big Data and Machine Learning
src:
https://bitbucket.org/matematika/scala-and-spark-for-big-data-and-machine-learning.git
DataFrames Overview
$ cd Spark_Dataframes
$ spark-shell
scala> :load DataFrame_Overview.scala
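A minimal sketch of the basic DataFrame calls this section is about. It is not the contents of DataFrame_Overview.scala. Paste it into spark-shell with :paste, because paste mode accepts multi-line expressions. The column names and rows are made-up sample data, so no CSV file is needed. The commented spark.read line shows how a real file would be read, and "stocks.csv" in it is a hypothetical file name.

```scala
import org.apache.spark.sql.SparkSession

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Reading a CSV would look like this (hypothetical file name):
// val df = spark.read.option("header", "true").option("inferSchema", "true").csv("stocks.csv")

// Made-up rows so the sketch runs without any input file
val df = Seq(
  ("2016-01-04", 102.61, 105.37, 36000000L),
  ("2016-01-05", 105.75, 105.85, 55000000L)
).toDF("Date", "Open", "High", "Volume")

df.printSchema()                    // column names and types
df.show()                           // rows rendered as a table
df.describe("Open", "High").show()  // count, mean, stddev, min, max
df.columns.foreach(println)         // one column name per line
```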
Spark DataFrame Operations
scala> :load DataFrame_Operations.scala
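A short sketch of filtering and selecting, using a made-up stock table (paste with :paste). It contrasts the two equivalent filter styles: Column expressions built with the $"..." interpolator and a SQL-like string. It also uses corr from org.apache.spark.sql.functions. The data and column names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.corr

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up sample rows
val stocks = Seq(
  ("2016-01-04", 480.0, 490.0, 1000L),
  ("2016-01-05", 495.0, 510.0, 2500L),
  ("2016-01-06", 470.0, 475.0, 1800L)
).toDF("Date", "Close", "High", "Volume")

// Column-expression style: && binds more loosely than > and <
val highClose = stocks.filter($"Close" > 480 && $"High" < 520)
// SQL-string style, same predicate
val highCloseSql = stocks.filter("Close > 480 AND High < 520")

highClose.select("Date", "Close").show()
// Pearson correlation between two numeric columns
stocks.select(corr("High", "Volume")).show()
```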
GroupBy and Aggregate Functions
scala> :load GroupBy_and_Agg.scala
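A sketch of groupBy with aggregate functions on a made-up sales table (paste with :paste). The mean() shortcut averages every numeric column. agg combines several named aggregates in one pass. Aggregates used without groupBy apply to the whole DataFrame.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, countDistinct, sum}

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up sample rows
val sales = Seq(
  ("GOOG", "Sam", 200.0),
  ("GOOG", "Charlie", 120.0),
  ("MSFT", "Amy", 124.0),
  ("MSFT", "Vanessa", 243.0)
).toDF("Company", "Person", "Sales")

// Mean of every numeric column, per group
sales.groupBy("Company").mean().show()

// Several named aggregates in one pass
val perCompany = sales.groupBy("Company")
  .agg(sum("Sales").as("Total"), avg("Sales").as("Average"))
perCompany.show()

// Without groupBy, aggregates run over the whole DataFrame
sales.select(countDistinct("Company")).show()
sales.orderBy($"Sales".desc).show()
```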
Missing Data
scala> :load MissingData.scala
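A sketch of the DataFrame.na helpers for missing values (paste with :paste). The rows are made up, and Option fields become nulls in the resulting DataFrame. drop removes incomplete rows and fill replaces nulls, either in all columns of a matching type or in selected columns only.

```scala
import org.apache.spark.sql.SparkSession

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// None values become nulls in the DataFrame (made-up rows)
val people = Seq[(Option[String], Option[Double])](
  (Some("John"), Some(1000.0)),
  (Some("Sam"), None),
  (None, Some(500.0)),
  (None, None)
).toDF("Name", "Sales")

val complete = people.na.drop()                          // keep rows without any null
val atLeast1 = people.na.drop(1)                         // keep rows with at least 1 non-null value
val zeroFill = people.na.fill(0.0)                       // numeric columns: null -> 0.0
val nameFill = people.na.fill("Unknown", Seq("Name"))    // only the Name column

complete.show()
zeroFill.show()
nameFill.show()
```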
Dates and Timestamps
scala> :load Dates_and_Timestamps.scala
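A sketch of working with dates, assuming they arrive as yyyy-MM-dd strings (paste with :paste). The example parses them with to_date, extracts calendar fields with year and month, and then averages a made-up closing price per year.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, month, to_date, year}

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up closing prices keyed by date strings
val prices = Seq(
  ("2006-01-03", 490.0),
  ("2006-06-15", 500.0),
  ("2007-02-01", 480.0)
).toDF("DateStr", "Close")

// Parse to DateType, then pull out calendar fields
val dated = prices
  .withColumn("Date", to_date($"DateStr"))
  .withColumn("Year", year($"Date"))
  .withColumn("Month", month($"Date"))

// Typical use: average close per year
val yearly = dated.groupBy("Year").agg(avg("Close").as("AvgClose")).orderBy("Year")
yearly.show()
```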
DataFrame Project - Solutions
scala> :load DataFrame_Project_Solution.scala
Linear Regression with Spark
$ cd Machine_Learning_Sections/Regression/
scala> :load LinRegDocExample.scala
Linear Regression Walkthrough
scala> :load LinRegWalkthrough.scala
Linear Regression Project Solutions
scala> :load LinRegExerciseSolution.scala
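A minimal sketch of the usual Spark ML linear regression flow (paste with :paste). VectorAssembler packs the feature columns into a single "features" vector, LinearRegression fits it, and the training summary reports error metrics. The toy data is made up and follows label = 2*x1 + 3*x2 + 1 exactly, so the fitted coefficients should come out close to 2 and 3 and the intercept close to 1.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up data that follows label = 2*x1 + 3*x2 + 1 exactly
val data = Seq((1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 0.0), (0.0, 5.0))
  .map { case (x1, x2) => (2 * x1 + 3 * x2 + 1, x1, x2) }
  .toDF("label", "x1", "x2")

// Spark ML estimators expect one vector column of features
val assembler = new VectorAssembler().setInputCols(Array("x1", "x2")).setOutputCol("features")
val training = assembler.transform(data).select("label", "features")

val lr = new LinearRegression()   // default regParam = 0.0 (no regularization)
val model = lr.fit(training)

println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")
val summary = model.summary
println(s"RMSE: ${summary.rootMeanSquaredError}  r2: ${summary.r2}")
summary.residuals.show()
```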
Classification with Spark
$ cd Machine_Learning_Sections/Classification/
scala> :load Doc_LogReg.scala
scala> :load LogRegExample.scala
scala> :load LogRegProject_Solutions.scala
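A sketch of logistic regression wrapped in a Pipeline (paste with :paste). The data is made up and linearly separable: label is 1.0 when x >= 5. For brevity, the sketch evaluates on its own training data. A real workflow would score held-out data, for example a split produced by randomSplit. BinaryClassificationEvaluator reports area under ROC by default.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up, linearly separable data: label is 1.0 when x >= 5
val data = (0 until 10).map(i => (if (i >= 5) 1.0 else 0.0, i.toDouble)).toDF("label", "x")

val assembler = new VectorAssembler().setInputCols(Array("x")).setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")

// The pipeline runs the assembler, then fits the classifier
val pipeline = new Pipeline().setStages(Array(assembler, lr))
val model = pipeline.fit(data)
val predictions = model.transform(data)
predictions.select("x", "label", "probability", "prediction").show(false)

// Area under ROC, computed from the rawPrediction column
val auc = new BinaryClassificationEvaluator().setLabelCol("label").evaluate(predictions)
println(s"AUC: $auc")
```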
Model Evaluation
- Evaluators
- ParamGridBuilder
- TrainValidationSplit
$ cd Model_Validation/
scala> :load DocModelEvalEx.scala
scala> :load TrainTestandGrids.scala
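A sketch combining the three tools listed above: an evaluator, a ParamGridBuilder grid, and TrainValidationSplit (paste with :paste). The data is made up (label = 3x + 2 plus a small repeating offset). The grid values, split ratios, and seed are arbitrary choices for illustration, not recommendations.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Made-up data: label = 3x + 2 plus a small repeating offset
val rows = (1 to 50).map { i =>
  val x = i.toDouble
  (3 * x + 2 + (i % 3 - 1) * 0.5, Vectors.dense(x))
}
val data = spark.createDataFrame(rows).toDF("label", "features")
val Array(training, test) = data.randomSplit(Array(0.8, 0.2), seed = 12345)

val lr = new LinearRegression()
// 2 x 3 = 6 candidate parameter combinations
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
  .build()

// Each combination is fit on 80% of `training` and scored (RMSE) on the other 20%
val tvs = new TrainValidationSplit()
  .setEstimator(lr)
  .setEvaluator(new RegressionEvaluator().setMetricName("rmse"))
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.8)

// The best combination is then refit on all of `training`
val tvsModel = tvs.fit(training)
tvsModel.transform(test).select("features", "label", "prediction").show()
```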
Clustering with Spark
Example of KMeans with Spark
scala> :load Kmeans_Example.scala
scala> :load Kmeans_Project_Solutions.scala
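A sketch of KMeans from org.apache.spark.ml.clustering on two made-up, well-separated groups of 2-D points (paste with :paste). The silhouette score uses ClusteringEvaluator, which assumes Spark 2.3 or later. Older Spark 2.x releases offered model.computeCost for the within-cluster sum of squared errors instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.linalg.Vectors

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Two made-up, well-separated groups of 2-D points
val points = Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1), Vectors.dense(0.2, 0.0),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1), Vectors.dense(9.2, 9.0)
)
val df = spark.createDataFrame(points.map(Tuple1.apply)).toDF("features")

val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(df)
model.clusterCenters.foreach(println)

// prediction holds the index of the nearest cluster center
val predictions = model.transform(df)
predictions.show()

// Silhouette score (ClusteringEvaluator needs Spark 2.3 or later)
val silhouette = new ClusteringEvaluator().evaluate(predictions)
println(s"Silhouette: $silhouette")
```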
Principal Component Analysis (PCA) with Spark
$ cd PCA/
scala> :load PCA_Doc_Example.scala
scala> :load PCA_Project_Exercise_Solutions.scala
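A sketch of PCA in Spark ML (paste with :paste). The example first scales the features with StandardScaler and then projects them onto the top k = 2 principal components. The 3-D points are made up and lie close to a line, so the first component should carry almost all of the explained variance.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{PCA, StandardScaler}
import org.apache.spark.ml.linalg.Vectors

// local[*] only matters outside spark-shell; inside it the existing session is returned
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Made-up 3-D points lying close to a line
val data = Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(2.0, 4.1, 5.9),
  Vectors.dense(3.0, 6.2, 9.1),
  Vectors.dense(4.0, 7.9, 12.2)
)
val df = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")

// Scale each feature to unit standard deviation before PCA
val scaler = new StandardScaler()
  .setInputCol("features").setOutputCol("scaledFeatures")
  .setWithStd(true).setWithMean(false)
val scaled = scaler.fit(df).transform(df)

// Project onto the top k = 2 principal components
val pcaModel = new PCA().setInputCol("scaledFeatures").setOutputCol("pcaFeatures").setK(2).fit(scaled)
val result = pcaModel.transform(scaled).select("pcaFeatures")
result.show(false)
println(s"Explained variance: ${pcaModel.explainedVariance}")
```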
Databricks and Spark (Recommender System)
$ cd Recommender_Systems/
Recommender_Example.scala is run in the Databricks cloud.
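A sketch of a collaborative-filtering recommender built with ALS from org.apache.spark.ml.recommendation. It is not the contents of Recommender_Example.scala. The code should also run in a Databricks notebook cell, where spark is predefined and getOrCreate returns it. The ratings are made up. setColdStartStrategy and recommendForAllUsers assume Spark 2.2 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS

// local[*] only matters outside a notebook/shell; an existing session is returned as-is
val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up explicit ratings: (userId, movieId, rating)
val ratings = Seq(
  (0, 10, 4.0), (0, 11, 1.0), (1, 10, 5.0),
  (1, 12, 2.0), (2, 11, 3.0), (2, 12, 4.0)
).toDF("userId", "movieId", "rating")

val als = new ALS()
  .setMaxIter(5)
  .setRegParam(0.01)
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")
  .setColdStartStrategy("drop") // drop NaN predictions for unseen users/items (Spark 2.2+)

val model = als.fit(ratings)
val predictions = model.transform(ratings)
val rmse = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("rating")
  .setPredictionCol("prediction")
  .evaluate(predictions)
println(s"Training RMSE: $rmse")

// Top 2 movies per user (Spark 2.2+)
val top2 = model.recommendForAllUsers(2)
top2.show(false)
```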
Zeppelin Notebooks on AWS Elastic MapReduce (a paid Amazon service)
Zeppelin itself (zeppelin.apache.org) is very similar to Jupyter Notebook.