Scala and Spark for Big Data and Machine Learning

src:
https://bitbucket.org/matematika/scala-and-spark-for-big-data-and-machine-learning.git


DataFrames Overview

$ cd Spark_Dataframes

$ spark-shell
scala> :load DataFrame_Overview.scala


Spark DataFrame Operations

scala> :load DataFrame_Operations.scala


GroupBy and Aggregate Functions

scala> :load GroupBy_and_Agg.scala


Missing data

scala> :load MissingData.scala


Date and Timestamps

scala> :load Dates_and_Timestamps.scala


DataFrame Project - Solutions

scala> :load DataFrame_Project_Solution.scala


$ cd Machine_Learning_Sections/Regression/


scala> :load LinRegDocExample.scala


Linear Regression Walkthrough

scala> :load LinRegWalkthrough.scala


Linear Regression Project Solutions

scala> :load LinRegExerciseSolution.scala


Classification with Spark


$ cd Machine_Learning_Sections/Classification/


scala> :load Doc_LogReg.scala


scala> :load LogRegExample.scala


scala> :load LogRegProject_Solutions.scala


Model Evaluation

  • Evaluators
  • ParamGridBuilder
  • TrainValidationSplit
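
A minimal sketch of how these three pieces fit together, assuming Spark ML is available as it is inside spark-shell. The toy data, the app name and the grid values are illustrative assumptions and are not taken from the course scripts.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.sql.SparkSession

// Inside spark-shell, getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .appName("ModelEvalSketch") // hypothetical name
  .master("local[*]")
  .getOrCreate()

// Synthetic toy data (an assumption, not the course data):
// label = 2x + 1 plus a small deterministic offset.
val rows = (1 to 200).map { i =>
  val x = i.toDouble
  (2.0 * x + 1.0 + (i % 5) * 0.1, Vectors.dense(x))
}
val data = spark.createDataFrame(rows).toDF("label", "features")

// Hold out a test set that the tuning step never sees.
val Array(training, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)

val lr = new LinearRegression().setMaxIter(10)

// ParamGridBuilder: 2 regParam values x 2 fitIntercept values = 4 candidates.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .addGrid(lr.fitIntercept)
  .build()

// TrainValidationSplit: fits every candidate on 80% of `training`,
// scores it on the remaining 20% with the evaluator, keeps the best one.
val tvs = new TrainValidationSplit()
  .setEstimator(lr)
  .setEvaluator(new RegressionEvaluator().setMetricName("rmse"))
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.8)
  .setSeed(42)

val model = tvs.fit(training)

// Predictions from the best model on the held-out test set.
model.transform(test).select("features", "label", "prediction").show(5)
```

Saved to a file, this can be run with :load from spark-shell like the other scripts. model.validationMetrics holds one score per grid entry, in the same order as paramGrid.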


$ cd Model_Validation/


scala> :load DocModelEvalEx.scala


scala> :load TrainTestandGrids.scala


Clustering with Spark


Example of KMeans with Spark


scala> :load Kmeans_Example.scala


scala> :load Kmeans_Project_Solutions.scala


Principal Component Analysis (PCA) with Spark

$ cd PCA/


scala> :load PCA_Doc_Example.scala


scala> :load PCA_Project_Exercise_Solutions.scala


DataBricks and Spark (Recommender system)

/Recommender_Systems/Recommender_Example.scala

This part is done in the DataBricks cloud.


Zeppelin Notebooks on AWS Elastic MapReduce (a paid solution from Amazon)

Zeppelin itself (zeppelin.apache.org) is very similar to Jupyter Notebook.