Scala and Spark for Big Data and Machine Learning


DataFrames Overview

$ cd Spark_Dataframes

$ spark-shell
scala> :load DataFrame_Overview.scala

Spark DataFrame Operations

scala> :load DataFrame_Operations.scala

GroupBy and Aggregate Functions

scala> :load GroupBy_and_Agg.scala

Missing data

scala> :load MissingData.scala

Date and Timestamps

scala> :load Dates_and_Timestamps.scala

DataFrame Project - Solutions

scala> :load DataFrame_Project_Solution.scala

$ cd Machine_Learning_Sections/Regression/

scala> :load LinRegDocExample.scala

Linear Regression Walkthrough

scala> :load LinRegWalkthrough.scala

Linear Regression Project Solutions

scala> :load LinRegExerciseSolution.scala

Classification with Spark

$ cd Machine_Learning_Sections/Classification/

scala> :load Doc_LogReg.scala

scala> :load LogRegExample.scala

scala> :load LogRegProject_Solutions.scala

Model Evaluation

  • Evaluators
  • ParamGridBuilder
  • TrainValidationSplit
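
A minimal sketch of how the three pieces above fit together, meant for pasting into spark-shell. It is not taken from the course scripts: the toy data, the choice of LinearRegression, the grid values, and the seed are all assumptions made for illustration.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.sql.SparkSession

// In spark-shell getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .appName("ModelEvalSketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical toy data (assumption): label = 2 * x + 1.
val data = spark.createDataFrame(
  (1 to 40).map(i => (2.0 * i + 1.0, Vectors.dense(i.toDouble)))
).toDF("label", "features")

val lr = new LinearRegression().setMaxIter(10)

// ParamGridBuilder: 2 regParam values x 2 fitIntercept values = 4 combinations.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .addGrid(lr.fitIntercept)
  .build()

// TrainValidationSplit: fits every combination on 80% of the data and
// scores it with the evaluator (RMSE) on the remaining 20%.
val tvs = new TrainValidationSplit()
  .setEstimator(lr)
  .setEvaluator(new RegressionEvaluator().setMetricName("rmse"))
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.8)
  .setSeed(42L)

// The fitted model wraps the best combination, refit on the full input.
val model = tvs.fit(data)
model.transform(data).select("features", "label", "prediction").show(5)
```

model.validationMetrics holds one score per grid combination, in the same order as paramGrid, so the two can be zipped to see which settings were chosen.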

$ cd Model_Validation/

scala> :load DocModelEvalEx.scala

scala> :load TrainTestandGrids.scala

Clustering with Spark

Example of KMeans with Spark

scala> :load Kmeans_Example.scala

scala> :load Kmeans_Project_Solutions.scala

Principal Component Analysis (PCA) with Spark

$ cd PCA/

scala> :load PCA_Doc_Example.scala

scala> :load PCA_Project_Exercise_Solutions.scala

DataBricks and Spark (Recommender system)

cd /Recommender_Systems/Recommender_Example.scala

This is done in the DataBricks cloud.

Zeppelin Notebooks on AWS Elastic MapReduce (a paid solution from Amazon)

Zeppelin itself looks very much like a Jupyter notebook.