----------------------------------------------
hadoop fs -put $SPARK_HOME/data data
Basic Statistics:
--------------------------
$SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.MultivariateSummarizer --input $SPARK_HOME/data/mllib/sample_linear_regression_data.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.Correlations --input $SPARK_HOME/data/mllib/sample_linear_regression_data.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.RandomRDDGeneration
Classification and regression:
--------------------------
---------------------------------------------------------------------------
Problem Type ==> Supported Methods
---------------------------------------------------------------------------
Binary Classification ==> linear SVMs, logistic regression, decision trees, random forests, gradient-boosted trees, naive Bayes
Multiclass Classification ==> logistic regression, decision trees, random forests, naive Bayes
Regression ==> linear least squares, Lasso, ridge regression, decision trees, random forests, gradient-boosted trees, isotonic regression
---------------------------------------------------------------------------
linear SVMs:- $SPARK_HOME/bin/run-example mllib.BinaryClassificationMetricsExample
logistic regression:- $SPARK_HOME/bin/run-example mllib.MulticlassMetricsExample
naive Bayes:- $SPARK_HOME/bin/run-example mllib.NaiveBayesExample
decision trees:- $SPARK_HOME/bin/run-example mllib.DecisionTreeRegressionExample
decision trees:- $SPARK_HOME/bin/run-example mllib.DecisionTreeClassificationExample
random forests:- $SPARK_HOME/bin/run-example mllib.RandomForestClassificationExample
random forests:- $SPARK_HOME/bin/run-example mllib.RandomForestRegressionExample
gradient-boosted trees:- $SPARK_HOME/bin/run-example mllib.GradientBoostingRegressionExample
gradient-boosted trees:- $SPARK_HOME/bin/run-example mllib.GradientBoostingClassificationExample
isotonic regression:- $SPARK_HOME/bin/run-example mllib.IsotonicRegressionExample
$SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.BinaryClassification $SPARK_HOME/data/mllib/sample_binary_classification_data.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.LinearRegression $SPARK_HOME/data/mllib/sample_linear_regression_data.txt
Collaborative filtering:
--------------------------
$SPARK_HOME/bin/run-example mllib.RecommendationExample
Clustering:
--------------------------
k-means:- $SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.DenseKMeans $SPARK_HOME/data/mllib/kmeans_data.txt --k 3 --numIterations 5
Gaussian mixture:- $SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.DenseGaussianMixture $SPARK_HOME/data/mllib/kmeans_data.txt 3 5
power iteration clustering (PIC):- $SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.PowerIterationClusteringExample
latent Dirichlet allocation (LDA):- $SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.LDAExample $SPARK_HOME/data/mllib/sample_lda_data.txt
streaming k-means:- $SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.StreamingKMeansExample
bisecting k-means:-
---------------------------------------------------------
$SPARK_HOME/bin/run-example mllib.CosineSimilarity --threshold 0.1 $SPARK_HOME/data/mllib/sample_svm_data.txt
$SPARK_HOME/bin/run-example mllib.FPGrowthExample --minSupport 0.8 --numPartition 2 $SPARK_HOME/data/mllib/sample_fpgrowth.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.MovieLensALS $SPARK_HOME/data/mllib/sample_movielens_data.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.mllib.SampledRDDs --input $SPARK_HOME/data/mllib/sample_linear_regression_data.txt
$SPARK_HOME/bin/run-example mllib.StreamingTestExample file:/home/orienit/spark/input/test
----------------------------------------------
$SPARK_HOME/bin/run-example ml.CrossValidatorExample
$SPARK_HOME/bin/run-example org.apache.spark.examples.ml.DataFrameExample --input $SPARK_HOME/data/mllib/sample_libsvm_data.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.ml.KMeansExample $SPARK_HOME/data/mllib/kmeans_data.txt 3
$SPARK_HOME/bin/run-example org.apache.spark.examples.ml.LinearRegressionExample --regParam 0.15 --elasticNetParam 1.0 $SPARK_HOME/data/mllib/sample_linear_regression_data.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.ml.LogisticRegressionExample --regParam 0.3 --elasticNetParam 0.8 $SPARK_HOME/data/mllib/sample_libsvm_data.txt
$SPARK_HOME/bin/run-example org.apache.spark.examples.ml.MovieLensALS --rank 10 --maxIter 15 --regParam 0.1 --movies $SPARK_HOME/data/mllib/als/sample_movielens_movies.txt --ratings $SPARK_HOME/data/mllib/als/sample_movielens_ratings.txt
Estimator, Transformer, and Param
----------------------------------------------
$SPARK_HOME/bin/run-example org.apache.spark.examples.ml.SimpleParamsExample
Model selection via train-validation split:
----------------------------------------------
$SPARK_HOME/bin/run-example org.apache.spark.examples.ml.TrainValidationSplitExample
----------------------------------------------
cd /home/orienit/spark/machine_learning_examples/ml-100k
val rawData = sc.textFile("file:/home/orienit/spark/input/ml-100k/u.data")
rawData.first()
val rawRatings = rawData.map(_.split("\t").take(3))
rawRatings.first()
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.Rating
val ratings = rawRatings.map { case Array(user, movie, rating) => Rating(user.toInt, movie.toInt, rating.toDouble) }
ratings.first()
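Each line of u.data is tab-separated as user, movie, rating, timestamp; the map above keeps only the first three fields. A local sketch of the same parsing on one made-up sample line (the values are illustrative, not from the dataset):

```scala
// hypothetical u.data line: user \t movie \t rating \t timestamp
val line = "196\t242\t3\t881250949"
val fields = line.split("\t").take(3)   // drop the timestamp
val (user, movie, rating) = (fields(0).toInt, fields(1).toInt, fields(2).toDouble)
// user == 196, movie == 242, rating == 3.0
```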
val model = ALS.train(ratings, 50, 10, 0.01)   // rank = 50, iterations = 10, lambda = 0.01
model.userFeatures
model.userFeatures.count
model.productFeatures.count
val predictedRating = model.predict(789, 123)
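predict(789, 123) computes the dot product of user 789's and product 123's latent factor vectors. A local sketch of that computation with made-up rank-3 vectors (the trained model above uses rank 50):

```scala
// toy factor vectors standing in for one row of userFeatures and productFeatures
val userVec = Array(0.5, 1.0, -0.2)
val prodVec = Array(1.2, 0.3, 0.8)
val predicted = userVec.zip(prodVec).map { case (u, p) => u * p }.sum
// predicted == 0.74
```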
val userId = 789
val K = 10
val topKRecs = model.recommendProducts(userId, K)
println(topKRecs.mkString("\n"))
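recommendProducts scores every product for the given user and keeps the K highest. The same ranking idea on a local collection (the product IDs and scores here are made up):

```scala
val scores = Seq((101, 3.2), (102, 4.8), (103, 1.1), (104, 4.1))  // (productId, predicted score)
val K = 2
val topK = scores.sortBy(-_._2).take(K)   // sort by score descending, keep the top K
// topK == Seq((102, 4.8), (104, 4.1))
```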
----------------------------------------------
val movies = sc.textFile("file:/home/orienit/spark/input/ml-100k/u.item")
val titles = movies.map(line => line.split("\\|").take(2)).map(array => (array(0).toInt, array(1))).collectAsMap()
titles(123)
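String.split takes a regular expression, which is why the pipe delimiter in u.item must be escaped as \\| — an unescaped | is regex alternation and matches the empty string at every position. A quick check (the movie title is illustrative):

```scala
// escaped pipe splits on the "|" character as intended
assert("123|Toy Story (1995)".split("\\|").sameElements(Array("123", "Toy Story (1995)")))
// unescaped pipe splits between every character, not into two fields
assert("a|b".split("|").length != 2)
```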
val moviesForUser = ratings.keyBy(_.user).lookup(789)
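keyBy(_.user).lookup(789) gathers every Rating whose key is user 789. On a local collection the equivalent is a simple filter (a local Rating stand-in and illustrative values):

```scala
case class Rating(user: Int, product: Int, rating: Double)   // mirrors mllib's Rating
val localRatings = Seq(Rating(789, 123, 5.0), Rating(42, 7, 3.0), Rating(789, 55, 4.0))
val moviesForUser = localRatings.filter(_.user == 789)       // all ratings by user 789
// moviesForUser.size == 2
```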