Building Recommender Systems with Machine Learning and AI

https://sundog-education.com/recsys/

Установили anaconda

https://www.anaconda.com/

$ conda install -c anaconda anaconda-navigator
$ conda install -c conda-forge scikit-surprise
$ anaconda-navigator


Запускаем Spyder

$ anaconda-navigator > Spyder

http://media.sundog-soft.com/RecSys/RecSys-Materials.zip

http://media.sundog-soft.com/RecSys/ml-latest-small.zip

Разархивизовать каталоги и ml-latest-small поместить в RecSys-Materials

GettingStarted/GettingStarted.py


03 Evaluating Recommender Systems

RUN --> Evaluating/TestMetrics.py


04 A Recommender Engine Framework

SVD - Singular Value Decomposition

RUN --> Framework/RecsBakeOff.py

Loading movie ratings...

Computing movie popularity ranks so we can measure novelty later...
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating  SVD ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  Random ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.


Algorithm  RMSE       MAE        HR         cHR        ARHR       Coverage   Diversity  Novelty
SVD        0.9034     0.6978     0.0298     0.0298     0.0112     0.9553     0.0445     491.5768
Random     1.4385     1.1478     0.0089     0.0089     0.0015     1.0000     0.0719     557.8365

Legend:

RMSE:      Root Mean Squared Error. Lower values mean better accuracy.
MAE:       Mean Absolute Error. Lower values mean better accuracy.
HR:        Hit Rate; how often we are able to recommend a left-out rating. Higher is better.
cHR:       Cumulative Hit Rate; hit rate, confined to ratings above a certain threshold. Higher is better.
ARHR:      Average Reciprocal Hit Rank - Hit rate that takes the ranking into account. Higher is better.
Coverage:  Ratio of users for whom recommendations above a certain threshold exist. Higher is better.
Diversity: 1-S, where S is the average similarity score between every possible pair of recommendations
           for a given user. Higher means more diverse.
Novelty:   Average popularity rank of recommended items. Higher means more novel.


05 Content-Based Filtering

RUN --> ContentBased/ContentRecs.py
Loading movie ratings...

Computing movie popularity ranks so we can measure novelty later...
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating  ContentKNN ...
Evaluating accuracy...
Computing content-based similarity matrix...
0  of  8211
100  of  8211
200  of  8211
300  of  8211
400  of  8211
500  of  8211
600  of  8211
700  of  8211
800  of  8211
900  of  8211
1000  of  8211
1100  of  8211
1200  of  8211
1300  of  8211
1400  of  8211
1500  of  8211
1600  of  8211
1700  of  8211
1800  of  8211
1900  of  8211
2000  of  8211
2100  of  8211
2200  of  8211
2300  of  8211
2400  of  8211
2500  of  8211
2600  of  8211
2700  of  8211
2800  of  8211
2900  of  8211
3000  of  8211
3100  of  8211
3200  of  8211
3300  of  8211
3400  of  8211
3500  of  8211
3600  of  8211
3700  of  8211
3800  of  8211
3900  of  8211
4000  of  8211
4100  of  8211
4200  of  8211
4300  of  8211
4400  of  8211
4500  of  8211
4600  of  8211
4700  of  8211
4800  of  8211
4900  of  8211
5000  of  8211
5100  of  8211
5200  of  8211
5300  of  8211
5400  of  8211
5500  of  8211
5600  of  8211
5700  of  8211
5800  of  8211
5900  of  8211
6000  of  8211
6100  of  8211
6200  of  8211
6300  of  8211
6400  of  8211
6500  of  8211
6600  of  8211
6700  of  8211
6800  of  8211
6900  of  8211
7000  of  8211
7100  of  8211
7200  of  8211
7300  of  8211
7400  of  8211
7500  of  8211
7600  of  8211
7700  of  8211
7800  of  8211
7900  of  8211
8000  of  8211
8100  of  8211
8200  of  8211
...done.
Analysis complete.
Evaluating  Random ...
Evaluating accuracy...
Analysis complete.


Algorithm  RMSE       MAE
ContentKNN 0.9375     0.7263
Random     1.4385     1.1478

Legend:

RMSE:      Root Mean Squared Error. Lower values mean better accuracy.
MAE:       Mean Absolute Error. Lower values mean better accuracy.

Using recommender  ContentKNN

Building recommendation model...
Computing content-based similarity matrix...
0  of  9066
100  of  9066
200  of  9066
300  of  9066
400  of  9066
500  of  9066
600  of  9066
700  of  9066
800  of  9066
900  of  9066
1000  of  9066
1100  of  9066
1200  of  9066
1300  of  9066
1400  of  9066
1500  of  9066
1600  of  9066
1700  of  9066
1800  of  9066
1900  of  9066
2000  of  9066
2100  of  9066
2200  of  9066
2300  of  9066
2400  of  9066
2500  of  9066
2600  of  9066
2700  of  9066
2800  of  9066
2900  of  9066
3000  of  9066
3100  of  9066
3200  of  9066
3300  of  9066
3400  of  9066
3500  of  9066
3600  of  9066
3700  of  9066
3800  of  9066
3900  of  9066
4000  of  9066
4100  of  9066
4200  of  9066
4300  of  9066
4400  of  9066
4500  of  9066
4600  of  9066
4700  of  9066
4800  of  9066
4900  of  9066
5000  of  9066
5100  of  9066
5200  of  9066
5300  of  9066
5400  of  9066
5500  of  9066
5600  of  9066
5700  of  9066
5800  of  9066
5900  of  9066
6000  of  9066
6100  of  9066
6200  of  9066
6300  of  9066
6400  of  9066
6500  of  9066
6600  of  9066
6700  of  9066
6800  of  9066
6900  of  9066
7000  of  9066
7100  of  9066
7200  of  9066
7300  of  9066
7400  of  9066
7500  of  9066
7600  of  9066
7700  of  9066
7800  of  9066
7900  of  9066
8000  of  9066
8100  of  9066
8200  of  9066
8300  of  9066
8400  of  9066
8500  of  9066
8600  of  9066
8700  of  9066
8800  of  9066
8900  of  9066
9000  of  9066
...done.
Computing recommendations...

We recommend:
Presidio, The (1988) 3.841314676872932
Femme Nikita, La (Nikita) (1990) 3.839613347087336
Wyatt Earp (1994) 3.8125061475551796
Shooter, The (1997) 3.8125061475551796
Bad Girls (1994) 3.8125061475551796
The Hateful Eight (2015) 3.812506147555179
True Grit (2010) 3.812506147555179
Open Range (2003) 3.812506147555179
Big Easy, The (1987) 3.7835412549266985
Point Break (1991) 3.764158410102279

Using recommender  Random

Building recommendation model...
Computing recommendations...

We recommend:
Sleepers (1996) 5
Beavis and Butt-Head Do America (1996) 5
Fear and Loathing in Las Vegas (1998) 5
Happiness (1998) 5
Summer of Sam (1999) 5
Bowling for Columbine (2002) 5
Babe (1995) 5
Birdcage, The (1996) 5
Carlito's Way (1993) 5
Wizard of Oz, The (1939) 5