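Recommendation algorithms are a technique you run into often. The main problem they solve is: if you like book A, you will probably also like book B.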

In this article we use MySQL and simple counting statistics to build a basic recommendation algorithm step by step.

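First, create a table of the books users like, where each row means that user_id likes book_id. Also create a second table, user_likes_similar, to hold user-to-user similarity scores.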

```sql
CREATE TABLE user_likes (
    user_id INT NOT NULL,
    book_id VARCHAR(10) NOT NULL,
    PRIMARY KEY (user_id, book_id),
    UNIQUE KEY book_id (book_id, user_id)
);

CREATE TABLE user_likes_similar (
    user_id INT NOT NULL,
    liked_user_id INT NOT NULL,
    -- RANK is a reserved word since MySQL 8.0.2, so the column name is quoted
    `rank` INT NOT NULL,
    KEY user_id (user_id, liked_user_id)
);
```

Insert test data for four users:

```sql
INSERT INTO user_likes VALUES (1, 'A'), (1, 'B'), (1, 'C');
INSERT INTO user_likes VALUES (2, 'A'), (2, 'B'), (2, 'C'), (2, 'D');
INSERT INTO user_likes VALUES (3, 'X'), (3, 'Y'), (3, 'C'), (3, 'Z');
INSERT INTO user_likes VALUES (4, 'W'), (4, 'Q'), (4, 'C'), (4, 'Z');
```

This means the following:

- User 1 likes A, B and C.
- User 2 likes A, B, C and D.
- User 3 likes X, Y, C and Z.
- User 4 likes W, Q, C and Z.

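Take recommending books for user 1 as an example. First compute how similar user 1 is to every other user. Here, similarity is the number of books both users like. Then rank the candidates by that similarity.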
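The two SQL steps that follow can be written as formulas. The notation is our own, not the original article's. Let $L(u)$ be the set of books user $u$ likes. For a target user $u$, the similarity to another user $v$ and the score of a candidate book $b$ are:

$$
\operatorname{sim}(u, v) = \lvert L(u) \cap L(v) \rvert, \qquad v \neq u
$$

$$
\operatorname{score}_u(b) = \sum_{v \neq u,\; b \in L(v)} \operatorname{sim}(u, v), \qquad b \notin L(u)
$$

For example, book Z is liked by users 3 and 4. Each of them shares only book C with user 1, so $\operatorname{score}_1(Z) = 1 + 1 = 2$. Users with no shared books have similarity 0 and contribute nothing to any score.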

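Clear user 1's existing rows from the similarity table. The table has no unique key, so re-running the calculation would otherwise insert duplicate rows:

```sql
DELETE FROM user_likes_similar WHERE user_id = 1;
```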

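Compute the user similarity rows. The query self-joins user_likes on book_id. For every other user who likes at least one of user 1's books, `rank` is the number of books they have in common with user 1:

```sql
INSERT INTO user_likes_similar
SELECT 1 AS user_id, similar.user_id AS liked_user_id, COUNT(*) AS `rank`
    FROM user_likes target
    JOIN user_likes similar
        ON target.book_id = similar.book_id AND target.user_id != similar.user_id
    WHERE target.user_id = 1
    GROUP BY similar.user_id;
```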

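The resulting similarity rows are shown below. User 2 shares A, B and C with user 1, while users 3 and 4 share only C:

| user_id | liked_user_id | rank |
|---|---|---|
| 1 | 2 | 3 |
| 1 | 3 | 1 |
| 1 | 4 | 1 |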
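To double-check these numbers by hand, here is a small Python sketch that mirrors the `INSERT ... SELECT` on in-memory sets. It is our own illustration, not part of the original MySQL approach. The dict `user_likes` and the function `similar_users` are hypothetical names that just reproduce the test data above.

```python
# Test data from the INSERT statements: user_id -> set of liked book_ids
user_likes = {
    1: {"A", "B", "C"},
    2: {"A", "B", "C", "D"},
    3: {"X", "Y", "C", "Z"},
    4: {"W", "Q", "C", "Z"},
}


def similar_users(likes, target):
    """Mirror of the INSERT ... SELECT: for every other user sharing at least
    one book with `target`, rank = number of shared books."""
    ranks = {}
    for other, books in likes.items():
        if other == target:
            continue  # same as target.user_id != similar.user_id
        shared = len(likes[target] & books)
        if shared > 0:  # the inner JOIN produces no row for users with no overlap
            ranks[other] = shared
    return ranks


print(similar_users(user_likes, 1))  # {2: 3, 3: 1, 4: 1}
```

The set intersection plays the role of the self-join on `book_id`, and its size plays the role of `COUNT(*)`.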

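Finally, score each book liked by these similar users by summing their similarity ranks. Leave out the books user 1 already likes, sort by the total, and take the top 10. Those are the recommended books: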

```sql
SELECT similar.book_id, SUM(user_likes_similar.`rank`) AS total_rank
    FROM user_likes_similar
    JOIN user_likes similar ON user_likes_similar.liked_user_id = similar.user_id
    -- anti-join: keep only books that user 1 does not already like
    LEFT JOIN user_likes target ON target.user_id = 1 AND target.book_id = similar.book_id
    WHERE user_likes_similar.user_id = 1 AND target.book_id IS NULL
    GROUP BY similar.book_id
    ORDER BY total_rank DESC
    LIMIT 10;
```
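The same scoring can be sketched in Python to see what the query should return for the test data. This is an illustrative mirror of the SQL under our own assumptions, and the names `similar_users` and `recommend` are hypothetical. The SQL's `ORDER BY total_rank DESC` leaves ties in no particular order, so the sketch additionally breaks ties by `book_id` to make its output deterministic.

```python
from collections import defaultdict

# Test data from the INSERT statements: user_id -> set of liked book_ids
user_likes = {
    1: {"A", "B", "C"},
    2: {"A", "B", "C", "D"},
    3: {"X", "Y", "C", "Z"},
    4: {"W", "Q", "C", "Z"},
}


def similar_users(likes, target):
    """rank = number of books shared with `target`, for users sharing >= 1."""
    ranks = {}
    for other, books in likes.items():
        shared = len(likes[target] & books)
        if other != target and shared > 0:
            ranks[other] = shared
    return ranks


def recommend(likes, target, limit=10):
    """Sum the ranks of similar users per book, skipping books `target` likes."""
    totals = defaultdict(int)
    for other, rank in similar_users(likes, target).items():
        for book in likes[other]:
            if book not in likes[target]:  # plays the role of LEFT JOIN ... IS NULL
                totals[book] += rank
    # Highest total first; ties broken by book_id (the SQL has no tie-breaker)
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


print(recommend(user_likes, 1))
# [('D', 3), ('Z', 2), ('Q', 1), ('W', 1), ('X', 1), ('Y', 1)]
```

With the test data, D comes first: only user 2 likes it, and user 2 has rank 3. Z is second, at 1 + 1 from users 3 and 4. The four books that score 1 follow in alphabetical order here, whereas MySQL may return them in any order.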