A Common Use of HashMap: Implementing a SQL JOIN

jiezi

6 years ago

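In my previous article, "A Brief Analysis of HashMap: Notes from Solving LeetCode Two Sum", I shared how I first came to understand HashMap. As a CRUD engineer, I rarely get to work on the design of foundational components, so does that mean I will rarely have a chance to use HashMap?
Today I'll walk through a common query example to see how a HashMap can make our code more efficient.
Suppose we have a Student class: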
public class Student {
    private Long id;

    private String name;

    public Student(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    // Getters and setters omitted
}
and a Score class:
public class Score {
    private Long studentId;

    private String mathScore;

    private String englishScore;

    public Score(Long studentId, String mathScore, String englishScore) {
        this.studentId = studentId;
        this.mathScore = mathScore;
        this.englishScore = englishScore;
    }

    // Getters and setters omitted
}
We need to merge Student and Score into a single class, Report:
public class Report {
    private Long studentId;

    private String studentName;

    private String mathScore;

    private String englishScore;

    public Report(Long studentId, String studentName, String mathScore, String englishScore) {
        this.studentId = studentId;
        this.studentName = studentName;
        this.mathScore = mathScore;
        this.englishScore = englishScore;
    }
}
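The fields make it clear: this is effectively a join between Student and Score that produces Report. It is a common scenario in day-to-day programming. For example, you query the order service, the user service, and the payment service, then merge their results into a single report; because each one is an independent microservice, you cannot use a SQL JOIN.
We now have two lists, a List<Student> and a List<Score>: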
List<Student> students = Arrays.asList(
    new Student(1L, "Angle"),
    new Student(2L, "Baby")
);

List<Score> scores = Arrays.asList(
    new Score(1L, "90", "87"),
    new Score(2L, "92", "78")
);
Before I learned to use HashMap, I would probably have written a nested loop:
List<Report> reports = new ArrayList<>();
for (Student student : students) {
    for (Score score : scores) {
        if (!student.getId().equals(score.getStudentId())) {
            continue;
        }

        reports.add(
            new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
        );
        break;
    }
}
In the worst case, the time complexity is O(n * m), where n is the number of students and m is the number of scores.
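To make that cost concrete, here is a small sketch that counts how many equality checks the nested loop performs. The class name NestedLoopCost and the method countComparisons are hypothetical, chosen only for illustration. It assumes both lists hold the ids 0..n-1 in the same order, which is the layout used by the 100,000-record test in this post. Under that assumption the inner loop stops after i + 1 checks for the i-th student, so the total is n(n + 1) / 2, about half of n * m; the full n * m is reached only when the matching scores sit at the end of the list or do not exist.

```java
class NestedLoopCost {
    // Counts the equality checks made by the nested-loop join when both lists
    // hold the ids 0..n-1 in the same order (an assumption of this sketch).
    static long countComparisons(int n) {
        long comparisons = 0;
        for (long studentId = 0; studentId < n; studentId++) {
            for (long scoreStudentId = 0; scoreStudentId < n; scoreStudentId++) {
                comparisons++;
                if (studentId == scoreStudentId) {
                    break; // match found: same early exit as the loop in the article
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println(countComparisons(n));    // 500500
        System.out.println((long) n * (n + 1) / 2); // 500500, the closed form
    }
}
```

Plugging n = 100,000 into the closed form gives 5,000,050,000 checks, which is why the nested loop becomes slow at that size.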
Using a HashMap improves on this considerably:
Map<Long, Student> map = new HashMap<>();
for (Student student : students) {
    map.put(student.getId(), student);
}

List<Report> reports = new ArrayList<>();
for (Score score : scores) {
    Student student = map.get(score.getStudentId());
    if (student == null) { // no matching student: skip to avoid an NPE
        continue;
    }
    reports.add(
        new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
    );
}
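The nested loop becomes two sequential loops, and the time complexity drops to O(n + m), given that HashMap put and get take constant time on average.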
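The same build-then-probe idea can also be written with the Stream API. The following is only a sketch of one possible variant, not the author's code: StreamJoinSketch and join are hypothetical names, and the nested Student, Score, and Report classes are trimmed-down stand-ins with package-visible fields instead of getters. One behavioral difference is worth knowing: Collectors.toMap throws an IllegalStateException when two students share an id, whereas map.put silently keeps the last one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamJoinSketch {
    // Trimmed-down stand-ins for the article's Student, Score and Report classes.
    static class Student {
        final Long id;
        final String name;
        Student(Long id, String name) { this.id = id; this.name = name; }
    }

    static class Score {
        final Long studentId;
        final String mathScore;
        final String englishScore;
        Score(Long studentId, String mathScore, String englishScore) {
            this.studentId = studentId;
            this.mathScore = mathScore;
            this.englishScore = englishScore;
        }
    }

    static class Report {
        final Long studentId;
        final String studentName;
        final String mathScore;
        final String englishScore;
        Report(Long studentId, String studentName, String mathScore, String englishScore) {
            this.studentId = studentId;
            this.studentName = studentName;
            this.mathScore = mathScore;
            this.englishScore = englishScore;
        }
    }

    static List<Report> join(List<Student> students, List<Score> scores) {
        // Build side: index students by id (fails fast on duplicate ids).
        Map<Long, Student> studentsById = students.stream()
                .collect(Collectors.toMap(student -> student.id, Function.identity()));

        // Probe side: keep only scores whose student exists, then merge.
        return scores.stream()
                .filter(score -> studentsById.containsKey(score.studentId))
                .map(score -> {
                    Student student = studentsById.get(score.studentId);
                    return new Report(student.id, student.name, score.mathScore, score.englishScore);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(new Student(1L, "Angle"), new Student(2L, "Baby"));
        List<Score> scores = Arrays.asList(
                new Score(1L, "90", "87"),
                new Score(2L, "92", "78"),
                new Score(3L, "60", "61")); // no matching student, so it is skipped
        for (Report report : join(students, scores)) {
            System.out.println(report.studentId + " " + report.studentName + " "
                    + report.mathScore + " " + report.englishScore);
        }
    }
}
```

Whether the stream form reads better than the two explicit loops is a matter of taste; both do one pass to build the map and one pass to probe it.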
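The HashMap version is clearly more efficient than the nested loop. I wrote test code to measure both approaches; with 100,000 records in each list, the execution times were as follows:
[Figure: execution times of the two test methods]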
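The gap looks quite large. To understand why HashMap performs so well, see my article "A Brief Analysis of HashMap: Notes from Solving LeetCode Two Sum". If you have a better solution, please leave a comment; my knowledge is limited and I have not studied algorithms in much depth.
The test source for 100,000 records is below; feel free to try it yourself: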
package com.xiangyu.demo.hashmap;

import com.xiangyu.java.hashmap.Report;
import com.xiangyu.java.hashmap.Score;
import com.xiangyu.java.hashmap.Student;
import org.junit.Before;
import org.junit.Test;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashMapTest {
    private List<Student> students = new ArrayList<>();

    private List<Score> scores = new ArrayList<>();

    @Before
    public void before() {
        // Put 100,000 records into each list
        for (long i = 0; i < 100000; i++) {
            students.add(new Student(i, "test"));
            scores.add(new Score(i, "95", "95"));
        }
    }

    // Join using a HashMap index on student id
    @Test
    public void testHashMap() {
        Map<Long, Student> map = new HashMap<>();
        for (Student student : students) {
            map.put(student.getId(), student);
        }

        List<Report> reports = new ArrayList<>();
        for (Score score : scores) {
            Student student = map.get(score.getStudentId());
            if (student == null) {
                continue;
            }
            reports.add(
                new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
            );
        }
        System.out.println(reports.size());
    }

    // Join using a nested loop
    @Test
    public void testFor2() {
        List<Report> reports = new ArrayList<>();
        for (Student student : students) {
            for (Score score : scores) {
                if (!student.getId().equals(score.getStudentId())) {
                    continue;
                }

                reports.add(
                    new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
                );
                break;
            }
        }
        System.out.println(reports.size());
    }
}
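The JUnit tests above only print the result size and leave timing to the test runner. If you would rather see the elapsed time directly, a rough sketch along the following lines may help. JoinTimingSketch, nestedLoopMatches, and hashMatches are hypothetical names; to stay short, the sketch joins bare id lists and counts matches instead of building Report objects, and it assumes ids are unique. A single System.nanoTime measurement without JIT warm-up is only indicative; a harness such as JMH would be the better choice for rigorous numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinTimingSketch {
    // Nested-loop join on ids: counts students that have a matching score.
    static int nestedLoopMatches(List<Long> studentIds, List<Long> scoreStudentIds) {
        int matches = 0;
        for (Long studentId : studentIds) {
            for (Long scoreStudentId : scoreStudentIds) {
                if (studentId.equals(scoreStudentId)) {
                    matches++;
                    break;
                }
            }
        }
        return matches;
    }

    // HashMap join on ids: counts scores that have a matching student.
    static int hashMatches(List<Long> studentIds, List<Long> scoreStudentIds) {
        Map<Long, Integer> positionById = new HashMap<>();
        for (int i = 0; i < studentIds.size(); i++) {
            positionById.put(studentIds.get(i), i);
        }
        int matches = 0;
        for (Long scoreStudentId : scoreStudentIds) {
            if (positionById.get(scoreStudentId) != null) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int n = 10_000; // smaller than the article's 100,000 so the nested loop finishes quickly
        List<Long> studentIds = new ArrayList<>();
        List<Long> scoreStudentIds = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            studentIds.add(i);
            scoreStudentIds.add(i);
        }

        long start = System.nanoTime();
        int nested = nestedLoopMatches(studentIds, scoreStudentIds);
        long nestedMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        int hashed = hashMatches(studentIds, scoreStudentIds);
        long hashedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("nested loop: " + nested + " matches in " + nestedMs + " ms");
        System.out.println("hash map:    " + hashed + " matches in " + hashedMs + " ms");
    }
}
```

Raising n to 100,000 reproduces the article's data size, at the price of a much longer nested-loop run.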