
A Common HashMap Use Case: Implementing a SQL JOIN

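In my previous article, A Brief Analysis of HashMap: Takeaways from LeetCode Two Sum, I shared how I first came to understand HashMap. As a CRUD engineer, I rarely get to design foundational components. Does that mean I will seldom have a chance to use HashMap?
Today I'll use a common query example to show how HashMap can make code more efficient.
Given a Student class: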
public class Student {
    private Long id;

    private String name;

    public Student(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    // setters omitted
}
and a Score class:
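public class Score {
    private Long studentId;

    private String mathScore;

    private String englishScore;

    public Score(Long studentId, String mathScore, String englishScore) {
        this.studentId = studentId;
        this.mathScore = mathScore;
        this.englishScore = englishScore;
    }

    public Long getStudentId() {
        return studentId;
    }

    public String getMathScore() {
        return mathScore;
    }

    public String getEnglishScore() {
        return englishScore;
    }

    // setters omitted
}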
We need to merge Student and Score into a single class, Report:
public class Report {
    private Long studentId;

    private String studentName;

    private String mathScore;

    private String englishScore;

    public Report(Long studentId, String studentName, String mathScore, String englishScore) {
        this.studentId = studentId;
        this.studentName = studentName;
        this.mathScore = mathScore;
        this.englishScore = englishScore;
    }
}
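The fields make it clear that this is effectively a JOIN between Student and Score that produces Report. The scenario comes up often in everyday programming. For example, you might query the order service, the user service, and the payment service, then merge what each one returns into a single form. Because each service is an independent microservice, a SQL JOIN across them is not possible.
Suppose we have two lists, a List<Student> and a List<Score>: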
List<Student> students = Arrays.asList(
        new Student(1L, "Angle"),
        new Student(2L, "Baby")
);

List<Score> scores = Arrays.asList(
        new Score(1L, "90", "87"),
        new Score(2L, "92", "78")
);
Before I learned to use HashMap, I would probably have written a nested loop:
List<Report> reports = new ArrayList<>();
for (Student student : students) {
    for (Score score : scores) {
        if (!student.getId().equals(score.getStudentId())) {
            continue;
        }

        reports.add(
                new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
        );
        break;
    }
}
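In the worst case the time complexity is O(n * m), where n is the number of students and m is the number of scores.
Rewriting it with a HashMap gives a much better result: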
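The `break` does not rescue the nested loop, even in the benchmark's favourable layout where ids 0 to n-1 appear in the same order in both lists. Student i is matched only after i + 1 comparisons, so the total is 1 + 2 + ... + n = n(n+1)/2, which is still quadratic. The sketch below counts those comparisons directly. The class name `NestedLoopCost` is hypothetical, and bare `Long` ids stand in for the full Student and Score objects as a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: counts the id comparisons made by the nested-loop join.
// Simplification: plain Long ids stand in for Student/Score objects.
class NestedLoopCost {

    static long countComparisons(int n) {
        List<Long> studentIds = new ArrayList<>();
        List<Long> scoreIds = new ArrayList<>();
        // Same layout as the benchmark: ids 0..n-1, same order in both lists
        for (long i = 0; i < n; i++) {
            studentIds.add(i);
            scoreIds.add(i);
        }
        long comparisons = 0;
        for (Long studentId : studentIds) {
            for (Long scoreId : scoreIds) {
                comparisons++;
                if (studentId.equals(scoreId)) {
                    break; // first match found; move on to the next student
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // n(n+1)/2 with n = 1000
        System.out.println(countComparisons(1_000)); // prints 500500
    }
}
```

Plugging n = 100,000 into the same formula gives 5,000,050,000 comparisons, which helps explain why the nested loop is so slow on the article's test data.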
// Build phase: index every student by id
Map<Long, Student> map = new HashMap<>();
for (Student student : students) {
    map.put(student.getId(), student);
}

// Probe phase: one map lookup per score
List<Report> reports = new ArrayList<>();
for (Score score : scores) {
    Student student = map.get(score.getStudentId());
    if (student == null) { // no matching student; skip it to avoid an NPE
        continue;
    }
    reports.add(
            new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
    );
}
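The nested loop becomes two sequential loops. Building the map costs O(n), and each of the m lookups costs O(1) on average, so the total time complexity is O(n + m).
This is clearly better than the previous approach. I wrote test code to measure both methods, and with 100,000 records the execution times were as follows: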
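The two versions return the same result only when each student has at most one Score. The nested loop stops at the first match (`break`), so it emits at most one Report per student, in student order. The HashMap version emits one Report for every Score that has a matching student, in score order. The sketch below shows the difference. The class name `JoinSemantics` is hypothetical, and it assumes Java 16+ records as compact stand-ins for the article's Student, Score and Report classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; records (Java 16+) stand in for the article's classes.
class JoinSemantics {
    record Student(Long id, String name) {}
    record Score(Long studentId, String mathScore, String englishScore) {}
    record Report(Long studentId, String studentName, String mathScore, String englishScore) {}

    // Nested loop with break: at most one Report per student
    static List<Report> nestedLoop(List<Student> students, List<Score> scores) {
        List<Report> reports = new ArrayList<>();
        for (Student st : students) {
            for (Score sc : scores) {
                if (!st.id().equals(sc.studentId())) {
                    continue;
                }
                reports.add(new Report(st.id(), st.name(), sc.mathScore(), sc.englishScore()));
                break;
            }
        }
        return reports;
    }

    // HashMap version: one Report per score that has a matching student
    static List<Report> hashJoin(List<Student> students, List<Score> scores) {
        Map<Long, Student> byId = new HashMap<>();
        for (Student st : students) {
            byId.put(st.id(), st); // a duplicate id would overwrite the earlier student
        }
        List<Report> reports = new ArrayList<>();
        for (Score sc : scores) {
            Student st = byId.get(sc.studentId());
            if (st == null) {
                continue;
            }
            reports.add(new Report(st.id(), st.name(), sc.mathScore(), sc.englishScore()));
        }
        return reports;
    }

    public static void main(String[] args) {
        List<Student> students = List.of(new Student(1L, "Angle"));
        // Student 1 has two score rows, e.g. from two exams
        List<Score> scores = List.of(new Score(1L, "90", "87"), new Score(1L, "85", "80"));
        System.out.println(nestedLoop(students, scores).size()); // prints 1
        System.out.println(hashJoin(students, scores).size());   // prints 2
    }
}
```

If student ids are unique, the HashMap version matches SQL INNER JOIN semantics, which produce one row per matching pair. If student ids can repeat, `map.put` keeps only the last Student for each id, so that input would need to be grouped or deduplicated first.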

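The gap looks quite large. To understand why HashMap performs so well, see my article A Brief Analysis of HashMap: Takeaways from LeetCode Two Sum. If you have a better solution, feel free to leave a comment. My own knowledge is limited, and I haven't studied algorithms in depth.
The test source code for 100,000 records is below, so you can try it yourself: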
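One possible variant, not from the original article, is to pull the build/probe pattern into a reusable generic helper. The same code could then join the results of the order, user and payment services without being rewritten each time. `HashJoins.innerJoin` is a hypothetical name. The sketch uses `Collectors.toMap`, which throws `IllegalStateException` on a duplicate key instead of silently overwriting the way `HashMap.put` does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical generic helper: inner hash join of two lists on a shared key.
class HashJoins {

    static <L, R, K, T> List<T> innerJoin(List<L> left, Function<L, K> leftKey,
                                          List<R> right, Function<R, K> rightKey,
                                          BiFunction<L, R, T> combine) {
        // Build phase: index the left list by key (fails fast on duplicate keys)
        Map<K, L> index = left.stream()
                .collect(Collectors.toMap(leftKey, Function.identity()));

        // Probe phase: one O(1)-average lookup per right-side element
        List<T> out = new ArrayList<>();
        for (R r : right) {
            L l = index.get(rightKey.apply(r));
            if (l != null) { // unmatched right-side rows are dropped (inner join)
                out.add(combine.apply(l, r));
            }
        }
        return out;
    }
}
```

With the article's classes, a call might look like `HashJoins.innerJoin(students, Student::getId, scores, Score::getStudentId, (st, sc) -> new Report(st.getId(), st.getName(), sc.getMathScore(), sc.getEnglishScore()))`. The map's memory cost grows with the list that gets indexed, so indexing the smaller of the two lists is usually the better choice.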
package com.xiangyu.demo.hashmap;

import com.xiangyu.java.hashmap.Report;
import com.xiangyu.java.hashmap.Score;
import com.xiangyu.java.hashmap.Student;
import org.junit.Before;
import org.junit.Test;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashMapTest {
    private List<Student> students = new ArrayList<>();

    private List<Score> scores = new ArrayList<>();

    @Before
    public void before() {
        // Put 100,000 records into each list, with matching ids in the same order
        for (long i = 0; i < 100000; i++) {
            students.add(new Student(i, "test"));
            scores.add(new Score(i, "95", "95"));
        }
    }

    @Test
    public void testHashMap() {
        // Build phase: index students by id
        Map<Long, Student> map = new HashMap<>();
        for (Student student : students) {
            map.put(student.getId(), student);
        }

        // Probe phase: one lookup per score
        List<Report> reports = new ArrayList<>();
        for (Score score : scores) {
            Student student = map.get(score.getStudentId());
            if (student == null) {
                continue;
            }
            reports.add(
                    new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
            );
        }
        System.out.println(reports.size());
    }

    @Test
    public void testFor2() {
        // Nested loop: scan all scores for every student
        List<Report> reports = new ArrayList<>();
        for (Student student : students) {
            for (Score score : scores) {
                if (!student.getId().equals(score.getStudentId())) {
                    continue;
                }

                reports.add(
                        new Report(student.getId(), student.getName(), score.getMathScore(), score.getEnglishScore())
                );
                break;
            }
        }
        System.out.println(reports.size());
    }
}
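The JUnit tests above print only the number of reports and do not time anything themselves. Below is a minimal sketch of one way to take rough wall-clock timings with `System.nanoTime`. The class `JoinTiming` is hypothetical, and bare `Long` ids stand in for the full objects as a simplification. A single run like this includes JIT warm-up and GC noise, so treat its numbers as indicative only. A harness such as JMH is the usual tool for careful measurements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical timing sketch; Long ids stand in for Student/Score objects.
class JoinTiming {

    // Nested loop with break, as in testFor2
    static int nestedLoop(List<Long> studentIds, List<Long> scoreIds) {
        int matches = 0;
        for (Long s : studentIds) {
            for (Long sc : scoreIds) {
                if (s.equals(sc)) {
                    matches++;
                    break;
                }
            }
        }
        return matches;
    }

    // HashMap build + probe, as in testHashMap
    static int hashLookup(List<Long> studentIds, List<Long> scoreIds) {
        Map<Long, Long> index = new HashMap<>(); // value unused here; the article stores the Student
        for (Long s : studentIds) {
            index.put(s, s);
        }
        int matches = 0;
        for (Long sc : scoreIds) {
            if (index.containsKey(sc)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Default size is kept modest; pass 100000 to match the article's data volume
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 20_000;
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            ids.add(i);
        }

        long t0 = System.nanoTime();
        int a = nestedLoop(ids, ids);
        long t1 = System.nanoTime();
        int b = hashLookup(ids, ids);
        long t2 = System.nanoTime();

        System.out.printf("n=%d nested=%d ms (%d matches), hash=%d ms (%d matches)%n",
                n, (t1 - t0) / 1_000_000, a, (t2 - t1) / 1_000_000, b);
    }
}
```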
