Leetcode139拆分词句

jiezi

5 年前

给定一个非空字符串 s 和一个包含非空单词列表的字典 wordDict，判定 s 是否可以被空格拆分为一个或多个在字典中出现的单词。

说明：

拆分时可以重复使用字典中的单词。
你可以假设字典中没有重复的单词。
示例 1：

输入: s = "leetcode", wordDict = ["leet", "code"]
输出: true
解释: 返回 true 因为 "leetcode" 可以被拆分成 "leet code"。

示例 2：

输入: s = "applepenapple", wordDict = ["apple", "pen"]
输出: true
解释: 返回 true 因为 "applepenapple" 可以被拆分成 "apple pen apple"。注意你可以重复使用字典中的单词。

示例 3：

输入: s = "catsandog", wordDict = ["cats", "dog", "sand", "and", "cat"]
输出: false

这道题最开始我们想的肯定是每个字符遍历，然后去看是不是在 wordDict 里面。而 wordDict 是一个 list，查找是 o(N)的时间复杂度，需要把这个时间复杂度先降下来，用 Set 把每次查找的时间复杂度降到 o(1)。

怎么去 check 一个字符串 wordDict 能不能被组成，一个很朴素的想法就是把每个字符串分作两段，然后递归。比如如下代码。

public class Solution {public boolean wordBreak(String s, List<String> wordDict) {return helper(s, new HashSet<>(wordDict), 0);
    }
    public boolean helper(String s, Set<String> wordDict, int start) {if (start == s.length()) {return true;}
        for (int end = start + 1; end <= s.length(); end++) {if (wordDict.contains(s.substring(start, end)) 
                && helper(s, wordDict, end)) {return true;}
        }
        return false;
    }
}

很显然这种思路是行不通的，因为时间复杂度太高，有兴趣的同学可以试一下。

时间复杂度: O(n^n)
空间复杂度：O(n)

重复计算太多。哪里重复了？举个例子：

输入: s = "AAAleetcodeB", wordDict = ["leet", "code","A", "AA", "AAA"]
for 循环中：首次递归: s = 'A' + helper('AAleetcodeB'), 最终检查不符合;
二次递归: s = 'AA' + helper('AleetcodeB'), 最终检查不符合;
三次递归: s = 'AAA' + helper('leetcodeB'), 最终检查不符合;

发现没, 上面每一次都重复计算了helper('leetcodeB')。

节省时间的办法也很自然：要是我们能把搜索过的内容记下来就好了。记忆有两种办法可供参考：

动态规划
记忆化数组进行搜索

我们先看动态规划，动态规划其实很好理解，最重要的是 状态转移方程。不懂的同学，可以手动模拟一遍基本就理解了。

dp[i]表示[0, i] 子串是否能够由 wordDict 组成
dp[i] = 对于任意 j, dp[j] && wordDict 包含 s[j + 1, i]，其中 j 属于区间 [0, i]。

模拟一下动态规划的过程是:

输入: s = "AAAleetcodeB", wordDict = ["leet", "code","A", "AA", "AAA"]
dp[0] = true // 初始化.
首次 dp: dp[1] = true, wordDict 包含 'A';
二次 dp: dp[2] = true, 
        dp[1] = true, wordDict 包含 'A';
三次 dp: dp[3] = true,
        dp[1] = true, wordDict 包含 'AA';
...
最后一次 dp: dp[12] = false,
        dp[1]= true wordDict 不包含 'AAleetcodeB';
        dp[2]= true wordDict 不包含 'AleetcodeB';
        dp[3]= true wordDict 不包含 'leetcodeB';
        dp[7]= true wordDict 不包含 'codeB';
        dp[11]= true wordDict 不包含 'B'
        故,dp[12] = false.

java 版本代码如下:

class Solution {public boolean wordBreak(String s, List<String> wordDict) {if (s == null) return false;
        Set<String> wordSet = new HashSet<>();
        for (int i = 0; i < wordDict.size(); i++) {wordSet.add(wordDict.get(i));
        }
        boolean[] dp = new boolean[s.length() + 1];
        // 状态开始
        dp[0] = true;
        // dp[i]表示能不能到达第 i 个字母的时候
        for (int i = 1; i <= s.length(); i++) {for (int j = 0; j < i; j++) {String current = s.substring(j, i);
                if (dp[j] && wordSet.contains(current)) {dp[i] = true;
                    break;
                }
            }
        }
        return dp[s.length()];
    }
}

时间复杂度：O(n^2)。两层 for 循环。
空间复杂度: O(n)。dp 数组长度是 n。

动态规划和记忆化搜索都是很常用的解法，本题我们可以用一个数组memoToEndContain 记下位置 i 到字符串结束能不能够由 wordDict 组成。还是我们最开始的例子：

输入: s = "AAAleetcodeB", wordDict = ["leet", "code","A", "AA", "AAA"]
首次 for 循环: 'A' 可以被 wordDict 组成
        'AA' 可以被 wordDict 组成
        ...
        'AAAleetcodeB' 不可以被 wordDict 组成
        这次深搜后记住：从第一个字母 'A' 第二个字母 'A', 第三个字母 'A' ... 开始的子串都不能由 wordDict 组成;

java 代码：

public class Solution {public boolean wordBreak(String s, List<String> wordDict) {if (s == null) return false;
        Set<String> wordSet = new HashSet<>(wordDict);
        // 记忆从 i 到字符串结束能不能搜索到.
        Boolean[] memoToEndContain = new Boolean[s.length()];
        return dfs(s, 0, wordSet, memoToEndContain);
    }

    public boolean dfs(String s,
                       int start,
                       Set<String> wordSet,
                       Boolean[] memoToEndContain) {
        // 搜索到最后.
        if (start == s.length()) {return true;}
        // 之前已经搜索过.
        if (memoToEndContain[start] != null) {return memoToEndContain[start];
        }

        for (int end = start + 1; end <= s.length(); end++) {if (wordSet.contains(s.substring(start, end)) 
                && dfs(s, end, wordSet, memoToEndContain)) {return memoToEndContain[start] = true;
            }
        }
        memoToEndContain[start] = false;
        return memoToEndContain[start];
    }
}

时间复杂度：O(n^2)。搜索树的大小最多达到 n^2。
空间复杂度: O(n)。深度优先二叉搜索树深度最多是 n。

Leetcode139拆分词句

题目

思路分析

暴力搜索

记忆化搜索

动态规划

DFS