这是leetcode上面的第十题，前面的第五题和第六题没有做，因为感觉字符串处理还是弱项，打算之后一起做，而再后面的第七题，第八题，第九题，比较简单无脑，就不记录了。

问题描述

. Matches any single character.

* Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:

bool isMatch(const char *s, const char *p)

Some examples:

isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

想法与实现

首先这道题其实不是正宗的正则表达式匹配问题，一开始我是把它当正则来做的，但是在处理*的时候遇到了一些问题。这里的*的意思并不是可以匹配0+个字符，而是对于前面的那个字符，可以匹配0+个。比如在example里面的最后一个用例，isMatch("aab", "c*a*b") → true，其中的c*a*b，他的意思是这样的：一个有N个c，N个a，一个b的字符串。这其实处理起来，比正规的正则还是要更加麻烦一些的。

标准正则

那就先看看如何实现对于标准的正则的匹配吧。第一个想法是，先正则表达式转成NFA，然后转成DFA，用DFA去识别字符串。是受编译原理荼毒太深= =这样实现太复杂了。后来的想法就是，利用这种状态转换的思想，在字符串p中的一个字符就是一种状态，然后让字符串s一个个去匹配p，如果字符不是*，那就状态转移至下一个字符，如果是的话，就穷举所有情况，看有没有情况可以匹配到p的最后一个状态的同时也匹配到s中的最后一个字符，那么可以就说明匹配成功，匹配的过程有点类似深度优先搜索。实现的代码大概是这样的~

bool isMatch(const char *s, const char *p) {
    string source = s;
    string pattern = p;
    return match(source, pattern, 0, 0);
}

bool match(string s, string p, int posS, int posP)
{
	if (posS == s.size() && posP == p.size())
		return true;
	else if ((posS == s.size() && posP < p.size()) || (posS < s.size() && posP == p.size()))
		return false;

	if (p[posP] != '*')
	{
		if (p[posP] == '.' || p[posP] == s[posS])
		{
			posS++;
			posP++;
			return match(s, p, posS, posP);
		}
	}
	else
	{
		bool stayInTheState = match(s, p, posS + 1, posP);
		bool jumpToNextSatte = match(s, p, posS + 1, posP + 1);
		if (stayInTheState == true || jumpToNextSatte == true)
			return true;
		else
			return false;
	}
}

代码大致是这样，整体应该还是比较容易理解的~可能会有小bug，但是大题思路是这样的。

本题的做法

本题中，跟正则唯一有不同的地方在于*的不同，处理起来会稍微麻烦一些，但是整体的思路还是跟标准的正则差不多~现在的情况下，处理*需要考虑两个字符，那么就把这两个字符当成一个状态，就可以了。

bool isMatch(const char *s, const char *p) {
    string source = s;
    string pattern = p;
    return match(source, pattern, 0, 0);
}

bool match(string s, string p, int posS, int posP)
{
	if (posP == p.size())
        return posS == s.size();

    if (posP + 1 == p.size() && p[posP + 1] != '*')
    {
        if (p[posP] == '.' || p[posP] == s[posS])
        {
            posS++;
            posP++;
            return match(s, p, posS, posP);
        }
        else
            return false;
    }
	else if (p[posP + 1] != '*')
	{
		if (p[posP] == '.' || p[posP] == s[posS])
		{
			posS++;
			posP++;
			return match(s, p, posS, posP);
		}
        else
            return false;
	}
	else
	{
        while ((p[posP] == s[posS]) || (p[posP] == '.' && posS < s.size())) 
        {
            if (match(s, p, posS, posP + 2)) 
                return true;
            posS++;
        }
        return match(s, p, posS, posP + 2);
	}
}

上面的实现，用时还是比较长的，会有440MS，可以做的提高是Cache，缓存每次的结果，因为这些结果都是不会变化的，所以这样就不需要每次都递归求解。在官方给的Tag里还有DP，不知道DP是怎么一种解法。

LeetCode-10 Regular Expression Matching记录

问题描述

想法与实现

标准正则

本题的做法

评论

高策