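Goal: extract the ID card number and the phone number from a string such as "我的身份证***,我的电话是******,请不要找我。" ("My ID card is ***, my phone number is ******, please don't contact me."). Searching both the Chinese and English web turned up only patterns that validate, not patterns that extract. I finally found the answer in a regex note in the official PHP documentation: PHP: preg_match - Manual
The solution first: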
$string="我的身份证***,我的电话是******,请不要找我。";
$string = preg_replace('/\s+/', '', $string); //remove all whitespace including tabs and line ends
$patternForPhone = '/(?:\+?86)?1(?:3\d{3}|5[^4\D]\d{2}|8\d{3}|7(?:[235-8]\d{2}|4(?:0\d|1[0-2]|9\d))|9[0-35-9]\d{2}|66\d{2})\d{6}/'; // pattern for Chinese mobile phone numbers
$patternForID = '/([1-6][1-9]|50)\d{4}(18|19|20)\d{2}((0[1-9])|10|11|12)(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]/i'; // pattern for Chinese ID card numbers
preg_match($patternForPhone, $string, $phones);
preg_match($patternForID, $string, $IDs);
var_dump($phones, $IDs);
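One caveat worth sketching: because the phone pattern is unanchored, it can also match an 11-digit run inside a longer number such as the ID itself, and preg_match returns only the first match. The sketch below is an assumption, not part of the original answer. It wraps the same two patterns in the lookarounds (?<!\d) and (?!\d) so that a match cannot be part of a longer digit run, and uses preg_match_all to collect every occurrence. The sample text and both numbers are made up for illustration.

```php
<?php
// Hypothetical sample: both numbers below are made up for illustration.
$text = 'ID 110101199003071385, phone 13812345678';

// The two patterns from the snippet above, without delimiters or flags.
$phoneCore = '(?:\+?86)?1(?:3\d{3}|5[^4\D]\d{2}|8\d{3}|7(?:[235-8]\d{2}|4(?:0\d|1[0-2]|9\d))|9[0-35-9]\d{2}|66\d{2})\d{6}';
$idCore = '([1-6][1-9]|50)\d{4}(18|19|20)\d{2}((0[1-9])|10|11|12)(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]';

// Unguarded: the leftmost match is 11 digits taken from inside the ID number.
preg_match('/' . $phoneCore . '/', $text, $loose);

// Guarded: (?<!\d) and (?!\d) reject matches that sit inside a longer digit run.
preg_match_all('/(?<!\d)' . $phoneCore . '(?!\d)/', $text, $phones);
preg_match_all('/(?<!\d)' . $idCore . '(?!\d)/', $text, $ids);

var_dump($loose[0]);   // string(11) "19900307138"
var_dump($phones[0]);  // one element: "13812345678"
var_dump($ids[0]);     // one element: "110101199003071385"
```

Index 0 of the preg_match_all result holds the full matches; the remaining indexes hold the capture groups and can be ignored here.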
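Next, to quote the source: the metacharacters ^ and $ are the root cause of a pattern being able to validate but not extract. They anchor the match to the start and end of the subject, so a number embedded in surrounding text can never match.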
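A minimal sketch of the difference, using a deliberately simplified 11-digit pattern rather than the full phone pattern (the sample text and number are made up):

```php
<?php
// Hypothetical sample text; the 11-digit number is made up for illustration.
$text = 'call 13812345678 now';

// Anchored: the whole subject must be exactly 11 digits, so this only validates.
$anchored = preg_match('/^1\d{10}$/', $text, $m1);

// Unanchored: the pattern may match anywhere in the subject, so this extracts.
$unanchored = preg_match('/1\d{10}/', $text, $m2);

var_dump($anchored);  // int(0)
var_dump($m2[0]);     // string(11) "13812345678"
```

Dropping ^ and $ is what turns a validation pattern into an extraction pattern.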
Simple regex
Regex quick reference
[abc] A single character: a, b or c
[^abc] Any single character but a, b, or c
[a-z] Any single character in the range a-z
[a-zA-Z] Any single character in the range a-z or A-Z
^ Start of line
$ End of line
\A Start of string
\z End of string
. Any single character
\s Any whitespace character
\S Any non-whitespace character
\d Any digit
\D Any non-digit
\w Any word character (letter, number, underscore)
\W Any non-word character
\b A word boundary (a zero-width position, not a character)
(...) Capture everything enclosed
(a|b) a or b
a? Zero or one of a
a* Zero or more of a
a+ One or more of a
a{3} Exactly 3 of a
a{3,} 3 or more of a
a{3,6} Between 3 and 6 of a
options:
i case insensitive
m make dot match newlines
x ignore whitespace in regex
o perform #{...} substitutions only once
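The options in this quick reference appear to follow Ruby conventions (the #{...} interpolation and the o option are Ruby features). In PHP's PCRE functions the letters behave differently: s makes the dot match newlines, while m makes ^ and $ match at line boundaries. A small sketch to check this on your own installation:

```php
<?php
$subject = "a\nb";

$dotDefault   = preg_match('/a.b/', $subject);   // 0: dot does not match "\n" by default
$dotAll       = preg_match('/a.b/s', $subject);  // 1: s (PCRE_DOTALL) lets dot match "\n"
$dotMultiline = preg_match('/a.b/m', $subject);  // 0: m does not affect the dot
$lineDefault  = preg_match('/^b$/', $subject);   // 0: ^ matches only at the start of the subject
$lineMulti    = preg_match('/^b$/m', $subject);  // 1: m (PCRE_MULTILINE) lets ^ and $ match at each line

var_dump($dotDefault, $dotAll, $dotMultiline, $lineDefault, $lineMulti);
```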