Metacharacter

A metacharacter is a character that has a special meaning (instead of a literal meaning) to a computer program, such as a shell interpreter or a regular expression (regex) engine.

In POSIX extended regular expressions,[1] there are 14 metacharacters that must be preceded by a backslash "\" to drop their special meaning and be treated literally inside an expression: the opening and closing square brackets, "[" and "]"; the backslash "\"; the caret "^"; the dollar sign "$"; the period or dot "."; the vertical bar or pipe symbol "|"; the question mark "?"; the asterisk "*"; the plus sign "+"; the opening and closing curly braces, "{" and "}"; and the opening and closing parentheses, "(" and ")".[2]

To use any of these characters as a literal in a regex, it must be escaped with a backslash. For example, to match the arithmetic expression "(1+1)*3=6", the correct regex is "\(1\+1\)\*3=6". Without the backslashes, the parentheses, plus sign, and asterisk keep their special meanings: the parentheses form a group, and the plus sign and asterisk act as repetition operators.
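The difference between the escaped and unescaped patterns can be checked with a short sketch. It uses Python's re module, which is an assumption made for illustration: re is not a POSIX extended regex engine, but it treats the parentheses, plus sign, and asterisk in this example the same way. The variable names are illustrative only.

```python
import re

expression = "(1+1)*3=6"

# Hand-escaped pattern from the text: every metacharacter has a backslash in front.
escaped_pattern = r"\(1\+1\)\*3=6"

# The same characters without escaping: ( ) group, + and * are quantifiers.
unescaped_pattern = "(1+1)*3=6"

# The escaped pattern matches the arithmetic expression literally.
escaped_matches = re.fullmatch(escaped_pattern, expression) is not None

# The unescaped pattern does not match the expression itself ...
unescaped_matches = re.fullmatch(unescaped_pattern, expression) is not None

# ... but it does match other strings, such as "113=6" (the group "1+1" once, then "3=6").
unescaped_matches_other = re.fullmatch(unescaped_pattern, "113=6") is not None

# re.escape() adds the backslashes automatically; the result is used here as a pattern.
auto_escaped_matches = re.fullmatch(re.escape(expression), expression) is not None

print(escaped_matches, unescaped_matches, unescaped_matches_other, auto_escaped_matches)
```

When the literal text comes from user input rather than being typed by hand, a helper such as re.escape is usually safer than adding backslashes manually.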

Examples

Escaping

To "escape a metacharacter" means to strip it of its special meaning so that it is treated as an ordinary literal character. For example, in PCRE, a period (.) stands for "any single character can come here". In the pattern A.C, the period between A and C can match B, a single space, or any other applicable character (a single period stands for exactly one character). If the period is escaped, it loses its force as a metacharacter and matches only what it is: a period.
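The A.C example can be illustrated with a minimal sketch. It assumes Python's re module rather than PCRE itself; the two behave alike for this simple pattern. The sample strings and variable names are chosen for illustration.

```python
import re

# Unescaped period: matches exactly one character between A and C.
wildcard = re.compile(r"A.C")

# Escaped period: matches only a literal "." between A and C.
literal = re.compile(r"A\.C")

samples = ["ABC", "A C", "A.C", "AC", "ABBC"]

# Record which sample strings each pattern matches in full.
wildcard_results = {s: wildcard.fullmatch(s) is not None for s in samples}
literal_results = {s: literal.fullmatch(s) is not None for s in samples}

for s in samples:
    print(repr(s), wildcard_results[s], literal_results[s])
```

The unescaped pattern accepts "ABC", "A C", and "A.C" but rejects "AC" and "ABBC", because one period stands for exactly one character. The escaped pattern accepts only "A.C".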

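The usual way to escape a character in a regex is to precede it with a backslash (\). A related convention in many command-line programs is the double hyphen (--), which marks the end of options so that every argument after it is taken literally, even if it begins with a hyphen.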


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.