Vim Regex and Pattern Matching¶

See :help vim.re for regex through Nvim's Lua API.

Table of Contents¶

Very Magic
Vim Regex and Perl Regex
- Unique to Vim
- Unique to Perl
- Important Help Files
Metacharacters (Escaped Characters) and Character Classes
- Whitespace
- Digits
- Letters
- Special Characters
- Substitution Special Characters
Tricks
- Ignoring Case in a Pattern
Including End-of-Line (EOL) and Start-of-Line (SOL) in Pattern Matches
- Matching a Character Class and End of Line
Matching Start-of-Line after Another Atom
- Word Boundaries in Vim Regex
Overview of Multi Items
- Greedy Multis
- Non-Greedy Multis
- Non-greedy pattern-multi-items
- Matching Newlines / End-of-Line Inside a Collection
Matching Literal Key Characters
Dealing with Accents in Unicode
- Unicode Accents Example
Range of Operation
Capture Groups and Backreferences with Substitutions and Other Pattern Commands
Flags
- Two and Three Letter :substitute Commands
- Repeating Substitutions
Substitutions with Expressions
Quantifiers, Greedy and Non-Greedy
- Greedy
- Non-Greedy
Optionally Match Atoms
- Optional Matching Example
Match Inside the Visual Area
- Visual Area Matching Example
Match with the Cursor Position
Using Marks for Matching
Line Number Matching
- Using Line Numbers for Matching
- Using the Current Line for Matching
Matching with Start and End of the File
Matching with Columns
Matching After a Pattern
- Lookback (Matching after a pattern) Example
Matching After a NON-matching pattern
- Lookback (Match after a non-matching pattern) Example
Match Excluding the Preceding Atom
- Lookback (Matching the beginning of a pattern) Example
- Use Cases
Zero-Width
- Zero-Width Matching Example
Setting the Start of a Match with \zs
- Matching with \zs Example
Setting the End of a Match with \ze
- Ending match with \ze Example
Less Useful Patterns
Match if Previous Pattern Doesn't Match at the CURRENT Position
- Zero-Width Match after Non-Match Example
Match at the Current Position like a Single Pattern
- Matching with \@> Example
Matching Different Number Systems
Matching Decimal, Octal, and Hexadecimal Number Systems
- Examples
Collections / Sets
- Collection Limitations / Caveats
- Collection Examples
Good Ones to Remember
- Make it Non-Greedy

Very Magic¶

Using \v means that after it, all ASCII characters except 0-9, a-z, A-Z and _ have special meaning: "very magic".
This means that none of the special characters need to be escaped.
Using \V ("very nomagic") means that they ALL need to be escaped.

`\v`	`\m`	`\M`	`\V`	Matches
'v.Magic'	'Magic'	'NoMagic'	'v.NoMagic'
`a`	`a`	`a`	`a`	Literal `a`
`\a`	`\a`	`\a`	`\a`	Any alphabetic character
`.`	`.`	`\.`	`\.`	Any character
`\.`	`\.`	`.`	`.`	Literal dot
`$`	`$`	`$`	`\$`	End-of-line
`*`	`*`	`\*`	`\*`	Any number of the previous atom
`~`	`~`	`\~`	`\~`	Latest substitute string
`()`	`\(\)`	`\(\)`	`\(\)`	Group as an atom
`|`	`\|`	`\|`	`\|`	Nothing: separates alternatives (logical `OR`)
`\\`	`\\`	`\\`	`\\`	Literal backslash
`\{`	`{`	`{`	`{`	Literal curly brace
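As a rough illustration of the table above, the following sketch runs the same kind of match under three magic levels with `matchstr()`. The sample strings are made up for the example.

```vim
" Magic (the default): grouping parentheses and + need a backslash.
echo matchstr('foo123bar', '\(\d\+\)')
" prints: 123

" Very magic: ( ) and + are special without a backslash.
echo matchstr('foo123bar', '\v(\d+)')
" prints: 123

" Very nomagic: the dot is literal, so 'axb' is skipped.
echo matchstr('axb a.b', '\Va.b')
" prints: a.b
```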

Vim Regex and Perl Regex¶

Capability	Vim-speak	Perl-speak
force case insensitivity	`\c`	`(?i)`
force case sensitivity	`\C`	`(?-i)`
Non-capturing grouping	`\%(atom\)`	`(?:atom)`
0-width match	`atom\@=`	`(?=atom)`
0-width non-match	`atom\@!`	`(?!atom)`
0-width preceding match	`atom\@<=`	`(?<=atom)`
0-width preceding non-match	`atom\@<!`	`(?<!atom)`
match without retry	`atom\@>`	`(?>atom)`
conservative quantifiers	`\{-n,m}`	`*?`, `+?`, `??`, `{}?`

Vim beginnings and ends:
- Vim's ^ and $ always match at embedded newlines.
- Two separate atoms, \%^ and \%$, match only at the very start and end of the text.
Perl beginnings and ends:
- In Perl, ^ and $ only match at the beginning and end of the text by default.
  - But, you can set the m flag, which lets them match at embedded newlines as well.

Unique to Vim¶

Changing the magic-ness of a pattern: \v \V \m \M (very useful for avoiding backslashitis)
Sequence of optionally matching atoms: \%[atoms]
\&: is to \| what "and" is to "or"; it forces several branches to match at one spot
Matching lines/columns by number: \%5l \%5c \%5v
Setting the start and end of the match: \zs \ze
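A small sketch of two of these Vim-only items, using made-up strings: `\&` requires both branches to match at the same position, and `\zs` / `\ze` trim the reported match.

```vim
" \& : every branch must match at the same position.
echo 'Bob met Peter' =~# '.*Peter\&.*Bob'
" prints: 1
echo 'Bob met Paul' =~# '.*Peter\&.*Bob'
" prints: 0

" \zs and \ze : only the text between them is the match.
echo matchstr('width=80;', 'width=\zs\d\+\ze;')
" prints: 80
```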

Unique to Perl¶

Execution of arbitrary code in the regex: (?{perl code})
Conditional expressions: (?(condition)true-expr|false-expr)

Important Help Files¶

:h pattern-overview
:h ordinary-atom
:h character-classes
:h :syn-ext-match

Metacharacters (Escaped Characters) and Character Classes¶

`:h character-classes`¶

Whitespace:¶

Character Class	Matches
`.`	Any character except new line
`\s`	Any whitespace character
`\S`	non-whitespace character

Digits:¶

Character Class	Matches
`\d`	digit
`\D`	non-digit
`\x`	hex digit
`\X`	non-hex digit
`\o`	octal digit
`\O`	non-octal digit
`\%d`	Character specified by a decimal number (base 10)
`\%o`	Character specified by an octal number (base 8)
`\%x`	Character specified by up to 2 hexadecimal characters (base 16)
`\%u`	Character specified by up to 4 hexadecimal characters (base 16)
`\%U`	Character specified by up to 8 hexadecimal characters (base 16)

NOTE: With \%o, octal numbers below 0o40 must be followed by a non-octal digit or a non-digit.¶

Letters:¶

Character Class	Matches
`\h`	head of word character (`a-z`, `A-Z` and `_`)
`\H`	non-head of word character
`\p`	printable character
`\P`	like `\p`, but excluding digits
`\w`	word character
`\W`	non-word character
`\a`	alphabetic character
`\A`	non-alphabetic character
`\l`	lowercase character
`\L`	non-lowercase character
`\u`	uppercase character
`\U`	non-uppercase character

Special Characters:¶

Character Class	Matches
`\e`	matches `<Esc>`
`\t`	matches `<Tab>`
`\r`	matches `<CR>`
`\b`	matches `<BS>`
`\n`	matches an `EOL` (end-of-line)

Substitution Special Characters¶

See :h :s\= (or :h sub-replace-special)
Related: :h s/\= (or :h sub-replace-expression)

magic	nomagic	action
`&`	`\&`	Replaced with the whole matched pattern
`\&`	`&`	Replaced with `&`
	`\0`	Replaced with the whole matched pattern
	`\1`	Replaced with the matched pattern in the first capture group `()`
	`\2`	Replaced with the matched pattern in the second capture group `()`
	`...`	... (`\3`, `\4`,...)
	`\9`	Replaced with the matched pattern in the ninth capture group `()`
`~`	`\~`	Replaced with the `{string}` of the previous substitute
`\~`	`~`	Replaced with `~`
	`\u`	Next character made uppercase
	`\U`	Following characters made uppercase, until \E
	`\l`	Next character made lowercase
	`\L`	Following characters made lowercase, until \E
	`\e`	End of \u, \U, \l and \L (NOTE: not `<Esc>`!)
	`\E`	End of \u, \U, \l and \L
	`<CR>`	Split line in two at this point (Type the `<CR>` as CTRL-V `<Enter>`)
	`\r`	Same as `<CR>`. Inserts a newline.
	`\<CR>`	Insert a carriage-return (CTRL-M) (Type the `<CR>` as CTRL-V `<Enter>`)
	`\n`	Insert a `<NL>` (`<NUL>` in the file) (does NOT break the line)
	`\b`	Insert a `<BS>`
	`\t`	Insert a `<Tab>`
	`\\`	Insert a single backslash
	`\x`	Where `x` is any character not mentioned above: Reserved for future expansion
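The case-changing items in the table can be tried out with `substitute()`, which honors the same special characters in its replacement string. This is only a sketch with made-up input.

```vim
" \u uppercases just the next character (here: the first letter of each word).
echo substitute('hello world', '\<\w', '\u&', 'g')
" prints: Hello World

" \U uppercases everything up to \E.
echo substitute('hello world', 'world', '\U&\E!', '')
" prints: hello WORLD!

" \L lowercases the following characters.
echo substitute('HELLO', '.*', '\L&', '')
" prints: hello
```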

Tricks:¶

To avoid needing to escape forward slashes / in a substitution, you can use a different separator.

" Syntax:  
:s:pattern:replacement:flags  
" To replace all occurrences of "vi" with "vim"  
:%s:\<vi\>:vim:g

* \%[]: Matches a sequence of optional atoms (see Optionally Match Atoms).

Note: inside a [ ] collection, most metacharacters behave like ordinary characters.
If you want to include a literal - (dash) in the collection, put it first or last:
* /[-0-9]/
The same works for a literal [:
* /[[0-9]/

To avoid the need for escaping a lot of things (like capture groups), set the
very magic flag:

:s/\v(capture|any|of|these)/\1/g

The above substitution just replaces the captures with themselves, so no changes are made.

Ignoring Case in a Pattern¶

\c: will force the entire pattern to ignore case
\C: will enforce case-sensitive matching for the whole pattern
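A quick sketch of both items with the `=~` operator. Because `\c` and `\C` override 'ignorecase', the results below should not depend on that option.

```vim
" \c forces case-insensitive matching.
echo 'FooBar' =~ '\cfoobar'
" prints: 1

" \C forces case-sensitive matching.
echo 'FooBar' =~ '\Cfoobar'
" prints: 0

" The item can appear anywhere in the pattern and still applies to all of it.
echo 'FooBar' =~ 'foo\cbar'
" prints: 1
```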

Including End-of-Line (EOL) and Start-of-Line (SOL) in Pattern Matches¶

Matching a Character Class and End of Line¶

Adding an underscore _ between the backslash and character
for a character class will make it also include end-of-line.
For example:

/\_s

will match whitespace, blank lines, and end-of-line.

Matching Start-of-Line after Another Atom¶

\_^: Matches start-of-line.
Example:
```
\_s*\_^foo  
```
This matches white space, end-of-lines, and blank lines, then "foo" at start-of-line.

Word Boundaries in Vim Regex¶

Word boundaries can be denoted by escaped angle brackets: \<word\>

Overview of Multi Items¶

:h pattern-overview

`\m`	`\M`	Matches
Magic	No Magic
`\_^`	`\_^`	start-of-line (used anywhere) `zero-width`
`\_$`	`\_$`	end-of-line (used anywhere) `zero-width`
`\<`	`\<`	beginning of a word `zero-width`
`\>`	`\>`	end of a word `zero-width`
`\zs`	`\zs`	anything, sets start of match
`\ze`	`\ze`	anything, sets end of match
`\%^`	`\%^`	beginning of file `zero-width`
`\%$`	`\%$`	end of file `zero-width`
`\%V`	`\%V`	inside Visual area `zero-width`
`\%#`	`\%#`	cursor position `zero-width`
`\%'m`	`\%'m`	mark m position `zero-width`
`\%23l`	`\%23l`	in line 23 `zero-width`
`\%23c`	`\%23c`	in column 23 `zero-width`
`\%23v`	`\%23v`	in virtual column 23 `zero-width`

Greedy Multis¶

`\m`	`\M`	Matches of the Preceding Atom
Magic	No Magic	Greedy
`*`	`\*`	0 or more, as many as possible
`\+`	`\+`	1 or more, as many as possible
`\=`	`\=`	0 or 1, as many as possible
`\?`	`\?`	0 or 1, as many as possible
`\{n,m}`	`\{n,m}`	`n` to `m`, as many as possible
`\{n}`	`\{n}`	`n`, exactly
`\{n,}`	`\{n,}`	at least `n`, as many as possible
`\{,m}`	`\{,m}`	0 to `m`, as many as possible
`\{}`	`\{}`	0 or more, as many as possible (same as `*`)

Non-Greedy Multis¶

`\m`	`\M`	Matches of the Preceding Atom
Magic	No Magic	Non-Greedy
`\{-n,m}`	`\{-n,m}`	`n` to `m`, as few as possible
`\{-n}`	`\{-n}`	`n` exactly
`\{-n,}`	`\{-n,}`	at least `n` as few as possible
`\{-,m}`	`\{-,m}`	0 to `m` as few as possible
`\{-}`	`\{-}`	0 or more as few as possible

Remember:
- If a dash (-) appears immediately after the opening brace, {, then the shortest match first algorithm is used.
- i.e., \{-...} = Non-Greedy
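The difference between the greedy and non-greedy forms is easiest to see on a string with two candidate end points. A minimal sketch with a made-up HTML-like string:

```vim
let s:text = '<b>one</b> <b>two</b>'

" Greedy: .* runs to the last </b>.
echo matchstr(s:text, '<b>.*</b>')
" prints: <b>one</b> <b>two</b>

" Non-greedy: .\{-} stops at the first </b>.
echo matchstr(s:text, '<b>.\{-}</b>')
" prints: <b>one</b>
```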

Non-greedy `pattern-multi-items`:¶

`\m`	`\M`	Matches of the Preceding Atom
`\@>`	`\@>`	1, like matching a whole pattern
`\@=`	`\@=`	nothing, requires a match `zero-width`
`\@!`	`\@!`	nothing, requires NO match `zero-width`
`\@<=`	`\@<=`	nothing, requires a match behind `zero-width`
`\@<!`	`\@<!`	nothing, requires NO match behind `zero-width`

It's recommended to use \zs instead of \@<= with the new regex engine.

Matching Newlines / End-of-Line Inside a Collection¶

Since $ doesn't match newline/end-of-line in a collection, you'll need to use one of these:
* \_[]: With \_ prepended, the collection also matches end-of-line.
* [\n]: Including \n in the collection has the same effect.

Matching Literal Key Characters¶

To include a literal ], ^, - or \ in the collection, put a
backslash before it:
- [xyz\]], [\^xyz], [xy\-z] and [xyz\\].
- (Note: POSIX does not support the use of a backslash this way).

Dealing with Accents in Unicode¶

If there are Unicode characters with accents, check \Z and \%C.
[=a=]: An equivalence class (:h [==]). It is used inside a collection, as in [[=a=]].
- This means that characters are matched that
  have almost the same meaning, e.g., when ignoring accents.
- This only works for Unicode, latin1 and latin9.

Unicode Accents Example¶

[[=a=]] will match characters like a, à, á, â, etc., because
they are all variations of the base character a with different accents.

Range of Operation¶

<number>: an absolute line number
.: the current line
$: the last line in the file
%: the whole file. The same as 1,$
't: position of mark "t"
/pattern[/]: the next line where text "pattern" matches.
?pattern[?]: the previous line where text "pattern" matches
\/: the next line where the previously used search pattern matches
\?: the previous line where the previously used search pattern matches
\&: the next line where the previously used substitute pattern matches
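A few hedged examples of these range items in front of `:s`. The words `foo`, `bar`, `BEGIN`, `END` and the marks `a` and `b` are placeholders, and the commands change the current buffer.

```vim
" Lines 5 through 10.
:5,10s/foo/bar/g

" From the current line to the last line.
:.,$s/foo/bar/g

" The whole file (same as 1,$).
:%s/foo/bar/g

" From mark a to mark b.
:'a,'bs/foo/bar/g

" From the next line matching BEGIN to the first line after it matching END.
" The ; makes the second search start from the first matched line.
:/BEGIN/;/END/s/foo/bar/g
```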

Capture Groups and Backreferences with Substitutions and Other Pattern Commands¶

Commands: `:s`, `:g`, `:v`¶

You can group parts of the pattern expression by enclosing them with \( and \) (escaped parentheses, unless very magic is set).

\(captured\)

Using \| you can combine several expressions into one, matching any of its components.
The first one matched will be used.

\(Date:\|Subject:\|From:\)\(\s.*\)

Then they can be referenced in the substitute with:
&: The whole matched pattern
\0: The whole matched pattern
\1, ..., \9: The matched pattern in the n'th capture group (\(...\))
* The numbering is done based on which \( comes first in the pattern (left to right).
~: The previous substitute string
\L: The following characters are made lowercase
\U: The following characters are made uppercase
\E: End of \U and \L
\e: End of \U and \L
\r: Split line in two at this point
\b: Insert a <BS>
\l: Next character made lowercase
\u: Next character made uppercase
<CR>: Split line in two at this point (Type the <CR> as CTRL-Q <Enter>*)
\<CR>: Insert a carriage-return (CTRL-M) (Type the <CR> as CTRL-Q <Enter>*)
\n: Insert a <NL> (<NUL> in the file) (does NOT break the line)
\t: Insert a <Tab>
\\: Insert a single backslash
\x: Where x is any character not mentioned above: Reserved for future expansion

* Some systems support CTRL-V <Enter> to insert the literals
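To make the back references concrete, here is a sketch using `substitute()` on made-up strings. The first call reuses the header pattern from above, the second swaps two words with very magic grouping.

```vim
" \1 is the header name, \2 is the rest of the line (including the space).
echo substitute('Subject: hello', '\(Date:\|Subject:\|From:\)\(\s.*\)', '[\1]\2', '')
" prints: [Subject:] hello

" Groups are numbered left to right by their opening parenthesis.
echo substitute('john smith', '\v(\w+) (\w+)', '\2, \1', '')
" prints: smith, john
```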

Flags¶

g: Global, replaces all occurrences on each line.
i: Case insensitive.
I: Case sensitive.
c: Confirm each substitution.
e: Suppress "no match" error.
n: Report the number of matches, and don't actually substitute.
p: Print the line containing the last substitute.
- #: Like p and prepend the line number.
- l: Like p but print the text like :list.
&: Must be the first flag. Keeps the flags from the previous substitute command.
Examples:
- :&&
- :s/this/that/&
r: Only useful in combination with :& or :s without arguments.
:&r works the same way as :~:
- When the search pattern is empty, use the previously used search pattern
  instead of the search pattern from the last :s or :global.
- If the last command that did a search was a :s or :global, there is no effect.
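Some common flag combinations, shown as a sketch with the placeholder words `foo` and `bar`:

```vim
" Replace every occurrence in the file, asking for confirmation each time.
:%s/foo/bar/gc

" Count the matches without changing anything.
:%s/foo//gn

" Replace everywhere, and stay silent if nothing matches.
:%s/foo/bar/ge

" Repeat the last substitute on the current line with the same flags.
:&&
```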

Two and Three Letter `:substitute` Commands¶

You can use flags directly in the commands, so you don't need to specify them at the end.
The row gives the first flag letter and the column the second; empty cells have no such command:

| | c | e | g | i | I | n | p | l | r |
|---|----|----|----|----|----|----|----|----|----|
| g | :sgc | :sge | :sg | :sgi | :sgI | :sgn | :sgp | :sgl | :sgr |
| I | :sIc | :sIe | :sIg | :sIi | :sI | :sIn | :sIp | :sIl | :sIr |
| c | :sc | :sce | :scg | :sci | :scI | :scn | :scp | :scl | |
| r | :src | | :srg | :sri | :srI | :srn | :srp | :srl | :sr |
| i | :sic | :sie | | :si | :siI | :sin | :sip | | :sir |
| e | | | | | | | | | |
| n | | | | | | | | | |
| p | | | | | | | | | |
| l | | | | | | | | | |

Exceptions:
* :scr is :scriptnames
* :se is :set
* :sig is :sign
* :sil is :silent
* :sn is :snext
* :sp is :split
* :sl is :sleep
* :sre is :srewind

Repeating Substitutions¶

&: Synonym for :s (repeat last substitute).
:~ Repeat last substitute with same substitute string
but with last used search pattern. This is like :&r.
g&: Synonym for :%s//~/& (repeat last substitute with
last search pattern on all lines with the same flags).

Substitutions with Expressions¶

When the substitute string starts with \=, the remainder is interpreted as an expression.
The separation char can not be in the expression!

| Substitution | Effect |
|------------------------|-----------------------------------------------------|
| `:s@\n@\="\r" .. expand("$HOME") .. "\r"@` | This replaces an end-of-line with a new line containing the value of $HOME |
| `:s/E/\="\<Char-0x20ac>"/g` | This replaces each E character with a euro sign. |

`:h <Char->`¶

Examples:
* :s@\n@\="\r" .. expand("$HOME") .. "\r"@
  * This replaces an end-of-line with a new line containing the value of $HOME.
* :s/E/\="\<Char-0x20ac>"/g
  * This replaces each E character with a euro sign. Read more in :h <Char->.
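Inside a `\=` expression, `submatch()` gives access to the matched text. The following is a minimal sketch with made-up input; `substitute()` accepts the same `\=` form as `:s`.

```vim
" submatch(0) is the whole match; the string '7' is used as a number here.
echo substitute('item 7', '\d\+', '\=submatch(0) * 2', '')
" prints: item 14

" Any expression works, including function calls.
echo substitute('abc def', '\w\+', '\=toupper(submatch(0))', '')
" prints: ABC def

" The same idea as an Ex command: add 1 to every number in the buffer.
:%s/\d\+/\=submatch(0) + 1/g
```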

Quantifiers, Greedy and Non-Greedy¶

Greedy¶

*: matches 0 or more of the preceding atom (a character, range or metacharacter)
- .* matches everything, including an empty line
\+: matches 1 or more of the preceding atom.
\=: matches 0 or 1 of the preceding atom.
\{n}: matches the preceding atom exactly n times.
\{n,m}: matches from n to m of the preceding atom.
\{,m}: matches at most m (from 0 to m) of the preceding atom.
\{n,}: matches at least n of the preceding atom.
where n and m are non-negative integers (0 or more)

Non-Greedy¶

:h atom Parentheses can be used to make a pattern into an atom.

\{-}: matches 0 or more of the preceding atom, as few as possible
\{-n,m}: matches n to m of the preceding atom, as few as possible
\{-n,}: matches at least n of the preceding atom, as few as possible
\{-,m}: matches 0 to m of the preceding atom, as few as possible
where n and m are non-negative integers (0 or more)
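A short sketch comparing the brace quantifiers on the made-up string `aaaa`:

```vim
" Greedy: takes as many as allowed.
echo matchstr('aaaa', 'a\{2,3}')
" prints: aaa

" Non-greedy: takes as few as allowed.
echo matchstr('aaaa', 'a\{-2,3}')
" prints: aa

" Exactly two.
echo matchstr('aaaa', 'a\{2}')
" prints: aa

" At least one, as few as possible.
echo matchstr('aaaa', 'a\{-1,}')
" prints: a
```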

Optionally Match Atoms¶

\%[]: A sequence of optionally matched atoms. This always matches.
- The longest that matches is used.
- There can be no \(\), \%(\) or \z(\) items inside the []
- \%[] does not nest.

Optional Matching Example¶

Pattern	Matches
`/r\%[ead]`	matches `r`, `re`, `rea` or `read`.
`/\<fu\%[nction]\>`	matches the Ex command `function`, where `fu` is required and `nction` is optional
`/\<r\%[[eo]ad]\>`	matches the words `r`, `re`, `ro`, `rea`, `roa`, `read` and `road`.
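The patterns in the table can be checked with `matchstr()` and `=~#`. This is a sketch on made-up words:

```vim
" The longest possible run of the optional atoms is used.
echo matchstr('rea', 'r\%[ead]')
" prints: rea
echo matchstr('red', 'r\%[ead]')
" prints: re

" Accept any abbreviation of 'function' that starts with 'fu'.
echo 'fun' =~# '^fu\%[nction]$'
" prints: 1
echo 'funk' =~# '^fu\%[nction]$'
" prints: 0
```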

Match Inside the Visual Area¶

\%V: Match inside the Visual area.
- When Visual mode has already been stopped match in the area that gv would reselect.
- To make sure the whole pattern is inside the Visual area:
  - Put it at the start and just before the end of the pattern.
    - i.e., /\%Vfoo.*ba\%Vr
- Only works for the current buffer.

Visual Area Matching Example¶

String: foo bar

Pattern	Matches
`/\%Vfoo.*ba\%Vr`	This works if only `foo bar` was Visually selected.
`/\%Vfoo.*bar\%V`	Would match `foo bar` if the Visual selection continues after the `r`.
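A common use of `\%V` is limiting a substitution to the selected text rather than to whole lines. A sketch, assuming a Visual selection was just made and using the placeholder words `foo` and `bar`:

```vim
" '<,'> alone covers every selected LINE in full.
" Adding \%V restricts the match start to the selected characters,
" which matters for characterwise and blockwise selections.
:'<,'>s/\%Vfoo/bar/g
```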

Match with the Cursor Position¶

\%#: Matches with the cursor position.
- Only works when matching in a buffer displayed in a window.

Using Marks for Matching¶

\%'m Matches with the position of mark m.
\%<'m Matches before the position of mark m.
\%>'m Matches after the position of mark m.

Line Number Matching¶

Using Line Numbers for Matching¶

\%23l Matches in a specific line.
\%<23l Matches above a specific line (lower line number).
\%>23l Matches below a specific line (higher line number).

The "23" can be any line number. The first line is 1.

Using the Current Line for Matching¶

\%.l Matches at the cursor line.
\%<.l Matches above the cursor line.
\%>.l Matches below the cursor line.

These six can be used to match specific lines in a buffer.
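Line items can be combined to describe a band of lines inside a pattern. The following sketch uses placeholder words and arbitrary line numbers:

```vim
" Search for 'foo' on line 5 only.
/\%5lfoo

" Replace 'foo' only on lines 11 through 19
" (below line 10 and above line 20).
:%s/\%>10l\%<20lfoo/bar/g

" Search for 'foo' anywhere below the cursor line.
/\%>.lfoo
```

The second command has the same effect as `:11,19s/foo/bar/g`; the in-pattern form is mainly useful where no range can be given, such as in a search or a `:match` command.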

Matching with Start and End of the File¶

\%^: Matches start of the file. When matching with a string, matches the start of the string.
\%$: Matches end of the file. When matching with a string, matches the end of the string.

Matching with Columns¶

\%23c: Matches in a specific column.
\%<23c: Matches before a specific column.
\%>23c: Matches after a specific column.
\%.c: Matches at the cursor column.
\%<.c: Matches before the cursor column.
\%>.c: Matches after the cursor column.
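Column items are often used for highlighting. The sketch below uses the virtual-column variant `\%v` from the overview table, the built-in `ErrorMsg` highlight group, and an arbitrary limit of 80:

```vim
" Highlight everything past virtual column 80.
:match ErrorMsg /\%>80v.\+/

" Turn the highlighting off again.
:match none

" Search for 'foo' only when it starts in column 1.
/\%1cfoo
```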

Matching After a Pattern¶

\@<=: Matches with zero width if the preceding atom matches just before what
follows (:h /zero-width).
- Like (?<=pattern) in Perl, but Vim allows non-fixed-width patterns.

Lookback (Matching after a pattern) Example¶

Pattern	Matches
`\(an\_s\+\)\@<=file`	"file" after "an" and white space or an end-of-line
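A sketch of the look-behind on a made-up string, together with the `\zs` spelling of the same match:

```vim
" Only the 'file' preceded by 'an' and white space is replaced.
echo substitute('an file, a file', '\(an\_s\+\)\@<=file', 'FILE', 'g')
" prints: an FILE, a file

" \zs gives the same result here and is usually faster.
echo substitute('an file, a file', 'an\_s\+\zsfile', 'FILE', 'g')
" prints: an FILE, a file
```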

Matching After a NON-matching pattern¶

\@<!: Matches with zero width if the preceding atom does NOT match just
before what follows.
- Like (?<!pattern) in Perl, but Vim allows non-fixed-width patterns.
- This can be a bit slow.

Lookback (Match after a non-matching pattern) Example¶

Pattern	Matches
`\(foo\)\@<!bar`	any "bar" that's not in "foobar"
`\(\/\/.*\)\@<!in`	"in" which is not after "//"

\@123<!: Like \@<! but only look back 123 bytes. This avoids trying lots of
matches that are known to fail and make executing the pattern very
slow.
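A minimal sketch of the negative look-behind, with and without a byte limit, on a made-up string:

```vim
" The 'bar' inside 'foobar' is left alone.
echo substitute('foobar bar', '\(foo\)\@<!bar', 'X', 'g')
" prints: foobar X

" Same result, but only the previous 3 bytes are examined.
echo substitute('foobar bar', '\(foo\)\@3<!bar', 'X', 'g')
" prints: foobar X
```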

Match Excluding the Preceding Atom¶

\@= (or \&): Matches the preceding atom with zero width.
- Like (?=pattern) in Perl.

Lookback (Matching the beginning of a pattern) Example¶

The string: foobar

| Pattern | Matches |
|--------------------|-------------------|
| foo\(bar\)\@= | foo in foobar |
| foobar\&foo | foo in foobar |
| foo\(bar\)\@=foo | nothing |

* Using \& works the same as using \@=:
  * foo\&.. is the same as \(foo\)\@=..
* \& is easier, you don't need the parentheses.

Use Cases¶

foo\(bar\)\@=: Find all foos that are followed by bar
foo\(bar\|baz\| bar\)\@=: Find all foos that are followed by bar, baz, or " bar" (a space, then bar)
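The look-ahead use case can be sketched with `substitute()` on a made-up string. Only the `foo` part is replaced, because the text in the group is checked but not consumed:

```vim
echo substitute('foobar foobaz fooqux', 'foo\(bar\|baz\)\@=', 'X', 'g')
" prints: Xbar Xbaz fooqux
```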

Zero-Width¶

:h /zero-width
- When using \@= (or ^, $, \<, \>) no characters are included
  in the match.
- These items are only used to check if a match can be made.
- This can be tricky, because a match with following items will
  be done in the same position.

Zero-Width Matching Example¶

Pattern	Matches
`foo\(bar\)\@=foo`	nothing

The example above will not match foobarfoo, because it tries to match foo at the same position where
bar matched.

Setting the Start of a Match with `\zs`¶

\zs: Matches at any position, but not inside [], and sets the start of the
match there:
- The next char is the first char of the whole match.
- This cannot be followed by a multi. :h multi

Matching with `\zs` Example¶

Pattern	Matches
`/^\s*\zsif`	matches an "if" at the start of a line, ignoring white space.
`/\(.\{-}\zsFab\)\{3}`	Finds the third occurrence of `Fab`.
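Because `\zs` moves the start of the match, everything before it acts as required context that is not returned or replaced. A sketch on made-up strings:

```vim
" Leading white space is required context, not part of the match.
echo matchstr('   if x', '^\s*\zsif')
" prints: if

" Replace only the value; 'key=' must be present but is kept.
echo substitute('key=value', 'key=\zs.*', 'X', '')
" prints: key=X
```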

Setting the End of a Match with `\ze`¶

\ze: Matches at any position, but not inside [], and sets the end of the
match there:
- The previous char is the last char of the whole match.

Ending match with `\ze` Example¶

Pattern	Matches
`end\ze\(if\|for\)`	matches the `end` in `endif` and `endfor`.
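The same pattern in a `substitute()` call shows that the text after `\ze` must be present but is left untouched. The input string is made up:

```vim
echo substitute('endif endfor end', 'end\ze\(if\|for\)', 'END', 'g')
" prints: ENDif ENDfor end
```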

Less Useful Patterns¶

Match if Previous Pattern Doesn't Match at the CURRENT Position¶

\@!: Matches with zero width if the preceding atom does NOT match at the
current position.
- Like (?!pattern) in Perl.

Zero-Width Match after Non-Match Example¶

Pattern	Matches
`if \(\(then\)\@!.\)*$`	`if` not followed by `then`
`a.\{-}p\@!`	`a`, `ap`, `app`, `appp`, etc. not immediately followed by a `p`
`/^\%(.*bar\)\@!.*\zsfoo`	`foo` in a line that does not contain `bar`
`foo\(bar\)\@!`	any `foo` not followed by `bar`

You can't use \@! to look for a non-match before the matching position.

\(foo\)\@!bar will match bar in foobar, because
foo does not match at the position where bar matches.
Use \(foo\)\@<!bar (\@<!).
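The pitfall described above can be reproduced with a few one-liners. This is a sketch on made-up strings:

```vim
" Negative look-ahead: 'foo' not followed by 'bar'.
echo substitute('foobar foobaz', 'foo\(bar\)\@!', 'X', 'g')
" prints: foobar Xbaz

" \@! checks at the CURRENT position, so this still finds 'bar' in 'foobar'.
echo 'foobar' =~# '\(foo\)\@!bar'
" prints: 1

" \@<! looks behind the current position and rejects it.
echo 'foobar' =~# '\(foo\)\@<!bar'
" prints: 0
```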

Match at the Current Position like a Single Pattern¶

\@> Matches the preceding atom like matching a whole pattern.
- Like (?>pattern) in Perl.

Matching with `\@>` Example¶

The string: aaab

Pattern	Matches
`\(a*\)\@>ab`	will not match `aaab`, because the `a*` matches the `aaa` (as many "a"s as possible), thus the `ab` can't match.
`\(a*\)\@>a`	nothing (the `a*` takes all the `a`'s, there can't be another one following)
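A sketch contrasting an ordinary group, which may give characters back, with `\@>`, which does not retry with a shorter sub-match:

```vim
" Ordinary matching: a* gives back one 'a' so that 'ab' can match.
echo 'aaab' =~# 'a*ab'
" prints: 1

" With \@> the a* keeps every 'a' it took, so 'ab' can never match.
echo 'aaab' =~# '\(a*\)\@>ab'
" prints: 0
```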

Matching Different Number Systems¶

\%d123: Matches the character specified with a decimal number.
- Must be followed by a non-digit.
\%o40: Matches the character specified with an octal number up to 0o377.
- Numbers below 0o40 must be followed by a non-octal digit or a
  non-digit.
\%x2a: Matches the character specified with up to two hexadecimal characters.
\%u20AC: Matches the character specified with up to four hexadecimal
characters.
\%U1234abcd: Matches the character specified with up to eight hexadecimal
characters, up to 0x7fffffff
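A few checks of these items against literal characters. This is a sketch; the last line assumes 'encoding' is utf-8, which is the usual setting.

```vim
" Decimal 123 is '{'.
echo '{' =~# '\%d123'
" prints: 1

" Octal 40 is a space.
echo ' ' =~# '\%o40'
" prints: 1

" Hexadecimal 2a is '*'.
echo '*' =~# '\%x2a'
" prints: 1

" U+20AC is the euro sign (assumes a utf-8 'encoding').
echo '€' =~# '\%u20AC'
```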

Matching Decimal, Octal, and Hexadecimal Number Systems¶

\%d: Matching Decimal (base10)
\%o: Matching Octal (base8)
\%x: Matching Hexadecimal (base16)
- Up to 2 hexadecimal characters
\%u: Matching Hexadecimal (base16)
- Up to 4 hexadecimal characters
\%U: Matching Hexadecimal (base16)
- Up to 8 hexadecimal characters

Examples¶

\%d123: Matches the character with decimal code 123, which is {.
- Must be followed by a non-digit.
\%o40: Matches the character with octal code 40, which is a space.
- Numbers below 0o40 must be followed by a non-octal digit or a non-digit.
\%x2a: Matches the character with hexadecimal code 2a, which is *.
\%u20AC: Matches the character with hexadecimal code 20AC, which is the euro sign (€).
\%U1234abcd: Matches the character specified with up to eight hexadecimal characters, up to 0x7fffffff

[=a=]
- An equivalence class, used inside a collection as [[=a=]]. Matches accented a characters (e.g., â, ã, å, etc.)
[.a.]
- A collation element, used inside a collection as [[.a.]].
- This currently simply accepts a single character.

Collections / Sets¶

[]: A Collection (sometimes called a 'set') - Matches any single character in the collection.
- Think of this as a custom character class. A set will only match a single character.
- \%[] A sequence of optionally matched characters. This always matches.
  - The longest match is used with this.
- \_[]: A collection that also matches end-of-line.
- [\n]: Including \n in the collection also makes it match the end-of-line.
- Starting a collection with ^ will make it match
  everything BUT what is in the collection:
```
^[^0-9]
```
  The above will match a line that does NOT start with
  a digit character. Character classes such as \d do not work
  inside [], so the range 0-9 is spelled out.
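A short sketch of collections in use, on made-up strings. The digit range is written as `0-9` because backslash classes like `\d` are not available inside `[]`.

```vim
" A run of characters from the collection.
echo matchstr('abc123', '[0-9]\+')
" prints: 123

" A negated collection at the start of the line.
echo '9lives' =~# '^[^0-9]'
" prints: 0
echo 'nine lives' =~# '^[^0-9]'
" prints: 1

" A leading dash is literal.
echo matchstr('a-b c', '[-a-z]\+')
" prints: a-b
```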

Collection Limitations / Caveats¶

There can be no \(\), \%(\) or \z(\) items inside the [] of \%[], and \%[] does not nest.

Collection Examples¶

/index\%[[[]0[]]]

Matches index, index[, index[0, and index[0].

Good Ones to Remember¶

\%(\): A pattern enclosed by escaped parentheses.
- Just like \(\), but without counting it as a capture (no backref).
- This allows using more groups and it's a little bit faster.
~/\~: Matches the last given substitute string.
\<: Matches the beginning of a word: The next char is the first char of a word.
\>: Matches the end of a word: The previous char is the last char of a word.
\_.: Matches any single character or end-of-line.
\_^: Matches start-of-line.
Example:
```
\_s*\_^foo  
```
This matches white space, end-of-lines, and blank lines, then "foo" at start-of-line.

\@<=: Matches with zero width if the preceding atom matches just before what follows
- It's recommended to use \zs instead of \@<= with the new regex engine.
\zs: Matches at any position, but not inside [], and sets the start of the
match there.

:s/\(everything\)\@<=after the previous  
:s/\(everything\)\zsafter the previous

The two above essentially do the same thing.

Make it Non-Greedy¶

When using the brace notation (\{1,}), you can easily make it non-greedy.
If a dash (-) appears immediately after the opening brace, {, then the shortest match first algorithm is used.
* i.e., \{-...} = Non-Greedy

So:
* \{-} is a non-greedy version of *
* \{-1,} is a non-greedy version of \+

Pattern	Matches
`\(foo\)\@<!bar`	any "bar" that's not in "foobar"
`\(\/\/.*\)\@<!in`	"in" which is not after "//"

Pattern	Matches
`/^\s*\zsif`	matches an "if" at the start of a line, ignoring white space.
`/\(.\{-}\zsFab\)\{3}`	Finds the third occurrence of `Fab`.

Pattern	Matches
`if \(\(then\)\@!.\)*$`	`if` not followed by `then`
`a.\{-}p\@!`	`a`, `ap`, `app`, `appp`, etc. not immediately followed by a `p`
`/^\%(.*bar\)\@!.*\zsfoo`	`foo` in a line that does not contain `bar`
`foo\(bar\)\@!`	any `foo` not followed by `bar`

Pattern	Matches
`\(a*\)\@>ab`	will not match `aaab`, because the `a*` matches the `aaa` (as many "a"s as possible), thus the `ab` can't match.
`\(a*\)\@>a`	nothing (the `a*` takes all the `a`'s, there can't be another one following)