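# Grep

`grep` is a standard Unix command-line utility that prints the lines of its input that match one or more patterns. (On Linux it usually comes from GNU grep, which is a separate package from GNU coreutils.)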
## tl;dr

Syntax:

```bash
grep -E '^expression$' [filename]
grep -n -r -E '^#?#\s\s?.*$' # Get all markdown H1 and H2s
#
# -n: Line Numbers; -r: recursive; -E: Extended Regex (\s is a GNU extension)
```
- `-i`: Ignore case.
- `-v`: Invert the match. Prints the lines that don't match the pattern.
- `-n`: Output the line number of each match.
- `-E`: Enable extended regex.
- `-r`: Recursively search, starting from the working directory if no path is given.
    - When a regular file is provided, this doesn't have any effect.
    - Functionally equivalent to `grep -d recurse`.
- `-R`: The same as `-r`, but follows all symlinks (`-r` only follows symlinks given on the command line).
- `-l`: Only print the filenames of files with matches.
- `-L`: Only print the filenames of files WITHOUT matches.
- `-d <ACTION>`: Specify an action to take on directories. Available actions:
    - `skip`: Silently skip any directories.
    - `read`: Read directories as if they were normal files.
    - `recurse`: Recurse through directories, reading all files under each directory.
- `--exclude=GLOB`: Skip any files that match the glob pattern.
- `--exclude-dir=GLOB`: Skip any directories that match the glob pattern.
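The options above can be tried against a throwaway directory tree. This is a minimal sketch: the file names, the `node_modules` directory, and the heading pattern are hypothetical, and it assumes GNU grep (for `\s`).

```bash
#!/usr/bin/env bash
# Build a throwaway tree (hypothetical files) to try -r, -n, -l, -L and --exclude-dir on.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/docs" "$tmp/node_modules"
printf '# Title\n## Section\ntext\n' > "$tmp/docs/a.md"
printf 'no headings here\n' > "$tmp/docs/b.md"
printf '# Vendored\n' > "$tmp/node_modules/c.md"
cd "$tmp" || exit 1

# Markdown H1/H2 lines with line numbers, skipping node_modules entirely.
grep -rnE '^#?#\s' --exclude-dir=node_modules .
# Files that contain at least one line starting with "#" (-l) ...
grep -rlE '^#' --exclude-dir=node_modules .
# ... and files that contain none (-L).
grep -rLE '^#' --exclude-dir=node_modules .
```

With `.` as the path, results are prefixed with `./docs/a.md:`; dropping `--exclude-dir` would also report `./node_modules/c.md`.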
## Using Grep

The greps:

- `grep`: Uses Basic Regular Expressions (BREs) by default.
- `egrep`: Uses Extended Regular Expressions (EREs).
    - Equivalent to `grep -E`.
- `fgrep`: Treats the pattern as a fixed string, not a regular expression.
    - Equivalent to `grep -F`.
- `rgrep`: Recursively searches directories.
    - Equivalent to `grep -r` or `grep -d recurse`.

There's also `pgrep`, but that's not for searching through files.
`pgrep` is used to grep through currently running processes and lists the PIDs that match the pattern.
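The wrappers map onto plain `grep` flags, and the flag forms are the ones to prefer in scripts: GNU grep 3.8 and later print an obsolescence warning when `egrep` or `fgrep` is used. A small sketch comparing the three matching modes on the same made-up input and the same pattern text, `a+b`:

```bash
#!/usr/bin/env bash
# Same two input lines, same pattern text, three different interpretations.
printf 'a+b\naab\n' | grep -E 'a+b'   # ERE: "a+" is one or more a's, so only "aab" matches
printf 'a+b\naab\n' | grep -F 'a+b'   # fixed string: only the literal "a+b" line matches
printf 'a+b\naab\n' | grep 'a+b'      # BRE: an unescaped "+" is literal, so only "a+b" matches
```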
### Basic vs Extended Regular Expressions

- Basic Regular Expressions (BRE)
    - In basic regular expressions the meta-characters `?`, `+`, `{`, `|`, `(`, and `)` lose their special meaning.
    - Use the backslashed versions instead: `\?`, `\+`, `\{`, `\|`, `\(`, and `\)`.
- Extended Regular Expressions (ERE)
    - Used with `egrep` or `grep -E`.
    - The meta-characters keep their special meanings without needing to be escaped: `?`, `+`, `{`, `|`, `(`, and `)`.
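The difference is easiest to see with the `?` quantifier. A short sketch on made-up words, assuming GNU grep (which supports `\?` in BREs as an extension to POSIX):

```bash
#!/usr/bin/env bash
words=$'color\ncolour\ncolouur\ncolou?r'

# BRE: the quantifier has to be escaped.
printf '%s\n' "$words" | grep 'colou\?r'    # color, colour
# ERE: the same quantifier, unescaped.
printf '%s\n' "$words" | grep -E 'colou?r'  # color, colour
# BRE with a bare "?": it is just a literal question mark.
printf '%s\n' "$words" | grep 'colou?r'     # colou?r
```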
### Alternation (matching any one of multiple expressions)

- The "alternation operator" or "infix operator" (`|`, the `OR` operator) is used to match any one of multiple expressions.
- It acts as an `or` operator between multiple regular expressions.
- For example, `grep -E 'cat|dog' filename` (or `grep 'cat\|dog' filename` with BREs) will match any line containing 'cat' or 'dog'.
    - Without `-E`, `grep 'cat|dog' filename` searches for the literal text `cat|dog`.
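A quick sketch of the forms that do and don't alternate, on made-up input. `-e`, which supplies several patterns at once, gives the same OR behavior without any regex operator:

```bash
#!/usr/bin/env bash
pets=$'cat\ndog\nbird\ncat|dog'

printf '%s\n' "$pets" | grep -E 'cat|dog'     # cat, dog, cat|dog
printf '%s\n' "$pets" | grep 'cat\|dog'       # same result, using GNU's BRE \| alternation
printf '%s\n' "$pets" | grep -e cat -e dog    # same result, one -e per pattern
printf '%s\n' "$pets" | grep 'cat|dog'        # BRE "|" is literal: only "cat|dog"
```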
### Other Useful `grep` Options

- `-i`: Ignore case distinctions in both the pattern and the input files.
- `-v`: Invert the sense of matching, to select non-matching lines.
- `-c`: Count the number of lines that match the pattern.
- `-n`: Prefix each line of output with the line number within its input file.
- `-l`: List only the names of files with matching lines, once for each file.
- `-r` or `-R`: Recursively search directories for the pattern.
- `--color`: Highlight the matching text.
- `--exclude-dir=PATTERN`: Skip any directory with a name that matches the given pattern.
    - Uses shell globbing.
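A quick sketch combining `-c`, `-v`, `-i`, and `-n` on an inline log. The `log` variable and its contents are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical log lines to search; $'...' turns \n into real newlines.
log=$'INFO start\nERROR disk full\nerror retry\nINFO done'

printf '%s\n' "$log" | grep -ci 'error'   # 2: -c counts matching lines, -i ignores case
printf '%s\n' "$log" | grep -vc 'INFO'    # 2: lines that do NOT contain "INFO"
printf '%s\n' "$log" | grep -in 'error'   # 2:ERROR disk full, then 3:error retry
```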
## Character Classes and Bracket Expressions

See man://grep 304

A bracket expression is a set of characters inside square brackets `[ ]`, which is used to match a single character.
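### What can go inside the brackets

- Sets accept a single character, a range of characters, a set of characters, or a combination of those.
    - E.g.: `[a]`, `[abc]`, `[a-z]`, `[A-Z]`, or `[0-9]`.
    - As a combo: `[A-Z_-]` will match `A-Z` and the `-` and `_` characters.
        - A literal `-` has to go first or last in the set (as it does here); between two characters it would form a range.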
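A small sketch of the `[A-Z_-]` combo on made-up names. Ranges such as `A-Z` can depend on the locale, so `LC_ALL=C` pins them to plain ASCII order here; `-x` requires the whole line to match, and `+` (ERE) means one or more characters:

```bash
#!/usr/bin/env bash
names=$'FOO_BAR\nFOO-BAR\nFoo_Bar\nFOO BAR'

# Only lines made entirely of uppercase letters, "_" and "-" survive.
printf '%s\n' "$names" | LC_ALL=C grep -xE '[A-Z_-]+'   # FOO_BAR, FOO-BAR
```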
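### Character Classes and Their Matches

Character classes can also be used to specify a set of characters, using the syntax `[:class:]`.
A class only works inside a bracket expression, so a single digit is matched by `[[:digit:]]`, not `[:digit:]` (GNU grep rejects the bare form with an error).
The "Same as" ranges below hold in the C locale; in other locales, classes such as `[:alpha:]` can also match non-ASCII letters.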
- `[:alnum:]`: Matches any alphanumeric character. Same as `[a-zA-Z0-9]`.
- `[:alpha:]`: Matches any alphabetic character. Same as `[a-zA-Z]`.
- `[:blank:]`: Matches any horizontal whitespace character, including space and tab.
- `[:cntrl:]`: Matches any control character.
    - Control characters are non-printing characters that control how text is processed, like newline (`\n`), carriage return (`\r`), tab (`\t`), etc.
- `[:digit:]`: Matches any digit. Same as `[0-9]`.
- `[:graph:]`: Matches any printable character excluding space.
    - It includes characters that are visible and can be printed, excluding whitespace.
- `[:lower:]`: Matches any lowercase alphabetic character. Same as `[a-z]`.
- `[:print:]`: Matches any printable character including space.
    - It's similar to `[:graph:]` but also includes the space character.
- `[:punct:]`: Matches any punctuation character.
    - It includes characters like `.`, `,`, `!`, `?`, `;`, `:`, and other symbols that are not alphanumeric.
- `[:space:]`: Matches any whitespace character.
    - Includes spaces, tabs, newlines (`\n`), carriage returns (`\r`), form feeds (`\f`), and vertical tabs (`\v`).
- `[:upper:]`: Matches any uppercase alphabetic character. Same as `[A-Z]`.
- `[:xdigit:]`: Matches any hexadecimal digit. Same as `[0-9a-fA-F]`.
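A few classes in action on made-up input. Note the doubled brackets: each class sits inside an ordinary bracket expression, and it can be combined with the rest of the regex syntax like any other set:

```bash
#!/usr/bin/env bash
# Made-up input: a key/value line, a plain word, a whitespace-only line, a ZIP line.
input=$'id=42\nname=x\n  \nZIP 90210'

# Lines containing at least one digit: id=42 and ZIP 90210.
printf '%s\n' "$input" | grep '[[:digit:]]'
# Lines made up only of whitespace (-x anchors the pattern to the whole line).
printf '%s\n' "$input" | grep -x '[[:space:]]*'
# Uppercase word, a space, then exactly five digits (ERE for + and {5}).
printf '%s\n' "$input" | grep -E '^[[:upper:]]+ [[:digit:]]{5}$'
```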
### Inverting the Matches (Match Non-Matching)

If the first character of the set is the caret `^`, then it matches any character not in the list.
E.g., `[^abc]` will match any single character except `a`, `b`, or `c`.
Because `grep` selects whole lines, `grep '[^abc]'` prints every line containing at least one character other than `a`, `b`, or `c`.

- Note: Using the `-v` option has a similar effect: it selects every line that does not match the pattern.
    - This applies to the entire pattern passed to `grep`.
    - Using the caret `^` in a set only inverts the matches within the set.
    - Combining the two has the opposite effect: `grep -v '[^abc]'` prints only the lines made up entirely of `a`, `b`, and `c` (including empty lines).
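The difference between `[^...]` and `-v` on four made-up lines (the third one is empty):

```bash
#!/usr/bin/env bash
lines=$'abc\nabd\n\ncab'

printf '%s\n' "$lines" | grep '[^abc]'     # abd: the only line with a character outside a, b, c
printf '%s\n' "$lines" | grep -v '[^abc]'  # abc, the empty line, cab
printf '%s\n' "$lines" | grep -v 'abc'     # abd, the empty line, cab
```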
## Output Only the Matched Parts of the Line

You can effectively extract matched text without the use of capture groups using `grep -o`.
For example, you can extract the paths from an `ldd` output:

```bash
ldd /bin/bash | grep -o '/[^ ]*'
```

The regex being used here, `'/[^ ]*'`:

- `/`: Match starts with a forward slash.
- `[^ ]*`: Matches any non-space characters (match until the first space).

The output (the exact paths vary by system):

```
/lib/x86_64-linux-gnu/libtinfo.so.6
/lib/x86_64-linux-gnu/libc.so.6
/lib64/ld-linux-x86-64.so.2
```
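Two more `-o` sketches on made-up strings. Each match is printed on its own line, and `-E` makes `+` a quantifier:

```bash
#!/usr/bin/env bash
# Every run of digits in a version/build string.
echo 'v1.22.333 built 2024' | grep -oE '[0-9]+'      # 1, 22, 333, 2024 (one per line)
# One key=value pair out of a query string: "user=" followed by anything up to "&".
echo 'GET /?user=alice&id=7' | grep -o 'user=[^&]*'  # user=alice
```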
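## Examples

### Count Occurrences of a Word

```bash
grep -c 'word' filename
```

Note: `-c` counts matching lines, not occurrences, so a line that contains `word` twice is only counted once.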
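To count occurrences instead, one common approach is to print each match on its own line with `-o` and count those lines with `wc -l`. A sketch on made-up input (adding `-w` would restrict matches to whole words, so e.g. `swordfish` wouldn't count):

```bash
#!/usr/bin/env bash
text=$'word word\nno match\nword'

printf '%s\n' "$text" | grep -c 'word'            # 2: matching lines
printf '%s\n' "$text" | grep -o 'word' | wc -l    # 3: individual occurrences
```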
### Search Recursively in Directory

```bash
grep -r 'pattern' /path/to/directory
```

### Find Lines Not Containing the Pattern

```bash
grep -v 'pattern' filename
```

### Search While Ignoring Case

```bash
grep -i 'pattern' filename
```
## Note on `fgrep` (Fixed Strings)

`fgrep` or `grep -F` treats the pattern as a fixed string.
It matches the literal text provided as the pattern.
This means no regular expression is involved, which can make it faster for plain string matching.
For instance, `fgrep "example.com" filename` will match lines containing "example.com" as a fixed string, not as a regex (where `.` would match any character).
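Why the distinction matters for something like a domain name, on two made-up lines:

```bash
#!/usr/bin/env bash
hosts=$'example.com\nexampleXcom'

printf '%s\n' "$hosts" | grep 'example.com'     # both lines: as a regex, "." matches any character
printf '%s\n' "$hosts" | grep -F 'example.com'  # only example.com
printf '%s\n' "$hosts" | grep 'example\.com'    # regex equivalent: escape the dot
```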
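## Show Only the First Match

We can leverage `grep`'s `-m` option to select how many matches we want to print: `grep` stops reading after that many matching lines.

```bash
grep -m 1 '^# ' ./file.txt
```

The first line starting with "`# `" will be printed to the terminal, and the program will exit.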
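A small sketch on made-up input. Because `grep` exits after the requested number of matching lines, `-m` also works on an endless stream (`yes` just prints `y` forever here):

```bash
#!/usr/bin/env bash
printf 'intro\n# One\n# Two\n' | grep -m 1 '^# '   # "# One" only
yes | grep -m 3 'y'                                # three lines of "y", then grep exits
```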
## Grep Colors

- Note: with `-v`, the selected lines are the NON-MATCHING lines, which matters for `sl=`, `cx=`, and `rv` below.

The `GREP_COLORS` environment variable specifies the colors and other attributes used to highlight various parts of `grep` output when coloring is enabled with `--color`.
Its value is a colon-separated list of capabilities that defaults to:

```bash
# Default:
GREP_COLORS='ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36'
```

The `rv` and `ne` boolean capabilities are unset / omitted (false) by default.
Boolean capabilities have no `=...` part.
They are unset (i.e., false) by default, and become true if they're set.
These substring values are integers in decimal representation and can be concatenated with semicolons.

- Common values to concatenate include:
    - 1 for bold.
    - 4 for underline.
    - 5 for blink.
    - 7 for inverse.
    - 39 for default foreground color.
    - 30 to 37 for foreground colors.
    - 90 to 97 for 16-color mode foreground colors.
    - 38;5;0 to 38;5;255 for 88-color and 256-color modes foreground colors.
    - 48;5;0 to 48;5;255 for 88-color and 256-color modes background colors.
    - 49 for default background color.
    - 40 to 47 for background colors.
    - 100 to 107 for 16-color mode background colors.

Note: `grep` takes care of assembling the result into a complete SGR sequence (`\33[...m`).
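A sketch of overriding one capability. `--color=always` forces escape sequences even when the output isn't a terminal, and piping through `cat -v` makes the raw SGR codes visible (`^[` is the escape character). The bold green `01;32` value is just an example:

```bash
#!/usr/bin/env bash
# mt= sets the color of matched text (both ms= and mc=); here bold (01) + green (32).
# In the cat -v output, look for ^[[01;32m right before "foo".
printf 'foo bar\n' | GREP_COLORS='mt=01;32' grep --color=always 'foo' | cat -v
```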
### Breakdown

- `cx=`: context
    - Color for whole context lines: the non-matching lines printed around a match (e.g. with `-A`, `-B`, or `-C`), or the matching lines when `-v` is given.
    - The default is empty (i.e., the terminal's default color pair).
- `rv`: reverse
    - Boolean value. Swaps the meanings of `sl=` and `cx=` when `-v` is given.
    - Default false, true if set.
- `mt=01;31`: matching text
    - Color for matching text in any matching line.
    - Setting this is the same as setting `ms=` and `mc=` at once to the same value.
    - The default is a bold red text foreground over the current line background.
- `ms=01;31`: matching selected (matched text)
    - Color for matches in a selected line (only used when `-v` is not given).
    - The effect of the `sl=` (or `cx=` if `rv`) capability remains active when this kicks in.
    - The default is a bold red text foreground over the current line background.
- `sl=`: selected lines
    - Color for whole selected lines: the matching lines, or the non-matching lines when `-v` is given.
    - The default is empty (i.e., the terminal's default color pair).
- `mc=01;31`: matching context
    - Color for matching text in a context line (only used when `-v` is given).
    - The effect of the `cx=` (or `sl=` if `rv`) capability remains active when this kicks in.
    - The default is a bold red text foreground over the current line background.
- `fn=35`: file names
    - Color for filenames that come at the beginning of any line.
    - The default is a magenta text foreground over the terminal's default background.
- `ln=32`: line numbers
    - Color for line numbers in any line.
    - The default is a green text foreground over the terminal's default background.
- `bn=32`: byte offsets
    - Color for byte offsets before any line.
    - The default is a green text foreground over the terminal's default background.
- `se=36`: separators
    - Color for the separators `grep` inserts:
        - `:` = separates the fields of selected lines
        - `-` = separates the fields of context lines
        - `--` = separates groups of adjacent lines (when context is shown)
    - The default is a cyan text foreground over the terminal's default background.
- `ne`
    - Boolean value. Disables Erase in Line (`EL`).
    - Prevents clearing to the end of line using Erase in Line (`EL`) to Right (`\33[K`) each time a colorized item ends.
    - The default is false (unset). True if set.
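Several capabilities can be combined in one value. This is a sketch; the colors chosen (bold blue matches, yellow line numbers) are arbitrary, and `ne` stops `grep` from emitting the `\33[K` (EL) sequence after each colored item:

```bash
#!/usr/bin/env bash
# Bold blue matches, yellow line numbers, and no Erase-in-Line sequences.
colors='ms=01;34:ln=33:ne'
printf 'alpha\nbeta\n' | GREP_COLORS="$colors" grep -n --color=always 'beta' | cat -v
```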
## `pgrep`

`pgrep` (process grep) is a tool that helps you find the PIDs of running processes by their names or other attributes.
It's part of the `procps` package on most Linux distros.

```bash
# Debian/Ubuntu-based systems
sudo apt-get install procps
# Or on RedHat-based systems
sudo dnf install procps-ng
```
- Find the PID of a process by its name:
    ```bash
    pgrep bash
    ```
    This will return the PIDs of all processes whose name matches `bash`.
- Find the PIDs of processes owned by a specific user:
    ```bash
    pgrep -u root
    ```
    This will list all the PIDs of all the running processes that are owned by `root`.
- To get the process name as well as the PID, use `-l`.
    ```bash
    pgrep -u root -l
    ```
- Or, to get the entire command used to invoke the process, use `-a`.
    ```bash
    pgrep -u root -a
    ```
- To match a process name exactly, use `-x`.
    ```bash
    pgrep -x sshd
    ```
A more practical example, finding currently running SSHD processes:

```bash
pgrep sshd
```

You can also narrow this down further by specifying a user with `-u`.

```bash
pgrep -u root sshd
```

This will only match `sshd` processes that are owned by `root`.
You can also specify multiple users, comma-delimited.

```bash
pgrep -u root,daemon sshd
```

This matches processes owned by `root` or `daemon` with the name `sshd`.
You can also just view the PIDs that are owned by users without providing a process name.

```bash
pgrep -u root,daemon
```

List the process name as well as the PID with `-l`:

```bash
pgrep -u root,daemon -l
```

Using `-c`, you can count the number of processes instead of listing the PIDs.

```bash
pgrep -c -u root
```

If you need to format the PIDs a certain way, you can specify a delimiter with `-d`. It defaults to newline, printing each PID on its own line.

```bash
pgrep -u root -d,
```

This will use a comma (`,`) as the delimiter and output the PIDs as comma-delimited.
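By default, the pattern is only matched against the process name (e.g. `sshd`), not its arguments.
Using `-f`, you can match the entire command that was run to execute that process.

```bash
pgrep -f 'daemon'
```

This matches any process whose full command line contains `daemon`, including processes that only have it in an argument (e.g. `--no-daemon`).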
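A self-contained sketch of the name vs. full-command-line difference, using a throwaway `sleep` process. The `300` argument is arbitrary, and the short pause is an assumption that half a second is enough for the background job to start before `pgrep` looks for it:

```bash
#!/usr/bin/env bash
# Start a throwaway background process with a distinctive argument.
sleep 300 &
pid=$!
trap 'kill "$pid" 2>/dev/null' EXIT
sleep 0.5   # give the child a moment to exec before searching for it

# Name match: the process name is just "sleep", so this lists $pid (plus any other sleep processes).
pgrep -x sleep
# Full-command-line match: -f compares against "sleep 300", arguments included.
pgrep -f 'sleep 300'
# Name-only match against "sleep 300" finds nothing, because arguments aren't part of the name.
pgrep -x 'sleep 300' || echo "no match without -f"
```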
List the oldest and newest processes of a given name with `-o` and `-n` respectively.

```bash
pgrep -o bash # Oldest process named "bash"
pgrep -n vim # Newest process named "vim"
```