
On Linux and Unix systems, you’re constantly dealing with text: logs, system reports, command outputs, you name it. Tools like grep and sed help, but when you need to slice, reshape, or analyze structured text, awk is the real workhorse.
Built by Aho, Weinberger, and Kernighan, awk was designed for scanning patterns and processing fields. It’s more than a command; it’s a lightweight language for cleanly extracting exactly what you need from messy data. Personally, I don’t see awk as just another tool. It changes how you think about text: you describe the pattern, define the action, and it just works.
In this article, we’ll break down the core ideas behind awk, from basic printing to built-in variables, arrays, formatting, and more. If you’re also diving into Bash automation, check out our companion guide: Bash Scripting: Mastering Linux Automation. By the end, you’ll be ready to tackle real-world data cleanup, reporting, and automation, all from the terminal.
Getting Started with awk: The Basics

Let’s start by looking at how awk processes input and structures data.
The Basic Idea: Pattern and Action
At its core, awk follows a simple structure: pattern { action }. It reads your input line by line, and for each line, it checks whether it matches the pattern you’ve defined. If it does, awk executes the corresponding action.
Here’s how you usually write it:
```
awk 'pattern { action }' filename

# Or, like a lot of Linux commands, you can send data to it:
cat filename | awk 'pattern { action }'
```
- pattern (optional): This is usually a regular expression or a condition. If you leave it out, awk processes every line by default.
- action (optional): This is what you want awk to do, enclosed in curly braces {}. If you skip the action, awk simply prints the entire line that matches the pattern.
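Putting that together, either half can stand on its own. Here’s a minimal sketch (the file name notes.txt is just a placeholder):

```
# Pattern only: prints every line containing "error"
awk '/error/' notes.txt

# Action only: runs on every line, printing just the first field
awk '{ print $1 }' notes.txt
```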
How awk Sees Your Text: Lines and Fields
By default, awk treats each line in your input as a record. It then splits that record into fields, usually wherever it finds a space or a tab. You can access these fields using special variables:
- $1: The first field.
- $2: The second field.
- $N: The Nth field.
- $0: The whole line, exactly as awk read it.
Let’s try a simple example with a file called data.txt:
```
Name Age City
Alice 30 NewYork
Bob 24 London
Charlie 35 Paris
```
If you just want to print the names (first field) and ages (second field):
```
awk '{print $1, $2}' data.txt
# Output:
# Name Age
# Alice 30
# Bob 24
# Charlie 35
```
See how it printed the header line too? We’ll get to how to skip that later!
Working with Different Separators: The -F Option
Not all data comes neatly spaced. CSVs (Comma-Separated Values) are everywhere, and you’ll often run into TSVs (Tab-Separated Values) too. That’s where awk’s -F option comes in: it lets you define the character awk should treat as the field separator.
Say you’ve got a file called employees.csv:
```
ID,Name,Department,Salary
101,Alice,HR,60000
102,Bob,IT,75000
103,Charlie,HR,62000
```
If you want to print just the Name and Salary columns, you can tell awk to split fields using a comma:
```
awk -F',' '{print $2, $4}' employees.csv
```
Output:
```
Name Salary
Alice 60000
Bob 75000
Charlie 62000
```
Using -F like this makes awk super flexible when working with all kinds of structured text: not just CSVs, but anything with consistent separators.
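For example, the same trick works for tab-separated files or the colon-delimited /etc/passwd. A couple of quick sketches (contacts.tsv is a hypothetical file):

```
# Tab-separated data: print the first and third columns
awk -F'\t' '{print $1, $3}' contacts.tsv

# Colon-separated /etc/passwd: print each username and its login shell
awk -F':' '{print $1, $7}' /etc/passwd
```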
How awk Programs Work: More Than Just One-Liners
awk isn’t just for quick commands; you can write full programs with it. Its structure lets you do things before any data is read, while it’s being processed, and after everything is done.
The BEGIN and END Sections
These are special parts of an awk program that run only once:
- BEGIN { action }: This runs before awk reads the first line of input. It’s useful for setting up variables, printing headers, or initializing counters.
- END { action }: This runs after awk has processed every line. Ideal for printing summaries, totals, or final messages.
Let’s improve our employees.csv example by adding a proper report title at the start and a total salary summary at the end:
```
awk -F',' '
BEGIN {
    print "--- Employee Salary Report ---"
    print "Name\tSalary"    # \t means a tab
    total_salary = 0        # Start a variable to keep track of the sum
}
NR > 1 {                    # This means "for every line AFTER the first one" (to skip the header)
    print $2 "\t" $4
    total_salary += $4      # Add the current salary to our running total
}
END {
    print "----------------------------"
    print "Total Company Salary: $" total_salary
}' employees.csv
```
Output:
```
--- Employee Salary Report ---
Name    Salary
Alice   60000
Bob     75000
Charlie 62000
----------------------------
Total Company Salary: $197000
```
Patterns: Picking Which Lines to Process
The “pattern” in pattern { action } is what tells awk when to do something. It’s surprisingly flexible: you can match lines using text patterns, comparisons, ranges, or logical combinations.
- Regular Expressions: This is one of the most common ways to match lines. Just wrap your pattern in slashes (/pattern/), and awk will trigger the action on matching lines.
```
# Find lines that contain "HR"
awk '/HR/ { print $0 }' employees.csv
```
- Conditions: You can also use regular comparisons (==, !=, >, <, >=, <=) to filter based on specific field values.
```
# Print names and salaries of employees earning more than 70000
# (skipping the header line)
awk -F',' 'NR > 1 && $4 > 70000 { print $2, $4 }' employees.csv
# Output: Bob 75000
```
- Range Patterns: You can tell awk to process lines between two matching patterns. This is handy for working with blocks of text, like in config files.
```
# Print everything between START_BLOCK and END_BLOCK (inclusive)
awk '/^START_BLOCK$/,/^END_BLOCK$/ { print $0 }' config.txt
```
- Combining Patterns: You can use && (AND), || (OR), and ! (NOT) to make more specific patterns.
```
# Print names and departments of employees in HR or IT
awk -F',' 'NR > 1 && ($3 == "HR" || $3 == "IT") { print $2, $3 }' employees.csv
```
awk’s Built-in Tools: Handy Variables
awk comes with some special built-in variables that give you extra information about the data you’re processing. These are especially useful for filtering, formatting, and generating reports.
NR (Number of Records): This tracks the current line number awk is processing.
```
# Print each line with its line number in front
awk '{print NR, $0}' data.txt
# Output:
# 1 Name Age City
# 2 Alice 30 NewYork
# ...
```
NR is also handy for filtering specific line ranges (like awk 'NR >= 5 && NR <= 10 { print $0 }' large_log.txt).
NF (Number of Fields): Tells you how many fields (columns) are on the current line.
```
# Print the very last field of each line
awk '{print $NF}' data.txt
```
Useful when lines have a variable number of columns or you just want the last value in each row.
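NF also makes it easy to sanity-check input. A small sketch, assuming whitespace-separated data like data.txt above:

```
# Flag any line that does not have exactly 3 fields
awk 'NF != 3 { print "Line " NR " has " NF " fields: " $0 }' data.txt
```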
FS (Field Separator): Specifies how input lines are split into fields. Set it using -F or inside a BEGIN block.
```
# Same as -F',', but you set it inside the script
awk 'BEGIN {FS=","} {print $2, $4}' employees.csv
```
OFS (Output Field Separator): This is the character awk puts between fields when you use print to show more than one item. By default, it’s a space.
```
# Change the output separator from a space to a tab
awk -F',' 'BEGIN {OFS="\t"} {print $2, $4}' employees.csv
# Output:
# Name    Salary
# Alice   60000
# ...
```
RS (Record Separator): This is what awk uses to decide when one record (line) ends and another begins. Normally, it’s a newline. Changing it lets you process things like paragraphs that span multiple lines.
```
# Process blocks of text separated by blank lines
awk 'BEGIN {RS=""} {print "Paragraph:", NR, $0}' multi_paragraph.txt
```
ORS (Output Record Separator): This is what awk uses after printing each record. The default is a newline.
```
# Add an extra newline between each printed record
awk '{ORS="\n\n"; print $1, $2, $3}' data.txt
```
FILENAME (Current File Name): Shows the name of the file awk is currently reading. Great when working with multiple files.
```
# Print filename and the line for each "ERROR"
awk '/ERROR/ { print FILENAME ":", $0 }' *.log
```
Doing More with awk: Calculations and Logic
awk really shines because it works like a full programming language. You can do math, manipulate text, and use if/else statements and loops, all inside your awk scripts.
Doing Math
awk knows how to handle numbers, so you can do all the usual math operations: +, -, *, /, %.
```
# Give a 10% bonus to employees in the IT department
awk -F',' '
NR > 1 && $3 == "IT" {
    bonus = $4 * 0.10
    new_salary = $4 + bonus
    print $2, $4, "Bonus:", bonus, "New Salary:", new_salary
}' employees.csv
# Output: Bob 75000 Bonus: 7500 New Salary: 82500
```
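If you want control over how numbers come out (two decimal places, padding, and so on), printf works much like it does in C. A small sketch of the same bonus calculation with explicit formatting:

```
# printf takes a format string; %.2f forces two decimal places
awk -F',' '
NR > 1 && $3 == "IT" {
    bonus = $4 * 0.10
    printf "%s earns %d and gets a bonus of %.2f\n", $2, $4, bonus
}' employees.csv
# Output: Bob earns 75000 and gets a bonus of 7500.00
```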
Working with Text (Strings)
awk has a solid set of built-in functions for working with text, great for checking, slicing, or transforming strings:
- length(string): Returns the length of the string.
- substr(string, start, length): Extracts part of the string.
- index(string, substring): Returns the position of the substring, or 0 if it isn’t there.
- match(string, regex): Tests whether the regex matches and sets RSTART (start position) and RLENGTH (length of match).
- sub(regex, replacement, target_string): Replaces the first match.
- gsub(regex, replacement, target_string): Replaces every match.
- split(string, array, separator): Splits the string into an array.
```
# Example: Get initials and make department names all caps
awk -F',' '
NR > 1 {
    # Get the first letter of the name
    initial = substr($2, 1, 1)
    # Make the department name uppercase
    dept_upper = toupper($3)
    print initial, $2, dept_upper
}' employees.csv
# Output:
# A Alice HR
# B Bob IT
# C Charlie HR
```
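The replacement and splitting functions are just as useful. Here’s a quick sketch (the expanded department name and the date string are just for illustration):

```
# gsub() rewrites a field in place; split() breaks a string into an array
awk -F',' '
NR > 1 {
    gsub(/HR/, "Human Resources", $3)   # Replace "HR" with the full name
    print $2, "works in", $3
}
END {
    n = split("2025-07-29", parts, "-") # Split a date string on "-"
    print "Split a date into", n, "parts:", parts[1], parts[2], parts[3]
}' employees.csv
# Output:
# Alice works in Human Resources
# Bob works in IT
# Charlie works in Human Resources
# Split a date into 3 parts: 2025 07 29
```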
If/Else and Loops (Just Like Other Languages!)
awk isn’t just for filtering and printing; it can handle logic too. You can use if/else, for, and while just like in Python or JavaScript, which makes it great for handling more complex data logic.
```
# Put employees into salary categories
awk -F',' '
NR > 1 {
    if ($4 > 70000) {
        status = "High Earner"
    } else if ($4 > 60000) {
        status = "Mid-Range"
    } else {
        status = "Entry-Level"
    }
    print $2, $4, status
}' employees.csv
# Output:
# Alice 60000 Entry-Level
# Bob 75000 High Earner
# Charlie 62000 Mid-Range
```
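Loops work the same way. As a small sketch, here’s a for loop that walks over every field on a line, using NF as the upper bound:

```
# Print each field of every line, numbered
awk '{
    for (i = 1; i <= NF; i++) {
        print "Line " NR ", field " i ": " $i
    }
}' data.txt
```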
awk Arrays: Grouping and Counting Data
One of the coolest things about awk is how it handles associative arrays. These aren’t like normal arrays that just use numbers (0, 1, 2…). awk arrays let you use text (or numbers) as keys. This makes it super easy to count things, sum stuff up, and group data.
Counting Things
A common job is counting how many times something shows up.
```
# Count how many employees are in each department
awk -F',' '
NR > 1 {
    department_counts[$3]++    # Add one to the count for this department
}
END {
    print "--- Employee Counts by Department ---"
    for (dept in department_counts) {    # Go through each department in our list
        print dept ":", department_counts[dept]
    }
}' employees.csv
# Output:
# --- Employee Counts by Department ---
# IT: 1
# HR: 2
```
Summing Up Data
You can use arrays to add up numbers based on different categories.
```
# Add up salaries for each department
awk -F',' '
NR > 1 {
    department_salaries[$3] += $4    # Add the current salary to that department's total
}
END {
    print "--- Total Salaries by Department ---"
    for (dept in department_salaries) {
        print dept ": $" department_salaries[dept]
    }
}' employees.csv
# Output:
# --- Total Salaries by Department ---
# IT: $75000
# HR: $122000
```
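You can combine both ideas, counting and summing in parallel arrays, to get averages per group. A sketch along the same lines:

```
# Average salary per department
awk -F',' '
NR > 1 {
    count[$3]++         # How many employees in this department
    total[$3] += $4     # Running salary total for this department
}
END {
    for (dept in count) {
        printf "%s: average $%.2f\n", dept, total[dept] / count[dept]
    }
}' employees.csv
# Output (order may vary):
# IT: average $75000.00
# HR: average $61000.00
```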
Real-World awk Examples: Get Things Done
Now that you’ve got the basics down, let’s look at how awk actually shines in everyday Linux tasks.
1. Log Analysis: Finding Top IPs in Nginx Access Logs
Got a busy server? Want to know who’s hitting it the most? Let’s use awk to extract and rank the top IP addresses from your Nginx access log.
```
# A quick peek at what access.log might look like:
# 192.168.1.1 - - [29/Jul/2025:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"
# 192.168.1.2 - - [29/Jul/2025:10:00:02 +0000] "GET /images/logo.png HTTP/1.1" 200 5678 "-" "Mozilla/5.0"
# 192.168.1.1 - - [29/Jul/2025:10:00:03 +0000] "GET /about.html HTTP/1.1" 200 987 "-" "Mozilla/5.0"
# ...

# Here's the command to find the top 5 IPs:
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -n 5

# What's happening here?
# awk '{print $1}' access.log: We're just grabbing the IP address (which is the first field).
# sort: Puts all the IPs in alphabetical order.
# uniq -c: Counts how many times each unique, consecutive IP shows up.
# sort -nr: Sorts the results by the count, from highest to lowest.
# head -n 5: Just shows us the top 5 lines.
```
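If you’d rather keep the counting inside awk itself, an associative array (like in the arrays section above) does the same job, with sort and head still handling the ranking:

```
# Count hits per IP inside awk, then rank them
awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' access.log | sort -nr | head -n 5
```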
2. Reformatting Data: Turn Fixed-Width into CSV
Not all data comes neatly separated by commas or tabs. Sometimes it’s fixed-width, meaning each field is a specific number of characters wide. awk makes this easy to clean up and convert.
Say your products.txt looks like this:
```
001Laptop  1200.00
002Keyboard0075.50
```
Convert it to CSV using awk:
```
# Convert this fixed-width data into a CSV format:
awk '{
    id = substr($0, 1, 3)      # Grab the first 3 characters for ID
    name = substr($0, 4, 8)    # Next 8 characters for Name
    price = substr($0, 12, 7)  # Next 7 characters for Price
    printf "%s,%s,%s\n", id, name, price    # Print them out as CSV
}' products.txt
# Output:
# 001,Laptop,1200.00
# 002,Keyboard,0075.50
```
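As a side note, if you’re on gawk specifically, its FIELDWIDTHS variable can do the column slicing for you. This is a gawk extension, not portable awk, and the name field keeps its padding spaces:

```
# gawk-only: split each record into 3-, 8-, and 7-character fields
gawk 'BEGIN { FIELDWIDTHS = "3 8 7"; OFS = "," } { print $1, $2, $3 }' products.txt
# Output:
# 001,Laptop  ,1200.00
# 002,Keyboard,0075.50
```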
3. Simple Reports: Disk Usage Summary
Want a cleaner view of your disk space usage? We can use awk to reformat the df -h output into a neat summary showing just what matters: the filesystem, usage percentage, and mount point.
```
# What 'df -h' output usually looks like:
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/sda1        50G   20G   28G  42% /
# /dev/sdb1       100G   80G   15G  85% /data
# tmpfs           3.9G     0  3.9G   0% /dev/shm

df -h | awk '
NR==1 { print "Filesystem\tUsed%\tMountPoint" }    # Print a custom header for our report
NR > 1 {                   # For every line after the first one (skip the original header)
    gsub("%", "", $5)      # Get rid of the '%' sign from the "Use%" column
    print $1 "\t" $5 "\t" $6    # Print the Filesystem, Used%, and MountPoint
}'
# Output:
# Filesystem    Used%   MountPoint
# /dev/sda1     42      /
# /dev/sdb1     85      /data
# tmpfs         0       /dev/shm
```
It’s a quick way to turn cluttered system output into something you can actually read or even parse further with a script or dashboard.
Tips for Better awk Scripts
- Start Small and Test Often: Don’t try to build a monster script in one go. Write small parts, test them, and build up from there. It’s way easier to troubleshoot a few lines than fifty.
- Use print to Debug: If something looks off, just toss in a print inside your awk block. It’s the quickest way to see what the fields ($1, $2, etc.) or variables look like at each step.
- Use awk -f for Bigger Scripts: If your awk code gets longer than a single line, save it in a separate file (like my_awk_script.awk). Then you run it with awk -f my_awk_script.awk input.txt. This keeps your code much cleaner and easier to manage (there’s a short sketch after this list).
- awk vs. gawk: You’ll often see awk and gawk used. gawk is just the GNU version of awk, and it’s what most Linux systems use when you type awk. gawk usually has more features than the original awk standard. For most common tasks, just using awk works fine.
- Know When to Use It: awk is awesome for working with data that’s in columns or making reports. If you just need to search for text, grep is faster. For simple find-and-replace, sed might be easier. But when you need to combine searching with actions on specific fields, do calculations, or rearrange data, awk is your best friend.
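To make the awk -f tip concrete, here’s a minimal sketch; my_awk_script.awk is just a hypothetical name, reusing the employees.csv example:

```
# Contents of my_awk_script.awk
# Run it with: awk -f my_awk_script.awk employees.csv
BEGIN { FS = ","; print "Name\tSalary" }
NR > 1 { print $2 "\t" $4 }
```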
Wrapping Up: Your awk Skills Just Leveled Up
You’ve just taken a solid step toward mastering awk. You’ve seen how its unique pattern { action } structure works, how to tap into built-in variables, perform calculations, tweak text, and even group data using arrays.
But awk isn’t just another Linux command. It’s a flexible, focused language built for text data, a tool that lets you zero in on exactly what you need, transform it on the fly, and format it the way you want. Even complex data manipulation starts to feel clean and manageable with awk on your side.
Your Linux automation toolkit just got sharper. And as you face new data challenges (parsing logs, cleaning reports, prepping CSVs), keep awk in mind. It’s your go-to utility for structured text work and quick data shaping. Pair it with the Bash scripting skills you already have, like writing clean scripts, trapping errors, and scheduling tasks, and you’re not just automating tasks. You’re building powerful, efficient systems.
The terminal is your workspace. And now, awk is one of your precision tools. Use it well, and your scripts won’t just work; they’ll work smart.
