Awk
Introduction to awk
awk is a versatile programming language used primarily for pattern scanning and text processing, typically for data extraction and reporting. The name "awk" comes from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.
Basic Syntax
The basic syntax of awk is:
awk 'pattern {action}' file
pattern: A condition that selects which lines of the file to act on.
action: A command to execute on lines that match the pattern.
file: The file to process.
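Either part can be omitted: a pattern with no action prints each matching line, and an action with no pattern runs on every line. A minimal sketch, assuming a hypothetical three-line file named fruits.txt that is created here only for illustration:

```shell
# Hypothetical sample file, not part of the original text.
printf 'apple 3\nbanana 12\ncherry 7\n' > fruits.txt

# Pattern only: the default action prints the matching line.
awk '/an/' fruits.txt
# banana 12

# Action only: with no pattern, the action runs on every line.
awk '{print $1}' fruits.txt
# apple
# banana
# cherry
```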
Common Examples
1. Print Specific Columns
To print specific columns from a file:
awk '{print $1, $3}' file.txt
This command prints the first and third columns of each line in file.txt.
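The comma in print matters: it inserts the output separator (a space by default), while fields written side by side are concatenated. A short sketch, assuming a hypothetical file letters.txt created only for this illustration:

```shell
# Hypothetical sample file, not part of the original text.
printf 'a b c\nd e f\n' > letters.txt

# With a comma, the fields are separated by a space.
awk '{print $1, $3}' letters.txt
# a c
# d f

# Without a comma, the fields are joined with nothing in between.
awk '{print $1 $3}' letters.txt
# ac
# df
```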
2. Print Lines Matching a Pattern
To print lines that contain a specific pattern:
awk '/pattern/ {print}' file.txt
This command prints every line in file.txt that contains the text "pattern". The text between the slashes is a regular expression, so it can match more than a literal word.
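A bare /regex/ is tested against the whole line. To test a single field instead, the ~ operator can be used. A minimal sketch, assuming a hypothetical file users.txt with a name in the first column and a role in the second:

```shell
# Hypothetical sample file, not part of the original text.
printf 'alice admin\nbob user\nadminton guest\n' > users.txt

# Matches "admin" anywhere in the line, including inside "adminton".
awk '/admin/' users.txt
# alice admin
# adminton guest

# Matches only lines whose second field is exactly "admin".
awk '$2 ~ /^admin$/' users.txt
# alice admin
```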
3. Field Separator
By default, awk uses whitespace (runs of spaces and tabs) as the field separator. You can change this with the -F option:
awk -F ',' '{print $1, $2}' file.csv
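This command prints the first and second columns of a comma-separated file (file.csv). The output columns are separated by a space rather than a comma, because -F changes only how the input is split.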
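To keep commas in the output as well, the output field separator OFS can be set with -v. A minimal sketch, assuming a hypothetical file people.csv; note that plain -F ',' does not handle quoted fields that contain commas:

```shell
# Hypothetical sample file, not part of the original text.
printf 'id,name,city\n1,Ann,Oslo\n' > people.csv

# -F sets the input separator; -v OFS=',' sets the output separator.
awk -F ',' -v OFS=',' '{print $1, $2}' people.csv
# id,name
# 1,Ann
```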
4. Built-in Variables
awk provides several built-in variables:
$0: The entire current line.
$1, $2, ...: The first, second, etc., fields.
NR: The current record number (line number).
NF: The number of fields in the current record.
Example:
awk '{print NR, NF, $0}' file.txt
This command prints the line number, the number of fields, and the entire line for each line in file.txt.
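These variables can also be combined: because NF holds the field count, $NF refers to the last field, and NR can serve as a pattern to select a line by number. A short sketch, assuming a hypothetical file rows.txt created only for this illustration:

```shell
# Hypothetical sample file, not part of the original text.
printf 'a b c\nd e\n' > rows.txt

# $NF is the last field of each line, however many fields it has.
awk '{print NR ": " $NF}' rows.txt
# 1: c
# 2: e

# NR as a pattern: print only the second line.
awk 'NR == 2' rows.txt
# d e
```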
5. Arithmetic Operations
You can perform arithmetic operations within awk:
awk '{print $1 + $2}' file.txt
This command prints the sum of the first and second columns for each line in file.txt.
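The other arithmetic operators work the same way, and printf can control how the result is formatted. A minimal sketch, assuming a hypothetical two-column numeric file nums.txt:

```shell
# Hypothetical sample file, not part of the original text.
printf '3 4\n10 5\n' > nums.txt

# Multiply the two columns.
awk '{print $1 * $2}' nums.txt
# 12
# 50

# Divide, formatting the result to two decimal places.
awk '{printf "%.2f\n", $1 / $2}' nums.txt
# 0.75
# 2.00
```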
6. Conditionals
You can use conditionals to perform actions based on specific conditions:
awk '$3 > 50 {print $1, $3}' file.txt
This command prints the first and third columns of lines where the third column is greater than 50.
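Conditions can be combined with && and ||, and an if/else statement can be used inside an action when each line needs one of two outcomes. A short sketch, assuming a hypothetical file scores.txt with a score in the third column:

```shell
# Hypothetical sample file, not part of the original text.
printf 'ann x 80\nbob y 45\ncat z 60\n' > scores.txt

# Combined condition: third column between 50 and 70 (exclusive).
awk '$3 > 50 && $3 < 70 {print $1}' scores.txt
# cat

# if/else inside the action: label every line.
awk '{ if ($3 > 50) print $1, "high"; else print $1, "low" }' scores.txt
# ann high
# bob low
# cat high
```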
7. BEGIN and END Blocks
awk allows you to specify actions to be performed before processing any lines (BEGIN) and after processing all lines (END):
awk 'BEGIN {print "Start"} {print $1} END {print "End"}' file.txt
This command prints "Start" before processing the file, the first column of each line, and "End" after processing the file.
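In practice, BEGIN is often used for setup, such as setting the field separator or printing a header, and END for a summary. A minimal sketch, assuming a hypothetical colon-separated file pairs.txt:

```shell
# Hypothetical sample file, not part of the original text.
printf 'a:1\nb:2\n' > pairs.txt

# BEGIN sets the separator and prints a header before any input is read;
# END runs after the last line, when NR holds the total record count.
awk 'BEGIN {FS = ":"; print "key"} {print $1} END {print NR, "records"}' pairs.txt
# key
# a
# b
# 2 records
```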
Practical Example
Suppose you have a file data.txt with the following content:
Name,Age,Salary
John,28,50000
Jane,32,60000
Doe,22,40000
Printing the Name and Age of Employees
awk -F ',' '{print $1, $2}' data.txt
Output:
Name Age
John 28
Jane 32
Doe 22
Calculating the Average Salary
awk -F ',' 'NR > 1 {sum += $3; count++} END {print "Average Salary:", sum/count}' data.txt
Output:
Average Salary: 50000
This command skips the header row (NR > 1), sums the salaries, counts the entries, and calculates the average salary.
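If the file contains only the header row, count stays at zero and the division fails. One possible guard is sketched below, together with printf to fix the number of decimal places. The file empty.csv and the shell variable prog are hypothetical names introduced for this illustration; data.txt is recreated with the content shown above:

```shell
# Recreate data.txt from the text, plus a hypothetical header-only file.
printf 'Name,Age,Salary\nJohn,28,50000\nJane,32,60000\nDoe,22,40000\n' > data.txt
printf 'Name,Age,Salary\n' > empty.csv

# Same averaging program, with a check that at least one data row was seen.
prog='NR > 1 {sum += $3; count++}
END {
  if (count > 0) printf "Average Salary: %.2f\n", sum / count
  else print "No data rows"
}'

awk -F ',' "$prog" data.txt
# Average Salary: 50000.00

awk -F ',' "$prog" empty.csv
# No data rows
```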
Conclusion
awk is a powerful tool for text processing and data extraction. Its pattern-action syntax is compact enough for one-line commands, yet it can handle complex data manipulation tasks efficiently. Practice with different patterns and actions to get the most out of awk.