Awk

Introduction to awk

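awk is a small programming language designed for pattern scanning and text processing, and it is typically used for data extraction and reporting. It reads its input one line (record) at a time, splits each line into fields, and runs the actions whose patterns match that line. The name "awk" comes from the surname initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.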

Basic Syntax

The basic syntax of awk is:

awk 'pattern {action}' file
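  • pattern: A condition that selects which input lines the action applies to. If it is omitted, the action runs on every line.
  • action: One or more statements to execute on each matching line. If it is omitted, awk prints the matching line.
  • file: One or more input files. If none is given, awk reads standard input.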

Common Examples

1. Print Specific Columns

To print specific columns from a file:

awk '{print $1, $3}' file.txt

This command prints the first and third columns of each line in file.txt.
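The comma in print matters: it inserts the output field separator (a single space by default), while fields listed without a comma are concatenated. A small sketch, using one made-up line in place of file.txt:

```shell
# Hypothetical line standing in for one line of file.txt.
line='red green blue'

# With a comma: fields are joined by the output separator (a space by default).
printf '%s\n' "$line" | awk '{print $1, $3}'
# prints: red blue

# Without a comma: fields are concatenated with nothing between them.
printf '%s\n' "$line" | awk '{print $1 $3}'
# prints: redblue
```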

2. Print Lines Matching a Pattern

To print lines that contain a specific pattern:

awk '/pattern/ {print}' file.txt

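This command prints every line in file.txt that contains a match for the regular expression between the slashes, here the literal text "pattern" (which also matches inside longer words such as "patterns").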
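Because /pattern/ is a regular expression rather than a whole-word search, it matches anywhere in the line. A minimal sketch with made-up sample lines, also showing that {print} can be left out and that the ^ anchor restricts the match to the start of a line:

```shell
# Hypothetical sample lines standing in for file.txt.
text='a pattern here
patterns everywhere
nothing to see'

# Matches the first two lines, since "patterns" contains "pattern".
printf '%s\n' "$text" | awk '/pattern/ {print}'

# Equivalent: print is the default action when none is given.
printf '%s\n' "$text" | awk '/pattern/'

# ^ anchors the match to the start of the line.
printf '%s\n' "$text" | awk '/^pattern/'
# prints: patterns everywhere
```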

3. Field Separator

By default, awk splits each line into fields on runs of spaces and tabs, ignoring leading and trailing blanks. You can change the field separator with the -F option:

awk -F ',' '{print $1, $2}' file.csv

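This command prints the first and second columns of a comma-separated file (file.csv). Note that -F ',' splits on every comma, so it does not handle quoted CSV fields that themselves contain commas.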
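The difference between the default splitting and an explicit separator shows up with repeated separators. A minimal sketch on made-up input, assuming a POSIX-compatible awk; the last command sets FS in a BEGIN block, which has the same effect as -F ','.

```shell
# Default splitting: runs of blanks count as one separator and
# leading blanks are ignored, so the second field is "b".
printf '  a   b\tc\n' | awk '{print $2}'
# prints: b

# With -F ',', every comma separates fields, so "x,,z" has an empty
# second field and "z" as its third field.
printf 'x,,z\n' | awk -F ',' '{print "second=[" $2 "] third=" $3}'
# prints: second=[] third=z

# Assigning FS in BEGIN is equivalent to passing -F ','.
printf 'x,,z\n' | awk 'BEGIN {FS = ","} {print $3}'
# prints: z
```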

4. Built-in Variables

awk provides several built-in variables:

  • $0: The entire current line.
  • $1, $2, ...: The first, second, etc., fields.
  • NR: The number of records read so far (the current line number when reading a single file).
  • NF: The number of fields in the current record.

Example:

awk '{print NR, NF, $0}' file.txt

This command prints the line number, the number of fields, and the entire line for each line in file.txt.
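Running the same command on two made-up lines makes the values concrete. The second command uses $NF, which works because NF holds the field count, so $NF is the last field. A sketch assuming a POSIX-compatible awk:

```shell
# Two hypothetical lines standing in for file.txt.
printf 'one two three\nfour five\n' | awk '{print NR, NF, $0}'
# prints:
# 1 3 one two three
# 2 2 four five

# $NF is the last field on each line.
printf 'one two three\nfour five\n' | awk '{print $NF}'
# prints three, then five
```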

5. Arithmetic Operations

You can perform arithmetic operations within awk:

awk '{print $1 + $2}' file.txt

This command prints the sum of the first and second columns for each line in file.txt.
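A few more operators, on made-up two-column input. The second command shows that a field which does not look like a number is treated as 0 in arithmetic. This is a sketch assuming a POSIX-compatible awk; the inputs are chosen to give integer results.

```shell
# Hypothetical two-column input standing in for file.txt.
printf '3 4\n10 2\n' | awk '{print $1 + $2, $1 * $2, $1 % $2}'
# prints:
# 7 12 3
# 12 20 0

# A non-numeric field counts as 0, so this prints 5.
printf 'abc 5\n' | awk '{print $1 + $2}'
```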

6. Conditionals

Any comparison can serve as a pattern, so an action runs only on the lines that meet the condition:

awk '$3 > 50 {print $1, $3}' file.txt

This command prints the first and third columns of lines where the third column is greater than 50.
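The same test can also be written as an if/else statement inside the action, which handles matching and non-matching lines in one pass. A sketch on made-up three-column input; the labels "above" and "not above" are only illustrative.

```shell
# Hypothetical three-column input standing in for file.txt.
data='alice x 72
bob y 41'

# Pattern form: the action runs only when the third field exceeds 50.
printf '%s\n' "$data" | awk '$3 > 50 {print $1, $3}'
# prints: alice 72

# if/else inside the action labels every line.
printf '%s\n' "$data" | awk '{ if ($3 > 50) print $1, "above"; else print $1, "not above" }'
# prints:
# alice above
# bob not above
```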

7. BEGIN and END Blocks

awk allows you to specify actions to be performed before processing any lines (BEGIN) and after processing all lines (END):

awk 'BEGIN {print "Start"} {print $1} END {print "End"}' file.txt

This command prints "Start" before processing the file, the first column of each line, and "End" after processing the file.
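Because BEGIN runs before any input is read, a program with only a BEGIN block needs no input file at all. In END, NR still holds the number of records read. A sketch assuming a POSIX-compatible awk, with two made-up input lines:

```shell
# A BEGIN-only program reads no input.
awk 'BEGIN {print "Start"}'
# prints: Start

# BEGIN, a main action, and END together; END sees the final NR.
printf 'a 1\nb 2\n' | awk 'BEGIN {print "Start"} {print $1} END {print "End after", NR, "lines"}'
# prints:
# Start
# a
# b
# End after 2 lines
```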

Practical Example

Suppose you have a file data.txt with the following content:

Name,Age,Salary
John,28,50000
Jane,32,60000
Doe,22,40000

Printing the Name and Age of Employees

awk -F ',' '{print $1, $2}' data.txt

Output:

Name Age
John 28
Jane 32
Doe 22
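To reproduce this output, the sketch below writes the sample data.txt into a temporary directory (assuming mktemp -d is available, as on most Linux and macOS systems), runs the same command, and then adds the pattern NR > 1 so the header row is not printed.

```shell
# Recreate the sample data.txt in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/data.txt" <<'EOF'
Name,Age,Salary
John,28,50000
Jane,32,60000
Doe,22,40000
EOF

# Same command as above: the header row is included.
awk -F ',' '{print $1, $2}' "$dir/data.txt"

# NR > 1 skips the header, leaving only the employee rows.
awk -F ',' 'NR > 1 {print $1, $2}' "$dir/data.txt"
# prints:
# John 28
# Jane 32
# Doe 22

rm -r "$dir"
```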

Calculating the Average Salary

awk -F ',' 'NR > 1 {sum += $3; count++} END {print "Average Salary:", sum/count}' data.txt

Output:

Average Salary: 50000

This command skips the header row (NR > 1), adds each salary in the third field to sum, counts the data rows in count, and prints sum/count in the END block: (50000 + 60000 + 40000) / 3 = 50000.
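If the file contains only the header row, count stays 0 and sum/count divides by zero. A slightly more defensive sketch guards the division; the "No data rows" message is a hypothetical choice, and the program is kept in a shell variable only so it can be reused for both sample inputs.

```shell
# Divide only when at least one data row was seen.
avg='NR > 1 {sum += $3; count++}
END {if (count > 0) print "Average Salary:", sum / count; else print "No data rows"}'

# Sample data from above, supplied on standard input.
printf 'Name,Age,Salary\nJohn,28,50000\nJane,32,60000\nDoe,22,40000\n' | awk -F ',' "$avg"
# prints: Average Salary: 50000

# Header only: the guard avoids the division.
printf 'Name,Age,Salary\n' | awk -F ',' "$avg"
# prints: No data rows
```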

Conclusion

awk is a compact tool for text processing and data extraction. Because an awk program is a list of pattern-action pairs applied to each input line, short one-liners can filter, reformat, and summarize structured text. Practice with different patterns and actions to get the most out of awk.