Important GNU Utilities
This guide shows the usefulness of GNU programs every developer must know.
By Omkaram - Oct 16, 2021

  1. AWK
  2. SED
  3. Grep
  4. OpenSSL
  5. cURL & wget
  6. Git

AWK

AWK, a program whose name is derived from the initials of its authors (Aho, Weinberger, and Kernighan), is a command-line utility used for data formatting, extraction, and manipulation based on your specific needs. The tool is so powerful that all you need is a one-liner, which can do things that even advanced editors like Notepad++, VS Code, GNU Emacs, and Atom cannot. The speed is amazing!

Let's see how I used AWK in my projects. Here is the full AWK documentation: AWK Official Manual

Basically, AWK is a scripting language of sorts which was created to filter, sort, extract, and transform a file based on filter criteria. For example, I recently had a need to see all the places where the word/string "linuxmule.com" appears in this website's folders. Being able to find and replace all the occurrences of the word in a single shot would also be nice, and AWK can do such tasks too. Once I run my AWK command, I need three things in the response:

  1. The file name
  2. The line number of the matched string in the file
  3. The entire line where my string matched

Here is the command to achieve this

                        
$ find ./ -type f -exec awk '/linuxmule.com/ { printf "%s *** %s *** %s\n\n",FILENAME,FNR,$0 }' '{}' +
                        
                    

When I run the above in my website root directory I get the following response

                        
./articles/index.php *** 24 ***     link rel=alternate hreflang=en href="https://linuxmule.com/" /

./index.php *** 8 ***     link rel=alternate hreflang=x-default href="https://linuxmule.com/" /

./index.php *** 9 ***     link rel=alternate hreflang=en href="https://linuxmule.com/" /

./work/index.php *** 8 ***     link rel=alternate hreflang=x-default href="https://linuxmule.com/" /

./work/index.php *** 9 ***     link rel=alternate hreflang=en href="https://linuxmule.com/" /
                        

Now here is the thing with the command we just ran. Alongside AWK, we depended on another program called find. The reason is that AWK only operates on the file names you hand it as arguments; it cannot search a directory tree on its own. A simple invocation looks like this: awk '/texttosearch/ {print}' somethingfile.txt. Most Unix-based programs follow POSIX standards, and one behaviour of these programs is that they can also read input from standard input, for example through the | pipe symbol, instead of being given a file name. What that means is, instead of providing the file name, one could simply do something like this: cat somethingfile.txt | awk '/texttosearch/ {print}'. In our case, all we did was use the find command (find ./ -type f -exec) to hand AWK a bunch of file names found under the current directory ./; the AWK part of the command starts right after the word "-exec".

Now that you've got the basic idea, allow me to explain the command syntax we applied and what each character/word does.

If you run only the command $ find ./ -type f, you will get the list of file names in your current directory "./". The "-type f" part specifies the type of element we are looking for; here "f" stands for "file". If you replace "-type f" with "-type d", then find will output the folder names instead. One more thing to note is that this command works recursively, so it will show you either all the file names or all the sub-folder names under your current directory.
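Here is a quick side-by-side sketch of the two filters (nothing is assumed beyond the current directory):

 # List only files, recursively, under the current directory
 $ find ./ -type f

 # List only directories (folders), recursively, under the current directory
 $ find ./ -type d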

The next word in our original command is "-exec". This option belongs to the "find" command, and what it does is execute whatever command is passed after it, feeding the file names find produced into that command as input. So, in the original command, what follows -exec is the awk statement, and it takes the file names from find to work with. This way we can match any words using AWK on all of the files present in the current directory at once.
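As a minimal sketch of -exec on its own, here find hands its results to wc -l (chosen purely as a stand-in command that counts lines in each file):

 # '{}' is replaced by the file names found; the trailing + batches many file names
 # into one invocation, whereas \; would run the command once per file
 $ find ./ -type f -exec wc -l '{}' +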

Now let's dismantle the AWK syntax and see what's happening there.

When you type "awk --help", the output displays its syntax. AWK has mainly four parts.

  1. Command Options
  2. Selection Criteria (SC)
  3. Action
  4. Filename

The Filename part is something we have already sorted out. I won't dig into Command Options, because that's a topic in itself, but the Selection Criteria and the Action are what we actually used in the original command. Both the SC and the Action are enclosed in single quotes ''.
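To make the four parts visible at a glance, here is a skeleton command (the comma field separator and the file name records.csv are made up purely for illustration):

 # Command Options: -F','   Selection Criteria: /pattern/   Action: { print $1 }   Filename: records.csv
 $ awk -F',' '/pattern/ { print $1 }' records.csv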

In there, /linuxmule.com/ is the selection criteria. The criteria here is a regular expression (regex) that matches the word "linuxmule.com", and the rest is the Action. The action is to print, using a C-style printf function, the file name, the line number of the match, and the complete line where the match occurred, via the predefined variables FILENAME, FNR, and $0 respectively. $0 is not really a keyword but a field expression: $0 means "the whole line", $1 means "the first field of the line", and so on.
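For instance, on a hypothetical space-separated log file (app.log is a made-up name), $1 and $3 pick out individual fields of each matching line:

 # Print only the first and third fields of every line containing "error"
 $ awk '/error/ { print $1, $3 }' app.log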

That's it. You have now acquired a basic understanding of AWK.

Learning AWK can be extremely useful in your IT career. I use it all the time in my projects when clients ask for data obtained from the logging system in a specified format. The Action part of AWK helps with formatting, and it also lets you write while loops and if conditions like a traditional C program. If you know AWK, you depend less on GUI tools like Notepad++, which aren't always that helpful.
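Here is a small sketch of that C-like side of AWK; the log format (severity in the first field) and the file name app.log are assumptions for illustration:

 # Count the lines whose first field is ERROR and report the total at the end
 $ awk '{ if ($1 == "ERROR") count++ } END { printf "Errors found: %d\n", count }' app.log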


SED

SED is a stream editor. It is as powerful as, and similar to, AWK. I feel its speed is faster than AWK's, but it is best used for simple tasks with one-liner statements that do not involve complex looping logic based on conditional statements. The major difference between AWK and SED is that AWK is like an independent programming language all by itself.

So a question arises, when to use AWK and when to use SED?

The answer is not simple, but whenever you are on a time crunch just go with SED. A great tip is to use SED for string replacement based on pattern matches.

Let's see a short use case. Here is the full SED documentation: SED Official Manual


 # If you are operating on a single file, then a string replacement can be performed as below.
 $ sed -i 's/abc/omkaram/g' filename

 # Here the string abc is replaced by omkaram globally (that's what the trailing g flag does). The -i option edits the file in place.

 # If you want to work on multiple files in the same folder, the first command below is enough; 
 # to cover sub-folders as well, combine sed with find as in the second command

 $ sed -i 's/abc/omkaram/g' *
 $ find ./ -type f -exec sed -i 's/Hello world/Omkaram Venkatesh/g' {} \;

 # In the second command sed is used in conjunction with find, 
 # where find runs sed on every file found starting from the current directory

 # But what if you do not want to replace a string with another string, and instead just want to find the matching words in a file, like our AWK example?
 $ find ./ -type f -exec sed -n 's/.*\(omkaram\).*/\1/p' {} \;

 # The above command, instead of replacing text, looks for the regex-matched word "omkaram" in all the files and displays that exact match as output. But that alone won't be very useful, because you won't be able to tell on which line the match occurred. AWK, however, can do that!
                  

Grep

Grep is an intelligent program which was created by a man I admire a lot. The concept of Grep is simple: retrieve data from a file or a pipe, based on a regular expression.
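Both styles look like this (the file name is taken from the AWK output above, and the pattern is only illustrative):

 # Retrieve matching lines straight from a file
 $ grep "linuxmule.com" index.php

 # Or feed grep data through a pipe
 $ cat index.php | grep "linuxmule.com"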

Among AWK, SED & Grep, Grep is the fastest.

If you know regular expressions, then grep is pretty easy, but there are plenty of command-line options to get familiar with. Here is the full Grep documentation: Grep Official Manual
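As a minimal sketch, here are the options I reach for most often, combined into one recursive search of the current directory:

 # -r recurses into sub-folders, -n prints line numbers, -i ignores case
 $ grep -rni "linuxmule.com" ./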


OpenSSL

OpenSSL, as the name suggests, is an open-source implementation of SSL (Secure Sockets Layer): a program and library used by browsers, operating systems, and other third-party tools to communicate securely over protocols such as TLS and HTTPS.

It integrates well with the terminals of Unix-based systems and is written purely in C.

The tool is used for performing cryptographic operations like:

  1. Encryption
  2. Hashing

There is a range of algorithms used for each operation. RSA, DH (Diffie-Hellman), AES, Blowfish, and Twofish are popular for encryption. SHA and MD are popular hash functions, while CRC is a simpler (non-cryptographic) checksum.
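The hash functions are exposed through OpenSSL's dgst subcommand; here is a minimal sketch, reusing the inputFile placeholder from the encryption example below:

 # Compute the SHA-256 digest of a file
 $ openssl dgst -sha256 inputFile

 # Compute the MD5 digest of the same file
 $ openssl dgst -md5 inputFile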

Also, encryption is primarily divided into two types:

  1. Symmetric Key Encryption, e.g. AES, Blowfish, Twofish
  2. Asymmetric Key Encryption, e.g. RSA, DH, etc.

There are variations of these algorithms based on the key size, i.e. the number of bits the algorithm works with (for example, AES-128 versus AES-256).

Below are the commands to encrypt and decrypt using AES:

$ openssl enc -nosalt -aes-256-cbc -in inputFile -out encryptedFile -base64 -k '#4&per.@#%' -pbkdf2
$ openssl enc -d -nosalt -aes-256-cbc -in encryptedFile -out originalFile -base64 -k '#4&per.@#%' -pbkdf2
                  

In the above two commands, the first encrypts the inputFile provided and the second decrypts it. The commands use the 256-bit version of AES in CBC mode. The -pbkdf2 option derives the encryption key from the password using PBKDF2 (Password-Based Key Derivation Function 2). The -k option provides a plain password instead of a hex key (it is quoted so the shell does not misinterpret characters like # and &). -d means decrypt. -base64 encodes the encrypted output as base64 text, and expects base64 input when decrypting.


RSA is more complicated. It's an asymmetric, public/private key encryption algorithm which is highly mathematical and is dominantly used in SSL/TLS communication. It relies on concepts like modular arithmetic, Euler's totient (phi) function, and public and private keys.

Now let's see how to use RSA via OpenSSL. If you want to learn how the algorithm works in detail, I have a video on my channel. The link to the channel is given on the contact page.


 # The first step is to generate a private key for the person A.
 # Explanation for the options used
            1. genpkey        - the subcommand that generates a private key
            2. -algorithm RSA - choosing RSA as the algorithm
            3. -pkeyopt rsa_keygen_bits:2048 - specifies how long your private key is going to be (in bits)
            4. -pkeyopt rsa_keygen_pubexp:5  - choosing the public exponent as 5
            5. -out private-key-A.pem        - specifies the output file for the private key

 $ openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -pkeyopt rsa_keygen_pubexp:5 \
   -out private-key-A.pem

 # To view the base64 output of the private key try cat

 $ cat private-key-A.pem

 # To view the hex code of the private key modulus, exponents and coefficients, try the below

 $ openssl pkey -in private-key-A.pem -text | less

 # Next step is to extract the public key out from the private key. 

 $ openssl pkey -in private-key-A.pem -out public-key-A.pem -pubout

 # Outputs the public key in base64

 $ cat public-key-A.pem

 # To view the hex code of the public key

 $ openssl pkey -in public-key-A.pem -text | less

 # After generating both the public and the private keys, person A sends the public key to person B.
 # Person B uses this public key of A to encrypt a message.txt file and sends the encrypted version
 # back to person A

 $ cat > message.txt 
 Hi good morning

 $ openssl pkeyutl -encrypt -in message.txt -pubin -inkey public-key-A.pem -out cipher.bin
 
 # cipher.bin is the encrypted version of message.txt. 
 # To decrypt the same file, person A uses his private key using the below command

 $ openssl pkeyutl -decrypt -in cipher.bin -inkey private-key-A.pem -out decrypted-message.txt

 # View the decrypted message
 $ cat decrypted-message.txt
                                       

Note – These examples use OpenSSL version 1.1.1k. For OpenSSL, or for that matter any command-line utility, the options/arguments are subject to change. Please check the version of the OpenSSL you are using.
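You can check it quickly with the version subcommand:

 $ openssl version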


cURL and wget

Both programs are used for more or less the same purpose, which is to call a service exposed over a web protocol like HTTP or FTP and perform operations such as downloading content, HTTP-method-based actions, etc.

Here is a pro tip.

Whenever you have an API (Application Programming Interface) exposed as an HTTP endpoint and you want to perform specific actions like

  • POST
  • PUT
  • GET
  • DELETE
  • PATCH

Then use cURL.
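For example, a bare-bones GET call looks like this (the endpoint URL is a made-up placeholder):

 # Simple GET request against a hypothetical endpoint
 $ curl -X GET "https://api.example.com/items"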

If you want to download files such as videos, HTML pages, etc. from the internet, use wget. The most useful aspect of wget is its ability to recursively download files and folders from a given root location.

Here are the man pages for cURL, wget

Now let's look at some examples.


 # Fetch an OAuth-based token from an API so the response can be used to call another API.
 # Here I am showing just the first API call. The curl command is self-explanatory.
 # The `request` variable holds the response of the API call.

 $ request=$(curl --write-out "%{http_code}" --connect-timeout 30 \
      --retry 5 \
      --retry-delay 15 \
      -H "Content-Type: application/x-www-form-urlencoded" \
      -d "client_id=$clientId&client_secret=$clientSecret&grant_type=$grantType&scope=$scope"  \
      -X POST "https://login.microsoftonline.com/3343abc-3d34-4e5c-34e9-62389a112bc/oauth2/v2.0/token")


 # Here is the second API call. The major difference is how we pass the token in the Authorization header
 # and how we submit data with --data; the HTTP method, URL, and payload come from shell variables.
 # The response is stored in the shell variable `apiResponse`.

 $ apiResponse=$(curl --write-out "%{http_code}" --connect-timeout 30 \
      --retry 0 \
      --retry-delay 15 \
      -H "Accept:application/json" \
      -H "Authorization:Bearer $token" \
      -H "Content-Type: application/json" \
      -X "$method" "$url" --data "${data}")

                  

 # Example of recursive download of a website

 $ wget -r https://google.com
                  

Git

Everyone knows git. These days git is so well integrated into various IDEs that it has almost become unnecessary to perform git actions via git commands. Nevertheless, we must be conscious while applying changes to our repositories. UI-based git actions can be easy, but they will not give you command-level flexibility.

Now let's see some git commands. I expect you to use Git Bash for the commands below.


 # To clone a git repository (this checks out the default branch, e.g. master)

 $ git clone https://somegiturl.git

 # To create a new branch and switch to it. Remove -b to switch to an existing branch

 $ git checkout -b "feature-branch"

 # If other devs pushed their own changes to master while you are still working on your feature:
 # If the changes made to master are unrelated to yours (different files), that content merges
 # cleanly into your feature branch when you pull.
 # If the changes made to master are in the same file you are working on, git tries to merge them;
 # if you have uncommitted local changes in that file, the pull will refuse to run until you
 # commit or stash those changes, so that work is not silently lost.

 $ git pull origin master

 # To add (stage) all your local changes before committing and pushing.

 $ git add .

 # To view all the changes

 $ git status

 # To commit your staged changes before pushing. Note that you don't have to push after
 # every commit; you can push them all at once.

 $ git commit -m "Some commit message"

 # To push your commits upstream for the first time after cloning

 $ git push origin feature-branch 

 or 

 $ git push --set-upstream origin feature-branch

                  

Now let's see how to merge (squash) commits. But why do we do it? The answer is simple: we do it to reduce unnecessary trouble for the reviewer of the pull request. Also, you do not want every tiny change displayed as its own commit.

Remember, sometimes merging commits is not a good idea, especially when your reviewers have already reviewed your changes and left some comments.


 # To merge (squash) the last 5 commits

 $ git rebase -i HEAD~5   # An interactive editor opens, listing the commits with their commit ids.
                          # Mark the commits you want to combine with `squash` or `fixup`.
                  

Lastly, the biggest challenge while managing repositories is to handle merge conflicts.

Resolving merge conflicts can be difficult, and they need to be handled manually. Just as with git rebase, the merge operation stops and reports a conflict when the same part of a file has been edited both on master and locally, and git prints a merge error in your shell.

To resolve merge conflicts, you need to go to the conflicting file and look for conflict markers.


 # This is how the conflict markers look. You need to decide which ones you would like to keep.

  Some content with no conflicts
  <<<<<<< HEAD
  Data present in the local. This is the data you are trying to push
  =======
  Data present in the upstream. In our case this is the master.
  >>>>>>> master

 # Say you have decided to keep your local changes. Then you must remove the <<<<<<< HEAD line and everything from the ======= line through the >>>>>>> line

 Data present in the local. This is the data you are trying to push

 # Later, you need to save the file, add it, commit the merge resolution, and push

 $ git add the_conflict_file
 $ git commit
 $ git push