Background

  • Unix is a multiuser operating system where each user has their own private space on the machine’s hard disk and is identified by an id number.
  • All users have a user name and a password and are required to enter them correctly before using the computer. The user with id 0 is the system’s superuser or administrator and its traditional name is root.
  • UNIX processes act on behalf of the initiating user.
  • Multiple users can be running multiple processes at the same time.
  • Users are organised in groups. A group is a set of users that share the same class of permissions.
  • Everything in UNIX is represented by files. This includes:
    • Configuration files. Both user profile and system/server configuration files are plain text files. This makes it easy to back up, restore and compare configuration files, and to administer machines remotely over low-bandwidth text consoles.
    • Devices. Access is done using regular file I/O operations on filesystem objects (under /dev) that represent the real devices. For example, cat file.wav >/dev/sndcard plays an audio file directly on the sound card, and cat file.txt|lpr sends it to a printer.
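
  • A minimal sketch of the "devices are files" idea, assuming a Linux-like system that provides /dev/zero and /dev/null. The variable name zero_bytes is only illustrative.

```shell
#!/usr/bin/env bash
# Devices under /dev are read and written with ordinary file I/O.

# /dev/zero produces an endless stream of zero bytes; read four of them.
zero_bytes=$(head -c 4 /dev/zero | wc -c | tr -d ' ')
echo "read $zero_bytes bytes from /dev/zero"

# /dev/null discards everything that is written to it.
echo "this text is thrown away" > /dev/null

# Every process runs on behalf of a user, identified by a numeric id.
echo "running as uid $(id -u)"
```

    The same redirection operators used for regular files work unchanged on the device files.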

Filesystem

  • The UNIX file system layout is broadly the same in all Unix versions:
|--bin          Binaries required before mounting /usr
|--etc          System wide configuration files
|--home         Users’ home directories
|--lib          Libraries required by the system
|--tmp          Temporary files. Everyone has RW access.
|--usr          Programs
| |--bin        Programs’ executables
| |--lib        Programs’ libraries
| |--local      Programs that are installed locally
| |   |-bin,lib,share
| |--share      Programs’ required files (e.g. docs, icons)
| |--sbin       System administration programs
| |--src        Source files for the kernel and programs
|--var          Temporary space for running programs

Permissions

$ ls -la /bin/ls
-rwxr-xr-x 1 root wheel   38624   Jul 15 06:29     /bin/ls
\        /     |      |       |   \          /     |
Permissions  Owner  Owning  Size     Date of     Filename
                    group          last modification
Three permission triads (with r: readable, w: writable, x: executable)
first triad what the owner can do
second triad what the group members can do
third triad what other users can do
  • The command stat -c %a ./ will show the numeric octal representation of the permissions of the ./ directory (GNU stat syntax).
    • ls -l ../ will show the permissions in the symbolic notation used in the first example, together with all the other directories and files at the same level as ./
  • Paths can be:
    • Absolute: starting from the root directory / e.g. /var/log/messages - The system log file
    • Relative to the current directory . e.g. if the current directory is /var, the relative path to the system log is ./log/messages or log/messages
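
  • To connect the permission triads with the octal notation, here is a small sketch that sets a mode on a scratch file and prints it both ways. It assumes GNU stat (the -c option); the file name script.sh is made up.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
touch "$workdir/script.sh"

# 7 = rwx (owner), 5 = r-x (group), 4 = r-- (others)
chmod 754 "$workdir/script.sh"

# Symbolic notation: the first 10 characters of the ls -l line
symbolic=$(ls -l "$workdir/script.sh" | cut -c1-10)
# Octal notation (GNU stat syntax)
octal=$(stat -c %a "$workdir/script.sh")

echo "$symbolic $octal"
rm -r "$workdir"
```

    Each octal digit is the sum of r=4, w=2 and x=1 for one triad.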

File commands

  • ls: list files in a directory
    • ls -l: list details
    • ls -a: list hidden files (files that start with .)
  • find <dir>: walk through a file hierarchy starting from <dir>
    • find <dir> -type [dfl]: Only display directories, files or links
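    • find <dir> -name str: Only display entries whose name matches the shell pattern str (e.g. 'str*' for names that start with str)
    • find <dir> -{max|min}depth d: Only display entries at most / at least d levels below <dir>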
  • touch <file>: Create an empty file named <file> or update the modification time of the existing file <file>
  • cp <from> <to>: copy file or directory <from> to the location specified by <to>
    • -R: copy directories recursively
    • -p: preserve filesystem permissions and attributes
  • mv <from_1> · · · <from_n> <to>: move files or directories <from*> to directory <to>
    • -n: do not overwrite existing files
  • mkdir <dir>: create an empty directory at the path specified by <dir>. The parent directory must already exist.
    • -p: also create intermediate directories as required
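
  • The commands above can be tried safely in a scratch directory. The sketch below is only an illustration; the directory and file names (project, backup, main.c and so on) are invented.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# mkdir -p also creates the missing intermediate directory "project"
mkdir -p project/src
touch project/src/main.c project/src/util.c project/README

# cp -R copies the whole tree; mv then renames the copy
cp -R project backup
mv backup project-old

# Only regular files whose name matches the glob *.c
c_files=$(find project -type f -name '*.c' | sort)
echo "$c_files"

# Limit the walk to the entries directly below the current directory
top_level=$(find . -mindepth 1 -maxdepth 1 -type d | sort)
echo "$top_level"

cd / && rm -r "$workdir"
```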

Pipes |

  • Forward a program’s STDOUT line-by-line to the input of another program. This allows us to combine commands in surprising ways, using the pipe (|) operator.
    • cat file |wc -l: Count lines of file
    • find / -type d | sort: View all directories sorted
    • cat /var/log/access_log |grep foo|tail -n 10: See the last 10 accesses from the host “foo” to our system’s web server
    • cat /etc/passwd |cut -f1 -d':'|sort: Get a sorted list of the system’s users.
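
  • A sketch of a longer pipeline, run on a made-up sample in /etc/passwd format so that it does not depend on the accounts of the machine. It counts how many users have each login shell.

```shell
#!/usr/bin/env bash
# Hypothetical sample in /etc/passwd format:
# name:password:uid:gid:comment:home:shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/zsh'

# Each stage reads the output of the previous one:
# keep the 7th field, sort, count duplicates, order by count,
# then strip the padding that uniq -c adds.
shell_counts=$(printf '%s\n' "$passwd_sample" |
  cut -f7 -d: |
  sort |
  uniq -c |
  sort -rn |
  tr -s ' ' |
  sed -e 's/^ //')
echo "$shell_counts"
```

    The first line of the result is the most common shell in the sample, here /bin/bash with a count of 2.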

Text file processing commands

  • cat file_1 · · · file_n: concatenate and print files to standard output (i.e. the terminal, unless the output is redirected)
    • send a program’s output to a file (>) or make a program read from a file (<)
    • (>) overwrites an existing file; to append, we use (>>)
  • less file: displays a file on the screen and allows browsing it in both directions.
    • q exit
    • /pattern search for pattern in text; pressing n moves to the next occurrence.
  • echo <string> write arguments to standard output
  • head, tail <file>: display first/last lines of <file>
    • -n Number of lines to display
    • -f Display newly appended lines
  • tr character translator; can convert or delete specific characters
    • -s: squeeze runs of a repeated character into a single occurrence
    • -d: delete a character
$ echo "foo bar" | tr 'o' 'a'
faa bar
# Replace tabs with spaces
$ tr '\t' ' ' < file.txt
# Remove all instances of #
$ tr -d '#' < file.txt
  • cut allows us to split a line into columns, given a character, and extract specific fields.
# Get a list of users and home directories
$ cut -f1,6 -d: /etc/passwd
# Get details for all users that are running java
$ ps -ef|tr -s ' '|grep "java"|cut -f3,11  -d' '
  • sed Modifies a string at its input in various ways using pattern matching
# Replace foo with bar in the input file
$ sed -e 's/foo/bar/' < file.txt

# Change the order of columns in a 2 column file
$ sed -e 's/^\(.*\) \(.*\)$/\2 \1/' < file.txt

# Remove lines 3 and 5 from the input
$ sed -e '3d' -e '5d' < book.txt
  • sed is a domain specific language of its own. You can find a thorough manual here.
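
  • The reason tr -s usually precedes cut on column-aligned output can be seen in a small sketch. The two-line table below is invented data that mimics the spacing of ps or ls -l output.

```shell
#!/usr/bin/env bash
# Columns aligned with runs of spaces (made-up data)
table='alice    4242   java
bob      17     bash'

# Without squeezing, every single space is a delimiter, so field 2 is empty
without_squeeze=$(printf '%s\n' "$table" | cut -f2 -d' ')

# tr -s collapses each run of spaces into one, so field 2 is the number
with_squeeze=$(printf '%s\n' "$table" | tr -s ' ' | cut -f2 -d' ')
echo "$with_squeeze"

# sed can then rewrite the selected text, here swapping the two columns
swapped=$(printf '%s\n' "$table" | tr -s ' ' | cut -f1,2 -d' ' |
  sed -e 's/^\(.*\) \(.*\)$/\2 \1/')
echo "$swapped"
```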

Sorting data

  • sort writes a (lexicographically) sorted concatenation of all input files to standard output, using Mergesort
    • -r: reverse the sort
    • -n: do a numeric sort
    • -k and -t: sort by the nth column (argument to -k). -t specifies the separator character
  • uniq finds unique records in a sorted file
# Print the 10 most frequent lines in foo
$ cat foo| sort | uniq -c |sort -rn |head -n 10

# Sort a CSV file numerically by its 6th field
$ sort -n -k 6 -t ','  datasets/file.csv
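
  • The difference between the default and the numeric sort is easiest to see on a tiny input. The CSV below (name,city,age) is invented for this sketch; -k 3,3 restricts the key to the third field only.

```shell
#!/usr/bin/env bash
# Made-up CSV: name,city,age
csv='carol,Delft,41
alice,Leiden,9
bob,Delft,30'

# Lexicographic sort on the 3rd field: "30" < "41" < "9"
lexical=$(printf '%s\n' "$csv" | sort -t ',' -k 3,3 | cut -f1 -d, | tr '\n' ' ')

# Numeric sort on the 3rd field: 9 < 30 < 41
numeric=$(printf '%s\n' "$csv" | sort -t ',' -k 3,3 -n | cut -f1 -d, | tr '\n' ' ')

# Count how often each city occurs; uniq only merges adjacent lines,
# which is why the input is sorted first
cities=$(printf '%s\n' "$csv" | cut -f2 -d, | sort | uniq -c |
  tr -s ' ' | sed -e 's/^ //')

echo "$lexical"
echo "$numeric"
echo "$cities"
```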

Joining data

  • join joins lines of two sorted files on a common field
    • -1, -2 specify fields in files 1 (first argument) and 2 (second argument) that represent keys
$ cat foodtypes.txt
3 Fat
1 Protein
2 Carbohydrate

$ cat foods.txt
Potato 2
Cheese 1
Butter 3

$ join -1 1  -2 2 <(sort foodtypes.txt) <(sort -k 2 foods.txt)
  • Practically, join performs a join operation on KV pairs.
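
  • The join example above can be reproduced end to end with the sketch below. It writes the two sample files to a scratch directory and, as a design choice for portability, sorts them into temporary files instead of using process substitution.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf '%s\n' '3 Fat' '1 Protein' '2 Carbohydrate' > "$workdir/foodtypes.txt"
printf '%s\n' 'Potato 2' 'Cheese 1' 'Butter 3' > "$workdir/foods.txt"

# Both inputs must be sorted on their join key
sort "$workdir/foodtypes.txt" > "$workdir/types.sorted"
sort -k 2 "$workdir/foods.txt" > "$workdir/foods.sorted"

# The key is field 1 of the first file and field 2 of the second file.
# Output: the key, then the other fields of file 1, then those of file 2.
joined=$(join -1 1 -2 2 "$workdir/types.sorted" "$workdir/foods.sorted")
echo "$joined"
rm -r "$workdir"
```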

Process management

  • UNIX can do many jobs at once, dividing the processor’s time between the tasks so quickly that it looks as if everything is running at the same time. This is called multitasking.
  • The UNIX shell has process management capabilities. When running a process, pressing Ctrl+Z will suspend it.
  • A process can be killed with Ctrl+C
  • A process can be started in the background by appending a & after the command, e.g. find / |sort &
  • ps See which processes are running
 UID   PID  PPID   C STIME   TTY           TIME CMD
    0     1     0   0 28Nov17 ??        21:20.80 /sbin/launchd
    0    51     1   0 28Nov17 ??         0:41.63 /usr/sbin/syslogd
    0    52     1   0 28Nov17 ??         2:19.32 /usr/libexec/UserEventAgent (System)
  • kill -<signalno> <pid>: Send a signal to a process. Important signals:
    • TERM: informs the process that it should terminate.
    • KILL: kills the process immediately; the process cannot catch or ignore this signal
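
  • A minimal sketch of background jobs and signals, using sleep as a stand-in for a long-running program. The variable names pid and status are illustrative.

```shell
#!/usr/bin/env bash
# Start a long-running command in the background; $! holds its PID
sleep 30 &
pid=$!

# kill -0 sends no signal, it only checks that the process exists
kill -0 "$pid" && echo "process $pid is running"

# Ask the process to terminate, then collect its exit status
kill -TERM "$pid"
status=0
wait "$pid" || status=$?

# A process ended by a signal reports 128 + the signal number (TERM is 15)
echo "exit status: $status"
```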

Documentation

  • UNIX is a self-documenting system. All commands/tools have a manual page that describes their arguments, input and output formats and sometimes, even programming interfaces. man <cmd> invokes the manual for a command.
$ man ls
LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...

DESCRIPTION
       List  information  about  the FILEs (the current directory by default).
       Sort entries alphabetically if none of -cftuvSUX nor --sort  is  speci-
       fied.
  • Unix systems are also traditionally documented by providing full access to the source code that comprises them.

Running a command per input line

  • xargs cmd reads items from STDIN and runs cmd with those items as arguments
# Get file size statistics for the current directory
$ find . -maxdepth 1 -type f |xargs wc
  • xargs by default appends the input at the end of cmd. Sometimes it may be necessary to insert it in the middle. We use the -I {} option and write {} at the position where each input line should be substituted; cmd is then run once per line
$ find . -maxdepth 1 -type f|xargs -I {} echo File {} is in `pwd`
File ./labcontents.doc is in /Users/gousiosg/Documents/course-material/isrm
File ./Makefile is in /Users/gousiosg/Documents/course-material/isrm
[...]
  • xargs can process things in parallel with the -P option.
  • In terms of data processing, xargs is the equivalent of map
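
  • A small sketch of the three common ways to call xargs, fed with literal input so the result is predictable. The use of expr as the mapped command is only an illustration.

```shell
#!/usr/bin/env bash
# Default: xargs collects the input items and appends them to one command line
batched=$(printf '%s\n' a b c | xargs echo item)
echo "$batched"

# -I {} runs the command once per input line and substitutes {} anywhere
per_line=$(printf '%s\n' a b c | xargs -I {} echo "item {} done")
echo "$per_line"

# -n 1 passes one item per invocation, like a map over the input
mapped=$(printf '%s\n' 1 2 3 | xargs -n 1 expr 10 + | tr '\n' ' ')
echo "$mapped"
```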

Filtering lines with patterns

  • grep prints lines matching a pattern
    • -v: invert search result (only print those that DO NOT match the pattern)
    • -i: make matching case insensitive
    • -n: print the line number of the match
    • -R: recurse a directory structure
# Find all processes run by user 501
$ ps -ef | grep -E "^ +501"

# Find all Python files with classes that extend class Foo
$ grep -Rn "(Foo)" * | grep '\.py:'

# Same, more efficient
$ find . -type f -name '*.py' | xargs grep -n "(Foo)"

# Even more efficient (--include is supported by GNU and BSD grep)
$ grep -Rn --include='*.py' "(Foo)" .
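
  • The grep options can be checked on a few scratch files. The sketch below creates invented files (bar.py, baz.py, notes.txt) and assumes a grep that supports --include, which is an extension and not part of POSIX.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf '%s\n' 'class Bar(Foo):' '    pass' > "$workdir/bar.py"
printf '%s\n' 'class Baz(object):' '    pass' > "$workdir/baz.py"
printf '%s\n' 'Notes about (Foo)' > "$workdir/notes.txt"

# -l prints only the names of matching files; --include limits -R to *.py
matches=$(cd "$workdir" && grep -Rl --include='*.py' '(Foo)' .)
echo "$matches"

# -n prefixes the line number, -i ignores case
numbered=$(grep -in 'CLASS' "$workdir/baz.py")
echo "$numbered"

# -v keeps the lines that do NOT match
others=$(grep -v 'class' "$workdir/bar.py")
echo "$others"
rm -r "$workdir"
```

    Note that notes.txt also contains (Foo) but is skipped because of the --include filter.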

Regular expressions

  • . Match any character once
  • * Match the previous pattern 0 or more times
  • + Match the previous pattern 1 or more times
  • [e-fF-M] Match any character in the (ASCII) range F-M or e-f
  • [^e] Match any single character except e
  • ^ and $ Match the beginning or the end of the line, respectively
  • () Group together items for future reference
  • | Match either the left or the right group
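
  • The operators above can be tried with grep -E (extended regular expressions). In this sketch re_match is a helper defined only for the illustration, and LC_ALL=C is set so that character ranges follow ASCII order.

```shell
#!/usr/bin/env bash
# Print "match" or "no match" for a pattern ($1) and a string ($2)
re_match() {
  if printf '%s\n' "$2" | LC_ALL=C grep -Eq "$1"; then
    echo "match"
  else
    echo "no match"
  fi
}

re_match '^a.c$'       'abc'      # . matches any single character
re_match '^ab*c$'      'ac'       # * allows zero b characters
re_match '^ab+c$'      'ac'       # + needs at least one b
re_match '^[e-fF-M]$'  'G'        # G lies in the range F-M
re_match '^[^e]$'      'e'        # [^e] rejects e
re_match '^(foo|bar)$' 'bar'      # | picks either alternative
```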

Task-based tools

Execute a command on a remote host

  • ssh provides a way to securely login to a remote server and get a prompt. In addition, it enables us to remotely execute a command and capture its output
# List the files on host dutihr
$ ssh dutihr ls
  • Firewall piercing / tunneling: We can use ssh to access ports on machines where a firewall blocks them
# Forward local port 27017 to port 27017 on mongoserver
$ ssh -L 27017:mongoserver:27017 mongoserver

# On another terminal
$ mongo localhost:27017

Retrieve contents from URLs

  • curl queries a URL and prints the raw contents on the terminal
    • -H Set an HTTP header, e.g. “Authorization: token OAUTH-TOKEN”
    • -i Display all headers received
    • -s Don’t print anything except the response
curl -i "https://api.github.com/repos/vmg/redcarpet/issues"
  • We can then process contents with a pipeline
# Get all magnet links from a page
curl -s https://thepiratebay.org/browse/101 | # Get contents
tidy 2>/dev/null |                            # Tidy up HTML
grep magnet\:\? |                             # Only get links
tr -d '"'                                     # Remove quotes

Querying JSON data

  • json_pp pretty-prints JSON files
  • jq uses a Domain Specific Language (DSL) to query tree structures in JSON files.
# Extract information for a Cargo package descriptor
curl -s "https://crates.io/api/v1/crates/libc" |
jq -M '[.crate .id, .crate .repository, .crate .downloads|tostring]|join(", ")'


"libc, https://github.com/rust-lang/libc, 7267424"

Synchronizing files across hosts

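  • rsync can be used to sync files between directories, on the same host or across hosts
    • -a archive mode, preserve permissions and modification times
    • -v display files changed
    • --delete remove files from the destination that no longer exist in the source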

Run a command when a directory changes

  • inotifywait watches a directory for changes and prints a log of the changes
    • -m enables monitor mode (run forever)
    • -r watch directories recursively
## See changes to your database files
inotifywait -mr --timefmt '%d/%m/%y %H:%M' --format '%T %w %f' /mongo

## Copy all newly written files under the current directory to another location
## (--format '%w%f' prints the full path of the file: watched directory + file name)
inotifywait -mr -e close_write --format '%w%f' . |
xargs -I {} cp {} /tmp

Writing programs

  • Bash is the name of the default command interpreter on many Unix environments. Bash is an almost complete programming language, with an interesting caveat: many of its operators are programs that can be run individually.

Variables

  • Variables in bash are assigned with name=value (no spaces around =), e.g. cwd="foo", and are dereferenced with $, e.g. echo $cwd.
# Store the results of running ls in a variable
listing=$(ls -la)
# The quotes preserve the line breaks of the output
echo "$listing"
  • An interesting set of variables are called environment variables. Those are passed on to every program started from the shell and can be read by all of them. The user can set or modify them with the export command (a shell builtin).
$ export |grep PATH
$ export PATH=$PATH:/home/gousiosg/bin
$ export |grep PATH
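
  • What export changes can be shown with a child shell. In this sketch the variable name course is made up; a child process only sees it after it has been exported.

```shell
#!/usr/bin/env bash
unset course

# A plain shell variable is visible only in the current shell
course="unix"
seen_before=$(sh -c 'echo "${course:-unset}"')

# export copies it into the environment that child processes inherit
export course
seen_after=$(sh -c 'echo "${course:-unset}"')

echo "before export: $seen_before, after export: $seen_after"

# Command substitution stores a command's output; quotes keep its newlines
listing=$(printf '%s\n' one two)
echo "$listing"
```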

Conditionals

  • Bash supports if / else blocks
if [ -e 'test' ]; then
  echo "File exists"
else
  echo "File does not exist"
fi
  • [ is another name for the program test.
    • [ "$foo" = 'test' ]: Tests string equality (the quotes keep the test valid when $foo is empty or contains spaces)
    • [ "$num" -eq 3 ]: Tests number equality
    • [ ! expression ]: Negates the expression
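
  • A short sketch that combines file tests, string tests and number tests. describe_path is a helper invented for this example; it assumes that /tmp exists, as in the filesystem layout shown earlier.

```shell
#!/usr/bin/env bash
# describe_path is a helper made up for this example
describe_path() {
  if [ -d "$1" ]; then
    echo "directory"
  elif [ -e "$1" ]; then
    echo "file"
  else
    echo "missing"
  fi
}

describe_path /tmp
describe_path /no/such/path

# Quote variables so that empty values or spaces do not break the test
foo="two words"
verdict="a test failed"
if [ "$foo" = 'two words' ] && [ ! 3 -eq 4 ]; then
  verdict="both tests passed"
fi
echo "$verdict"
```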

Loops

  • The for loop iterates over all items in the list provided as argument:
# Print 1 2 3 4...
for i in `seq 1 10`; do
  echo $i
done

# Iterate over all files in a directory
# (a glob keeps file names with spaces intact, unlike parsing ls output)
for i in *; do
  file --mime "$i"
done
  • while executes a piece of code as long as the control expression is true
ls -a |
while read -r file; do
  file --mime "$file"
done
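
  • Both loop forms can be tried on scratch data. In the sketch below the file names are invented, and the while loop reads from a file through < instead of a pipe, a design choice so that the variable total is still set after the loop.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b c.txt"

# The glob expands to one word per file, even if a name contains spaces
names=""
for path in "$workdir"/*; do
  names="$names[$(basename "$path")]"
done
echo "$names"

# while read consumes its input line by line until it is exhausted
printf '%s\n' 1 2 3 > "$workdir/numbers"
total=0
while read -r n; do
  total=$((total + n))
done < "$workdir/numbers"
echo "$total"
rm -r "$workdir"
```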

Command line input

  • bash maps special variables on command line inputs: $0 is the program name, $1 is the first argument, $2 the second etc. More complex command lines (e.g. with switches) can be parsed with getopts.
#!/usr/bin/env bash

argA="defaultvalue"
# The colon after "a" tells getopts that -a takes a value, stored in OPTARG
while getopts ":a:" opt; do
  case $opt in
    a)
      echo "-a was triggered!" >&2
      argA=$OPTARG
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      ;;
  esac
done
  • The program above also illustrates the use of case
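
  • A slightly fuller sketch of option parsing with getopts, wrapped in a function so that it can be called several times. parse_args, the -v switch and the sample arguments are all made up for this illustration.

```shell
#!/usr/bin/env bash
# parse_args is a made-up function; -a takes a value, -v is a plain switch
parse_args() {
  argA="defaultvalue"
  verbose=0
  OPTIND=1                      # restart parsing on every call
  while getopts ":a:v" opt; do
    case $opt in
      a) argA=$OPTARG ;;        # the colon after "a" makes getopts fill OPTARG
      v) verbose=1 ;;
      :) echo "Option -$OPTARG needs a value" >&2; return 1 ;;
      \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
    esac
  done
  shift $((OPTIND - 1))         # what is left are the positional arguments
  echo "argA=$argA verbose=$verbose rest=$*"
}

parse_args -v -a custom file.txt
parse_args file.txt
```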

Cheatsheet (commands in no particular order)

Legend for some commands (do not include the brackets): [optional field], <mandatory field>, #comments. The dollar sign indicates that you are not running the command as the ‘root’ user (the user that can change system-level components, e.g. brightness, firewall, hardware settings…).

  • Print argument on screen
    $ echo hi
    hi
    
  • Abort command with Ctrl + C
    $ echo hi^C
    
  • Clear the terminal with Ctrl + L
  • Navigate through previous commands with the up and down arrow keys
  • Move command line pointer to the start with Ctrl + A
  • Move command line pointer to the end with Ctrl + E
    • Does not work in VSC terminal
  • Show date
    $ date
    Sat Feb 13 17:53:39 CET 2021
    
  • Time (stopwatch) a program
    $ time echo hi
    hi
    real    0m0.000s
    user    0m0.000s
    sys     0m0.000s
    
  • Print current (“working”) directory
    $ pwd
    /mnt/c/Users/sergio/OneDrive/Desktop/skirienkopanea.github.io
    
  • Print $PATH variable (which contains directories where shell checks for programs)
    $ echo $PATH #among other paths in a : separated list.
    /usr/bin/ 
    

    A path that starts with / is called an absolute path. Any other path is a relative path. Relative paths are relative to the current working directory, which we can see with the pwd command and change with the cd command. In a path, . refers to the current directory, and .. to its parent directory.

  • Print path of the program
    $ which echo
    /usr/bin/echo
    
  • Show manual of the program
    $ man <program>
    
  • Show program help
    $ <program> --help #it's up to the program to provide it though.
    
  • List all files and folders in a directory. -a for hidden and -l for more details
    $ ls [path] #if no path then it uses the working directory
    $ ls /usr/bin/
     resizecons       systemd-escape         appres           ec2metadata
     resizepart       systemd-hwdb           apropos          echo
    
  • Change directory
    $ cd [folder] #No folder sends you to home (~) directory
    
  • Go to previous directory
    $ cd -
    
  • Rename/move file
    $ mv <filepath old> <filepath new>
    
  • Copy file
    $ cp <filepath origin> <filepath destination>
    
  • Remove file
    $ rm <filepath>
    
  • Remove directory (non-recursive)
    $ rm test
    rm: cannot remove 'test': Is a directory
    $ rmdir test #only works if the directory is empty
    rmdir: failed to remove 'test': Directory not empty
    
  • Remove directory (recursive)
    $ rm -r <directory path>
    
  • Make new directory
    $ mkdir <directory path>
    
  • Direct program output to a file with >
    $ man ls > ls_documentation.txt
    
  • Print contents of a file in the terminal
    $ cat <filepath>
    
  • Define program input from a file with <, and print it on the terminal
    $ cat < ls_documentation.txt
    NAME
         ls - list directory contents
    SYNOPSIS
         ls [OPTION]... [FILE]...
    
  • Combination of < and >
    $ cat < input_file.txt > output_file.txt
    $ cat < input_file.txt > output_file.txt #overwrites
    $ cat < input_file.txt > output_file.txt #overwrites
    $ cat < output_file.txt #print output on the terminal
    hello there
    
  • Append with >>
    $ echo abc >> output_file.txt
    $ echo abc >> output_file.txt
    $ echo abc >> output_file.txt
    $ cat < output_file.txt #print output on the terminal
    hello thereabc
    abc
    abc
    
  • Print last -n<k> lines of its input (by default a file declared in the first parameter)
    $ tail -n1 output_file.txt
    abc
    
  • Print first -n<k> lines of its input (by default a file declared in the first parameter)
    $ head -n1 output_file.txt
    hello thereabc
    
  • Pipe character | takes left program output as input for right program
    $ ls --help | tail -n2 > output_file.txt
    $ cat output_file.txt
    Full documentation at: <https://www.gnu.org/software/coreutils/ls>
    or available locally via: info '(coreutils) ls invocation'
    
  • Execute command as superuser
    $ sudo <command>
    
  • Open a shell terminal as root user
    $ sudo su
    [sudo] password for sergio:
    #
    
  • Exit
    exit
    
  • Store input to a file but also keep it for other uses (by default print it on the terminal)
    $ echo hi | tee log.txt
    hi
    
  • Search for files and directories
    $ find -name '*name*'
    
  • Open file in default program
    $ xdg-open <filepath>
    
  • Update last modified time / create new file
    $ touch <filepath>
    

Sources: Gousios’ Unix Summary and The Missing Semester of Your CS Education

Pipe vs <

sergio@hp:~/cg$ cat fib.c
#include <stdio.h>

int main(void) {
  int x, y, z;

  
    x = 0;
    y = 1;
    do {
      printf("%d\n",x);

      z = x + y;
      x = y;
      y = z;
    } while (x < 255);
  
}
sergio@hp:~/cg$ ./fib | head -n5 > head.txt # Takes output of ./fib as input for program head, then program output is sent to a file (overwrites it)
sergio@hp:~/cg$ cat head.txt
0
1
1
2
3
sergio@hp:~/cg$ ./fib <  head -n5 > head.txt # Takes file "head" as input for program ./fib (program fib ignores any input anyway. Note that we no longer run head program). Then output ./fib to head.txt
sergio@hp:~/cg$ cat head.txt
0
1
1
2
3
5
8
13
21
34
55
89
144
233
sergio@hp:~/cg$ ./fib | head -n6 | tee  head.txt | wc -l # Tee stores input to a file but also keeps it for other uses (it can be passed with pipes to another program)
6
sergio@hp:~/cg$ cat head.txt
0
1
1
2
3
5
sergio@hp:~/cg$ 


  • *I have an empty file named “head” to avoid an error message in the ./fib < head example, which uses the wrong command
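
  • The same comparison can be reproduced without compiling fib.c. This sketch uses seq as a stand-in for ./fib and works in a scratch directory; the file names are illustrative.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Pipe: the left side is a program whose output feeds head
piped=$(seq 1 10 | head -n 3 | tr '\n' ' ')

# <: the right side is a file that becomes the standard input of head
seq 1 10 > numbers.txt
redirected=$(head -n 3 < numbers.txt | tr '\n' ' ')

# tee writes a copy to a file and still passes the data down the pipeline
count=$(seq 1 10 | head -n 6 | tee head.txt | wc -l | tr -d ' ')
saved=$(tr '\n' ' ' < head.txt)

echo "$piped| $redirected| $count | $saved"
cd / && rm -r "$workdir"
```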

Scripts

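#!/usr/bin/env bash
# Requires youtube-dl and ffmpeg to be installed
# Downloads the audio of every URL listed in list.txt (separated by whitespace)

videos=$(cat list.txt)

for i in $videos; do
    # URLs that fail to download are logged to errors.txt
    snap run youtube-dl --extract-audio --audio-format mp3 "$i" || echo "$i" >> errors.txt
done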

Zip all .sh files in directory

#!/usr/bin/env bash
# Takes an output name as first parameter
# Recursively finds all the .sh files under the current directory and adds them to a zip
find . -name '*.sh' | zip -r -j -@ "$1"