Basics of Linux OS

These are things I found confusing or abstract at school but that make a lot of sense now, after many years of working with different flavours of OS.

GPL and BSD as software license types

The main difference is that BSD (Berkeley Software Distribution) is a permissive (non-protective) license, while GPL (GNU General Public License) is a copyleft (protective) license.

Permissive licenses do not protect the code from being used in closed-source applications and place no restrictions on derivative works. Copyleft licenses, by contrast, require anyone who distributes the software or a derivative of it to release the modified source code under the same license. Under the GPL you can’t sub-license, meaning you can’t change any of the original license terms or add restrictions of your own. You’re also required to make modified files carry prominent notices stating that you changed them and when. That is why components licensed under the GPL and other copyleft licenses should be avoided in commercial products that will later be distributed under proprietary licenses.

The BSD license family (including the Modified BSD License), on the other hand, doesn’t compel you to do any of the above. Its redistribution terms are fairly relaxed: mainly, you must keep the original copyright notice and disclaimer.

GNU/Linux and BSD as operating systems

Unix – the name of the original operating system designed at AT&T’s Bell Labs in the 1970s. At the time, it featured many novel capabilities such as multitasking, multi-user support and time sharing. It became portable once it was rewritten in the C language.

Linux Kernel – a free, open-source, monolithic, Unix-like OS kernel. It was conceived and created in 1991 by Linus Torvalds for his personal computer. Strictly speaking, the word Linux refers only to the kernel. By itself, the kernel gives users nothing to interact with (e.g. no applications, no commands).

GNU/Linux – the GNU project has developed a comprehensive set of free software tools (compilers, the C library, a shell, core utilities) for use with Unix and Linux. GNU/Linux refers to these tools combined with the Linux kernel.

GNU/Linux distribution – a ready-to-use, complete OS that bundles the Linux kernel, the GNU libraries and tools, usually the X Window System and a desktop environment (e.g. KDE, GNOME), and many other pieces of software. A GNU/Linux distro is what many people refer to as “Linux”; examples include Debian, Ubuntu, Red Hat and CentOS. Some distributions go further and commit to following the GNU Free System Distribution Guidelines (GNU FSDG).

BSD (Berkeley Software Distribution) – an OS derived from Research Unix (which was originally developed at Bell Labs), developed at the University of California, Berkeley, and eventually grown into a complete operating system. Today, “BSD” often refers to its descendants, such as FreeBSD, OpenBSD, NetBSD or DragonFly BSD. Each of these projects develops both a kernel and a complete operating system. Another famous BSD descendant is Darwin, which Mac OS X (now macOS) is based on.

Comparing “BSD” with “Linux” – Linux is more popular and tends to support new hardware sooner. Typical users usually don’t notice much difference between them. As a desktop OS, FreeBSD can run the same GNOME, KDE or Xfce desktop environments that many Linux distributions use, although you need to install the desktop environment yourself. Another important difference is the licensing model discussed above: the Linux kernel is licensed under the GPL, while the BSDs use BSD licenses. This article is a good reference on all the differences between “BSD” and “Linux”.

Swap, Cache and Buffer

Swap – a swap file or swap partition. Its primary function is to substitute disk space for RAM when real RAM fills up and more space is needed. The kernel’s memory management code detects blocks of memory, called pages, whose contents have not been used recently. It swaps enough of these relatively infrequently used pages out to an area on disk specifically designated for “paging”, or swapping. This frees up RAM and makes room for more data, for example the spreadsheet you are editing. The pages swapped out to disk are tracked by the kernel’s memory management code and can be paged back into RAM when they are needed.

  • Swapping moves an entire process between main memory and secondary storage; this is the original Unix method and can cause severe performance loss;
  • Paging moves small units of memory (pages, typically 4 KiB each). It is more efficient and was added in BSD.

In both cases, the least recently used memory is moved to secondary storage and brought back into main memory only when it is needed again. In Linux, the term swapping is used to refer to paging; the older Unix-style swapping of entire processes is not supported.
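The least-recently-used policy described above can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions: a fixed number of RAM frames, a plain dict standing in for swap space, and exact LRU eviction. The class ToyPager is hypothetical; the real Linux kernel tracks pages with its own approximation of LRU, not with anything like this code.

```python
from collections import OrderedDict


class ToyPager:
    """Hypothetical toy model: a few pages fit in RAM, the rest live in 'swap'."""

    def __init__(self, ram_frames: int) -> None:
        self.ram_frames = ram_frames
        self.ram = OrderedDict()  # page number -> contents, least recently used first
        self.swap = {}  # pages that have been swapped out
        self.swap_ins = 0
        self.swap_outs = 0

    def access(self, page: int, default: str = "") -> str:
        """Touch a page, paging it in from swap (and evicting the LRU page) if needed."""
        if page in self.ram:
            self.ram.move_to_end(page)  # mark as most recently used
            return self.ram[page]
        if page in self.swap:
            contents = self.swap.pop(page)  # page fault: bring it back from swap
            self.swap_ins += 1
        else:
            contents = default  # first touch: a fresh page
        if len(self.ram) >= self.ram_frames:
            victim, victim_contents = self.ram.popitem(last=False)  # evict the LRU page
            self.swap[victim] = victim_contents
            self.swap_outs += 1
        self.ram[page] = contents
        return contents


pager = ToyPager(ram_frames=2)
for p in [1, 2, 1, 3, 2]:
    pager.access(p, default=f"data-{p}")
print(list(pager.ram), sorted(pager.swap), pager.swap_ins, pager.swap_outs)
# → [3, 2] [1] 1 2
```

Page 2 is evicted when page 3 arrives (page 1 was touched more recently), and touching page 2 again pages it back in while pushing page 1 out to swap.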

CPU Cache – a hardware cache used by the CPU to reduce the average time to access data from main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have several independent caches, including instruction and data caches, where the data cache is usually organized as a hierarchy of cache levels (L1, L2, L3, L4, etc.).
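To see why a small cache helps, here is a toy direct-mapped cache in Python that only tracks which memory block each cache line currently holds. The class name, the 4-line size and the 64-byte line size are assumptions for illustration; real CPU caches are usually set-associative and multi-level. Walking memory sequentially misses once per 64-byte line and then hits for the rest of that line.

```python
class DirectMappedCache:
    """Hypothetical toy model of a direct-mapped cache (tags only, no data)."""

    def __init__(self, num_lines: int, line_size: int) -> None:
        self.num_lines = num_lines
        self.line_size = line_size  # bytes per cache line
        self.tags = [None] * num_lines  # tag held by each line, None = empty
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the line containing `address`."""
        block = address // self.line_size  # which memory block the byte belongs to
        index = block % self.num_lines  # the only cache line this block may occupy
        tag = block // self.num_lines  # identifies which block sits in that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # fetch from main memory (the slow path)
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4, line_size=64)
for addr in range(0, 256, 8):  # walk 256 bytes sequentially, 8 bytes at a time
    cache.access(addr)
print(cache.hits, cache.misses)
# → 28 4
```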

Page Cache (or disk cache) – a cache of disk-backed pages kept by the OS kernel in the computer’s main memory. The kernel keeps pages cached in otherwise unused portions of main memory, which gives quicker access to the contents of cached pages and improves overall performance.
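On Linux you can see how much memory the page cache is currently using, and how much swap is in use, in /proc/meminfo (the Cached, SwapTotal and SwapFree fields). The sketch below parses that file; the function name parse_meminfo is hypothetical. The Buffers field in the same file counts kernel buffers for block devices held in RAM, not the drive’s embedded disk buffer.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # "Cached:  1234 kB" -> 1234
    return info


meminfo_path = Path("/proc/meminfo")
if meminfo_path.exists():  # only present on Linux
    info = parse_meminfo(meminfo_path.read_text())
    print("Page cache (Cached):", info.get("Cached"), "kB")
    print("Swap in use:", info.get("SwapTotal", 0) - info.get("SwapFree", 0), "kB")
```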

Disk Buffer (ambiguously also called disk cache or cache buffer) – the embedded memory in a hard disk drive, acting as a buffer between the rest of the computer and the physical platters used for storage. Modern hard disk drives come with 8 to 256 MiB of such memory. The disk buffer is physically distinct from, and used differently from, the page cache: it is controlled by the drive’s own microcontroller rather than by the OS.

GRUB (GNU GRand Unified Bootloader) – a boot loader package from the GNU project, used predominantly on Unix-like systems. The current version is GRUB 2.

System call – In computing, a system call is the programmatic way in which a computer program requests a service from the kernel of the operating system it is executed on. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system. Below is a list of key system calls:

System call   Description
read()        read bytes from a file descriptor
write()       write bytes to a file descriptor
open()        open a file
close()       close a file descriptor
fork()        create a new process
exec()        execute a new program in the current process
connect()     connect to a network host
accept()      accept a network connection
stat()        fetch file status (metadata)
ioctl()       set I/O properties, or other miscellaneous functions
mmap()        map a file into the process's memory address space
brk()         move the program break (the end of the heap)

strace is a tool for tracing the system calls and signals of a process in Linux.
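As a small illustration, Python’s os module exposes thin wrappers around several of the calls in the table above (os.open, os.write, os.read, os.close, os.fstat), and the mmap module wraps mmap(). The file name demo.txt is an arbitrary assumption. If you save the snippet as demo.py and run it with strace -e trace=openat,read,write,close,mmap python3 demo.py, you should find these calls among the many made by the interpreter itself; with modern glibc, open() shows up as openat().

```python
import mmap
import os
import tempfile

# os.open/write/read/close/fstat are thin wrappers over the system calls of the same names.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # open()
os.write(fd, b"hello, kernel\n")  # write()
os.close(fd)  # close()

fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 100)  # read(): at most 100 bytes
size = os.fstat(fd).st_size  # fstat(), the file-descriptor variant of stat()
with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:  # mmap() the whole file
    first_word = mapped[:5]
os.close(fd)

print(data, size, first_word)
# → b'hello, kernel\n' 14 b'hello'
```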

SysVinit, runit and systemd

init – in Unix-based operating systems, init is the first process started during booting. It is a daemon process that continues running until the system is shut down. Init is started by the kernel during the booting process; a kernel panic will occur if the kernel is unable to start it. It is typically assigned process identifier (PID) 1, and its job is to start the other programs that are essential to the operation of your system. Init is the direct or indirect ancestor of all other processes and automatically adopts all orphaned processes.

init systems
Linux has several init systems to choose from, for example SysVinit, runit, systemd and Upstart. Here is a comparison of commands involved in managing each.

SysV init – system initialization is handled by the init daemon, and one of the original init daemons is SysVinit, a collection of System V-style init programs. It starts services serially: its startup scripts run once during boot, and each task starts only after the previous one has started successfully and been loaded into memory. This often resulted in long boot times.

Runit – an init scheme for Unix-like operating systems that initializes, supervises and ends processes throughout the operating system. It is a replacement for SysVinit, designed for brevity and simplicity.

Systemd – an init replacement daemon designed to start processes in parallel, adopted by a number of standard distributions such as Fedora, openSUSE, Arch, RHEL and CentOS. Its flexibility comes at the cost of more complexity. It is an event-driven init system that not only starts services at boot (managing the dependencies between them) but keeps working after boot, tracking things such as mounts, service availability and resource management. Because of that, systemd is also good at logging and monitoring. Systemd allows services to start when:

  • the system boots
  • a hardware component is attached to the system
  • another service has started
  • a timer fires
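The difference between SysVinit’s serial start-up and systemd’s parallel start-up can be shown with a toy model. The service names, durations and dependencies below are made up, and real systemd scheduling (for example, socket activation) is far more involved. Serially, start-up takes the sum of all durations; in parallel, each service starts as soon as its dependencies are ready, so start-up takes only as long as the slowest dependency chain.

```python
from functools import lru_cache

# Hypothetical services: name -> (start-up time in seconds, dependencies); assumed acyclic.
SERVICES = {
    "udev": (1.0, []),
    "network": (3.0, ["udev"]),
    "syslog": (0.5, ["udev"]),
    "sshd": (0.5, ["network"]),
    "database": (4.0, ["syslog"]),
}


def serial_boot_time(services: dict) -> float:
    """SysVinit-style: one service at a time, so the durations simply add up."""
    return sum(duration for duration, _ in services.values())


def parallel_boot_time(services: dict) -> float:
    """systemd-style: start each service as soon as all its dependencies are ready."""

    @lru_cache(maxsize=None)
    def ready_at(name: str) -> float:
        duration, deps = services[name]
        start = max((ready_at(d) for d in deps), default=0.0)  # wait for the slowest dependency
        return start + duration

    return max(ready_at(name) for name in services)  # length of the critical path


print(serial_boot_time(SERVICES), parallel_boot_time(SERVICES))
# → 9.0 5.5
```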

To determine which init system your current Linux distribution is using (SysVinit or systemd), simply check the process with ID 1:

$ ps -p 1
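Roughly the same information is available directly from /proc on Linux: /proc/1/comm holds the command name of PID 1, typically systemd or init. Below is a minimal Python sketch with a hypothetical helper, init_process_name. Inside a container, PID 1 is usually the container’s entry process rather than an init system.

```python
from pathlib import Path


def init_process_name(proc_root: str = "/proc") -> str:
    """Return the command name of PID 1, e.g. 'systemd' or 'init' (Linux only)."""
    return (Path(proc_root) / "1" / "comm").read_text().strip()


if Path("/proc/1/comm").exists():
    print("PID 1 is:", init_process_name())
```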

Soft link and hard link

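inode – a data structure that stores the attributes (owner, permissions, timestamps, etc.) and disk block locations of a file or directory. It does not store the file name; names live in directory entries that point to inodes.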

Soft/Symbolic link – essentially a shortcut to another file. The link itself is a separate file whose content is the path of the destination file or directory, so the symbolic link has its own inode, different from the destination’s. Deleting the destination file leaves the symbolic link “dangling”. On Linux, a symbolic link’s own permission bits are not used; access is checked against the destination’s permissions.

Hard link – essentially an alias of a file. The link itself is not a separate file, and the destination can only be a file, not a directory (on the same filesystem). The link shares the inode of the file itself, so there is no distinction between the destination file and the link: both names are equal. If you delete the original name, the file remains accessible through the link; its data is freed only when the number of hard links to it drops to zero.

A soft link points to a file by name (path), whereas a hard link points to it by inode number.
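The behaviour described above can be checked with a short Python script on a Unix-like system. The file names are arbitrary; os.link, os.symlink, os.stat and os.lstat are thin wrappers over the corresponding system calls. The script shows that the hard link shares the original’s inode, that the symbolic link has its own inode, and what happens to each after the original name is deleted.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "original.txt")
hard = os.path.join(workdir, "hard.txt")
soft = os.path.join(workdir, "soft.txt")

with open(target, "w") as f:
    f.write("same data\n")

os.link(target, hard)  # hard link: a second name for the same inode
os.symlink(target, soft)  # soft link: a new file that stores the path of target

same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
link_has_own_inode = os.lstat(soft).st_ino != os.stat(target).st_ino  # lstat() does not follow links
nlink_before = os.stat(target).st_nlink  # link count: original name + hard link

os.remove(target)  # delete the original name
with open(hard) as f:
    hard_still_works = f.read() == "same data\n"
soft_is_dangling = os.path.islink(soft) and not os.path.exists(soft)

print(same_inode, link_has_own_inode, nlink_before, os.stat(hard).st_nlink,
      hard_still_works, soft_is_dangling)
# → True True 2 1 True True
```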