Whether you are preparing for a role at Qualcomm, Nvidia, MediaTek, NXP, or Texas Instruments — the Linux kernel is at the heart of every embedded Linux interview. This post covers the top 50 Linux kernel interview questions with detailed answers and real code examples, organised by topic. Bookmark this page and work through it systematically.
The Linux kernel is the core of the OS. It runs in privileged mode (Ring 0) with full access to hardware. User space programs run in Ring 3 (unprivileged) and access hardware only through system calls.
User Space:   app → libc → syscall → kernel boundary
Kernel Space: kernel → device drivers → hardware
Monolithic kernel (Linux): all OS services (file system, drivers, networking) run in kernel space. Fast but large. Microkernel (QNX, Mach): only basic IPC/scheduling in kernel, everything else in user space — safer but slower. Hybrid (Windows NT, macOS XNU): mix of both approaches.
Kernel Address Space Layout Randomisation (KASLR) randomises the base address of the kernel image at boot time. This makes it harder for attackers to exploit kernel vulnerabilities by guessing addresses.
CONFIG_RANDOMIZE_BASE=y # Enable KASLR in kernel config
Every thread has a small, fixed-size kernel stack — typically 8 KB on 32-bit and 16 KB on 64-bit systems. Overflowing it corrupts adjacent memory and usually ends in a kernel oops or panic. The kernel detects overflows with stack canaries, guard pages (CONFIG_VMAP_STACK), and KASAN (Kernel Address Sanitizer). Avoid large on-stack buffers in kernel code — allocate with kmalloc() or vmalloc() instead.

Process context: the kernel is executing on behalf of a process. It can sleep, schedule, and access user memory. Interrupt context: the kernel is handling a hardware interrupt. It cannot sleep and must not block.
// Process context — OK to sleep
mutex_lock(&my_mutex);

// Interrupt context — use spinlock only
spin_lock_irqsave(&my_lock, flags);
__init marks a function as initialisation code. After the kernel boots and calls it, the memory it occupied is freed. __exit marks cleanup code that is discarded if the module is built-in (not loadable).
static int __init my_driver_init(void) { ... }
static void __exit my_driver_exit(void) { ... }
module_init(my_driver_init);
module_exit(my_driver_exit);
Kernel threads run entirely in kernel space, have no user address space, and are created with kthread_create(). They perform background tasks like kworker, ksoftirqd. User threads run in user space and are scheduled by the kernel.
struct task_struct *t = kthread_create(my_fn, NULL, "my_kthread");
wake_up_process(t);
PREEMPT_NONE: kernel code runs to completion unless it voluntarily yields (server kernels). PREEMPT_VOLUNTARY: adds explicit preemption points for lower latency. PREEMPT: fully preemptible kernel, best for embedded real-time use (lowest latency).
An LKM is object code that can be dynamically loaded/unloaded into the running kernel without rebooting. A built-in driver is compiled directly into the kernel image and cannot be removed. LKMs use .ko extension.
insmod my_driver.ko    # Load module
rmmod my_driver        # Unload module
modinfo my_driver.ko   # Show module metadata
lsmod                  # List loaded modules
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
static int __init hello_init(void)
{
pr_info("Hello, Kernel!\n");
return 0; /* 0 = success; negative = error */
}
static void __exit hello_exit(void)
{
pr_info("Goodbye, Kernel!\n");
}
module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("EmbeddedShiksha");
MODULE_DESCRIPTION("Minimal kernel module example");
MODULE_LICENSE declares the module's licence. The kernel exports some symbols only to GPL-compatible modules (marked EXPORT_SYMBOL_GPL). Loading a module without a GPL-compatible licence taints the kernel, and if that module references GPL-only symbols, symbol resolution fails and the module will not load.
#include <linux/moduleparam.h>

static int speed = 100;
module_param(speed, int, 0644);
MODULE_PARM_DESC(speed, "Speed setting (default 100)");
# Load with parameter:
insmod my_driver.ko speed=200
# Or modify at runtime:
echo 300 > /sys/module/my_driver/parameters/speed
EXPORT_SYMBOL(func): makes the function available to all kernel modules including proprietary ones. EXPORT_SYMBOL_GPL(func): makes it available only to GPL-licensed modules. Most core kernel APIs use EXPORT_SYMBOL_GPL to enforce open-source compliance.
1. insmod or modprobe calls the finit_module() system call.
2. The kernel reads the ELF object and resolves symbols against the kernel symbol table.
3. Memory for the module is allocated with vmalloc().
4. Relocations are applied.
5. The module's init() function is called.
6. The module is added to the list of loaded modules.
1. The user app calls a glibc wrapper (e.g., read()).
2. glibc places the syscall number in rax and arguments in rdi, rsi, rdx, ....
3. It executes the syscall instruction — the CPU switches to Ring 0.
4. The kernel's entry_SYSCALL_64 handler dispatches to sys_call_table[rax].
5. The kernel function runs; its result is returned in rax.
6. The CPU returns to Ring 3.
// Simplified view of a syscall handler
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
struct fd f = fdget_pos(fd);
/* ... do the read ... */
return ret;
}
__user marks a pointer that comes from user space. The kernel must never dereference it directly — it must use copy_from_user() or copy_to_user(). Direct dereference can cause a kernel oops if the pointer is invalid, or a security vulnerability.
// WRONG — never do this:
char c = *user_ptr;
// CORRECT:
char c;
if (copy_from_user(&c, user_ptr, 1))
return -EFAULT;
copy_from_user(dst, src, n): copies n bytes from user space to kernel space; returns the number of bytes that could not be copied (0 on success). get_user(x, ptr): copies a single simple type (int, char, etc.) — more efficient for small values; returns 0 on success or -EFAULT on a bad address.
1. Define the syscall in a .c file using SYSCALL_DEFINE.
2. Add the syscall number to arch/x86/entry/syscalls/syscall_64.tbl.
3. Add the prototype to include/linux/syscalls.h.
4. Rebuild the kernel.
(Note: adding syscalls to mainline requires RFC discussion on LKML.)
SYSCALL_DEFINE1(my_syscall, int, arg)
{
pr_info("my_syscall called with %d\n", arg);
return 0;
}
kmalloc(size, GFP_KERNEL): allocates physically contiguous memory. Fast, suitable for DMA, limited to ~4 MB. vmalloc(size): allocates virtually contiguous but physically non-contiguous memory. Slower, for larger allocations. kzalloc(size, flags): same as kmalloc but zeroes the memory.
void *buf  = kmalloc(256, GFP_KERNEL);              /* DMA-safe */
void *big  = vmalloc(1024 * 1024);                  /* large buffer */
void *zero = kzalloc(sizeof(struct x), GFP_KERNEL); /* zeroed */

/* Always free with the matching function */
kfree(buf);
vfree(big);
GFP_KERNEL: normal allocation, can sleep — use in process context only. GFP_ATOMIC: non-sleeping allocation — safe in interrupt context or with spinlocks held. GFP_DMA: allocates from the DMA zone (below 16 MB on x86) for legacy DMA devices.
The slab allocator manages caches of frequently-allocated kernel objects (e.g., task_struct, file, inodes). It avoids memory fragmentation and reduces alloc/free overhead. Modern kernels use SLUB (default) or SLAB.
/* Create a custom slab cache */
struct kmem_cache *my_cache;
my_cache = kmem_cache_create("my_obj", sizeof(struct my_obj), 0, 0, NULL);
struct my_obj *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
kmem_cache_free(my_cache, obj);
On 32-bit systems, the kernel address space is only 1 GB (of the 4 GB virtual space). Memory below 896 MB is permanently mapped (low memory) and directly accessible. Memory above 896 MB is high memory and must be temporarily mapped using kmap(). On 64-bit systems this distinction disappears, because the kernel's much larger virtual address space can direct-map all physical memory.
A page fault occurs when a process accesses a virtual address that is not currently mapped to a physical page. The CPU raises a fault exception and the kernel's do_page_fault() handler runs:
1. If the address is valid and the page is swapped out → swap it in.
2. If the address is valid but the page is not yet allocated → allocate it (demand paging).
3. If the address is invalid → send SIGSEGV to the process.
The Out-Of-Memory (OOM) killer is invoked when the kernel cannot satisfy a memory allocation even after swapping. It selects a process to kill based on an oom_score (higher = more likely to be killed). You can tune it via:
# Adjust a process's OOM score (higher = more likely killed)
echo 500 > /proc/<pid>/oom_score_adj    # range: -1000 to 1000
echo -1000 > /proc/<pid>/oom_score_adj  # protect from OOM killer
MMIO maps device registers into the CPU's physical address space. The kernel uses ioremap() to create a virtual mapping for the physical address, then accesses it with ioread32()/iowrite32().
void __iomem *base = ioremap(DEVICE_BASE_ADDR, 0x1000);
u32 val = ioread32(base + REG_STATUS);
iowrite32(0x1, base + REG_CTRL);
iounmap(base); /* Always unmap when done */
task_struct is the kernel's process descriptor — one exists for every process and thread. Key fields:
struct task_struct {
volatile long state; /* TASK_RUNNING, TASK_INTERRUPTIBLE... */
pid_t pid; /* Process ID */
pid_t tgid; /* Thread group ID (= pid for main thread) */
struct mm_struct *mm; /* Memory descriptor (NULL for kthreads) */
struct files_struct *files; /* Open file descriptors */
struct task_struct *parent; /* Parent process */
int prio; /* Scheduling priority */
/* ... many more fields ... */
};
fork(): creates a child process with a copy-on-write copy of the parent's address space. vfork(): child shares parent's address space and the parent is suspended until child calls exec() or exits — faster but dangerous. clone(): the lowest-level call, lets you control exactly what is shared (memory, file descriptors, signal handlers) — used to implement threads (pthreads calls clone with CLONE_THREAD).
CFS is the default Linux scheduler for normal tasks. It uses a red-black tree ordered by vruntime (virtual runtime). The task with the smallest vruntime runs next. CFS gives each task a fair share of CPU time proportional to its weight (nice value).
# Check scheduling info of a process
cat /proc/<pid>/sched
# Change nice value (higher = lower priority)
nice -n 10 ./my_program
renice -n 5 -p <pid>
Linux supports two real-time policies: SCHED_FIFO: runs until it blocks or yields, no time slicing. SCHED_RR: round-robin with a time quantum among equal-priority RT tasks. RT tasks have priorities 1–99 (higher number = higher priority) and always preempt normal (CFS) tasks.
struct sched_param param = { .sched_priority = 50 };
sched_setscheduler(0, SCHED_FIFO, &param);
Priority inversion occurs when a high-priority task is blocked waiting for a resource held by a low-priority task, while a medium-priority task preempts the low-priority one — effectively the high-priority task is blocked by the medium one. Solution: Priority Inheritance (PI mutexes in Linux). The low-priority task temporarily inherits the high-priority task's priority while holding the mutex.
/* Use PI mutex for real-time scenarios */
#include <linux/rtmutex.h>

struct rt_mutex my_mutex;
rt_mutex_init(&my_mutex);
rt_mutex_lock(&my_mutex);
rt_mutex_unlock(&my_mutex);
In a non-preemptive kernel, once a process enters kernel space it runs until it voluntarily gives up the CPU. In a preemptive kernel (CONFIG_PREEMPT), the scheduler can preempt kernel code at almost any point (except within spinlocks or when preemption is explicitly disabled). Preemptive kernels have lower latency, critical for embedded real-time use.
Mutex: sleeping lock — the waiting task is put to sleep, freeing the CPU. Can only be used in process context. Spinlock: busy-wait lock — the task spins in a loop until the lock is free. No context switch overhead. Must be used in interrupt context or when sleeping is not allowed.
/* Spinlock */
spinlock_t lock;
spin_lock_init(&lock);
spin_lock(&lock);
/* critical section */
spin_unlock(&lock);

/* Mutex */
struct mutex m;
mutex_init(&m);
mutex_lock(&m);
/* critical section — can sleep */
mutex_unlock(&m);
spin_lock_irqsave(&lock, flags) disables interrupts on the local CPU before acquiring the spinlock, and saves the interrupt state in flags. Use it when the same spinlock is acquired in both process context AND interrupt context — otherwise a deadlock occurs (interrupt fires while lock is held, tries to acquire same lock).
unsigned long flags;
spin_lock_irqsave(&my_lock, flags);
/* safe from both process and interrupt context */
spin_unlock_irqrestore(&my_lock, flags);
RCU is a synchronization mechanism optimised for read-heavy workloads. Readers take no lock at all (just rcu_read_lock() which only disables preemption). Writers make a copy, modify it, then atomically replace the pointer. Old copies are freed after all readers complete (grace period).
rcu_read_lock();
struct my_data *p = rcu_dereference(global_ptr);
/* use p safely — no lock overhead */
rcu_read_unlock();
A semaphore has a count — it can allow multiple concurrent accessors. A mutex is a binary semaphore (count=1) with ownership (only the locker can unlock it). Use semaphores to control access to a pool of N resources. For mutual exclusion, always prefer mutex (it enforces ownership and enables priority inheritance).
struct semaphore sem;
sema_init(&sem, 3); /* Allow 3 concurrent users */

down(&sem);         /* Acquire (decrements count) */
/* ... use resource ... */
up(&sem);           /* Release (increments count) */
Atomic operations are guaranteed to complete without interruption. The kernel provides atomic_t for single integer operations that don't need a lock.
atomic_t counter = ATOMIC_INIT(0);

atomic_inc(&counter);            /* counter++ atomically */
atomic_dec(&counter);            /* counter-- atomically */
int val = atomic_read(&counter);
atomic_set(&counter, 10);

/* Exchange — returns the old value */
int old = atomic_xchg(&counter, 5);
A completion is a lightweight synchronization primitive to signal that a specific event has occurred — simpler than using a mutex+flag combination. Common use: wait for a kernel thread or DMA transfer to finish.
struct completion my_comp;
init_completion(&my_comp);

/* Thread A — waits */
wait_for_completion(&my_comp);

/* Thread B — signals when done */
complete(&my_comp);
A wait queue allows a process to sleep until a condition becomes true. When the condition changes, sleeping processes are woken up. Used extensively in device drivers to wait for data.
DECLARE_WAIT_QUEUE_HEAD(my_wq);
int data_ready = 0;

/* Reader — sleep until data is ready */
wait_event_interruptible(my_wq, data_ready != 0);

/* Writer — wake up readers */
data_ready = 1;
wake_up_interruptible(&my_wq);
When a hardware interrupt fires:
1. The CPU saves context and switches to the interrupt stack.
2. It calls the handler via the interrupt descriptor table (IDT).
3. The kernel identifies the IRQ and calls the registered ISR.
4. The ISR does minimal work (acknowledges hardware, queues data).
5. The CPU restores context and resumes the interrupted code.
The ISR must be fast — no sleeping, no blocking.
/* Register interrupt handler */
ret = request_irq(irq_num, my_isr, IRQF_SHARED, "my_driver", dev);
/* ISR */
static irqreturn_t my_isr(int irq, void *dev_id)
{
/* Acknowledge hardware, read status */
/* Schedule bottom half for deferred work */
tasklet_schedule(&my_tasklet);
return IRQ_HANDLED;
}
Top half: the ISR itself — runs immediately, must be fast, cannot sleep, acknowledges the interrupt. Bottom half: deferred work scheduled by the top half — runs later when safe, can do heavier processing. Mechanisms: Softirqs (static, very fast), Tasklets (built on softirqs, serialised), Workqueues (run in process context, can sleep).
Tasklet: runs in softirq context (interrupt context) — cannot sleep, runs on the CPU it was scheduled on, serialised (same tasklet never runs concurrently). Workqueue: runs in process context via a kernel thread — can sleep, can use mutexes, suitable for heavy processing.
/* Tasklet */
DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);
static void my_tasklet_fn(unsigned long data) { /* no sleeping */ }
/* Workqueue */
INIT_WORK(&my_work, my_work_fn);
schedule_work(&my_work);
static void my_work_fn(struct work_struct *w) { /* can sleep */ }
IRQ affinity controls which CPU(s) handle a specific interrupt. Binding a high-frequency interrupt to a dedicated CPU improves cache efficiency and reduces latency.
# Show current IRQ affinity (CPU bitmask)
cat /proc/irq/<irq_num>/smp_affinity
# Bind IRQ 45 to CPU 2 (bitmask: 0x4 = CPU2)
echo 4 > /proc/irq/45/smp_affinity
request_irq(): the handler runs in hard interrupt context — cannot sleep. request_threaded_irq(): splits into a fast primary handler (top half) and a thread handler that runs in a kernel thread (can sleep). Ideal for complex interrupt processing that needs to sleep or acquire mutexes.
request_threaded_irq(irq, primary_handler, thread_handler,
IRQF_ONESHOT, "my_dev", dev);
The VFS is an abstraction layer that provides a uniform interface for all file systems (ext4, FAT, NFS, procfs, sysfs). User applications call standard POSIX functions (open, read, write) without knowing the underlying file system. VFS translates these into file-system-specific operations via function pointers in inode_operations and file_operations structures.
struct file_operations my_fops = {
.owner = THIS_MODULE,
.open = my_open,
.read = my_read,
.write = my_write,
.release = my_release,
};
Character device: transfers data byte-by-byte or in a stream (serial port, keyboard, custom sensors). Accessed via major/minor numbers under /dev. No buffering. Block device: transfers data in fixed-size blocks (512B or 4KB) with kernel buffering and caching. Examples: hard disks, SSDs, eMMC. Block devices have a request queue and I/O scheduler.
Character devices implement file_operations directly, while block devices use the block_device_operations struct and the generic block layer.

procfs (/proc): originally for process information, now also used for kernel tunables (/proc/sys). Unstructured — each file can have arbitrary content. sysfs (/sys): structured, one value per file. Used by drivers to expose attributes to user space. Each device in the driver model has a sysfs entry.
/* Create a sysfs attribute */
static ssize_t speed_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
return sprintf(buf, "%d\n", my_speed);
}
static DEVICE_ATTR_RO(speed);
The page cache stores recently-read disk blocks in RAM. When a process reads a file, the kernel first checks the page cache. On a hit, data is served from RAM (microseconds) instead of disk (milliseconds). Writes are also cached (write-back) — fsync() forces a flush to disk. The page cache uses LRU eviction when memory is low.
# Check page cache usage
free -h             # "buff/cache" column
cat /proc/meminfo   # Cached: line
# Drop page cache (for benchmarking)
echo 3 > /proc/sys/vm/drop_caches
A kernel oops prints a call stack and register dump to the console. Steps to debug:
# 1. Enable kernel symbols in config
CONFIG_KALLSYMS=y
CONFIG_DEBUG_INFO=y
# 2. Decode an oops address to a function name
addr2line -e vmlinux <address>
# 3. Use decode_stacktrace.sh from the kernel tools
./scripts/decode_stacktrace.sh vmlinux < oops.log
# 4. Enable KASAN for memory bugs
CONFIG_KASAN=y
Also enable CONFIG_FRAME_POINTER=y during development — it gives accurate stack traces in oops dumps.

Kernel Address Sanitizer (KASAN) is a dynamic memory error detector. It detects out-of-bounds accesses (heap, stack, globals), use-after-free bugs, and use-after-return bugs. It uses shadow memory (1 byte of shadow per 8 bytes of kernel memory) to track validity.
# Enable in kernel config:
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y   # or CONFIG_KASAN_SW_TAGS

# KASAN report example:
# BUG: KASAN: use-after-free in my_driver_read+0x34/0x80
# Read of size 4 at addr ffff888... by task/1234
ftrace is the kernel's built-in tracing framework. It can trace function calls, latency, IRQ timing, and scheduling events with near-zero overhead.
# Enable function tracer
cd /sys/kernel/debug/tracing
echo function > current_tracer
echo my_driver_read > set_ftrace_filter
echo 1 > tracing_on
# Run your workload...
cat trace | head -50
echo 0 > tracing_on

# Trace specific events
echo 1 > events/irq/irq_handler_entry/enable
cat trace_pipe   # live stream
Join EmbeddedShiksha's Embedded System Interview Prep Bootcamp — covers C, DSA, RTOS, Linux Kernel & Drivers, 1 Mock Interview, Resume Prep & Placement Support.
Q: How many rounds does a typical Linux kernel interview have?
Usually 3–4 rounds: one online coding test, one deep technical round on kernel internals, one design/system round, and one HR round.
Q: Which companies ask Linux kernel questions?
Qualcomm, Nvidia, MediaTek, NXP, Texas Instruments, VoloCars, Mirafra, and any company building embedded Linux products.
Q: What is the best way to practice Linux kernel programming?
Set up a QEMU virtual machine, build your own kernel, write character drivers, and experiment with modules. Hands-on practice is irreplaceable.
Q: Do I need to know the kernel source code?
You don't need to memorise it, but you should be comfortable navigating it on lxr.kernel.org and understanding key subsystem structures.