Overview of ELF File Loading and Execution on Linux


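1. The ELF format

ELF specifies where a process's text, bss, data and other segments are placed in its virtual address space, and records which dynamic libraries the process needs.

2. Rough flow of sys_execve

 1) Open the ELF binary and read the ELF header.
 2) Discard the mm-related state inherited from the parent process.
 3) Following the ELF header, map the interpreter, text, data and other segments into memory (which also shows that Linux does not support compressed binaries), set up the stack, and update the mm.
 4) "Forge" this process's kernel stack so that it is ready to return to user mode. The ip saved on the kernel stack points at the interpreter's entry.
 5) The sys_execve system call returns to user mode and the interpreter starts running (the interpreter is usually the dynamic linker).

What does the interpreter do once it is running in user mode?

 6) The interpreter loads the dynamic libraries for the user process and performs all relocation and mapping work.
 7) The interpreter hands control to the program's entry point, from which execution reaches main.

A few questions here deserve a closer look:

 1> What does the kernel stack look like when sys_execve is called? How are user-mode arguments passed into the kernel? Only once this is understood is it clear how the kernel can return to the interpreter's entry point.
    A: See the material on Linux system calls. Linux handles system call arguments in one consistent way that is well worth borrowing; a separate article will go over its design.

 2> Where do the interpreter's arguments come from? How does the interpreter get to main?
    A: This is puzzling when seen as ordinary C function calls. Seen from assembly, a call stack can be adjusted and "forged" dynamically and on purpose, which makes it easy to switch between functions and pass arguments. The kernel builds the argument stack the interpreter needs, and the interpreter builds the argument stack main needs. The user stack is built in setup_arg_pages.

 3> How does the kernel make sure each segment is mapped at the expected location?
    A: Pass MAP_FIXED in the flags argument of mmap.

Appendix: annotated notes

/*
 * Replace the mm of the current task with the mm passed in. Called from
 * int flush_old_exec(struct linux_binprm * bprm).
 * The old mm is dropped.
 */
static int exec_mmap(struct mm_struct *mm)
{
	struct task_struct *tsk;
	struct mm_struct * old_mm, *active_mm;

	/* Notify parent that we're no longer interested in the old VM */
	tsk = current;
	old_mm = current->mm;
	/* Let go of the current process's old mm (growing old is a scary thing!) */
	mm_release(tsk, old_mm);

	if (old_mm) {
		/* If the old mm is in use by a core dump, we cannot go on */
		/*
		 * Make sure that if there is a core dump in progress
		 * for the old mm, we get out and die instead of going
		 * through with the exec.  We must hold mmap_sem around
		 * checking core_state and changing tsk->mm.
		 */
		down_read(&old_mm->mmap_sem);
		if (unlikely(old_mm->core_state)) {
			up_read(&old_mm->mmap_sem);
			return -EINTR;
		}
	}
	/* The old mm is out of the picture; welcome the new bride */
	task_lock(tsk);
	/* If the current thread is a kernel thread, active_mm is the one in use */
	active_mm = tsk->active_mm;
	/* The new mm moves in */
	tsk->mm = mm;
	tsk->active_mm = mm;
	/* From the next day on, the new bride runs the household */
	activate_mm(active_mm, mm);
	task_unlock(tsk);
	/* Sets several function pointers in mm. What for?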
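	 */
	arch_pick_mmap_layout(mm);
	if (old_mm) {
		/* If old_mm still has not gone away by now,
		 * it is because its sister active_mm is backing it up
		 */
		up_read(&old_mm->mmap_sem);
		BUG_ON(active_mm != old_mm);
		/* If someone else outside still wants the old mm, hand it over as a favor */
		mm_update_next_owner(old_mm);
		/* Remove the old mm from our own address book */
		mmput(old_mm);
		return 0;
	}
	/* Get rid of the old active_mm for good. Is this there for multithreading? */
	mmdrop(active_mm);
	return 0;
}

/* Map the ELF file into the current process's virtual memory
 * General idea:
 *
 **/

/* Prerequisites

Complete Reference on ELF format:

1. To read the code below, it helps to know the layout of the ELF header:

typedef struct elf32_hdr {
	unsigned char	e_ident[EI_NIDENT];	/* Magic number */
	Elf32_Half	e_type;		/* ET_EXEC or ET_DYN: executable image or shared library */
	Elf32_Half	e_machine;	/* Target CPU type */
	Elf32_Word	e_version;	/* */
	Elf32_Addr	e_entry;	/* Entry point, usually the start of _start() */
	Elf32_Off	e_phoff;	/* Start of the "Program Header" array */
	Elf32_Off	e_shoff;	/* Start of the "Section Header" array, which marks out the "code section", "data section", etc. */
	Elf32_Word	e_flags;
	Elf32_Half	e_ehsize;	/* Size of the image header itself */
	Elf32_Half	e_phentsize;	/* Size of one "Program Header" array element */
	Elf32_Half	e_phnum;	/* Number of "Program Header" array elements */
	Elf32_Half	e_shentsize;	/* Size of one "Section Header" array element */
	Elf32_Half	e_shnum;	/* Number of "Section Header" array elements */
	Elf32_Half	e_shstrndx;
} Elf32_Ehdr;

2. What does each program header contain?

typedef struct elf32_phdr {
	Elf32_Word	p_type;		/* Segment type; in particular, PT_LOAD marks a loadable segment */
	Elf32_Off	p_offset;	/* Offset of the segment from byte 0 of the file */
	Elf32_Addr	p_vaddr;	/* Start address the segment occupies in the process address space once loaded */
	Elf32_Addr	p_paddr;	/* Ignored on operating systems that support paging */
	Elf32_Word	p_filesz;	/* Size in bytes of the segment in the file. A segment may be absent from the file yet occupy memory; this field is then 0 */
	Elf32_Word	p_memsz;	/* Size in bytes of the segment in memory. A segment may exist only in the file and never be loaded; this field is then 0 */
	Elf32_Word	p_flags;
	Elf32_Word	p_align;	/* Alignment */
} Elf32_Phdr;

3. What does each section header contain?

The section table is the ELF file as seen from the linking point of view. From that angle the file is divided into many sections, each holding data for a different purpose; the same data may be referenced again by the program headers described above.

typedef struct elf64_shdr {
	Elf64_Word	sh_name;	/* Section name, index in string tbl */
	Elf64_Word	sh_type;	/* Type of section */
	Elf64_Xword	sh_flags;	/* Miscellaneous section attributes */
	Elf64_Addr	sh_addr;	/* Section virtual addr at execution */
	Elf64_Off	sh_offset;	/* Section file offset */
	Elf64_Xword	sh_size;	/* Size of section in bytes */
	Elf64_Word	sh_link;	/* Index of another section */
	Elf64_Word	sh_info;	/* Additional section information */
	Elf64_Xword	sh_addralign;	/* Section alignment */
	Elf64_Xword	sh_entsize;	/* Entry size if section holds table */
} Elf64_Shdr;

4. How do program headers and section headers differ?

The linker and the loader look at an ELF file in completely different ways. The linker sees a set of logical sections described by the section header table (that is, it ignores the program header table). The loader sees a set of segments described by the program header table (and ignores the section header table). Segments are the division made from the image-loading point of view; sections are the division made from the linking/startup point of view.

Taking Wine as an example:

Section to Segment mapping:
 Segment Sections...
  00
  01     .interp
  02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
  03     .data .dynamic .ctors .dtors .jcr .got .bss
  04     .dynamic
  05     .note.ABI-tag

5. How is each segment guaranteed to be mapped at the expected virtual address?

The flags argument of mmap accepts MAP_FIXED. With this flag set, the mapping is placed at exactly the requested address; if that is not possible, the call returns an error.

6. Taken as a whole, load_elf_binary does the following:

 1) Reads the data of each ELF segment into memory and sets up the mappings.
 2) Loads the interpreter into memory and sets up its mappings (including choosing its load address; the relocations themselves are left to the interpreter).
 3) Sets ip, sp, etc. in the regs structure, so that the process is ready to start.

Open question: how does the interpreter hand control over to _main()?

My own analysis: once load_elf_binary has the entry address in eax, executing

	push eax
	ret

enters the interpreter's territory, where it helps load the libraries.

Q1. How does it know which libraries to load? Where do its arguments come from?
Q2. Once loading is done, how does it get back to main and start the main program?

A1. Through stack manipulation! Note that the two assembly instructions above are in essence a jump. One can picture the jump as taken from inside load_elf_binary, so that the interpreter code shares the argument stack with load_elf_binary. (In the actual kernel code no push/ret is executed: start_thread stores the entry address and the user stack pointer in regs, and the jump happens when sys_execve returns to user mode.)
A2. By unwinding the interpreter's stack and then jumping to the program's entry point, which leads to main.

The code below is taken from the GNU ELF interpreter and shows how ld.so completes the linking.

Code in gnu > glibc > glibc-2.5.tar.bz2 > glibc-2.5 > sysdeps > i386 > dl-machine.h

/* Initial entry point code for the dynamic linker.
   The C function `_dl_start' is the real entry point;
   its return value is the user program's entry point.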
   */
#define RTLD_START asm ("\n\
	.text\n\
	.align 16\n\
0:	movl (%esp), %ebx\n\
	ret\n\
	.align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
	# Note that _dl_start gets the parameter in %eax.\n\
	movl %esp, %eax\n\
	call _dl_start\n\
_dl_start_user:\n\
	# Save the user entry point address in %edi.\n\
	movl %eax, %edi\n\
	# Point %ebx at the GOT.\n\
	call 0b\n\
	addl $_GLOBAL_OFFSET_TABLE_, %ebx\n\
	# See if we were run as a command with the executable file\n\
	# name as an extra leading argument.\n\
	movl _dl_skip_args@GOTOFF(%ebx), %eax\n\
	# Pop the original argument count.\n\
	popl %edx\n\
	# Adjust the stack pointer to skip _dl_skip_args words.\n\
	leal (%esp,%eax,4), %esp\n\
	# Subtract _dl_skip_args from argc.\n\
	subl %eax, %edx\n\
	# Push argc back on the stack.\n\
	push %edx\n\
	# The special initializer gets called with the stack just\n\
	# as the application's entry point will see it; it can\n\
	# switch stacks if it moves these contents over.\n\
" RTLD_START_SPECIAL_INIT "\n\
	# Load the parameters again.\n\
	# (eax, edx, ecx, *--esp) = (_dl_loaded, argc, argv, envp)\n\
	movl _rtld_local@GOTOFF(%ebx), %eax\n\
	leal 8(%esp,%edx,4), %esi\n\
	leal 4(%esp), %ecx\n\
	movl %esp, %ebp\n\
	# Make sure _dl_init is run with 16 byte aligned stack.\n\
	andl $-16, %esp\n\
	pushl %eax\n\
	pushl %eax\n\
	pushl %ebp\n\
	pushl %esi\n\
	# Clear %ebp, so that even constructors have terminated backchain.\n\
	xorl %ebp, %ebp\n\
	# Call the function to run the initializers.\n\
	call _dl_init_internal@PLT\n\
	# Pass our finalizer function to the user in %edx, as per ELF ABI.\n\
	leal _dl_fini@GOTOFF(%ebx), %edx\n\
	# Restore %esp _start expects.\n\
	movl (%esp), %esp\n\
	# Jump to the user's entry point.\n\
	jmp *%edi\n\
	.previous\n\
");

/* Call the OS-dependent function to set up life so we can do things like
   file access.  It will call `dl_main' (below) to do all the real work
   of the dynamic linker, and then unwind our frame and run the user
   entry point on the same stack we entered on.
   */
Code in rtld.c ….
*/

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
	struct file *interpreter = NULL; /* to shut gcc up */
	unsigned long load_addr = 0, load_bias = 0;
	int load_addr_set = 0;
	char * elf_interpreter = NULL;
	unsigned long error;
	struct elf_phdr *elf_ppnt, *elf_phdata;
	unsigned long elf_bss, elf_brk;
	int elf_exec_fileno;
	int retval, i;
	unsigned int size;
	unsigned long elf_entry;
	unsigned long interp_load_addr = 0;
	unsigned long start_code, end_code, start_data, end_data;
	unsigned long reloc_func_desc = 0;
	int executable_stack = EXSTACK_DEFAULT;
	unsigned long def_flags = 0;
	struct {
		struct elfhdr elf_ex;
		struct elfhdr interp_elf_ex;
	} *loc;

	loc = kmalloc(sizeof(*loc), GFP_KERNEL);
	if (!loc) {
		retval = -ENOMEM;
		goto out_ret;
	}

	/* Get the exec-header */
	loc->elf_ex = *((struct elfhdr *)bprm->buf);

	retval = -ENOEXEC;
	/* First of all, some simple consistency checks */
	if (memcmp(loc->elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
		goto out;

	if (loc->elf_ex.e_type != ET_EXEC && loc->elf_ex.e_type != ET_DYN)
		goto out;
	if (!elf_check_arch(&loc->elf_ex))
		goto out;
	/* The filesystem holding the ELF file must support mmap */
	if (!bprm->file->f_op||!bprm->file->f_op->mmap)
		goto out;

	/* Now read in all of the header information */
	if (loc->elf_ex.e_phentsize != sizeof(struct elf_phdr))
		goto out;
	if (loc->elf_ex.e_phnum < 1 ||
	    loc->elf_ex.e_phnum > 65536U / sizeof(struct elf_phdr))
		goto out;

	/* Note: the ELF loader (as opposed to the linker) uses only the
	 * program headers. Space for them is allocated below.
	 * The program headers say how each segment is to be loaded into memory.
	 */
	size = loc->elf_ex.e_phnum * sizeof(struct elf_phdr);
	retval = -ENOMEM;
	elf_phdata = kmalloc(size, GFP_KERNEL);
	if (!elf_phdata)
		goto out;

	/* Read the program header part of the ELF file into the buffer */
	retval = kernel_read(bprm->file, loc->elf_ex.e_phoff,
			     (char *)elf_phdata, size);
	if (retval != size) {
		if (retval >= 0)
			retval = -EIO;
		goto out_free_ph;
	}

	/* The operations on the ELF file below presumably need an fd (?)
	 */
	retval = get_unused_fd();
	if (retval < 0)
		goto out_free_ph;
	get_file(bprm->file);
	fd_install(elf_exec_fileno = retval, bprm->file);

	elf_ppnt = elf_phdata;
	elf_bss = 0;
	elf_brk = 0;

	start_code = ~0UL;
	end_code = 0;
	start_data = 0;
	end_data = 0;

	/* The code below walks the program header array three times:
	 * pass 1 handles the PT_INTERP segment,
	 * pass 2 handles the PT_GNU_STACK segment,
	 * pass 3 handles the PT_LOAD segments.
	 * NOTE: PT_DYNAMIC is not handled here; it is left for the
	 * interpreter to map and relocate.
	 * Each pass is annotated separately below.
	 */

	/*
	 * Pass 1: the PT_INTERP segment
	 */
	for (i = 0; i < loc->elf_ex.e_phnum; i++) {
		if (elf_ppnt->p_type == PT_INTERP) {
			/* This is the program interpreter used for
			 * shared libraries - for now assume that this
			 * is an a.out format binary
			 */
			retval = -ENOEXEC;
			if (elf_ppnt->p_filesz > PATH_MAX ||
			    elf_ppnt->p_filesz < 2)
				goto out_free_file;

			retval = -ENOMEM;
			elf_interpreter = kmalloc(elf_ppnt->p_filesz,
						  GFP_KERNEL);
			if (!elf_interpreter)
				goto out_free_file;

			/* The PT_INTERP segment holds the name of the linker.
			 * The ELF specification requires this entry to precede
			 * any loadable segment, and it is processed first.
			 * Its content looks like:
			 * /lib64/
			 */
			retval = kernel_read(bprm->file, elf_ppnt->p_offset,
					     elf_interpreter,
					     elf_ppnt->p_filesz);
			if (retval != elf_ppnt->p_filesz) {
				if (retval >= 0)
					retval = -EIO;
				goto out_free_interp;
			}
			/* make sure path is NULL terminated */
			retval = -ENOEXEC;
			if (elf_interpreter[elf_ppnt->p_filesz - 1] != '\0')
				goto out_free_interp;

			/*
			 * The early SET_PERSONALITY here is so that the lookup
			 * for the interpreter happens in the namespace of the
			 * to-be-execed image.  SET_PERSONALITY can select an
			 * alternate root.
			 *
			 * However, SET_PERSONALITY is NOT allowed to switch
			 * this task into the new images's memory mapping
			 * policy - that is, TASK_SIZE must still evaluate to
			 * that which is appropriate to the execing application.
			 * This is because exit_mmap() needs to have TASK_SIZE
			 * evaluate to the size of the old image.
			 *
			 * So if (say) a 64-bit application is execing a 32-bit
			 * application it is the architecture's responsibility
			 * to defer changing the value of TASK_SIZE until the
			 * switch really is going to happen - do this in
			 * flush_thread().
			 * - akpm
			 */
			SET_PERSONALITY(loc->elf_ex);

			/* Open the linker file; this returns a file handle */
			interpreter = open_exec(elf_interpreter);
			retval = PTR_ERR(interpreter);
			if (IS_ERR(interpreter))
				goto out_free_interp;

			/*
			 * If the binary is not readable then enforce
			 * mm->dumpable = 0 regardless of the interpreter's
			 * permissions.
			 */
			if (file_permission(interpreter, MAY_READ) < 0)
				bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;

			/* Read the beginning of the linker file (its ELF header) */
			retval = kernel_read(interpreter, 0, bprm->buf,
					     BINPRM_BUF_SIZE);
			if (retval != BINPRM_BUF_SIZE) {
				if (retval >= 0)
					retval = -EIO;
				goto out_free_dentry;
			}

			/* Get the exec headers */
			loc->interp_elf_ex = *((struct elfhdr *)bprm->buf);
			break;
		}
		elf_ppnt++;
	}

	/*
	 * Pass 2: the PT_GNU_STACK segment
	 */
	elf_ppnt = elf_phdata;
	for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
		if (elf_ppnt->p_type == PT_GNU_STACK) {
			/* As the code shows, this segment only carries a flag;
			 * it has no actual segment data
			 */
			if (elf_ppnt->p_flags & PF_X)
				executable_stack = EXSTACK_ENABLE_X;
			else
				executable_stack = EXSTACK_DISABLE_X;
			break;
		}

	/* Check that the linker's ELF magic and target platform are valid */
	/* Some simple consistency checks for the interpreter */
	if (elf_interpreter) {
		retval = -ELIBBAD;
		/* Not an ELF interpreter */
		if (memcmp(loc->interp_elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
			goto out_free_dentry;
		/* Verify the interpreter has a valid arch */
		if (!elf_check_arch(&loc->interp_elf_ex))
			goto out_free_dentry;
	} else {
		/* Executables without an interpreter also need a personality */
		SET_PERSONALITY(loc->elf_ex);
	}

	/* Flush all traces of the currently running executable */
	retval = flush_old_exec(bprm);
	if (retval)
		goto out_free_dentry;

	/* OK, This is the point of no return */
	current->flags &= ~PF_FORKNOEXEC;
	current->mm->def_flags = def_flags;

	/* Do this immediately, since STACK_TOP as used in setup_arg_pages
	   may depend on the personality.
	   */
	SET_PERSONALITY(loc->elf_ex);
	if (elf_read_implies_exec(loc->elf_ex, executable_stack))
		current->personality |= READ_IMPLIES_EXEC;

	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
		current->flags |= PF_RANDOMIZE;
	arch_pick_mmap_layout(current->mm);

	/* Do this so that we can load the interpreter, if need be.  We will
	   change some of these later */
	current->mm->free_area_cache = current->mm->mmap_base;
	current->mm->cached_hole_size = 0;
	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
				 executable_stack);
	if (retval < 0) {
		send_sig(SIGKILL, current, 0);
		goto out_free_dentry;
	}

	current->mm->start_stack = bprm->p;

	/*
	 * Pass 3: the PT_LOAD segments
	 */
	/* Now we do a little grungy work by mmaping the ELF image into
	   the correct location in memory. */
	for(i = 0, elf_ppnt = elf_phdata;
	    i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
		int elf_prot = 0, elf_flags;
		unsigned long k, vaddr;

		if (elf_ppnt->p_type != PT_LOAD)
			continue;

		if (unlikely (elf_brk > elf_bss)) {
			unsigned long nbyte;

			/* There was a PT_LOAD segment with p_memsz > p_filesz
			   before this one. Map anonymous pages, if needed,
			   and clear the area.  */
			retval = set_brk (elf_bss + load_bias,
					  elf_brk + load_bias);
			if (retval) {
				send_sig(SIGKILL, current, 0);
				goto out_free_dentry;
			}
			nbyte = ELF_PAGEOFFSET(elf_bss);
			if (nbyte) {
				nbyte = ELF_MIN_ALIGN - nbyte;
				if (nbyte > elf_brk - elf_bss)
					nbyte = elf_brk - elf_bss;
				if (clear_user((void __user *)elf_bss +
							load_bias, nbyte)) {
					/*
					 * This bss-zeroing can fail if the ELF
					 * file specifies odd protections.
					 * So we don't check the return value
					 */
				}
			}
		}

		if (elf_ppnt->p_flags & PF_R)
			elf_prot |= PROT_READ;
		if (elf_ppnt->p_flags & PF_W)
			elf_prot |= PROT_WRITE;
		if (elf_ppnt->p_flags & PF_X)
			elf_prot |= PROT_EXEC;

		elf_flags = MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE;

		vaddr = elf_ppnt->p_vaddr;
		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
			/* Not dynamically placed: this part must be mapped at
			 * the expected range, hence MAP_FIXED
			 */
			elf_flags |= MAP_FIXED;
		} else if (loc->elf_ex.e_type == ET_DYN) {
			/* Try and get dynamic programs out of the way of the
			 * default mmap base, as well as whatever program they
			 * might try to exec.  This is because the brk will
			 * follow the loader, and is not movable.  */
#ifdef CONFIG_X86
			load_bias = 0;
#else
			load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
#endif
		}

		/* Key code:
		 * map the corresponding segment of the file at vaddr
		 */
		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
				elf_prot, elf_flags, 0);
		if (BAD_ADDR(error)) {
			send_sig(SIGKILL, current, 0);
			retval = IS_ERR((void *)error) ?
				PTR_ERR((void*)error) : -EINVAL;
			goto out_free_dentry;
		}

		/* This block runs only for the first PT_LOAD segment */
		if (!load_addr_set) {
			load_addr_set = 1;
			load_addr = (elf_ppnt->p_vaddr - elf_ppnt->p_offset);
			if (loc->elf_ex.e_type == ET_DYN) {
				load_bias += error -
					     ELF_PAGESTART(load_bias + vaddr);
				load_addr += load_bias;
				reloc_func_desc = load_bias;
			}
		}
		k = elf_ppnt->p_vaddr;
		if (k < start_code)
			start_code = k;
		if (start_data < k)
			start_data = k;

		/*
		 * Check to see if the section's size will overflow the
		 * allowed task size. Note that p_filesz must always be
		 * <= p_memsz so it is only necessary to check p_memsz.
		 */
		if (BAD_ADDR(k) || elf_ppnt->p_filesz > elf_ppnt->p_memsz ||
		    elf_ppnt->p_memsz > TASK_SIZE ||
		    TASK_SIZE - elf_ppnt->p_memsz < k) {
			/* set_brk can never work. Avoid overflows.
			 */
			send_sig(SIGKILL, current, 0);
			retval = -EINVAL;
			goto out_free_dentry;
		}

		k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;

		if (k > elf_bss)
			elf_bss = k;
		if ((elf_ppnt->p_flags & PF_X) && end_code < k)
			end_code = k;
		if (end_data < k)
			end_data = k;
		k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
		if (k > elf_brk)
			elf_brk = k;
	} /* end of for PT_LOAD */

	/* All the work on PT_LOAD boils down to the values below,
	 * plus the memory segments that are now mapped
	 */
	loc->elf_ex.e_entry += load_bias;
	elf_bss += load_bias;
	elf_brk += load_bias;
	start_code += load_bias;
	end_code += load_bias;
	start_data += load_bias;
	end_data += load_bias;

	/* Calling set_brk effectively mmaps the pages that we need
	 * for the bss and break sections.  We must do this before
	 * mapping in the interpreter, to make sure it doesn't wind
	 * up getting placed where the bss needs to go.
	 */
	retval = set_brk(elf_bss, elf_brk);
	if (retval) {
		send_sig(SIGKILL, current, 0);
		goto out_free_dentry;
	}
	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
		send_sig(SIGSEGV, current, 0);
		retval = -EFAULT; /* Nobody gets to see this, but.. */
		goto out_free_dentry;
	}

	/* Load the linker into memory and record its entry address */
	if (elf_interpreter) {
		unsigned long uninitialized_var(interp_map_addr);

		elf_entry = load_elf_interp(&loc->interp_elf_ex,
					    interpreter,
					    &interp_map_addr,
					    load_bias);
		if (!IS_ERR((void *)elf_entry)) {
			/*
			 * load_elf_interp() returns relocation
			 * adjustment
			 */
			interp_load_addr = elf_entry;
			elf_entry += loc->interp_elf_ex.e_entry;
		}
		if (BAD_ADDR(elf_entry)) {
			force_sig(SIGSEGV, current);
			retval = IS_ERR((void *)elf_entry) ?
					(int)elf_entry : -EINVAL;
			goto out_free_dentry;
		}
		reloc_func_desc = interp_load_addr;

		allow_write_access(interpreter);
		fput(interpreter);
		kfree(elf_interpreter);
	} else {
		elf_entry = loc->elf_ex.e_entry;
		if (BAD_ADDR(elf_entry)) {
			force_sig(SIGSEGV, current);
			retval = -EINVAL;
			goto out_free_dentry;
		}
	}

	kfree(elf_phdata);

	sys_close(elf_exec_fileno);

	set_binfmt(&elf_format);

#ifdef ARCH_HAS_SETUP_ADDITIONAL_PAGES
	retval = arch_setup_additional_pages(bprm, executable_stack);
	if (retval < 0) {
		send_sig(SIGKILL, current, 0);
		goto out;
	}
#endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */

	compute_creds(bprm);
	current->flags &= ~PF_FORKNOEXEC;
	/* This function does a lot and deserves careful study!
	 * bprm->p is modified in here.
	 */
	retval = create_elf_tables(bprm, &loc->elf_ex,
			  load_addr, interp_load_addr);
	if (retval < 0) {
		send_sig(SIGKILL, current, 0);
		goto out;
	}
	/* N.B. passed_fileno might not be initialized? */
	current->mm->end_code = end_code;
	current->mm->start_code = start_code;
	current->mm->start_data = start_data;
	current->mm->end_data = end_data;
	current->mm->start_stack = bprm->p;

#ifdef arch_randomize_brk
	if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1))
		current->mm->brk = current->mm->start_brk =
			arch_randomize_brk(current->mm);
#endif

	if (current->personality & MMAP_PAGE_ZERO) {
		/* Why this, you ask???  Well SVr4 maps page 0 as read-only,
		   and some applications "depend" upon this behavior.
		   Since we do not have the power to recompile these, we
		   emulate the SVr4 behavior. Sigh. */
		down_write(&current->mm->mmap_sem);
		error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
				MAP_FIXED | MAP_PRIVATE, 0);
		up_write(&current->mm->mmap_sem);
	}

#ifdef ELF_PLAT_INIT
	/*
	 * The ABI may specify that certain registers be set up in special
	 * ways (on i386 %edx is the address of a DT_FINI function, for
	 * example.  In addition, it may also specify (eg, PowerPC64 ELF)
	 * that the e_entry field is the address of the function descriptor
	 * for the startup routine, rather than the address of the startup
	 * routine itself.
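	 * This macro performs whatever initialization to
	 * the regs structure is required as well as any relocations to the
	 * function descriptor entries when executing dynamically links apps.
	 */
	ELF_PLAT_INIT(regs, reloc_func_desc);
#endif

	/* start_thread is a misnomer; prepare_user_thread() would suit it better.
	 * It stores elf_entry and the user stack pointer into regs,
	 * getting everything ready for the launch. The user-mode program
	 * actually starts when sys_execve() returns to user mode!
	 */
	start_thread(regs, elf_entry, bprm->p);
	retval = 0;
out:
	kfree(loc);
out_ret:
	return retval;

	/* error cleanup */
out_free_dentry:
	allow_write_access(interpreter);
	if (interpreter)
		fput(interpreter);
out_free_interp:
	kfree(elf_interpreter);
out_free_file:
	sys_close(elf_exec_fileno);
out_free_ph:
	kfree(elf_phdata);
	goto out;
}

/*
How exactly are ip and sp switched? There is a trick to it!

sys_execve->do_execve->search_binary_handler->load_binary->load_elf_binary->(code above)

First get the following questions straight:

1. In a system call, how are the user arguments and the user stack managed? Where are they saved?

First, here is what the stack looks like on entry to the kernel (in the famous 8K space):

struct pt_regs {
	unsigned long bx;	/* pushed by SAVE_ALL after kernel entry */	low address
	unsigned long cx;	/* pushed by SAVE_ALL after kernel entry */
	unsigned long dx;	/* pushed by SAVE_ALL after kernel entry */
	unsigned long si;	/* pushed by SAVE_ALL after kernel entry */
	unsigned long di;	/* pushed by SAVE_ALL after kernel entry */	^
	unsigned long bp;	/* pushed by SAVE_ALL after kernel entry */	^
	unsigned long ax;	/* pushed by SAVE_ALL after kernel entry */	^
	unsigned long ds;	/* pushed by SAVE_ALL after kernel entry */	^
	unsigned long es;	/* pushed by SAVE_ALL after kernel entry */	^
	unsigned long fs;	/* pushed by SAVE_ALL after kernel entry */	^
	/* int gs; */
	unsigned long orig_ax;	/* pushed by "push eax" after kernel entry */
	unsigned long ip;	/* pushed automatically by the CPU on the trap */
	unsigned long cs;	/* pushed automatically by the CPU on the trap */
	unsigned long flags;	/* pushed automatically by the CPU on the trap */
	unsigned long sp;	/* pushed automatically by the CPU on the trap */
	unsigned long ss;	/* pushed automatically by the CPU on the trap */	high address
};

NOTE: the further down a field is in this listing, the earlier it was pushed onto the stack.

Below is the 2.6 kernel code that runs right after entering the kernel stack.

	# system call handler stub
ENTRY(system_call)
	RING0_INT_FRAME			# can't unwind into user space anyway
	pushl %eax			# save orig_eax
	CFI_ADJUST_CFA_OFFSET 4		# cld instruction
	SAVE_ALL
	GET_THREAD_INFO(%ebp)		# esp now points at the top of the pt_regs frame
					# (its lowest address, the bx field)
syscall_call:
	call *sys_call_table(,%eax,4)	# the call target is sys_call_table+eax*4; eax holds the
					# system call number and the target is the handler's entry.
					# The return ip is pushed again, so esp points at that ip;
					# inside the service routine, pt_regs is reachable via esp
	movl %eax,PT_EAX(%esp)		# store the return value
syscall_exit:
	LOCKDEP_SYS_EXIT
	DISABLE_INTERRUPTS(CLBR_ANY)	# make sure we don't miss an interrupt
					# setting need_resched or sigpending
					# between sampling and the iret
	……

On a 64-bit machine, sys_execve disassembles as follows (objdump -d /lib/modules/

<sys_execve>:
   0:	48 83 ec 28		sub    $0x28,%rsp	# make room on the stack for local variables
   4:	48 89 5c 24 08		mov    %rbx,0x8(%rsp)	# save some registers in the scratch area
   9:	48 89 6c 24 10		mov    %rbp,0x10(%rsp)
   e:	48 89 d5		mov    %rdx,%rbp	# Why rdx? no idea.
  11:	4c 89 64 24 18		mov    %r12,0x18(%rsp)
  16:	4c 89 6c 24 20		mov    %r13,0x20(%rsp)

Enough of this, it is a mess! The one thing to take away: pt_regs holds everything you need.

As for how a Linux user process passes arguments to the system call interrupt handler: Linux passes them in general-purpose registers, for example ebx, ecx and edx. One clear advantage of register passing is that when the registers are saved on entry to the interrupt service routine, the argument registers land on the kernel stack automatically, so they need no special treatment.

2. How does this work together with execve?

With the help of pt_regs, ip and esp can be set. For a system call such as execve, replacing the saved ip and esp is what makes the swap from the old program to the new one possible.

3. How does user mode get its arguments onto the kernel stack?

As an example, here is what happens when the user-mode write is called:

write:
	pushl %ebx
	movl 8(%esp), %ebx	; Linux's _syscall3 expands to this,
	movl 12(%esp), %ecx	; which is what makes register argument passing work;
	movl 16(%esp), %edx	; clearly, none of this depends on the compiler
	movl $4, %eax
	int $0x80
	….

Read more from this perfect online book-store: 8. System Calls > Anticipating Linux 2.4 - Pg. 241
*/

/*
 * sys_execve() executes a new program.
 */
asmlinkage int sys_execve(struct pt_regs regs)
{
	int error;
	char * filename;

	filename = getname((char __user *) regs.bx);
	error = PTR_ERR(filename);
	if (IS_ERR(filename))
		goto out;
	error = do_execve(filename,
			(char __user * __user *) regs.cx,
			(char __user * __user *) regs.dx,
			&regs);
	if (error == 0) {
		/* Make sure we don't return using sysenter.. */
		set_thread_flag(TIF_IRET);
	}
	putname(filename);
out:
	return error;
}

When the world presses down on a grass seed, it always finds its own way to break through the soil.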
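
The header checks and the PT_INTERP lookup at the start of load_elf_binary can be reproduced from user space. The following is a minimal sketch, assuming Linux with glibc's <elf.h> and <link.h>; elf_find_interp is a hypothetical helper written for these notes, not a kernel or glibc function. It applies the same sanity checks as the kernel code (magic number, e_type, e_phentsize) and then scans the program headers for PT_INTERP.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <link.h>	/* ElfW(): Elf32_* or Elf64_* for the native word size */
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not kernel code): copy the PT_INTERP path of the
 * ELF file at `path` into buf.
 * Returns 0 if found, 1 if the file has no PT_INTERP entry (for example
 * a statically linked binary), and -1 on any error.
 */
int elf_find_interp(const char *path, char *buf, size_t buflen)
{
	ElfW(Ehdr) eh;
	ElfW(Phdr) ph;
	unsigned int i;
	int fd, ret = -1;

	assert(buf != NULL && buflen > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Same consistency checks load_elf_binary makes on the header */
	if (pread(fd, &eh, sizeof(eh), 0) != (ssize_t)sizeof(eh))
		goto out;
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
		goto out;
	if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
		goto out;
	if (eh.e_phentsize != sizeof(ph))
		goto out;

	ret = 1;	/* assume "no interpreter" until one is found */
	for (i = 0; i < eh.e_phnum; i++) {
		off_t off = (off_t)(eh.e_phoff + i * sizeof(ph));

		if (pread(fd, &ph, sizeof(ph), off) != (ssize_t)sizeof(ph)) {
			ret = -1;
			break;
		}
		if (ph.p_type != PT_INTERP)
			continue;

		ret = -1;
		/* The path must fit the buffer and be NUL terminated */
		if (ph.p_filesz < 2 || ph.p_filesz > buflen)
			break;
		if (pread(fd, buf, ph.p_filesz, (off_t)ph.p_offset) !=
		    (ssize_t)ph.p_filesz)
			break;
		if (buf[ph.p_filesz - 1] != '\0')
			break;
		ret = 0;
		break;
	}
out:
	close(fd);
	return ret;
}
```

Calling elf_find_interp("/proc/self/exe", buf, sizeof(buf)) from a dynamically linked program should fill buf with that program's own interpreter path; a statically linked program would get 1 back instead.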

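
The MAP_FIXED behaviour that the PT_LOAD pass relies on for ET_EXEC images can be observed with plain mmap. This is a minimal sketch, assuming Linux and anonymous mappings; map_fixed_demo is a hypothetical name. It first lets the kernel pick a free two-page region, then requests a mapping at an exact address inside that region with MAP_FIXED and checks that the returned address is the requested one.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if a MAP_FIXED mapping landed exactly at
 * the requested address, 0 if it did not, and -1 if mmap failed.
 */
int map_fixed_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *area, *want, *got;
	int ok;

	assert(page > 0);

	/* Step 1: reserve two pages wherever the kernel likes */
	area = mmap(NULL, 2 * (size_t)page, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return -1;

	/* Step 2: demand a mapping at exactly the second reserved page */
	want = area + page;
	got = mmap(want, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (got == MAP_FAILED) {
		munmap(area, 2 * (size_t)page);
		return -1;
	}

	got[0] = 42;	/* the new mapping is readable and writable */
	ok = (got == want) && (got[0] == 42);

	munmap(area, 2 * (size_t)page);
	return ok;
}
```

Reserving the region first is deliberate: MAP_FIXED replaces whatever is already mapped at the target address, so a demo should only aim at addresses it already owns.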

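
Part of what create_elf_tables writes onto the new user stack is the auxiliary vector, which is how the interpreter learns where the program headers and the program's entry point are. The sketch below reads a few of those entries with glibc's getauxval (available since glibc 2.16). auxv_looks_sane is a hypothetical helper, and the conditions it checks are assumptions about a normally exec'ed process, not guarantees.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>	/* ElfW() */
#include <sys/auxv.h>	/* getauxval() */

/*
 * Hypothetical helper: returns 1 when the auxiliary vector entries that
 * describe the loaded ELF image look consistent, otherwise 0.
 */
int auxv_looks_sane(void)
{
	unsigned long entry = getauxval(AT_ENTRY);	/* program entry point */
	unsigned long phdr  = getauxval(AT_PHDR);	/* program headers in memory */
	unsigned long phent = getauxval(AT_PHENT);	/* size of one program header */
	unsigned long phnum = getauxval(AT_PHNUM);	/* number of program headers */

	/* AT_BASE (the interpreter's load address) is left unchecked here:
	 * it is expected to be 0 for a statically linked program. */
	return entry != 0 &&
	       phdr != 0 &&
	       phent == sizeof(ElfW(Phdr)) &&
	       phnum >= 1;
}
```

AT_ENTRY here is the program's own entry point, not the interpreter's: the kernel starts execution at the interpreter's entry, and the interpreter later jumps to the address it finds in AT_ENTRY.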