syscall 具体逻辑

虽然前面设置 syscall 的环境变化比较复杂，但是后面的逻辑就比较清晰

trap.c:trap() 会根据中断号 tf->trapno，判断是否是syscall
如果是 syscall，则调用 syscall.c:syscall
根据 tf->eax，找到 syscall num，然后调用具体的内核函数

static int (*syscalls[])(void) 定义 syscall 的数组，数组下标是 syscall num，数组元素是 syscall 对应的内核函数。其中内核函数签名为

int sys_<syscall_name> (void)

如果参数不是void，那么这个数组就不好定义，因为参数个数，类型都不相同。

syscall 参数

虽然内核函数都不带参数，但是 syscall 的参数是变化的，那内核函数怎么获取 syscall 的参数呢？

C 语言通过栈来传递参数，发生函数调用时，参数从右往左依次压栈，然后 push eip。

前面讲到，当 int 指令发生时，如果有栈切换，会把旧栈的信息保存到 trapframe 中。所以在内核函数中，就可以通过进程的 trapframe 获取进程栈的地址，然后根据进程栈的地址，获取 syscall 参数。

例如获取 int 型参数，syscall.c:int argint(int n, int *ip) 代码如下：

int
fetchint(uint addr, int *ip)
{
  if(addr >= proc->sz || addr+4 > proc->sz)
    return -1;
  *ip = *(int*)(addr);
  return 0;
}

// Fetch the nth 32-bit system call argument.
int
argint(int n, int *ip)
{
  return fetchint(proc->tf->esp + 4 + 4*n, ip);
}

首先获取参数在用户栈的地址，用户栈顶位置为 proc->tf->esp，栈内第一个元素是 eip，参数是从第2个元素开始的，地址是 proc->tf->esp + 4。

这里有个限制，系统调用参数的大小只能是4个字节，所以只能是 int 型或者指针，否则无法确定参数在栈中的位置。知道参数在栈中哪个位置之后，就可以很容易获得参数的内容， fetchint 就是获取 int 型参数的内容。

系统调用返回

每次调用内核函数之后，都会把内核函数的结果存放到 trapframe->eax 中，具体代码为 syscall.c:syscall

proc->tf->eax = syscalls[num]()

分析完系统调用的过程，下面看看系统调用结束后，返回的流程，具体代码在 trapasm.S

  # Call trap(tf), where tf=%esp
  pushl %esp
  call trap
  addl $4, %esp

  # Return falls through to trapret...
.globl trapret
trapret:
  popal
  popl %gs
  popl %fs
  popl %es
  popl %ds
  addl $0x8, %esp  # trapno and errcode
  iret

call trap 之前压栈 esp 当作 trap() 的参数，trap() 返回后，需要将 esp 出栈，addl $4, %esp 相当于将 esp 出栈。
然后将前面 pushal 的所有寄存器恢复，但是 eax 已经不是 syscall num，而是前面讲的内核函数的返回值
弹出 trapno & errcode
iret 完成如下事情 ¹
- 恢复 cs, eip, eflags 寄存器
- 如果发生栈切换，恢复原来的栈

这样就恢复原来进程的运行上下文（除了 eax），包括栈 ss:esp 和指令 cs:eip。CPU 继续运行 syscall 之后的代码。

homework

homework主要实现两个 syscall，一个是 date，一个是 dup2。如果理解了 syscall 的流程和实现机制，应该很容易完成这两个 syscall。

主要涉及以下几个文件

user.h 定义 syscall 签名
syscall.h 定义 syscall num
usys.S 定义 syscall 调用过程
syscall.c 定义 syscall num 和内核函数的对应关系，声明内核函数
实现 syscall 对应的内核函数
如果有用户代码，需要在 Makefile 文件中增加用户代码信息

debug

不知道其他大神怎么调试 syscall，自己摸索的调试是这样的

首先将 Makefile 中 qemu 的 cpu 数改成1，不然多 CPU 调试起来比较麻烦。

在一个 terminal 启动 qemu

$ make qemu-nox-gdb

另起一个 terminal，运行 gdb，加载 _sh，然后在 _sh 中加断点，启动 _sh

gdb) file _sh
A program is being debugged already.
Are you sure you want to change the file? (y or n) y
Load new symbol table from "/vagrant/xv6-public/_sh"? (y or n) y
Reading symbols from /vagrant/xv6-public/_sh...done.
(gdb) b main
Breakpoint 1 at 0x267: file sh.c, line 146.
(gdb) c
Continuing.
The target architecture is assumed to be i386
=> 0x267 <main>:	push   %ebp

Breakpoint 1, main () at sh.c:146
146	{

启动 _sh 后，继续加载用户进程，比如 _date，设置断点

(gdb) file _date
A program is being debugged already.
Are you sure you want to change the file? (y or n) y
Load new symbol table from "/vagrant/xv6-public/_date"? (y or n) y
Reading symbols from /vagrant/xv6-public/_date...done.
(gdb) b main
Note: breakpoint 1 also set at pc 0xc.
Breakpoint 2 at 0xc: file date.c, line 10.
(gdb) c
Continuing.
=> 0xc <main+12>:	lea    0x28(%esp),%eax

Breakpoint 1, main (argc=1, argv=0x2ff0) at date.c:10
10	  if (date(&r)) {

然后在 qemu terminal 运行 date 命令。之后就可以在 gdb terminal 运行 si 单步调试了。单步到 kernel 的时候，可以加载 kernel ，这样就能看到 kernel 的源代码，不然调试只能显示 kernel 的汇编代码

(gdb) si
=> 0x10 <main+16>:	mov    %eax,(%esp)
0x00000010	10	  if (date(&r)) {
(gdb) si
=> 0x13 <main+19>:	call   0x38c <date>
0x00000013	10	  if (date(&r)) {
(gdb) file kernel
A program is being debugged already.
Are you sure you want to change the file? (y or n) y
Load new symbol table from "/vagrant/xv6-public/kernel"? (y or n) y
Reading symbols from /vagrant/xv6-public/kernel...done.
(gdb) si
=> 0x38c:	mov    $0x16,%eax
0x0000038c in ?? ()
(gdb) si
=> 0x391:	int    $0x40
0x00000391 in ?? ()
(gdb) si
=> 0x80107014:	push   $0x40
320	  pushl $64

总结

xv6 用 trap 实现系统调用，中断号为 0x40。当用户进程调用 syscall 时，具体逻辑如下

用户进程将 syscall 参数 push 到用户进程栈中
将 syscall num 存到 eax 中
执行 int 0x40 指令，CPU 自动执行如下操作
- 切换到进程对应的内核栈中（内核栈信息存放在 TSS 中）
- push 用户进程栈（ss:esp)到内核栈中
- push eflags，cs，eip，error code（如果有）到内核栈中
- CPU 切换到 interrupt handler 指令中（改变 cs:eip)
如果前面没有 error code，push 0。然后push 中断号(0x40)
保存段寄存器和通用寄存器
设置内核态运行环境，主要是设置各个段寄存器
push esp，把前面 push 的内容当作 trapframe，esp 作为 trapframe 指针，传入到 trap.c:trap()参数中
从用户栈中获取 syscall 参数，运行 syscall 对应的内核函数，将结果保存到 eax 中
返回 syscall，恢复各个寄存器，切换回用户栈（用户进程可以从 eax 获取 syscall 的结果），继续执行 syscall 之后的代码

Reference

《Intel(R) 64 and IA-32 Architectures Software Developer’s Manuals-vol-3A》6.12.1 节 ↩