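Reposted from: https://www.litreily.top/2020/05/07/ubi-driver/

While working on UBIFS-related bugs, I studied how the UBI driver handles the PEBs it reserves for bad blocks. These are my notes.

Concepts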

mtd

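MTD stands for Memory Technology Device. It is the Linux subsystem for accessing memory devices (RAM, flash) and provides an abstraction layer between the hardware and user space.

On an embedded Linux device, /dev/ contains many /dev/mtdxx files, each of which corresponds to a memory device. For example, the device's NAND flash is divided into several partitions, and each partition has its own /dev/mtdxx file.

In the listing below, /dev/mtd0 through /dev/mtd10 are the U-Boot partitions, mtd21 is the firmware partition, and mtd20 is the data partition used throughout this article.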

root:/dev# ls mtd* |grep -v block

mtd0 mtd16ro mtd23ro mtd30ro

mtd0ro mtd17 mtd24 mtd31

mtd1 mtd17ro mtd24ro mtd31ro

mtd10 mtd18 mtd25 mtd3ro

mtd10ro mtd18ro mtd25ro mtd4

mtd11 mtd19 mtd26 mtd4ro

mtd11ro mtd19ro mtd26ro mtd5

mtd12 mtd1ro mtd27 mtd5ro

mtd12ro mtd2 mtd27ro mtd6

mtd13 mtd20 mtd28 mtd6ro

mtd13ro mtd20ro mtd28ro mtd7

mtd14 mtd21 mtd29 mtd7ro

mtd14ro mtd21ro mtd29ro mtd8

mtd15 mtd22 mtd2ro mtd8ro

mtd15ro mtd22ro mtd3 mtd9

mtd16 mtd23 mtd30 mtd9ro

root:/dev#

root:/dev# cat /proc/mtd

dev: size erasesize name

mtd0: 00100000 00020000 "0:SBL1"

mtd1: 00100000 00020000 "0:MIBIB"

mtd2: 00100000 00020000 "0:BOOTCONFIG"

...

mtd7: 00080000 00020000 "0:BOOTCONFIG1"

mtd8: 00080000 00020000 "0:APPSBLENV"

mtd9: 00200000 00020000 "0:APPSBL"

mtd10: 00200000 00020000 "0:APPSBL_1"

mtd11: 00080000 00020000 "0:ART"

mtd12: 00080000 00020000 "0:ART.bak"

mtd13: 00100000 00020000 "config"

mtd14: 00080000 00020000 "data1"

mtd15: 00040000 00020000 "data2"

...

mtd20: 01e00000 00020000 "mtddata"

mtd21: 02800000 00020000 "firmware"

...

mtd25: 02780000 00020000 "reserved"

ubi

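UBI stands for Unsorted Block Images, and UBIFS for Unsorted Block Image File System. UBI is a volume-management layer built on top of MTD, UBIFS is a file system that runs on UBI volumes, and together they can manage large-capacity NAND flash.

The relationship between NAND flash, MTD and UBIFS can be summarized as follows: NAND flash is the hardware, MTD provides the abstraction layer between the hardware and user space, and UBIFS is the file system built on top of MTD (through UBI) that makes reading and writing NAND flash convenient.

The key UBI concepts are:

PEB: physical eraseblock, typically 128 KiB (131072 bytes)

LEB: logical eraseblock, 124 KiB (126976 bytes) on this device

UBI headers: UBI stores 2 small 64-byte headers at the beginning of each non-bad physical eraseblock: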

erase counter header (or EC header) which contains the erase counter of the physical eraseblock (PEB) plus other information;

volume identifier header (or VID header) which stores the volume ID and the logical eraseblock (LEB) number to which this PEB belongs.

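As the names suggest, an LEB is a logical block and a PEB is a physical block, and each LEB is stored inside a PEB. On this device an LEB is 4 KiB smaller than a PEB: the first 2 KiB page holds the EC header, the second 2 KiB page holds the VID header (hence the VID header offset of 2048), and the data starts at offset 4096 so that it stays aligned to the 2 KiB I/O unit.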

console log

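With the concepts in place, let's look at the UBI-related messages in the console log of the embedded device. The log strings are easy to search for in the user-space or kernel sources, which leads straight to the relevant code.

UBI attach

First, here are the messages printed while the UBI device is attached during boot: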

Info: init ubi volumes on mtddata raw partition

UBI: attaching mtd20 to ubi0

random: procd: uninitialized urandom read (4 bytes read, 60 bits of entropy available)

UBI: scanning is finished

UBI: attached mtd20 (name "mtddata", size 30 MiB) to ubi0

UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes

UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048

UBI: VID header offset: 2048 (aligned 2048), data offset: 4096

UBI: good PEBs: 240, bad PEBs: 0, corrupted PEBs: 0

UBI: user volume: 5, internal volumes: 1, max. volumes count: 128

UBI: max/mean erase counter: 2/1, WL threshold: 4096, image sequence number: 860068978

UBI: available PEBs: 3, total reserved PEBs: 237, PEBs reserved for bad PEB handling: 20

UBI: background thread "ubi_bgt0d" started, PID 115

UBI device number 0, total 240 LEBs (30474240 bytes, 29.1 MiB), available 3 LEBs (380928 bytes, 372.0 KiB), LEB size 126976 bytes (124.0 KiB)

Info: attach ubi device on mtddata success!

Most of these messages are printed by the ubi_attach_mtd_dev function in build.c of the kernel UBI driver, which is covered in the kernel-space section below.

The main messages are analyzed below:

# Initializing UBI volumes on the raw mtddata partition
Info: init ubi volumes on mtddata raw partition

# Attaching mtd20 to ubi0 ...
UBI: attaching mtd20 to ubi0

# Finished attaching mtd20 to ubi0
UBI: attached mtd20 (name "mtddata", size 30 MiB) to ubi0

# PEB is 128 KiB, LEB is 124 KiB
UBI: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes

# Minimum/maximum I/O unit: 2048/2048, sub-page 2048, i.e. 2 KiB
UBI: min./max. I/O unit sizes: 2048/2048, sub-page size 2048

# VID header at offset 2 KiB (aligned to the I/O unit), data at offset 4 KiB
UBI: VID header offset: 2048 (aligned 2048), data offset: 4096

# 240 good PEBs, no bad blocks
UBI: good PEBs: 240, bad PEBs: 0, corrupted PEBs: 0

# 3 PEBs still available, 237 PEBs reserved in total (in use or set aside), 20 PEBs reserved for bad block handling (the focus of this article)
UBI: available PEBs: 3, total reserved PEBs: 237, PEBs reserved for bad PEB handling: 20

# UBI device number 0, 240 LEBs in total (29.1 MiB), 3 LEBs still available, each LEB is 124 KiB
UBI device number 0, total 240 LEBs (30474240 bytes, 29.1 MiB), available 3 LEBs (380928 bytes, 372.0 KiB), LEB size 126976 bytes (124.0 KiB)

# UBI device attached to mtddata successfully
Info: attach ubi device on mtddata success!

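The log reveals a lot of key information: UBI is attached to the partition named mtddata, which is mtd20; a PEB is 128 KiB and an LEB is 124 KiB; ubi0 has 240 LEBs/PEBs in total, of which 3 are still available, and there are no bad blocks; and 20 PEBs are reserved for bad block handling. The rest of this article explains where those 20 reserved PEBs come from.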

ubinfo -a

UBI information is printed automatically during boot, but how do we get it by hand afterwards? That is what the UBI tool set is for. The UBI commands are:

root:/# ubi

ubiattach ubidetach ubinfo ubirmvol

ubiblock ubiformat ubinize ubirsvol

ubicrc32 ubimkvol ubirename ubiupdatevol

Among them, ubinfo displays UBI information:

root:/# ubinfo -a

UBI version: 1

Count of UBI devices: 1

UBI control device major/minor: 10:60

Present UBI devices: ubi0

ubi0

Volumes count: 5

Logical eraseblock size: 126976 bytes, 124.0 KiB

Total amount of logical eraseblocks: 240 (30474240 bytes, 29.1 MiB)

Amount of available logical eraseblocks: 3 (380928 bytes, 372.0 KiB)

Maximum count of volumes 128

Count of bad physical eraseblocks: 0

Count of reserved physical eraseblocks: 20

Current maximum erase counter value: 2

Minimum input/output unit size: 2048 bytes

Character device major/minor: 249:0

Present volumes: 0, 1, 2, 3, 4

...

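ubi0 contains 5 volumes out of a maximum of 128. The rest of the information matches what the kernel printed during boot. The line this article cares about is: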

Count of reserved physical eraseblocks: 20

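That covers the most direct log information. Next we trace where the 20 PEBs come from, first in user space and then in kernel space.

User space

Searching the ubi-utils sources for "Count of reserved physical eraseblocks" leads to the print_dev_info function in ubinfo.c.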

print_dev_info

static int print_dev_info(libubi_t libubi, int dev_num, int all)
{
    int i, err, first = 1;
    struct ubi_dev_info dev_info;
    struct ubi_vol_info vol_info;

    err = ubi_get_dev_info1(libubi, dev_num, &dev_info);
    if (err)
        return sys_errmsg("cannot get information about UBI device %d", dev_num);

    printf("ubi%d\n", dev_info.dev_num);
    printf("Volumes count: %d\n", dev_info.vol_count);
    printf("Logical eraseblock size: ");
    util_print_bytes(dev_info.leb_size, 0);
    printf("\n");

    printf("Total amount of logical eraseblocks: %d (", dev_info.total_lebs);
    util_print_bytes(dev_info.total_bytes, 0);
    printf(")\n");

    printf("Amount of available logical eraseblocks: %d (", dev_info.avail_lebs);
    util_print_bytes(dev_info.avail_bytes, 0);
    printf(")\n");

    printf("Maximum count of volumes %d\n", dev_info.max_vol_count);
    printf("Count of bad physical eraseblocks: %d\n", dev_info.bad_count);
    printf("Count of reserved physical eraseblocks: %d\n", dev_info.bad_rsvd);
    printf("Current maximum erase counter value: %lld\n", dev_info.max_ec);
    printf("Minimum input/output unit size: %d %s\n",
           dev_info.min_io_size, dev_info.min_io_size > 1 ? "bytes" : "byte");
    printf("Character device major/minor: %d:%d\n",
           dev_info.major, dev_info.minor);

    if (dev_info.vol_count == 0)
        return 0;

    printf("Present volumes: ");
    for (i = dev_info.lowest_vol_id;
         i <= dev_info.highest_vol_id; i++) {
        err = ubi_get_vol_info1(libubi, dev_info.dev_num, i, &vol_info);
        if (err == -1) {
            if (errno == ENOENT)
                continue;

            return sys_errmsg("libubi failed to probe volume %d on ubi%d",
                              i, dev_info.dev_num);
        }

        if (!first)
            printf(", %d", i);
        else {
            printf("%d", i);
            first = 0;
        }
    }
    printf("\n");

    if (!all)
        return 0;

    first = 1;
    printf("\n");

    for (i = dev_info.lowest_vol_id;
         i <= dev_info.highest_vol_id; i++) {
        if (!first)
            printf("-----------------------------------\n");
        err = ubi_get_vol_info1(libubi, dev_info.dev_num, i, &vol_info);
        if (err == -1) {
            if (errno == ENOENT)
                continue;

            return sys_errmsg("libubi failed to probe volume %d on ubi%d",
                              i, dev_info.dev_num);
        }
        first = 0;

        err = print_vol_info(libubi, dev_info.dev_num, i);
        if (err)
            return err;
    }

    return 0;
}

ref: http://git.infradead.org/mtd-utils.git/blob/639b871fe3d2cb3e73d21363e8c13ede2bbd9f99:/ubi-utils/ubinfo.c

The line that prints the reserved count is the following one, backed by the variable dev_info.bad_rsvd:

printf("Count of reserved physical eraseblocks: %d\n", dev_info.bad_rsvd);

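Tracing bad_rsvd

Starting from dev_info.bad_rsvd, we can work backwards step by step to the source of the value.

The call chain starting at ubi_get_dev_info1 takes a long detour, but in the end it just reads a value from a file: /sys/class/ubi/ubi0/reserved_for_bad.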

root:/sys/devices/virtual/ubi/ubi0# ls

avail_eraseblocks max_ec reserved_for_bad ubi0_2

bad_peb_count max_vol_count subsystem ubi0_3

bgt_enabled min_io_size total_eraseblocks ubi0_4

dev mtd_num ubi0_0 uevent

eraseblock_size power ubi0_1 volumes_count

root:/sys/devices/virtual/ubi/ubi0# cat reserved_for_bad

20

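The directory /sys/class/ubi/ubi0 (which resolves to /sys/devices/virtual/ubi/ubi0 in the listing above) also exposes other UBI information, such as avail_eraseblocks (available blocks) and bad_peb_count (number of bad blocks).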

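That is all for user space. We now know that ubinfo -a gets its information from files under sysfs, and it is easy to guess that those files are produced by the kernel, more precisely by the UBI driver.

Kernel space

Now for the UBI driver in kernel space, to find out how the low-level driver computes the size of the bad block reserve.

Using the UBI boot log, grep the drivers/mtd/ubi/ directory of the Linux kernel for a matching string (such as "PEBs reserved for bad PEB handling"); this leads to the function that prints these messages, ubi_attach_mtd_dev.

ubi_attach_mtd_dev

This function attaches an MTD device to UBI and assigns @ubi_num to the newly created UBI device. While attaching, it prints information about the UBI device, which is the console log shown in the UBI attach section.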

int ubi_attach_mtd_dev(struct mtd_info *mtd, int ubi_num,
                       int vid_hdr_offset, int max_beb_per1024)
{
    struct ubi_device *ubi;
    int i, err, ref = 0;

    /* part of the code omitted */

    ubi_msg("attached mtd%d (name \"%s\", size %llu MiB) to ubi%d",
            mtd->index, mtd->name, ubi->flash_size >> 20, ubi_num);
    ubi_msg("PEB size: %d bytes (%d KiB), LEB size: %d bytes",
            ubi->peb_size, ubi->peb_size >> 10, ubi->leb_size);
    ubi_msg("min./max. I/O unit sizes: %d/%d, sub-page size %d",
            ubi->min_io_size, ubi->max_write_size, ubi->hdrs_min_io_size);
    ubi_msg("VID header offset: %d (aligned %d), data offset: %d",
            ubi->vid_hdr_offset, ubi->vid_hdr_aloffset, ubi->leb_start);
    ubi_msg("good PEBs: %d, bad PEBs: %d, corrupted PEBs: %d",
            ubi->good_peb_count, ubi->bad_peb_count, ubi->corr_peb_count);
    ubi_msg("user volume: %d, internal volumes: %d, max. volumes count: %d",
            ubi->vol_count - UBI_INT_VOL_COUNT, UBI_INT_VOL_COUNT,
            ubi->vtbl_slots);
    ubi_msg("max/mean erase counter: %d/%d, WL threshold: %d, image sequence number: %u",
            ubi->max_ec, ubi->mean_ec, CONFIG_MTD_UBI_WL_THRESHOLD,
            ubi->image_seq);
    ubi_msg("available PEBs: %d, total reserved PEBs: %d, PEBs reserved for bad PEB handling: %d",
            ubi->avail_pebs, ubi->rsvd_pebs, ubi->beb_rsvd_pebs);

    /* part of the code omitted */
}

ref: https://elixir.bootlin.com/linux/v3.14.77/source/drivers/mtd/ubi/build.c#L867

The function contains the statement that prints the bad block reserve:

ubi_msg("available PEBs: %d, total reserved PEBs: %d, PEBs reserved for bad PEB handling: %d",

ubi->avail_pebs, ubi->rsvd_pebs, ubi->beb_rsvd_pebs);

Combine this with the following field descriptions:

/**
 * struct ubi_device - UBI device description structure
 * ...
 * @rsvd_pebs: count of reserved physical eraseblocks
 * @avail_pebs: count of available physical eraseblocks
 * @beb_rsvd_pebs: how many physical eraseblocks are reserved for bad PEB
 *                 handling
 * @beb_rsvd_level: normal level of PEBs reserved for bad PEB handling
 * ...
 */

ref: https://elixir.bootlin.com/linux/v3.14.77/source/drivers/mtd/ubi/ubi.h#L383

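So beb_rsvd_pebs is the number of PEBs reserved for bad blocks, and beb_rsvd_level is the normal level of that reserve. How are the two related? As in user space, we trace backwards to see where these two values come from.

Tracing beb_rsvd_pebs

In the call graph, dashed lines denote indirect relationships, and the variable shown in the middle of a dashed line is what links the two nodes. To recap:

ubi_eba_init calls ubi_calculate_reserved to compute beb_rsvd_level

ubi_calculate_reserved uses bad_peb_limit, which get_bad_peb_limit computed earlier (an indirect link through ubi->bad_peb_limit, not a direct call)

get_bad_peb_limit calls 3 other functions to compute bad_peb_limit

ubi_eba_init assigns beb_rsvd_level to beb_rsvd_pebs

ubi_attach_mtd_dev prints beb_rsvd_pebs to the console

It is a little convoluted, but the sections below walk through each step in turn, following the arrows from bottom to top.

ubi_eba_init

ubi_eba_init initializes the EBA subsystem from the attach information. We don't care about most of it; only the following short fragment matters here.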

int ubi_eba_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
{
    /* part of the code omitted */

    if (ubi->bad_allowed) {
        ubi_calculate_reserved(ubi);

        if (ubi->avail_pebs < ubi->beb_rsvd_level) {
            /* No enough free physical eraseblocks */
            ubi->beb_rsvd_pebs = ubi->avail_pebs;
            print_rsvd_warning(ubi, ai);
        } else
            ubi->beb_rsvd_pebs = ubi->beb_rsvd_level;

        ubi->avail_pebs -= ubi->beb_rsvd_pebs;
        ubi->rsvd_pebs += ubi->beb_rsvd_pebs;
    }

    /* part of the code omitted */
}

When bad blocks are allowed and there are enough available PEBs, beb_rsvd_pebs equals beb_rsvd_level:

ubi->beb_rsvd_pebs = ubi->beb_rsvd_level;

OK, so the next question is where beb_rsvd_level comes from. Read on.

ubi_calculate_reserved

As noted above, beb_rsvd_level is computed by the following function.

/**
 * ubi_calculate_reserved - calculate how many PEBs must be reserved for bad
 * eraseblock handling.
 * @ubi: UBI device description object
 */
void ubi_calculate_reserved(struct ubi_device *ubi)
{
    /*
     * Calculate the actual number of PEBs currently needed to be reserved
     * for future bad eraseblock handling.
     */
    ubi->beb_rsvd_level = ubi->bad_peb_limit - ubi->bad_peb_count;
    if (ubi->beb_rsvd_level < 0) {
        ubi->beb_rsvd_level = 0;
        ubi_warn("number of bad PEBs (%d) is above the expected limit (%d), not reserving any PEBs for bad PEB handling, will use available PEBs (if any)",
                 ubi->bad_peb_count, ubi->bad_peb_limit);
    }
}

The essence of this function is a single line: beb_rsvd_level equals the bad block limit bad_peb_limit minus the number of bad blocks detected so far, bad_peb_count.

ubi->beb_rsvd_level = ubi->bad_peb_limit - ubi->bad_peb_count;

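The number of detected bad blocks depends on the actual hardware, so we won't dig into it. Next we trace where bad_peb_limit comes from.

get_bad_peb_limit

get_bad_peb_limit is the function that computes the bad block limit. A comment in it explains, roughly, that bad blocks cannot be assumed to be evenly spread over the whole flash chip, and that in the worst case all of the chip's bad blocks fall inside the MTD partition attached to UBI. The limit is therefore computed from the size of the entire flash chip.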

static int get_bad_peb_limit(const struct ubi_device *ubi, int max_beb_per1024)
{
    int limit, device_pebs;
    uint64_t device_size;

    if (!max_beb_per1024)
        return 0;

    /*
     * Here we are using size of the entire flash chip and
     * not just the MTD partition size because the maximum
     * number of bad eraseblocks is a percentage of the
     * whole device and bad eraseblocks are not fairly
     * distributed over the flash chip. So the worst case
     * is that all the bad eraseblocks of the chip are in
     * the MTD partition we are attaching (ubi->mtd).
     */
    device_size = mtd_get_device_size(ubi->mtd);
    device_pebs = mtd_div_by_eb(device_size, ubi->mtd);
    limit = mult_frac(device_pebs, max_beb_per1024, 1024);

    /* Round it up */
    if (mult_frac(limit, 1024, max_beb_per1024) < device_pebs)
        limit += 1;

    return limit;
}

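Three functions are used here:

mtd_get_device_size - returns the size of the entire flash chip

mtd_div_by_eb - converts the flash size into a number of eraseblocks, that is, from bytes to PEBs

mult_frac - multiplies by a fraction; here it scales the PEB count by the bad block ratio max_beb_per1024 / 1024

The first two are straightforward. mult_frac deserves a closer look: it is actually a macro for multiplying an integer by a fraction.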

/*
 * Multiplies an integer by a fraction, while avoiding unnecessary
 * overflow or loss of precision.
 */
#define mult_frac(x, numer, denom)(                     \
{                                                       \
    typeof(x) quot = (x) / (denom);                     \
    typeof(x) rem  = (x) % (denom);                     \
    (quot * (numer)) + ((rem * (numer)) / (denom));     \
}                                                       \
)

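As an example, assume the flash is 128 MiB (134,217,728 bytes). The max_beb_per1024 argument of get_bad_peb_limit comes from the kernel config (CONFIG_MTD_UBI_BEB_LIMIT) and defaults to 20, meaning that at most 20 bad blocks are expected per 1024 PEBs. The limit then works out as follows:

device_size = 134217728; /* flash size: 128 MiB */
device_pebs = 134217728 / (128 * 1024) = 1024; /* eraseblock: 128 KiB */
limit = mult_frac(device_pebs, max_beb_per1024, 1024)
      = (1024 / 1024) * 20 + ((1024 % 1024) * 20) / 1024
      = 20;

The resulting bad_peb_limit is 20 PEBs. With no bad blocks detected (bad_peb_count is 0), beb_rsvd_level and beb_rsvd_pebs are also 20, which matches the output of ubinfo -a.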

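Summary

This article started from the console log to analyze the UBI configuration, then traced the source of the UBI information and the calculation of the bad block reserve through user space and kernel space. The calculation of the bad block reserve beb_rsvd_pebs boils down to: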

/* get_bad_peb_limit */

device_size = mtd_get_device_size(ubi->mtd);

device_pebs = mtd_div_by_eb(device_size, ubi->mtd);

limit = mult_frac(device_pebs, max_beb_per1024, 1024);

ubi->bad_peb_limit = get_bad_peb_limit(ubi, max_beb_per1024);

/* ubi_calculate_reserved */

ubi->beb_rsvd_level = ubi->bad_peb_limit - ubi->bad_peb_count;

/* ubi_eba_init */

ubi->beb_rsvd_pebs = ubi->beb_rsvd_level;

References

https://elixir.bootlin.com/linux/v3.14.77/source/drivers/mtd/ubi

http://git.infradead.org/mtd-utils.git

Author: litreily

Link: https://www.litreily.top/2020/05/07/ubi-driver/

Copyright: unless otherwise stated, all articles on this blog are licensed under CC BY-NC 4.0. Please credit litreily's blog when reposting!