
Debugging the kernel using Ftrace - part 2

December 22, 2009

This article was contributed by Steven Rostedt


The Ftrace tracing utility has many different features that will assist in tracking down Linux kernel problems. The previous article discussed setting up Ftrace, using the function and function graph tracers, using trace_printk(), and a simple way to stop the recording of a trace from user space. This installment will touch on how user space can interact with Ftrace, faster ways of stopping the trace, debugging a crash, and finding what kernel functions are the biggest stack hogs.

Trace Markers

Seeing what happens inside the kernel gives the user a better understanding of how their system works. But sometimes there needs to be coordination between what is happening in user space and what is happening inside the kernel. The timestamps that are shown in the traces are all relative to what is happening within the trace, but they do not correspond well with wall time.

To help synchronize between the actions in user space and kernel space, the trace_marker file was created. It provides a way to write into the Ftrace ring buffer from user space. This marker will then appear in the trace to give a location in the trace of where a specific event occurred.

    [tracing]# echo hello world > trace_marker
    [tracing]# cat trace
    # tracer: nop
    #
    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |       |          |         |
               <...>-3718  [001]  5546.183420: 0: hello world

The <...> indicates that the name of the task that wrote the marker was not recorded. Future releases may fix this.

Starting, Stopping and Recording in a Program

The tracing_on and trace_marker files work very well to trace the activities of an application if the source of the application is available. If there is a problem within the application and you need to find out what is happening inside the kernel at a particular location of the application, these two files come in handy.

At the start of the application, you can open these files to have the file descriptors ready:

    #include <fcntl.h>      /* open(), O_WRONLY */
    #include <unistd.h>     /* write() */
    #include <string.h>     /* strcpy(), strcat() */

    int trace_fd = -1;
    int marker_fd = -1;

    int main(int argc, char **argv)
    {
	    char *debugfs;
	    char path[256];
	    [...]

	    debugfs = find_debugfs();
	    if (debugfs) {
		    strcpy(path, debugfs);  /* BEWARE buffer overflow */
		    strcat(path,"/tracing/tracing_on");
		    trace_fd = open(path, O_WRONLY);
		    if (trace_fd >= 0)
			    write(trace_fd, "1", 1);

		    strcpy(path, debugfs);
		    strcat(path,"/tracing/trace_marker");
		    marker_fd = open(path, O_WRONLY);

Then, at some critical location in the code, markers can be placed to show where the application currently is:

    if (marker_fd >= 0)
	    write(marker_fd, "In critical area\n", 17);

    if (critical_function() < 0) {
	    /* we failed! */
	    if (trace_fd >= 0)
		    write(trace_fd, "0", 1);
    }

Looking at the example, first note the call to find_debugfs(). The proper place to mount the debug file system is /sys/kernel/debug, but a robust tool should not depend on debugfs being mounted there, so find_debugfs() locates wherever it actually is mounted. The file descriptors are initialized to -1 so that this code works both with and without a tracing-enabled kernel.
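The example find_debugfs() that accompanied the article is not reproduced here, but a minimal sketch of such a helper, which scans /proc/mounts for a debugfs entry, might look like the following (illustrative only, not the exact code distributed with the article):

    #include <stdio.h>
    #include <string.h>

    /*
     * Sketch of find_debugfs(): scan /proc/mounts for a file system of
     * type "debugfs" and return its mount point, or NULL if none is
     * found.
     */
    static char *find_debugfs(void)
    {
	    static char mount_point[256];
	    char type[100];
	    FILE *fp;

	    fp = fopen("/proc/mounts", "r");
	    if (!fp)
		    return NULL;

	    /* Each line: device mount-point fstype options dump pass */
	    while (fscanf(fp, "%*s %255s %99s %*s %*d %*d\n",
			  mount_point, type) == 2) {
		    if (strcmp(type, "debugfs") == 0) {
			    fclose(fp);
			    return mount_point;
		    }
	    }

	    fclose(fp);
	    return NULL;
    }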

When the problem is detected, writing the ASCII character "0" into the trace_fd file descriptor stops tracing. As discussed in part 1, this only disables the recording into the Ftrace ring buffer, but the tracers are still incurring overhead.
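As part 1 discussed, that remaining overhead goes away only when the tracer itself is removed, for example:

    [tracing]# echo nop > current_tracer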

With the initialization code above, tracing is enabled at the start of the application. Because the tracer runs in overwrite mode, when the trace buffer fills up it discards the oldest data and replaces it with the new. Since only the most recent trace information is relevant when the problem occurs, there is no need to stop and start the tracing during the normal running of the application. The tracer only needs to be disabled when the problem is detected, so that the trace retains the history of what led up to the error. If interval tracing is needed within the application, it can write an ASCII "1" into trace_fd to re-enable the tracing.
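For example, a sketch of interval tracing around a single suspect region might look like this (critical_section() is a made-up placeholder for the application code being investigated):

    /* Enable recording just before the region of interest... */
    if (trace_fd >= 0)
	    write(trace_fd, "1", 1);

    critical_section();		/* hypothetical application code */

    /* ...and disable it right after, so the ring buffer holds only
     * the events from this interval. */
    if (trace_fd >= 0)
	    write(trace_fd, "0", 1);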

Here is an example of a simple program called simple_trace.c that uses the initialization process described above:

    /* req is a struct timespec: sleep for 1000 nanoseconds */
    req.tv_sec = 0;
    req.tv_nsec = 1000;
    write(marker_fd, "before nano\n", 12);
    nanosleep(&req, NULL);
    write(marker_fd, "after nano\n", 11);
    write(trace_fd, "0", 1);

(No error checking was added because this is a simple program intended only as an example.)

Here is the process to trace this simple program:

    [tracing]# echo 0 > tracing_on
    [tracing]# echo function_graph > current_tracer
    [tracing]# ~/simple_trace
    [tracing]# cat trace

The first line disables tracing because the program will enable it at start up. Next, the function graph tracer is selected, and then the program is executed, which results in the following trace. Note that the output can be a little verbose, so much of it has been cut and replaced with [...]:

    [...]
     0)               |      __kmalloc() {
     0)   0.528 us    |        get_slab();
     0)   2.271 us    |      }
     0)               |      /* before nano */
     0)               |      kfree() {
     0)   0.475 us    |        __phys_addr();
     0)   2.062 us    |      }
     0)   0.608 us    |      inotify_inode_queue_event();
     0)   0.485 us    |      __fsnotify_parent();
    [...]
     1)   0.523 us    |          _spin_unlock();
     0)   0.495 us    |    current_kernel_time();
     1)               |          it_real_fn() {
     0)   1.602 us    |  }
     1)   0.728 us    |            __rcu_read_lock();
     0)               |  sys_nanosleep() {
     0)               |    hrtimer_nanosleep() {
     0)   0.526 us    |      hrtimer_init();
     1)   0.418 us    |            __rcu_read_lock();
     0)               |      do_nanosleep() {
     1)   1.114 us    |            _spin_lock_irqsave();
    [...]
     0)               |      __kmalloc() {
     1)   2.760 us    |  }
     0)   0.556 us    |        get_slab();
     1)               |  mwait_idle() {
     0)   1.851 us    |      }
     0)               |      /* after nano */
     0)               |      kfree() {
     0)   0.486 us    |        __phys_addr();

Notice that the writes to trace_marker show up as comments in the function graph tracer.

The first column here represents the CPU. When the traces from different CPUs are interleaved like this, the trace may become hard to read. The output can easily be filtered with grep, or the per_cpu trace files may be used; these files are located in the debugfs tracing directory under per_cpu.

    [tracing]# ls per_cpu
    cpu0  cpu1  cpu2  cpu3  cpu4  cpu5  cpu6  cpu7

Each of these CPU directories contains a trace file that shows only the trace for that CPU.

To get a nice view of the function graph tracer output without interference from other CPUs, just look at per_cpu/cpu0/trace.

    [tracing]# cat per_cpu/cpu0/trace
     0)               |      __kmalloc() {
     0)   0.528 us    |        get_slab();
     0)   2.271 us    |      }
     0)               |      /* before nano */
     0)               |      kfree() {
     0)   0.475 us    |        __phys_addr();
     0)   2.062 us    |      }
     0)   0.608 us    |      inotify_inode_queue_event();
     0)   0.485 us    |      __fsnotify_parent();
     0)   0.488 us    |      inotify_dentry_parent_queue_event();
     0)   1.106 us    |      fsnotify();
    [...]
     0)   0.721 us    |    _spin_unlock_irqrestore();
     0)   3.380 us    |  }
     0)               |  audit_syscall_entry() {
     0)   0.495 us    |    current_kernel_time();
     0)   1.602 us    |  }
     0)               |  sys_nanosleep() {
     0)               |    hrtimer_nanosleep() {
     0)   0.526 us    |      hrtimer_init();
     0)               |      do_nanosleep() {
     0)               |        hrtimer_start_range_ns() {
     0)               |          __hrtimer_start_range_ns() {
     0)               |            lock_hrtimer_base() {
     0)   0.866 us    |              _spin_lock_irqsave();
    [...]
     0)               |      __kmalloc() {
     0)   0.556 us    |        get_slab();
     0)   1.851 us    |      }
     0)               |      /* after nano */
     0)               |      kfree() {
     0)   0.486 us    |        __phys_addr();

Disabling the Tracer Within the Kernel

During the development of a kernel driver, strange errors can turn up during testing. Perhaps the driver gets stuck in a sleep state and never wakes up. Trying to disable the tracer from user space when such a kernel event occurs is difficult, and usually results in the ring buffer overflowing and losing the relevant information before the user can stop the trace.

There are two functions that work well inside the kernel: tracing_on() and tracing_off(). These two act just like echoing "1" or "0" respectively into the tracing_on file. If there is some condition that can be checked for inside the kernel, then the tracer may be stopped by adding something like the following:

    if (test_for_error())
	    tracing_off();

Next, add several trace_printk()s (see part 1), recompile, and boot the kernel. You can then enable the function or function graph tracer and just wait for the error condition to happen. Examining the tracing_on file will let you know when the error condition occurred. It will switch from "1" to "0" when the kernel calls tracing_off().
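As a rough illustration only (the device structure, state field, and helper below are made up, not taken from the article), such instrumentation inside a driver might end up looking something like this:

    #include <linux/kernel.h>	/* trace_printk(), tracing_off() */

    /* Hypothetical driver routine being debugged. */
    static void my_driver_work(struct my_dev *dev)
    {
	    trace_printk("starting work, state=%d\n", dev->state);

	    do_the_work(dev);		/* hypothetical helper */

	    trace_printk("finished work, state=%d\n", dev->state);

	    /*
	     * If the device ended up in a bad state, freeze the trace so
	     * the ring buffer keeps the history that led here.
	     */
	    if (dev->state == MY_DEV_STATE_BROKEN)
		    tracing_off();
    }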

After examining the trace, or saving it off in another file with:

    [tracing]# cat trace > ~/trace.sav

you can continue tracing to examine another hit. To do so, just echo "1" into tracing_on and the trace will continue. This is also useful if the condition that triggers the tracing_off() call can also be triggered legitimately. If the condition was hit by normal operation, just restart the trace by echoing a "1" back into tracing_on; with luck, the next time the condition is hit it will be because of the abnormality.
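To restart the trace after saving it, for example:

    [tracing]# echo 1 > tracing_on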

ftrace_dump_on_oops

There are times that the kernel will crash and examining the memory and state of the crash is more of a CSI science than a program debugging science. Using kdump/kexec with the crash utility is a valuable way to examine the state of the system at the point of the crash, but it does not let you see what has happened prior to the event that caused the crash.

With Ftrace configured, adding ftrace_dump_on_oops to the kernel boot parameters, or echoing a "1" into /proc/sys/kernel/ftrace_dump_on_oops, will cause Ftrace to dump the entire trace buffer to the console in ASCII format when an oops or panic occurs. Having the console output go to a serial log makes debugging crashes much easier, since you can then trace back the events that led up to the crash.
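For example, to enable it at run time:

    [tracing]# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops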

Dumping to the console may take a long time since the default Ftrace ring buffer is over a megabyte per CPU. To shrink the size of the ring buffer, write the number of kilobytes you want the ring buffer to be to buffer_size_kb. Note that the value is per CPU, not the total size of the ring buffer.

    [tracing]# echo 50 > buffer_size_kb

The above shrinks the Ftrace ring buffer down to 50 kilobytes per CPU.

You can also trigger a dump of the Ftrace buffer to the console with sysrq-z.
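If the magic-sysrq keystroke is not convenient (over a remote console, for instance), the same dump can normally be requested through the sysrq-trigger file, assuming sysrq support is enabled:

    [tracing]# echo z > /proc/sysrq-trigger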

To dump the trace from a particular location in the kernel, the kernel code may call ftrace_dump() directly. Note that this may permanently disable Ftrace, and a reboot may be necessary to enable it again. The reason is that ftrace_dump() reads the buffer: the buffer is made to be written to in all contexts (interrupt, NMI, scheduling), but reading it requires locking. To make ftrace_dump() usable in any context, the locking is bypassed, so the buffer may end up corrupted after the output.

    /*
     * The following code will lock up the box, so we dump out the
     * trace before we hit that location.
     */
    ftrace_dump();

    /* code that locks up */

Stack Tracing

The final topic to discuss is the ability to examine the size of the kernel stack and how much stack space each function is using. Enabling the stack tracer (CONFIG_STACK_TRACER) will show where the biggest use of the stack takes place.
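In the kernel configuration, that is simply:

    CONFIG_STACK_TRACER=y

which also pulls in the function tracer infrastructure it is built on.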

The stack tracer is built from the function tracer infrastructure. It does not use the Ftrace ring buffer, but it does use the function tracer to hook into every function call. Because it uses the function tracer infrastructure, it does not add overhead when not enabled. To enable the stack tracer, echo 1 into /proc/sys/kernel/stack_tracer_enabled. To see the max stack size during boot up, add "stacktrace" to the kernel boot parameters.

The stack tracer checks the size of the stack at every function call. If it is greater than the last recorded maximum, it records the stack trace and updates the maximum with the new size. To see the current maximum, look at the stack_max_size file.

    [tracing]# echo 1 > /proc/sys/kernel/stack_tracer_enabled
    [tracing]# cat stack_max_size
    2928
    [tracing]# cat stack_trace
            Depth    Size   Location    (34 entries)
            -----    ----   --------
      0)     2952      16   mempool_alloc_slab+0x15/0x17
      1)     2936     144   mempool_alloc+0x52/0x104
      2)     2792      16   scsi_sg_alloc+0x4a/0x4c [scsi_mod]
      3)     2776     112   __sg_alloc_table+0x62/0x103
    [...]
     13)     2072      48   __elv_add_request+0x98/0x9f
     14)     2024     112   __make_request+0x43e/0x4bb
     15)     1912     224   generic_make_request+0x424/0x471
     16)     1688      80   submit_bio+0x108/0x115
     17)     1608      48   submit_bh+0xfc/0x11e
     18)     1560     112   __block_write_full_page+0x1ee/0x2e8
     19)     1448      80   block_write_full_page_endio+0xff/0x10e
     20)     1368      16   block_write_full_page+0x15/0x17
     21)     1352      16   blkdev_writepage+0x18/0x1a
     22)     1336      32   __writepage+0x1a/0x40
     23)     1304     304   write_cache_pages+0x241/0x3c1
     24)     1000      16   generic_writepages+0x27/0x29
    [...]
     30)      424      64   bdi_writeback_task+0x3f/0xb0
     31)      360      48   bdi_start_fn+0x76/0xd7
     32)      312     128   kthread+0x7f/0x87
     33)      184     184   child_rip+0xa/0x20

Not only does this give you the size of the maximum stack found, it also shows the breakdown of the stack sizes used by each function. Notice that write_cache_pages had the biggest stack with 304 bytes being used, followed by generic_make_request with 224 bytes of stack.

To reset the maximum, echo "0" into the stack_max_size file.

    [tracing]# echo 0 > stack_max_size

Keeping this running for a while will show where the kernel is using a bit too much stack. But remember that the stack tracer has no overhead only while it is disabled; when it is running, you may notice a bit of a performance degradation.

Note that the stack tracer will not trace the max stack size when the kernel is using a separate stack. Because interrupts have their own stack, it will not trace the stack usage there. The reason is that currently there is no easy way to quickly see what the top of the stack is when the stack is something other than the current task's stack. When using split stacks, a process stack may be two pages but the interrupt stack may only be one. This may be fixed in the future, but keep this in mind when using the stack tracer.

Conclusion

Ftrace is a very powerful tool and easy to configure; no extra tools are necessary. Everything shown in this tutorial can be used on embedded devices that only have BusyBox installed. Taking advantage of the Ftrace infrastructure should cut the time needed to debug that hard-to-find race condition. I seldom use printk() any more, because the function and function graph tracers, along with trace_printk() and tracing_off(), have become my main tools for debugging the Linux kernel.


Index entries for this article
Kernel: Development tools/Kernel tracing
Kernel: Ftrace
Kernel: Tracing/Ftrace
GuestArticles: Rostedt, Steven





Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
