Bad blocks in NAND - buildroot-2015.05 - ubifs read-only mode

  • Hello,


    I have the buildroot-2015.05 version with ubifs in read-only mode. Nothing is ever written to NAND memory. It is 100% read-only at all times.

    But after 1 or 2 months of operation, more than 25% of the machines (15 of 44) have a corrupted block in the NAND.

    On the old F&S boards, before changing the NAND chip because of the component crisis, this problem did not occur.


    I attach boot logs.


    Does anyone know what may be the cause of this problem? It never writes to the NAND.

    I need help because these machines are used for medical treatment and are already approved, and customers are withdrawing orders.


    Thank you.

  • This is really strange. Is the log from the same board as the picture? Because in the picture, block 141 is bad, but in the log file it is block 147.


    The error message in the log shows error 1 in ubi_io_read(), which means that mtd_read() returned 1. Errors are typically negative values, so a positive value is rather strange. I have traced the code back and, if I'm not mistaken, the return value comes from the function nand_do_read_ops() in drivers/mtd/nand/nand_base.c. But when I look at the current code, I cannot see how it could ever return a value of 1.


    Which release is your kernel based on? Kernel 3.0.35 is either release fsimx6-V2.0 or fsimx6-V2.1. There was one fix in the NAND driver between these two versions that could actually explain this error if you are still on fsimx6-V2.0. So can you have a look at the function nand_do_read_ops() in drivers/mtd/nand/nand_base.c? There is a comment line


    /* Transfer not aligned data */


    Please check if the line above this comment shows


    ret = 0;


    If not, then please insert this line before the comment line, recompile the kernel and install it on the board in U-Boot. Maybe this will solve your problem. Just install the kernel, then it should be capable of reading your UBI even with the bad block.


    Your F&S Support Team

    F&S Elektronik Systeme GmbH
    As this is an international forum, please try to post in English.
    Da dies ein internationales Forum ist, bitten wir darum, Beiträge möglichst in Englisch zu verfassen.

  • Sorry,

    the image belongs to another failed board.


    drivers/mtd/nand/nand_base.c. Comment: /* Transfer not aligned data */




    We use:


    NBOOT version: F&S NAND Loader VN29 build Aug 9 2025

    U-Boot version: U-Boot 2014.07 (Dec 23 2015 - 19:21:35) for F&S

    Kernel: Linux-3.0.35-F+S

    Buildroot: buildroot-2015.05-fsimx6-V2.1


    Do you know what might be happening?


    Thank you

  • Yes, this is the code from V2.0. This would explain the problem. The code in V2.1 looks like this:



    Please add this line, recompile the kernel and try again. The problem should disappear.


    Your F&S Support Team

    F&S Elektronik Systeme GmbH

  • NAND flash is not ROM. After many millions of read cycles, the charge in the NAND cells has decreased considerably and single bits may flip. Typically, the ECC can correct these bit flips, so they do not matter. The code we are looking at deals with exactly this case: there are bit flips in a page, but ECC can correct them. So the function should detect the bit flips, but as long as their number stays under a certain threshold, it still reports the page as OK (error 0). If the number exceeds this threshold, the page read returns the specific error -EUCLEAN. This means the page had bit flips that could be corrected, but the upper layer should take care of it, for example by refreshing the page. Refreshing means the page is erased and written again. Then all NAND cells in this page have their full charge again and all bits should read correctly afterwards.
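
    The threshold behaviour described above can be sketched roughly as follows. This is only an illustration, not the driver code: page_status_model() and its threshold parameter are hypothetical names, and treating a count exactly equal to the threshold as "needs refresh" is an assumption.

```c
#include <assert.h>
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux errno value; fallback for non-Linux hosts */
#endif

/*
 * Hypothetical helper, not a kernel function: map the number of bit flips
 * that ECC corrected in a page to the status reported to the caller.
 */
static int page_status_model(unsigned int corrected_bitflips,
			     unsigned int threshold)
{
	assert(threshold > 0);	/* a zero threshold would flag every read */

	/* Assumption: reaching the threshold already requests a refresh. */
	if (corrected_bitflips >= threshold)
		return -EUCLEAN;	/* data is good, but refresh the page */

	return 0;			/* data is good, nothing to do */
}
```

    In both cases the data handed to the caller is valid. The return code only tells the upper layer whether it should rewrite the data soon.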


    Of course, real I/O errors must also be reported to the caller. This is the if (ret < 0) check. If there is a real error, indicated by a negative value, the loop breaks and the function returns immediately. Normally, the return code is zero in all other cases. But not here: the read functions return the number of flipped bits, so even a single flipped bit is returned as 1. This is harmless if it happens in a page somewhere in the middle of the loop, because the next page read overwrites ret. But if the flipped bit is in the last page handled by the loop, this value 1 is passed through to the end of the function and is finally returned. There it has the meaning of "one uncorrectable bit flip", which is wrong information. So the upper UBI layer (correctly) handles this as a real error, which it isn't. That is why we need to set ret to zero after the if comparison.
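
    To make the effect easier to see, here is a small stand-alone model of the loop. It is a sketch and not the code from nand_base.c: read_pages_model(), caller_sees_failure() and the apply_fix switch are hypothetical names, and the caller-side rule (anything other than 0 or -EUCLEAN counts as a failure) is a simplifying assumption about the upper layer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux errno value; fallback for non-Linux hosts */
#endif

/*
 * Simplified model of the loop described above, NOT the kernel source.
 * All names here are hypothetical. page_results[i] stands for what the
 * page read returns for page i: a negative errno for a real I/O error,
 * otherwise the number of bit flips that ECC corrected.
 */
static int read_pages_model(const int *page_results, size_t npages,
			    int apply_fix)
{
	int ret = 0;
	size_t i;

	assert(page_results != NULL || npages == 0);

	for (i = 0; i < npages; i++) {
		ret = page_results[i];
		if (ret < 0)
			break;		/* real I/O error: stop right away */

		if (apply_fix)
			ret = 0;	/* corrected bit flips are no error */

		/* ... the page data would be copied to the caller here ... */
	}

	return ret;	/* without the fix, the last page's count leaks out */
}

/*
 * Assumed caller-side view: 0 is fine, -EUCLEAN asks for a refresh, and
 * any other nonzero value (including a stray positive 1) is a failure.
 */
static int caller_sees_failure(int ret)
{
	return ret != 0 && ret != -EUCLEAN;
}
```

    With the page results { 0, 0, 1 } (one corrected bit flip in the last page), read_pages_model() returns 1 without the fix and 0 with it. With { 0, 1, 0 } it returns 0 either way, because the next page overwrites the value. A real error such as -EIO is passed through in both cases.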


    So yes, it does affect you, because there are also bit flips in read-only scenarios.


    Your F&S Support Team

    F&S Elektronik Systeme GmbH