Bad eraseblock on boot armStoneA9 in fsimx6-B2024.04

  • Hi,


    fsimx6-B2024.04 works fine on the armStoneA9, but we are running power-on/power-off tests and one of our 10 machines has a problem at startup. We don't know whether this particular armStoneA9 is no longer in good condition or whether there is another cause.


    The problem is that it fails to boot about 50% of the time: it gets stuck during kernel boot, reporting that ALL the NAND blocks are corrupt. If we turn the machine off and on again, it starts normally, until the same problem occurs again at a later startup.


    Boot logs: https://drive.google.com/uc?ex…VGq7H32c91SkMtojFrYLHKQBr


    In the boot logs, the failed boot is the last one in the file; all the boots before it are successful.


    Do you know if the problem could be due to this particular armStoneA9?




  • The successful boots also report one bad block in the log, "Bad eraseblock 484 at 0x000003c80000", but I do not think that this causes the problem. I believe the problem is the DMA timeout. In the failing boots, Linux fails to initialize the NAND, most probably because of a problem in the GPMI peripheral (e.g. NAND DMA or NAND BCH engine). Access to NAND is done by setting up a sequence of DMA requests and then executing that sequence, which performs the real transfer from/to NAND flash. Here, the DMA sequence does not report success (or even completion). In other words, all accesses to NAND fail.
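
    To tell the two situations apart in a captured boot log, a small script along the following lines may help. It is only a sketch: it assumes the kernel prints bad blocks in the format quoted above and that the failing boot contains the text "DMA timeout" somewhere (the exact wording of that message is an assumption). The function name and the threshold of 16 blocks are hypothetical and not taken from any F&S tool.

```python
import re

# Matches the kernel message quoted above, e.g.
# "Bad eraseblock 484 at 0x000003c80000".
BAD_BLOCK_RE = re.compile(r"Bad eraseblock (\d+) at 0x([0-9a-fA-F]+)")


def triage_boot_log(log_text, mass_failure_threshold=16):
    """Summarize the NAND-related messages of a single boot.

    mass_failure_threshold is an arbitrary, hypothetical cut-off: a boot
    that reports at least this many bad blocks is flagged as suspicious.
    """
    bad_blocks = {
        int(index): int(offset, 16)
        for index, offset in BAD_BLOCK_RE.findall(log_text)
    }
    # Assumption: the failing boot logs the words "DMA timeout" somewhere.
    dma_timeout = re.search(r"DMA timeout", log_text, re.IGNORECASE) is not None

    # The erase block size can be inferred from any non-zero block index,
    # because offset = index * erase_block_size.
    erase_block_size = None
    for index, offset in bad_blocks.items():
        if index > 0:
            erase_block_size = offset // index
            break

    return {
        "bad_blocks": bad_blocks,
        "dma_timeout": dma_timeout,
        "erase_block_size": erase_block_size,
        "suspect_controller": dma_timeout
        or len(bad_blocks) >= mass_failure_threshold,
    }


if __name__ == "__main__":
    sample = "nand: Bad eraseblock 484 at 0x000003c80000\n"
    print(triage_boot_log(sample))
```

    For the single bad block quoted above, the offset divided by the block index gives 131072 bytes, which suggests an erase block size of 128 KiB on this NAND. A boot that reports only this one block would not be flagged, while a boot with a DMA timeout or a long run of bad blocks would be.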


    At the beginning, the first task of the NAND driver is to build a map of all bad blocks in RAM. This is done by accessing the first page of each block in NAND. But as every access fails, the algorithm seems to mark every block as bad in the RAM table. This results in all those bad blocks that you see in the failed boot. No bad block markers are written to the NAND flash itself; the blocks are only flagged as bad in the RAM table. When NAND access works again in a later boot, only the one real bad block is reported again.
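
    The effect described above can be illustrated with a small simulation. This is not the Linux MTD code, only a sketch of the idea: the scan reads the first page of every block, and the assumption made here (based on what the log suggests) is that a failed read is treated the same as a bad block marker. All names and the block count of 1024 are hypothetical.

```python
def build_bad_block_table(num_blocks, read_first_page):
    """Return the set of block indices that the scan treats as bad.

    read_first_page(block) is a hypothetical callback that returns True if
    the block carries a bad block marker, False if it is good, and raises
    IOError if the access itself fails (e.g. a DMA timeout).
    """
    bad = set()
    for block in range(num_blocks):
        try:
            marked_bad = read_first_page(block)
        except IOError:
            # Assumed behaviour: a block that cannot be read is put into
            # the RAM table as bad. Nothing is written to the flash.
            marked_bad = True
        if marked_bad:
            bad.add(block)
    return bad


def make_reader(real_bad_blocks, controller_ok):
    """Build a simulated NAND reader for the scan above."""
    def read_first_page(block):
        if not controller_ok:
            raise IOError("simulated DMA timeout")
        return block in real_bad_blocks
    return read_first_page


if __name__ == "__main__":
    healthy = build_bad_block_table(1024, make_reader({484}, True))
    broken = build_bad_block_table(1024, make_reader({484}, False))
    print(sorted(healthy), len(broken))
```

    With a working controller only the one real bad block ends up in the table. With a controller where every access fails, all blocks end up in the table, although the flash content is identical in both runs.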


    Unfortunately I have no idea why the initialization in Linux fails. Initialization in U-Boot did work, otherwise it could not have loaded and started Linux in the first place. Have you changed anything in the MTD/NAND code in your Linux version? Did you change any pad settings (driver strengths or similar) in the device tree regarding NAND flash? Or did you add a Realtime patch to Linux? Anything else?


    Your F&S Support Team

    F&S Elektronik Systeme GmbH

  • No, I have not changed anything. My Buildroot image works perfectly on the other machines; I only have the problem on this one particular machine.


    I think it is likely that this armStoneA9 is damaged. What do you think?


    In case this happens in the future on a different armStoneA9, would it be a good idea to activate the U-Boot/Linux watchdog?