NetDCU14/NetDCU10: Operating System Installation Damaged

  • Hello


    We have some issues with various NetDCU10 and NetDCU14 boards. They have been observed on about 10 units out of 200.


    We believe the system failures can be traced to corrupt OS files (e.g. the Windows kernel and/or .dll files). The following facts lead us to this conclusion:
    - Resetting the registry to factory defaults never solves the issues
    - Reinstalling our application's executable does not help
    - The only thing that helps is re-flashing the Windows kernel
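    One way to confirm the "corrupt files" hypothesis on a failing unit, rather than inferring it from what fixes the problem, is to compare the files against a manifest of hashes taken from a known-good unit. The sketch below is a generic, hypothetical helper (not an F&S tool); it assumes you can copy the files from the device to a PC, and that you build and store the reference manifest off-device so it cannot be corrupted along with the flash.

```python
import hashlib
import os
from typing import Dict, List


def sha256_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map every file below root (as a relative path) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of_file(full)
    return manifest


def compare_manifests(reference: Dict[str, str],
                      current: Dict[str, str]) -> Dict[str, List[str]]:
    """List files that are missing, modified, or unexpected versus a known-good reference."""
    missing = sorted(p for p in reference if p not in current)
    modified = sorted(p for p in reference
                      if p in current and current[p] != reference[p])
    extra = sorted(p for p in current if p not in reference)
    return {"missing": missing, "modified": modified, "extra": extra}
```

    Running `compare_manifests(reference, build_manifest(copied_dir))` on a failed unit would show whether, for example, msxml.dll actually differs from the known-good copy.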


    We use these versions:
    - XIPV210_CE6Core_130605.bin (NetDCU14)
    - NK10_CF35_090603.bin (NetDCU10)


    We have observed the following symptoms:
    - msxml.dll was damaged; the XML parser could no longer be loaded.
    - The OS no longer booted; the NetDCU was stuck at our boot screen.
    - Our application no longer launched. The OS froze after calling CreateFile() with valid parameters.
    - The OS froze when accessing the SD card.
    - OS response time became very slow after a few minutes of operation.


    All these problems disappeared only after re-flashing the kernel.


    Are the following assumptions correct?
    - Is the OS stored in the same physical NAND flash as the FFSDISK?
    - Is the OS stored in an unmounted area of the NAND memory, and does the bootloader create a fresh copy of the OS on a Windows-mounted drive during boot-up?
    - Does this mean that Windows always boots with fresh OS files?


    At the moment we cannot locate the problem, but we have the following theories:
    a) A bug in the operating system or in a device driver corrupts random files on the NAND flash?
    b) Electromagnetic interference could change the NAND flash content?
    c) Memory could be corrupted by an improper shutdown of the NetDCU, e.g. a power-off during a memory access?


    Could you please comment on this and share your experience with similar cases?

  • Hello,


    Please send us the boards as an RMA and we will investigate. It is difficult to give an opinion remotely.


    Note: if power is lost while writing to flash, the FAT may be damaged! F3S catches such cases.


    Quote

    Are the following assumptions correct?
    - I believe that the OS is saved in the same physical NAND flash as the FFSDISK?
    - The OS is saved in a non-mounted space of the NAND memory and during boot-up, the bootloader creates a fresh copy of the OS to a Windows mounted Drive?
    - This means that Windows always boots with fresh OS files?

    Right.

    Quote


    At the moment we can not locate the problem, but we have the following theories:
    a) A bug in the Operating System or in a Device Driver messes up random files on the NAND flash?!
    b) Electromagnetic interference could change the NAND flash content?!
    c) Eventually memory can be corrupted by improper shutdown of the NetDCU, e.g. PowerOff during memory Access?

    Right, and it would be good to check your cards in detail.

    F&S Elektronik Systeme GmbH
    As this is an international forum, please try to post in English.

  • Hello,


    We have just released new system software with enhanced ECC (EBoot114, Kernel V109) for download.
    Do not hesitate to send us boards as an RMA or for investigation.

    F&S Elektronik Systeme GmbH

  • We are currently testing the new System SW with enhanced ECC, thanks.


    I have some more requests regarding this issue:


    Since we have delivered many devices with the old system software, we need a simple way to update the kernel and EBoot. Is there a way to update the system software without using the serial interface and the USBLoader? We would prefer a solution consisting of a Windows CE executable plus the binary flash images that automatically updates all system software.
    What options do we have for updating the NetDCU14?


    We also have the problem that our executable, which is the device firmware, gets damaged from time to time. Will the improved ECC also improve data integrity on the FFSDISK?


    Another question: if the ECC detects an error while loading the kernel, will the bootloader/kernel write the corrected version of the kernel back to the read-only kernel storage area? Otherwise errors can accumulate over time...

  • Hello,
    Regarding the ECC details, my colleague will answer on Monday.


    In general you can update NBoot, EBoot and the kernel from within the OS, but in this case I am not sure. Two things cannot be handled during such an update: repartitioning and internal changes to the memory layout. In both cases the flash has to be completely formatted.


    Which loader versions are installed?


    You can easily check in your office whether it works. In any case, your update software should check for and handle this before you send it to your customers.


    Update EBoot: ndcucfg command "boot write <filename>"
    Update NBoot: ndcucfg command "nboot write <filename>"
    Update kernel: we offer an "Update Tool" for updating the kernel and application. Contact sales@fs-net.de.
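    Because the images themselves travel through the same flash that is suspected of corrupting files, an update script should refuse to flash an image it cannot verify. The sketch below is a hypothetical pre-flash check: it assumes you ship a known CRC-32 for each image alongside it, and it only builds the ndcucfg command text quoted above. How that text is then passed to ndcucfg is not shown in this thread and is left out. The kernel update via the "Update Tool" is not covered.

```python
import os
import zlib


def crc32_of_file(path: str, chunk_size: int = 64 * 1024) -> int:
    """Compute the CRC-32 of a file incrementally."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def build_update_command(kind: str, path: str, expected_crc: int) -> str:
    """Return the ndcucfg command text for an EBoot/NBoot image, only if the image is intact.

    expected_crc is a hypothetical value you would distribute with the image.
    """
    commands = {"eboot": "boot write", "nboot": "nboot write"}
    if kind not in commands:
        raise ValueError(f"unsupported image kind: {kind}")
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise FileNotFoundError(path)
    actual = crc32_of_file(path)
    if actual != expected_crc:
        raise ValueError(f"CRC mismatch for {path}: {actual:08X} != {expected_crc:08X}")
    return f"{commands[kind]} {path}"
```

    Failing loudly on a mismatch keeps a damaged copy of ebootv210_114.nb0 from being written over a working bootloader.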


    PS: If one card has the same problem several times, you should send it as an RMA so that we can rule out a hardware problem.

    F&S Elektronik Systeme GmbH


  • Hello Marc,


    With the improved driver we handle the following flash memory problem.


    When flash memory is read many, many times, some bits may flip over a long period of time. These are called read disturbs. To avoid read errors caused by these toggled bits, NAND flash must be used with error correction (ECC). By default we use an ECC that can detect and correct up to 4 bit errors per 2K page. If a page shows many bitflips, the whole block is erased and rewritten. This is a known procedure for reverting bitflips caused by read disturbs.
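    The refresh strategy can be sketched as a simple policy: correct what the ECC can, and rewrite the block once a page needs "many" corrections, before it crosses the 4-bit limit. The threshold of 3 below is a hypothetical value chosen for illustration; the post does not say what threshold the F&S driver uses. The model tracks only bitflip counts per page and does not model a power failure during the refresh, which is noted next as unsafe.

```python
from enum import Enum
from typing import List, Tuple

ECC_CORRECTABLE_BITS = 4   # per 2K page, as stated above
REFRESH_THRESHOLD = 3      # hypothetical: refresh once a page needs >= 3 corrections


class ReadAction(Enum):
    OK = "ok"
    REFRESH_BLOCK = "refresh_block"
    UNCORRECTABLE = "uncorrectable"


def classify_page_read(bitflips: int,
                       threshold: int = REFRESH_THRESHOLD,
                       limit: int = ECC_CORRECTABLE_BITS) -> ReadAction:
    """Decide what to do after reading one page with `bitflips` flipped bits."""
    if bitflips < 0:
        raise ValueError("bitflip count cannot be negative")
    if bitflips > limit:
        return ReadAction.UNCORRECTABLE    # beyond ECC capability: data is lost
    if bitflips >= threshold:
        return ReadAction.REFRESH_BLOCK    # still correctable, but close to the limit
    return ReadAction.OK


def read_block(page_bitflips: List[int]) -> Tuple[List[int], ReadAction]:
    """Simulate reading all pages of one block; return the block state afterwards and the action."""
    actions = [classify_page_read(n) for n in page_bitflips]
    if ReadAction.UNCORRECTABLE in actions:
        return page_bitflips, ReadAction.UNCORRECTABLE  # refreshing cannot recover lost data
    if ReadAction.REFRESH_BLOCK in actions:
        # Erase the block and rewrite the ECC-corrected data: disturbed bits are gone.
        return [0] * len(page_bitflips), ReadAction.REFRESH_BLOCK
    return page_bitflips, ReadAction.OK
```

    The key design point is that the refresh must happen while every page is still correctable; once a page exceeds the ECC limit, rewriting the block only preserves the corrupted data.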


    Remember that this strategy is not safe if power fails during a block refresh.


    Quote

    Further we also have problems, that our Executable, which is the Device Firmware, gets damaged from time to time. Will the improved ECC also improve the Data Integrity on the FFSDISK?


    Yes, but only the data integrity of written pages (blocks). The driver does not handle atomic read/write file operations; this is a function of the file system. For example, our file system F3S guarantees atomic R/W operations.
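    At the application level, a common way to avoid half-written configuration files or executables is to write the new content to a temporary file, flush it, and then rename it over the old one. The sketch below shows that generic pattern; it is not how F3S works internally. It only gives an all-or-nothing result across a power loss if the underlying file system makes the rename itself atomic, which a plain FAT volume does not guarantee. That is exactly why the file system choice matters here.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` via write-to-temp + rename.

    Readers then see either the old or the new content, provided the file
    system performs the rename atomically.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same volume, otherwise the rename is a copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new data to storage before the swap
        os.replace(tmp_path, path)  # swap in the complete file in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)     # never leave a partial temp file behind
        raise
```

    Usage: `atomic_write("config.xml", new_bytes)` instead of opening config.xml for writing and overwriting it in place.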


    Quote

    Another question is, if the ECC detects an Error when loading the Kernel, if the Bootloader/Kernel will write the corrected version of the Kernel back to the Read-Only Kernel Storage area? Otherwise Errors can accumulate over time...


    Yes, in this case the page (block) is refreshed. "Read-only" is a property of the file system, not of the flash driver.

    F&S Elektronik Systeme GmbH

  • Thanks for the information. Unfortunately I have bad news:

    • There are more and more support cases. More and more systems delivered with the NetDCU14 are failing. All problems point to the data reliability of the flash memory. We face a NetDCU14 failure rate of about 20%!! More will certainly follow. There are various symptoms:
      • OS-Booting failure
      • Corrupt system files
      • Corrupt configuration files
      • Corrupt executable (our firmware)
      • Corrupt Registry
    • I started a long-term test using three NetDCU14 boards running the latest EBoot and kernel (ebootv210_114.nb0 and XIPV210_CE6Core_CF2_141119.bin). But a long-term test turned out to be unnecessary, since all three systems failed within one day. RELIABILITY OF THE NETDCU14 SEEMS FAR WORSE NOW! The symptoms differ from what we have seen so far. Instead of the issues listed above, we now face a complete loss of files and registry settings. Details for the 3 systems:
      • Failure after 4 hours (~720 reboots):
        • Registry is back to default
        • All files on FFSDISK lost
      • Failure after 14 hours (~2500 reboots):
        • Registry is back to default
      • Failure after 20 hours (~3600 reboots):
        • Registry is back to default
        • All files on FFSDISK lost
      • The test procedure is as follows:
        • Reboot the NetDCU every 20 s (power off/on).
        • The system is on for 16 s and off for 4 s.
        • 16 s is enough for WinCE and our executable to finish booting completely.
    • The problems do not seem to be related to any specific board/hardware, since so many NetDCUs are involved...
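    The reboot counts above follow directly from the 20 s power cycle (16 s on + 4 s off). A short check of that arithmetic:

```python
CYCLE_S = 16 + 4  # seconds powered on + seconds powered off per cycle


def reboots_after(hours: float, cycle_s: int = CYCLE_S) -> int:
    """Number of complete power cycles in the given test duration."""
    return int(hours * 3600 // cycle_s)


for h in (4, 14, 20):
    print(h, reboots_after(h))
# 4 720
# 14 2520
# 20 3600
```

    So the 14-hour failure corresponds to about 2520 cycles, consistent with the "~2500 reboots" reported.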


    We need a working solution AS SOON AS POSSIBLE, since we are spending a lot of money replacing NetDCU14 boards around the world.