CAN - hard_xmit

  • I use both CAN interfaces on the NetDCU. After some time, my application blocks and
    the error message
    "mcp251x spi1.1: hard_xmit called while tx busy"
    is posted (100 times and more).


    I use the code from the CAN example files.

  • Hello !


    Unfortunately I can confirm that we sometimes observe the same problem in our application. We're unsure whether or not this is a problem triggered by a CAN bus wiring failure.


    Worse, when this situation arises the whole system seems to be under such a high load that it no longer responds. We managed to get in via the debug console and found that the system sometimes stabilizes after pulling the CAN connectors.


    I saw that in newer kernel versions the mcp251x driver contains at least one bugfix addressing a silicon bug in the MCP2515 Rev. B, which is described as the "repeated frame problem". Unfortunately I was unable to determine the silicon revision of the MCP2515 used in the NetDCU14 design (the errata document is dated 2007, so I would expect the NetDCU14 to use an already fixed version of the MCP2515).


    UPDATE: According to the errata document (http://ww1.microchip.com/downloads/en/DeviceDoc/80179g.pdf), Revision B4 is the latest revision and has been in production since 2005, so I would recommend updating the driver (even though I don't expect this to solve our problem).


    Regards,


    Volker

  • Hello again !


    Over the last few days we tried a backported version of the mcp251x driver and it seems to solve our problem, so I strongly recommend trying this too. I would attach our patch for the Multiplatform Linux V2.0, but I can't find any option in this forum to attach anything. So I placed the patch here: http://pastebin.com/yDcf660A.


    Regards


    Volker

  • The patch you posted here differs from our driver in only four points:


    • It uses spi_get_drvdata() instead of dev_get_drvdata() to get hold of the private data pointer. This does not matter at all.


    • It tries to call some LED functions. This does not solve any problems with the communication. On the contrary, as it is additional work, it can slow down the communication and make things even worse.


    • In function mcp251x_open(), it initializes the IRQ flags slightly differently. However, it is still very important that you have set the irq_flags value in your platform data (in arch/arm/mach-s5pv210/mach-netdcu14.c) correctly to IRQF_TRIGGER_LOW | IRQF_ONESHOT. This is required in both versions, yours and ours. So there is no semantic difference here.


    • In function mcp251x_hw_tx(), it uses a slightly different way to trigger the transmission. This is what is meant by "solving the repeated frame problem". We will see whether we can include this part in our next version.


    Other than that, the driver is identical to our version.


    Your F&S Support Team

    F&S Elektronik Systeme GmbH
    As this is an international forum, please try to post in English.
    Da dies ein internationales Forum ist, bitten wir darum, Beiträge möglichst in Englisch zu verfassen.

  • Sorry but I don't get the point of your post.


    There definitely is a problem with the 'original' driver and you cannot get a better proof for that than observations from two different customers. If you are not able to reproduce these problems I can only assume that your test setup differs from ours (both CAN devices are in use, 16 participants communicating with each other). Perhaps MWeber can sketch his test setup and we can try to find the common ground.


    Regards


    Volker

  • There are two points:

    • Do you use the correct initialization that is required? This is not in the driver file, it is in the mach-netdcu14.c file. That's what I said in point 3. Can you show me the structure that you pass to the CAN initialization call, or even better the whole initialization part of the CAN controller? This part I am still missing and I still assume that there is something different to our release V2.0. This could be the reason why it is working here in our version but not in your version.
    • Here in this thread you are telling me that the driver *is* working after your modification. So I was wondering what part of your code could fix any problems, and I did not find much. This was what I explained in points 1 to 3 of my last post. The only thing that *could* have an influence on the communication was point 4. So if your code actually solves a problem, then we can add this to our next release. However, as you already have a version of this driver running, and other people can download your version here, too, I don't see any immediate need for action. The solution is already available here in this thread.

    Now you are again talking about a problem with the driver. What is true? Does your version solve the problem or not?


    Your F&S Support Team

    F&S Elektronik Systeme GmbH

  • We didn't change anything in the section of the file mach-netdcu14.c regarding the CAN initialization. The relevant section is as follows



    Yes, the backported driver solves the problem we had with the driver included in the multiplatform release V2.0. There are at least two changes you didn't mention but that may address critical flaws :


    threaded irqs must provide a primary handler or set the IRQF_ONESHOT flag (I'm not sure whether the data handling is based upon threaded IRQs) :
    http://git.kernel.org/cgit/lin…3b3b38429da6f70a913f89b04


    repeated frame problem :
    https://lkml.org/lkml/2012/9/16/120


    But I definitely don't know exactly which change finally solved our problems with the driver included in the multiplatform release V2.0. What I can say is that, after applying the patch mentioned above, our NetDCU14 application at least runs stably when attached to the CAN bus. So this solves one problem, but it doesn't solve the steady loss of CAN messages I mentioned in some of my other posts. Please don't mix up these two problems.


    Quote

    However, as you already have a version of this driver running, and other people can download your version here, too, I don't see any immediate need for action. The solution is already available here in this thread.


    I would expect that other people should not be forced to collect all the patches floating around in this forum to finally get a working multiplatform release V2.0+, especially when these other people have spent a lot of money to buy hundreds of your boards.


    Regards


    Volker


  • That's the interesting part. And yes it is set as active low, which is correct.


    Quote

    Yes, the backported driver solves the problem we had with the driver included in the multiplatform release V2.0. There are at least two changes you didn't mention but that may address critical flaws :


    threaded irqs must provide a primary handler or set the IRQF_ONESHOT flag (I'm not sure whether the data handling is based upon threaded IRQs) :
    http://git.kernel.org/cgit/lin…3b3b38429da6f70a913f89b04


    That's what I mentioned as point 3, the different initialization of the flags value in function mcp251x_open(). In fact, when using the above init code from mach-netdcu14.c, the flags will be set to exactly the same value with the old and the new code, i.e. IRQF_TRIGGER_LOW | IRQF_ONESHOT. However, if the .flags value were missing from the init code, it would be set to IRQF_TRIGGER_FALLING | IRQF_ONESHOT, which would be wrong.


    Quote

    repeated frame problem :
    https://lkml.org/lkml/2012/9/16/120


    This is what I listed as point 4, the different version to trigger the transmission in function mcp251x_hw_tx().


    Quote

    But I definitely don't know exactly which change finally solved our problems with the driver included in the multiplatform release V2.0. What I can say is that, after applying the patch mentioned above, our NetDCU14 application at least runs stably when attached to the CAN bus.


    OK. So the version you posted does actually solve the problem. As your error message was "mcp251x spi1.1: hard_xmit called while tx busy", i.e. the transmit code was called repeatedly, I would assume that point 4, which is commented in the source code with "avoid repeated frame problem", may actually be the part that solves this problem. At least it sounds reasonable.


    Quote

    I would expect that other people aren't forced to collect all those patches floating around in this forum to finally get a working multiplatform release V2.0 +


    Yes, we know that this is unfortunate, and we plan to do a release that collects all the patches so far. But it has to fit into our schedule of releases for other boards, and there are also other very important things to do. So it still has to wait a little while.


    Your F&S Support Team

    F&S Elektronik Systeme GmbH