I use both CAN interfaces on the NetDCU. After some time, my application blocks and
the error message
"mcp251x spi1.1: hard_xmit called while tx busy"
is printed repeatedly (100 times and more).
I use the code from the CAN example files.
Hello !
Unfortunately I can confirm that we sometimes observe the same problem in our application. We're unsure whether or not this is a problem triggered by a CAN bus wiring failure.
Worse, when this situation arises the whole system seems to be under such a high load that it no longer responds. We managed to get in via the debugging console and found that the system sometimes stabilizes after the CAN connectors are pulled.
I saw that in newer kernel versions the mcp251x driver contains at least one bugfix addressing a silicon bug in the MCP2515 Rev. B, which is described as the "repeated frame problem". Unfortunately I was unable to determine the silicon revision of the MCP2515 used in the NetDCU14 design (the errata document is dated 2007, so I would expect the NetDCU14 to use an already fixed revision of the MCP2515).
UPDATE: According to the errata document (http://ww1.microchip.com/downloads/en/DeviceDoc/80179g.pdf), Revision B4 is the latest revision and has been in production since 2005, so I would recommend updating the driver (even though I don't expect this to solve our problem).
Regards,
Volker
Hello again !
Over the last few days we tried a backported version of the mcp251x driver and it seems to solve our problem, so I strongly recommend trying it too. I would attach our patch for the Multiplatform Linux V2.0, but I can't find any option in this forum to attach anything, so I placed the patch here: http://pastebin.com/yDcf660A.
Regards
Volker
The patch you are giving here differs only in four points
Other than that, the driver is identical to our version.
Your F&S Support Team
Sorry but I don't get the point of your post.
There definitely is a problem with the 'original' driver and you cannot get a better proof for that than observations from two different customers. If you are not able to reproduce these problems I can only assume that your test setup differs from ours (both CAN devices are in use, 16 participants communicating with each other). Perhaps MWeber can sketch his test setup and we can try to find the common ground.
Regards
Volker
There are two points:
Now you are again talking about a problem with the driver. What is true? Does your version solve the problem or not?
Your F&S Support Team
We didn't change anything in the section of the file mach-netdcu14.c regarding the CAN initialization. The relevant section is as follows
Yes, the backported driver solves the problem we had with the driver included in the multiplatform release V2.0. There are at least two changes you didn't mention but that may address critical flaws :
threaded irqs must provide a primary handler or set the IRQF_ONESHOT flag (I'm not sure whether the data handling is based upon threaded IRQs) :
http://git.kernel.org/cgit/lin…3b3b38429da6f70a913f89b04
repeated frame problem :
https://lkml.org/lkml/2012/9/16/120
However, I don't know exactly which change finally solved our problems with the driver included in the multiplatform release V2.0. What I can say is that, after applying the patch mentioned above, our NetDCU14 application at least runs stably when attached to the CAN bus. So this solves one problem, but it doesn't solve the steady loss of CAN messages I mentioned in some of my other posts. Please don't mix up these two problems.
Quote: However, as you already have a version of this driver running, and other people can download your version here, too, I don't see any immediate need for action. The solution is already available here in this thread.
I would expect that other people aren't forced to collect all those patches floating around in this forum to finally get a working multiplatform release V2.0 +, especially in case these other people spent a lot of money to buy hundreds of your boards.
Regards
Volker
Code:
    static struct mcp251x_platform_data mcp251x_info = {
        .oscillator_frequency = 20 * 1000 * 1000, /* 20MHz */
        .irq_flags = IRQF_TRIGGER_LOW | IRQF_ONESHOT, /* Low, not falling */
        .board_specific_setup = netdcu14_mcp251x_setup,
        .transceiver_enable = NULL,
        .power_enable = NULL,
    };
    static struct mcp251x_platform_data mcp251x_info1 = {
        .oscillator_frequency = 20 * 1000 * 1000, /* 20MHz */
        .irq_flags = IRQF_TRIGGER_LOW | IRQF_ONESHOT, /* Low, not falling */
        .board_specific_setup = netdcu14_mcp251x_setup,
        .transceiver_enable = NULL,
        .power_enable = NULL,
    };
That's the interesting part. And yes it is set as active low, which is correct.
Quote: Yes, the backported driver solves the problem we had with the driver included in the multiplatform release V2.0. There are at least two changes you didn't mention but that may address critical flaws :
threaded irqs must provide a primary handler or set the IRQF_ONESHOT flag (I'm not sure whether the data handling is based upon threaded IRQs) :
http://git.kernel.org/cgit/lin…3b3b38429da6f70a913f89b04
That's what I mentioned as point 3, the different initialization of the flags value in function mcp251x_open(). In fact, when using the above init code from mach-netdcu14.c, the flags will be set to exactly the same value with the old and new code, i.e. IRQF_TRIGGER_LOW | IRQF_ONESHOT. However, if the .irq_flags value were missing in the init code, it would be set to IRQF_TRIGGER_FALLING | IRQF_ONESHOT, which would be wrong.
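For readers following along, the fallback described here can be sketched in plain user-space C. This is only an illustration, not the driver source: the struct pdata_sketch type and the effective_irq_flags() helper are hypothetical names, and the flag constants are redefined locally on the assumption that they match include/linux/interrupt.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror include/linux/interrupt.h; redefined here so the
 * sketch builds outside the kernel. */
#define IRQF_TRIGGER_FALLING 0x00000002UL
#define IRQF_TRIGGER_LOW     0x00000008UL
#define IRQF_ONESHOT         0x00002000UL

/* Hypothetical stand-in for the board's platform data. */
struct pdata_sketch {
    unsigned long irq_flags; /* 0 means "not set by the board file" */
};

/* Hypothetical helper: the flags that would be used for the IRQ request.
 * IRQF_ONESHOT is always set; the trigger type comes from the board
 * data if present, otherwise it falls back to falling edge. */
static unsigned long effective_irq_flags(const struct pdata_sketch *pdata)
{
    unsigned long flags = IRQF_ONESHOT;

    assert(pdata != NULL);
    if (pdata->irq_flags)
        flags |= pdata->irq_flags;
    else
        flags |= IRQF_TRIGGER_FALLING;
    return flags;
}
```

With irq_flags set to IRQF_TRIGGER_LOW | IRQF_ONESHOT, as in the board code, the helper returns that same value; with irq_flags left at 0 it returns IRQF_TRIGGER_FALLING | IRQF_ONESHOT, which is the wrong trigger type for this board.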
Quote: repeated frame problem :
https://lkml.org/lkml/2012/9/16/120
This is what I listed as point 4, the different version to trigger the transmission in function mcp251x_hw_tx().
Quote: But I definitely don't know exactly what change finally solved our problems with the driver included in the multiplatform release V2.0. But I can say that - after applying the patch mentioned above - our NetDCU14 application works at least stable when attached to the CAN bus.
OK. So the version you posted does actually solve the problem. As your error message was "mcp251x spi1.1: hard_xmit called while tx busy", i.e. the transmit code was called repeatedly, I would assume that point 4, which is commented in the source code with "avoid repeated frame problem", may actually be the part that solves this problem. At least it sounds reasonable.
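The diff for point 4 is not shown in this thread. As a rough sketch only, and assuming the change is the one known from the mainline driver (requesting transmission with the MCP2515's dedicated RTS SPI instruction instead of a BIT MODIFY on the TXBnCTRL register), the two SPI byte sequences would look like this. The opcodes and register addresses are taken from the MCP2515 datasheet; the build_* helper names are hypothetical and nothing here touches real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MCP2515 SPI opcodes and register layout (per datasheet). */
#define INSTRUCTION_BIT_MODIFY 0x05
#define INSTRUCTION_RTS(mask)  (0x80 | ((mask) & 0x07))
#define TXBCTRL(n)             (0x30 + (n) * 0x10) /* TXB0CTRL..TXB2CTRL */
#define TXBCTRL_TXREQ          0x08

/* Hypothetical helper: request transmission by setting TXREQ with a
 * BIT MODIFY instruction. Returns the number of bytes to send. */
static size_t build_txreq_bit_modify(uint8_t *buf, size_t len,
                                     unsigned int tx_buf_idx)
{
    assert(buf != NULL && len >= 4 && tx_buf_idx < 3);
    buf[0] = INSTRUCTION_BIT_MODIFY;
    buf[1] = (uint8_t)TXBCTRL(tx_buf_idx); /* register address */
    buf[2] = TXBCTRL_TXREQ;                /* mask: only the TXREQ bit */
    buf[3] = TXBCTRL_TXREQ;                /* data: set TXREQ */
    return 4;
}

/* Hypothetical helper: request transmission with the single-byte RTS
 * instruction, one bit per transmit buffer. */
static size_t build_txreq_rts(uint8_t *buf, size_t len,
                              unsigned int tx_buf_idx)
{
    assert(buf != NULL && len >= 1 && tx_buf_idx < 3);
    buf[0] = (uint8_t)INSTRUCTION_RTS(1u << tx_buf_idx);
    return 1;
}
```

For transmit buffer 0, the first variant yields the four bytes 0x05 0x30 0x08 0x08 and the second the single byte 0x81. Whether this is exactly what the posted patch changes would need to be checked against the patch itself.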
Quote: I would expect that other people aren't forced to collect all those patches floating around in this forum to finally get a working multiplatform release V2.0 +
Yes, we know that this is unfortunate, and we plan to do a release that collects all the patches so far. But it has to fit into our schedule of doing other releases for other boards, and there are also other very important things to do. So it still has to wait a little while.
Your F&S Support Team