From UART to EtherCAT: When a Robot Outgrows Serial Motor Control
Hands-on · Robotics software · ~16 min read
Serial motor control is the right answer more often than automation engineers like to admit.
A UART link, a CRC, a firmware watchdog, and a small ros2_control hardware interface can carry a real robot a long way. For a differential base, a pan-tilt head, a gripper, or a two-axis lab fixture, it is cheap, inspectable, and easy to debug with a logic analyzer.
It also has a ceiling.
The ceiling usually does not arrive as a dramatic failure. It arrives as a collection of field symptoms: the robot drives cleanly at 20 Hz and becomes erratic at 100 Hz; one axis starts a few milliseconds before another; a USB adapter adds more latency than the whole control period; the firmware watchdog saves the robot, but the host never quite knows what state the drive is in. At that point the problem is no longer "how do I send velocity commands?" The problem is distributed motion control.
That is when UART stops being a convenient cable and starts being the architecture.
This article is the sequel to the ros2_control hardware-interface article: same robot, same controller-manager loop, but a different class of drive underneath it. We will start with the serial timing budget, show the exact point where it breaks, and then map the same read() -> update() -> write() contract onto EtherCAT and CiA 402 / DS402 drives.
The architecture shift
With a serial motor board, the host usually owns one packet stream and the firmware owns the inner loops. With EtherCAT, the host becomes a real-time master and each drive becomes a synchronized process-data endpoint.
The software boundary does not move much. In both designs, ros2_control still sees state interfaces and command interfaces. The difference is what your hardware layer must guarantee. A serial interface is a translator. An EtherCAT interface is a cyclic real-time participant.
1. The serial timing budget
The useful serial calculation is brutally simple. For an 8N1 UART, each byte costs ten bit times: one start bit, eight data bits, one stop bit. At baud rate B, the bus time for a control cycle is:
T_bus = 10 * (N_tx + N_rx) / B
where:
N_txis the number of host-to-drive bytes in the cycle.N_rxis the number of drive-to-host bytes in the cycle.Bis the UART baud rate in bit/s.
If your differential base sends a 13-byte command frame and receives a 21-byte state frame at 115200 baud:
T_bus = 10 * (13 + 21) / 115200
= 2.95 ms
That looks fine at 50 Hz:
T_loop = 20 ms
U_bus = T_bus / T_loop = 14.8%
It is still workable at 100 Hz:
T_loop = 10 ms
U_bus = 29.5%
But this is only the raw line time. It does not include USB latency timers, host scheduling jitter, parsing time, retransmission policy, or the fact that many robots are not two axes forever. UART is asynchronous and point-to-point: both sides must agree on baud rate and framing, and there is no inherent shared clock across multiple devices. That is a feature for simplicity and a limitation for synchronized motion.
Field heuristic:
U_bus = T_bus / T_loop < 0.3
As a field heuristic, keep serial bus utilization below roughly 30% when the link is in the control path. Above that, every hiccup consumes margin you were implicitly counting on.
2. The first failure mode: latency becomes part of the plant
Control engineers talk about sample time because the controller is designed around it. Embedded engineers talk about baud rate because the wire is slow. Robotics engineers have to care about both.
The host loop has its own latency stack:
The phase penalty of delay is:
phi_delay = -omega_c * T_delay
or in degrees:
phi_delay_deg = -360 * f_c * T_delay
At a modest 20 Hz outer-loop bandwidth, 5 ms of extra delay costs:
phi_delay_deg = -360 * 20 * 0.005 = -36 deg
That is not bookkeeping. That is stability margin.
This is why a robot can behave perfectly on the bench and then lose crispness when the host CPU governor changes, the USB-serial driver buffers a packet, or a non-real-time callback lands on the same executor. Real-time ROS 2 systems make response latency a full-stack problem: hardware, firmware, operating system, libraries, and application code all contribute to the path. The fix is not "use a faster baud rate" if the architecture still lets the command arrive at a non-deterministic time.
3. What serial is still good at
Do not upgrade the fieldbus because the word "EtherCAT" sounds more professional. Upgrade because the requirements changed.
UART is still a good architecture when:
- There are one or two axes.
- The host loop is 50-100 Hz.
- The drive firmware owns the current loop and velocity loop.
- The command is low-rate: velocity, position target, or mode change.
- A firmware watchdog can make loss of host communication safe.
- There is no hard requirement that multiple axes latch commands on the same clock edge.
The serial firmware should still be serious:
frame = sync | length | command | payload | crc16
and the safety rule is:
T_watchdog < T_hazard
T_watchdog > N_miss * T_loop
where T_hazard is the time before uncontrolled motion becomes unacceptable, and N_miss is the number of missed host cycles you intentionally tolerate. A 100 ms firmware watchdog is a reasonable order of magnitude for many mobile bases. It is not a substitute for E-stop or safe torque off; it is the board-level answer to "the host stopped talking."
Watchdog warning:
Feed the watchdog only from code that proves the control path is alive. Servicing it from an unconditional timer interrupt defeats the purpose: the firmware can be wedged while the watchdog continues to be reset.
Between UART and EtherCAT there are middle-ground options such as RS-485 multidrop, CAN/CANopen, and CAN-FD. They can be excellent when robustness and multi-node wiring matter more than sub-millisecond synchronized motion.
When not to use EtherCAT
Do not migrate to EtherCAT if the robot has one or two axes, low-bandwidth commands, a good firmware watchdog, no synchronization requirement, and the current serial link has measured timing margin. EtherCAT adds commissioning complexity, ESI/PDO configuration, real-time master requirements, drive-state handling, and more failure modes to diagnose.
4. The signs you have outgrown UART
You have outgrown serial motor control when the problems become architectural rather than local:
Symptom
Works at 20 Hz, fails at 100 Hz
Likely cause: Bus and host latency consume the loop period.
EtherCAT changes: Cyclic process data at a defined master period.
Symptom
Axes do not start together
Likely cause: No shared command latch time.
EtherCAT changes: Distributed clocks and synchronized PDO exchange.
Symptom
One packet loss causes a visible bump
Likely cause: State and command are not decoupled cleanly.
EtherCAT changes: Drive state machine plus cyclic setpoint policy.
Symptom
Debugging requires vendor-specific commands per drive
Likely cause: No standard object model.
EtherCAT changes: CiA 402 object dictionary, controlword, statusword.
Symptom
Host code grows per-axis special cases
Likely cause: Each board has its own state semantics.
EtherCAT changes: DS402 common state machine.
Symptom
Velocity estimation is noisy on the host
Likely cause: Host differentiates encoder counts at low rate.
EtherCAT changes: Drive reports actual position, velocity, and torque each cycle.
Symptom
The robot needs many axes
Likely cause: Point-to-point links do not scale cleanly.
EtherCAT changes: One cyclic fieldbus frame touches all nodes.
The decisive question is not "can UART send the bytes?" It usually can. The question is whether the control system needs a shared, cyclic, multi-axis time base.
5. What EtherCAT actually buys
EtherCAT is not magic Ethernet. It is a deterministic process-data mechanism designed around short cycles and low synchronization jitter. The EtherCAT Technology Group describes the technology focus as short cycle times, low jitter, and distributed clocks for accurate synchronization. In practice, the important change is that the master exchanges process data with all drives in a cycle, and the drives can synchronize their local actions to a distributed clock instead of the instant a packet happens to arrive.
That changes the failure modes. You are no longer asking, "Did I parse the latest ASCII line?" You are asking:
- Did the master complete the cycle?
- Did every drive receive fresh process data?
- Is the drive still
Operation Enabled? - Is the statusword reporting a warning, following error, quick-stop, or fault?
- Are the PDOs mapped to the control mode you think you are using?
- Is the distributed-clock configuration actually active?
DC being "enabled" in the configuration is not enough; verify the drive's actual synchronization state, Sync0 period/shift, and the measured working-counter and cycle-miss diagnostics.
The work moves from framing bytes to commissioning motion.
6. DS402: the common language underneath the drives
CiA 402, often called DS402 in EtherCAT conversations, is the standardized drive profile that makes a servo drive look like a servo drive instead of a vendor-specific box. It defines a finite state machine, operating modes, and a common object dictionary. The host writes a controlword; the drive reports a statusword. Process data maps target values and actual values into PDOs.
Practical rule: use PDOs for cyclic real-time process data; use SDOs for configuration, commissioning, and diagnostics outside the hot path.
Simplified state flow:
Typical control sequence:
Shutdown 0x0006
Switch on 0x0007
Enable operation 0x000F
The key operating modes for robot builders are:
DS402 code 1
PP, Profile Position
Master sends: Target position and profile parameters.
Drive closes: Trajectory/profile internally.
DS402 code 3
PV, Profile Velocity
Master sends: Target velocity and profile parameters.
Drive closes: Velocity profile internally.
DS402 code 4
PT, Profile Torque
Master sends: Target torque and profile parameters.
Drive closes: Torque profile internally.
DS402 code 8
CSP, Cyclic Synchronous Position
Master sends: Position setpoint every cycle.
Drive closes: Inner loops, synchronized.
DS402 code 9
CSV, Cyclic Synchronous Velocity
Master sends: Velocity setpoint every cycle.
Drive closes: Inner loops, synchronized.
DS402 code 10
CST, Cyclic Synchronous Torque
Master sends: Torque setpoint every cycle.
Drive closes: Torque/current loops.
DS402 code 6
HM, Homing
Master sends: Homing command and parameters.
Drive closes: Homing sequence.
For robotics, the cyclic synchronous modes are usually the point. The master sends a new setpoint each cycle, and the drive applies it in the configured synchronous timing model. Profile modes are excellent for point-to-point industrial moves; cyclic modes are the natural fit when a robot controller owns the trajectory.
7. Mapping DS402 into ros2_control
The ros2_control contract is still narrow:
read() -> fill state interfaces
update() -> controllers compute command interfaces
write() -> push command interfaces to hardware
For a serial base, write() might serialize two wheel velocities. For an EtherCAT robot, write() writes PDO targets for every drive. The implementation changes from "send packet" to "stage process data for the next bus cycle."
Design rule:
Choose the DS402 mode from where you want the control authority to live:
- Use CSP when the robot controller owns trajectory generation and sends position setpoints.
- Use CSV when the robot controller owns velocity commands and the drive owns velocity regulation.
- Use CST when a model-based controller owns torque and the drive is mostly a torque/current actuator.
- Use profile modes when the drive should generate the trajectory from point-to-point commands.
Do not choose CSP because it sounds advanced. Choose it because your master can produce setpoints at a reliable cycle time and your trajectory generator belongs above the drive.
8. The cycle-time equation
EtherCAT does not remove timing math. It changes which timing matters.
The cyclic loop must satisfy:
T_compute + T_bus + T_margin <= T_cycle
where:
T_computeis the host-side control computation and EtherCAT stack time.T_busis process-data exchange time.T_marginis deliberate slack for scheduling variation and diagnostics.T_cycleis the configured EtherCAT control period.
For a 1 kHz loop:
T_cycle = 1 ms
That leaves no room for casual software. A 1 kHz EtherCAT loop on a non-real-time host can fail for the same reason a 100 Hz UART loop fails: the bus is not the only source of latency. ROS 2 real-time design guidance makes the uncomfortable point explicit: application code, libraries, OS scheduling, firmware, and hardware all contribute to response latency.
Migration warning:
EtherCAT gives deterministic cyclic process-data exchange at the fieldbus level. It does not automatically make Linux, your user-space master, your trajectory generator, or your ros2_control hardware interface deterministic.
9. Watchdogs move down and up
Serial systems usually have one obvious watchdog: "if no command frame arrives within T_watchdog, stop the motors."
EtherCAT systems have several layers:
The DS402 state machine gives you more precise language for failure handling:
- A stale command is not the same as a drive fault.
- A following error is not the same as a bus dropout.
- A quick stop is not the same as disabling torque.
- A fault reset is an explicit state transition, not "send the packet again."
Design rule:
Every layer should fail in the safest direction, but only one layer should own each decision. Let the controller handle command timeout, the master handle bus-cycle loss, the drive handle local following/fault limits, and the safety chain handle hazardous energy.
10. A practical migration path
Do not port the whole robot in one step. Port the contract.
Step 1: Freeze the interfaces
Write down the existing ros2_control interfaces:
left_wheel_joint:
command: velocity
state: position, velocity
right_wheel_joint:
command: velocity
state: position, velocity
or for an arm:
joint_i:
command: position
state: position, velocity, effort
The controller should not care whether the hardware layer speaks UART or EtherCAT.
Step 2: Choose DS402 mode per joint
For a differential base, CSV often maps cleanly: diff_drive_controller produces wheel velocity commands, and the drive closes the velocity loop.
For a multi-axis arm under joint_trajectory_controller, CSP is usually the more natural first port: the trajectory controller produces position targets, and the drive receives cyclic position setpoints.
For high-end model-based control, CST is attractive, but only if the robot controller owns the dynamics and your safety story is mature.
Step 3: Commission one drive outside ROS
Before writing the hardware interface, prove:
- The drive reaches
Operation Enabled. - The expected PDOs are mapped.
- The selected mode appears in the mode display.
- Actual position changes when the motor moves.
- Following-error and quick-stop behavior are understood.
- The process-data watchdog does what you think it does.
Step 4: Build the SystemInterface
The hot path should look boring:
return_type read(...) {
master.process();
for (auto& axis : axes_) {
axis.position = counts_to_rad(axis.pdo.actual_position);
axis.velocity = counts_to_rad_s(axis.pdo.actual_velocity);
axis.status = axis.pdo.statusword;
}
return all_axes_healthy() ? return_type::OK : return_type::ERROR;
}
return_type write(...) {
for (auto& axis : axes_) {
axis.pdo.controlword = axis.next_controlword();
axis.pdo.target_position = rad_to_counts(axis.command_position);
}
return return_type::OK;
}
The non-boring work belongs outside the hot path: parsing ESI files, validating PDO layout, resolving drive states, logging faults, and exposing diagnostics.
Step 5: Measure before driving
Use the same discipline as the serial bring-up:
- Spin an axis by hand and verify
/joint_states. - Command a single low-speed move.
- Verify signs and units.
- Log cycle time and missed cycles.
- Pull the cable and observe the fault path.
- Trigger quick stop intentionally.
- Reboot a drive and confirm the state machine recovers only when commanded.
11. Symptom table
Symptom
Drive never leaves Switch On Disabled
Almost always: Controlword sequence wrong, safety input open, or fault latched.
Fix: Decode statusword bits; verify STO/safety chain before software.
Symptom
Drive reaches Operation Enabled but does not move
Almost always: Mode not active, PDO target not mapped, or command units wrong.
Fix: Check mode display, PDO map, and scale factors.
Symptom
First move jumps
Almost always: Stale target position or missing setpoint initialization.
Fix: On activate, seed target = actual before enabling motion.
Symptom
Axes start slightly apart
Almost always: Distributed clocks not configured, not active, or not actually synchronized.
Fix: Enable DC sync, then verify the drive sync state, Sync0 period/shift, working counter, and cycle-miss diagnostics.
Symptom
Works at 250 Hz, faults at 1 kHz
Almost always: Host loop or EtherCAT stack misses cycle budget.
Fix: Measure T_compute; lower cycle rate or move to RT kernel/core isolation.
Symptom
Following error at acceleration changes
Almost always: Trajectory too aggressive or feedforward missing.
Fix: Limit jerk/acceleration; add velocity/acceleration feedforward where appropriate.
Symptom
Robot reports healthy while drive is warning
Almost always: Statusword decoded incompletely.
Fix: Promote warnings, internal limits, and following-error bits into diagnostics.
Symptom
Recovery after fault is unpredictable
Almost always: Fault reset treated as a packet, not a state transition.
Fix: Implement explicit DS402 state handling.
12. The decision
UART is a good way to make a motor controller talk to ROS. EtherCAT is a good way to make a set of drives behave like a machine.
The migration is justified when at least one of these becomes true:
- You need more axes than point-to-point serial links can manage cleanly.
- The loop period is short enough that bus and host jitter are now control variables.
- Multiple axes must latch commands at the same effective time.
- You need standard drive states, faults, quick-stop behavior, and diagnostics.
- You want to swap drive vendors without rewriting the control contract.
- The robot controller, not the drive, must own the trajectory.
The final design rule:
Use UART while the drive is a smart peripheral.
Use EtherCAT when the drives are distributed parts of one motion controller.
That distinction is the whole article. A serial link sends commands. A fieldbus coordinates motion.
This article is based on practical embedded motor-control experience, ROS 2 hardware-interface design, CiA 402 drive concepts, and EtherCAT fieldbus architecture. Always verify PDO maps, statusword bits, timing behavior, and controller interfaces against the specific drive and ROS 2 distribution in front of you.