From UART to EtherCAT: When a Robot Outgrows Serial Motor Control

Social preview artwork for From UART to EtherCAT

Hands-on · Robotics software · ~16 min read

Serial motor control is the right answer more often than automation engineers like to admit. A UART link, a CRC, a firmware watchdog, and a small ros2_control hardware interface can carry a real robot a long way. For a differential base, a pan-tilt head, a gripper, or a two-axis lab fixture, it is cheap, inspectable, and easy to debug with a logic analyzer.

It also has a ceiling.

The ceiling usually does not arrive as a dramatic failure. It arrives as a collection of field symptoms: the robot drives cleanly at 20 Hz and becomes erratic at 100 Hz; one axis starts a few milliseconds before another; a USB adapter adds more latency than the whole control period; the firmware watchdog saves the robot, but the host never quite knows what state the drive is in. At that point the problem is no longer "how do I send velocity commands?" The problem is distributed motion control.

That is when UART stops being a convenient cable and starts being the architecture.

This article is the sequel to the ros2_control hardware-interface article: same robot, same controller-manager loop, but a different class of drive underneath it. We will start with the serial timing budget, show the exact point where it breaks, and then map the same read() -> update() -> write() contract onto EtherCAT and CiA 402 / DS402 drives.

The architecture shift

With a serial motor board, the host usually owns one packet stream and the firmware owns the inner loops. With EtherCAT, the host becomes a real-time master and each drive becomes a synchronized process-data endpoint.

Block diagram comparing a UART motor-control architecture with an EtherCAT architecture using DS402 drives.
Figure 1. The architecture shift: serial motor control is a point-to-point translator; EtherCAT turns each drive into a synchronized cyclic endpoint.

The software boundary does not move much. In both designs, ros2_control still sees state interfaces and command interfaces. The difference is what your hardware layer must guarantee. A serial interface is a translator. An EtherCAT interface is a cyclic real-time participant.

1. The serial timing budget

The useful serial calculation is brutally simple. For an 8N1 UART, each byte costs ten bit times: one start bit, eight data bits, one stop bit. At baud rate B, the bus time for a control cycle is:

T_bus = 10 * (N_tx + N_rx) / B

where:

  • N_tx is the number of host-to-drive bytes in the cycle.
  • N_rx is the number of drive-to-host bytes in the cycle.
  • B is the UART baud rate in bit/s.

If your differential base sends a 13-byte command frame and receives a 21-byte state frame at 115200 baud:

T_bus = 10 * (13 + 21) / 115200
      = 2.95 ms

That looks fine at 50 Hz:

T_loop = 20 ms
U_bus  = T_bus / T_loop = 14.8%

It is still workable at 100 Hz:

T_loop = 10 ms
U_bus  = 29.5%

But this is only the raw line time. It does not include USB latency timers, host scheduling jitter, parsing time, retransmission policy, or the fact that many robots are not two axes forever. UART is asynchronous and point-to-point: both sides must agree on baud rate and framing, and there is no inherent shared clock across multiple devices. That is a feature for simplicity and a limitation for synchronized motion.

Field heuristic:

U_bus = T_bus / T_loop < 0.3

As a field heuristic, keep serial bus utilization below roughly 30% when the link is in the control path. Above that, every hiccup consumes margin you were implicitly counting on.

2. The first failure mode: latency becomes part of the plant

Control engineers talk about sample time because the controller is designed around it. Embedded engineers talk about baud rate because the wire is slow. Robotics engineers have to care about both.

The host loop has its own latency stack:

Control-loop timing diagram showing read, update, write, drive action, sensor state, and jitter sources.
Figure 2. Latency does not only slow the system down. In a feedback loop, delayed samples and delayed commands become phase lag.

The phase penalty of delay is:

phi_delay = -omega_c * T_delay

or in degrees:

phi_delay_deg = -360 * f_c * T_delay

At a modest 20 Hz outer-loop bandwidth, 5 ms of extra delay costs:

phi_delay_deg = -360 * 20 * 0.005 = -36 deg

That is not bookkeeping. That is stability margin.

This is why a robot can behave perfectly on the bench and then lose crispness when the host CPU governor changes, the USB-serial driver buffers a packet, or a non-real-time callback lands on the same executor. Real-time ROS 2 systems make response latency a full-stack problem: hardware, firmware, operating system, libraries, and application code all contribute to the path. The fix is not "use a faster baud rate" if the architecture still lets the command arrive at a non-deterministic time.

3. What serial is still good at

Do not upgrade the fieldbus because the word "EtherCAT" sounds more professional. Upgrade because the requirements changed.

UART is still a good architecture when:

  • There are one or two axes.
  • The host loop is 50-100 Hz.
  • The drive firmware owns the current loop and velocity loop.
  • The command is low-rate: velocity, position target, or mode change.
  • A firmware watchdog can make loss of host communication safe.
  • There is no hard requirement that multiple axes latch commands on the same clock edge.

The serial firmware should still be serious:

frame = sync | length | command | payload | crc16

and the safety rule is:

T_watchdog < T_hazard
T_watchdog > N_miss * T_loop

where T_hazard is the time before uncontrolled motion becomes unacceptable, and N_miss is the number of missed host cycles you intentionally tolerate. A 100 ms firmware watchdog is a reasonable order of magnitude for many mobile bases. It is not a substitute for E-stop or safe torque off; it is the board-level answer to "the host stopped talking."

Watchdog warning:

Feed the watchdog only from code that proves the control path is alive. Servicing it from an unconditional timer interrupt defeats the purpose: the firmware can be wedged while the watchdog continues to be reset.

Between UART and EtherCAT there are middle-ground options such as RS-485 multidrop, CAN/CANopen, and CAN-FD. They can be excellent when robustness and multi-node wiring matter more than sub-millisecond synchronized motion.

When not to use EtherCAT

Do not migrate to EtherCAT if the robot has one or two axes, low-bandwidth commands, a good firmware watchdog, no synchronization requirement, and the current serial link has measured timing margin. EtherCAT adds commissioning complexity, ESI/PDO configuration, real-time master requirements, drive-state handling, and more failure modes to diagnose.

4. The signs you have outgrown UART

You have outgrown serial motor control when the problems become architectural rather than local:

Symptom

Works at 20 Hz, fails at 100 Hz

Likely cause: Bus and host latency consume the loop period.

EtherCAT changes: Cyclic process data at a defined master period.

Symptom

Axes do not start together

Likely cause: No shared command latch time.

EtherCAT changes: Distributed clocks and synchronized PDO exchange.

Symptom

One packet loss causes a visible bump

Likely cause: State and command are not decoupled cleanly.

EtherCAT changes: Drive state machine plus cyclic setpoint policy.

Symptom

Debugging requires vendor-specific commands per drive

Likely cause: No standard object model.

EtherCAT changes: CiA 402 object dictionary, controlword, statusword.

Symptom

Host code grows per-axis special cases

Likely cause: Each board has its own state semantics.

EtherCAT changes: DS402 common state machine.

Symptom

Velocity estimation is noisy on the host

Likely cause: Host differentiates encoder counts at low rate.

EtherCAT changes: Drive reports actual position, velocity, and torque each cycle.

Symptom

The robot needs many axes

Likely cause: Point-to-point links do not scale cleanly.

EtherCAT changes: One cyclic fieldbus frame touches all nodes.

The decisive question is not "can UART send the bytes?" It usually can. The question is whether the control system needs a shared, cyclic, multi-axis time base.

5. What EtherCAT actually buys

EtherCAT is not magic Ethernet. It is a deterministic process-data mechanism designed around short cycles and low synchronization jitter. The EtherCAT Technology Group describes the technology focus as short cycle times, low jitter, and distributed clocks for accurate synchronization. In practice, the important change is that the master exchanges process data with all drives in a cycle, and the drives can synchronize their local actions to a distributed clock instead of the instant a packet happens to arrive.

Sequence diagram of ros2_control, an EtherCAT master, and three drives exchanging PDO setpoints and status every control cycle.
Figure 3. EtherCAT makes the hardware interface a cyclic participant: setpoints go out and actual position plus statusword come back every configured cycle.

That changes the failure modes. You are no longer asking, "Did I parse the latest ASCII line?" You are asking:

  • Did the master complete the cycle?
  • Did every drive receive fresh process data?
  • Is the drive still Operation Enabled?
  • Is the statusword reporting a warning, following error, quick-stop, or fault?
  • Are the PDOs mapped to the control mode you think you are using?
  • Is the distributed-clock configuration actually active?

DC being "enabled" in the configuration is not enough; verify the drive's actual synchronization state, Sync0 period/shift, and the measured working-counter and cycle-miss diagnostics.

The work moves from framing bytes to commissioning motion.

6. DS402: the common language underneath the drives

CiA 402, often called DS402 in EtherCAT conversations, is the standardized drive profile that makes a servo drive look like a servo drive instead of a vendor-specific box. It defines a finite state machine, operating modes, and a common object dictionary. The host writes a controlword; the drive reports a statusword. Process data maps target values and actual values into PDOs.

Practical rule: use PDOs for cyclic real-time process data; use SDOs for configuration, commissioning, and diagnostics outside the hot path.

Simplified state flow:

Simplified DS402 state machine from switch on disabled to operation enabled, quick stop, and fault.
Figure 4. DS402 is useful because enabling motion is a state transition, not a vendor-specific packet.

Typical control sequence:

Shutdown          0x0006
Switch on         0x0007
Enable operation  0x000F

The key operating modes for robot builders are:

DS402 code 1

PP, Profile Position

Master sends: Target position and profile parameters.

Drive closes: Trajectory/profile internally.

DS402 code 3

PV, Profile Velocity

Master sends: Target velocity and profile parameters.

Drive closes: Velocity profile internally.

DS402 code 4

PT, Profile Torque

Master sends: Target torque and profile parameters.

Drive closes: Torque profile internally.

DS402 code 8

CSP, Cyclic Synchronous Position

Master sends: Position setpoint every cycle.

Drive closes: Inner loops, synchronized.

DS402 code 9

CSV, Cyclic Synchronous Velocity

Master sends: Velocity setpoint every cycle.

Drive closes: Inner loops, synchronized.

DS402 code 10

CST, Cyclic Synchronous Torque

Master sends: Torque setpoint every cycle.

Drive closes: Torque/current loops.

DS402 code 6

HM, Homing

Master sends: Homing command and parameters.

Drive closes: Homing sequence.

For robotics, the cyclic synchronous modes are usually the point. The master sends a new setpoint each cycle, and the drive applies it in the configured synchronous timing model. Profile modes are excellent for point-to-point industrial moves; cyclic modes are the natural fit when a robot controller owns the trajectory.

7. Mapping DS402 into ros2_control

The ros2_control contract is still narrow:

read()  -> fill state interfaces
update() -> controllers compute command interfaces
write() -> push command interfaces to hardware

For a serial base, write() might serialize two wheel velocities. For an EtherCAT robot, write() writes PDO targets for every drive. The implementation changes from "send packet" to "stage process data for the next bus cycle."

Block diagram mapping ros2_control command interfaces to DS402 target objects and state interfaces to actual values and statusword.
Figure 5. The ros2_control contract stays narrow: commands go down through exported interfaces and PDO targets; measured state and drive status come back up.

Design rule:

Choose the DS402 mode from where you want the control authority to live:

  • Use CSP when the robot controller owns trajectory generation and sends position setpoints.
  • Use CSV when the robot controller owns velocity commands and the drive owns velocity regulation.
  • Use CST when a model-based controller owns torque and the drive is mostly a torque/current actuator.
  • Use profile modes when the drive should generate the trajectory from point-to-point commands.

Do not choose CSP because it sounds advanced. Choose it because your master can produce setpoints at a reliable cycle time and your trajectory generator belongs above the drive.

8. The cycle-time equation

EtherCAT does not remove timing math. It changes which timing matters.

The cyclic loop must satisfy:

T_compute + T_bus + T_margin <= T_cycle

where:

  • T_compute is the host-side control computation and EtherCAT stack time.
  • T_bus is process-data exchange time.
  • T_margin is deliberate slack for scheduling variation and diagnostics.
  • T_cycle is the configured EtherCAT control period.

For a 1 kHz loop:

T_cycle = 1 ms

That leaves no room for casual software. A 1 kHz EtherCAT loop on a non-real-time host can fail for the same reason a 100 Hz UART loop fails: the bus is not the only source of latency. ROS 2 real-time design guidance makes the uncomfortable point explicit: application code, libraries, OS scheduling, firmware, and hardware all contribute to response latency.

Migration warning:

EtherCAT gives deterministic cyclic process-data exchange at the fieldbus level. It does not automatically make Linux, your user-space master, your trajectory generator, or your ros2_control hardware interface deterministic.

9. Watchdogs move down and up

Serial systems usually have one obvious watchdog: "if no command frame arrives within T_watchdog, stop the motors."

EtherCAT systems have several layers:

Layered watchdog diagram from ros2_control timeout through EtherCAT missed cycle and drive watchdog to STO safety chain.
Figure 6. The fieldbus watchdog is only one layer. A robot still needs software timeouts, drive reactions, and an external safety chain.

The DS402 state machine gives you more precise language for failure handling:

  • A stale command is not the same as a drive fault.
  • A following error is not the same as a bus dropout.
  • A quick stop is not the same as disabling torque.
  • A fault reset is an explicit state transition, not "send the packet again."

Design rule:

Every layer should fail in the safest direction, but only one layer should own each decision. Let the controller handle command timeout, the master handle bus-cycle loss, the drive handle local following/fault limits, and the safety chain handle hazardous energy.

10. A practical migration path

Do not port the whole robot in one step. Port the contract.

Step 1: Freeze the interfaces

Write down the existing ros2_control interfaces:

left_wheel_joint:
  command: velocity
  state: position, velocity

right_wheel_joint:
  command: velocity
  state: position, velocity

or for an arm:

joint_i:
  command: position
  state: position, velocity, effort

The controller should not care whether the hardware layer speaks UART or EtherCAT.

Step 2: Choose DS402 mode per joint

For a differential base, CSV often maps cleanly: diff_drive_controller produces wheel velocity commands, and the drive closes the velocity loop.

For a multi-axis arm under joint_trajectory_controller, CSP is usually the more natural first port: the trajectory controller produces position targets, and the drive receives cyclic position setpoints.

For high-end model-based control, CST is attractive, but only if the robot controller owns the dynamics and your safety story is mature.

Step 3: Commission one drive outside ROS

Before writing the hardware interface, prove:

  • The drive reaches Operation Enabled.
  • The expected PDOs are mapped.
  • The selected mode appears in the mode display.
  • Actual position changes when the motor moves.
  • Following-error and quick-stop behavior are understood.
  • The process-data watchdog does what you think it does.

Step 4: Build the SystemInterface

The hot path should look boring:

return_type read(...) {
  master.process();
  for (auto& axis : axes_) {
    axis.position = counts_to_rad(axis.pdo.actual_position);
    axis.velocity = counts_to_rad_s(axis.pdo.actual_velocity);
    axis.status   = axis.pdo.statusword;
  }
  return all_axes_healthy() ? return_type::OK : return_type::ERROR;
}

return_type write(...) {
  for (auto& axis : axes_) {
    axis.pdo.controlword = axis.next_controlword();
    axis.pdo.target_position = rad_to_counts(axis.command_position);
  }
  return return_type::OK;
}

The non-boring work belongs outside the hot path: parsing ESI files, validating PDO layout, resolving drive states, logging faults, and exposing diagnostics.

Step 5: Measure before driving

Use the same discipline as the serial bring-up:

  • Spin an axis by hand and verify /joint_states.
  • Command a single low-speed move.
  • Verify signs and units.
  • Log cycle time and missed cycles.
  • Pull the cable and observe the fault path.
  • Trigger quick stop intentionally.
  • Reboot a drive and confirm the state machine recovers only when commanded.

11. Symptom table

Symptom

Drive never leaves Switch On Disabled

Almost always: Controlword sequence wrong, safety input open, or fault latched.

Fix: Decode statusword bits; verify STO/safety chain before software.

Symptom

Drive reaches Operation Enabled but does not move

Almost always: Mode not active, PDO target not mapped, or command units wrong.

Fix: Check mode display, PDO map, and scale factors.

Symptom

First move jumps

Almost always: Stale target position or missing setpoint initialization.

Fix: On activate, seed target = actual before enabling motion.

Symptom

Axes start slightly apart

Almost always: Distributed clocks not configured, not active, or not actually synchronized.

Fix: Enable DC sync, then verify the drive sync state, Sync0 period/shift, working counter, and cycle-miss diagnostics.

Symptom

Works at 250 Hz, faults at 1 kHz

Almost always: Host loop or EtherCAT stack misses cycle budget.

Fix: Measure T_compute; lower cycle rate or move to RT kernel/core isolation.

Symptom

Following error at acceleration changes

Almost always: Trajectory too aggressive or feedforward missing.

Fix: Limit jerk/acceleration; add velocity/acceleration feedforward where appropriate.

Symptom

Robot reports healthy while drive is warning

Almost always: Statusword decoded incompletely.

Fix: Promote warnings, internal limits, and following-error bits into diagnostics.

Symptom

Recovery after fault is unpredictable

Almost always: Fault reset treated as a packet, not a state transition.

Fix: Implement explicit DS402 state handling.

12. The decision

UART is a good way to make a motor controller talk to ROS. EtherCAT is a good way to make a set of drives behave like a machine.

The migration is justified when at least one of these becomes true:

  • You need more axes than point-to-point serial links can manage cleanly.
  • The loop period is short enough that bus and host jitter are now control variables.
  • Multiple axes must latch commands at the same effective time.
  • You need standard drive states, faults, quick-stop behavior, and diagnostics.
  • You want to swap drive vendors without rewriting the control contract.
  • The robot controller, not the drive, must own the trajectory.

The final design rule:

Use UART while the drive is a smart peripheral.
Use EtherCAT when the drives are distributed parts of one motion controller.

That distinction is the whole article. A serial link sends commands. A fieldbus coordinates motion.


This article is based on practical embedded motor-control experience, ROS 2 hardware-interface design, CiA 402 drive concepts, and EtherCAT fieldbus architecture. Always verify PDO maps, statusword bits, timing behavior, and controller interfaces against the specific drive and ROS 2 distribution in front of you.

Share
05 — Contact

Have a hard engineering problem?

Email
rotem@segevtech.com
Tel
+972-52-6444408
Studio
Tel Aviv, Israel