Start with the simplest case: a single rigid link of length L, attached to the world at one end by a revolute joint that rotates by angle θ about the Z-axis. Where is the other end?
step 1 — the transform matrix
A single transform T maps any point in the end-effector frame to the world frame. It packs rotation + translation together:
T = [ R | t ], a 3×3 homogeneous transform (2D):
$$T = \begin{bmatrix} \cos\theta & -\sin\theta & L\cos\theta \\ \sin\theta & \cos\theta & L\sin\theta \\ 0 & 0 & 1 \end{bmatrix}$$
For example, with θ = 35° and L = 1.0, the end-effector sits at x = cos 35° ≈ 0.819, y = sin 35° ≈ 0.574.
What the matrix tells you: The top-left 2×2 block is the rotation. Its first column, (cos θ, sin θ), is the direction the end-effector's x-axis points in world coordinates. The top two entries of the last column are the translation: exactly where the end-effector origin sits in the world. The bottom row [0 0 1] is always fixed. Points are represented as (x, y, 1); that extra 1 is what activates the translation column.
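To see the "maps any point in the end-effector frame to the world frame" claim in action, here is a small sketch that builds the same matrix for θ = 35°, L = 1.0 and pushes two end-effector-frame points through it. The helper name `link_T` and the point 0.2 along the end-effector's x-axis are illustrative assumptions, not part of the original code.

```python
import numpy as np

def link_T(theta_deg, L):
    """3×3 homogeneous transform for one planar link (hypothetical helper)."""
    c, s = np.cos(np.deg2rad(theta_deg)), np.sin(np.deg2rad(theta_deg))
    return np.array([[c, -s, L * c],
                     [s,  c, L * s],
                     [0.0, 0.0, 1.0]])

T = link_T(35, 1.0)

# Points in the end-effector frame, written as (x, y, 1)
ee_origin = np.array([0.0, 0.0, 1.0])   # the end-effector itself
ahead     = np.array([0.2, 0.0, 1.0])   # 0.2 along the EE's own x-axis

world_origin = T @ ee_origin   # picks out the translation column: (L cos θ, L sin θ, 1)
world_ahead  = T @ ahead       # 0.2 further along the EE x-axis, i.e. along (cos θ, sin θ)

print(world_origin[:2])                      # EE position in the world
print(world_ahead[:2] - world_origin[:2])    # 0.2 times the first column of T
```

The trailing 1 survives the multiplication unchanged, which is why it can simply be dropped when reading off the world coordinates.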
step 2 — the code (2D, 3×3 matrix)
import numpy as np

def get_T(theta_deg, L):
    c = np.cos(np.deg2rad(theta_deg))
    s = np.sin(np.deg2rad(theta_deg))
    # 2D homogeneous: 3×3
    # point (x, y) is represented as (x, y, 1)
    T = np.array([[c, -s, L*c],    # rotation | translation
                  [s,  c, L*s],
                  [0,  0, 1]])     # always [0, 0, 1]
    return T

# End-effector position is the last column (first 2 rows):
T = get_T(35, 1.0)
position = T[0:2, 2]    # [tx, ty]
Notice: the translation entries [L·cos θ, L·sin θ] are just basic trigonometry, the tip of a link of length L at angle θ. The matrix is simply a convenient way to package them together with the rotation.
Now add a second link. Joint 2 sits at the tip of link 1, and can rotate independently by angle θ₂. Where is the new tip?
We can't just add the two links' offsets: link 2's displacement (L₂·cos θ₂, L₂·sin θ₂) is expressed in frame 1's coordinates, not the world's. We need to transform it into the world frame, and that is exactly what matrix multiplication does.
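A quick numeric check of why simple addition fails. The example values θ₁ = θ₂ = 90°, L₁ = L₂ = 1 are chosen only because the answer is easy to picture, and `planar_link` is a hypothetical helper with the same layout as the matrix above. Treating link 2's offset as if it were already in world coordinates puts the tip at (0, 2); rotating it through frame 1 first gives the correct tip at (−1, 1), where link 2 points along θ₁ + θ₂ = 180°.

```python
import numpy as np

def planar_link(theta_deg, L):
    # 3×3 homogeneous transform of one planar link (hypothetical helper)
    c, s = np.cos(np.deg2rad(theta_deg)), np.sin(np.deg2rad(theta_deg))
    return np.array([[c, -s, L * c],
                     [s,  c, L * s],
                     [0.0, 0.0, 1.0]])

theta1, theta2, L1, L2 = 90.0, 90.0, 1.0, 1.0   # example values

# Wrong: treat link 2's local offset as a world-frame vector
naive_tip = planar_link(theta1, L1)[0:2, 2] + planar_link(theta2, L2)[0:2, 2]

# Right: express link 2's offset in frame 1, then map it to the world
correct_tip = (planar_link(theta1, L1) @ planar_link(theta2, L2))[0:2, 2]

print(np.round(naive_tip, 6))    # link 2 wrongly points straight up
print(np.round(correct_tip, 6))  # link 2 points along θ1 + θ2 = 180°
```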
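T₁ (world to joint 1): describes link 1 (length L₁, angle θ₁). Its translation column is the position of joint 2 in the world frame.
T₁₂ (joint 1 to joint 2): describes link 2 (length L₂, angle θ₂ relative to link 1). Its translation column is the local offset of the EE from joint 2, expressed in link 1's frame.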
Key idea: T₁₂ gives the EE position in link 1's frame. To get it in the world frame, we premultiply by T₁. Reading each transform as "to frame ← from frame", the shared link-1 frame in the middle cancels: (world ← link 1) · (link 1 ← EE) = (world ← EE).
T_world→EE = T₁ · T₁₂
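Multiplying the product out symbolically shows what the chain does in the planar case. Writing c_i = cos θ_i and s_i = sin θ_i, and using the angle-sum identities c₁₂ = cos(θ₁+θ₂) = c₁c₂ − s₁s₂ and s₁₂ = sin(θ₁+θ₂) = s₁c₂ + c₁s₂:

$$
T_1 T_{12} =
\begin{bmatrix} c_1 & -s_1 & L_1 c_1 \\ s_1 & c_1 & L_1 s_1 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} c_2 & -s_2 & L_2 c_2 \\ s_2 & c_2 & L_2 s_2 \\ 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} c_{12} & -s_{12} & L_1 c_1 + L_2 c_{12} \\ s_{12} & c_{12} & L_1 s_1 + L_2 s_{12} \\ 0 & 0 & 1 \end{bmatrix}
$$

The orientation angles do add (the rotation block is a rotation by θ₁ + θ₂), but the position only comes out right because link 2's offset L₂(c₂, s₂) is first rotated by θ₁.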
the code — two links (2D, 3×3)
import numpy as np

def get_T1(theta1_deg, L1):
    c, s = np.cos(np.deg2rad(theta1_deg)), np.sin(np.deg2rad(theta1_deg))
    return np.array([[c, -s, L1*c],
                     [s,  c, L1*s],
                     [0,  0, 1]])

def get_T12(theta2_deg, L2):    # same structure, local frame!
    c, s = np.cos(np.deg2rad(theta2_deg)), np.sin(np.deg2rad(theta2_deg))
    return np.array([[c, -s, L2*c],
                     [s,  c, L2*s],
                     [0,  0, 1]])

# Chain them: this is the forward kinematics
theta1, L1 = 30, 1.0    # example values
theta2, L2 = 45, 0.5
T_world_ee = get_T1(theta1, L1) @ get_T12(theta2, L2)
position = T_world_ee[0:2, 2]    # top of last column = EE in world
The pattern: both functions look identical — same structure, same assembly. The only difference is which angle and which length they take. Each function only "knows about" its own joint. The chain product handles the rest.
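Because every link function has the same shape, one helper can build them all, and the chain product can be folded over any number of links. A minimal sketch, assuming a purely planar arm; `planar_link` and `planar_fk` are hypothetical names, not functions from the SO-101 codebase, and the angles and lengths are arbitrary examples.

```python
from functools import reduce
import numpy as np

def planar_link(theta_deg, L):
    # One planar link: rotate by theta, then reach out L along the new x-axis
    c, s = np.cos(np.deg2rad(theta_deg)), np.sin(np.deg2rad(theta_deg))
    return np.array([[c, -s, L * c],
                     [s,  c, L * s],
                     [0.0, 0.0, 1.0]])

def planar_fk(thetas_deg, lengths):
    # Chain product T1 @ T12 @ T23 @ ..., starting from the identity
    links = (planar_link(t, L) for t, L in zip(thetas_deg, lengths))
    return reduce(np.matmul, links, np.eye(3))

T = planar_fk([30.0, 45.0, -20.0], [1.0, 0.5, 0.25])
print(T[0:2, 2])   # tip position in the world frame
```

Starting the fold from the identity means an empty chain returns "no motion", and adding a link is just one more factor on the right.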
The SO-101 has one more ingredient: each joint transform also includes a fixed displacement, the physical offset between two joint axes that doesn't depend on any joint angle. This is just the link geometry: how far one joint axis sits from the previous one.
The displacement goes in the translation column, alongside the rotation. The joint angle only affects the rotation block.
what each transform looks like — with displacement
For joint 2 of SO-101, the displacement d = (−0.0304, −0.0183, −0.0542) m is the fixed vector from joint 1's axis to joint 2's axis. The joint angle only enters the rotation block:
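$$T_{12} = \begin{bmatrix} c_2 & -s_2 & 0 & d_x \\ s_2 & c_2 & 0 & d_y \\ 0 & 0 & 1 & d_z \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad c_2 = \cos\theta_2,\; s_2 = \sin\theta_2$$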
The translation column [d_x, d_y, d_z] is hardcoded from the URDF and never changes. Only the top-left 3×3 rotation block depends on θ₂.
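A small check of that claim, using the joint-2 displacement quoted above: the translation column of T₁₂ is identical at every angle, and only the rotation block moves. `make_T12` is a hypothetical stand-in for the codebase's joint-2 function, and the two test angles are arbitrary.

```python
import numpy as np

D12 = np.array([-0.0304, -0.0183, -0.0542])   # joint-2 displacement from the text (m)

def make_T12(theta2_deg):
    c, s = np.cos(np.deg2rad(theta2_deg)), np.sin(np.deg2rad(theta2_deg))
    T = np.eye(4)                     # bottom row already [0, 0, 0, 1]
    T[0:3, 0:3] = [[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]]   # rotation block: depends on theta2
    T[0:3, 3] = D12                   # translation column: fixed
    return T

A, B = make_T12(0.0), make_T12(60.0)
print(np.allclose(A[0:3, 3], B[0:3, 3]))      # True: translation column unchanged
print(np.allclose(A[0:3, 0:3], B[0:3, 0:3]))  # False: rotation block changed
print(B @ np.array([0.0, 0.0, 0.0, 1.0]))     # child origin lands at d, whatever the angle
```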
the code — three links, with displacement
import numpy as np

def get_T12(theta2_deg):
    # Fixed displacement: from URDF, never changes
    displacement = np.array([-0.0304, -0.0183, -0.0542]).reshape(3, 1)
    # Rotation: only this changes with the joint angle
    c, s = np.cos(np.deg2rad(theta2_deg)), np.sin(np.deg2rad(theta2_deg))
    R = np.array([[c, -s, 0],
                  [s,  c, 0],
                  [0,  0, 1]])
    # Assemble the 4×4 block: R beside the displacement, [0, 0, 0, 1] underneath
    T = np.block([[R, displacement],
                  [0, 0, 0, 1]])
    return T

# Full chain: same @ operator, just more terms.
# get_T1 and get_T23 are built the same way, each with its own fixed
# displacement; they must be 4×4 too (the 3×3 versions above won't multiply).
T_world_ee = get_T1(th1) @ get_T12(th2) @ get_T23(th3)
position = T_world_ee[0:3, 3]    # EE position in world (x, y, z)
rotation = T_world_ee[0:3, 0:3]  # EE orientation in world
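To actually run a three-link chain, get_T1 and get_T23 need concrete 4×4 definitions. A minimal sketch, assuming a shared helper `joint_T` (hypothetical) and made-up displacements for joints 1 and 3; only the joint-2 vector comes from the text, and the other two are placeholders, not SO-101 URDF values.

```python
import numpy as np

def joint_T(theta_deg, displacement):
    # Generic 4×4 joint transform: fixed displacement + Rz(theta) rotation
    c, s = np.cos(np.deg2rad(theta_deg)), np.sin(np.deg2rad(theta_deg))
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    d = np.asarray(displacement, dtype=float).reshape(3, 1)
    return np.block([[R, d],
                     [np.zeros((1, 3)), np.ones((1, 1))]])

# Displacements in metres. Only D12 is taken from the text.
D1 = (0.0, 0.0, 0.05)                # hypothetical placeholder
D12 = (-0.0304, -0.0183, -0.0542)    # joint 2, as quoted above
D23 = (0.11, 0.0, 0.0)               # hypothetical placeholder

def get_T1(th):
    return joint_T(th, D1)

def get_T12(th):
    return joint_T(th, D12)

def get_T23(th):
    return joint_T(th, D23)

th1, th2, th3 = 20.0, -35.0, 50.0    # example joint angles (degrees)
T_world_ee = get_T1(th1) @ get_T12(th2) @ get_T23(th3)
position = T_world_ee[0:3, 3]
rotation = T_world_ee[0:3, 0:3]
print(position)
```

Because every joint here spins about its own local Z with no alignment rotation, the final rotation block is simply Rz(θ₁ + θ₂ + θ₃); the SO-101's alignment rotations are what break that simple pattern.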
The SO-101 has one extra ingredient on top of what we've seen: axis-alignment rotations. In the examples so far, every joint rotates about the same Z-axis, so the whole chain stays in one plane. In a real arm the joint axes point in different directions, so each transform needs a fixed "setup" rotation before the joint variable is applied.
what we've done so far: R = Rz(θ). Joint variable only; works when all joints rotate about the same axis.
SO-101 pattern: R = R_align · Rz(θ). Fixed alignment first, then the joint variable. The alignment comes from the URDF.
For example, get_gw1 uses Rz(180) @ Rx(180) as the alignment — these two constant rotations flip the Z-axis to point upward from the robot base. Then Rz(θ₁) rotates the shoulder pan on top of that.
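It can help to multiply the alignment out. A small sketch with hypothetical helpers `Rx` and `Rz` taking degrees: Rz(180) · Rx(180) works out to diag(−1, 1, −1), a half-turn about Y. Its columns are the aligned frame's axes expressed in the parent frame, so the new Z-axis is the parent's Z-axis reversed (X is reversed too, Y is unchanged). The shoulder-pan angle of 30° is just an example.

```python
import numpy as np

def Rx(deg):
    c, s = np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def Rz(deg):
    c, s = np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))
    return np.array([[  c,  -s, 0.0],
                     [  s,   c, 0.0],
                     [0.0, 0.0, 1.0]])

R_align = Rz(180) @ Rx(180)
print(np.round(R_align, 6))     # columns: new x, y, z axes in the parent frame

theta1 = 30.0                   # example shoulder-pan angle
R_w1 = R_align @ Rz(theta1)     # fixed alignment first, then the joint variable
print(np.round(R_w1[:, 2], 6))  # Rz leaves the local z-axis alone
```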
the full recipe — every get_gXY in the codebase
def get_gXY(theta_deg):
    # ① Fixed: from URDF geometry (dx, dy, dz are per-joint constants)
    displacement = (dx, dy, dz)
    # ② Fixed: re-orient axes so the joint spins about the right direction
    R_align = ...  # per-joint constant from the URDF, e.g. Rz(180) @ Rx(180) for gw1
    # ③ Variable: the actual joint angle, always Rz(θ) for SO-101
    rotation = R_align @ Rz(theta_deg)
    # ④ Assemble
    pose = np.block([[rotation, np.array(displacement).reshape(3, 1)],
                     [0, 0, 0, 1]])
    return pose

# Forward kinematics: chain all six transforms
# (gw1, g12, ... are the 4×4 poses returned by each get_gXY at the current angles)
T = gw1 @ g12 @ g23 @ g34 @ g45 @ g5t
position = T[0:3, 3]     # where is the tip?
rotation = T[0:3, 0:3]   # how is the tip oriented?
The chain rule in short: T_w1 (gw1 in the code) carries everything downstream with it. When joint 1 rotates, joints 2–5 and the tool tip all move together, because T_w1 is the leftmost factor in the product. Each joint only "knows about" its own angle; the multiplication handles how they compose.
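The "carries everything downstream" claim can be checked numerically. A sketch under stated assumptions: it follows the recipe's shape with a hypothetical `make_pose` helper, identity alignments, and made-up displacements (not the SO-101 URDF values), with the sixth angle standing in for g5t. Sweeping only θ₁ moves the tip, but the tip's distance from joint 1's origin never changes, because the whole downstream chain turns as one rigid body about joint 1's axis.

```python
from functools import reduce
import numpy as np

def Rz(deg):
    c, s = np.cos(np.deg2rad(deg)), np.sin(np.deg2rad(deg))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def make_pose(theta_deg, displacement, R_align=None):
    # Same recipe as get_gXY: fixed displacement, fixed alignment, then Rz(theta)
    R_align = np.eye(3) if R_align is None else R_align
    pose = np.eye(4)
    pose[0:3, 0:3] = R_align @ Rz(theta_deg)
    pose[0:3, 3] = displacement
    return pose

# Made-up geometry in metres, NOT the SO-101 URDF values
DISPLACEMENTS = [(0.0, 0.0, 0.06), (0.0, 0.03, 0.05), (0.11, 0.0, 0.0),
                 (0.13, 0.0, 0.0), (0.0, 0.0, 0.06), (0.0, 0.0, 0.09)]

def forward_kinematics(thetas_deg):
    poses = [make_pose(t, d) for t, d in zip(thetas_deg, DISPLACEMENTS)]
    return reduce(np.matmul, poses)   # gw1 @ g12 @ ... @ g5t

joint1_origin = np.array(DISPLACEMENTS[0])
for theta1 in (0.0, 45.0, 90.0):
    tip = forward_kinematics([theta1, 10.0, -20.0, 30.0, 0.0, 0.0])[0:3, 3]
    print(theta1, np.linalg.norm(tip - joint1_origin))   # same distance every time
```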
summary — what each part of the matrix does
| Matrix part | What it encodes | Changes with θ? | In code |
|---|---|---|---|
| R (3×3 top-left) | How the child frame is oriented relative to the parent | Yes, via Rz(θ) | R_align @ Rz(theta_deg) |
| t (3×1 top-right) | Where the child frame's origin is in the parent frame | No, fixed geometry | displacement tuple |
| [0 0 0 1] (bottom) | Homogeneous convention, always this; enables matrix chaining | Never | hardcoded 0, 0, 0, 1 |
The single- and two-link examples were 2D: one plane, one rotation axis, 3×3 matrices. The SO-101 lives in 3D space, which is why its transforms are 4×4. Here is exactly what changes, and what stays the same.
why the homogeneous trick exists at all
A plain rotation matrix can never translate, because it always maps the origin to the origin. Multiplying any matrix M by the zero vector gives zero — no matter what M is.
❌ rotation only (2×2) — can't translate
At (0,0): result is always (0,0). Can't shift the origin.
✓ homogeneous (3×3) — rotation + translation
The fake 1 "activates" the translation column. The point is still 2D — you ignore the 1 in the output.
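The same contrast in numbers, with an example rotation of 30° and an example translation (2, 1): the 2×2 rotation leaves the origin stuck at (0, 0), while the 3×3 homogeneous version sends it to (2, 1).

```python
import numpy as np

theta = np.deg2rad(30.0)
c, s = np.cos(theta), np.sin(theta)
tx, ty = 2.0, 1.0                      # example translation

R = np.array([[c, -s],
              [s,  c]])                # rotation only (2×2)
T = np.array([[c, -s, tx],
              [s,  c, ty],
              [0.0, 0.0, 1.0]])        # homogeneous (3×3)

origin_2d = np.array([0.0, 0.0])
origin_h = np.array([0.0, 0.0, 1.0])   # same point with the extra 1

print(R @ origin_2d)       # stays at the origin
print((T @ origin_h)[:2])  # shifted to (tx, ty); drop the trailing 1
```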
The +1 rule: n-dimensional space → (n+1)×(n+1) homogeneous transform. 2D needs 3×3. 3D needs 4×4. The bottom row is always [0 … 0 1].
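The +1 rule is easy to write once for any dimension. A sketch with a hypothetical `homogeneous(R, t)` helper: it places an n×n rotation and an n-vector translation into an (n+1)×(n+1) matrix whose bottom row is [0 … 0 1].

```python
import numpy as np

def homogeneous(R, t):
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(-1)
    n = R.shape[0]
    if R.shape != (n, n) or t.shape != (n,):
        raise ValueError("R must be n×n and t must have length n")
    T = np.eye(n + 1)      # bottom row starts as [0 ... 0 1]
    T[:n, :n] = R          # rotation block: top-left n×n
    T[:n, n] = t           # translation: last column, first n rows
    return T

T2 = homogeneous(np.eye(2), [1.0, 2.0])         # 2D -> 3×3
T3 = homogeneous(np.eye(3), [1.0, 2.0, 3.0])    # 3D -> 4×4
print(T2.shape, T3.shape)                       # (3, 3) (4, 4)
```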
side-by-side: 2D (3×3) vs 3D (4×4)
2D homogeneous (3×3)
point: (x, y, 1)
$$\begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$$
R block: 2×2 — one rotation axis (Z only)
t column: 2 values — (tx, ty)
bottom row: [0, 0, 1]
3D homogeneous (4×4)
point: (x, y, z, 1)
$$\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
R block: 3×3 — can rotate about X, Y, or Z
t column: 3 values — (tx, ty, tz)
bottom row: [0, 0, 0, 1]
what changes: the rotation block
In 2D there is only one way to rotate — in the XY plane. In 3D you can rotate about any axis, so you need three elementary rotation matrices instead of one:
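Rx(θ): X stays fixed, Y/Z mix
$$R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$$
Ry(θ): Y stays fixed, X/Z mix
$$R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$$
Rz(θ): Z stays fixed, X/Y mix
$$R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$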
Notice: Rz is identical to the 2D rotation matrix, just padded with a third row/column. Rotating in the XY plane is the same thing whether you're in 2D or 3D — you just now also have a Z axis that doesn't move.
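Written as code, the three elementary rotations look like this. A sketch with hypothetical helper names taking degrees; the check at the end confirms the "padded 2D matrix" observation: the top-left 2×2 of Rz(θ) is the 2D rotation, and its third row and column are those of the identity.

```python
import numpy as np

def _cs(deg):
    r = np.deg2rad(deg)
    return np.cos(r), np.sin(r)

def Rx(deg):   # X stays fixed, Y/Z mix
    c, s = _cs(deg)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def Ry(deg):   # Y stays fixed, X/Z mix
    c, s = _cs(deg)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

def Rz(deg):   # Z stays fixed, X/Y mix
    c, s = _cs(deg)
    return np.array([[  c,  -s, 0.0],
                     [  s,   c, 0.0],
                     [0.0, 0.0, 1.0]])

theta = 35.0
c, s = _cs(theta)
R2d = np.array([[c, -s],
                [s,  c]])                    # the 2D rotation from step 1
print(np.allclose(Rz(theta)[:2, :2], R2d))   # True
```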
code: 2D vs 3D transform function
2D, 3×3 (the single- and two-link examples)
T = np.array([
[c, -s, tx],
[s, c, ty],
[0, 0, 1]
])
# extract position:
pos = T[0:2, 2] # [tx, ty]
3D — 4×4 (the SO-101 code)
T = np.block([
    [R, t],           # R is 3×3, t is 3×1
    [0, 0, 0, 1]      # bottom row: always [0, 0, 0, 1]
])
# extract position:
pos = T[0:3, 3] # [tx, ty, tz]
Everything else is the same. The chain multiplication rule is identical: T_world→EE = T₁ · T₁₂ · … The bottom row is always zeros + 1. The translation is always the last column. The rotation block is always the top-left square. Only the sizes change: 3×3 → 4×4, and the rotation block gains the ability to express rotations about all three axes, which is why you need Rx, Ry, Rz in 3D but only one rotation matrix in 2D.
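One way to confirm that only the sizes change: embed the planar two-link chain in 3D by using Rz for the rotation and t_z = 0, then compare with the 3×3 version. A sketch with hypothetical helpers `planar_T` and `spatial_T` and example angles and lengths; the x, y results agree and z stays at 0.

```python
import numpy as np

def planar_T(theta_deg, L):
    # 2D: 3×3, translation (L cos θ, L sin θ)
    c, s = np.cos(np.deg2rad(theta_deg)), np.sin(np.deg2rad(theta_deg))
    return np.array([[c, -s, L * c],
                     [s,  c, L * s],
                     [0.0, 0.0, 1.0]])

def spatial_T(theta_deg, L):
    # 3D: 4×4 with Rz(θ) and translation (L cos θ, L sin θ, 0)
    c, s = np.cos(np.deg2rad(theta_deg)), np.sin(np.deg2rad(theta_deg))
    return np.array([[c, -s, 0.0, L * c],
                     [s,  c, 0.0, L * s],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

angles, lengths = (30.0, 45.0), (1.0, 0.5)     # example values
T2 = planar_T(angles[0], lengths[0]) @ planar_T(angles[1], lengths[1])
T3 = spatial_T(angles[0], lengths[0]) @ spatial_T(angles[1], lengths[1])

print(T2[0:2, 2])   # position from the 3×3 chain
print(T3[0:3, 3])   # same x, y from the 4×4 chain, with z = 0
```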