Transformer Architecture in 3D

An Interactive Visualization of "Attention is All You Need"

Architecture Overview

The Transformer architecture revolutionized NLP by relying entirely on attention mechanisms, eliminating the need for recurrence and convolutions. This visualization shows the complete encoder-decoder architecture with all its components.

Key Dimensions

  • Model dimension (d_model): 512
  • Number of heads: 8
  • Head dimension: 64
  • Feed-forward dimension: 2048
  • Number of layers: 6
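
As a minimal sketch, these dimensions can be gathered into a single configuration object. The `TransformerConfig` name and its fields below are illustrative choices, not code from the visualization itself:

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    """Base Transformer hyperparameters from Vaswani et al. (2017)."""
    d_model: int = 512     # model (embedding) dimension
    num_heads: int = 8     # attention heads per layer
    d_ff: int = 2048       # inner feed-forward dimension
    num_layers: int = 6    # encoder layers and decoder layers, each

    @property
    def d_head(self) -> int:
        # head dimension = d_model / num_heads = 64
        return self.d_model // self.num_heads
```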

Multi-Head Attention

Multi-head attention is the core innovation of the Transformer, allowing the model to jointly attend to information from different representation subspaces at different positions.

Attention(Q, K, V) = softmax(QK^T / √d_k) V
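
A minimal NumPy sketch of this formula (not the visualization's own code; the function and argument names are illustrative):

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # block masked positions
    weights = softmax(scores, axis=-1)               # attention distribution over keys
    return weights @ V, weights

# Example: 4 positions, d_k = 64
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 64)) for _ in range(3))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)   # (4, 64) (4, 4)
```

In multi-head attention, this computation runs in parallel over 8 heads of dimension 64, and the concatenated head outputs are projected back to d_model = 512.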

Attention Types

  • Self-Attention: In encoder layers
  • Masked Self-Attention: In decoder layers, with a causal mask (see the sketch after this list)
  • Cross-Attention: Decoder attending to encoder
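
One way to build the causal mask used by masked self-attention, together with how each attention type would call the `scaled_dot_product_attention` sketch above; this pairing is illustrative, not the visualization's implementation:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular mask: position i may attend only to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Self-attention:        Q, K, V all come from the same (encoder) sequence.
# Masked self-attention: Q, K, V come from the decoder sequence, with causal_mask applied.
# Cross-attention:       Q comes from the decoder, K and V from the encoder output.
print(causal_mask(4).astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```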

Positional Encoding

Since the model contains no recurrence or convolution, positional encodings are added to give the model information about the relative or absolute position of tokens.

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
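
A small NumPy sketch of these sinusoids (illustrative, not the visualization's code):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]              # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)       # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                        # even dimensions
    pe[:, 1::2] = np.cos(angle)                        # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=512)
print(pe.shape)   # (50, 512) -- added element-wise to the token embeddings
```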

Layer Components

Sub-layers

  • Multi-Head Attention: Parallel attention computations
  • Feed Forward Network: Two linear transformations with a ReLU in between
  • Layer Normalization: Applied after each sub-layer
  • Residual Connections: Around each sub-layer, combined with layer normalization as in the sketch after this list
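
The residual-plus-normalization pattern around each sub-layer (post-norm, as in the original paper: LayerNorm(x + Sublayer(x))) can be sketched as follows; `layer_norm` and `sublayer_with_residual` are illustrative names, not the visualization's code:

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize each position's feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sublayer_with_residual(x: np.ndarray, sublayer) -> np.ndarray:
    """Post-norm residual wrapper: LayerNorm(x + Sublayer(x))."""
    return layer_norm(x + sublayer(x))

x = np.random.default_rng(0).normal(size=(10, 512))
y = sublayer_with_residual(x, lambda h: 0.5 * h)   # toy stand-in for an attention or FFN sub-layer
print(y.shape)   # (10, 512)
```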

Feed Forward Network

FFN(x) = max(0, xW_1 + b_1) W_2 + b_2
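
A direct NumPy reading of this formula with the dimensions listed above (d_model = 512, d_ff = 2048); the weights here are randomly initialized placeholders, not trained parameters:

```python
import numpy as np

d_model, d_ff = 512, 2048
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.02, size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(scale=0.02, size=(d_ff, d_model)), np.zeros(d_model)

def feed_forward(x: np.ndarray) -> np.ndarray:
    """FFN(x) = max(0, xW1 + b1)W2 + b2, applied to each position independently."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

x = rng.normal(size=(10, d_model))   # 10 token positions
print(feed_forward(x).shape)         # (10, 512)
```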

Interactive Features

Navigation

  • Orbit: Left mouse button + drag
  • Zoom: Mouse wheel or pinch
  • Pan: Right mouse button + drag

Visualization Modes

  • Full Architecture: Complete encoder-decoder
  • Attention Animation: Watch attention flow
  • Data Flow: Follow token processing

Implementation Notes

This visualization implements the architecture described in "Attention is All You Need" (Vaswani et al., 2017). The 3D representation allows for better understanding of the parallel nature of multi-head attention and the flow of information through the network.

Visual Design

  • Color coding follows TensorFlow conventions
  • Opacity indicates information flow intensity
  • Animations reveal computation sequence
  • Interactive controls for exploration