UFS Subsystem Architecture
Universal Flash Storage (UFS) has become the dominant storage interface for modern mobile devices, replacing eMMC with significantly higher performance and advanced features. This article explores the UFS subsystem architecture, from hardware interfaces to software drivers.
Overview of UFS
UFS is a high-performance storage specification developed by JEDEC (Joint Electron Device Engineering Council). It addresses the limitations of eMMC by providing:
- Higher Bandwidth: Up to about 5.8 GB/s of raw interface bandwidth (UFS 4.0, two lanes) vs. eMMC's 400 MB/s
- Full Duplex: Simultaneous read and write operations
- Command Queuing: Multiple outstanding commands (up to 32)
- Advanced Power Management: Multiple low-power states
- Security Features: Hardware encryption and secure storage
UFS Architecture Layers
UFS follows a layered architecture similar to networking protocols:
Application Layer
- SCSI commands (read, write, format, etc.)
- UFS-specific commands (power management, provisioning)
- Device management and configuration
Transport Layer (UTP, the UFS Transport Protocol)
- Command processing
- Task management
- Power mode transitions
Data Link Layer (UniPro)
- Error detection and correction
- Flow control
- Link initialization
Physical Layer (M-PHY)
- Signal transmission
- High-speed serial interface
- Power state management
UFS Protocol Information Unit (UPIU)
UPIU is the fundamental data structure in UFS communication. Different UPIU types serve different purposes:
Command UPIU
Sent from host to device to initiate operations:
- NOP OUT: No operation (keep-alive, ping)
- COMMAND: SCSI command execution
- TASK MGMT: Task management requests
- QUERY REQUEST: Device configuration queries
Response UPIU
Device responses to host commands:
- NOP IN: NOP response
- RESPONSE: Command completion status
- TASK MGMT RESP: Task management response
- QUERY RESPONSE: Configuration query results
Data UPIU
Carries actual data:
- DATA OUT: Host to device data transfer
- DATA IN: Device to host data transfer
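The UPIU types listed above are distinguished by a transaction code in the first header byte. The sketch below lists the codes as I understand them from the UFS specification and shows one way to classify them; the enum and helper names are made up for illustration and are not taken from any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Transaction codes (low 6 bits of UPIU byte 0). Names are illustrative. */
enum upiu_transaction {
    UPIU_NOP_OUT        = 0x00,
    UPIU_COMMAND        = 0x01,
    UPIU_DATA_OUT       = 0x02,
    UPIU_TASK_MGMT_REQ  = 0x04,
    UPIU_QUERY_REQ      = 0x16,
    UPIU_NOP_IN         = 0x20,
    UPIU_RESPONSE       = 0x21,
    UPIU_DATA_IN        = 0x22,
    UPIU_TASK_MGMT_RESP = 0x24,
    UPIU_QUERY_RESP     = 0x36,
};

/* For the codes listed here, bit 5 is set exactly when the device is the sender. */
static bool upiu_from_device(uint8_t transaction_code)
{
    return (transaction_code & 0x20) != 0;
}

/*
 * Hypothetical helper: map a host request to the reply code the host should
 * expect. DATA OUT is excluded because it is not answered by a reply of this
 * form.
 */
static uint8_t upiu_expected_reply(uint8_t request_code)
{
    assert(!upiu_from_device(request_code) && request_code != UPIU_DATA_OUT);
    return (uint8_t)(request_code | 0x20);
}
```

A driver can use this kind of pairing as a sanity check when a reply arrives in a slot, for example rejecting a QUERY RESPONSE that shows up for a slot that issued a COMMAND.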
UPIU structure:
+------------------+
| Transaction Code | 1 byte
+------------------+
| Flags | 1 byte
+------------------+
| LUN | 1 byte
+------------------+
| Task Tag | 1 byte
+------------------+
| Command/Response | Variable
| Specific Fields |
+------------------+
| Data Segment | Variable (optional)
+------------------+
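The diagram above is simplified. In the UFS specification the fixed part of every UPIU is a 12-byte header, followed by transaction-specific fields. The following is a minimal sketch of that header in C; the struct and function names are illustrative assumptions, and real drivers define their own equivalents.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The 12-byte header common to all UPIUs. Field names are illustrative. */
struct upiu_header {
    uint8_t transaction_code;        /* byte 0 */
    uint8_t flags;                   /* byte 1 */
    uint8_t lun;                     /* byte 2 */
    uint8_t task_tag;                /* byte 3 */
    uint8_t iid_cmd_set_type;        /* byte 4 */
    uint8_t function;                /* byte 5: query / task management function */
    uint8_t response;                /* byte 6 */
    uint8_t status;                  /* byte 7 */
    uint8_t total_ehs_length;        /* byte 8 */
    uint8_t device_info;             /* byte 9 */
    uint8_t data_segment_length[2];  /* bytes 10-11, big-endian */
};
static_assert(sizeof(struct upiu_header) == 12, "UPIU header must be 12 bytes");

#define UPIU_COMMAND        0x01
#define UPIU_CMD_FLAG_WRITE 0x20
#define UPIU_CMD_FLAG_READ  0x40

/* Fill in the header of a Command UPIU; all other fields start out as zero. */
static void upiu_init_command_header(struct upiu_header *h, uint8_t lun,
                                     uint8_t task_tag, uint8_t flags)
{
    memset(h, 0, sizeof(*h));
    h->transaction_code = UPIU_COMMAND;
    h->flags = flags;
    h->lun = lun;
    h->task_tag = task_tag;
}

/* Decode the big-endian data segment length. */
static uint16_t upiu_data_segment_length(const struct upiu_header *h)
{
    return (uint16_t)((h->data_segment_length[0] << 8) | h->data_segment_length[1]);
}
```

Using only byte-sized members keeps the struct free of padding and independent of host endianness, which is why the length field is stored as two bytes rather than a `uint16_t`.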
UFS Host Controller Interface (UFS HCI)
The UFS HCI defines how host software communicates with the UFS controller hardware. Key components:
Host Controller Registers
Memory-mapped registers for:
- Controller capabilities and configuration
- Interrupt status and control
- UTRLBA (UTP Transfer Request List Base Address)
- UTMRLBA (UTP Task Management Request List Base Address)
UTP Transfer Request List (UTRL)
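An array of descriptors in system memory, managed through a doorbell register in the host controller:
- Slots: Up to 32 transfer request entries, which the host may use in any order
- Doorbell Register: Host sets a slot's bit to notify the controller of a new request
- Completion: Controller clears the slot's doorbell bit when the request completes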
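The doorbell handshake can be modelled in a few lines of plain C. This is a software-only sketch: the `doorbell` field stands in for the controller's doorbell register, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UTRL_SLOTS 32

/* Software model of the host's view of the transfer request list. */
struct utrl_model {
    uint32_t doorbell;     /* bit n set: slot n is owned by the controller */
    uint32_t outstanding;  /* bit n set: the driver is waiting on slot n   */
};

/* Claim the lowest free slot and ring its doorbell bit; -1 if all are busy. */
static int utrl_issue(struct utrl_model *q)
{
    for (int slot = 0; slot < UTRL_SLOTS; slot++) {
        uint32_t bit = UINT32_C(1) << slot;
        if (!(q->outstanding & bit)) {
            q->outstanding |= bit;
            q->doorbell |= bit;  /* on hardware: a write to the doorbell register */
            return slot;
        }
    }
    return -1;
}

/* Model the controller finishing a slot: it clears the doorbell bit. */
static void utrl_controller_complete(struct utrl_model *q, int slot)
{
    assert(slot >= 0 && slot < UTRL_SLOTS);
    q->doorbell &= ~(UINT32_C(1) << slot);
}

/*
 * Return a bitmap of slots that completed (doorbell bit cleared while still
 * marked outstanding) and mark them free for reuse.
 */
static uint32_t utrl_reap(struct utrl_model *q)
{
    uint32_t done = q->outstanding & ~q->doorbell;
    q->outstanding &= ~done;
    return done;
}
```

Keeping a separate `outstanding` bitmap is what lets the host tell "slot never used" apart from "slot just completed", since both read as a cleared doorbell bit.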
UTP Transfer Request Descriptor (UTRD)
Each UTRL entry contains a descriptor with:
struct utp_transfer_req_desc {
    u32 header[4];                  // DW0-3: command type, data direction, interrupt flag, completion status
    u32 command_desc_base_addr_lo;  // DW4
    u32 command_desc_base_addr_hi;  // DW5
    u16 response_upiu_length;       // DW6: in dwords
    u16 response_upiu_offset;       // DW6: in dwords
    u16 prd_table_length;           // DW7: number of PRD entries
    u16 prd_table_offset;           // DW7: in dwords
};
This descriptor points to:
- Command UPIU: The SCSI or UFS command
- Response UPIU: Buffer for device response
- PRDT: Physical Region Description Table for data buffers
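One common arrangement is to place the three pieces back to back in a single DMA buffer and derive the descriptor's offset fields from that layout. The sketch below assumes 512 bytes reserved per UPIU and a fixed-size PRDT; both sizes, and all names, are assumptions made for illustration. Offsets and the response length are expressed in dwords, as the descriptor fields expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UPIU_SLOT_SIZE  512  /* assumed space reserved per UPIU */
#define MAX_PRD_ENTRIES 128  /* assumed upper bound for this sketch */

struct prd_entry { uint32_t base_addr, upper_addr, reserved, size; };

/* Command descriptor: Command UPIU, Response UPIU buffer, then the PRDT. */
struct cmd_desc {
    uint8_t command_upiu[UPIU_SLOT_SIZE];
    uint8_t response_upiu[UPIU_SLOT_SIZE];
    struct prd_entry prd_table[MAX_PRD_ENTRIES];
};

/* The address, length, and offset fields of a transfer request descriptor. */
struct utrd_layout {
    uint32_t cmd_desc_addr_lo, cmd_desc_addr_hi;
    uint16_t response_upiu_length, response_upiu_offset;
    uint16_t prd_table_length, prd_table_offset;
};

static struct utrd_layout utrd_describe(uint64_t cmd_desc_dma_addr,
                                        uint16_t prd_entries)
{
    struct utrd_layout d;

    /* The command descriptor base address must be 128-byte aligned. */
    assert((cmd_desc_dma_addr & 0x7f) == 0);
    assert(prd_entries <= MAX_PRD_ENTRIES);

    d.cmd_desc_addr_lo = (uint32_t)cmd_desc_dma_addr;
    d.cmd_desc_addr_hi = (uint32_t)(cmd_desc_dma_addr >> 32);
    d.response_upiu_offset = (uint16_t)(offsetof(struct cmd_desc, response_upiu) / 4);
    d.response_upiu_length = (uint16_t)(UPIU_SLOT_SIZE / 4);
    d.prd_table_offset = (uint16_t)(offsetof(struct cmd_desc, prd_table) / 4);
    d.prd_table_length = prd_entries;
    return d;
}
```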
Physical Region Description Table (PRDT)
The PRDT describes scatter-gather lists for data transfers:
struct ufshcd_sg_entry {
    u32 base_addr;   // Physical address (low 32 bits)
    u32 upper_addr;  // Physical address (high 32 bits)
    u32 reserved;
    u32 size;        // Data buffer size in bytes, minus 1
};
Multiple PRDT entries enable efficient DMA transfers from non-contiguous memory regions.
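As a rough illustration, the sketch below fills a PRDT from a list of DMA segments. The `dma_segment` type and `prdt_fill` function are hypothetical. The checks reflect my reading of the UFSHCI rules (dword-aligned addresses, dword-granular lengths, at most 256 KB per entry), and the code ignores the little-endian conversion a real driver would apply on a big-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRD_MAX_BYTES (256u * 1024u)  /* per-entry limit assumed from UFSHCI */

struct ufshcd_sg_entry {
    uint32_t base_addr;   /* physical address, low 32 bits  */
    uint32_t upper_addr;  /* physical address, high 32 bits */
    uint32_t reserved;
    uint32_t size;        /* byte count minus 1 */
};

/* Hypothetical description of one physically contiguous buffer. */
struct dma_segment {
    uint64_t addr;
    uint32_t len;
};

/* Returns the number of entries written, or -1 if a segment is invalid. */
static int prdt_fill(struct ufshcd_sg_entry *prdt, size_t capacity,
                     const struct dma_segment *segs, size_t count)
{
    assert(prdt != NULL && segs != NULL);
    if (count > capacity)
        return -1;

    for (size_t i = 0; i < count; i++) {
        /* Addresses and lengths must be dword-granular and within limits. */
        if (segs[i].len == 0 || segs[i].len > PRD_MAX_BYTES ||
            (segs[i].addr & 3) || (segs[i].len & 3))
            return -1;

        prdt[i].base_addr  = (uint32_t)segs[i].addr;
        prdt[i].upper_addr = (uint32_t)(segs[i].addr >> 32);
        prdt[i].reserved   = 0;
        prdt[i].size       = segs[i].len - 1;  /* field is zero-based */
    }
    return (int)count;
}
```

The resulting entry count is what goes into `prd_table_length` in the transfer request descriptor.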
SCSI in UFS Context
UFS leverages the SCSI (Small Computer System Interface) command set:
Common SCSI Commands
- READ(10)/READ(16): Read data from logical blocks
- WRITE(10)/WRITE(16): Write data to logical blocks
- INQUIRY: Query device information
- READ CAPACITY: Get device size
- SYNCHRONIZE CACHE: Flush caches
- UNMAP: TRIM/discard unused blocks
Command Descriptor Block (CDB)
SCSI commands are encoded in CDB format:
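// READ(10) CDB example (10 bytes on the wire)
struct read10_cdb {
    u8 opcode;           // 0x28
    u8 flags;
    u8 lba[4];           // Logical block address (big-endian)
    u8 group;
    u8 transfer_len[2];  // Number of blocks (big-endian)
    u8 control;
};
// Multi-byte fields are byte arrays so the compiler inserts no padding.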
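To make the big-endian encoding concrete, here is a small sketch that serializes a READ(10) CDB into a 10-byte buffer. The function name is hypothetical; the opcode and byte positions follow the standard SCSI READ(10) layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define READ_10_OPCODE  0x28
#define READ_10_CDB_LEN 10

/* Encode a READ(10) CDB: opcode, 32-bit LBA and 16-bit block count, big-endian. */
static void build_read10_cdb(uint8_t cdb[READ_10_CDB_LEN], uint32_t lba,
                             uint16_t blocks)
{
    assert(cdb != NULL);
    memset(cdb, 0, READ_10_CDB_LEN);  /* flags, group and control stay zero */
    cdb[0] = READ_10_OPCODE;
    cdb[2] = (uint8_t)(lba >> 24);    /* most significant LBA byte first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(blocks >> 8);
    cdb[8] = (uint8_t)blocks;
}
```

Because the LBA field is only 32 bits wide and the transfer length 16 bits, larger addresses or transfers need READ(16) instead.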
Logical Unit Number (LUN)
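UFS devices can present multiple logical units. Regular LUNs are assigned when the device is provisioned, so the numbering varies between devices; one possible layout is:
- LUN 0: Boot partition A
- LUN 1: Boot partition B
- LUN 2: User data partition
RPMB (Replay Protected Memory Block) is not a regular LUN. It is exposed as a well-known logical unit (W-LUN 0xC4), alongside the other well-known units: REPORT LUNS (0x81), UFS Device (0xD0), and Boot (0xB0).
Different LUNs can have different characteristics (read-only, protected, etc.).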
UFS Driver Architecture in Linux
The Linux UFS driver (ufshcd) follows a layered approach:
Core Layer (ufshcd.c, built as the ufshcd-core module)
- Command processing and scheduling
- Power management
- Error handling and recovery
- Device initialization
Platform Layer (ufshcd-pltfrm.c)
- Platform-specific initialization
- Clock and regulator management
- Platform device binding
Vendor Extensions
- Qualcomm specific: ufs-qcom.c
- MediaTek specific: ufs-mediatek.c
- Samsung specific: ufs-exynos.c
Block Layer Integration
┌──────────────┐
│ File System │ (ext4, f2fs, etc.)
└──────┬───────┘
│
┌──────▼───────┐
│ Block Layer │ (I/O scheduler, request queue)
└──────┬───────┘
│
┌──────▼───────┐
│ SCSI Layer │ (sd.c - SCSI disk driver)
└──────┬───────┘
│
┌──────▼───────┐
│ UFS Driver │ (ufshcd)
└──────┬───────┘
│
┌──────▼───────┐
│ UFS HCI │ (Hardware)
└──────────────┘
Command Flow Example
A typical read operation flow:
- Application issues the read() system call
- VFS layer routes the call to the filesystem
- Filesystem (ext4/f2fs) generates a bio (block I/O)
- Block layer creates a request and passes it to the SCSI layer
- SCSI disk driver (sd.c) builds the SCSI READ command
- UFS driver (ufshcd):
  - Allocates a UTRD slot
  - Builds a Command UPIU with the READ(10) CDB
  - Sets up the PRDT for the DMA buffers
  - Writes the UTRLDBR (doorbell) register
- UFS controller:
  - Fetches the UTRD from memory
  - Sends the Command UPIU to the device via UniPro/M-PHY
- UFS device:
  - Processes the SCSI READ command
  - Fetches data from NAND flash
  - Sends Data In UPIUs back to the host, followed by a Response UPIU
- UFS controller:
  - DMA-transfers the data to system memory
  - Stores the Response UPIU and clears the doorbell bit
  - Generates an interrupt
- UFS driver completes the request and returns the data to the block layer
Total latency: Typically 50-200μs depending on queue depth and device state.
Advanced Features
Command Queuing
UFS supports up to 32 outstanding commands, allowing:
- Parallelism: Multiple commands in flight
- Reordering: Device can optimize execution order
- Higher Throughput: Amortize command overhead
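The throughput benefit can be estimated with Little's law: sustained throughput equals the number of commands in flight divided by the average latency per command. The sketch below is back-of-the-envelope arithmetic only. It assumes latency stays constant as queue depth grows, which real devices do not guarantee, so treat the result as an upper bound. The function names are illustrative.

```c
#include <assert.h>

/* Little's law: throughput = commands in flight / average latency. */
static double estimated_iops(unsigned queue_depth, double avg_latency_us)
{
    assert(avg_latency_us > 0.0);
    return queue_depth * 1e6 / avg_latency_us;
}

/* Convert an IOPS figure to MB/s (10^6 bytes) for a given I/O size. */
static double estimated_mb_per_s(double iops, unsigned io_size_bytes)
{
    return iops * io_size_bytes / 1e6;
}
```

For example, 32 commands in flight at 100 μs each gives 320,000 IOPS, or about 1,311 MB/s with 4 KiB reads, while a single outstanding command at the same latency tops out at 10,000 IOPS.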
Background Operations (BKOPS)
Device can perform maintenance in background:
- Garbage collection
- Wear leveling
- Error correction table updates
Host software must periodically allow BKOPS when idle.
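A host-side policy for this might look like the sketch below. The status levels mirror the values of the device's bBackgroundOpStatus attribute as I understand them from the UFS specification; the decision function itself is a hypothetical policy for illustration, not what any particular driver implements.

```c
#include <assert.h>
#include <stdbool.h>

/* Urgency levels reported by the bBackgroundOpStatus attribute. */
enum bkops_status {
    BKOPS_NOT_REQUIRED = 0,  /* no maintenance pending            */
    BKOPS_NON_CRITICAL = 1,  /* pending, but not urgent           */
    BKOPS_PERF_IMPACT  = 2,  /* pending, performance is degrading */
    BKOPS_CRITICAL     = 3,  /* must run now                      */
};

/*
 * Hypothetical policy: always allow critical maintenance; otherwise allow it
 * only while the host is idle and the device reports pending work.
 */
static bool should_allow_bkops(enum bkops_status status, bool host_idle)
{
    assert((int)status >= 0 && (int)status <= BKOPS_CRITICAL);
    if (status == BKOPS_CRITICAL)
        return true;
    return host_idle && status >= BKOPS_NON_CRITICAL;
}
```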
Write Booster
Temporary high-speed write buffer for improved burst performance:
- Small SLC (Single-Level Cell) buffer
- Automatic flush to main storage
- Configurable buffer size
Host Performance Booster (HPB)
Device shares logical-to-physical mapping with host:
- Host can issue physical address reads
- Bypasses device FTL lookup
- Reduces read latency
Power Management
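UFS defines multiple device power modes:
- Active: Full performance
- Sleep: Low power, fast resume
- PowerDown: Lowest power, slowest resume
Hibernate (Hibern8) is a low-power state of the UniPro/M-PHY link rather than a device power mode; the link can enter it to save power between transfers.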
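The host requests the Active, Sleep, and PowerDown device power modes with the SCSI START STOP UNIT command, placing the target mode in the power condition field (the upper four bits of byte 4). Hibernate is handled at the link level and is not selected this way. The sketch below builds such a CDB; the enum and function names are illustrative, and the numeric values are the ones I believe the UFS specification assigns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define START_STOP_UNIT_OPCODE  0x1B
#define START_STOP_UNIT_CDB_LEN 6

/* Power condition values for the UFS device power modes. */
enum ufs_power_condition {
    UFS_PC_ACTIVE    = 1,
    UFS_PC_SLEEP     = 2,
    UFS_PC_POWERDOWN = 3,
};

/* Build a START STOP UNIT CDB that requests the given power mode. */
static void build_start_stop_cdb(uint8_t cdb[START_STOP_UNIT_CDB_LEN],
                                 enum ufs_power_condition pc)
{
    assert(pc >= UFS_PC_ACTIVE && pc <= UFS_PC_POWERDOWN);
    memset(cdb, 0, START_STOP_UNIT_CDB_LEN);
    cdb[0] = START_STOP_UNIT_OPCODE;
    cdb[4] = (uint8_t)(pc << 4);  /* power condition lives in bits 7:4 */
}
```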
Gear switching (M-PHY high-speed gears, Rate B line rates):
- HS-G1: 1.46 Gb/s per lane
- HS-G2: 2.9 Gb/s per lane
- HS-G3: 5.8 Gb/s per lane
- HS-G4: 11.6 Gb/s per lane
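These figures are line rates, not payload rates. As a rough worked example, the sketch below converts a gear and lane count into usable bytes per second, assuming 8b/10b line coding (10 bits on the wire per 8 data bits) and ignoring UniPro and UPIU protocol overhead. The table and function names are illustrative.

```c
#include <assert.h>

/* Line rates in Gb/s for HS-G1..HS-G4, taken from the list above. */
static const double hs_gear_gbps[] = { 1.46, 2.9, 5.8, 11.6 };

/* Assumption: 8b/10b coding, so 80% of the line rate carries data. */
#define LINE_CODE_EFFICIENCY 0.8

/* Approximate payload bandwidth in GB/s for a gear (1-4) and lane count. */
static double link_payload_gbytes_per_s(int gear, int lanes)
{
    assert(gear >= 1 && gear <= 4 && lanes >= 1);
    return hs_gear_gbps[gear - 1] * lanes * LINE_CODE_EFFICIENCY / 8.0;
}
```

Under these assumptions, two lanes at HS-G4 carry roughly 2.32 GB/s of payload, compared with 2.9 GB/s if the raw line rate were counted.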
Runtime power management in Linux automatically transitions between states based on idle time.
Debugging and Tools
Kernel Tracing
# Enable UFS tracepoints
echo 1 > /sys/kernel/debug/tracing/events/ufs/enable
# Monitor UFS commands
cat /sys/kernel/debug/tracing/trace_pipe | grep ufs
Device Health
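# Check UFS device attributes
cd /sys/class/scsi_device/0:0:0:0/device/
cat vendor model
# Health descriptor fields are exposed under the host controller's sysfs
# directory (the exact path depends on the platform)
cat /sys/bus/platform/drivers/ufshcd/*/health_descriptor/life_time_estimation_a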
Performance Monitoring
# I/O statistics
iostat -x 1 /dev/sda
# UFS-specific stats (if available; the debugfs layout varies by kernel version)
cat /sys/kernel/debug/ufshcd/*/stats
Common Issues and Solutions
Issue: Random performance drops
Solution: Check BKOPS status; ensure sufficient idle time for maintenance
Issue: High latency during writes
Solution: Enable write booster if supported; check I/O scheduler settings
Issue: Device initialization failures
Solution: Verify power sequencing; check clock configuration in device tree
Issue: Link training failures
Solution: Suspect M-PHY signal integrity problems; check PCB layout and power supply
Conclusion
UFS represents a significant leap forward in mobile storage technology. Understanding its architecture—from SCSI commands to M-PHY signaling—is essential for:
- System Designers: Making informed trade-offs between performance, power, and cost
- Driver Developers: Implementing and debugging UFS host controllers
- Performance Engineers: Optimizing I/O workloads for UFS characteristics
- Application Developers: Understanding storage behavior for better software design
As UFS continues evolving (UFS 4.0 and beyond), mastering these fundamentals provides the foundation for working with next-generation storage systems.