UFS Subsystem Architecture Deep Dive

Understanding Universal Flash Storage from Protocol to Implementation

Posted by half cup coffee on May 28, 2024

Universal Flash Storage (UFS) has become the dominant storage interface for modern mobile devices, replacing eMMC with significantly higher performance and advanced features. This article explores the UFS subsystem architecture, from hardware interfaces to software drivers.

Overview of UFS

UFS is a high-performance storage specification developed by JEDEC (Joint Electron Device Engineering Council). It addresses the limitations of eMMC by providing:

  • Higher Bandwidth: Up to 5.8 GB/s (UFS 4.0) vs eMMC’s 400 MB/s
  • Full Duplex: Simultaneous read and write operations
  • Command Queuing: Multiple outstanding commands (up to 32)
  • Advanced Power Management: Multiple low-power states
  • Security Features: Hardware encryption and secure storage

UFS Architecture Layers

UFS follows a layered architecture similar to networking protocols:

Application Layer

  • SCSI commands (read, write, format, etc.)
  • UFS-specific commands (power management, provisioning)
  • Device management and configuration

Transport Layer (UFS Protocol)

  • Command processing
  • Task management
  • Power mode transitions
  • Error detection and correction
  • Flow control
  • Link initialization

Physical Layer (M-PHY)

  • Signal transmission
  • High-speed serial interface
  • Power state management

UFS Protocol Information Unit (UPIU)

UPIU is the fundamental data structure in UFS communication. Different UPIU types serve different purposes:

Command UPIU

Sent from host to device to initiate operations:

  • NOP OUT: No operation (keep-alive, ping)
  • COMMAND: SCSI command execution
  • TASK MGMT: Task management requests
  • QUERY REQUEST: Device configuration queries

Response UPIU

Device responses to host commands:

  • NOP IN: NOP response
  • RESPONSE: Command completion status
  • TASK MGMT RESP: Task management response
  • QUERY RESPONSE: Configuration query results

Data UPIU

Carries actual data:

  • DATA OUT: Host to device data transfer
  • DATA IN: Device to host data transfer

UPIU structure:

+------------------+
| Transaction Code |  1 byte
+------------------+
| Flags            |  1 byte
+------------------+
| LUN              |  1 byte
+------------------+
| Task Tag         |  1 byte
+------------------+
| Command/Response |  Variable
| Specific Fields  |
+------------------+
| Data Segment     |  Variable (optional)
+------------------+

UFS Host Controller Interface (UFS HCI)

The UFS HCI defines how host software communicates with the UFS controller hardware. Key components:

Host Controller Registers

Memory-mapped registers for:

  • Controller capabilities and configuration
  • Interrupt status and control
  • UTRLBA (UTP Transfer Request List Base Address)
  • UTMRLBA (UTP Task Management Request List Base Address)

UTP Transfer Request List (UTRL)

A circular buffer in system memory containing:

  • Slots: Up to 32 transfer request entries
  • Doorbell Register: Host sets bits to notify controller of new requests
  • Completion: Controller clears doorbell bits upon completion

UTP Transfer Request Descriptor (UTRD)

Each UTRL entry contains a descriptor with:

struct utp_transfer_req_desc {
    u32 header[4];           // Command type, data direction, interrupt, OCS
    u32 command_desc_base_addr_lo;
    u32 command_desc_base_addr_hi;
    u16 response_upiu_length;
    u16 response_upiu_offset;
    u16 prd_table_length;    // Number of PRD entries
    u16 prd_table_offset;
};

This descriptor points to:

  1. Command UPIU: The SCSI or UFS command
  2. Response UPIU: Buffer for device response
  3. PRDT: Physical Region Description Table for data buffers

Physical Region Description Table (PRDT)

The PRDT describes scatter-gather lists for data transfers:

struct ufshcd_sg_entry {
    u32 base_addr;          // Physical address (low 32 bits)
    u32 upper_addr;         // Physical address (high 32 bits)
    u32 reserved;
    u32 size;               // Data buffer size - 1
};

Multiple PRDT entries enable efficient DMA transfers from non-contiguous memory regions.

SCSI in UFS Context

UFS leverages the SCSI (Small Computer System Interface) command set:

Common SCSI Commands

  • READ(10)/READ(16): Read data from logical blocks
  • WRITE(10)/WRITE(16): Write data to logical blocks
  • INQUIRY: Query device information
  • READ CAPACITY: Get device size
  • SYNCHRONIZE CACHE: Flush caches
  • UNMAP: TRIM/discard unused blocks

Command Descriptor Block (CDB)

SCSI commands are encoded in CDB format:

// READ(10) CDB example (multi-byte fields are big-endian on the wire)
struct read10_cdb {
    u8 opcode;          // 0x28
    u8 flags;
    u32 lba;            // Logical block address (big-endian)
    u8 group;
    u16 transfer_len;   // Number of blocks (big-endian)
    u8 control;
} __attribute__((packed)); // packed: a CDB is exactly 10 bytes, no padding

Logical Unit Number (LUN)

UFS devices can present multiple LUNs:

  • LUN 0: Boot partition A
  • LUN 1: Boot partition B
  • LUN 2: User data partition
  • LUN 3: RPMB (Replay Protected Memory Block)

Different LUNs can have different characteristics (read-only, protected, etc.).

UFS Driver Architecture in Linux

The Linux UFS driver (ufshcd) follows a layered approach:

Core Layer (ufshcd.c)

  • Command processing and scheduling
  • Power management
  • Error handling and recovery
  • Device initialization

Platform Layer (ufshcd-pltfrm.c)

  • Platform-specific initialization
  • Clock and regulator management
  • Platform device binding

Vendor Extensions

  • Qualcomm specific: ufs-qcom.c
  • MediaTek specific: ufs-mediatek.c
  • Samsung specific: ufs-exynos.c

Block Layer Integration

┌──────────────┐
│  File System │ (ext4, f2fs, etc.)
└──────┬───────┘
       │
┌──────▼───────┐
│  Block Layer │ (I/O scheduler, request queue)
└──────┬───────┘
       │
┌──────▼───────┐
│  SCSI Layer  │ (sd.c - SCSI disk driver)
└──────┬───────┘
       │
┌──────▼───────┐
│  UFS Driver  │ (ufshcd)
└──────┬───────┘
       │
┌──────▼───────┐
│  UFS HCI     │ (Hardware)
└──────────────┘

Command Flow Example

A typical read operation flow:

  1. Application issues read() system call
  2. VFS Layer routes to filesystem
  3. Filesystem (ext4/f2fs) generates bio (block I/O)
  4. Block Layer creates request, passes to SCSI layer
  5. SCSI Disk Driver (sd.c) builds SCSI READ command
  6. UFS Driver (ufshcd):
    • Allocates UTRD slot
    • Builds Command UPIU with READ(10) CDB
    • Sets up PRDT for DMA buffers
    • Writes UTRLDBR (doorbell) register
  7. UFS Controller:
    • Fetches UTRD from memory
    • Sends Command UPIU to device via UniPro/M-PHY
  8. UFS Device:
    • Processes SCSI READ command
    • Fetches data from NAND flash
    • Sends Data IN UPIU back to host
  9. UFS Controller:
    • DMA transfers data to system memory
    • Sends Response UPIU
    • Generates interrupt
  10. UFS Driver completes request, returns data to block layer

Total latency: typically 50-200 μs, depending on queue depth and device state.

Advanced Features

Command Queuing

UFS supports up to 32 outstanding commands, allowing:

  • Parallelism: Multiple commands in flight
  • Reordering: Device can optimize execution order
  • Higher Throughput: Amortize command overhead

Background Operations (BKOPS)

Device can perform maintenance in background:

  • Garbage collection
  • Wear leveling
  • Error correction table updates

Host software must periodically allow BKOPS when idle.

Write Booster

Temporary high-speed write buffer for improved burst performance:

  • Small SLC (Single-Level Cell) buffer
  • Automatic flush to main storage
  • Configurable buffer size

Host Performance Booster (HPB)

Device shares logical-to-physical mapping with host:

  • Host can issue physical address reads
  • Bypasses device FTL lookup
  • Reduces read latency

Power Management

UFS defines multiple power modes:

  • Active: Full performance
  • Sleep: Low power, fast resume
  • PowerDown: Deeper sleep, slower resume
  • Hibernate: Minimal power, slowest resume

Gear switching (M-PHY high-speed gears, Rate B):

  • HS-G1: 1.46 Gb/s per lane
  • HS-G2: 2.9 Gb/s per lane
  • HS-G3: 5.8 Gb/s per lane
  • HS-G4: 11.6 Gb/s per lane

Runtime power management in Linux automatically transitions between states based on idle time.

Debugging and Tools

Kernel Tracing

# Enable UFS tracepoints
echo 1 > /sys/kernel/debug/tracing/events/ufs/enable

# Monitor UFS commands
cat /sys/kernel/debug/tracing/trace_pipe | grep ufs

Device Health

# Check UFS device attributes
cd /sys/class/scsi_device/0:0:0:0/device/
cat vendor model
ls health_descriptor/   # health attributes, if exposed (location varies by kernel)

Performance Monitoring

# I/O statistics
iostat -x 1 /dev/sda

# UFS-specific stats (if available)
cat /sys/kernel/debug/ufshcd/ufs_stats

Common Issues and Solutions

Issue: Random performance drops
Solution: Check BKOPS status; ensure sufficient idle time for maintenance

Issue: High latency during writes
Solution: Enable Write Booster if supported; check I/O scheduler settings

Issue: Device initialization failures
Solution: Verify power sequencing; check clock configuration in the device tree

Issue: Link training failures
Solution: Usually M-PHY signal integrity problems; check PCB layout and power supplies

Conclusion

UFS represents a significant leap forward in mobile storage technology. Understanding its architecture—from SCSI commands to M-PHY signaling—is essential for:

  • System Designers: Making informed trade-offs between performance, power, and cost
  • Driver Developers: Implementing and debugging UFS host controllers
  • Performance Engineers: Optimizing I/O workloads for UFS characteristics
  • Application Developers: Understanding storage behavior for better software design

As UFS continues evolving (UFS 4.0 and beyond), mastering these fundamentals provides the foundation for working with next-generation storage systems.

Further Reading