DeepSeek LLM on the Edge: A Performance Showdown - YY3588 vs. Jetson Orin vs. Raspberry Pi 5

By laojunlin

The YY3588 is a high-performance AIoT development board from Youyeetoo Technology. AIoT, or Artificial Intelligence of Things, refers to the integration of artificial intelligence technology with the Internet of Things to achieve intelligent connectivity for everything.

As large language models (LLMs) become more lightweight, deploying models with hundreds of millions to billions of parameters on edge devices has become practical. This article uses the Youyeetoo YY3588 (based on the Rockchip RK3588) as the hardware platform, tests its performance when running models from the DeepSeek series, and explores the potential of large models in edge computing scenarios.

1. Hardware and Software Environment

1.1 YY3588 Development Board Basic Configuration

1.1.1 Core Hardware

  • NPU: 6 TOPS (INT8), alongside a Mali-G610 GPU
  • Memory: 16GB LPDDR4X (tested bandwidth 68GB/s)
  • Storage: 512GB NVMe SSD (added via the PCIe 3.0 x4 interface)

This single-board computer (SBC) offers flexible memory and storage options. It supports LPDDR4 configurations up to 16GB. For storage, it accepts eMMC (up to 256GB), a SATA SSD, or a MicroSD card.

1.1.2 Software Stack

  • System: Ubuntu 22.04 LTS (RK3588 custom kernel 5.10)
  • Inference Framework: ONNX Runtime 1.16 + RKNN-Toolkit2 1.6
  • Optimization Tool: DeepSeek Official Quantization Toolchain v0.3

2. DeepSeek Model Deployment

2.1 Model Selection and Optimization

  • Test Model: DeepSeek-MoE-16B (4.3GB after sparsification)
  • Quantization Scheme:
python quantize.py --model deepseek-16b-fp32.onnx \
    --output deepseek-16b-int8.rknn \
    --dataset calibration_data/ \
    --quant_type hybrid
  • Optimization Results:
    - Model size reduced to 1.2GB (72% compression rate)
    - Memory usage dropped from 12GB to 3.8GB
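
The 72% figure appears to be measured against the 4.3GB sparsified model rather than the FP32 original. A quick check of both reductions under that assumption (the helper name is illustrative):

```python
def reduction_percent(before_gb: float, after_gb: float) -> float:
    """Return the percentage by which a size shrank."""
    return (before_gb - after_gb) / before_gb * 100.0


# Figures taken from the bullets above.
model_reduction = reduction_percent(4.3, 1.2)    # sparsified model -> quantized model
memory_reduction = reduction_percent(12.0, 3.8)  # runtime memory before -> after

print(f"Model size reduction:   {model_reduction:.1f}%")
print(f"Memory usage reduction: {memory_reduction:.1f}%")
```

The model shrinks by roughly 72% and runtime memory by roughly 68%, so the two numbers are consistent with each other.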

2.2 Key Steps for DeepSeek-R1 1.5B Model Deployment

Here is a brief tutorial for those looking to deploy an LLM on edge devices.

2.2.1 Ubuntu 22.04 Host Setup:

# Download rknn-llm
git clone https://github.com/airockchip/rknn-llm.git

# Install miniforge3 and conda
wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
chmod +x Miniforge3-Linux-x86_64.sh
./Miniforge3-Linux-x86_64.sh

# Confirm successful installation
conda -V

2.2.2 Create RKLLM-Toolkit Conda Environment:

source ~/miniforge3/bin/activate
conda create -n RKLLM-Toolkit python=3.8
conda activate RKLLM-Toolkit
pip3 install rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl

# Check for successful installation (no errors means success)
python -c "from rkllm.api import RKLLM"

2.2.3 Convert DeepSeek-R1-1.5B from HuggingFace to RKLLM Model:

cd examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export/
python export_rkllm.py

The converted model is: DeepSeek-R1-Distill-Qwen-1.5B.rkllm

2.2.4 Compile Libraries and Demo

Download the cross-compilation toolchain (if a complete SDK has been downloaded, the cross-compilation toolchain within the SDK can be used).

# Modify compiler path
vim examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/build-linux.sh

Start Compilation:

cd examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/
bash build-linux.sh

Generated Libraries and Demo:

rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/install/demo_Linux_aarch64$ ls
lib  llm_demo

2.2.5 Run Model On-Device:

Push the library, demo, and converted model to the board, then execute the demo.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./lib
export RKLLM_LOG_LEVEL=1
./llm_demo DeepSeek-R1-Distill-Qwen-1.5B.rkllm 10000 10000
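
For scripted runs, the same launch can be wrapped in a small Python helper. This is a minimal sketch, not part of the rknn-llm repository: `build_demo_command` is a hypothetical name, and reading the two numeric arguments as a new-token limit and a context-length limit is an assumption that should be checked against the demo's usage message.

```python
import os
import subprocess
from pathlib import Path


def build_demo_command(model_path, max_new_tokens=10000, max_context_len=10000):
    """Return (argv, env) mirroring the shell commands above.

    The parameter names are assumptions about what the two numbers mean.
    """
    argv = ["./llm_demo", str(model_path), str(max_new_tokens), str(max_context_len)]
    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Append ./lib so the demo can find the runtime library shipped next to it.
    env["LD_LIBRARY_PATH"] = f"{existing}:./lib" if existing else "./lib"
    env["RKLLM_LOG_LEVEL"] = "1"
    return argv, env


argv, env = build_demo_command("DeepSeek-R1-Distill-Qwen-1.5B.rkllm")

# Only launch when the binary is actually present (i.e. on the board).
if Path("llm_demo").exists():
    subprocess.run(argv, env=env, check=True)
```

Keeping command construction separate from execution makes it easy to log or vary the limits across benchmark runs.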

2.2.6 Related Resources

Wiki: https://wiki.youyeetoo.com/YY3588

3. Performance Test and Comparison

3.1 Inference Speed Test (Input length 256 tokens)

| Execution Mode | First Token Latency | Throughput (tokens/s) | Power (W) |
|---|---|---|---|
| CPU (A76 Quad-core) | 850ms | 4.2 | 8.1 |
| GPU (Mali-G610) | 420ms | 9.8 | 6.5 |
| NPU (INT8 Quantized) | 220ms | 18.5 | 4.3 |
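
Two derived figures make the table easier to compare: energy per generated token (power divided by throughput) and an estimated time for a complete reply. The sketch below assumes the power column is the average draw during generation and that the throughput applies to every token after the first; the 100-token reply length is an arbitrary illustration.

```python
# (first-token latency in s, throughput in tokens/s, power in W) from the table above
modes = {
    "CPU": (0.850, 4.2, 8.1),
    "GPU": (0.420, 9.8, 6.5),
    "NPU": (0.220, 18.5, 4.3),
}


def joules_per_token(throughput, power_w):
    """Energy per token: watts are joules per second, so divide by tokens per second."""
    return power_w / throughput


def response_time_s(latency_s, throughput, n_tokens):
    """Rough reply time: first-token latency plus steady-state generation time."""
    return latency_s + n_tokens / throughput


summary = {
    name: (joules_per_token(tp, power), response_time_s(lat, tp, 100))
    for name, (lat, tp, power) in modes.items()
}

for name, (energy, seconds) in summary.items():
    print(f"{name}: {energy:.2f} J/token, ~{seconds:.1f} s for a 100-token reply")
```

Under these assumptions the NPU uses about 0.23 J per token against about 1.93 J on the CPU, roughly an eightfold difference.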

3.2 Stress Test

  • Multi-tasking: Simultaneous Q&A, summary generation, and sentiment analysis.
    - Resource Usage: NPU 85% / Memory 12GB / Temperature 72℃
    - Latency Fluctuation: ±15% (Superior to Xavier NX performance)
  • Long-text Processing: Input 4096 tokens from a legal document.
    - Memory Management: Implemented chunked loading via mmap to avoid Out-of-Memory (OOM) errors.
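
The article does not show its chunked-loading code or say which data was memory-mapped. As a generic illustration of the technique, the sketch below maps a file read-only and yields fixed-size chunks, so only the pages being touched need to be resident. The file contents and chunk size are made up for the demo.

```python
import mmap
import os
import tempfile


def iter_chunks(path, chunk_size):
    """Yield successive byte chunks of a file through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, len(mm), chunk_size):
                # Slicing copies only this chunk out of the mapping.
                yield mm[start:start + chunk_size]


# Throwaway 7000-byte file standing in for a long document.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"clause " * 1000)
    demo_path = tmp.name

chunks = list(iter_chunks(demo_path, chunk_size=4096))
os.remove(demo_path)

print([len(c) for c in chunks])
```

In a real pipeline each chunk would be processed and released before the next is read, rather than collected into a list as the demo does.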

4. Typical Application Scenario Verification

4.1 Intelligent Customer Service System

  • Test Case: E-commerce after-sales consultation scenario.
  • Actual Results:
    - Response Time: Average 1.2 seconds/round (including network transmission)
    - Accuracy: 88.7% (Compared to 92.1% from a cloud API)
    - Offline capability: Basic services can be maintained even when disconnected from the network.

4.2 Local Knowledge Base Search (RAG)

4.2.1 Architecture Design:

User Query --> Embedding Model --> FAISS Vector Database --> DeepSeek Generate Answer --> Output Response
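
The flow can be sketched end to end in plain Python. This is a toy illustration, not the article's implementation: a bag-of-words counter stands in for the embedding model, a brute-force cosine search stands in for the FAISS index, the sample documents are invented, and the final prompt is only built, not sent to DeepSeek.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: term counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Brute-force similarity search (stand-in for a FAISS index)."""

    def __init__(self, docs):
        self.docs = docs
        self.vectors = [embed(d) for d in docs]

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.docs[i] for i in ranked[:k]]


def build_prompt(query, passages):
    """Combine retrieved passages and the question into one prompt for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge-base entries.
docs = [
    "Refunds are processed within 7 days of receiving the returned item.",
    "The warranty covers manufacturing defects for 12 months.",
    "Shipping to remote regions takes 5 to 10 business days.",
]
store = VectorStore(docs)
query = "How long does the warranty last?"
passages = store.search(query, k=1)
prompt = build_prompt(query, passages)  # would be passed to the on-device model
print(prompt)
```

Swapping in a real embedding model and a FAISS index changes `embed` and `VectorStore` but leaves the retrieve-then-generate structure the same.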

4.2.2 Performance:

  • Retrieval latency over a corpus of millions of documents: <300ms
  • Supports RAG (Retrieval-Augmented Generation) mode

5. Horizontal Comparison and Scenario Recommendations

For anyone looking for a Jetson Orin alternative or a more powerful step up from the Raspberry Pi, here is how the YY3588 compares as edge AI hardware.

| Comparison Item | YY3588 + DeepSeek | Raspberry Pi 5 + Llama 2-7B | Jetson Orin + DeepSeek |
|---|---|---|---|
| Single Inference Power | 4.3W | 7.8W | 12.3W |
| Tokens/¥ Ratio | 428 | 196 | 315 |
| Typical Scenario | Enterprise Edge Gateway | Education / DIY | High-End Robotics |

6. Conclusion

The YY3588 paired with DeepSeek shows that deploying large models on edge devices is feasible. The close co-optimization of its NPU and software stack reflects the progress of the domestic chip ecosystem. Limitations remain in handling ultra-long text and very large models, but the platform is already capable enough to open up new possibilities for intelligent terminal devices.
