YY3588 vs Jetson Orin vs RPi 5: DeepSeek LLM Edge Benchmark Review

DeepSeek LLM on the Edge: A Performance Showdown - YY3588 vs. Jetson Orin vs. Raspberry Pi 5

Sep 08, 2025, by laojunlin

The YY3588 is a high-performance AIoT development board from Youyeetoo Technology. AIoT, or Artificial Intelligence of Things, refers to the integration of artificial intelligence technology with the Internet of Things to achieve intelligent connectivity for everything.

As large language models (LLMs) continue to become more lightweight, deploying models with hundreds of millions of parameters on edge devices has become a reality. This article uses the Youyeetoo YY3588 (based on the Rockchip RK3588) as the hardware platform to test its performance when deploying models from the DeepSeek series, exploring the potential of running large models in edge computing scenarios.

1. Hardware and Software Environment

1.1 YY3588 Development Board Basic Configuration

1.1.1 Core Hardware

Compute: NPU with 6 TOPS (INT8) + Mali-G610 GPU
Memory & Storage: 16GB LPDDR4X (Tested bandwidth 68GB/s) | 512GB NVMe SSD (Expanded via PCIe 3.0 x4 interface)

This single-board computer (SBC) offers flexible memory and storage configurations. It supports LPDDR4 memory options up to 16GB. For storage, it accepts eMMC (up to 256GB), SATA SSD, and MicroSD card.

1.1.2 Software Stack

System: Ubuntu 22.04 LTS (RK3588 custom kernel 5.10)
Inference Framework: ONNX Runtime 1.16 + RKNN-Toolkit2 1.6
Optimization Tool: DeepSeek Official Quantization Toolchain v0.3

2. DeepSeek Model Deployment

2.1 Model Selection and Optimization

Test Model: DeepSeek-MoE-16B (4.3GB after sparsification)
Quantization Scheme:

python quantize.py --model deepseek-16b-fp32.onnx \
    --output deepseek-16b-int8.rknn \
    --dataset calibration_data/ \
    --quant_type hybrid

Optimization Results:
- Model size reduced to 1.2GB (72% compression rate)
- Memory usage dropped from 12GB to 3.8GB
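
The percentages above can be re-derived from the quoted sizes. The short sketch below assumes the 72% figure is measured against the 4.3GB sparsified model (the article does not state the baseline explicitly); the helper name `reduction_pct` is ours, not part of any toolchain.

```python
def reduction_pct(before_gb: float, after_gb: float) -> float:
    """Percentage by which a size shrank, relative to the starting size."""
    return (1.0 - after_gb / before_gb) * 100.0


# Figures quoted in the article (assumed baseline: the 4.3GB sparsified model).
size_reduction = reduction_pct(4.3, 1.2)     # model file: 4.3GB -> 1.2GB
memory_reduction = reduction_pct(12.0, 3.8)  # runtime memory: 12GB -> 3.8GB

print(f"model size reduction: {size_reduction:.1f}%")
print(f"memory reduction:     {memory_reduction:.1f}%")
```

Under that assumption the file shrinks by about 72.1% and runtime memory by about 68.3%, which matches the compression rate quoted above.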

2.2 Key Steps for DeepSeek-R1 1.5B Model Deployment

Here is a brief tutorial for those looking to deploy an LLM on edge devices.

2.2.1 Ubuntu 22.04 Host Setup:

# Download rknn-llm
git clone https://github.com/airockchip/rknn-llm.git

# Install miniforge3 and conda
wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
chmod 777 Miniforge3-Linux-x86_64.sh
./Miniforge3-Linux-x86_64.sh

## Confirm successful installation
conda -V

2.2.2 Create RKLLM-Toolkit Conda Environment:

source ~/miniforge3/bin/activate
conda create -n RKLLM-Toolkit python=3.8
conda activate RKLLM-Toolkit
pip3 install rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl

# Check for successful installation (no errors means success)
python
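
The installation check can also be scripted instead of typed into the interpreter. This is a minimal sketch using only the standard library; it assumes the wheel installs a top-level module named `rkllm`, which you should confirm against your toolkit version. The helper name `is_installed` is ours.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the RKLLM-Toolkit wheel exposes a top-level "rkllm" module.
    print("rkllm importable:", is_installed("rkllm"))
```

Run it inside the activated RKLLM-Toolkit environment; a `False` result usually means the wheel was installed into a different environment.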

2.2.3 Convert DeepSeek-R1-1.5B from HuggingFace to RKLLM Model:

cd examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export/
python export_rkllm.py

The converted model is: DeepSeek-R1-Distill-Qwen-1.5B.rkllm

2.2.4 Compile Libraries and Demo

Download the cross-compilation toolchain (if a complete SDK has been downloaded, the cross-compilation toolchain within the SDK can be used).

# Modify compiler path
vim examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/build-linux.sh

Start Compilation:

cd examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/
bash build-linux.sh

Generated Libraries and Demo:

rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/install/demo_Linux_aarch64$ ls
lib  llm_demo

2.2.5 Run Model On-Device:

Push the library, demo, and converted model to the board, then execute the demo.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./lib
export RKLLM_LOG_LEVEL=1
./llm_demo DeepSeek-R1-Distill-Qwen-1.5B.rkllm 10000 10000
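
If you prefer to launch the demo from a script on the board, the same environment and command line can be assembled in Python. This is a sketch only: `build_demo_invocation` is a hypothetical helper, and the names `max_new_tokens` and `max_context_len` for the two numeric arguments are our assumption about what the demo expects, so check the demo's usage message before relying on them.

```python
import os


def build_demo_invocation(model_path, max_new_tokens=10000, max_context_len=10000,
                          lib_dir="./lib", base_env=None):
    """Return (argv, env) mirroring the three shell lines used to run llm_demo."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Append the demo's lib directory, as in the export line of the shell version.
    env["LD_LIBRARY_PATH"] = f"{existing}:{lib_dir}" if existing else lib_dir
    env["RKLLM_LOG_LEVEL"] = "1"
    # Assumption: the two trailing numbers are max_new_tokens and max_context_len.
    argv = ["./llm_demo", model_path, str(max_new_tokens), str(max_context_len)]
    return argv, env


argv, env = build_demo_invocation("DeepSeek-R1-Distill-Qwen-1.5B.rkllm", base_env={})
print(argv)
# On the board, from the demo directory: subprocess.run(argv, env=env, check=True)
```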

2.2.6 Related Resources

Wiki: https://wiki.youyeetoo.com/YY3588

2.2.7 Running Process Screenshots

3. Performance Test and Comparison

3.1 Inference Speed Test (Input length 256 tokens)

Execution Mode	First token latency	Throughput (tokens/s)	Power (W)
CPU (A76 Quad-core)	850ms	4.2	8.1
GPU (Mali-G610)	420ms	9.8	6.5
NPU (INT8 Quantized)	220ms	18.5	4.3
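
Two derived figures make the table easier to compare: energy per generated token (power divided by throughput) and a rough end-to-end response time. The sketch below uses only the numbers in the table; it assumes the throughput column is the steady-state generation rate and that power stays constant during generation, neither of which the article states.

```python
# (first-token latency in seconds, throughput in tokens/s, power in watts)
MODES = {
    "CPU (A76 Quad-core)": (0.850, 4.2, 8.1),
    "GPU (Mali-G610)": (0.420, 9.8, 6.5),
    "NPU (INT8 Quantized)": (0.220, 18.5, 4.3),
}


def joules_per_token(throughput: float, power_w: float) -> float:
    """Energy per generated token: watts / (tokens per second)."""
    return power_w / throughput


def response_time_s(first_token_s: float, throughput: float, n_tokens: int) -> float:
    """First-token latency plus steady-state generation of the remaining tokens."""
    return first_token_s + (n_tokens - 1) / throughput


for name, (latency, tps, watts) in MODES.items():
    print(f"{name}: {joules_per_token(tps, watts):.2f} J/token, "
          f"{response_time_s(latency, tps, 100):.1f} s for a 100-token reply")
```

On these figures the NPU path uses roughly an eighth of the energy per token of the CPU path, a larger gap than the throughput column alone suggests.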

3.2 Stress Test

Multi-tasking: Simultaneous Q&A, summary generation, and sentiment analysis.
- Resource Usage: NPU 85% / Memory 12GB / Temperature 72℃
- Latency Fluctuation: ±15% (Superior to Xavier NX performance)
Long-text Processing: Input 4096 tokens from a legal document.
- Memory Management: Implemented chunked loading via mmap to avoid Out-of-Memory (OOM) errors.
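
The article does not show its mmap-based loader, so the following is only a generic sketch of the idea: map a file read-only and walk it in fixed-size chunks, letting the operating system page data in on demand instead of reading everything into memory at once. The function name `iter_chunks` and the chunk size are illustrative choices.

```python
import mmap
import os
import tempfile


def iter_chunks(path, chunk_size=1 << 20):
    """Yield successive byte chunks of a file through a read-only memory map."""
    if os.path.getsize(path) == 0:
        return  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), chunk_size):
                yield mm[offset:offset + chunk_size]


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
chunks = list(iter_chunks(tmp.name, chunk_size=4))
os.remove(tmp.name)
print(chunks)  # [b'0123', b'4567', b'89']
```

Only one chunk is materialized as a bytes object at a time, so peak memory is bounded by `chunk_size` rather than by the file size.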

4. Typical Application Scenario Verification

4.1 Intelligent Customer Service System

Test Case: E-commerce after-sales consultation scenario.
Actual Results:
- Response Time: Average 1.2 seconds/round (including network transmission)
- Accuracy: 88.7% (Compared to 92.1% from a cloud API)
- Offline capability: Basic services can be maintained even when disconnected from the network.

4.2 Local Knowledge Base Search (RAG)

4.2.1 Architecture Design:

Graph: User Query --> Embedding Model --> FAISS Vector Database --> DeepSeek Generate Answer --> Output Response
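
The data flow in this graph can be sketched with the standard library alone. Everything below is a stand-in: a bag-of-words counter replaces the embedding model, a brute-force cosine search replaces FAISS, and the `generate` callable is a placeholder for the on-device DeepSeek call. None of these names come from the article's actual implementation.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Brute-force stand-in for the FAISS vector database."""

    def __init__(self, documents):
        self.documents = list(documents)
        self.vectors = [embed(d) for d in self.documents]

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(range(len(self.documents)),
                        key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.documents[i] for i in ranked[:k]]


def answer(query, store, generate, k=1):
    """User Query -> embedding -> vector search -> generate answer -> response."""
    context = "\n".join(f"- {p}" for p in store.search(query, k))
    prompt = f"Answer using only the context below.\n{context}\nQuestion: {query}"
    return generate(prompt)


docs = ["Returns are accepted within 30 days of delivery.",
        "The YY3588 board uses the Rockchip RK3588."]
store = VectorStore(docs)
# The echo lambda is a placeholder for the call into the on-device model.
reply = answer("Which chip does the YY3588 use?", store, generate=lambda p: p)
print(reply)
```

Keeping `generate` as an injected callable means the retrieval half can be tested on a host machine before the model runtime is wired in on the board.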

4.2.2 Performance:

Retrieval latency across a corpus of millions of documents: <300ms
Supports RAG (Retrieval-Augmented Generation) mode

5. Horizontal Comparison and Scenario Recommendations

For readers looking for a Jetson Orin alternative or a more powerful upgrade from the Raspberry Pi, here is how the YY3588 stacks up as edge AI hardware.

Comparison Item	YY3588 + DeepSeek	Raspberry Pi 5 + Llama 2-7B	Jetson Orin + DeepSeek
Single Inference Power	4.3W	7.8W	12.3W
Tokens/¥ Ratio	428	196	315
Typical Scenario	Enterprise Edge Gateway	Education / DIY	High-End Robotics
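
Normalizing the table to the YY3588 row makes the gaps easier to read. The sketch below only rescales the published figures; the dictionary keys and the helper `relative_to` are our own names. Note that the Raspberry Pi 5 row runs a different model (Llama 2-7B), so its ratios are not a like-for-like comparison.

```python
PLATFORMS = {
    "YY3588 + DeepSeek": {"power_w": 4.3, "tokens_per_yuan": 428},
    "Raspberry Pi 5 + Llama 2-7B": {"power_w": 7.8, "tokens_per_yuan": 196},
    "Jetson Orin + DeepSeek": {"power_w": 12.3, "tokens_per_yuan": 315},
}


def relative_to(baseline: str, metric: str) -> dict:
    """Express one metric for every platform as a multiple of the baseline's value."""
    base = PLATFORMS[baseline][metric]
    return {name: values[metric] / base for name, values in PLATFORMS.items()}


power = relative_to("YY3588 + DeepSeek", "power_w")
value = relative_to("YY3588 + DeepSeek", "tokens_per_yuan")
for name in PLATFORMS:
    print(f"{name}: {power[name]:.2f}x power, {value[name]:.2f}x tokens per yuan")
```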

6. Conclusion

The combination of the YY3588 and DeepSeek validates the feasibility of deploying large models on the edge. The close co-optimization between its NPU and software stack demonstrates the progress of the domestic chip ecosystem. Although limitations remain in handling ultra-long text and supporting very large models, the platform is capable enough to open up new possibilities for intelligent terminal devices.
