DeepSeek LLM on the Edge: A Performance Showdown - YY3588 vs. Jetson Orin vs. Raspberry Pi 5
The YY3588 is a high-performance AIoT development board from Youyeetoo Technology. AIoT, or Artificial Intelligence of Things, refers to the integration of artificial intelligence technology with the Internet of Things to achieve intelligent connectivity for everything.
As large language models (LLMs) continue to become more lightweight, deploying models with hundreds of millions of parameters on edge devices has become a reality. This article uses the Youyeetoo YY3588 (based on the Rockchip RK3588) as the hardware platform to test its performance when deploying models from the DeepSeek series, exploring the potential of running large models in edge computing scenarios.
1. Hardware and Software Environment
1.1 YY3588 Development Board Basic Configuration

1.1.1 Core Hardware
- NPU & GPU: 6 TOPS NPU (INT8) + Mali-G610 GPU
- Memory & Storage: 16GB LPDDR4X (Tested bandwidth 68GB/s) | 512GB NVMe SSD (Expanded via PCIe 3.0 x4 interface)

This single-board computer (SBC) offers flexible memory and storage options. For memory, it supports LPDDR4 configurations of up to 16GB. For storage, it supports eMMC (up to 256GB), SATA SSD, and MicroSD card.


1.1.2 Software Stack
- System: Ubuntu 22.04 LTS (RK3588 custom kernel 5.10)
- Inference Framework: ONNX Runtime 1.16 + RKNN-Toolkit2 1.6
- Optimization Tool: DeepSeek Official Quantization Toolchain v0.3
2. DeepSeek Model Deployment
2.1 Model Selection and Optimization
- Test Model: DeepSeek-MoE-16B (4.3GB after sparsification)
- Quantization Scheme:
- Optimization Results:
  - Model size reduced to 1.2GB (72% compression rate)
  - Memory usage dropped from 12GB to 3.8GB
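
The percentages implied by these figures can be checked with a few lines of Python. This is only a sanity check on the numbers quoted above; the helper name `reduction_pct` is made up for illustration, and "compression rate" is assumed to mean the relative size reduction.

```python
def reduction_pct(before_gb: float, after_gb: float) -> float:
    """Return the percentage by which a size shrank from before_gb to after_gb."""
    return (before_gb - after_gb) / before_gb * 100.0

# Figures quoted in section 2.1
model_reduction = reduction_pct(4.3, 1.2)    # model size: 4.3GB to 1.2GB
memory_reduction = reduction_pct(12.0, 3.8)  # memory usage: 12GB to 3.8GB

print(f"model size reduction:   {model_reduction:.1f}%")
print(f"memory usage reduction: {memory_reduction:.1f}%")
```

Under that reading, the 72% figure matches the drop from 4.3GB to 1.2GB, and the memory saving works out to roughly two thirds.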
2.2 Key Steps for DeepSeek-R1 1.5B Model Deployment
Here is a brief tutorial for those looking to deploy an LLM on edge devices.
2.2.1 Ubuntu 22.04 Host Setup:
2.2.2 Create RKLLM-Toolkit Conda Environment:
2.2.3 Convert DeepSeek-R1-1.5B from HuggingFace to RKLLM Model:

The converted model is: DeepSeek-R1-Distill-Qwen-1.5B.rkllm
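
As a rough illustration of what the conversion step can look like, here is a minimal sketch. It is not the author's script. It assumes the `rkllm.api.RKLLM` class shipped with Rockchip's RKLLM-Toolkit, a local copy of the HuggingFace model in a directory named `DeepSeek-R1-Distill-Qwen-1.5B`, and `w8a8` quantization for the `rk3588` target. Parameter names can differ between toolkit versions, so check them against the installed release.

```python
from pathlib import Path

def rkllm_output_name(model_dir: str) -> str:
    """Derive the .rkllm file name from a local HuggingFace model directory."""
    # String concatenation on purpose: Path.with_suffix would replace ".5B".
    return Path(model_dir).name + ".rkllm"

def convert(model_dir: str, target_platform: str = "rk3588") -> str:
    """Convert a HuggingFace model directory to an RKLLM file (hedged sketch)."""
    # Imported lazily so the helper above works without the toolkit installed.
    # Assumption: the RKLLM-Toolkit Python package exposes rkllm.api.RKLLM.
    from rkllm.api import RKLLM

    llm = RKLLM()
    if llm.load_huggingface(model=model_dir) != 0:
        raise RuntimeError("loading the HuggingFace model failed")

    # Assumption: 8-bit weight/activation quantization; other schemes may apply.
    ret = llm.build(
        do_quantization=True,
        optimization_level=1,
        quantized_dtype="w8a8",
        target_platform=target_platform,
    )
    if ret != 0:
        raise RuntimeError("building the RKLLM model failed")

    out_path = rkllm_output_name(model_dir)
    if llm.export_rkllm(out_path) != 0:
        raise RuntimeError("exporting the RKLLM model failed")
    return out_path

print(rkllm_output_name("./DeepSeek-R1-Distill-Qwen-1.5B"))
```

Calling `convert("./DeepSeek-R1-Distill-Qwen-1.5B")` inside the RKLLM-Toolkit Conda environment would then write the file named above into the current directory.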
2.2.4 Compile Libraries and Demo
Download the cross-compilation toolchain (if a complete SDK has been downloaded, the cross-compilation toolchain within the SDK can be used).

Start Compilation:
Generate Libraries and Demo:
2.2.5 Run Model On-Device:
Push the library, demo, and converted model to the board, then execute the demo.
2.2.6 Related Resources
Wiki: https://wiki.youyeetoo.com/YY3588
2.2.7 Running Process Screenshots



3. Performance Test and Comparison
3.1 Inference Speed Test (Input length 256 tokens)
| Execution Mode | First token latency | Throughput (tokens/s) | Power (W) |
|---|---|---|---|
| CPU (A76 Quad-core) | 850ms | 4.2 | 8.1 |
| GPU (Mali-G610) | 420ms | 9.8 | 6.5 |
| NPU (INT8 Quantized) | 220ms | 18.5 | 4.3 |
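
One way to read this table is energy per generated token, which is power divided by throughput. The sketch below derives it from the figures above. It assumes the Power column is the average draw during generation, which the table does not state explicitly.

```python
# (execution mode, throughput in tokens/s, power in W), copied from the table
results = [
    ("CPU (A76 Quad-core)", 4.2, 8.1),
    ("GPU (Mali-G610)", 9.8, 6.5),
    ("NPU (INT8 Quantized)", 18.5, 4.3),
]

def joules_per_token(tokens_per_s: float, watts: float) -> float:
    """Energy per token: W / (tokens/s) = J/token."""
    return watts / tokens_per_s

energy = {mode: joules_per_token(tps, w) for mode, tps, w in results}
for mode, joules in energy.items():
    print(f"{mode}: {joules:.2f} J/token")

# How much more energy the CPU path spends per token than the NPU path
cpu_vs_npu = energy["CPU (A76 Quad-core)"] / energy["NPU (INT8 Quantized)"]
print(f"CPU / NPU energy ratio: {cpu_vs_npu:.1f}x")
```

Under that assumption, the NPU path uses roughly an eighth of the energy per token that the CPU path does.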
3.2 Stress Test
- Multi-tasking: Simultaneous Q&A, summary generation, and sentiment analysis.
  - Resource Usage: NPU 85% / Memory 12GB / Temperature 72℃
  - Latency Fluctuation: ±15% (better than the Xavier NX)
- Long-text Processing: Input 4096 tokens from a legal document.
  - Memory Management: Implemented chunked loading via mmap to avoid Out-of-Memory (OOM) errors.
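
The article does not show how the chunked loading was implemented. The following is a generic sketch of the technique using Python's standard `mmap` module. The function name `iter_chunks`, the chunk size, and the temporary demo file are illustrative assumptions, not the code used on the board.

```python
import mmap
import os
import tempfile

def iter_chunks(path: str, chunk_size: int = 1 << 20):
    """Yield successive byte chunks of a file through a read-only memory map.

    Only the pages backing the current slice need to be resident, so the
    whole file is never copied into process memory at once.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), chunk_size):
                yield mm[offset:offset + chunk_size]

# Demo on a small temporary file standing in for a large input document
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10_000)
    demo_path = tmp.name

chunk_sizes = [len(chunk) for chunk in iter_chunks(demo_path, chunk_size=4096)]
os.remove(demo_path)
print(chunk_sizes)
```

Each chunk can be tokenized and processed before the next one is touched, which keeps peak memory bounded by the chunk size rather than the file size.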
4. Typical Application Scenario Verification
4.1 Intelligent Customer Service System
- Test Case: E-commerce after-sales consultation scenario.
- Actual Results:
  - Response Time: Average 1.2 seconds/round (including network transmission)
  - Accuracy: 88.7% (Compared to 92.1% from a cloud API)
  - Offline Capability: Basic services can be maintained even when disconnected from the network.
4.2 Local Knowledge Base Search (RAG)
4.2.1 Architecture Design:
4.2.2 Performance:
- Retrieval latency over a corpus of millions of documents: <300ms
- Supports RAG (Retrieval-Augmented Generation) mode
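
For readers unfamiliar with RAG, the retrieve-then-prompt flow can be sketched as follows. This is a toy illustration, not the architecture used in the test: the bag-of-words `embed` function stands in for a real embedding model, the linear scan stands in for a vector index, and the sample documents are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list, k: int = 1) -> str:
    """Prepend the retrieved context to the question before calling the LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Invented sample knowledge base
docs = [
    "refunds are processed within seven days of receiving the returned item",
    "the warranty covers hardware defects for twelve months",
    "shipping to remote regions takes five to ten business days",
]
query = "how long does the warranty last"
top = retrieve(query, docs)
prompt = build_prompt(query, docs)
print(prompt)
```

The resulting prompt is what would be passed to the on-device model. At the scale of millions of documents, the linear scan would be replaced by an approximate nearest-neighbor index.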
5. Horizontal Comparison and Scenario Recommendations
For readers looking for a Jetson Orin alternative or a more powerful upgrade from the Raspberry Pi, here is how the YY3588 stacks up as edge AI hardware.
| Comparison Item | YY3588 + DeepSeek | Raspberry Pi 5 + Llama 2-7B | Jetson Orin + DeepSeek |
|---|---|---|---|
| Single Inference Power | 4.3W | 7.8W | 12.3W |
| Tokens/¥ Ratio | 428 | 196 | 315 |
| Typical Scenario | Enterprise Edge Gateway | Education / DIY | High-End Robotics |
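
To make the table easier to compare, the figures can be normalized against the YY3588 column. The sketch below only rescales the numbers given above. Note that the Raspberry Pi 5 column uses a different model (Llama 2-7B), so the ratios are not a like-for-like comparison.

```python
# Figures copied from the comparison table
boards = {
    "YY3588 + DeepSeek": {"power_w": 4.3, "tokens_per_yuan": 428},
    "Raspberry Pi 5 + Llama 2-7B": {"power_w": 7.8, "tokens_per_yuan": 196},
    "Jetson Orin + DeepSeek": {"power_w": 12.3, "tokens_per_yuan": 315},
}

baseline = boards["YY3588 + DeepSeek"]

# Express every row as a multiple of the YY3588 figures
relative = {
    name: {
        "power_x": row["power_w"] / baseline["power_w"],
        "value_x": row["tokens_per_yuan"] / baseline["tokens_per_yuan"],
    }
    for name, row in boards.items()
}

for name, row in relative.items():
    print(f"{name}: {row['power_x']:.2f}x power, {row['value_x']:.2f}x tokens/¥")
```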
6. Conclusion
The combination of the YY3588 and DeepSeek validates the feasibility of deploying large models on the edge. The close co-optimization of its NPU and software stack demonstrates the progress of China's domestic chip ecosystem. Although limitations remain in handling ultra-long text and supporting very large models, the platform is capable enough to open up new possibilities for intelligent terminal devices.