Run Large Language Models Natively In Object Pascal

Admin

3 weeks ago

Developer BeRo1985 has released PasLLM, a high-performance LLM inference engine written entirely in Object Pascal. Rather than wrapping llama.cpp or another external runtime, PasLLM implements model loading, quantization, and inference natively, so large language models can run directly inside Delphi and FreePascal applications.

The framework includes support for:

Native LLM inference in Object Pascal
Local and offline AI execution
Llama, Qwen, Phi, Gemma, Mixtral, DeepSeek, and other model families
Custom high-efficiency quantization formats
Delphi and FreePascal support
CLI and GUI applications
VCL, FMX, and LCL examples
Cross-platform deployment

PasLLM introduces its own optimized quantization formats, designed to reduce model size while preserving inference quality. The project supports a wide range of modern open-weight models and can be integrated directly into Object Pascal projects without requiring Python or external inference frameworks.
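To make the size-versus-quality trade-off concrete, here is a minimal sketch of generic block-wise 8-bit quantization. It is not PasLLM's actual format, which this article does not describe. The block length of 32, the one-scale-per-block layout, and every function name are assumptions chosen for illustration, and the sketch is in Python purely for brevity rather than in Object Pascal.

```python
BLOCK_SIZE = 32  # assumed block length, for illustration only


def quantize_block(values):
    """Map one block of floats to signed 8-bit integers plus one shared scale."""
    max_abs = max(abs(v) for v in values)
    # An all-zero block gets scale 1.0 so the division below is safe.
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, quants


def quantize(weights):
    """Split a flat weight list into blocks and quantize each independently."""
    return [
        quantize_block(weights[start:start + BLOCK_SIZE])
        for start in range(0, len(weights), BLOCK_SIZE)
    ]


def dequantize(blocks):
    """Reconstruct approximate floats from (scale, quants) pairs."""
    restored = []
    for scale, quants in blocks:
        restored.extend(q * scale for q in quants)
    return restored


def packed_size_bytes(blocks):
    """Storage cost assuming a 4-byte float32 scale and 1 byte per weight."""
    return sum(4 + len(quants) for _, quants in blocks)


if __name__ == "__main__":
    # Synthetic weights in the range [-1, 1]; not taken from any real model.
    weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    max_error = max(abs(a - b) for a, b in zip(weights, restored))
    print("float32 bytes:", 4 * len(weights))
    print("quantized bytes:", packed_size_bytes(blocks))
    print("max abs error:", max_error)
```

Under these assumptions, 64 weights that would take 256 bytes as float32 are stored in 72 bytes (two blocks of 32 one-byte values, each with a 4-byte scale). Each restored weight differs from the original by at most half of its block's scale. Production formats typically go further, for example by using fewer bits per weight, but the per-block scaling idea is the same.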

PasLLM currently focuses on CPU inference and is aimed at developers who want complete control over local AI execution while staying entirely within the Delphi and FreePascal ecosystem. GPU acceleration is planned through the author’s PasVulkan framework.

Explore a pure Object Pascal approach to running modern LLMs locally.