TensorRT Edge-LLM is NVIDIA's high-performance C++ inference runtime for Large Language Models (LLMs) and Vision-Language Models (VLMs) on embedded platforms. It enables efficient deployment of ...
This project implements various Retrieval-Augmented Generation (RAG) techniques to analyze AWS case studies and technical blog posts using local LLMs (via Ollama) and local embeddings. It demonstrates ...
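The core of any such RAG pipeline is the retrieval step: embed the query, score it against pre-embedded document chunks, and hand the top matches to the LLM as context. The sketch below illustrates that flow with a deliberately crude stand-in embedding (a bag-of-words `Counter`); in the real project the `embed` call would instead hit a local embedding model through Ollama, and generation would go through a local LLM. All names here (`embed`, `cosine`, `retrieve`, the sample chunks) are illustrative assumptions, not the project's actual API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a sparse bag-of-words vector.
    # The real project would call a local embedding model via Ollama here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Toy corpus standing in for chunked AWS case studies / blog posts.
chunks = [
    "The customer migrated their data warehouse to Amazon Redshift.",
    "AWS Lambda lets you run code without provisioning servers.",
    "The blog post covers cost optimization with EC2 Spot Instances.",
]

context = retrieve("run code with Lambda without servers", chunks, k=1)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

With a real embedding model the `Counter` vectors would be replaced by dense float vectors, but the cosine-ranking and prompt-assembly steps stay structurally the same.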