
2025

Selection and Ensemble Strategies for Embedding Retrieval

In my previous post, I argued that many RAG systems use embeddings 4-6x larger than necessary. Simple factual content plateaus at 256-512 dimensions; complex technical content plateaus around 768. Test with your own data and you will probably find you need far fewer dimensions than model and service providers recommend.

But what if even the optimal dimension count is beyond your budget? This post covers three techniques I tested: domain-adaptive dimension selection, ensemble approaches, and cascaded retrieval. They help when you must operate below optimal dimensions or need better performance within resource constraints.
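Of the three techniques named above, cascaded retrieval is the easiest to sketch: score every document cheaply with truncated vectors, then rerank only a small shortlist with the full vectors. The function below is a minimal illustration of that idea with toy list-based vectors; the name `cascaded_search` and all parameters are my own, not from the post.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cascaded_search(query_vec, doc_vecs, cheap_dims=2, shortlist=3, top_k=1):
    """First pass: rank all docs on truncated (cheap) vectors.
    Second pass: rerank only the shortlist with full vectors."""
    cheap_scores = [
        (i, cosine(query_vec[:cheap_dims], d[:cheap_dims]))
        for i, d in enumerate(doc_vecs)
    ]
    candidates = sorted(cheap_scores, key=lambda t: t[1], reverse=True)[:shortlist]
    reranked = sorted(
        ((i, cosine(query_vec, doc_vecs[i])) for i, _ in candidates),
        key=lambda t: t[1],
        reverse=True,
    )
    return [i for i, _ in reranked[:top_k]]
```

In practice the first pass would run inside your vector database over truncation-friendly (e.g. Matryoshka-style) embeddings, and only the shortlist would touch full-dimension vectors.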

Match Embedding Dimensions to Your Domain, Not Defaults

Vector database costs scale with embedding dimensions. Most systems use 768-3072 dimensions. You may only need 256-768.

When choosing an embedding model, it is easy to skip the most important decision. You glance at MTEB leaderboards, check the costs, then move straight into chunking strategies and RAG architecture. Most practitioners treat the embedding model as just another configuration variable. But when your embedding model cannot match relevant chunks in the vector search step, everything downstream suffers.
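The cost claim above is easy to check with back-of-envelope arithmetic: raw float32 storage is simply vectors x dimensions x 4 bytes. The helper name `index_size_gb` is mine, and the figure ignores index overhead, metadata, and quantization.

```python
def index_size_gb(n_vectors, dims, bytes_per_value=4):
    """Raw float32 storage for a flat vector index, ignoring overhead."""
    return n_vectors * dims * bytes_per_value / 1e9

# 10M chunks at common dimension counts
for dims in (3072, 768, 256):
    print(dims, round(index_size_gb(10_000_000, dims), 1), "GB")
```

At 10 million chunks, dropping from 3072 to 256 dimensions cuts raw storage from roughly 123 GB to about 10 GB, before any compression.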

The Human is the Agent: How SolveIt Changed My Programming Journey After 25 Years

I have been programming for over 25 years off and on, with a background in pre-AlexNet AI and experience as a technical reviewer for AI publications and courses. As an early adopter of AI tooling and LLMs, I thought I had a good sense of what AI could do for programming. Then I joined the first cohort of SolveIt students, and something unexpected happened. Despite my experience with AI tools, SolveIt changed how I approach programming in ways I did not anticipate.

Improving LLM & RAG Systems: Essential Concepts for Practitioners

Building effective, production-ready LLM and RAG systems requires more than just theoretical knowledge. This intermediate guide outlines concepts and techniques for overcoming real-world implementation challenges, optimising performance, and ensuring system reliability. Whether you're scaling an existing deployment or planning your first production system, these essential insights will help you navigate the complexities of modern LLM & RAG architectures.

Speculative Decoding: Using LLMs Efficiently

Speculative decoding makes large language models (LLMs) work more efficiently.

Large language models are transforming how we write code, but running them efficiently remains a challenge. Even with powerful hardware, code completion can feel sluggish, breaking our concentration just when we need it most. The bottleneck isn't necessarily computational power - it's how efficiently we use it. This is where speculative decoding comes in.
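The core loop is simple enough to sketch: a cheap draft model proposes a few tokens, the expensive target model checks them, and every accepted token is one fewer slow decoding step. The toy below is a greedy simplification (real implementations verify all draft positions in one parallel forward pass and use probabilistic acceptance); the function name and the callable-as-model interface are my own illustration, not a real API.

```python
def speculative_decode(target_next, draft_next, prompt, max_new=8, k=4):
    """Greedy toy of speculative decoding. `target_next` and `draft_next`
    each map a token sequence to the next token. The draft proposes k
    tokens; the target keeps the matching prefix and corrects the first
    miss, so one expensive pass can yield several tokens."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. Draft: propose k tokens autoregressively with the cheap model.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. Verify: compare the target's choice at each drafted position.
        accepted = []
        for i in range(k):
            t = target_next(out + draft[:i])
            if t == draft[i]:
                accepted.append(t)
            else:
                accepted.append(t)  # target's own token replaces the miss
                break
        out.extend(accepted)
    return out[: len(prompt) + max_new]

# Toy "models": the target always continues the cycle a, b, c, ...
target = lambda seq: "abc"[len(seq) % 3]
print(speculative_decode(target, target, ["a"], max_new=5))
```

With a perfect draft model every proposal is accepted; with a bad draft model the loop degrades gracefully to one (correct) target token per round, never to wrong output.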

How to Validate AI Solutions Before Committing Resources

The biggest risk in AI projects isn't the technology - it's the gap between expectation and reality. But what if you could validate your AI solution in days, not months?

When marketing and advertising agencies develop AI-powered concepts for clients, they face a practical challenge: how to validate technical feasibility before committing significant resources. Traditional approaches involving detailed specifications and lengthy proposals often prove inefficient with AI projects, where real-world performance can differ significantly from paper specifications.

Building a Context and Style Aware Image Search Engine: Combining CLIP, Stable Diffusion, and FAISS

This is a demonstration of what is possible with rapid prototyping and iterative refinement using AI dialogue engineering tools. It is a prototype of a locally running, context-aware image search engine that combines CLIP content-relevance embeddings with Stable Diffusion style embeddings. This type of search could be useful for anyone with large, hard-to-search image collections: online shops selling stock images or cards, museums, or cases where images cannot be sent to cloud search services because they are business-critical or classified.
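One simple way to combine a content embedding and a style embedding, which I assume here for illustration, is to normalize each, scale the style part by a weight, and concatenate before indexing; inner-product search over the fused vectors then balances "what is in the image" against "how it looks". The helper names (`fuse`, `search`) and the weighting scheme are my sketch, not necessarily what the prototype does.

```python
import math

def _norm(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse(content_vec, style_vec, style_weight=0.3):
    """Unit-normalize each embedding, scale the style half, concatenate,
    and renormalize so inner product behaves like a blended cosine score."""
    c = _norm(content_vec)
    s = [style_weight * x for x in _norm(style_vec)]
    return _norm(c + s)

def search(query, index, top_k=2):
    """Brute-force inner-product search over fused vectors."""
    scores = [(i, sum(a * b for a, b in zip(query, v)))
              for i, v in enumerate(index)]
    return [i for i, _ in sorted(scores, key=lambda t: t[1], reverse=True)[:top_k]]
```

In a real system the fused vectors would go into a FAISS inner-product index (e.g. `IndexFlatIP`) instead of the brute-force `search` above, and `style_weight` becomes a tuning knob between subject match and stylistic match.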