2025

2025/01/26
7 min read

How to sleep soundly when using LLMs in production: Evals!

It's 3 AM, and you're wide awake. Your company just deployed a ChatGPT-powered customer service bot, and your mind is racing with questions: "What if it starts giving incorrect information? What if it speaks inappropriately to customers? What if it leaks sensitive data?"

2025/01/24
6 min read

This started off as a bit of fun. I worked with an AI Assistant, using dialogue engineering and iterative refinement, to create a recipe for very tasty roast potatoes that can be prepared a day in advance, so that I can just reheat them. The family thought these were the tastiest roast potatoes they had ever had. Note that these were much better than the ChatGPT roast potatoes I tried a couple of months ago.

2025/01/23
2 min read

DeepSeek-R1: What You Need to Know About This New AI Development

DeepSeek, an AI research organization, has recently released a new open-source AI model that's generating interest in the tech community. It presents some interesting possibilities for businesses considering AI implementation.

2025/01/23
4 min read

DeepSeek-R1: Advancing LLM Reasoning Through Novel Reinforcement Learning Approaches

The recent release of DeepSeek-R1 and DeepSeek-R1-Zero marks a significant breakthrough in the development of Large Language Models (LLMs) with enhanced reasoning capabilities. What sets this research apart is its novel approach to using Reinforcement Learning (RL) as the primary driver for developing complex reasoning abilities, challenging the conventional wisdom that extensive Supervised Fine-Tuning (SFT) is necessary.

2025/01/18
8 min read

What Will You Use RAG for in 2025: Beyond Basic Q&A

While many businesses have successfully implemented Retrieval-Augmented Generation (RAG) for basic question-answering systems, 2025 will see this technology expand into more sophisticated applications. The foundations are already laid, and organizations are ready to build upon them with more advanced implementations.

2025/01/13
3 min read

Decoding the UK's AI Ambitions: A Critical Analysis

The UK's AI Opportunities Action Plan represents an ambitious vision for technological leadership. While the plan's goals are commendable, let's examine the key challenges and potential pitfalls.

2025/01/09
5 min read

My takes and predictions for Generative AI in 2025

As we enter 2025, the AI landscape is shifting from raw model scaling to practical implementation and efficiency. Three key trends are reshaping how we build and deploy AI systems: the emergence of dialogue engineering as a new paradigm for human-AI collaboration, the mainstream adoption of RAG, and a growing focus on model efficiency over size. Chinese AI research continues to push boundaries despite hardware constraints, while environmental concerns are driving innovation in model optimization. This analysis explores these developments and their implications for developers, businesses, and the broader tech ecosystem.

Meanwhile, the rapid evolution of AI agents and synthetic data generation is creating new opportunities and challenges - particularly around API development and authentication. Together, these trends point to a 2025 where AI becomes more practical, efficient, and deeply integrated into development workflows.

2025/01/07
3 min read

PRIME: The Secret Behind Making AI Think Better

Ever wonder why AI sometimes struggles with complex reasoning, even though it's brilliant at simple tasks? Picture teaching a child advanced calculus by showing them thousands of solved problems without explaining the steps. That sounds inefficient, doesn't it? That's exactly the challenge we face with current AI systems - until now.

Enter PRIME (Process Reinforcement through Implicit Rewards), a breakthrough approach that's changing how we teach AI to reason. The result is a relatively small 7B-parameter model that achieved a 26.7% pass rate on the AIME mathematics competition - outperforming much larger models while using just 1/10th of the training data.

2025/01/06
3 min read

Understanding RAG: How to Enhance LLMs with External Knowledge

Large Language Models (LLMs) are powerful, but they're not perfect. They can hallucinate, struggle with factual accuracy, and can't access the most current information. This is where Retrieval-Augmented Generation (RAG) comes in – a technique that significantly enhances LLMs by connecting them with external knowledge sources.

Think of RAG as a skilled research assistant working alongside an expert writer. The assistant (retrieval component) finds relevant information from reliable sources, while the writer (language model) crafts this information into coherent, contextual responses. This combination creates something powerful: a system that can generate responses that are both fluent and factually grounded.
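
The assistant-and-writer split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DOCUMENTS` list, the word-overlap scoring in `retrieve`, and the `generate` stub are all hypothetical stand-ins, where a real system would typically use a vector store and an actual LLM call.

```python
from collections import Counter

# Hypothetical in-memory knowledge base (assumption: a real system would use a document store).
DOCUMENTS = [
    "RAG connects a language model to external knowledge sources.",
    "Roast potatoes can be prepared a day in advance and reheated.",
    "Reinforcement learning can improve reasoning in language models.",
]

def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [word.strip(".,?!").lower() for word in text.split()]

def retrieve(query, documents, k=1):
    """The 'research assistant': rank documents by word overlap with the query.

    Word overlap is a simple stand-in for embedding-based similarity search.
    """
    query_counts = Counter(tokenize(query))

    def score(doc):
        return sum((query_counts & Counter(tokenize(doc))).values())

    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(query, context):
    """Combine the retrieved passages and the question into one prompt."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

def generate(prompt):
    """The 'writer': a placeholder for a model call that just echoes its prompt."""
    return f"[model response to]\n{prompt}"

query = "How does RAG use external knowledge?"
context = retrieve(query, DOCUMENTS)
answer = generate(build_prompt(query, context))
print(answer)
```

The point of the sketch is the separation of concerns: `retrieve` decides what the model gets to see, and `generate` only works from that context, which is what keeps responses grounded.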

2025/01/03
2 min read

DeepSeek-V3: Pushing the Boundaries of Open-Source Language Models

DeepSeek-V3 is a significant achievement in open-source language models, with innovative features and strong performance. The model has been released by DeepSeek, a Chinese AI firm founded and backed by the Chinese hedge fund High-Flyer. This post explores its key aspects and impact.