Blog

Save time, resources and money with Latent Diffusion-based image generation.

This article presents a novel approach to training a generative model for image generation with reduced training times, operating in latent space and using a pre-trained ImageNet latent classifier as a component of the loss function.

Remarkably, the image generation model was trained from an initialised (not pre-trained) state in under 10 hours on a single consumer desktop NVIDIA card.

Super Resolution: Adobe Photoshop versus Leading Deep Neural Networks.

Super Resolution is a technique that enhances the quality of an image by increasing its apparent resolution, effectively imagining the detail present in a higher-resolution version. Traditional methods like bicubic interpolation often result in blurred images when upscaling. Recent advancements have introduced more sophisticated approaches, including Adobe Camera Raw's Super Resolution and deep learning models such as the Information Distillation Network (IDN).
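To see why interpolation-based upscaling blurs, here is a minimal numpy sketch of bilinear upsampling (the simpler cousin of bicubic, which uses a wider kernel but the same principle): every new pixel is a weighted average of existing pixels, so no new detail is invented. All function and variable names are illustrative, not from the article.

```python
import numpy as np

def bilinear_upscale(img, scale):
    """Upscale a 2D greyscale image by blending the four nearest source
    pixels. Bicubic works the same way with a 4x4 neighbourhood, which
    is why both soften edges: output pixels are weighted averages."""
    h, w = img.shape
    # Map each output coordinate back into source coordinates.
    ys = np.linspace(0, h - 1, h * scale)
    xs = np.linspace(0, w - 1, w * scale)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four neighbours with distance-based weights.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

# A hard black/white edge becomes a soft grey ramp after upscaling.
edge = np.array([[0.0, 1.0], [0.0, 1.0]])
big = bilinear_upscale(edge, 4)
```

Deep models like IDN instead learn a mapping from low- to high-resolution patches, letting them hallucinate plausible detail rather than average it away.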

Rapid prototyping of network architectures using Super-Convergence with Cyclical Learning Rate schedules.

Super-convergence, achieved through cyclical learning rates, is a powerful yet underutilized technique in deep learning that significantly accelerates model training. By varying the learning rate between high and low boundaries, models can converge in a fraction of the time typically required. This method facilitates rapid prototyping of network architectures, optimization of loss functions, and experimentation with data augmentation, all while reducing training time by orders of magnitude.
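The "varying the learning rate between high and low boundaries" idea can be sketched as the triangular schedule from Leslie Smith's cyclical learning rate work; the hyperparameter values below are illustrative, not taken from the article.

```python
def triangular_clr(step, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate: ramp linearly from base_lr up
    to max_lr over `step_size` steps, then back down, and repeat. The
    periodic high-LR phases are what enable super-convergence."""
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)  # goes 1 -> 0 -> 1 within a cycle
    return base_lr + (max_lr - base_lr) * (1.0 - x)

# One full cycle: the LR rises for 100 steps, then falls for 100 steps.
lrs = [triangular_clr(s, step_size=100, base_lr=1e-4, max_lr=1e-2)
       for s in range(200)]
```

In practice you would query this schedule once per optimiser step and assign the result to the optimiser's learning rate.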

Insights on loss function engineering.

In the realm of deep learning for image enhancement, the design of loss functions is pivotal in guiding models toward generating high-quality outputs. Traditional metrics like Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) have been widely used to measure the difference between predicted and target images. However, these pixel-based losses often lead to overly smoothed results that lack perceptual fidelity, as they tend to average out fine details, resulting in blurred images.
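The two traditional metrics mentioned above are simple to state; a short numpy sketch makes the connection between them explicit, and shows why optimising either rewards "averaged out" predictions. The checkerboard example is mine, not from the article.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two images."""
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB. PSNR is a direct function of
    MSE, so optimising one is optimising the other."""
    return float(10 * np.log10(max_val ** 2 / mse(pred, target)))

# A uniform grey prediction scores a finite, unremarkable-looking loss
# against a checkerboard target despite losing ALL the fine structure:
# pixel-wise losses reward predicting the local average.
target = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # 0/1 checkerboard
flat = np.full((8, 8), 0.5)                                  # averaged-out guess
err = mse(flat, target)
db = psnr(flat, target)
```

Perceptual losses (discussed in the feature-based loss article below) address this by comparing images in a learned feature space instead of pixel space.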

Tabular data analysis with deep neural nets.

Deep neural networks (DNNs) have emerged as a powerful tool for analyzing tabular data, offering advantages over traditional methods like Random Forests and Gradient Boosting Machines. Unlike these conventional techniques, DNNs require minimal feature engineering and maintenance, making them suitable for various applications, including fraud detection, sales forecasting, and credit risk assessment. Notably, companies like Pinterest have transitioned to neural networks from gradient boosting machines, citing improved accuracy and reduced need for feature engineering.

An introduction to Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are specialized neural networks primarily used for image processing tasks such as classification and segmentation. They operate by applying convolutional layers that use filters, or kernels, to process input data in smaller, localized regions, effectively capturing spatial hierarchies in images. This localized approach allows CNNs to detect features like edges and textures, making them highly effective for visual data analysis.
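The "filters applied to localized regions" mechanism can be shown in a few lines of numpy; this is a didactic sketch (real frameworks use optimised, batched implementations), with a Sobel-style kernel as the example edge detector.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most
    deep learning frameworks): slide the kernel over the image and take
    a weighted sum of each local patch."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A horizontal-gradient kernel responds strongly at a vertical edge.
img = np.zeros((5, 5)); img[:, 3:] = 1.0      # dark left, bright right
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
edges = conv2d(img, sobel_x)
```

In a CNN the kernel weights are not hand-designed like this Sobel filter; they are learned, and stacking such layers builds the spatial hierarchy from edges up to textures and object parts.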

How do Deep Neural Networks work?

Deep neural networks (DNNs) are computational models that mimic the human brain's interconnected neuron structure to process complex data patterns. They consist of multiple layers of artificial neurons, each receiving inputs, applying weights, summing the results, and passing them through an activation function to produce an output. This layered architecture enables DNNs to model intricate relationships within data, making them effective for tasks such as image and speech recognition.
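The "weight, sum, activate" step described above is a one-liner per layer; here is a minimal sketch of a two-layer forward pass (the shapes and tanh activation are illustrative choices, not from the article).

```python
import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    """One layer of a DNN: weight the inputs (W @ x), sum, add a bias,
    and pass the result through a nonlinear activation function."""
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                     # input features
h = dense_layer(x, rng.normal(size=(3, 4)), np.zeros(3))   # hidden layer
y = dense_layer(h, rng.normal(size=(2, 3)), np.zeros(2))   # output layer
```

Without the nonlinearity each layer would just be a matrix multiply, and a stack of them would collapse into a single linear map; the activation function is what lets depth model intricate relationships.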

U-Net deep learning colourisation of greyscale images

"U-Net deep learning colourisation of greyscale images" explores the application of deep learning techniques to transform grayscale images into colorized versions. Utilizing a U-Net architecture with a ResNet-34 encoder pretrained on ImageNet, the model employs a feature loss function based on VGG-16 activations, pixel loss, and gram matrix loss to achieve high-quality colorization. The Div2k dataset serves as the training and validation source, with data augmentation techniques such as random cropping, horizontal flipping, lighting adjustments, and perspective warping enhancing the model's robustness.

The article demonstrates how time, resources and money can be saved by fine-tuning existing models.

Feature based loss functions

In this article, I explore advanced loss functions for training Convolutional Neural Networks (CNNs), particularly U-Net architectures, to enhance image generation tasks. Drawing inspiration from the Fastai deep learning course and the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," the discussion centers on integrating feature activation losses and style losses into the training process. These techniques aim to improve the quality of generated images by focusing on perceptual features rather than solely relying on pixel-wise errors.
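The style-loss half of the perceptual loss described in Johnson et al. can be sketched with numpy Gram matrices. The random arrays below stand in for activations that would, in practice, come from a fixed pretrained network such as VGG-16; the function names are mine.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height, width) activation block:
    channel-by-channel correlations that capture texture/'style' while
    discarding spatial layout."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(pred_feats, target_feats):
    """MSE between Gram matrices -- the style term of a perceptual
    loss. The feature-activation (content) term is instead an MSE on
    the raw activations, which preserves spatial layout."""
    return float(np.mean((gram_matrix(pred_feats)
                          - gram_matrix(target_feats)) ** 2))

# Random arrays standing in for VGG activations of two images.
a = np.random.default_rng(1).normal(size=(8, 4, 4))
b = np.zeros_like(a)
```

A full training loss would typically sum pixel loss, feature-activation losses from several layers, and style losses like this one, each with its own weight.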

U-Nets with ResNet Encoders and cross connections

In this article, the author explores an advanced U-Net architecture that integrates ResNet encoders and cross connections, enhancing the model's performance in image processing tasks. This design incorporates elements from DenseNets, utilizing cross connections to facilitate efficient information flow between layers. The architecture employs a ResNet-based encoder and decoder, complemented by pixel shuffle upscaling with ICNR initialization, aiming to improve prediction accuracy and training efficiency.
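The pixel shuffle upscaling and ICNR initialisation mentioned above can be sketched in numpy. This follows the standard channel-to-space mapping (as in PyTorch's `nn.PixelShuffle`) and the ICNR idea of repeating filters so the shuffle initially acts like nearest-neighbour upsampling; the function names and shapes are illustrative.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r): upscale by
    moving channel values into spatial positions instead of
    interpolating."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

def icnr(out_ch, in_ch, k, r, rng):
    """ICNR initialisation: draw out_ch // r^2 base filters and repeat
    each r^2 times, so every r x r output block starts identical --
    i.e. the shuffle behaves like nearest-neighbour upsampling and
    checkerboard artifacts are avoided at initialisation."""
    base = rng.normal(size=(out_ch // (r * r), in_ch, k, k))
    return np.repeat(base, r * r, axis=0)

x = np.arange(16, dtype=float).reshape(4, 2, 2)  # 4 channels, 2x2
up = pixel_shuffle(x, 2)                         # 1 channel, 4x4
w = icnr(8, 3, 3, 2, np.random.default_rng(0))   # 8 repeated filters
```

In the decoder, a convolution first expands the channel count by r², then pixel shuffle trades those channels for spatial resolution; ICNR only affects how that convolution's weights start out.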