Blogs

My Most Viewed Blogs


Universal Jailbreak: Bigger & Better the LLM, More Vulnerable to Jailbreak

Researchers at Anthropic have discovered a vulnerability in large language models called "many-shot jailbreaking," which exploits their long context windows to bypass safety training and elicit harmful outputs. Shrinking the context window would prevent the attack but would also reduce model capability, so the most promising mitigation is classifying and modifying prompts before they are passed to the model.
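
A minimal sketch of that prompt-modification idea (the threshold, the keyword heuristic, and all names are illustrative assumptions; Anthropic's actual classifier is not public):

```python
SUSPICIOUS_TURNS = 50  # illustrative threshold, not a tuned value

def count_demo_turns(prompt: str) -> int:
    # Crude proxy: count embedded assistant turns stacked into the prompt.
    return prompt.count("Assistant:")

def preprocess(prompt: str) -> str:
    """Flag prompts that pack in an unusually large number of in-context
    dialogue examples, the signature of a many-shot jailbreak attempt."""
    if count_demo_turns(prompt) > SUSPICIOUS_TURNS:
        return "[BLOCKED: prompt resembles a many-shot jailbreak]"
    return prompt

print(preprocess("Human: hi\nAssistant: hello"))  # passes through unchanged
```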

April 8th, 2024

Using Qdrant on Google Cloud to Build Enterprise RAG Applications

A step-by-step guide to building an enterprise-scale Retrieval-Augmented Generation (RAG) system using Qdrant, a high-performance vector database, on Google Cloud. By combining Qdrant's vector search with Google Cloud's scalability and a language model such as GPT-3.5-turbo or Gemini-Pro, businesses can build RAG applications that extract insights from large volumes of unstructured data.
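
A minimal sketch of the retrieval side of such a pipeline with qdrant-client (the host, collection name, vector size, and placeholder vectors are assumptions, not the article's exact settings):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(host="localhost", port=6333)  # or your GCP endpoint

# Collection sized for the embedding model in use (384 dims suits e.g.
# all-MiniLM-L6-v2; adjust for your model).
client.create_collection(
    collection_name="enterprise_docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Index a document chunk, keeping the raw text as payload for the LLM.
chunk, embedding = "Example document chunk", [0.1] * 384
client.upsert(
    collection_name="enterprise_docs",
    points=[PointStruct(id=1, vector=embedding, payload={"text": chunk})],
)

# At query time, fetch the top matches and hand them to the LLM as context.
hits = client.search(
    collection_name="enterprise_docs",
    query_vector=[0.1] * 384,  # placeholder query embedding
    limit=5,
)
context = "\n".join(hit.payload["text"] for hit in hits)
```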

March 29th, 2024

Accelerating Document Embedding Generation with Ray, FastEmbed, and Qdrant

This article explores an efficient solution for accelerating document embedding generation using Ray for parallelization, FastEmbed for faster embeddings, and Qdrant for storage and retrieval. By leveraging these technologies, the processing time for generating embeddings from thousands of documents can be significantly reduced, enabling faster insights from unstructured data.
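
A minimal sketch of the parallelization pattern (the model name, batch size, and toy corpus are illustrative):

```python
import ray
from fastembed import TextEmbedding

ray.init()

@ray.remote
def embed_batch(docs: list[str]) -> list:
    # Each Ray worker loads its own FastEmbed model and embeds one batch.
    model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
    return list(model.embed(docs))

documents = [f"document {i}" for i in range(1000)]  # placeholder corpus
batches = [documents[i:i + 250] for i in range(0, len(documents), 250)]

# Embed all batches concurrently, then flatten the results, e.g. for
# upsert into Qdrant.
results = ray.get([embed_batch.remote(b) for b in batches])
embeddings = [vec for batch in results for vec in batch]
```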

March 13th, 2024

Dark Secrets of BERT

Despite BERT's impressive performance, it has several limitations: it fails to effectively leverage all of its linguistic knowledge, relies on shallow heuristics learned from its datasets, is heavily overparameterized, and varies considerably in performance across fine-tuning runs.

February 6th, 2024

Fine-Tuning Large Language Models for Custom Tasks Using Hugging Face TRL

This post demonstrates how to fine-tune large language models (LLMs) for custom tasks using Hugging Face's TRL (Transformer Reinforcement Learning) library. It walks through preparing data, setting up the environment, fine-tuning CodeLlama-7B with Parameter-Efficient Fine-Tuning (PEFT), evaluating performance, and deploying the model using Text Generation Inference.
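
A condensed sketch of that workflow (the dataset, LoRA hyperparameters, and exact trainer arguments are illustrative and vary across trl versions):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf", device_map="auto")

# LoRA keeps only a small set of adapter weights trainable (PEFT).
peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(model=model, train_dataset=dataset,
                     peft_config=peft_config)
trainer.train()
```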

February 5th, 2024

Mixture of Experts Explained

Mixture of Experts (MoE) models scale transformers efficiently by replacing dense feed-forward layers with sparse MoE layers. MoEs enable faster pretraining and faster inference for a given compute budget, but their high parameter counts make serving challenging; techniques like distillation, modified routing, and expert aggregation help streamline deployment.
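
A toy sketch of the core mechanism, a learned router that activates only the top-k experts per token (sizes and the routing scheme are simplified for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy sparse MoE layer: a learned router sends each token to its
    top-k experts and mixes their outputs by the routing weights."""

    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):           # only k experts fire per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

print(SparseMoELayer()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```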

February 3rd, 2024

How a Slight Prompt Change Significantly Enhanced Claude 2.1 AI Performance

A small change to the prompt given to Claude 2.1 significantly improved its ability to retrieve relevant information from a long document context. Anthropic explained how the revised prompt leveraged Claude's capabilities more effectively, underscoring the importance of prompt engineering for AI performance.
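
The change in question was prefilling the start of Claude's answer; with the Anthropic Python SDK that looks roughly like this (the document and question are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
long_document = "..."           # the long context to search

response = client.messages.create(
    model="claude-2.1",
    max_tokens=300,
    messages=[
        {"role": "user",
         "content": f"{long_document}\n\nWhat did the author say about X?"},
        # Prefilling the assistant turn steers Claude toward quoting the
        # context first. This is the sentence Anthropic reported using:
        {"role": "assistant",
         "content": "Here is the most relevant sentence in the context:"},
    ],
)
print(response.content[0].text)
```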

February 1st, 2024

Run any LLM on Distributed Multiple GPUs Locally Using Llama_cpp

This tutorial demonstrates how to run fine-tuned large language models (LLMs) across multiple GPUs using the llama.cpp library. By distributing the model over several devices and tuning the configuration, users can achieve fast inference when running LLMs locally on their PCs or Macs.
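
A minimal sketch with the llama-cpp-python bindings (the model path and split ratios are placeholders for your own GGUF file and hardware):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/codellama-7b.Q4_K_M.gguf",  # placeholder GGUF
    n_gpu_layers=-1,          # offload every layer to GPU
    tensor_split=[0.5, 0.5],  # fraction of the model on each of two GPUs
    n_ctx=4096,
)

output = llm("### Instruction: Write a haiku about GPUs.\n### Response:",
             max_tokens=64)
print(output["choices"][0]["text"])
```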

January 27th, 2024

Introduction to LLMs and the generative AI : Part 6— Application and Deployment

This article covers key considerations and techniques for optimizing and deploying large language models in real-world applications. It discusses model distillation, quantization, pruning, connecting LLMs to external data sources, interacting with external applications, and leveraging frameworks like ReAct and LangChain to build intelligent applications.
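
As a taste of the optimization side, here is an illustrative 4-bit quantized load with bitsandbytes (the model id is a placeholder and the settings mirror common QLoRA-style defaults, not the article's exact recipe):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
print(model.get_memory_footprint() / 1e9, "GB")  # far below the fp16 footprint
```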

December 23rd, 2023

Introduction to LLMs and the generative AI : Part 5— RLHF

This article explores Reinforcement Learning from Human Feedback (RLHF) as a technique to align large language models with human values and reduce harmful content generation. It covers the RLHF process, reward models, the PPO algorithm, and introduces Constitutional AI as a scalable solution for defining rules and principles to guide model behavior.
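
A toy PPO step with trl's classic (pre-0.12) API; the model is kept small for illustration, and the hard-coded reward stands in for a real reward model:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config=PPOConfig(batch_size=1, mini_batch_size=1),
                         model=model, tokenizer=tokenizer)

query = tokenizer("Explain RLHF briefly:", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=32,
                                return_prompt=False)[0]

# A reward model would score the response; 1.0 is a stand-in.
reward = torch.tensor(1.0)
stats = ppo_trainer.step([query], [response], [reward])
```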

December 11th, 2023

Introduction to LLMs and the generative AI : Part 4— PEFT with LoRA and Prompt Tuning

This article explores Parameter-Efficient Fine-Tuning (PEFT) techniques, particularly Low-Rank Adaptation (LoRA) and Prompt Tuning, as solutions to the memory and computational challenges of fine-tuning large language models. These methods significantly reduce trainable parameters while maintaining performance, enabling efficient model adaptation on consumer hardware.
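
A minimal LoRA example with the peft library (the rank and base model are typical illustrative values, not the article's exact settings):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(task_type=TaskType.CAUSAL_LM,
                    r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(base, config)

# Only the low-rank adapter matrices are trainable: well under 1% of
# the model's parameters.
model.print_trainable_parameters()
```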

November 17th, 2023

Introduction to LLMs and the generative AI : Part 3— Fine Tuning LLM with Instruction and Evaluation Benchmarks

This article explores instruction fine-tuning as a technique to enhance the performance of large language models for specific tasks, discussing challenges like catastrophic forgetting and the benefits of multitask fine-tuning. It also covers evaluation metrics like ROUGE and BLEU, and benchmarks such as GLUE, SuperGLUE, and HELM for assessing model performance and capabilities.
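
Computing one of those metrics takes only a few lines with Hugging Face's evaluate library (the toy prediction/reference pair is illustrative):

```python
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["a cat was sitting on the mat"],
)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum F-measures
```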

October 23rd, 2023

Introduction to LLMs and the generative AI : Part 2 — LLM pre-training and scaling laws

This article explores the pre-training process of large language models, comparing autoencoding, autoregressive, and sequence-to-sequence models. It also discusses techniques for optimizing memory usage and scaling model training across multiple GPUs, as well as the importance of domain adaptation and customization for specialized applications, using BloombergGPT as an example.
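
The scaling-law arithmetic reduces to a back-of-the-envelope formula: training compute is roughly C ≈ 6·N·D FLOPs, and the Chinchilla result puts the compute-optimal token count near D ≈ 20·N. A quick worked example:

```python
N = 70e9       # parameters (a Chinchilla-scale model)
D = 20 * N     # compute-optimal training tokens, about 1.4 trillion
C = 6 * N * D  # total training compute, about 5.9e23 FLOPs

print(f"tokens: {D:.2e}, compute: {C:.2e} FLOPs")
```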

September 5th, 2023

Introduction to LLMs and the generative AI : Part 1 — LLM Architecture, Prompt Engineering and LLM Configuration

This article explores the transformer architecture behind large language models, discussing key components like attention mechanisms, tokenization, and positional encoding. It also covers prompt engineering techniques, model configuration settings, and outlines the lifecycle of developing and deploying generative AI applications powered by LLMs.
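
The configuration settings are easiest to see in code; a small sampling example with transformers (the model choice and values are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The key idea behind attention is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,   # <1.0 sharpens the next-token distribution
    top_k=50,          # sample only from the 50 most likely tokens
    top_p=0.9,         # ...within 90% cumulative probability mass
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```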

August 23rd, 2023

Setting Up Hugging Face Text Generation Inference on EC2/local

This article provides a step-by-step guide for setting up Hugging Face Text Generation Inference on local devices or AWS EC2 instances. It covers installing GPU drivers, setting up Docker, and configuring the NVIDIA Container Toolkit to enable GPU support for efficient text generation inference.
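
Once the server is running, querying it is a single REST call; a minimal client sketch (the endpoint and prompt are placeholders):

```python
import requests

# Assumes a TGI container started along the lines of:
#   docker run --gpus all -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest --model-id <model>
payload = {
    "inputs": "What is text generation inference?",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json()["generated_text"])
```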

August 1st, 2023

Facebook Research : Detectron Layout PDF Parser : Get Text with Bounding Box

This article demonstrates how to analyze document layout and extract text using OCR with Detectron2 models from Facebook Research. The code converts PDFs to images, detects layout elements like text, titles, and tables, sorts text blocks, performs OCR using Tesseract, and extracts text with corresponding bounding box coordinates.
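
A condensed version of that workflow using the layoutparser wrapper around Detectron2 (the PDF path is a placeholder; the PubLayNet config is the standard pretrained layout model):

```python
import layoutparser as lp
import numpy as np
import pytesseract
from pdf2image import convert_from_path

pages = convert_from_path("paper.pdf", dpi=200)  # placeholder PDF
image = np.array(pages[0])

# Pretrained PubLayNet layout model distinguishes five block types.
model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)
layout = model.detect(image)

# OCR each detected block and keep its bounding box coordinates.
for block in layout:
    x1, y1, x2, y2 = map(int, block.coordinates)
    text = pytesseract.image_to_string(image[y1:y2, x1:x2])
    print(block.type, (x1, y1, x2, y2), text.strip()[:60])
```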

July 17th, 2023

© Yash Bhaskar