
Some thoughts on AI, LLMs and tech.

Demos

Vector DB · 400M vectors · SPFresh
Search 400 million CLIP image embeddings. SPFresh-style IVFFlat over S3, end-to-end ~8 s.
yuvals.dev/laion-search/

Writings

Jun 15, 2026
Using eBPF + MCP to let agents record HTTPS traffic
Agents make network calls. Recording that traffic without changing the TLS fingerprint or installing a CA means going below the application layer. eBPF + MCP lets an agent attach to OpenSSL host-wide via uprobes, capture plaintext before encryption and after decryption, and read it ba...

Jun 10, 2026
Training nanochat [GPT-2] LLM on cheap hardware - 'robust' skill
I managed to train nanochat, Karpathy's minimal LLM trainer, on cheap RunPod hardware. After several runs full of errors and slow start times, I paused and made the process more robust: caching torch.compile output by GPU architecture, caching Flash Attention 2 as a wheel, d...

Jun 2, 2026
Calibrating nprobe and nlist for billion-scale vector search
If you're running an IVF-based vector index over hundreds of millions of CLIP embeddings, two knobs decide your cost and your recall: nlist and nprobe. The common advice is 'set nlist ≈ √N and pick nprobe by taste.' That's fine for a 1M-vector toy. At 100M–1B+ it stops being u...

Jun 1, 2026
1B vectors database - benchmarking AWS S3 latencies, compaction strategies
How many clusters do we need? How large should nprobe be? Testing for 10M, 50M, and 100M vectors. Inspired by SPFresh/SPANN: building a vector database over S3 as many small clusters, with recall@10 measured against LAION-400M ground truth computed on a RunPod A100.

Mar 10, 2026
ICBMs and agentic-first interfaces - headless / chat interface [Title == rage bait]
What counts as a real 'agentic workflow' isn't a multi-step LLM with seventeen tools. It's a chat interface over a dataset that someone actually needs to query. The example: alarms data + Yad2 rentals + Mamad (saferoom) data, asked as 'houses for rent in southern Israel with a...

Dec 29, 2025
LLM Logic Distillation: Extracting Python Rules from LLMs
A pattern I'm using lately is extracting logical (Pythonic) rules from LLMs. Example 1: Extracting HTML Content. Let's say you have a business process where an LLM helps a lot. For example, extracting text content from...

Dec 18, 2025
Expert Trajectory RAG: Claude Code-Level Quality with Internal Models
How to achieve Claude Code-level quality using internal coding models? Disclosure: This approach is for specific scenarios and requires significant pre-computation (tokens). When to Use This: This is for you if: You have code you...

Dec 4, 2024
"Mini-Firewall" Over EC2 Machines, AKA use Lambda function to update SecGroup with your IP address
Automatically update AWS Security Groups with your dynamic IP address using Lambda functions and API Gateway. Perfect for securing EC2 instances when working from home with changing IP addresses.

Dec 4, 2024
Batch Processing in SQLite: A Deep Dive into Database Field Updates
Learn efficient techniques for updating millions of SQLite records using batch processing and smart indexing to avoid database locks. Includes performance benchmarks and strategies for handling large-scale database operations.

Nov 12, 2024
Fine-Tuning BERT using LoRA, hosting on Cloudflare using Cloudflare AI Workers
Build and deploy a custom content moderation model by fine-tuning BERT with LoRA on spam detection, then serving it efficiently through Cloudflare AI Workers.

Sep 26, 2024
BFS, DFS, PageRank, AKA - How to run embeddings only on important parts of a website
Optimize web scraping and embedding generation by using the PageRank algorithm to prioritize important pages, ensuring you process the most valuable content first.

Feb 17, 2024
There are more than 2 UUID types - UUIDv4, 7, ULID, etc…
Explore UUID alternatives including UUIDv7, ULID, and HashIDs to solve common problems like database performance, ID ordering, and preventing information leakage in your applications.

Nov 11, 2022
Python - PDB usage and reproducing program execution
Master Python debugging with PDB by capturing and reproducing program execution state, including command-line arguments and environment variables for complex debugging scenarios.

Dec 24, 2021
Data Ingestion - Build Your Own “Map Reduce”?
Build a lightweight custom map-reduce system in Python for small teams and startups, avoiding the complexity of Hadoop while efficiently processing large datasets with multiprocessing.

Open Source

CommonCrawlParser [2021]
Fast processing of CommonCrawl; stream and run fast C++ regex parsing.
github.com/yuval1024/ParserCC