Link List :: 2025-02-16

https://www.runpulse.com/blog/why-llms-suck-at-ocr

We believed LLMs could solve data extraction by plugging in the latest models.
However, we found that their probabilistic approach favors common words over exact transcription.
LLMs suck at complex OCR because they are built for text generation and summarization, not precise character recognition.
Their probabilistic nature causes fatal errors in OCR tasks through lossy transformations and blind spots.

*I. How Do LLMs “See” and Process Images?*
LLMs process images through high-dimensional embeddings that prioritize semantic understanding over precise character recognition.
Before attention is ever applied, the vision encoder splits images into fixed-size patches, losing fine-grained spatial relationships and critical information about data layout (see the patchification sketch below).
LLMs consequently struggle with complex 2D relationships, such as recognizing and extracting table structure.
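
To make the patching step concrete, here is a minimal numpy sketch of ViT-style patchification. The 224x224 input size and 16-pixel patches are common ViT defaults used as assumptions here; they are not details taken from the article.

    import numpy as np

    # Stand-in for a scanned document page: height x width x RGB.
    image = np.random.rand(224, 224, 3)
    P = 16  # patch size; 224 / 16 = 14 patches per side

    # Cut the image into a 14x14 grid of 16x16 patches, then flatten each
    # patch into one 768-dim vector ("token") before any attention runs.
    grid = image.reshape(224 // P, P, 224 // P, P, 3).swapaxes(1, 2)
    tokens = grid.reshape(-1, P * P * 3)

    print(tokens.shape)  # (196, 768): each token blends a whole 16x16 region,
                         # so sub-patch character strokes are no longer separable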

*II. Real-World Failures and Hidden Risks*
LLMs fail in catastrophic ways for business-critical applications in industries like legal and healthcare.
Failures include corruption of financial and medical data, models attempting to solve equations rather than transcribe them, and prompt-injection and ethical vulnerabilities.

https://github.com/ms-jpq/sad

  • sad (Space Age sed) is a batch file-editing tool that shows a diff of proposed changes before committing them, letting you double-check every edit.
  • Unlike sed, it supports selective replacement and lets you choose how changes are clustered using --unified=<n>.
  • It can display side-by-side changes using delta if installed.
  • To replace all occurrences of a pattern with another in a repository, use find "$FIND_ARGS" | sad '<pattern>' '<replacement>'.

https://old.reddit.com/r/LocalLLaMA/comments/1im35yl/how_to_scale_rag_to_20_million_documents/

  • Popular vector databases for scaling RAG include Weaviate, PGVector, Pinecone, and Qdrant, with hybrid RAG techniques mentioned for optimization.
  • ParadeDB (Postgres + pg_vector) and LanceDB are suggested for scalability and hybrid search.
  • GraphRAG/LightRAG approaches can be used but are computationally expensive due to graph-based complexity.
  • Horizontal scaling is supported by modern vector DBs through sharding and replication, though graph traversal slows down as data grows.
  • Embedding models like stella_en_400M_v5 are recommended, with binary embeddings (quantized vectors) suggested to reduce storage and improve performance.
  • HNSW indexing is a popular choice, while alternatives like IVF or IVF-PQ can improve performance through cluster-based indexing.
  • Sparse ranking functions like BM25 combined with dense embeddings enable hybrid search for better results (see the hybrid-search sketch after this list).
  • Index creation is more efficient when the index is dropped before ingestion and rebuilt after data ingestion is complete.
  • Chunking strategies, like adaptive window chunking, dynamically adjust chunk sizes based on dataset requirements.
  • Proper chunk overlap is critical to avoid losing context during retrieval (see the chunking sketch after this list).
  • Preprocessing should include metadata extraction, semantic enrichment, and duplicate removal to optimize embeddings and indexing.
  • Efficient pipelines are essential for handling diverse document types such as PDFs, scanned images, and statistical data.
  • Segmenting documents by topic or content type and routing them to appropriate vector DBs improves pipeline efficiency.
  • Middleware layers can compress, deduplicate, and summarize embeddings to optimize data ingestion.
  • Azure AI Search is recommended for enterprise-level ingestion pipelines.
  • Hybrid RAG combines dense vector search and sparse retrieval (e.g., BM25) for improved performance and relevance.
  • Hierarchical RAG methods involve topic-based sorting before diving into chunk-level retrieval.
  • Elasticsearch with BM25 is suggested as an alternative to vector DBs in domains where it gives better relevance.
  • Fine-tuning models on 20M documents can encode domain-specific knowledge but is more expensive and risks overfitting.
  • RAG is cheaper, faster, and more flexible than fine-tuning and is preferred for large-scale data retrieval.
  • Embedding 20M documents can take significant time but can be parallelized with multiple GPUs, costing between $1,000 and $20,000.
  • Trade-offs between speed, cost, and accuracy can be adjusted by tweaking clustering sizes or chunking strategies.
  • Ensure the dataset is relevant and up-to-date since larger scales do not always yield better results.
  • Vendor solutions like Azure AI Search are helpful but should be used cautiously to avoid vendor lock-in.
  • Iterative testing is necessary to identify the best combination of embeddings, chunking, and retrieval strategies for the specific dataset.
  • Critics argue RAG struggles at scale, but others dismiss this as overly pessimistic, citing successful implementations at larger scales.
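
A minimal sketch of fixed-window chunking with overlap, the "proper chunk overlap" point above. The 500-character window and 50-character overlap are illustrative assumptions; adaptive window chunking would vary the window size per document instead.

    def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
        # Each window starts `size - overlap` characters after the previous
        # one, so neighboring chunks share `overlap` characters of context.
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]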
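
And a hedged sketch of hybrid retrieval: dense embeddings plus BM25, fused with reciprocal rank fusion. The library choices (sentence-transformers, rank_bm25), the model name, and the toy corpus are assumptions for illustration, not what any commenter actually deployed.

    import numpy as np
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer

    docs = ["postgres sharding guide", "pg_vector hnsw tuning", "bm25 scoring basics"]
    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = model.encode(docs, normalize_embeddings=True)
    bm25 = BM25Okapi([d.split() for d in docs])

    def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[int]:
        # Dense ranking: cosine similarity via normalized dot products.
        q = model.encode([query], normalize_embeddings=True)[0]
        dense = np.argsort(-(doc_vecs @ q))
        # Sparse ranking: BM25 over whitespace tokens.
        sparse = np.argsort(-bm25.get_scores(query.split()))
        # Reciprocal rank fusion: sum 1/(rrf_k + rank) across both rankings.
        scores = {}
        for ranking in (dense, sparse):
            for rank, idx in enumerate(ranking):
                scores[idx] = scores.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print([docs[i] for i in hybrid_search("tuning hnsw in postgres")])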

https://old.reddit.com/r/Python/comments/1ime8ja/a_modern_python_repository_template_with_uv_and/

  • Showcased a modern Python repository template
  • Features uv (packaging and environments) and just (task runner)
  • Aims to provide a clean and minimalistic template
  • Includes example use cases and explanations
  • Meant to serve as a starting point for new projects

https://github.com/ranjanprj/agentollama

  • The Agentollama Framework enables creation, execution, and monitoring of intelligent Agents without writing business logic.
  • AI determines which functions to call dynamically, without hardcoding (see the dispatch sketch after this list).
  • Generates code dynamically for API calls using AI.
  • Tools are loaded dynamically when an Agent is called to perform a task.
  • Can be used to integrate with legacy applications for productivity enhancement.
  • Real-time execution step tracking and logs of task execution and decision-making by Agents.
  • Helps debug why a particular Agent took a specific action.
  • Ensures Agents produce outputs in a predefined format.
  • Prevents deviation in reasoning and decision-making.
  • Essential for enterprise use cases where multiple API interactions must follow strict formats.
  • Vectorized file storage for intelligent querying.
  • Retrieval-Augmented Generation (RAG) Agents to enhance decision-making with external knowledge.
  • Loop Agents to iterate over datasets and execute workflows dynamically.
  • Enables seamless multi-agent orchestration for business processes.
  • Supports DeepSeek R1 (8B model) integration for on-device tool-code generation.
  • UI-based Agent Execution, no need to touch an IDE!
  • Performance Metrics & Testing (Upcoming Feature) to analyze Agent efficiency and decision-making accuracy.
  • Compatible with Python 3.x.
  • Built using Ollama (AI Model Server), Django, and Vector Database.
  • Clone the repository via git clone https://github.com/ranjanprj/agentollama.git
  • Create and activate virtual environment: python -m venv venv, source venv/bin/activate
  • Install dependencies: pip install django ollama
  • Run the Ollama server in the background with the Llama 3.1 8B model.
  • Run the Django server: python manage.py runserver
  • Define Agents through the UI and automate workflows effortlessly.
  • Advanced features include dynamic business testing & performance metrics, advanced agent collaboration, and enhanced RAG capabilities.
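
A hedged sketch of the dynamic dispatch idea using the plain ollama Python client directly; this is not agentollama's actual API, just an illustration of letting the model pick a registered function and emit JSON arguments. The model name and the weather tool are assumptions.

    import json
    import ollama

    def get_weather(city: str) -> str:
        return f"Sunny in {city}"  # stand-in for a real API call

    TOOLS = {"get_weather": get_weather}

    prompt = (
        f"Pick one tool from {list(TOOLS)} and reply ONLY with JSON like "
        '{"tool": "...", "args": {...}}.\n'
        "Task: what is the weather in Oslo?"
    )
    reply = ollama.chat(model="llama3.1", messages=[{"role": "user", "content": prompt}])
    call = json.loads(reply["message"]["content"])
    print(TOOLS[call["tool"]](**call["args"]))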

https://github.com/mpaepper/vibevoice

  • Marc Päpper extended Vlad’s work to run with a local Whisper model.
  • Original inspiration: whisper-keyboard by Vlad.
  • Uses the faster-whisper implementation for fast local transcription (minimal usage sketch below).
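
For reference, a minimal faster-whisper usage sketch; the model size, device settings, and audio path are assumptions — see the repo for how it actually wires transcription to keystrokes.

    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, info = model.transcribe("recording.wav", beam_size=5)

    print("detected language:", info.language)
    for segment in segments:  # segments is a lazy generator
        print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")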

https://github.com/Pravko-Solutions/FlashLearn

This is an introduction to the FlashLearn library, which provides a simple way to integrate large language models (LLMs) into workflows. The library aims to make it easy for startups to use LLMs in their products.

The text describes how to create custom LLM transformations using FlashLearn’s skills API, which lets users define and reuse prepackaged LLM “skills”. It also discusses the library’s features, such as parallel processing, cost estimation, and error handling.

Here are some key points from the introduction:

  1. Simple and Uniform Workflow: FlashLearn provides a simple and uniform workflow for using LLMs, making it easy for startups to integrate these models into their products.
  2. Consistent JSON Output: The library ensures that all output is well-structured JSON, which makes it easy to analyze and debug the results.
  3. Parallel Processing: FlashLearn can process multiple tasks in parallel, which significantly improves performance and reduces processing time.
  4. Cost Estimation: The library provides an estimate of the cost associated with running LLMs, which helps users plan their workflows and budget accordingly.
  5. Error Handling: FlashLearn includes error handling mechanisms to ensure that errors are properly reported and handled, reducing downtime and improving overall reliability.

Some examples of how to use FlashLearn’s skills API (sketched in code after this list) include:

  1. Creating a classification skill: The example code shows how to create a classification skill using the ClassificationSkill class, which can be used for tasks such as sentiment analysis or text categorization.
  2. Running multiple tasks in parallel: The example code demonstrates how to run multiple tasks in parallel using the run_tasks_in_parallel method, which significantly improves performance and reduces processing time.
  3. Saving and loading skills: The example code shows how to save and load skills using the save and load methods, making it easy to reuse and share custom LLM transformations.
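
A hedged sketch assembled from the class and method names mentioned above (ClassificationSkill, run_tasks_in_parallel, save); the import path, constructor arguments, and create_tasks helper are assumptions — check the FlashLearn README for the real signatures.

    from flashlearn.skills.classification import ClassificationSkill  # path assumed

    skill = ClassificationSkill(
        model_name="gpt-4o-mini",                 # parameter names assumed
        categories=["positive", "negative"],
    )
    tasks = skill.create_tasks([{"review": "Great product, shipped fast!"}])
    results = skill.run_tasks_in_parallel(tasks)  # parallel execution, JSON out
    print(results)

    skill.save("sentiment_skill.json")            # reuse/share the skill later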

Overall, FlashLearn provides a simple and powerful way to integrate LLMs into workflows, making it an attractive option for startups looking to leverage these models in their products.

https://www.myforevernotes.com/docs/overview

  • Forever ✱ Notes is a simple, lightweight digital note-taking method for Apple Notes
  • It’s robust, versatile, and scalable enough to grow with you, and completely free
  • It combines fast capturing of journal entries with fast access to connected knowledge
  • Modular design allows building your own system as you go
  • Purpose: capture thoughts quickly, reflect on them, and organize meaningfully
  • Avoids a digital graveyard of notes, keeping only what’s truly useful
  • Grows with you through three levels:
    • Level 1 (Beginner): ✱ Home Note, ✱ Tags, and ✱ Collections
    • Level 2 (Intermediate): ✱ Hubs to create central notes that link related content
    • Level 3 (Expert): ✱ Journal Notes for deeper reflection and planning
  • Not a replacement for tools like Bullet Journal, Obsidian, Notion, or Roam Research
  • Not designed for task management; use Apple Reminders instead

https://github.com/13point5/gemini-cursor

  • 🖱️ Second AI cursor on your desktop
  • 🚀 Multimodality: The model can see images, hear voice, and speak
  • ⚡️ Real-time with low latency
  • 📚 Understanding complex diagrams in research papers, architecture diagrams, etc.
  • 🌐 Navigating complex websites to perform a task like adding a payment method on Amazon
  • 📝 Real time AI tutor with whiteboards
  • Frontend: Electron, React, TypeScript, Vite
  • AI: Google Gemini API
  • Built using Google’s Multimodal Live API; much of the code is adapted from the Gemini Multimodal Live API web console

https://github.com/mindsdb/flat-ai

  • Building AI agents with LLMs can be simple, even if it’s often entertaining to watch them struggle.
  • A FLAT approach treats LLM interactions like programming constructs for control and predictability.
  • The F.L.A.T library provides a tiny AI library with features like binary decisions, classification, structured data extraction, tool calling, parallelization, observability, and more.
  • You can build powerful Agents using absolute simplicity.
  • Plain Python if/else statements and functions build the logic that controls your agent’s workflow (see the sketch after this list).
  • The LLM is steered by giving it discrete “life choices”, like classifying messages into different categories.
  • You can define object schemas to fill out objects like a trained monkey with a PhD in data entry.
  • Calling functions is essential, but you want the LLM to figure out arguments for you.
  • Functions can be called sequentially or in parallel using the get_functions method.
  • You can also use the get_string and get_stream methods for simple string responses and streaming the response from the LLM.
  • Tweaking LLM parameters on the fly is possible with a slick configuration pattern that temporarily overrides any parameter without affecting the base configuration.
  • Logging and observability are crucial to understand what your LLM does in its spare time.
  • The F.L.A.T library provides methods like set_context, add_context, clear_context, and delete_from_context to manage context.
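
A sketch of the FLAT idea in plain Python: ordinary if/else drives the workflow and the LLM only answers small, constrained questions. The classify stub below stands in for a single flat-ai call; it is not the library’s actual API.

    def classify(message: str, choices: list[str]) -> str:
        # In flat-ai this would be one constrained LLM call; faked here.
        return choices[0] if "crash" in message else choices[-1]

    def handle(message: str) -> str:
        kind = classify(message, ["bug_report", "feature_request", "other"])
        if kind == "bug_report":          # plain Python controls the workflow
            return "filed a ticket"
        if kind == "feature_request":
            return "added to roadmap"
        return "sent a generic reply"

    print(handle("the app crashes on login"))  # -> filed a ticket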

https://www.microsoft.com/en-us/research/blog/exact-improving-ai-agents-decision-making-via-test-time-compute-scaling/

  • Autonomous AI agents are transforming decision-making processes in tasks like web browsing, video editing, and file management.
  • These systems struggle in complex environments, where they must balance exploitation and exploration.
  • A key challenge lies in adapting to unpredictable changes and generalizing knowledge across contexts.
  • ExACT is an approach for teaching AI agents to explore more effectively.
  • ExACT combines Reflective-MCTS (R-MCTS) and Exploratory Learning techniques.
  • R-MCTS builds on Monte Carlo Tree Search (MCTS), introducing features like contrastive reflection and a multi-agent debate function (a generic MCTS sketch follows this list).
  • Exploratory Learning enables agents to dynamically search and adjust computational resources during testing.
  • ExACT demonstrates strong computational scalability during training and testing.
  • R-MCTS outperforms previous best-performing methods in VisualWebArena environments.
  • ExACT achieves performance improvement, scalable test-time compute, and improved generalization on unseen tasks.
  • Continued exploration is needed to improve AI agents’ decision-making in real-world scenarios with constraints like time or resources.
  • The ExACT framework is available in a GitHub repository.
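
For reference, a generic MCTS sketch showing the selection/expansion/simulation/backpropagation loop that R-MCTS extends; the toy reward and action set are assumptions, and the contrastive-reflection and debate components are omitted.

    import math
    import random

    class Node:
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value = [], 0, 0.0

    def ucb1(node, c=1.4):
        # The exploitation/exploration balance the article highlights.
        return node.value / node.visits + c * math.sqrt(
            math.log(node.parent.visits) / node.visits)

    def mcts(root, actions, reward, rollouts=200):
        for _ in range(rollouts):
            node = root
            # Selection: descend by UCB1 while fully expanded.
            while node.children and len(node.children) == len(actions):
                node = max(node.children, key=ucb1)
            # Expansion: add one untried action.
            if node.visits > 0 and len(node.children) < len(actions):
                child = Node(node.state + [actions[len(node.children)]], parent=node)
                node.children.append(child)
                node = child
            # Simulation + backpropagation of the rollout reward.
            r = reward(node.state + [random.choice(actions)])
            while node:
                node.visits += 1
                node.value += r
                node = node.parent
        return max(root.children, key=lambda n: n.visits).state[-1]

    # Toy use: reward any trajectory that contains action 2.
    print(mcts(Node([]), actions=[0, 1, 2, 3], reward=lambda s: float(2 in s)))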

https://x.com/mathsppblog/status/1889395510743605715?s=12

Create a Python CLI that’s globally available on your system in 5 easy steps.

  1. Install uv
  2. Init project with uv init --app --package mycli
  3. Write code (see the entry-point sketch after these steps)
  4. Install with uv tool install . -e
  5. Use mycli anywhere on your computer
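
For reference, here is roughly what step 2 scaffolds and step 3 edits: uv init --app --package creates a src/ layout and maps mycli = "mycli:main" under [project.scripts] in pyproject.toml. The greeting below is an assumption; uv’s generated stub may differ between versions.

    # src/mycli/__init__.py -- the function the console script points at
    def main() -> None:
        print("Hello from mycli!")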

https://www.minot.bio/home/2025/2/11/publish-interactive-data-visualizations-for-free-with-python-and-marimo

  • In data science, it can be hard to share insights from complex datasets using only static figures
  • Pre-generated figures often fail to capture every facet of a dataset’s shape and meaning
  • Powerful technologies are available for presenting interactive figures, but come with tradeoffs
  • A recently released Python library called marimo opens up new opportunities for publishing interactive visualizations
  • marimo makes it possible to publish interactive data visualizations for free, across the entire field of data science (minimal notebook sketch below)
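
A minimal marimo notebook sketch (saved as app.py and opened with marimo edit app.py); the slider example is an assumption to show the reactive-UI idea, not code from the linked post.

    import marimo

    app = marimo.App()

    @app.cell
    def _():
        import marimo as mo
        n = mo.ui.slider(start=1, stop=100, value=10, label="points")
        n  # the last expression in a cell is rendered as output
        return (n,)

    @app.cell
    def _(n):
        # Re-runs automatically whenever the slider above moves.
        print([i ** 2 for i in range(n.value)])
        return

    if __name__ == "__main__":
        app.run()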

https://www.reddit.com/r/Python/comments/1io2ohv/segment_anything_ui_segmentation_object_detection/