Link List :: 2025-03-22

March 22, 2025 6 minutes read

links

https://www.evanmiller.org/functional-tests-as-a-tree-of-continuations.html

1. **Eliminates code duplication**: No need to repeat steps when testing different actions in the same step.
2. **Eliminates most setup code**: Setup can be done as part of the testing tree with no performance penalty.
3. **Pinpoints source of failing tests**: Failing tests are immediately stopped before running child nodes, making it easier to find the source of failures.
4. **Well-structured tests**: The test hierarchy is clear and easy to understand, with a 1-1 mapping between nodes and something the user sees.
5. **Previous responses in scope**: All previous responses are available in the response variables.

**Why this approach hasn't been widely adopted:**

1. **OO language tendencies**: Many OO languages encourage a "wrecking ball" mentality when it comes to unit tests, destroying all possible state after each test.
2. **Inefficiency**: Testing every rung on a ladder can be inefficient and wasteful.

https://github.com/codespin-ai/codespin-chrome-extension

* This Chrome Extension allows you to use Claude and ChatGPT to edit your local project using new File System APIs available on Chrome.
* The extension is not yet available on the Chrome Web Store (it takes weeks for approval), so it must be installed manually.

https://github.com/patelnav/ctrlspeak

* Minimal Interface: Runs quietly in the background via the command line
* Triple-Tap Magic: Start/stop recording with a quick Ctrl triple-tap - Auto-Paste: Text lands right where you need it, no extra clicks
* Audio Cues: Hear when recording begins and ends
* Mac Optimized: Harnesses Apple Silicon's MPS for blazing performance

https://github.com/owulveryck/gomcptest

* A proof of concept demonstrating Model Context Protocol (MCP) implementation with a custom-built host
* Enabling easy testing of agentic systems through MCP
* Primarily written from scratch for clarity on underlying mechanisms

https://github.com/abilzerian/LLM-Prompt-Library

* This repository contains a curated collection of prompts for various large language models (LLMs) like Deepseek, GPT o3, Claude 3 Opus, Llama3, Gemini, and others.
* The library includes several tools to help you work with prompts:
    + Prompt Validator - Validates the format and contents of prompt files
    + Prompt Mixer - Create new prompts by mixing and matching elements from existing prompts
    + Token Counter - Analyze prompt files to count tokens and estimate API costs
    + Prompt Analyzer - Evaluate the quality of prompts and get suggestions for improvements
    + Prompt Evolution - Automatically optimize prompts through iterative self-improvement cycles
    + Financial Metacognition - Analyze AI interpretations of financial prompts to detect biases and limitations

https://github.com/casparwylie/cascii-core

* Cascii is a web-based ASCII and Unicode diagram builder written in vanilla Javascript.
* It has zero dependencies on any servers, web packing, libraries, and is no-markup and no-stylesheets.

https://github.com/trycua/computer

* Create and run high-performance macOS and Linux VMs on Apple Silicon, with built-in support for AI agents.
* Library
  * Lume: CLI for running macOS/Linux VMs with near-native performance using Apple's Virtualization.Framework
  * Computer: Computer-Use Interface (CUI) framework for interacting with macOS/Linux sandboxes
  * Agent (Experimental): Computer-Use Agent (CUA) framework for running agentic workflows in macOS/Linux dedicated sandboxes

https://arxiv.org/abs/2503.10737

* Title: Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization
* Abstract: Commenting code is a crucial activity in software development, as it aids in facilitating future maintenance and updates.
* Researchers have proposed various automated code summarization (ACS) techniques to automatically generate comments/summaries for given code units.
* ACS techniques primarily focus on generating summaries for code units at the method level.
* Higher-level code units, such as file-level and module-level code units, are highly useful for quickly gaining a macro-level understanding of software components and architecture.
* To fill this gap, we conduct a systematic study on how to use LLMs for commenting higher-level code units.

https://www.reddit.com/r/ChatGPTCoding/comments/1jdi4o6/i_finetuned_qwen_25_coder_on_a_single_repo_and/

* Tl;dr: The fine-tuned model achieves a 47% improvement in the code completion task (tab autocomplete). Accuracy goes from 25% to 36% (exact match against ground truth) after a short training run of only 500 iterations on a single RTX 4090 GPU.
* Highlights of the experiment:
* Model: qwen2.5-coder 14b, 4-bit quantized
* Training data: Svelte source files from this repo: https://github.com/hcengineering/platform

https://claudio.uk/posts/unvibe.html

Unvibe is an open-source tool that uses large language models to generate code for a given set of specifications. Here's a summary of the article:

**Key Features**

* Unvibe can be used to generate code in various programming languages.
* It uses a simple tree search algorithm to explore the space of possible programs.
* The algorithm starts with a random initial tree spread and attempts different LLM temperatures before picking the most promising nodes.
* Unvibe can run on a Macbook or other low-power hardware.

**Models**

* Small coding models (~7B params) seem to work well for Unvibe, such as qwen2.5-coder:7b and Claude Haiku.
* Large generic models (>20B params) are also effective, but may be slower due to their larger size.
* Reasoning models can sometimes help, but are generally slower than coding models.

**Search Algorithm**

* Unvibe uses a simple tree search algorithm that is suitable for running on low-power hardware.
* The algorithm starts with a random initial tree spread and attempts different LLM temperatures before picking the most promising nodes.

**Sandboxing**

* Unvibe can run on your local machine, but this is not recommended due to the risk of running code generated by an LLM.
* Running Unvibe in a Docker container or as a separate user with limited permissions is a safer option.

**Future Features**

* HTML-based UI to explore the search graph and look at the reward function rise.
* Support for multiple LLMs, with Unvibe swapping between them if the score plateaus.
* Integration with Pytest.
* Support for other programming languages.

https://github.com/robertpiosik/gemini-coder

* Copy folders and files for chatbots or initialize them hands-free using Gemini Coder's browser extension
* Use the free Gemini API for FIM completions, file refactoring, and applying AI-suggested changes

https://github.com/maxnowack/anthropic-proxy

* A proxy server that transforms Anthropic API requests to OpenAI format and sends it to openrouter.ai.
* Enables use of Anthropic's API format with OpenAI-compatible endpoints by sending requests through the proxy server.

https://github.com/prvnsmpth/finetune-code-assistant/

* Blog post: Build Your Own GitHub Copilot
* This repo contains:
*   - Scripts for generating a fill-in-the-middle (FIM) dataset from a codebase 
*   - A Jupyter notebook for running SFT on the generated FIM dataset

https://github.com/santinic/unvibe

* Unvibe is a tool that generates alternative implementations for functions and classes annotated with `@ai`, which has been demonstrated to produce better results than traditional code generation alone.
* It's particularly effective on large projects with decent test coverage and works with most AI providers, including local Ollama, OpenAI, DeepSeek, Claude, and Gemini.
* To use Unvibe, add it as a dependency to your project with `pip install unvibe`, define a new function in your existing Python project, annotate it with `@ai`, and write unit tests to define how the function should behave.
* Use `unvibe` command to search for a valid implementation that passes all the tests, generating many alternatives and feeding back test errors to the LLM until a correct implementation is found.
* Configuration file can be created in `.unvibe.toml` with options such as provider, model, temperature, and cache settings.
* Running Unvibe on your local machine can be risky due to code generation by an LLM; recommended practice is to run it inside a Docker container or create a new user with limited permissions.