
My AI Coworker Ships Fast, Breaks Things, and Never Takes a Coffee Break

Three projects. Thirty days. Three AI approaches. This is not a tutorial.


The Setup

Over the past month, I ran three very different projects with AI as my primary collaborator (OpenClaw + Claude Opus 4.6). Each project used a different approach: AI as refactoring assistant, AI as autonomous builder, AI as research partner. Not a tool I prompted occasionally. A coworker I paired with for hours.

The projects:

  1. GIA β€” refactoring and translating an existing app. AI as assistant.
  2. Chat my Resume β€” building a chatbot from scratch. AI as autonomous builder.
  3. LLM Bench Lab β€” benchmarking GPUs and writing a technical blog. AI as research partner.

Three approaches. Three very different results. None of them went smoothly.

Blackwell GPUs for Local LLMs: RTX PRO 6000 vs RTX 5070 Ti

The Benchmark Nobody Asked For

Tested on AMD Ryzen 7 9800X3D with llama.cpp b7966 via localscore-bench, February 2026


TL;DR

We benchmarked eight LLM models (1B to 70B parameters) on two Blackwell GPUs with Vulkan and CUDA 13.1, on both Linux and Windows.

The headlines:

  • The $950 card delivers 4 to 7x more tokens per dollar. For models under 12B, the 5070 Ti is the rational choice.
  • Vulkan and CUDA perform within 5 to 15% on cold hardware. Pick whichever works for your setup.
  • The 5070 Ti just works. The PRO 6000 needs server cooling to get to its full potential. Active fans vs passive heatsink is a real differentiator.
  • OS barely matters. Linux and Windows land within 5 to 10% of each other.
  • VRAM is the main reason to buy the PRO 6000. It earns its price at 32B+, not at 12B.
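The tokens-per-dollar headline is just measured throughput divided by card price. A minimal sketch of the arithmetic, with illustrative throughput and price figures plugged in (these numbers are placeholders for the shape of the calculation, not the article's measured results):

```python
def tokens_per_dollar(tok_per_s: float, price_usd: float) -> float:
    """Relative value metric: sustained generation throughput per dollar of card price."""
    return tok_per_s / price_usd

# Illustrative figures only -- swap in your own benchmark numbers.
rtx_5070_ti = tokens_per_dollar(tok_per_s=120.0, price_usd=950.0)
rtx_pro_6000 = tokens_per_dollar(tok_per_s=160.0, price_usd=8500.0)

ratio = rtx_5070_ti / rtx_pro_6000
print(f"{ratio:.1f}x")  # with these placeholder figures, roughly 6.7x
```

Ratios in this range are where a "4 to 7x more tokens per dollar" claim comes from: the pricier card is somewhat faster, but nowhere near proportionally to its cost at small model sizes.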

Devstral 2 & vibe by Mistral AI: the hidden gems of AI coding agents.

For about a year, I have been working daily with various coding assistants, choosing different tools depending on my mood, needs and constraints. My journey has included testing Windsurf and Tabnine professionally, while personally transitioning from being a fervent Copilot user to adopting Claude Code.

During this exploration, I discovered Devstral 2, which ultimately replaced Claude Code in my workflow for several compelling reasons:

  1. Aesthetic Excellence: The tool offers a beautiful user experience.
    From the blog post announcement to the API documentation and vibe itself, the color scheme, visual effects, and overall polish create a distinctly pleasant working environment.

  2. Comparable Performance: In the "me, myself & I benchmark", Devstral 2's code suggestions are on par with Claude Code's.
    While both tend to occasionally overlook framework documentation, they deliver excellent results overall when refactoring, suggesting commit messages, or tweaking CSS.

  3. Cost-Effective and Open Source: Devstral 2 is significantly more affordable than Claude Code, and it is open source.
    Users receive 1 million trial tokens, with pricing at $0.10/$0.30 for Devstral Small 2 past the first million.
    With Claude Code, I frequently hit usage limits, even after employing /compact commands and tracking my /usage.
    And even if you do hit vibe's usage limits, there is a way out:

  4. Local Execution Capability: Although vibe's time to first token can be slower than Claude Code's, Mistral offers a crucial advantage!
    Both Devstral 2 and its small version are open source, with the ability to run entirely on local machines, providing greater control, privacy, and, if you have the gear, blazing-fast performance⚡.

The documentation for running it locally is rather sparse, and Devstral Small 2 is still relatively resource-intensive, so some tweaks are needed.

Here are the instructions for running Devstral Small 2 + vibe on Ubuntu 24.04 with an NVIDIA L40S with 24GB VRAM hosted by Scaleway.
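One way to get a local setup running is to serve the model with vLLM behind an OpenAI-compatible endpoint and point the coding agent at it. This is a sketch under assumptions, not the article's verified recipe: the model ID (`mistralai/Devstral-Small-2505`), quantization choice, and flag values are assumptions you should adjust to your card and to whichever Devstral checkpoint you actually pull.

```shell
# Sketch only: model ID, quantization, and flag values are assumptions,
# not a verified recipe for this article's exact hardware.

# 1. Install vLLM in a fresh virtual environment
python3 -m venv ~/venvs/devstral && source ~/venvs/devstral/bin/activate
pip install vllm

# 2. Serve a Devstral Small checkpoint on an OpenAI-compatible endpoint.
#    Full-precision weights of a ~24B model will not fit in 24GB of VRAM,
#    so on-the-fly FP8 quantization and a capped context length are used
#    here; a pre-quantized build (e.g. GGUF via llama.cpp) is another option.
vllm serve mistralai/Devstral-Small-2505 \
  --tokenizer-mode mistral \
  --quantization fp8 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.90

# 3. Point your client at the local server instead of a hosted API
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=unused-locally
```

The trade-off to expect: quantization and a shorter context keep the model inside 24GB at the cost of some quality and long-file refactoring headroom.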