The General Problem

I was entering my credit card transactions manually into my personal finance app YNAB*, and got distracted by a mini vibe coding project — what if I made a personal web app to send screenshots of my bills (not the full PDFs of course) to a LLM to get CSVs I could upload into YNAB**?

This led me down a rabbit hole where I ended up making a side app to compare LLM processing speeds on OpenRouter in converting images of credit card transactions to CSV. The “test” was to parse one screenshot of transactions (about 11) into a given format (date, description, memo, incoming, outgoing), timed, and checked against the expected number of transactions and summed total.

Observations and conclusions:

Gemini 2.5 Flash beat the crap out of every other model I tried (the full list: Claude Haiku 3, Mistral Small and Pixtral, GPT 5-mini and 4o-mini, Llama 3.2 Vision and 4 Scout, Gemini 2.5 Flash Lite and Gemma 3, Qwen 2.5VL, Grok 4 Fast, Kimi VL, GLM 4.5V). 4 seconds to parse the page, vs. 10+ for the next nearest model! I’d previously just done this with ChatGPT, but got bored waiting for it to make me a nice Markdown table, and this is blazingly fast in comparison.
Local models are still currently awful. I fed the same prompts to some qwen and llama models and their ilk (whatever can run on my M4 Pro MacBook Pro with 64GB RAM), and they kept making up output or missing results. Maybe I just don’t have enough experience working with local LLMs, but I fear it’ll be a while before they get good enough for anything other than specialised tasks.
I was trying local LLMs because of cost, but each month of credit card bills costs me under 10 cents to process. Worth it!
With all the time I spent doing this, I could have manually entered all my card transactions by now. Cue XKCD comic about “The General Problem”! There’s always an XKCD comic.

The GitHub repo is at yjsoon/ynab-llm-formatter. Upload one or more images, and get back a CSV in the upload format YNAB expects. That’s it! Try it out, and let me know if you actually end up using it. (And if it’s not quite what you need, fork it and send it to your own AI coding agent to edit!)

* It’s designed for automatic transaction imports using Plaid, which… isn’t supported here. Sigh.

** I just re-read the second half of this sentence, and there’s way too many acronyms for this to have made sense.

LLM testing arena... for one specific task