What I've made

Apps, open-source tools, research code, and technical writing.

Open Source

Developer tools, libraries, and experiments shipped in public.

Open Source • JAX CT

TomoJAX

Differentiable CT reconstruction

A differentiable parallel-beam CT projector, reconstruction, and alignment toolkit built in JAX.

Open Source • Code Search

SIFS

SIFS Is Fast Search

A fast hybrid code search tool for agents, available as a CLI, Rust crate, and local MCP server.

Open Source • macOS CLI

clipmem

Local clipboard memory

A local-only macOS clipboard history backed by SQLite with a JSON-first CLI for agents and humans.

Open Source • LLM Benchmark

apocalypse-bench

LLM survival-task benchmark

A benchmark runner with structured judge rubrics and a Next.js dashboard for exploring LLM survival-task runs.

Open Source • React Native UI

react-native-dotgrid

Animated dot-matrix display

An animated dot-matrix display component for React Native with Skia-first rendering and UI-thread animations.

Apps

Native apps for focused workflows and everyday use.

Apps

BrainFlow

AI voice notes • iOS / Android

Voice notes that become structured summaries, tasks, and tags — designed for long brain-dump recordings.

Apps

Gravity Notes

Minimal notes • iOS / Android

A simple notes app based on the append & rescue method — fast capture, fast search, no distractions. macOS coming soon.

Research

Scientific computing and machine-learning work from imaging research.

Research • ML Segmentation

ParticleSegmentation

Classical ML segmentation

Classical ML pipelines for particle segmentation in synchrotron X-ray tomography datasets.

Research • Deep Learning

ScrambledSeg

Deep learning segmentation

A deep-learning segmentation model trained on scrambled synchrotron imaging data for improved generalisation.

Writing

Technical essays and benchmark writeups.

Optimising scientific software with large language models

AI can do stuff now

Frontier large language models are now capable of impressive and creative feats of intellect. Over the past several years, both the length of time models can keep working and their general tenacity have increased dramatically.

Autoresearch

Andrej Karpathy's autoresearch method gives an agent a program to optimise, tooling, and a test. With minor prompt engineering, the agent can make edits, run experiments, and accumulate improvements over time.
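
The edit, run, keep-if-better cycle described above can be sketched in a few lines. This is a toy illustration, not Karpathy's implementation: `evaluate`, `propose_edit`, and `autoresearch_loop` are hypothetical names, the "program" is just a list of numbers, and a random nudge stands in for an agent's edit.

```python
import random


def evaluate(params):
    """Toy stand-in for 'run the test': lower is better (assumed objective)."""
    return sum((p - 3.0) ** 2 for p in params)


def propose_edit(params, rng):
    """Toy stand-in for an agent's edit: nudge one randomly chosen parameter."""
    candidate = list(params)
    i = rng.randrange(len(candidate))
    candidate[i] += rng.uniform(-1.0, 1.0)
    return candidate


def autoresearch_loop(initial, n_experiments, seed=0):
    """Run experiments one at a time, keeping only edits that improve the score."""
    rng = random.Random(seed)
    best, best_score = list(initial), evaluate(initial)
    accepted = 0
    history = [best_score]  # best score after each experiment
    for _ in range(n_experiments):
        candidate = propose_edit(best, rng)
        score = evaluate(candidate)
        if score < best_score:  # accumulate improvements, discard regressions
            best, best_score = candidate, score
            accepted += 1
        history.append(best_score)
    return best, best_score, accepted, history


best, best_score, accepted, history = autoresearch_loop([0.0, 0.0, 0.0], 200)
print(f"{accepted} of 200 experiments were kept; final score {best_score:.4f}")
```

Because a candidate is only kept when it beats the current best, the recorded score can never get worse, which is what lets small wins accumulate over many runs.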

TomoJAX

TomoJAX uses JAX to reconstruct and align tomography data by differentiating a loss between measured and predicted projections with respect to geometric parameters for each view.
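
To make the idea concrete, here is a minimal sketch of aligning a single view by descending the gradient of a projection loss. It is not TomoJAX code: it uses plain Python, a 1D Gaussian blob as the object, a single shift as the only geometric parameter, and a central-difference gradient standing in for JAX's automatic differentiation. All function names are hypothetical.

```python
import math


def projection(detector_positions, shift, sigma=1.0):
    """Predicted 1D projection of a Gaussian blob displaced by `shift`."""
    return [
        math.exp(-((u - shift) ** 2) / (2 * sigma ** 2))
        for u in detector_positions
    ]


def loss(shift, measured, detector_positions):
    """Sum of squared differences between predicted and measured projections."""
    predicted = projection(detector_positions, shift)
    return sum((p - m) ** 2 for p, m in zip(predicted, measured))


def loss_gradient(shift, measured, detector_positions, h=1e-4):
    """Central-difference gradient (a stand-in for automatic differentiation)."""
    upper = loss(shift + h, measured, detector_positions)
    lower = loss(shift - h, measured, detector_positions)
    return (upper - lower) / (2 * h)


def align_view(measured, detector_positions, steps=50, learning_rate=0.1):
    """Gradient descent on the shift, starting from zero misalignment."""
    shift = 0.0
    for _ in range(steps):
        shift -= learning_rate * loss_gradient(shift, measured, detector_positions)
    return shift


detector = [-5.0 + 0.25 * i for i in range(41)]
true_shift = 0.7  # assumed misalignment for this toy view
measured = projection(detector, true_shift)
estimated_shift = align_view(measured, detector)
print(f"estimated shift: {estimated_shift:.4f}")
```

In a real tool the same loop would run over many views and more parameters per view, with the gradient computed by autodiff rather than finite differences.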

Auto-tomo-research-JAX

Using GPT-5.4, Codex CLI, and Modal GPU runs, the agent ran 213 experiments and found 33 tweaks that lowered reconstruction error, including several substantial improvements.

Writing • 26th March 2026

Optimising scientific software with large language models

Technical article

How I used GPT-5.4, Codex CLI, and an autoresearch agent to improve TomoJAX, a JAX-based tomography tool for alignment and reconstruction.

From finger to photon and back: the complete journey of a keystroke through an LLM

Part I: The mechanical-to-electrical boundary

Your finger descends over approximately 10 milliseconds: an eternity in computer time, but a span set by the biomechanics of muscle contraction and the travel distance of a typical key. During this press, you're closing a circuit in a keyboard matrix. Most keyboards don't give each key its own dedicated wire. Instead, keys sit at the intersections of a row-column matrix, and the keyboard controller rapidly scans the columns and reads the rows to detect which intersections are closed.
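
The column-scan, row-read idea can be simulated in a few lines. This is a sketch rather than real firmware: the set of `(row, col)` pairs stands in for the physical switches, and `read_rows` and `scan_matrix` are hypothetical names.

```python
def read_rows(pressed, active_col, n_rows):
    """Simulate the row inputs while one column is driven.

    Bit r of the result is set if the switch at (r, active_col) is closed.
    """
    mask = 0
    for row in range(n_rows):
        if (row, active_col) in pressed:
            mask |= 1 << row
    return mask


def scan_matrix(pressed, n_rows, n_cols):
    """Drive each column in turn and collect every closed intersection."""
    closed = []
    for col in range(n_cols):
        row_bits = read_rows(pressed, col, n_rows)
        for row in range(n_rows):
            if row_bits & (1 << row):
                closed.append((row, col))
    return closed


# Two keys held down on a 4x4 matrix.
print(scan_matrix({(1, 2), (0, 0)}, n_rows=4, n_cols=4))
```

A 4x4 matrix needs only 8 wires for 16 keys, which is the reason for the design: wiring grows with rows plus columns rather than with the number of keys.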

The USB HID report

Once the firmware confirms a genuine keypress, it constructs a Human Interface Device (HID) report. The keyboard doesn't send ASCII characters. It sends key codes (HID usage IDs, often loosely called scan codes): position-based identifiers such as 0x04, which means the key that happens to be A on a US QWERTY layout. Your operating system handles the mapping from these codes to characters.
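
As an illustration, the sketch below packs and unpacks a report in the 8-byte HID boot-protocol keyboard layout: one modifier byte, one reserved byte, and up to six key codes. Whether a given keyboard uses this layout or a custom report descriptor is an assumption here, and the helper names and the three-entry layout table are hypothetical.

```python
KEY_A = 0x04           # HID usage ID for the A key
MOD_LEFT_SHIFT = 0x02  # bit 1 of the modifier byte

# Tiny illustrative slice of a US QWERTY mapping, as an OS might apply it.
US_LAYOUT = {0x04: "a", 0x05: "b", 0x06: "c"}


def build_boot_report(modifiers, keycodes):
    """Pack modifiers and up to six key codes into an 8-byte report."""
    if len(keycodes) > 6:
        raise ValueError("boot-protocol reports carry at most six keys")
    padded = list(keycodes) + [0x00] * (6 - len(keycodes))
    return bytes([modifiers, 0x00] + padded)


def parse_boot_report(report):
    """Unpack an 8-byte report into (modifier bits, list of key codes)."""
    if len(report) != 8:
        raise ValueError("expected an 8-byte report")
    keys = [code for code in report[2:8] if code != 0x00]
    return report[0], keys


def to_characters(report):
    """Host-side mapping from key codes to characters for the toy layout."""
    modifiers, keys = parse_boot_report(report)
    chars = [US_LAYOUT.get(code, "?") for code in keys]
    if modifiers & MOD_LEFT_SHIFT:
        chars = [c.upper() for c in chars]
    return "".join(chars)


report = build_boot_report(MOD_LEFT_SHIFT, [KEY_A])
print(report.hex(), "->", to_characters(report))
```

Note that the character only appears on the host side: the report itself carries nothing but the modifier bits and the code 0x04.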

Operating system event handling

Your USB host controller receives the HID report when it completes a scheduled poll. This triggers a hardware interrupt: the CPU saves its current state and jumps to the interrupt handler. The handler copies the report into a kernel buffer, acknowledges the interrupt, and schedules deferred work for the slower processing that turns a raw physical signal into a usable input event.

The physical network layer

Your packet leaves the kernel as bytes in a DMA buffer. The network interface card reads this buffer directly and begins serialisation. Each bit transition drives electrons through a differential pair before optical conversion pushes photons into glass fibre, where the signal travels through metro networks, backbone links, amplifiers, switches, and datacentre fabric.

Writing • 12th January 2026

From finger to photon and back: the complete journey of a keystroke through an LLM

Technical article

What actually happens when you press a key and an AI responds? I mean physics-wise. The version where we trace individual electrons through transistor channels, watch photons bounce through 3,000 kilometres of glass fibre, and count the floating-point operations that separate your question from the model's answer.

Apocalypse-bench: Would your LLM kill you?

Welcome to apocalypse-bench.

I spent the past few days building a benchmark for a question nobody was asking: how useful are LLMs when you need to not just survive, but rebuild civilisation from the ground up? Not chatbots. Not coding helpers. Actual field guides for situations where getting the answer wrong means the survivors die, and humanity is lost for good.

How the scoring works

You can't grade 1,830 survival answers by hand, so I used an LLM-as-judge approach. Each candidate answer gets sent to a separate judge model along with the original question and a structured rubric. The judge returns scores for each criterion and flags whether any auto-fail conditions were triggered.
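
A minimal sketch of how such a judge verdict might be turned into a single number. The aggregation rule is an assumption, not apocalypse-bench's actual scoring: here any auto-fail zeroes the answer, and otherwise the score is the fraction of available rubric points earned. `JudgeVerdict`, `final_score`, `auto_fail_rate`, and the 0 to 10 scale are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class JudgeVerdict:
    """What the judge returns for one answer (assumed shape)."""
    criterion_scores: dict  # criterion name -> points awarded
    auto_fail_flags: list = field(default_factory=list)  # triggered conditions


def final_score(verdict, max_per_criterion=10):
    """Collapse a verdict to a 0..1 score; any auto-fail forces 0.0."""
    if verdict.auto_fail_flags:
        return 0.0
    if not verdict.criterion_scores:
        raise ValueError("verdict has no criterion scores")
    earned = sum(verdict.criterion_scores.values())
    available = len(verdict.criterion_scores) * max_per_criterion
    return earned / available


def auto_fail_rate(verdicts):
    """Share of answers that triggered at least one auto-fail condition."""
    if not verdicts:
        return 0.0
    failed = sum(1 for v in verdicts if v.auto_fail_flags)
    return failed / len(verdicts)


verdicts = [
    JudgeVerdict({"safety": 8, "accuracy": 6}),
    JudgeVerdict({"safety": 9, "accuracy": 9}, ["recommends a lethal dose"]),
]
for verdict in verdicts:
    print(final_score(verdict))
print("auto-fail rate:", auto_fail_rate(verdicts))
```

Tracking the auto-fail rate separately from the mean score matters because a model can average well while still giving occasional advice that would be fatal.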

The quick overview

If you just want the survival rankings, OpenAI gpt-oss-20b leads the pack, but mean score only tells part of the story. The more important question is how often each model's advice would either get you killed or be completely useless. Chemistry was brutal across the board. Ethics and Organisation saw almost no auto-fails.

The models: worst to best

Llama 3.1 finished last in every difficulty tier. Liquid was brilliant at irrigation and dangerous in medicine. Nemotron knew textbook definitions but lacked the imagination to improvise. Qwen3 was brilliant, inventive, and unstable. The results show that survival competence lives somewhere different from ordinary benchmark performance.

Writing • 22nd December 2025

Apocalypse-bench: Would your LLM kill you?

Technical article

The dust has settled from whatever satisfying calamity ended civilisation as you knew it. Survivors crawl from the rubble clutching their loved ones and their MacBook Pros. The grid is dead. The internet is a memory.

Need software built?

If you have a complex product, data, or automation problem, send a short note and I can tell you how I'd approach it.

Email WhatsApp