
What I've made
Apps, open-source tools, research code, and technical writing.
Open Source
Developer tools, libraries, and experiments shipped in public.


Open Source • Code Search
SIFS
SIFS Is Fast Search
A fast hybrid code search tool for agents, available as a CLI, Rust crate, and local MCP server.

Open Source • macOS CLI
clipmem
Local clipboard memory
A local-only macOS clipboard history backed by SQLite with a JSON-first CLI for agents and humans.
Open Source • LLM Benchmark
apocalypse-bench
LLM survival-task benchmark
A benchmark runner with structured judge rubrics and a Next.js dashboard for exploring LLM survival-task runs.

Open Source • React Native UI
react-native-dotgrid
Animated dot-matrix display
An animated dot-matrix display component for React Native with Skia-first rendering and UI-thread animations.
Apps
Native apps for focused workflows and everyday use.
Apps
BrainFlow
AI voice notes • iOS / Android
Voice notes that become structured summaries, tasks, and tags — designed for long brain-dump recordings.
Apps
Gravity Notes
Minimal notes • iOS / Android
A simple notes app based on the append & rescue method — fast capture, fast search, no distractions. macOS coming soon.
Research
Scientific computing and machine-learning work from imaging research.

Research • ML Segmentation
ParticleSegmentation
Classical ML segmentation
Classical ML pipelines for particle segmentation in synchrotron X-ray tomography datasets.

Research • Deep Learning
ScrambledSeg
Deep learning segmentation
A deep-learning segmentation model trained on scrambled synchrotron imaging data for improved generalisation.
Writing
Technical essays and benchmark writeups.
Optimising scientific software with large language models
AI can do stuff now
Frontier large language models are now capable of performing impressive and creative feats of intellect. The amount of time models can work for, along with their general tenacity, has increased dramatically over the past several years.
Autoresearch
Andrej Karpathy's autoresearch method gives an agent a program to optimise, tooling, and a test. With minor prompt engineering, the agent can make edits, run experiments, and accumulate improvements over time.
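The loop described above can be sketched in a few lines. This is a toy illustration only, not Karpathy's code or the setup used in the article: `propose_edit` is a hypothetical stand-in for the agent (here just a random nudge), and `run_experiment` is a hypothetical stand-in for the test (here a simple quadratic error where lower is better).

```python
import random

def run_experiment(params):
    """Hypothetical test: returns an error score, lower is better."""
    return sum((p - 3.0) ** 2 for p in params)

def propose_edit(params, rng):
    """Hypothetical stand-in for the agent: nudge one parameter at random."""
    candidate = list(params)
    i = rng.randrange(len(candidate))
    candidate[i] += rng.uniform(-1.0, 1.0)
    return candidate

def autoresearch(params, n_experiments, seed=0):
    """Try edits one at a time and keep only those that lower the error."""
    rng = random.Random(seed)
    best, best_score = list(params), run_experiment(params)
    kept = 0
    for _ in range(n_experiments):
        candidate = propose_edit(best, rng)
        score = run_experiment(candidate)
        if score < best_score:  # accumulate improvements, discard the rest
            best, best_score = candidate, score
            kept += 1
    return best, best_score, kept

best, best_score, kept = autoresearch([0.0, 0.0], n_experiments=200)
```

Most proposals are discarded; only the ones that pass the test with a better score become the new baseline for the next experiment.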
TomoJAX
TomoJAX uses JAX to reconstruct and align tomography data by differentiating a loss between measured and predicted projections with respect to geometric parameters for each view.
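A one-parameter toy version of that idea, as a sketch under stated assumptions: a single view, a Gaussian blob as the "projection", and one geometric parameter (a sideways shift). TomoJAX obtains gradients through JAX's automatic differentiation; this example swaps in a hand-derived gradient so it runs in plain Python. All names here are hypothetical and none come from TomoJAX.

```python
import math

SIGMA = 1.0
XS = [-5.0 + 0.1 * i for i in range(101)]  # hypothetical detector pixel positions

def predict(shift):
    """Predicted projection: a Gaussian blob displaced by `shift`."""
    return [math.exp(-((x - shift) ** 2) / (2 * SIGMA ** 2)) for x in XS]

def loss_and_grad(shift, measured):
    """Squared error between prediction and measurement, and d(loss)/d(shift)."""
    pred = predict(shift)
    loss = sum((p - m) ** 2 for p, m in zip(pred, measured))
    # d/d(shift) of exp(-(x - shift)^2 / (2 sigma^2)) is pred * (x - shift) / sigma^2
    grad = sum(2 * (p - m) * p * (x - shift) / SIGMA ** 2
               for p, m, x in zip(pred, measured, XS))
    return loss, grad

def align(measured, shift=0.0, lr=0.05, steps=200):
    """Plain gradient descent on the shift parameter."""
    for _ in range(steps):
        _, grad = loss_and_grad(shift, measured)
        shift -= lr * grad
    return shift

measured = predict(0.5)      # pretend the true misalignment is 0.5
recovered = align(measured)  # start from 0.0 and descend the loss
```

The same pattern extends to many views and several parameters per view, which is where automatic differentiation earns its keep.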
Auto-tomo-research-JAX
Using GPT-5.4, Codex CLI, and Modal GPU runs, the agent explored 213 experiments and found 33 tweaks that lowered reconstruction error, including several substantial improvements.
Writing • 26th March 2026
Technical article
How I used GPT-5.4, Codex CLI, and an autoresearch agent to improve TomoJAX, a JAX-based tomography tool for alignment and reconstruction.
From finger to photon and back: the complete journey of a keystroke through an LLM
Part I: The mechanical-to-electrical boundary
Your finger descends over approximately 10 milliseconds, an eternity in computer time, but constrained by the biomechanics of muscle contraction and the travel distance of a typical key. During this press, you're closing a circuit in a keyboard matrix. Most keyboards don't give each key its own dedicated wire. Instead, keys sit at intersections of a row-column matrix while the keyboard controller rapidly scans columns and reads rows to detect which intersections are closed.
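The scanning step can be sketched as follows. This is a simplified simulation, not real firmware: `read_rows` is a hypothetical hardware hook that drives one column and reports which rows read as closed, and debouncing is left out.

```python
def scan_matrix(read_rows, n_cols):
    """Return the set of (row, col) intersections that are currently closed.

    read_rows(col) is a hypothetical hook: it drives one column and
    returns a list of booleans, one per row.
    """
    pressed = set()
    for col in range(n_cols):
        for row, closed in enumerate(read_rows(col)):
            if closed:
                pressed.add((row, col))
    return pressed

# Simulated 4x4 matrix with two keys held down.
held = {(1, 2), (3, 0)}

def fake_read_rows(col, n_rows=4):
    return [(row, col) in held for row in range(n_rows)]

pressed = scan_matrix(fake_read_rows, n_cols=4)
```

A 4x4 matrix needs 8 wires for 16 keys instead of 16 dedicated lines, which is the saving the matrix layout buys.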
The USB HID report
Once the firmware confirms a genuine keypress, it constructs a Human Interface Device report. The keyboard doesn't send ASCII characters. It sends scan codes: position-based identifiers that say, in effect, "the key at position 0x04", which happens to be A on a US QWERTY layout. Your operating system handles the mapping from scan codes to characters.
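As a rough sketch, here is the 8-byte boot-protocol keyboard report: one modifier byte, one reserved byte, and up to six key codes. The HID specification calls these codes usage IDs, and the letters A to Z occupy 0x04 to 0x1D. The function names are hypothetical, and the decoder only covers letters on a US layout.

```python
LEFT_SHIFT = 0x02  # bit 1 of the modifier byte

def usage_id_for_letter(letter):
    """Usage IDs for letters run from 0x04 (A) to 0x1D (Z)."""
    offset = ord(letter.lower()) - ord("a")
    if not 0 <= offset < 26:
        raise ValueError("this sketch only supports the letters a-z")
    return 0x04 + offset

def build_report(keys, modifiers=0):
    """Build the report the keyboard would send: no characters, only codes."""
    if len(keys) > 6:
        raise ValueError("a boot-protocol report carries at most six keys")
    codes = [usage_id_for_letter(k) for k in keys]
    codes += [0] * (6 - len(codes))  # unused slots are zero
    return bytes([modifiers, 0] + codes)

def decode_us_qwerty(report):
    """Host-side mapping from codes to characters for a US layout."""
    shifted = bool(report[0] & LEFT_SHIFT)
    letters = []
    for code in report[2:8]:
        if 0x04 <= code <= 0x1D:
            ch = chr(ord("a") + code - 0x04)
            letters.append(ch.upper() if shifted else ch)
    return letters

report = build_report(["a"], modifiers=LEFT_SHIFT)
```

The same report decoded under a different layout table would yield a different character, which is why the mapping lives in the operating system and not in the keyboard.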
Operating system event handling
Your USB host controller receives the HID report when it completes a scheduled poll. This triggers a hardware interrupt to the CPU, which saves its current state and jumps to the interrupt handler. The handler copies the report into a kernel buffer, acknowledges the interrupt, and schedules deferred work for the slower processing that turns the raw physical signal into a usable input event.
The physical network layer
Your packet leaves the kernel as bytes in a DMA buffer. The network interface card reads this buffer directly and begins serialization. Each bit transition drives electrons through a differential pair before optical conversion pushes photons into glass fiber, where the signal travels through metro networks, backbone links, amplifiers, switches, and datacenter fabric.
Writing • 12th January 2026
Technical article
What actually happens when you press a key and an AI responds? I mean physics-wise. The version where we trace individual electrons through transistor channels, watch photons bounce through 3,000 kilometres of glass fibre, and count the floating-point operations that separate your question from the model's answer.
Apocalypse-bench: Would your LLM kill you?
Welcome to apocalypse-bench.
I spent the past few days building a benchmark for a question nobody was asking: how useful are LLMs when you need to not just survive, but rebuild civilisation from the ground up? Not chatbots. Not coding helpers. Actual field guides for situations where getting the answer wrong means the survivors die, and humanity is lost for good.
How the scoring works
You can't grade 1,830 survival answers by hand, so I used an LLM-as-judge approach. Each candidate answer gets sent to a separate judge model along with the original question and a structured rubric. The judge returns scores for each criterion and flags whether any auto-fail conditions were triggered.
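A minimal sketch of how a judge's output might be aggregated. The structure, the criterion names, and the rule that any auto-fail zeroes the answer are assumptions made for illustration; they are not taken from apocalypse-bench's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class JudgeResult:
    """Hypothetical shape of what the judge model returns for one answer."""
    scores: dict                                   # criterion name -> score
    auto_fail_flags: list = field(default_factory=list)

def final_score(result):
    """Assumed rule: any triggered auto-fail zeroes the answer,
    otherwise take the mean of the per-criterion scores."""
    if result.auto_fail_flags or not result.scores:
        return 0.0
    return sum(result.scores.values()) / len(result.scores)

def auto_fail_rate(results):
    """Fraction of answers that triggered at least one auto-fail."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.auto_fail_flags) / len(results)

results = [
    JudgeResult({"accuracy": 8, "safety": 6}),
    JudgeResult({"accuracy": 9, "safety": 2}, ["hypothetical: unsafe dosage"]),
]
mean_score = sum(final_score(r) for r in results) / len(results)
```

Reporting the auto-fail rate alongside the mean keeps a model with high average scores and occasional lethal advice from looking safer than it is.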
The quick overview
If you just want the survival rankings, OpenAI gpt-oss-20b leads the pack, but mean score only tells part of the story. The more important question is how often each model's advice would either get you killed or be completely useless. Chemistry was brutal across the board. Ethics and Organisation saw almost no auto-fails.
The models: worst to best
Llama 3.1 finished last in every difficulty tier. Liquid was brilliant at irrigation and dangerous in medicine. Nemotron knew textbook definitions but lacked the imagination to improvise. Qwen3 was brilliant, inventive, and unstable. The results show that survival competence lives somewhere different from ordinary benchmark performance.
Writing • 22nd December 2025
Technical article
The dust has settled from whatever satisfying calamity ended civilisation as you knew it. Survivors crawl from the rubble clutching their loved ones and their MacBook Pros. The grid is dead. The internet is a memory.
Need software built?
If you have a complex product, data, or automation problem, send a short note and I can tell you how I'd approach it.