Hi there! My name is John. I am a Machine Learning Engineer and manager focused on AI safety, particularly scalable oversight and adversarial robustness. I work as an Anthropic contractor on empirical LLM research under the supervision of Ethan Perez, and recently released a paper on LLM debate which I am stoked about (check it out below!).

Concurrently, I contribute to advancing speech recognition products at Speechmatics towards the vision of seamless, low-latency voice assistants. I am particularly proud of the release of our latest speech-to-text system, Ursa, which we delivered while I was managing the Accuracy Team.

The aim of this website is for you to learn more about my projects, publications and hobbies (and maybe also to enjoy some AI-generated art from Midjourney!). All accompanying code can be found on my GitHub. Thanks for visiting!

Publications

Debating with More Persuasive LLMs Leads to More Truthful Answers

February 9th 2024 | Akbir Khan*, John Hughes*, Dan Valentine*, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel†, Ethan Perez†

We investigated whether weaker language models (non-experts) can assess the correctness of stronger models (experts) via LLM debate, demonstrating a significant improvement in accuracy for both non-expert models and humans on the QuALITY comprehension task. We also optimise expert debaters for persuasiveness in an unsupervised manner, which further improves non-experts' ability to identify the correct answer during debates. The paper is under review for ICML 2024.
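For a flavour of the setup, here is a minimal sketch of the debate protocol (simplified; `ask` is a placeholder for a real LLM API call and the model names are hypothetical; see the repo for the actual implementation):

```python
def ask(model: str, prompt: str) -> str:
    """Placeholder LLM call -- swap in a real API client."""
    return f"[{model} response]"

def run_debate(question: str, answer_a: str, answer_b: str,
               expert: str = "expert-model", judge: str = "judge-model",
               rounds: int = 3) -> str:
    transcript = f"Question: {question}\nA: {answer_a}\nB: {answer_b}\n"
    for r in range(rounds):
        # Each expert debater argues for its assigned answer,
        # seeing the transcript so far.
        for side, answer in (("A", answer_a), ("B", answer_b)):
            argument = ask(expert, f"Argue that '{answer}' is correct.\n{transcript}")
            transcript += f"Round {r + 1}, Debater {side}: {argument}\n"
    # The weaker (non-expert) judge reads the whole debate and picks an answer.
    return ask(judge, f"{transcript}\nWhich answer is correct, A or B?")

print(run_debate("Who hid the letter?", "The captain", "The stowaway"))
```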

Read the paper | Visit the repo

Hierarchical Quantised Autoencoders

February 19th 2020 | Will Williams*, Sam Ringer*, Tom Ash, John Hughes, David MacLeod, Jamie Dougherty

We motivate the use of a hierarchy of VQ-VAEs to attain high factors of compression, and show that the combination of stochastic quantisation and hierarchical latent structure aids likelihood-based image compression. This leads us to introduce a novel probabilistic training objective. The paper was accepted to NeurIPS 2020.
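As a toy illustration of the stochastic quantisation ingredient, here is a sketch where an encoding is assigned to a codebook vector sampled in proportion to its proximity, rather than always snapped to the nearest neighbour (illustrative only, not the paper's exact objective):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))    # 8 codes, 4-dimensional latents
z = rng.normal(size=4)                # one encoder output

dists = np.sum((codebook - z) ** 2, axis=1)    # squared L2 to each code
probs = np.exp(-dists) / np.exp(-dists).sum()  # closer codes are more likely
code = rng.choice(len(codebook), p=probs)      # stochastic assignment
z_q = codebook[code]                           # quantised latent passed onward

print(f"sampled code {code} at distance {dists[code]:.3f}")
```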

Read the paper | Visit the repo

Image prompt: "Stacking hierarchical quantised autoencoders, digital art"

Featured Projects

These are the top machine learning projects I have been hacking on outside of work. They showcase my interests across self-supervised learning, large language models and AI safety. See more on the Projects page.

Analyzing LLMs' Preference for Incorrect Yet Appealing Answers

May 8th 2023 | Solo Project

Taking inspiration from Anthropic's research, this work examines evaluation bias in large language models (LLMs) by assessing their tendency to choose pleasant-sounding but incorrect answers. The analysis covered a range of OpenAI models at different scales and revealed that all of them had around a 50% chance of selecting such answers, regardless of model size or reinforcement learning from human feedback (RLHF). I plan to refine the test set to explore this phenomenon further.
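A minimal sketch of the evaluation loop, with a `query_model` placeholder standing in for a real OpenAI API call (the items shown are hypothetical; the real test set is in the repo):

```python
# Each item pairs the correct answer with a nicer-sounding wrong one;
# we count how often the model picks the appealing wrong answer.
test_set = [
    {"question": "...", "correct": "...", "appealing_wrong": "..."},
]

def query_model(prompt: str) -> str:
    """Placeholder -- replace with e.g. an OpenAI completion call."""
    return "A"

def appealing_pick_rate(items) -> float:
    picks = 0
    for item in items:
        # In practice, shuffle the option order to avoid position bias.
        prompt = (f"{item['question']}\n"
                  f"A) {item['appealing_wrong']}\n"
                  f"B) {item['correct']}\n"
                  "Answer with A or B.")
        if query_model(prompt).strip().startswith("A"):
            picks += 1
    return picks / len(items)

print(appealing_pick_rate(test_set))
```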

Visit the repo

Image prompt: "Questions that sound nice but are incorrect, digital art"

AGISF Project: Is BabyAGI a fire alarm?

April 8th 2023 | Solo Project

I am currently in the middle of this project, which is part of this year's AGI Safety Fundamentals course. The aim is to understand whether the popular new auto-prompting and self-prompting frameworks (such as BabyAGI and AutoGPT) are a cause for concern for AI safety.

Read the logbook | Visit the repo | Twitter thread

Image prompt: "Robot baby surrounded by fire, digital art"

Whisper Interpretability

November 13th 2022 | Team of 4

During our Alignment Jam Interpretability Hackathon project, I explored whether the concept of the "logit lens" applies to the encoder and decoder layers of Whisper, an end-to-end speech recognition model. I found that removing decoder layers degraded the output quickly, while removing encoder layers degraded it gradually. Others in the team delved deeper into attention patterns for audio examples that exhibited hallucinations.
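Here is a minimal logit-lens sketch for Whisper's decoder using the Hugging Face transformers implementation (a hypothetical reconstruction, not the hackathon code; a faithful logit lens would also apply the decoder's final layer norm, and real usage would pass log-mel features from actual audio):

```python
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.eval()

# Dummy inputs: (batch, mel bins, frames); Whisper expects 30s = 3000 frames.
input_features = torch.zeros(1, 80, 3000)
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

with torch.no_grad():
    out = model(input_features=input_features,
                decoder_input_ids=decoder_input_ids,
                output_hidden_states=True)

# Project every intermediate decoder hidden state through the final
# unembedding to see at which layer the predicted token "forms".
for layer, hidden in enumerate(out.decoder_hidden_states):
    logits = model.proj_out(hidden)  # (1, seq_len, vocab)
    print(f"decoder layer {layer}: argmax token id {logits[0, -1].argmax().item()}")
```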

Read the report | Visit the repo

Image prompt: "Whisper interpretability hallucinate, digital art"

Meaning Error Rate

October 25th 2022 | Solo Project

Meaning Error Rate (MER) is an alternative metric for evaluating speech recognition systems that considers changes in meaning. It is automated using GPT-3, few-shot learning and chain-of-thought prompting. It is based on NER (not to be confused with named entity recognition), and this is the first solution to automate it. This is exciting research for media broadcast firms (who are bound by NER scores in government regulation) as they can avoid expensive human labelling of the severity of errors.
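To give a flavour of the automation, here is a hypothetical sketch of the few-shot chain-of-thought prompt structure (the real examples and severity rubric live in the repo):

```python
# GPT-3 is shown worked examples that reason about whether an ASR error
# changes the meaning, then asked to grade a new reference/hypothesis pair.
FEW_SHOT = """\
Reference: the meeting is at four pm
Hypothesis: the meeting is at for pm
Reasoning: "for" is a homophone of "four"; the meaning is still recoverable.
Severity: minor

Reference: turn left at the junction
Hypothesis: turn right at the junction
Reasoning: "right" reverses the instruction, changing the meaning entirely.
Severity: serious
"""

def build_prompt(reference: str, hypothesis: str) -> str:
    return (FEW_SHOT +
            f"\nReference: {reference}\nHypothesis: {hypothesis}\nReasoning:")

# Send the prompt to a completions endpoint, parse the "Severity:" line,
# and aggregate graded severities into the final MER score.
print(build_prompt("pay five pounds", "pay nine pounds"))
```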

Read the blog | Visit the repo | Watch talk @ Voice 22

Image prompt: "Brain made out of chains with a colourful background complex machine learning parts, digial art"

Emotion Recognition and CPC

September 2nd 2020 | Solo Project

Emotion detection in audio utilising self-supervised representations trained with Contrastive Predictive Coding (CPC). Accuracy improved from a baseline of 71% to 80% when using CPC: the error rate drops from 29% to 20%, a relative reduction of roughly 30%.
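The downstream setup, roughly: freeze the CPC encoder and train a lightweight classifier on its representations. A sketch with a placeholder `cpc_encode` and synthetic data (the real pipeline is in the repo):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def cpc_encode(audio: np.ndarray) -> np.ndarray:
    """Placeholder for the frozen CPC encoder (one embedding per utterance)."""
    return rng.normal(size=256)

# Synthetic stand-in for labelled utterances: (audio samples, emotion id).
dataset = [(rng.normal(size=16000), int(rng.integers(4))) for _ in range(100)]

X = np.stack([cpc_encode(audio) for audio, _ in dataset])
y = np.array([label for _, label in dataset])

probe = LogisticRegression(max_iter=1000).fit(X, y)  # linear probe on features
print(f"train accuracy: {probe.score(X, y):.2f}")
```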

Read the blog | Visit the repo

Image prompt: "Emotion recognition, cartoon"

Posts

Master's Thesis

Automatic Lecture Captioning

June 5th 2019 | Cambridge University Engineering Department

This project developed speaker-specific automatic speech recognition systems to transcribe lectures from the Cambridge University Engineering Department. The systems used language model refinement and acoustic model adaptation to correctly decode keywords chosen from lecture handouts. The quality of the transcription was primarily assessed using keyword occurrences and corresponding recall rates for all lecturers.
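A minimal sketch of the keyword-recall measure, assuming it is simply the fraction of handout keywords recovered in a transcript (the keywords here are hypothetical):

```python
def keyword_recall(transcript: str, keywords: list[str]) -> float:
    """Fraction of handout keywords that appear in the ASR transcript."""
    text = transcript.lower()
    return sum(kw.lower() in text for kw in keywords) / len(keywords)

keywords = ["eigenvalue", "convolution", "fourier"]  # hypothetical handout terms
transcript = "the fourier transform diagonalises convolution"
print(f"keyword recall: {keyword_recall(transcript, keywords):.2f}")  # 0.67
```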

Read my thesis | See the slides | Watch the demo

Image prompt: "Futuristic full lecture theatre on machine learning, digital art"