AI Safety Fundamentals

I had the opportunity to participate in the AGI Safety Fundamentals Course, where I gained valuable insight into the complexities and importance of artificial general intelligence (AGI) safety. The course, which started in February 2023 and ran for 12 weeks, gave me a solid foundation in AGI safety research and a better understanding of the AI alignment problem. Each week involved readings and discussions covering a wide range of topics, from the motivations and arguments underpinning the field of AGI safety to proposed technical solutions.


Here are my notes from the course content. They include concepts from the pre-readings, some of my own thoughts, and parts of the weekly discussions.

Week 1 - Artificial General Intelligence

Week 2 - Reward misspecification and instrumental convergence

Week 3 - Goal misgeneralisation

Week 4 - Task decomposition for scalable oversight

Week 5 - Adversarial techniques for scalable oversight

Week 6 - Interpretability

Week 7 - Governance

Week 8 - Careers and Projects


Capstone Project: Is BabyAGI a fire alarm?