AI Safety Fundamentals
I had the opportunity to participate in the AGI Safety Fundamentals Course, where I gained valuable insights into the complexities and importance of artificial general intelligence (AGI) safety. This 12-week course, which started in February 2023, gave me a strong foundation in AGI safety research and a better understanding of the AI alignment problem. Throughout the course, I took part in weekly readings and discussions covering a wide range of topics, from the motivations and arguments underpinning the field of AGI safety to the proposed technical solutions.
Here are my notes on the course content. They include concepts from the pre-readings, some of my own thoughts, and parts of the discussions.
Week 1 - Artificial General Intelligence
Week 2 - Reward misspecification and instrumental convergence
Week 3 - Goal misgeneralisation
Week 4 - Task decomposition for scalable oversight
Week 5 - Adversarial techniques for scalable oversight
Week 6 - Interpretability
Week 7 - Governance
Week 8 - Careers and Projects