Axiom Futures Week 3 Notes
AI Safety Remastered video
what’s the most important problem in your field? Why are you not working on that? Problem of AI safety:
Sooner or later, build an artificial agent with General Intelligence.
Agent:
- Has goals
- Performs actions to acheive those goals.
Intelligence: blackboxed thing that lets agent choose effective action to acheive goals. General Intelligence: ability to behave independently in a wide range of domains. Humans are most general intelligence agents we know.
Why is it a Problem?
It is difficult to choose goals. e.g. Viktoria Kraknova Deep safety.
- tetris bot pausing screen due to RLHF
This is the default of how these agents behave.
System optimizing n variables where objective depends on subset k will probably set unused variables to extreme values which will be a problem if we care about the other parts of the system. System is ready to take arbitrarily large drawbacks on some variables for arbitrarily small payoffs on others.
Convergent instrumental goal: No matter what your goal is, these behaviours will help you.
- Self preservation
- Goal preservation
- Resource Acquisition
- Self Improvement
Global Catastrophic risk Wikipedia Article
There is a difference between Catastrophic risk and existential risk. Catastrophic risk - significant damage but recoverable. Existential risk - permanently alters the way of living for a significant portion of the world’s population. There is no incentive for any country to invest in prevention because its a global public good. Cognitive biases like hyperbolic discounting further accentuate this problem.
More is different Article
AGI will be qualitatively different from the current technologies. Philosohpy camps looks at limiting thought experiments whereas engineering camp seeks to point out current problems in architectures that may be accentuated. Philosophy perspective fails to see that experimental results generalize well i.e. what we learn about problems in present models can help even the qualitatively different models. Engineering perspective fails to account for the qualitatively different nature of the AI capabilities we can develop.
Overview of Catastrophic risks paper.
- Bioterrorism
- AI can help extremist groups create bioweapons and surpass security checks to releasing biological weapons into the world which they would not have been able to do otherwise Not exactly convinced of this argument: Stuff like government classified data would be closed source. Creating such weapons would require quite sophisticated equipment still. With a little bit more robust biosecurity this seems the most preventable.
- Persuasive AI
- AI that generates misinformation for each echo chamber.
- Polarizes and fractures society until it’s “post truth”
- Usable by totalitarian actors etc. for censorship and all. Fact checking etc. becomes almost impossible. Fake video detection, Fake phone calls.
Pretty significant problem as its already rearing its head.
- It is also a technology adoption problem because a significant portion of the population needs to be educated about the way they engage with AI.
2.5 Suggestions
- Technical research on adversarially robust anomaly detection.
- Restricted Access.
- More biosecurity
-
Legal Liability for creators of general purpose AIs - purchasing insurance makes the cost of company products more realistically represent externalities.
- Military Arms Race
- Lethal armed weapons e.g. drone strikes this is already starting to be seen. It is hard to embed morality in a system optimized for winning a war.
- Cyberwarfare
-
Corporate Arms Race
- Proxy Gaming/Goodharting
- Goal Drift
- Power Seeking: Instrumental Goal
Excercise
Story: Persuasive AI As election season approaches people are fed their own personal propaganda. Opposition party leaders videos are faked and misinformation is used to incite violence along some fault lines. Social Media and corporations are incentivised towards polarizing fake content. This destabilizes a country which creates a negative feedback loop. Though this would not be a cause for existential risk in itself, this phenomenon would not be seen in only one country. It would be widespread as people seeking power would do it in anyway possible. This will have huge implications for geopolitics as well e.g. fabricating historical documents or information and has the potential to erode the function of society as a whole as rational conversation to address some underlying truth is essential for us to remain grounded in reality.
Excercise Partnero
corporate/government arms race - incentive to not Intrinsic Power seeking instrumental power seeking blackmailing to prevent switching off
Threat models
risk you are looking at Ways it manifests How to mitigate it
- Power Seeking
Adversarial: Novel ways of blackmailing. Remove the off switch.
-
Persuasive AIs Best Defense: Better conversations, more epistemic humility, ensuring critical thinking. Adversarial: Brain numbed with content
-
Bioterrorism Best Defense: Generally available LLM shouldn’t be able to train on such data. Adversarial: Data poisoning and jail breaking.
Adversarial Robustness
Metaethics: Long Termism is kinda fallacious because infinite utility assigned to some event in the future.
Understood the scoping of the AI safety problem better Need to learn more about ethical foundations of long termism Structuring better incentives for corporations is an important way to approach this problem.
- Discussion of X risk scenarios and reading more about them was interesting.
- Proxy gaming is something new I learnt.
- I need to learn much more about the ethical foundations of these systems and corporation incentives.