Axiom Futures Week 3 Notes

AI Safety Remastered video

what’s the most important problem in your field? Why are you not working on that? Problem of AI safety:

Sooner or later, build an artificial agent with General Intelligence.

Agent:

Has goals
Performs actions to acheive those goals.

Intelligence: blackboxed thing that lets agent choose effective action to acheive goals. General Intelligence: ability to behave independently in a wide range of domains. Humans are most general intelligence agents we know.

Why is it a Problem?

It is difficult to choose goals. e.g. Viktoria Kraknova Deep safety.

tetris bot pausing screen due to RLHF This is the default of how these agents behave. System optimizing n variables where objective depends on subset k will probably set unused variables to extreme values which will be a problem if we care about the other parts of the system. System is ready to take arbitrarily large drawbacks on some variables for arbitrarily small payoffs on others. Convergent instrumental goal: No matter what your goal is, these behaviours will help you.
1. Self preservation
2. Goal preservation
3. Resource Acquisition
4. Self Improvement

Global Catastrophic risk Wikipedia Article

There is a difference between Catastrophic risk and existential risk. Catastrophic risk - significant damage but recoverable. Existential risk - permanently alters the way of living for a significant portion of the world’s population. There is no incentive for any country to invest in prevention because its a global public good. Cognitive biases like hyperbolic discounting further accentuate this problem.

Overview of Catastrophic risks paper.

Bioterrorism
- AI can help extremist groups create bioweapons and surpass security checks to releasing biological weapons into the world which they would not have been able to do otherwise Not exactly convinced of this argument: Stuff like government classified data would be closed source. Creating such weapons would require quite sophisticated equipment still. With a little bit more robust biosecurity this seems the most preventable.
Persuasive AI
- AI that generates misinformation for each echo chamber.
- Polarizes and fractures society until it’s “post truth”
- Usable by totalitarian actors etc. for censorship and all. Fact checking etc. becomes almost impossible. Fake video detection, Fake phone calls.
Pretty significant problem as its already rearing its head.
- It is also a technology adoption problem because a significant portion of the population needs to be educated about the way they engage with AI.

2.5 Suggestions

Technical research on adversarially robust anomaly detection.
Restricted Access.
More biosecurity
Legal Liability for creators of general purpose AIs - purchasing insurance makes the cost of company products more realistically represent externalities.
Military Arms Race
1. Lethal armed weapons e.g. drone strikes this is already starting to be seen. It is hard to embed morality in a system optimized for winning a war.
2. Cyberwarfare
Corporate Arms Race
Proxy Gaming/Goodharting
Goal Drift
Power Seeking: Instrumental Goal

Excercise

Story: Persuasive AI As election season approaches people are fed their own personal propaganda. Opposition party leaders videos are faked and misinformation is used to incite violence along some fault lines. Social Media and corporations are incentivised towards polarizing fake content. This destabilizes a country which creates a negative feedback loop. Though this would not be a cause for existential risk in itself, this phenomenon would not be seen in only one country. It would be widespread as people seeking power would do it in anyway possible. This will have huge implications for geopolitics as well e.g. fabricating historical documents or information and has the potential to erode the function of society as a whole as rational conversation to address some underlying truth is essential for us to remain grounded in reality.

Excercise Partnero

corporate/government arms race - incentive to not Intrinsic Power seeking instrumental power seeking blackmailing to prevent switching off

Threat models

risk you are looking at Ways it manifests How to mitigate it

Power Seeking

Adversarial: Novel ways of blackmailing. Remove the off switch.

Persuasive AIs Best Defense: Better conversations, more epistemic humility, ensuring critical thinking. Adversarial: Brain numbed with content
Bioterrorism Best Defense: Generally available LLM shouldn’t be able to train on such data. Adversarial: Data poisoning and jail breaking.

Adversarial Robustness

Metaethics: Long Termism is kinda fallacious because infinite utility assigned to some event in the future.

Understood the scoping of the AI safety problem better Need to learn more about ethical foundations of long termism Structuring better incentives for corporations is an important way to approach this problem.

Discussion of X risk scenarios and reading more about them was interesting.
Proxy gaming is something new I learnt.
I need to learn much more about the ethical foundations of these systems and corporation incentives.