---
layout: none
date: ""
---
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>Daniel Filan</title>
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/pygment_trac.css">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
{% include sidebar.html %}
<section>
<h2>
<a id="about-me" class="anchor" href="#about-me" aria-hidden="true"><span class="octicon octicon-link"></span></a>About me</h2>
<p>I'm currently a member of technical staff at <a href="https://metr.org/">METR</a>, where I help manage our efforts to assess risk of loss of control from frontier AI development.</p>
<!-- <p>I'm currently a senior research manager at <a href="https://www.matsprogram.org/">MATS</a>, where I chat with scholars and hopefully turn them into cool researchers - specifically, working on AI alignment, interpretability, and/or governance - and also manage research managers.</p> -->
<!-- <p>I'm currently a PhD candidate at UC Berkeley being supervised by <a href="https://people.eecs.berkeley.edu/~russell/">Stuart Russell</a>. I do ML research relevant to ensuring that future artificial intelligences who may be much smarter than us behave in a safe way. Discussions of problems in this area are available <a href="https://intelligence.org/technical-agenda/">here</a>, <a href="https://intelligence.org/files/AlignmentMachineLearning.pdf">here</a>, and <a href="http://arxiv.org/abs/1606.06565">here</a>.</p> -->
<p>I have a <a href="/posts">blog</a> where I write about topics of interest to me: as of the time I write this, there are posts about <a href="https://danielfilan.com/2022/03/10/prob_smart_londoner_dies_of_russian_nuke.html">forecasting</a>, <a href="https://danielfilan.com/2022/02/11/nice_representation_laplacian.html">math</a>, and <a href="https://danielfilan.com/2021/11/21/meta_puzzle.html">puzzles</a>.</p>
<p>I also have a <a href="https://axrp.net">podcast</a>: it's called <a href="https://axrp.net">AXRP</a>, which is short for the AI X-risk Research Podcast. You can listen to episodes on <a href="https://www.youtube.com/@axrpodcast">YouTube</a>, or by searching "AXRP" in your favourite podcast app. Alternatively, you can read transcripts <a href="https://axrp.net">here</a>.</p>
<p>In addition to AXRP, I have another podcast called <a href="https://thefilancabinet.com/">The Filan Cabinet</a>, where I talk to people about whatever I want. Episodes are available on <a href="https://www.youtube.com/@TheFilanCabinet">YouTube</a>, or wherever you listen to podcasts.</p>
<p>I'm interested in <a href="https://www.effectivealtruism.org/">effective altruism</a>: the project of using our limited resources to do the most good in the world. I also sometimes <a href="/bets">bet on things</a>, for reasons described by <a href="http://econlog.econlib.org/archives/2012/05/the_bettors_oat.html">Bryan Caplan</a> and <a href="http://econlog.econlib.org/archives/2014/07/kant_on_betting.html">Immanuel Kant</a>.</p>
<p>From mid-2024 to late 2025, I was a senior research manager at <a href="https://www.matsprogram.org/">MATS</a>, where I chatted with scholars and hopefully turned them into cool researchers - specifically, ones working on AI alignment, interpretability, and/or governance - and also managed other research managers.</p>
<p>I completed my PhD in AI at UC Berkeley in 2024, where I was supervised by <a href="https://people.eecs.berkeley.edu/~russell/">Stuart Russell</a>. You can read my thesis "Structure and Representation in Neural Networks" <a href="/pdfs/phd_thesis.pdf">here</a>.</p>
<p>I did my undergrad at the Australian National University, studying the theory of reinforcement learning, mathematics, and theoretical physics. I did my honours year (similar to a one-year research master's degree) under <a href="http://www.hutter1.net/">Marcus Hutter</a>; you can read my thesis "Resource-bounded Complexity-based Priors for Agents" <a href="/pdfs/thesis.pdf">here</a>.</p>
<h3>Papers (a bit out of date, sorry)</h3>
<p><a href="/danielfilan.bib">bibtex</a></p>
<ul>
<li><strong>Clusterability in Neural Networks.</strong> <a href="https://arxiv.org/abs/2103.03386">arxiv</a><br>
With Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, and Stuart Russell.<br>
Introduces the task of dividing the neurons of a network into groups such that edges between neurons in the same group have higher weight than edges between neurons in different groups. Implements this using graph clustering, so 'clusterability' refers to the divisibility of networks. Shows that in many conditions, networks trained with pruning and/or dropout are more clusterable than if their weights were randomly permuted. Also introduces a method of regularizing networks for clusterability.</li>
<li><strong>Exploring Hierarchy-Aware Inverse Reinforcement Learning.</strong> <a href="https://arxiv.org/abs/1807.05037">arxiv</a><br>
With Chris Cundy (lead author).<br>
Presented at GoalsRL 2018, held jointly at ICML, IJCAI, and AAMAS 2018.<br>
Advocates for the use of hierarchical planning models of humans for use in inverse reinforcement learning as more realistic for complex tasks, showing that in one task they perform comparably to state-of-the-art models.</li>
<li><strong>Modeling Agents with Probabilistic Programs.</strong> <a href="http://agentmodels.org">website</a><br>
With Owain Evans (lead author), Andreas Stuhlmüller, and John Salvatier.<br>
Web book.<br>
A web book explaining how to write models of agents in the <a href="http://webppl.org/">webppl</a> probabilistic programming language. Covers topics such as "planning as inference", (PO)MDPs, inverse reinforcement learning, hyperbolic discounting, myopic planning, and multi-agent planning.</li>
<li><strong>Self-Modification of Policy and Utility Function in Rational Agents.</strong> <a href="http://arxiv.org/abs/1605.03142">arxiv</a><br>
With Tom Everitt (lead author), Mayank Daswani, and Marcus Hutter.<br>
Presented at AGI 2016, winner of the Kurzweil prize for best paper.<br>
Discusses agents that can modify their source code and predict the result of these modifications, and how to define them so that they don't make modifications that stop them from optimising what we originally told them to optimise.</li>
<li><strong>Loss Bounds and Time Complexity for Speed Priors.</strong> <a href="http://www.jmlr.org/proceedings/papers/v51/filan16.html">jmlr</a><br>
With Jan Leike and Marcus Hutter.<br>
Presented at AISTATS 2016.<br>
A discussion of 'speed priors': priors over infinite sequences of bits that penalise complex strings, where complexity is measured both by the length of the programs that produce a string and by the time those programs take to run. Builds off Jürgen Schmidhuber's <a href="http://link.springer.com/chapter/10.1007/3-540-45435-7_15">original paper</a> defining his Speed Prior.</li>
<!-- <li><strong>Thesis: Resource-bounded Complexity-based Priors for Agents.</strong> <a href="/pdfs/undergrad_thesis.pdf">pdf</a><br/> -->
<!-- Supervised by Marcus Hutter in 2015.<br> -->
<!-- My honours thesis about speed priors, used both in sequence prediction and in an RL setting. The most interesting results are contained in the AISTATS paper above.</li> -->
</ul>
</section>
</div>
<script src="javascripts/scale.fix.js"></script>
</body>
</html>