diff --git "a/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-1.png" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-1.png" new file mode 100644 index 0000000..0977898 Binary files /dev/null and "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-1.png" differ diff --git "a/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-2.png" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-2.png" new file mode 100644 index 0000000..2a63bbf Binary files /dev/null and "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-2.png" differ diff --git "a/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-3.png" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-3.png" new file mode 100644 index 0000000..8ebc269 Binary files /dev/null and "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/3-3.png" differ diff --git "a/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/ex3-shopSche.ipynb" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/ex3-shopSche.ipynb" new file mode 100644 index 0000000..7f425f8 --- /dev/null +++ "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/ex3-shopSche.ipynb" @@ -0,0 +1,1085 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 练习3:流水车间调度器\n", + "-------\n", + ">本节练习节选自书籍《500 lines or less》——A Flow Shop Scheduler\n", + "\n", + "## 介绍\n", + "\n", + "在本练习中,我们将介绍流水车间调度问题。流水车间调度问题是查找最优解问题的一种,本练习基于局部搜索(local search)方法,实现流水车间调度器。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1 流水车间调度\n", + "流水车间调度问题是一种优化问题,其中我们必须确定作业中各种任务的处理时间,以便调度任务以最大程度地减少完成作业所需的总时间。\n", + "\n", + 
"例如,一家汽车制造商拥有一条装配线,其中汽车的每个零件都在不同的机器上依次完成。不同的订单可能有自定义要求,例如,使车身涂漆的任务因一辆汽车而异。\n", + "\n", + "在我们的示例中,每辆汽车都是新**作业**,每辆汽车的零件都称为**任务**。每个作业将具有相同的任务序列来完成。\n", + "\n", + "流水车间调度的目的是最大程度地减少处理从每个作业到完成所有任务的总时间。(通常,此总时间称为完成时间。)此问题有很多应用程序,但与优化生产设备最相关。\n", + "\n", + "每个流水车间都由$n$台机器和$m$个任务组成。在我们的示例中,将有$n$个汽车工作站,总共有$m$辆汽车。每个作业都恰好由$n$个任务组成。我们可以假设一个作业的第$i$个任务必须使用机器$i$,并且需要预定数量的处理时间:$p(i,j)$是第$j$个作业的第$i$个任务的处理时间。此外,给定作业的任务顺序必须遵循可用机器的顺序。对于给定的作业,任务$i$必须在任务$i+1$开始之前完成。在我们的示例中,我们不想在组装框架之前就开始为汽车刷漆。最终的限制就是**不允许在一台机器上同时处理两个任务**。\n", + "\n", + "因为作业的任务顺序是预先确定的,所以可以将流水车间调度问题的解决方法看作是作业的排列。对于每台机器,在一台机器上处理的作业顺序将是相同的,并且经过排列后,作业$j$中的机器$i$的任务计划为以下两种情况:\n", + "\n", + "- 作业$j-1$中机器$i$的任务完成(即同一机器上的最新任务)\n", + "- 作业$j$中机器$i-1$的任务完成(即同一任务上的最新任务)\n", + "\n", + "因为我们选择这两个值中的最大值,所以将创建机器i或作业j的空闲时间。我们最终希望最小化此空闲时间,因为它将使总有效期变大。\n", + "\n", + "由于问题的简单形式,作业的任何排列都是有效的解决方案,而最佳解决方案将对应于某些排列。因此,我们通过更改作业的排列并测量相应的有效期来寻求改进的解决方案。在下文中,我们将作业的排列称为候选。\n", + "\n", + "让我们考虑一个简单的示例,其中包含两个作业和两个计算机。第一项任务有任务A和B,分别需要1分钟和2分钟才能完成。第二项任务有任务C和D,分别需要2分钟和1分钟才能完成。回想一下,A必须先于B而C必须先于D。因为有两个工作,所以我们只考虑两个排列。如果在工作1之前订购工作2,则制造跨度为5(图1);\n", + "\n", + "![](3-1.png)\n", + "\n", + "另一方面,如果我们在作业2之前订购作业1,则制造跨度仅为4(图2)。\n", + "![](3-2.png)\n", + "\n", + "请注意,没有多余的空间可以提前完成任何任务。良好排列的指导原则是最大程度地减少任何机器无需处理的时间。\n", + "\n", + "### 1.1 局部搜索\n", + "\n", + "当最佳解决方案难以计算时,局部搜索是解决优化问题的策略。直观地讲,它从一个看起来不错的解决方案转变为另一个看起来更好的解决方案。我们没有考虑所有可能的解决方案作为下一个关注点,而是定义了所谓的社区:被认为与当前解决方案相似的解决方案集。因为作业的任何排列都是有效的解决方案,所以我们可以将任何将作业混洗的机制视为本地搜索过程(实际上,这是我们在下面所做的事情)。\n", + "\n", + "要正式使用本地搜索,我们必须回答以下几个问题:\n", + "\n", + "- 我们应该从什么解决方案开始?\n", + "- 给定一个解决方案,我们应该考虑哪些邻近解决方案?\n", + "- 给定候选邻居的集合,我们应该考虑移到下一个?\n", + "\n", + "以下三个部分依次解决了这些问题。\n", + "\n", + "## 2 通用求解器\n", + "在本节中,我们提供了流水车间调度程序的一般框架。首先,我们需要导入所需的python库文件和求解器的相关参数设置:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import sys, os, time, random\n", + "\n", + "from functools import partial\n", + "from collections import namedtuple\n", + "from itertools import product\n", + "\n", + 
"import neighbourhood as neigh\n", + "import heuristics as heur\n", + "\n", + "##############\n", + "## Settings ##\n", + "##############\n", + "\n", + "TIME_LIMIT = 300.0 # Time (in seconds) to run the solver\n", + "TIME_INCREMENT = 13.0 # Time (in seconds) in between heuristic measurements\n", + "DEBUG_SWITCH = False # Displays intermediate heuristic info when True\n", + "MAX_LNS_NEIGHBOURHOODS = 1000 # Maximum number of neighbours to explore in LNS" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "其中,`TIME_INCREMENT`是用作动态策略选择的一部分,`MAX_LNS_NEIGHBOURHOODS`是用作邻域选择的最大数量限制。接下来会有更加详细的描述。\n", + "\n", + "这些参数的设置也可以作为命令行参数显示给用户,此处我们改为将输入数据作为提供给程序。输入的问题(来自Taillard基准集)被假定为流水车间调度的标准格式。\n", + "\n", + "以下代码为`__main__`求解器文件的方法,根据输入到程序的参数数量来选择调用合适的函数:\n", + "\n", + "```python\n", + "if __name__ == '__main__':\n", + "\n", + " if len(sys.argv) == 2:\n", + " data = parse_problem(sys.argv[1], 0)\n", + " elif len(sys.argv) == 3:\n", + " data = parse_problem(sys.argv[1], int(sys.argv[2]))\n", + " else:\n", + " print(\"\\nUsage: python flow.py []\\n\")\n", + " sys.exit(0)\n", + "\n", + " (perm, ms) = solve(data)\n", + " print_solution(data, perm)\n", + "```\n", + "在上述代码中:\n", + "- `parse_problem`函数为Taillard基准集问题文件的解析函数\n", + "- `solve`函数为求解函数\n", + "- `print_problem`函数为最终结果的输出函数\n", + "\n", + "### 2.1 求解逻辑\n", + "我们先来看`solve`函数,该方法期望`data`变量是一个整数列表,其中包含每个作业的活动持续时间。其开始于初始化一组全局策略。关键是我们使用`start_*`来命名的变量,这些变量用来维护每个策略的统计数据,有助于在求解过程中动态选择策略。\n", + "\n", + "其中:\n", + "- `start_improvements`: 通过某策略可以改善解决方案的数量\n", + "- `start_time_spent`: 在某策略上花费的时间\n", + "- `start_weights`: 与某策略的好坏相对应的权重\n", + "- `start_usage`: 我们使用某策略的次数\n", + "\n", + "```python\n", + "def solve(data):\n", + " \"\"\"Solves an instance of the flow shop scheduling problem\"\"\"\n", + "\n", + " # We initialize the strategies here to avoid cyclic import issues\n", + " initialize_strategies()\n", + " global STRATEGIES\n", + "\n", + " strat_improvements = {strategy: 0 for strategy in STRATEGIES}\n", + " 
strat_time_spent = {strategy: 0 for strategy in STRATEGIES}\n", + " strat_weights = {strategy: 1 for strategy in STRATEGIES}\n", + " strat_usage = {strategy: 0 for strategy in STRATEGIES}\n", + "```\n", + "\n", + "流水车间调度问题的一个特征是:每个排列都是有效的解决方案,并且至少有一个将具有最佳的制造期。值得庆幸的是,当从一种排列转换到另一种排列时,这可以使我们放弃检查是否处于可行解的空间之内,因为所有一切都是可行的。\n", + "\n", + "但是,要在置换空间中开始局部搜索时,我们必须**具有初始置换**。为简单起见,我们选择**随机排列作业列表**来初始化局部搜索。\n", + "\n", + "```python\n", + " perm = range(len(data))\n", + " random.shuffle(perm)\n", + "```\n", + "\n", + "接下来,我们需要对变量进行初始化,以使我们能够跟踪到目前为止找到的最佳排列以及提供输出的时序信息。\n", + "\n", + "```python\n", + " # Keep track of the best solution\n", + " best_make = makespan(data, perm)\n", + " best_perm = perm\n", + " res = best_make\n", + "\n", + " # Maintain statistics and timing for the iterations\n", + " iteration = 0\n", + " time_limit = time.time() + TIME_LIMIT\n", + " time_last_switch = time.time()\n", + "\n", + " time_delta = TIME_LIMIT / 10\n", + " checkpoint = time.time() + time_delta\n", + " percent_complete = 10\n", + "\n", + " print(\"\\nSolving...\")\n", + "```\n", + "\n", + "由于这是局部搜索解决方案,因此只要没有达到时间限制,我们就可以继续尝试并改进解决方案。我们提供指示求解器进度的输出,并跟踪我们计算出的迭代次数:\n", + "```python\n", + " while time.time() < time_limit:\n", + "\n", + " if time.time() > checkpoint:\n", + " print \" %d %%\" % percent_complete\n", + " percent_complete += 10\n", + " checkpoint += time_delta\n", + "\n", + " iteration += 1\n", + "```\n", + "\n", + "下面我们描述如何选择策略,但是到目前为止,知道策略提供了两个功能库:\n", + "- 一个`neighbourhood`功能为我们提供了一组下一个要考虑的候选人\n", + "- 一个`heuristic`功能则从集合中选择了最佳候选人\n", + "\n", + "从这些函数中,我们得到一个新的排列(`perm`)和一个新的制造期结果(`res`):\n", + "```python\n", + " # Heuristically choose the best strategy\n", + " strategy = pick_strategy(STRATEGIES, strat_weights)\n", + "\n", + " old_val = res\n", + " old_time = time.time()\n", + "\n", + " # Use the current strategy's heuristic to pick the next permutation from\n", + " # the set of candidates generated by the strategy's neighbourhood\n", + " candidates = strategy.neighbourhood(data, 
perm)\n", + " perm = strategy.heuristic(data, candidates)\n", + " res = makespan(data, perm)\n", + "```\n", + "\n", + "计算工期的代码非常简单:我们可以通过评估最终作业的完成时间来从排列中进行计算。" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def makespan(data, perm):\n", + " \"\"\"Computes the makespan of the provided solution\"\"\"\n", + " return compile_solution(data, perm)[-1][-1] + data[perm[-1]][-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "我们将在后面看到`compile_solution`工作原理,但是现在足以知道已返回2D数组,并且坐标为[-1][-1]的元素与计划中最终作业的开始时间相对应。\n", + "\n", + "为了帮助选择策略,我们保留以下统计数据:\n", + "1. 该策略改善了解决方案的数量;\n", + "2. 该策略花费了多少时间计算信息;\n", + "3. 使用该策略的次数。如果偶然发现更好的解决方案,我们还将更新变量以获得最佳排列:\n", + "\n", + "```python\n", + " # Record the statistics on how the strategy did\n", + " strat_improvements[strategy] += res - old_val\n", + " strat_time_spent[strategy] += time.time() - old_time\n", + " strat_usage[strategy] += 1\n", + "\n", + " if res < best_make:\n", + " best_make = res\n", + " best_perm = perm[:]\n", + " \n", + "```\n", + "定期更新用于策略的统计信息。为了便于阅读,我们删除了相关代码段,并在下面详细说明了代码。\n", + "\n", + "作为最后一步,一旦`while`循环完成(即达到了时间限制),我们将输出有关求解过程的一些统计信息,并返回最佳排列及其`makepan`:\n", + "```python\n", + " print(\" %d %%\\n\" % percent_complete)\n", + " print(\"\\nWent through %d iterations.\" % iteration)\n", + "\n", + " print(\"\\n(usage) Strategy:\")\n", + " results = sorted([(strat_weights[STRATEGIES[i]], i)\n", + " for i in range(len(STRATEGIES))], reverse=True)\n", + " for (w, i) in results:\n", + " print(\"(%d) \\t%s\" % (strat_usage[STRATEGIES[i]], STRATEGIES[i].name))\n", + "\n", + " return (best_perm, best_make)\n", + "```\n", + "\n", + "最终的求解函数`solve`为:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "def solve(data):\n", + " \"\"\"Solves an instance of the flow shop scheduling problem\"\"\"\n", + "\n", + " #我们在这里初始化策略以避免周期性导入问题\n", + " initialize_strategies()\n", + " global 
STRATEGIES\n", + " \n", + " strat_improvements = {strategy: 0 for strategy in STRATEGIES}\n", + " strat_time_spent = {strategy: 0 for strategy in STRATEGIES}\n", + " strat_weights = {strategy: 1 for strategy in STRATEGIES}\n", + " strat_usage = {strategy: 0 for strategy in STRATEGIES}\n", + "\n", + " #随机开始\n", + " perm = list(range(len(list(data))))\n", + " random.shuffle(perm)\n", + "\n", + " #记录最优策略\n", + " best_make = makespan(data, perm)\n", + " best_perm = perm\n", + " res = best_make\n", + "\n", + " #维护迭代的统计信息和时序\n", + " iteration = 0\n", + " time_limit = time.time() + TIME_LIMIT\n", + " time_last_switch = time.time()\n", + "\n", + " time_delta = TIME_LIMIT / 10\n", + " checkpoint = time.time() + time_delta\n", + " percent_complete = 10\n", + "\n", + " print (\"\\nSolving...\")\n", + "\n", + " while time.time() < time_limit:\n", + "\n", + " if time.time() > checkpoint:\n", + " print (\" %d %%\" % percent_complete)\n", + " percent_complete += 10\n", + " checkpoint += time_delta\n", + "\n", + " iteration += 1\n", + "\n", + " # 启发式的选择最优策略\n", + " strategy = pick_strategy(STRATEGIES, strat_weights)\n", + "\n", + " old_val = res\n", + " old_time = time.time()\n", + "\n", + " # 使用当前策略的heuristic方法,从策略邻域生成的候选集合中选择下一个排列\n", + " candidates = strategy.neighbourhood(data, perm)\n", + " perm = strategy.heuristic(data, candidates)\n", + " res = makespan(data, perm)\n", + "\n", + " #记录有关策略执行情况的统计信息\n", + " strat_improvements[strategy] += res - old_val\n", + " strat_time_spent[strategy] += time.time() - old_time\n", + " strat_usage[strategy] += 1\n", + "\n", + " if res < best_make:\n", + " best_make = res\n", + " best_perm = perm[:]\n", + "\n", + " #定期更改可用策略的权重,这样搜索可以动态地转向最近被证明更有效的策略。\n", + " if time.time() > time_last_switch + TIME_INCREMENT:\n", + "\n", + " #将改进所需的时间归一化\n", + " results = sorted([(float(strat_improvements[s]) / max(0.001, strat_time_spent[s]), s)\n", + " for s in STRATEGIES])\n", + "\n", + " if DEBUG_SWITCH:\n", + " print (\"\\nComputing another 
switch...\")\n", + " print (\"Best performer: %s (%d)\" % (results[0][1].name, results[0][0]))\n", + " print (\"Worst performer: %s (%d)\" % (results[-1][1].name, results[-1][0]))\n", + "\n", + " #提升成功策略的分量\n", + " for i in range(len(STRATEGIES)):\n", + " strat_weights[results[i][1]] += len(STRATEGIES) - i\n", + "\n", + " #此外,加强未使用的策略以避免饥饿\n", + " if results[i][0] == 0:\n", + " strat_weights[results[i][1]] += len(STRATEGIES)\n", + "\n", + " time_last_switch = time.time()\n", + "\n", + " if DEBUG_SWITCH:\n", + " print (results)\n", + " print (sorted([strat_weights[STRATEGIES[i]] for i in range(len(STRATEGIES))]))\n", + "\n", + " strat_improvements = {strategy: 0 for strategy in STRATEGIES}\n", + " strat_time_spent = {strategy: 0 for strategy in STRATEGIES}\n", + "\n", + "\n", + " print (\" %d %%\\n\" % percent_complete)\n", + " print (\"\\nWent through %d iterations.\" % iteration)\n", + "\n", + " print (\"\\n(usage) Strategy:\")\n", + " results = sorted([(strat_weights[STRATEGIES[i]], i)\n", + " for i in range(len(STRATEGIES))], reverse=True)\n", + " for (w, i) in results:\n", + " print (\"(%d) \\t%s\" % (strat_usage[STRATEGIES[i]], STRATEGIES[i].name))\n", + "\n", + " return (best_perm, best_make)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.2 问题解析\n", + "\n", + "作为解析过程的输入,我们提供了可在其中找到输入的文件名以及应使用的示例编号。(每个文件包含许多实例,如下图所示。)\n", + "\n", + "\n", + "我们通过读取文件并标识分隔每个问题实例的行来开始解析:\n", + "```python\n", + "def parse_problem(filename, k=1):\n", + " with open(filename, 'r') as f:\n", + " # Identify the string that separates instances\n", + " problem_line = ('/number of jobs, number of machines, initial seed, '\n", + " 'upper bound and lower bound :/')\n", + "\n", + " # Strip spaces and newline characters from every line\n", + " lines = map(str.strip, f.readlines()) \n", + "```\n", + "为了更容易查找正确的实例,我们假设行将以'/'字符分隔。这样,我们就可以根据出现在每个实例顶部的通用字符串来分割文件,并且在第一行的开头添加一个'/'字符可以使得无论我们选择哪种实例,下面的字符串处理工作都会正常进行。\n", + "\n", + "给定文件中找到的实例集合,我们还将检测提供的实例号何时超出范围。\n", 
+ "```python\n", + " # We prep the first line for later\n", + " lines[0] = '/' + lines[0]\n", + "\n", + " # We also know '/' does not appear in the files, so we can use it as\n", + " # a separator to find the right lines for the kth problem instance\n", + " try:\n", + " lines = '/'.join(lines).split(problem_line)[k].split('/')[2:]\n", + " except IndexError:\n", + " max_instances = len('/'.join(lines).split(problem_line)) - 1\n", + " print(\"\\nError: Instance must be within 1 and %d\\n\" % max_instances)\n", + " sys.exit(0)\n", + "```\n", + "\n", + "我们直接解析数据,将每个任务的处理时间转换为整数并将其存储在列表中。最后,我们压缩数据以反转行和列,以使格式符合上述求解代码的期望。(其中的每个项目data都应对应于特定的工作。)\n", + "```python\n", + " # Split every line based on spaces and convert each item to an int\n", + " data = [map(int, line.split()) for line in lines]\n", + "\n", + " # We return the zipped data to rotate the rows and columns, making each\n", + " # item in data the durations of tasks for a particular job\n", + " return zip(*data)\n", + "```\n", + "\n", + "最终的解析函数`parse_problem`为:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def parse_problem(filename, k=1):\n", + " \"\"\"\n", + " 解析Taillard问题文件的第k个实例\n", + " Taillard问题文件是流水车间调度问题的标准基准集。\n", + " \"\"\"\n", + "\n", + " print (\"\\nParsing...\")\n", + "\n", + " with open(filename, 'r') as f:\n", + " #标识分隔实例的字符串\n", + " problem_line = '/number of jobs, number of machines, initial seed, upper bound and lower bound :/'\n", + "\n", + " #删除每行中的空格和换行符\n", + " lines = list(map(str.strip, f.readlines()))\n", + "\n", + " # We prep the first line for later\n", + " lines[0] = '/' + lines[0]\n", + "\n", + " #我们也知道'/'不会出现在文件中\n", + " #因此我们可以将其用作分隔符以查找第k个问题实例的正确行\n", + " try:\n", + " lines = '/'.join(lines).split(problem_line)[k].split('/')[2:]\n", + " except IndexError:\n", + " max_instances = len('/'.join(lines).split(problem_line)) - 1\n", + " print (\"\\nError: Instance must be within 1 and %d\\n\" % max_instances)\n", + " 
sys.exit(0)\n", + "\n", + " #根据空格分割每一行并将每一项转换为一个整数\n", + " data = [map(int, line.split()) for line in lines]\n", + "\n", + " #我们返回压缩后的数据以旋转行和列\n", + " #从而使数据中的每个项目都成为特定任务的任务持续时间\n", + " return zip(*data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.3 编译解决方案\n", + "流水车间调度问题的解决方案包括为每个作业中的每个任务提供精确的时间安排。因为我们隐含地表示工作的排列,所以我们引入了`compile_solution`将排列转换为精确时间的函数。\n", + "\n", + "作为输入,该函数接收问题数据(为我们提供每个任务的持续时间)和工作排列的数据。\n", + "该函数首先初始化用于存储每个任务的开始时间的数据结构,然后将第一个作业中的任务包括在排列中。\n", + "\n", + "然后,我们为其余作业添加所有任务。\n", + "\n", + "作业中的第一个任务将始终在上一个作业中的第一个任务完成后立即开始。对于其余任务,其开始时间是**同一作业中上一个任务的完成时间和同一台计算机上上一个任务的完成时间中的最大值**。\n", + "最终返回每个任务的时间安排。\n", + "\n", + "该函数代码如下:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "def compile_solution(data, perm):\n", + " \"\"\"编译给定作业排列的机器上的调度\"\"\"\n", + " num_machines = len(list(data[0]))\n", + "\n", + " #注意,使用[[]]* m是不正确的\n", + " #因为它只是复制相同的列表m次(而不是创建m个不同的列表)。\n", + " machine_times = [[] for _ in range(num_machines)]\n", + "\n", + " #将初始作业分配给机器\n", + " machine_times[0].append(0)\n", + " for mach in range(1,num_machines):\n", + " #前一个任务完成后,开始工作中的下一个任务\n", + " machine_times[mach].append(machine_times[mach-1][0] +\n", + " data[perm[0]][mach-1])\n", + "\n", + " #分配剩余的工作\n", + " for i in range(1, len(perm)):\n", + "\n", + " #第一台机器从不包含任何空闲时间\n", + " job = perm[i]\n", + " machine_times[0].append(machine_times[0][-1] + data[perm[i-1]][0])\n", + "\n", + " #对于其余的机器,启动时间是作业中的前一个任务何时完成,\n", + " #或当前机器何时完成前一个作业的任务的最大值。\n", + " for mach in range(1, num_machines):\n", + " machine_times[mach].append(max(machine_times[mach-1][i] + data[perm[i]][mach-1],\n", + " machine_times[mach][i-1] + data[perm[i-1]][mach]))\n", + "\n", + " return machine_times" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.4 印刷解决方案\n", + "\n", + "求解过程完成后,程序将以紧凑形式输出有关解决方案的信息。我们没有提供每项任务的每个任务的准确时间安排,而是输出以下信息:\n", + "\n", + "1. 产生最佳生产期的工作排列\n", + "2. 排列的计算的计算跨度\n", + "3. 
每台机器的开始时间,完成时间和空闲时间\n", + "4. 每个作业的开始时间,完成时间和空闲时间\n", + "\n", + "其中:\n", + "\n", + "- **作业或机器的开始时间**对应于作业或机器上第一个任务的开始。\n", + "- **作业或机器的完成时间**对应于作业或机器上最终任务的结束。\n", + "- **空闲时间**是特定作业或机器的任务之间的松弛时间。\n", + "\n", + "理想情况下,我们希望减少空闲时间,因为这意味着总的处理时间也将减少。\n", + "\n", + "我们已经讨论了用于编译解决方案的代码(即,计算每个任务的开始时间),并且输出排列和`makepan`。\n", + "\n", + "接着,我们使用Python中的字符串格式化功能来打印每个机器和作业的开始,结束和空闲时间表。\n", + "\n", + "请注意,作业的空闲时间是从作业开始到完成为止的时间,减去该作业中每个任务的处理时间之和。我们以类似的方式计算机器的空闲时间。" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def print_solution(data, perm):\n", + " \"\"\"打印计算出的解决方案的统计信息\"\"\"\n", + "\n", + " sol = compile_solution(data, perm)\n", + "\n", + " print (\"\\nPermutation: %s\\n\" % str([i+1 for i in perm]))\n", + "\n", + " print (\"Makespan: %d\\n\" % makespan(data, perm))\n", + "\n", + " row_format =\"{:>15}\" * 4\n", + " print( row_format.format('Machine', 'Start Time', 'Finish Time', 'Idle Time'))\n", + " for mach in range(len(data[0])):\n", + " finish_time = sol[mach][-1] + data[perm[-1]][mach]\n", + " idle_time = (finish_time - sol[mach][0]) - sum([job[mach] for job in data])\n", + " print (row_format.format(mach+1, sol[mach][0], finish_time, idle_time))\n", + "\n", + " results = []\n", + " for i in range(len(data)):\n", + " finish_time = sol[-1][i] + data[perm[i]][-1]\n", + " idle_time = (finish_time - sol[0][i]) - sum([time for time in data[perm[i]]])\n", + " results.append((perm[i]+1, sol[0][i], finish_time, idle_time))\n", + "\n", + " print (\"\\n\")\n", + " print( row_format.format('Job', 'Start Time', 'Finish Time', 'Idle Time'))\n", + " for r in sorted(results):\n", + " print (row_format.format(*r))\n", + "\n", + " print( \"\\n\\nNote: Idle time does not include initial or final wait time.\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3 策略\n", + "接下来,我们对本练习开始时所使用的\n", + "```\n", + "import neighbourhood as neigh\n", + "import heuristics as heur\n", + "```\n", + 
"中的两个功能类进行介绍,并使用代码实现如何进行动态策略的选择。\n", + "\n", + "### 3.1 Neighbourhood\n", + "局部搜索的思想是将局部解决方案从一个解决方案迁移到附近的其他解决方案。我们将给定解决方案的**邻域**称为局部的其他解决方案。\n", + "\n", + "在本节中,我们详细介绍四个潜在的领域,每个领域的复杂性都在不断提高。\n", + "\n", + "**第一邻域产生给定数量的随机排列。**这个邻里甚至没有考虑我们开始时使用的解决方案,因此“邻里”一词扩展了事实。但是,在搜索中包括一些随机性是一种好习惯,因为它可以促进对搜索空间的探索。" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def neighbours_random(data, perm, num = 1):\n", + " #返回个随机作业排列,包括当前的排列\n", + " candidates = [perm]\n", + " for i in range(num):\n", + " candidate = perm[:]\n", + " random.shuffle(candidate)\n", + " candidates.append(candidate)\n", + " return candidates" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**第二领域考虑交换排列中的任何两个作业。**通过使用`itertools`包中的`combinations`函数,我们可以轻松地遍历每对索引,并创建对应于交换位于每个索引处的作业的新排列。从某种意义上说,这个邻域产生的排列与我们开始时的排列非常相似。" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "def neighbours_swap(data, perm):\n", + " #返回与交换每一对作业相对应的排列\n", + " candidates = [perm]\n", + " for (i,j) in combinations(range(len(perm)), 2):\n", + " candidate = perm[:]\n", + " candidate[i], candidate[j] = candidate[j], candidate[i]\n", + " candidates.append(candidate)\n", + " return candidates" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**第三领域使用特定于当前问题的信息。**我们发现空闲时间最多的作业,并考虑以各种可能的方式交换它们。我们采用的值`size`是我们考虑的作业数量:`size`最空闲的作业。\n", + "\n", + "该函数的执行过程如下:\n", + "1. 计算排列中每个作业的空闲时间;\n", + "2. 计算出`size`空闲时间最多的作业列表;\n", + "3. 
通过考虑已确定的最闲置作业的每个排列来构造邻域。为了找到排列,我们利用`itertools`包中的`permutations`函数。\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "def neighbours_idle(data, perm, size=4):\n", + " #返回最空闲作业的排列\n", + " candidates = [perm]\n", + "\n", + " #计算每个作业的空闲时间\n", + " sol = flow.compile_solution(data, perm)\n", + " results = []\n", + "\n", + " for i in range(len(data)):\n", + " finish_time = sol[-1][i] + data[perm[i]][-1]\n", + " idle_time = (finish_time - sol[0][i]) - sum([t for t in data[perm[i]]])\n", + " results.append((idle_time, i))\n", + " \n", + " #以最空闲的作业为例\n", + " subset = [job_ind for (idle, job_ind) in reversed(sorted(results))][:size]\n", + " \n", + " #枚举空闲作业的排列\n", + " for ordering in permutations(subset):\n", + " candidate = perm[:]\n", + " for i in range(len(ordering)):\n", + " candidate[subset[i]] = perm[ordering[i]]\n", + " candidates.append(candidate)\n", + "\n", + " return candidates" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**第四邻域通常称为大邻域搜索(LNS)。**直观上,LNS通过隔离考虑当前排列的较小子集来工作,找到工作子集的最佳排列将为我们提供LNS领域的单个候选者。\n", + "\n", + "通过对特定大小的几个(或全部)子集重复此过程,我们可以增加邻域中的候选者数量。我们通过`MAX_LNS_NEIGHBOURHOODS`参数考虑限制的数目,因为邻居的数目可以快速增长。\n", + "\n", + "LNS的计算过程如下:\n", + "1. 计算随机的作业集列表,我们将考虑使用`itertools`包的`combinations`功能进行交换;\n", + "2. 
遍历子集,为每个子集找到作业的最佳排列。我们在上面已经见过类似的代码:循环访问最空闲作业的所有排列。此处的主要区别在于,**我们只记录该子集的最佳排列**,因为更大的邻域是通过为每个考虑的作业子集各选出一个排列来构建的。\n", + "\n", + "如果我们将`size`参数设置为作业总数,则会考虑所有排列并从中选出最佳者。但在实践中,我们需要将子集的大小限制为3或4;更大的取值会导致`neighbours_LNS`函数花费大量时间。" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def neighbours_LNS(data, perm, size = 2):\n", + " #返回大邻域搜索(LNS)得到的邻域\n", + " candidates = [perm]\n", + "\n", + " #限制邻域的数量,以防作业子集过多\n", + " neighbourhoods = list(combinations(range(len(perm)), size))\n", + " random.shuffle(neighbourhoods)\n", + " \n", + " for subset in neighbourhoods[:flow.MAX_LNS_NEIGHBOURHOODS]:\n", + "\n", + " #记录每个子集的最佳排列\n", + " best_make = flow.makespan(data, perm)\n", + " best_perm = perm\n", + "\n", + " #枚举所选子集的每个排列\n", + " for ordering in permutations(subset):\n", + " candidate = perm[:]\n", + " for i in range(len(ordering)):\n", + " candidate[subset[i]] = perm[ordering[i]]\n", + " res = flow.makespan(data, candidate)\n", + " if res < best_make:\n", + " best_make = res\n", + " best_perm = candidate\n", + "\n", + " #将该子集的最佳候选记录为更大邻域的一部分\n", + " candidates.append(best_perm)\n", + "\n", + " return candidates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Heuristics\n", + "\n", + "启发式方法从一组给定的候选排列中返回单个候选排列。启发式方法同样可以访问问题数据,以便评估哪个候选更值得选取。\n", + "\n", + "我们考虑的**第一个启发式方法是`heur_random`**。它从列表中随机选择一个候选,而不评估哪个候选更优:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def heur_random(data, candidates):\n", + " #随机返回一个候选排列\n", + " return random.choice(candidates)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**下一个启发式方法`heur_hillclimbing`走向另一个极端**。它不随机选择候选,而是选择makespan最短(即最优)的候选。\n", + "\n", + "请注意,列表`scores`将包含形如`(make, perm)`的元组,其中`make`为排列`perm`的makespan值。对这样的列表排序后,makespan最短的元组会排在列表开头;我们从这个元组中取出并返回排列。" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], + 
"source": [ + "def heur_hillclimbing(data, candidates):\n", + " #返回列表中的最佳候选者\n", + " scores = [(flow.makespan(data, perm), perm) for perm in candidates]\n", + " return sorted(scores)[0][1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**最终的启发式`heur_random_hillclimbing`结合了上面两种方法**。\n", + "\n", + "执行本地搜索时,您可能并不总是希望选择一个随机的候选者,甚至最好的候选者。通过选择概率为0.5的最佳候选者,然后选择概率为0.25的次优候选者,heur_random_hillclimbing启发式方法返回“相当好”的解决方案,依此类推。\n", + "\n", + "while循环实质上是在每次迭代时翻转硬币,以查看它是否应继续增加索引(限制列表的大小)。选择的最终索引对应于启发式选择的候选。" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "def heur_random_hillclimbing(data, candidates):\n", + " #返回与排序的质量成正比的概率的候选人\n", + " scores = [(flow.makespan(data, perm), perm) for perm in candidates]\n", + " i = 0\n", + " while (random.random() < 0.5) and (i < len(scores) - 1):\n", + " i += 1\n", + " return sorted(scores)[i][1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "由于makepan是我们正在尝试优化的标准,因此爬坡将引导本地搜索过程转向具有更佳makepan的解决方案。 引入随机性使我们能够探索邻居,而不是在每一步都盲目追求最佳外观的解决方案。\n", + "\n", + "### 3.3 动态策略选择\n", + "\n", + "本地搜索的核心是**使用特定的启发式和邻域函数从一种解决方案跳到另一种解决方案**。 \n", + "\n", + "我们如何选择一组选项而不是另一组? \n", + "\n", + "实际上,在搜索过程中切换策略经常会有所回报。我们使用的动态策略选择将在试探性功能和邻域功能的组合之间切换,以尝试动态地转移到效果最佳的那些策略上。对我们来说,策略是启发式和邻域函数(包括其参数值)的特定配置。\n", + "\n", + "1. 首先,我们的代码构建了我们在求解过程中要考虑的策略范围。在策略初始化中,我们使用`functools`包中的`partial`函数为每个街区部分分配参数。\n", + "2. 此外,我们构造了启发式函数列表。\n", + "3. 
最后,我们使用乘积运算符将邻域和启发式函数的每种组合添加为新策略。\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "################\n", + "## Strategies ##\n", + "#################################################\n", + "## 策略是Neighbourhoods生成器(用于计算下一组候选对象)\n", + "## 和Heuristics计算(用于选择最佳候选对象)的特定配置。\n", + "##\n", + "\n", + "STRATEGIES = []\n", + "\n", + "#使用命名字典比使用字典更干净。 例如,strategy ['name']与strategy.name\n", + "Strategy = namedtuple('Strategy', ['name', 'neighbourhood', 'heuristic'])\n", + "\n", + "def initialize_strategies():\n", + "\n", + " global STRATEGIES\n", + "\n", + " #定义我们要使用的邻域(和参数)\n", + " neighbourhoods = [\n", + " ('Random Permutation', partial(neighbours_random, num=100)),\n", + " ('Swapped Pairs', neighbours_swap),\n", + " ('Large Neighbourhood Search (2)', partial(neighbours_LNS, size=2)),\n", + " ('Large Neighbourhood Search (3)', partial(neighbours_LNS, size=3)),\n", + " ('Idle Neighbourhood (3)', partial(neighbours_idle, size=3)),\n", + " ('Idle Neighbourhood (4)', partial(neighbours_idle, size=4)),\n", + " ('Idle Neighbourhood (5)', partial(neighbours_idle, size=5))\n", + " ]\n", + "\n", + " #定义我们要使用的启发式\n", + " heuristics = [\n", + " ('Hill Climbing',heur_hillclimbing),\n", + " ('Random Selection',heur_random),\n", + " ('Biased Random Selection',heur_random_hillclimbing)\n", + " ]\n", + "\n", + " #组合每一个邻域和启发式策略\n", + " for (n, h) in product(neighbourhoods, heuristics):\n", + " STRATEGIES.append(Strategy(\"%s / %s\" % (n[0], h[0]), n[1], h[1]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "一旦定义了策略,我们就不必在搜索过程中坚持使用单个选项。取而代之的是,我们随机选择任何一种策略,但要根据策略的执行情况对选择进行加权。\n", + "\n", + "我们在下面描述权重,但是对于`pick_strategy`功能,我们只需要策略列表和相应的相对权重列表(任何数字都可以)。" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "def pick_strategy(strategies, weights):\n", + " #根据权重选择随机策略:轮盘赌选择并非完全随机选择策略,\n", + " #而是将随机选择偏向于过去效果良好的策略(根据权重值)。\n", + " total = 
sum([weights[strategy] for strategy in strategies])\n", + " pick = random.uniform(0, total)\n", + " count = weights[strategies[0]]\n", + "\n", + " i = 0\n", + " while pick > count:\n", + " count += weights[strategies[i+1]]\n", + " i += 1\n", + "\n", + " return strategies[i]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "为了选择具有给定权重的随机策略,我们统一选择一个介于0和所有权重之和之间的数字。随后,我们找到最低的索引i这样索引的所有权重之和小于i大于我们选择的随机数。该技术有时称为**轮盘赌选择**,将为我们随机选择一项策略,并为那些权重较高的策略提供更大的机会。\n", + "\n", + "剩下的就是描述在寻找解决方案的过程中如何增加权重。这在**`solve`函数**的主`while`循环中以规则的时间间隔发生(由`TIME_INCREMENT`变量定义)。\n", + "```python\n", + " if time.time() > time_last_switch + TIME_INCREMENT:\n", + "\n", + " time_last_switch = time.time()\n", + "```\n", + "回想一下,`strat_improvements`存储策略已进行的所有改进的总和,而`strat_time_spent存`储最后一次间隔中给出策略的时间。 我们将每种策略所花费的总时间进行归一化,以衡量每个策略在上一个时间间隔内的效果。由于策略可能根本没有机会执行,因此我们选择少量时间作为默认值。\n", + "\n", + "```python\n", + " results = sorted([\n", + " (float(strat_improvements[s]) / max(0.001, strat_time_spent[s]), s)\n", + " for s in STRATEGIES])\n", + "```\n", + "\n", + "现在我们已经对每个策略的执行情况进行了排名,我们将$k$添加到最佳策略的权重(假设我们有$k$个策略),将$k-1$添加到次佳策略,等等。每个策略都有其自己的权重增加,列表中最差的策略只会增加1。\n", + "```python\n", + " # Boost the weight for the successful strategies\n", + " for i in range(len(STRATEGIES)):\n", + " strat_weights[results[i][1]] += len(STRATEGIES) - i\n", + "```\n", + "\n", + "作为一项额外措施,我们人为地提高了所有未使用的策略。 这样做是为了确保我们不会完全忘记策略。 尽管一种策略在开始时似乎表现不佳,但后来在搜索中却证明是非常有用的。\n", + "\n", + "```python\n", + " if results[i][0] == 0:\n", + " strat_weights[results[i][1]] += len(STRATEGIES)\n", + "```\n", + "\n", + "最后,我们输出有关策略排名的一些信息(如果设置了`DEBUG_SWITCH`标志),然后在下一个时间间隔重置`strat_improvements`和`strat_time_spent`变量。\n", + "\n", + "```python\n", + " if DEBUG_SWITCH:\n", + " print (\"\\nComputing another switch...\")\n", + " print (\"Best: %s (%d)\" % (results[0][1].name, results[0][0]))\n", + " print (\"Worst: %s (%d)\" % (results[-1][1].name, results[-1][0]))\n", + " print (results)\n", + " print (sorted([strat_weights[STRATEGIES[i]] 
\n", + " for i in range(len(STRATEGIES))]))\n", + "\n", + " strat_improvements = {strategy: 0 for strategy in STRATEGIES}\n", + " strat_time_spent = {strategy: 0 for strategy in STRATEGIES}\n", + "```\n", + "\n", + "接下来,我们以小文件来测试一下调度算法的运行结果。" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Parsing...\n", + "\n", + "Solving...\n", + " 10 %\n", + " 20 %\n", + " 30 %\n", + " 40 %\n", + " 50 %\n", + " 60 %\n", + " 70 %\n", + " 80 %\n", + " 90 %\n", + " 100 %\n", + "\n", + "\n", + "Went through 3625 iterations.\n", + "\n", + "(usage) Strategy:\n", + "(264) \tIdle Neighbourhood (3) / Hill Climbing\n", + "(250) \tLarge Neighbourhood Search (2) / Random Selection\n", + "(223) \tIdle Neighbourhood (4) / Hill Climbing\n", + "(232) \tRandom Permutation / Hill Climbing\n", + "(213) \tIdle Neighbourhood (5) / Hill Climbing\n", + "(247) \tIdle Neighbourhood (4) / Biased Random Selection\n", + "(216) \tIdle Neighbourhood (5) / Biased Random Selection\n", + "(230) \tSwapped Pairs / Hill Climbing\n", + "(234) \tSwapped Pairs / Biased Random Selection\n", + "(217) \tIdle Neighbourhood (3) / Biased Random Selection\n", + "(154) \tLarge Neighbourhood Search (3) / Random Selection\n", + "(160) \tLarge Neighbourhood Search (2) / Biased Random Selection\n", + "(188) \tLarge Neighbourhood Search (2) / Hill Climbing\n", + "(146) \tLarge Neighbourhood Search (3) / Biased Random Selection\n", + "(164) \tLarge Neighbourhood Search (3) / Hill Climbing\n", + "(129) \tIdle Neighbourhood (3) / Random Selection\n", + "(93) \tRandom Permutation / Biased Random Selection\n", + "(66) \tIdle Neighbourhood (4) / Random Selection\n", + "(48) \tIdle Neighbourhood (5) / Random Selection\n", + "(84) \tSwapped Pairs / Random Selection\n", + "(67) \tRandom Permutation / Random Selection\n", + "\n", + "Permutation: [13, 8, 11, 17, 14, 4, 9, 6, 15, 16, 18, 7, 5, 12, 3, 1, 19, 10, 2, 20]\n", + 
"\n", + "Makespan: 1297\n", + "\n", + " Machine Start Time Finish Time Idle Time\n", + " 1 0 1121 0\n", + " 2 14 1198 184\n", + " 3 87 1238 204\n", + " 4 150 1269 38\n", + " 5 189 1297 104\n", + "\n", + "\n", + " Job Start Time Finish Time Idle Time\n", + " 1 735 1123 115\n", + " 2 944 1267 34\n", + " 3 720 1065 219\n", + " 4 189 611 84\n", + " 5 552 973 68\n", + " 6 287 715 151\n", + " 7 499 920 143\n", + " 8 14 289 54\n", + " 9 260 680 213\n", + " 10 857 1204 42\n", + " 11 52 419 110\n", + " 12 629 1045 182\n", + " 13 0 197 0\n", + " 14 160 526 131\n", + " 15 323 762 214\n", + " 16 335 849 249\n", + " 17 128 477 158\n", + " 18 412 867 112\n", + " 19 789 1191 133\n", + " 20 1027 1297 0\n", + "\n", + "\n", + "Note: Idle time does not include initial or final wait time.\n", + "\n" + ] + } + ], + "source": [ + "!python flow.py tai20_5.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 总结\n", + "\n", + "在本练习中,我们看到了用相对少量的代码可以解决流水车间调度的复杂优化问题的方法。很难找到诸如流水车间之类的大型优化问题的最佳解决方案。在这种情况下,我们可以求助于近似技术(例如局部搜索)来计算足够好的解。通过局部搜索,我们可以从一种解决方案转到另一种解决方案,以期找到质量好的解决方案。\n", + "\n", + "局部搜索的一般直觉可以应用于各种各样的问题。 我们专注于:\n", + "- 从一个候选解决方案生成问题的相关解决方案的邻域\n", + "- 建立评估和比较解决方案的方法。 \n", + "\n", + "有了这两个组成部分,当难以找到最优方案时,我们可以使用局部搜索范式找到有价值的解决方案。\n", + "\n", + "我们没有使用任何一种策略来解决问题,而是看到了如何在解决过程中动态选择一种策略来进行转移。这种简单而强大的技术使程序能够混合和匹配针对当前问题的部分策略,这也意味着开发人员不必手工定制该策略。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "cgsource", + "language": "python", + "name": "cgsource" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git "a/Flow 
Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/flow.py" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/flow.py" new file mode 100644 index 0000000..777ec6c --- /dev/null +++ "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/flow.py" @@ -0,0 +1,311 @@ +import sys, os, time, random + +from functools import partial +from collections import namedtuple +from itertools import product + +import neighbourhood as neigh +import heuristics as heur + +############## +## Settings ## +############## +TIME_LIMIT = 300.0 # Time (in seconds) to run the solver +TIME_INCREMENT = 13.0 # Time (in seconds) in between heuristic measurements +DEBUG_SWITCH = False # Displays intermediate heuristic info when True +MAX_LNS_NEIGHBOURHOODS = 1000 # Maximum number of neighbours to explore in LNS + + +################ +## Strategies ## +################################################# +## A strategy is a particular configuration +## of neighbourhood generator (to compute +## the next set of candidates) and heuristic +## computation (to select the best candidate). +## + +STRATEGIES = [] + +# Using a namedtuple is a little cleaner than dictionaries. 
+# E.g., strategy['name'] versus strategy.name +Strategy = namedtuple('Strategy', ['name', 'neighbourhood', 'heuristic']) + +def initialize_strategies(): + + global STRATEGIES + + # Define the neighbourhoods (and parameters) we would like to use + neighbourhoods = [ + ('Random Permutation', partial(neigh.neighbours_random, num=100)), + ('Swapped Pairs', neigh.neighbours_swap), + ('Large Neighbourhood Search (2)', partial(neigh.neighbours_LNS, size=2)), + ('Large Neighbourhood Search (3)', partial(neigh.neighbours_LNS, size=3)), + ('Idle Neighbourhood (3)', partial(neigh.neighbours_idle, size=3)), + ('Idle Neighbourhood (4)', partial(neigh.neighbours_idle, size=4)), + ('Idle Neighbourhood (5)', partial(neigh.neighbours_idle, size=5)) + ] + + # Define the heuristics we would like to use + heuristics = [ + ('Hill Climbing', heur.heur_hillclimbing), + ('Random Selection', heur.heur_random), + ('Biased Random Selection', heur.heur_random_hillclimbing) + ] + + # Combine every neighbourhood and heuristic strategy + for (n, h) in product(neighbourhoods, heuristics): + STRATEGIES.append(Strategy("%s / %s" % (n[0], h[0]), n[1], h[1])) + + + +def solve(data): + """Solves an instance of the flow shop scheduling problem""" + + # We initialize the strategies here to avoid cyclic import issues + initialize_strategies() + global STRATEGIES + + # Record the following for each strategy: + # improvements: The amount a solution was improved by this strategy + # time_spent: The amount of time spent on the strategy + # weights: The weights that correspond to how good a strategy is + # usage: The number of times we use a strategy + strat_improvements = {strategy: 0 for strategy in STRATEGIES} + strat_time_spent = {strategy: 0 for strategy in STRATEGIES} + strat_weights = {strategy: 1 for strategy in STRATEGIES} + strat_usage = {strategy: 0 for strategy in STRATEGIES} + + # Start with a random permutation of the jobs + perm = list(range(len(list(data)))) + random.shuffle(perm) + + # Keep 
track of the best solution + best_make = makespan(data, perm) + best_perm = perm + res = best_make + + # Maintain statistics and timing for the iterations + iteration = 0 + time_limit = time.time() + TIME_LIMIT + time_last_switch = time.time() + + time_delta = TIME_LIMIT / 10 + checkpoint = time.time() + time_delta + percent_complete = 10 + + print ("\nSolving...") + + while time.time() < time_limit: + + if time.time() > checkpoint: + print (" %d %%" % percent_complete) + percent_complete += 10 + checkpoint += time_delta + + iteration += 1 + + # Heuristically choose the best strategy + strategy = pick_strategy(STRATEGIES, strat_weights) + + old_val = res + old_time = time.time() + + # Use the current strategy's heuristic to pick the next permutation from + # the set of candidates generated by the strategy's neighbourhood + candidates = strategy.neighbourhood(data, perm) + perm = strategy.heuristic(data, candidates) + res = makespan(data, perm) + + # Record the statistics on how the strategy did + strat_improvements[strategy] += res - old_val + strat_time_spent[strategy] += time.time() - old_time + strat_usage[strategy] += 1 + + if res < best_make: + best_make = res + best_perm = perm[:] + + # At regular intervals, switch the weighting on the strategies available. + # This way, the search can dynamically shift towards strategies that have + # proven more effective recently. 
+ if time.time() > time_last_switch + TIME_INCREMENT: + + # Normalize the improvements made by the time it takes to make them + results = sorted([(float(strat_improvements[s]) / max(0.001, strat_time_spent[s]), s) + for s in STRATEGIES]) + + if DEBUG_SWITCH: + print ("\nComputing another switch...") + print ("Best performer: %s (%d)" % (results[0][1].name, results[0][0])) + print ("Worst performer: %s (%d)" % (results[-1][1].name, results[-1][0])) + + # Boost the weight for the successful strategies + for i in range(len(STRATEGIES)): + strat_weights[results[i][1]] += len(STRATEGIES) - i + + # Additionally boost the unused strategies to avoid starvation + if results[i][0] == 0: + strat_weights[results[i][1]] += len(STRATEGIES) + + time_last_switch = time.time() + + if DEBUG_SWITCH: + print (results) + print (sorted([strat_weights[STRATEGIES[i]] for i in range(len(STRATEGIES))])) + + strat_improvements = {strategy: 0 for strategy in STRATEGIES} + strat_time_spent = {strategy: 0 for strategy in STRATEGIES} + + + print (" %d %%\n" % percent_complete) + print ("\nWent through %d iterations." % iteration) + + print ("\n(usage) Strategy:") + results = sorted([(strat_weights[STRATEGIES[i]], i) + for i in range(len(STRATEGIES))], reverse=True) + for (w, i) in results: + print ("(%d) \t%s" % (strat_usage[STRATEGIES[i]], STRATEGIES[i].name)) + + return (best_perm, best_make) + + +def parse_problem(filename, k=1): + """Parse the kth instance of a Taillard problem file + + The Taillard problem files are a standard benchmark set for the problem + of flow shop scheduling. 
They can be found online at the following address: + - http://mistic.heig-vd.ch/taillard/problemes.dir/ordonnancement.dir/ordonnancement.html""" + + print ("\nParsing...") + + with open(filename, 'r') as f: + # Identify the string that separates instances + problem_line = '/number of jobs, number of machines, initial seed, upper bound and lower bound :/' + + # Strip spaces and newline characters from every line + lines = list(map(str.strip, f.readlines())) + + # We prep the first line for later + lines[0] = '/' + lines[0] + + # We also know '/' does not appear in the files, so we can use it as + # a separator to find the right lines for the kth problem instance + try: + lines = '/'.join(lines).split(problem_line)[k].split('/')[2:] + except IndexError: + max_instances = len('/'.join(lines).split(problem_line)) - 1 + print ("\nError: Instance must be within 1 and %d\n" % max_instances) + sys.exit(0) + + # Split every line based on spaces and convert each item to an int + data = [map(int, line.split()) for line in lines] + + # We return the zipped data to rotate the rows and columns, making each + # item in data the durations of tasks for a particular job + return zip(*data) + + +def pick_strategy(strategies, weights): + # Picks a random strategy based on its weight: roulette wheel selection + # Rather than selecting a strategy entirely at random, we bias the + # random selection towards strategies that have worked well in the + # past (according to the weight value). + total = sum([weights[strategy] for strategy in strategies]) + pick = random.uniform(0, total) + count = weights[strategies[0]] + + i = 0 + while pick > count: + count += weights[strategies[i+1]] + i += 1 + + return strategies[i] + + +def makespan(data, perm): + """Computes the makespan of the provided solution + + For scheduling problems, the makespan refers to the difference between + the earliest start time of any job and the latest completion time of + any job. 
Minimizing the makespan amounts to minimizing the total time + it takes to process all jobs from start to finish.""" + return compile_solution(data, perm)[-1][-1] + data[perm[-1]][-1] + + +def compile_solution(data, perm): + """Compiles a scheduling on the machines given a permutation of jobs""" + num_machines = len(list(data[0])) + + # Note that using [[]] * m would be incorrect, as it would simply + # copy the same list m times (as opposed to creating m distinct lists). + machine_times = [[] for _ in range(num_machines)] + + # Assign the initial job to the machines + machine_times[0].append(0) + for mach in range(1,num_machines): + # Start the next task in the job when the previous finishes + machine_times[mach].append(machine_times[mach-1][0] + + data[perm[0]][mach-1]) + + # Assign the remaining jobs + for i in range(1, len(perm)): + + # The first machine never contains any idle time + job = perm[i] + machine_times[0].append(machine_times[0][-1] + data[perm[i-1]][0]) + + # For the remaining machines, the start time is the max of when the + # previous task in the job completed, or when the current machine + # completes the task for the previous job. 
+ for mach in range(1, num_machines): + machine_times[mach].append(max(machine_times[mach-1][i] + data[perm[i]][mach-1], + machine_times[mach][i-1] + data[perm[i-1]][mach])) + + return machine_times + + + +def print_solution(data, perm): + """Prints statistics on the computed solution""" + + sol = compile_solution(data, perm) + + print ("\nPermutation: %s\n" % str([i+1 for i in perm])) + + print ("Makespan: %d\n" % makespan(data, perm)) + + row_format = "{:>15}" * 4 + print( row_format.format('Machine', 'Start Time', 'Finish Time', 'Idle Time')) + for mach in range(len(data[0])): + finish_time = sol[mach][-1] + data[perm[-1]][mach] + idle_time = (finish_time - sol[mach][0]) - sum([job[mach] for job in data]) + print (row_format.format(mach+1, sol[mach][0], finish_time, idle_time)) + + results = [] + for i in range(len(data)): + finish_time = sol[-1][i] + data[perm[i]][-1] + idle_time = (finish_time - sol[0][i]) - sum([time for time in data[perm[i]]]) + results.append((perm[i]+1, sol[0][i], finish_time, idle_time)) + + print ("\n") + print( row_format.format('Job', 'Start Time', 'Finish Time', 'Idle Time')) + for r in sorted(results): + print (row_format.format(*r)) + + print( "\n\nNote: Idle time does not include initial or final wait time.\n") + + +if __name__ == '__main__': + + if len(sys.argv) == 2: + data = parse_problem(sys.argv[1]) + elif len(sys.argv) == 3: + data = parse_problem(sys.argv[1], int(sys.argv[2])) + else: + print ("\nUsage: python flow.py <Taillard problem file> [<instance number>]\n") + sys.exit(0) + # parse_problem returns a zip object; materialize it once into lists + data = [list(row) for row in data] + (perm, ms) = solve(data) + print_solution(data, perm) diff --git "a/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/heuristics.py" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/heuristics.py" new file mode 100644 index 0000000..e6e0d72 --- /dev/null +++ "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/heuristics.py" @@
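The core recurrence in flow.py's `compile_solution` — each task starts at the later of its job predecessor's finish and its machine predecessor's finish — can be checked against the two-job, two-machine example from the notebook's introduction (makespans of 4 and 5). A standalone sketch of that recurrence:

```python
def compile_solution(data, perm):
    """Start times per machine, using the same recurrence as flow.py:
    a task starts when both its machine and its job predecessor are done."""
    num_machines = len(data[0])
    machine_times = [[] for _ in range(num_machines)]
    # First job: each task starts as soon as the previous one finishes
    machine_times[0].append(0)
    for mach in range(1, num_machines):
        machine_times[mach].append(machine_times[mach - 1][0] + data[perm[0]][mach - 1])
    for i in range(1, len(perm)):
        # The first machine never idles between jobs
        machine_times[0].append(machine_times[0][-1] + data[perm[i - 1]][0])
        for mach in range(1, num_machines):
            machine_times[mach].append(max(
                machine_times[mach - 1][i] + data[perm[i]][mach - 1],   # job predecessor
                machine_times[mach][i - 1] + data[perm[i - 1]][mach]))  # machine predecessor
    return machine_times

def makespan(data, perm):
    # Finish time of the last task of the last scheduled job
    return compile_solution(data, perm)[-1][-1] + data[perm[-1]][-1]

# Jobs from the introduction: job 0 has tasks A (1 min) and B (2 min),
# job 1 has tasks C (2 min) and D (1 min)
data = [[1, 2], [2, 1]]
print(makespan(data, [0, 1]))  # job 1 before job 2 -> 4 (Figure 2)
print(makespan(data, [1, 0]))  # job 2 before job 1 -> 5 (Figure 1)
```

Comparing the two permutations reproduces the introduction's observation that scheduling job 1 first yields the shorter makespan.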
-0,0 +1,30 @@ +import random + +import flow + +################ +## Heuristics ## +################ + +################################################################ +## A heuristic returns a single candidate permutation from +## a set of candidates that is given. The heuristic is also +## given access to the problem data in order to evaluate +## which candidate might be preferred. + +def heur_hillclimbing(data, candidates): + # Returns the best candidate in the list + scores = [(flow.makespan(data, perm), perm) for perm in candidates] + return sorted(scores)[0][1] + +def heur_random(data, candidates): + # Returns a random candidate choice + return random.choice(candidates) + +def heur_random_hillclimbing(data, candidates): + # Returns a candidate with probability proportional to its rank in sorted quality + scores = [(flow.makespan(data, perm), perm) for perm in candidates] + i = 0 + while (random.random() < 0.5) and (i < len(scores) - 1): + i += 1 + return sorted(scores)[i][1] \ No newline at end of file diff --git "a/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/neighbourhood.py" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/neighbourhood.py" new file mode 100644 index 0000000..4e90000 --- /dev/null +++ "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/neighbourhood.py" @@ -0,0 +1,81 @@ + +import random +from itertools import combinations, permutations + +import flow + +############################## +## Neighbourhood Generators ## +############################## + +def neighbours_random(data, perm, num = 1): + # Returns random job permutations, including the current one + candidates = [perm] + for i in range(num): + candidate = perm[:] + random.shuffle(candidate) + candidates.append(candidate) + return candidates + +def neighbours_swap(data, perm): + # Returns the permutations corresponding to swapping every pair of jobs + candidates = [perm] + for (i,j) in 
combinations(range(len(perm)), 2): + candidate = perm[:] + candidate[i], candidate[j] = candidate[j], candidate[i] + candidates.append(candidate) + return candidates + +def neighbours_LNS(data, perm, size = 2): + # Returns the Large Neighbourhood Search neighbours + candidates = [perm] + + # Bound the number of neighbourhoods in case there are too many jobs + neighbourhoods = list(combinations(range(len(perm)), size)) + random.shuffle(neighbourhoods) + + for subset in neighbourhoods[:flow.MAX_LNS_NEIGHBOURHOODS]: + + # Keep track of the best candidate for each neighbourhood + best_make = flow.makespan(data, perm) + best_perm = perm + + # Enumerate every permutation of the selected neighbourhood + for ordering in permutations(subset): + candidate = perm[:] + for i in range(len(ordering)): + candidate[subset[i]] = perm[ordering[i]] + res = flow.makespan(data, candidate) + if res < best_make: + best_make = res + best_perm = candidate + + # Record the best candidate as part of the larger neighbourhood + candidates.append(best_perm) + + return candidates + +def neighbours_idle(data, perm, size=4): + # Returns the permutations of the most idle jobs + candidates = [perm] + + # Compute the idle time for each job + sol = flow.compile_solution(data, perm) + results = [] + + for i in range(len(data)): + finish_time = sol[-1][i] + data[perm[i]][-1] + idle_time = (finish_time - sol[0][i]) - sum([t for t in data[perm[i]]]) + results.append((idle_time, i)) + + # Take the most idle jobs + subset = [job_ind for (idle, job_ind) in reversed(sorted(results))][:size] + + # Enumerate the permutations of the idle jobs + for ordering in permutations(subset): + candidate = perm[:] + for i in range(len(ordering)): + candidate[subset[i]] = perm[ordering[i]] + candidates.append(candidate) + + return candidates diff --git "a/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/tai20_5.txt" "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop 
Scheduler/tai20_5.txt" new file mode 100644 index 0000000..a2fedb3 --- /dev/null +++ "b/Flow Shop\350\260\203\345\272\246\351\227\256\351\242\230 A Flow Shop Scheduler/tai20_5.txt" @@ -0,0 +1,80 @@ +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 873654221 1278 1232 +processing times : + 54 83 15 71 77 36 53 38 27 87 76 91 14 29 12 77 32 87 68 94 + 79 3 11 99 56 70 99 60 5 56 3 61 73 75 47 14 21 86 5 77 + 16 89 49 15 89 45 60 23 57 64 7 1 63 41 63 47 26 75 77 40 + 66 58 31 68 78 91 13 59 49 85 85 9 39 41 56 40 54 77 51 31 + 58 56 20 85 53 35 53 41 69 13 86 72 8 49 47 87 58 18 68 28 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 379008056 1359 1290 +processing times : + 26 38 27 88 95 55 54 63 23 45 86 43 43 40 37 54 35 59 43 50 + 59 62 44 10 23 64 47 68 54 9 30 31 92 7 14 95 76 82 91 37 + 78 90 64 49 47 20 61 93 36 47 70 54 87 13 40 34 55 13 11 5 + 88 54 47 83 84 9 30 11 92 63 62 75 48 23 85 23 4 31 13 98 + 69 30 61 35 53 98 94 33 77 31 54 71 78 9 79 51 76 56 80 72 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 1866992158 1081 1073 +processing times : + 77 94 9 57 29 79 55 73 65 86 25 39 76 24 38 5 91 29 22 27 + 39 31 46 18 93 58 85 58 97 10 79 93 2 87 17 18 10 50 8 26 + 14 21 15 10 85 46 42 18 36 2 44 89 6 3 1 43 81 57 76 59 + 11 2 36 30 89 10 88 22 31 9 43 91 26 3 75 99 63 83 70 84 + 83 13 84 46 20 33 74 42 33 71 32 48 42 99 7 54 8 73 30 75 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 216771124 1293 1268 +processing times : + 53 19 99 62 88 93 34 72 42 65 39 79 9 26 72 29 36 48 57 95 + 93 79 88 77 94 39 74 46 17 30 62 77 43 98 48 14 45 25 98 30 + 90 92 35 13 75 55 80 67 3 93 54 67 25 77 38 98 96 20 15 36 + 65 97 27 25 61 24 97 61 75 92 73 21 29 3 96 51 26 44 56 31 + 64 38 44 46 66 31 48 27 82 51 90 63 85 36 69 67 81 18 81 72 +number of jobs, number of machines, initial seed, upper bound and lower bound 
: + 20 5 495070989 1236 1198 +processing times : + 61 86 16 42 14 92 67 77 46 41 78 3 72 95 53 59 34 66 42 63 + 27 92 8 65 34 6 42 39 2 7 85 32 14 74 59 95 48 37 59 4 + 42 93 32 30 16 95 58 12 95 21 74 38 4 31 62 39 97 57 9 54 + 13 47 6 70 19 97 41 1 57 60 62 14 90 76 12 89 37 35 91 69 + 55 48 56 84 22 51 43 50 62 61 10 87 99 40 91 64 62 53 33 16 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 402959317 1195 1180 +processing times : + 71 27 55 90 11 18 42 64 73 95 22 53 32 5 94 12 41 85 75 38 + 13 11 73 43 27 33 57 42 71 3 11 49 8 3 47 58 23 79 99 23 + 61 25 52 72 89 75 60 28 94 95 18 73 40 61 68 75 37 13 65 7 + 21 8 5 8 58 59 85 35 84 97 93 60 99 29 94 41 51 87 97 11 + 91 13 7 95 20 69 45 44 29 32 94 84 60 49 49 65 85 52 8 58 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 1369363414 1239 1226 +processing times : + 15 64 64 48 9 91 27 34 42 3 11 54 27 30 9 15 88 55 50 57 + 28 4 43 93 1 81 77 69 52 28 28 77 42 53 46 49 15 43 65 41 + 77 36 57 15 81 82 98 97 12 35 84 70 27 37 59 42 57 16 11 34 + 1 59 95 49 90 78 3 69 99 41 73 28 99 13 59 47 8 92 87 62 + 45 73 59 63 54 98 39 75 33 8 86 41 41 22 43 34 80 16 37 94 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 2021925980 1206 1170 +processing times : + 34 20 57 47 62 40 74 94 9 62 86 13 78 46 83 52 13 70 40 60 + 5 48 80 43 34 2 87 68 28 84 30 35 42 39 85 34 36 9 96 84 + 86 35 5 93 74 12 40 95 80 6 92 14 83 49 36 38 43 89 94 33 + 28 39 55 21 25 88 59 40 90 18 33 10 59 92 15 77 31 85 85 99 + 8 91 45 55 75 18 59 86 45 89 11 54 38 41 64 98 83 36 61 19 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 573109518 1230 1206 +processing times : + 37 36 1 4 64 74 32 67 73 7 78 64 98 60 89 49 2 79 79 53 + 59 16 90 3 76 74 22 30 89 61 39 15 69 57 9 13 71 2 34 49 + 65 94 96 47 35 34 84 3 60 34 70 57 8 74 13 37 87 71 89 57 + 70 3 43 14 26 83 26 65 47 94 75 30 1 71 46 87 78 76 
75 55 + 94 98 63 83 19 79 54 78 29 8 38 97 61 10 37 16 78 96 9 91 +number of jobs, number of machines, initial seed, upper bound and lower bound : + 20 5 88325120 1108 1082 +processing times : + 27 92 75 94 18 41 37 58 56 20 2 39 91 81 33 14 88 22 36 65 + 79 23 66 5 15 51 2 81 12 40 59 32 16 87 78 41 43 94 1 93 + 22 93 62 53 30 34 27 30 54 77 24 47 39 66 41 46 24 23 68 50 + 93 22 64 81 94 97 54 82 11 91 23 32 26 22 12 23 34 87 59 2 + 38 84 62 10 11 93 57 81 10 40 62 49 90 34 11 81 51 21 39 27 diff --git a/README.md b/README.md index 4e2c9db..99d5467 100644 --- a/README.md +++ b/README.md @@ -17,18 +17,19 @@ |[A Web Crawler With asyncio Coroutines](http://aosabook.org/en/500L/pages/a-web-crawler-with-asyncio-coroutines.html)|A. Jesse Jiryu Davis and Guido van Rossum|[高效爬虫与协程](https://linux.cn/article-8265-1.html)| |[Dagoba: an in-memory graph database](http://aosabook.org/en/500L/pages/dagoba-an-in-memory-graph-database.html)|Dann Toliver|Dagoba:内存中的图形数据库| |[DBDB: Dog Bed Database](http://aosabook.org/en/500L/pages/dbdb-dog-bed-database.html)|Taavi Burns|[DBDB:非关系型数据库](https://github.com/HT524/500LineorLess_CN/blob/master/DBDB_Dog%20Bed%20Database/DBDB_%E9%9D%9E%E5%85%B3%E7%B3%BB%E5%9E%8B%E6%95%B0%E6%8D%AE%E5%BA%93.md)| -|[A Flow Shop Scheduler](http://aosabook.org/en/500L/pages/a-flow-shop-scheduler.html)|Dr. Christian Muise|Flow Shop 调度问题| +|[A Flow Shop Scheduler](http://aosabook.org/en/500L/pages/a-flow-shop-scheduler.html)|Dr. 
Christian Muise|[Flow Shop 调度问题](https://github.com/JoyceCoder/500LineorLess_CN/blob/master/Flow%20Shop%E8%B0%83%E5%BA%A6%E9%97%AE%E9%A2%98%20A%20Flow%20Shop%20Scheduler)| |[An Archaeology-Inspired Database](http://aosabook.org/en/500L/pages/an-archaeology-inspired-database.html)|Yoav Rubin|Python 实现数据库| -|[ A Python Interpreter Written in Python](http://aosabook.org/en/500L/pages/a-python-interpreter-written-in-python.html)|Allison Kaptur|[Python 解析器](https://linux.cn/article-7753-1.html)| +|[A Python Interpreter Written in Python](http://aosabook.org/en/500L/pages/a-python-interpreter-written-in-python.html)|Allison Kaptur|[Python 解析器](https://linux.cn/article-7753-1.html)| |[A 3D Modeller](http://aosabook.org/en/500L/pages/a-3d-modeller.html)|Erick Dransch|3D建模| -|[ A Simple Object Model](http://aosabook.org/en/500L/pages/a-simple-object-model.html)|Carl Friedrich Bolz|[简易对象模型](http://manjusaka.itscoder.com/2016/12/15/A-Simple-Object-Model/)| +|[A Simple Object Model](http://aosabook.org/en/500L/pages/a-simple-object-model.html)|Carl Friedrich Bolz|[简易对象模型](http://manjusaka.itscoder.com/2016/12/15/A-Simple-Object-Model/)| |[Optical Character Recognition (OCR)](http://aosabook.org/en/500L/pages/optical-character-recognition-ocr.html)|Marina Samuel|[光学文字识别](https://github.com/HT524/500LineorLess_CN/blob/master/%E5%85%89%E5%AD%A6%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB%20Optical%20Character%20Recognition%20(OCR)%2F%E5%85%89%E5%AD%A6%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB.md)| |[A Pedometer in the Real World](http://aosabook.org/en/500L/pages/a-pedometer-in-the-real-world.html)|Dessy Daskalov|现实计步器| -|[ A Rejection Sampler](http://aosabook.org/en/500L/pages/a-rejection-sampler.html)|Jessica B. 
Hamrick|[决策取样器](https://github.com/HT524/500LineorLess_CN/blob/master/%E5%86%B3%E7%AD%96%E9%87%87%E6%A0%B7%E5%99%A8_A_Rejection_Sampler/%E5%86%B3%E7%AD%96%E9%87%87%E6%A0%B7%E5%99%A8_A_Rejection_Sampler.md)| +|[A Rejection Sampler](http://aosabook.org/en/500L/pages/a-rejection-sampler.html)|Jessica B. Hamrick|[决策取样器](https://github.com/HT524/500LineorLess_CN/blob/master/%E5%86%B3%E7%AD%96%E9%87%87%E6%A0%B7%E5%99%A8_A_Rejection_Sampler/%E5%86%B3%E7%AD%96%E9%87%87%E6%A0%B7%E5%99%A8_A_Rejection_Sampler.md)| +|[Web Spreadsheet](http://aosabook.org/en/500L/pages/web-spreadsheet.html)|Audrey Tang|Web 电子表格[(繁体中文版)](https://github.com/aosabook/500lines/blob/master/spreadsheet/spreadsheet.zh-tw.markdown)| |[Static Analysis](http://aosabook.org/en/500L/pages/static-analysis.html)|Leah Hanson|静态检查| |[A Template Engine](http://aosabook.org/en/500L/pages/a-template-engine.html)|Ned Batchelder|[模板引擎](http://www.jianshu.com/p/b5d4aa45e771)| -|[ A Simple Web Server](http://aosabook.org/en/500L/pages/a-simple-web-server.html)|Greg Wilson|[简易Web服务器](https://github.com/HT524/500LineorLess_CN/blob/master/%E7%AE%80%E6%98%93web%E6%9C%8D%E5%8A%A1%E5%99%A8%20A%20simple%20web%20server/%E7%AE%80%E6%98%93web%E6%9C%8D%E5%8A%A1%E5%99%A8.md)| +|[A Simple Web Server](http://aosabook.org/en/500L/pages/a-simple-web-server.html)|Greg Wilson|[简易Web服务器](https://github.com/HT524/500LineorLess_CN/blob/master/%E7%AE%80%E6%98%93web%E6%9C%8D%E5%8A%A1%E5%99%A8%20A%20simple%20web%20server/%E7%AE%80%E6%98%93web%E6%9C%8D%E5%8A%A1%E5%99%A8.md)| +|[Contingent: A Fully Dynamic Build System](http://aosabook.org/en/500L/contingent-a-fully-dynamic-build-system.html)|Brandon Rhodes and Daniel
Rocco|[动态构建系统](https://github.com/JoyceCoder/500LineorLess_CN/blob/master/%E5%8A%A8%E6%80%81%E6%9E%84%E5%BB%BA%E7%B3%BB%E7%BB%9FContingent%20A%20Fully%20Dynamic%20Build%20System)| ## 术语库 @@ -50,7 +51,7 @@ |A Web Crawler With asyncio Coroutines|[harold](https://github.com/haroldrandom) , [skhe](https://github.com/skhe)|翻译中| |Dagoba: an in-memory graph database|[yanwang10](https://github.com/yanwang10)|翻译中| |DBDB: Dog Bed Database|[JinXJinX](https://github.com/JinXJinX)|已完成| -|A Flow Shop Scheduler|待认领|| +|A Flow Shop Scheduler|[JoyceCoder](https://github.com/JoyceCoder)|已完成| |An Archaeology-Inspired Database|待认领|| |A Python Interpreter Written in Python|[qingyunha](https://github.com/qingyunha)|已完成| |A 3D Modeller|[Kalung Tsang](https://github.com/TsangKalung)|翻译中| @@ -62,3 +63,5 @@ |Static Analysis|待认领|| |A Template Engine|[treelake](http://www.jianshu.com/users/66f24f2c0f36/latest_articles)|已完成| |A Simple Web Server|[skhe](https://github.com/skhe)|已完成| +|Contingent:A Fully Dynamic System|[JoyceCoder](https://github.com/JoyceCoder)|已完成| + diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/6-1.png" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/6-1.png" new file mode 100644 index 0000000..e637ba3 Binary files /dev/null and "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/6-1.png" differ diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/__init__.py" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/__init__.py" new file mode 100644 index 0000000..143c3e9 --- /dev/null +++ "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build 
System/contingent/__init__.py" @@ -0,0 +1,23 @@ +"""Contingent: a build system that responds to changes with minimal rebuilding. + +Most build systems allow only absolute rules, like "If ``aux.c`` has +been modified, then *always* rebuild ``aux.o``." But when documents are +built from a collection of files that contain cross references, absolute +rules are overly limiting. + +Contingent allows a programmer to instead describe a build process as a +network of Python function calls, that each describe one task. The +return values alone determine whether a given downstream task needs to +be rebuilt. Contingent will: + +* Automatically learn the dependencies between tasks. + +* Re-run every downstream task when an upstream output changes. + +* Prevent a routine from re-running if none of its inputs has changed. + +* Re-learn the inputs to a step that is re-run, in case dependencies + between resources are added or deleted in the course of a project's + history. + +""" diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/graphlib.py" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/graphlib.py" new file mode 100644 index 0000000..45e5909 --- /dev/null +++ "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/graphlib.py" @@ -0,0 +1,106 @@ +"""A directed graph of tasks that use one another as inputs.""" + +from collections import defaultdict + +class Graph: + """A directed graph of the relationships among build tasks. + + A task can be identified by any hashable value that is eligible to + act as a Python dictionary key. If the user has a preferred order + for tasks when the graph is otherwise agnostic about output order, + they may set the ``sort_key`` attribute of their ``Graph`` instance + to a ``sorted()`` key function. 
+ + """ + sort_key = None + + def __init__(self): + self._inputs_of = defaultdict(set) + self._consequences_of = defaultdict(set) + + def sorted(self, nodes, reverse=False): + """Try sorting `nodes`, else return them in iteration order. + + Graph methods that return a list of tasks but do not care about + their order can use this method to impose a user-selected order + instead. In particular, doctests benefit from the imposition of + a gratuitous stable order on sequences that otherwise lack any + stable order from one run to the next. This method tries to use + this Graph's ``sort_key`` function to order the given `nodes`. + If sorting does not succeed, then the nodes are returned in + their natural iteration order instead. + + """ + nodes = list(nodes) # grab nodes in one pass, in case it's a generator + try: + nodes.sort(key=self.sort_key, reverse=reverse) + except TypeError: + pass + return nodes + + def add_edge(self, input_task, consequence_task): + """Add an edge: `consequence_task` uses the output of `input_task`.""" + self._consequences_of[input_task].add(consequence_task) + self._inputs_of[consequence_task].add(input_task) + + def remove_edge(self, input_task, consequence_task): + """Remove an edge.""" + self._consequences_of[input_task].remove(consequence_task) + self._inputs_of[consequence_task].remove(input_task) + + def inputs_of(self, task): + """Return the tasks that are inputs to `task`.""" + return self.sorted(self._inputs_of[task]) + + def clear_inputs_of(self, task): + """Remove all edges leading to `task` from its previous inputs.""" + input_tasks = self._inputs_of.pop(task, ()) + for input_task in input_tasks: + self._consequences_of[input_task].remove(task) + + def tasks(self): + """Return all task identifiers.""" + return self.sorted(set(self._inputs_of).union(self._consequences_of)) + + def edges(self): + """Return all edges as ``(input_task, consequence_task)`` tuples.""" + return [(a, b) for a in self.sorted(self._consequences_of) + for 
b in self.sorted(self._consequences_of[a])] + + def immediate_consequences_of(self, task): + """Return the tasks that use `task` as an input.""" + return self.sorted(self._consequences_of[task]) + + def recursive_consequences_of(self, tasks, include=False): + """Return the topologically-sorted consequences of the given `tasks`. + + Returns an ordered list of every task that can be reached by + following consequence edges from the given `tasks` down to the + tasks that use them as inputs. The order of the returned list + is chosen so that all of the inputs to a consequence precede it + in the list. This means that if you run through the list + executing tasks in the given order, that tasks should find that + the inputs they need (or at least that they needed last time) + are already computed and available. + + If the flag `include` is true, then the `tasks` themselves will + be correctly sorted into the resulting sequence. Otherwise they + will be omitted. + + """ + def visit(task): + visited.add(task) + consequences = self._consequences_of[task] + for consequence in self.sorted(consequences, reverse=True): + if consequence not in visited: + yield from visit(consequence) + yield consequence + + def generate_consequences_backwards(): + for task in self.sorted(tasks, reverse=True): + yield from visit(task) + if include: + yield task + + visited = set() + return list(generate_consequences_backwards())[::-1] diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/io.py" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/io.py" new file mode 100644 index 0000000..30eff9a --- /dev/null +++ "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/io.py" @@ -0,0 +1,58 @@ +import ctypes +import os +import struct +import time + +# Experimental inotify 
support for the sake of illustration. + +IN_MODIFY = 0x02 +_libc = None + +def _setup_libc(): + global _libc + if _libc is not None: + return + ctypes.cdll.LoadLibrary('libc.so.6') + _libc = ctypes.CDLL('libc.so.6', use_errno=True) + _libc.inotify_add_watch.argtypes = [ + ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32] + _libc.inotify_add_watch.restype = ctypes.c_int + +def wait_on(paths): + # TODO: auto-detect when the OS does not offer libc or libc does not + # offer inotify_wait, and fall back to looping_wait_on(). + _setup_libc() + return inotify_wait_on(paths) + +def looping_wait_on(paths): + start = time.time() + changed_paths = [] + while not changed_paths: + time.sleep(0.5) + changed_paths = [path for path in paths + if os.stat(path).st_mtime > start] + return changed_paths + +def inotify_wait_on(paths): + paths = [path.encode('ascii') for path in paths] + fd = _libc.inotify_init() + descriptors = {} + if fd == -1: + raise OSError('inotify_init() error: {}'.format( + os.strerror(ctypes.get_errno()))) + try: + for path in paths: + rv = _libc.inotify_add_watch(fd, path, 0x2) + if rv == -1: + raise OSError('inotify_add_watch() error: {}'.format( + os.strerror(ctypes.get_errno()))) + descriptors[rv] = path + buf = os.read(fd, 1024) + # TODO: continue with some more reads with 0.1 second timeouts + # to empty the list of roughly-simultaneous events before + # closing our file descriptor and returning? 
+ finally: + pass #os.close(fd) + time.sleep(0.1) # until above TODO is done + wd, mask, cookie, name_length = struct.unpack('iIII', buf) + return [descriptors[wd].decode('ascii')] diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/projectlib.py" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/projectlib.py" new file mode 100644 index 0000000..5557494 --- /dev/null +++ "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/projectlib.py" @@ -0,0 +1,212 @@ +"""Provide a Project of related tasks that can be rebuilt when inputs change. + +""" +from contextlib import contextmanager +from collections import namedtuple +from functools import wraps +from .graphlib import Graph + +_unavailable = object() + +class Project: + """A collection of tasks that are related as inputs and consequences.""" + + def __init__(self): + self._graph = Graph() + self._graph.sort_key = task_key + self._cache = {} + self._cache_on = True + self._task_stack = [] + self._todo = set() + self._trace = None + + def start_tracing(self): + """Start recording every task that is invoked by this project.""" + self._trace = [] + + def stop_tracing(self, verbose=False): + """Stop recording task invocations, and return the trace as text. + + By default, the trace only shows those tasks that were invoked, + because no up-to-date return value was available for them in the + cache. But if the optional argument `verbose` is true then the + trace will also include tasks which experienced a cache hit, not + of a miss, and therefore did not need to be re-invoked. + + """ + text = '\n'.join( + '{}{} {}'.format( + '. 
' * depth, + 'calling' if not_available else 'returning cached', + task) + for (depth, not_available, task) in self._trace + if verbose or not_available) + + self._trace = None + return text + + def _add_task_to_trace(self, task, return_value): + """Add a task to the currently running task trace.""" + tup = (len(self._task_stack), return_value is _unavailable, task) + self._trace.append(tup) + + def task(self, task_function): + """Decorate a function that defines one of the tasks for this project. + + The `task_function` should be a function that the programmer + wants to add to this project. This decorator will return a + wrapped version of the function. Each time the wrapper is + called with a particular argument list, it checks our internal + cache of previous calls to find out if we already know what this + function returns for those particular arguments. If we already + know, then the wrapper skips the function call itself and simply + returns the cached value. + + If the cache does not have an up-to-date return value, then the + wrapper invokes `task_function` and saves its return value to + the cache for future use before returning it to the caller. + + If it must invoke the `task_function`, then the wrapper places + the task atop the current stack of executing tasks. This makes + sure that if `task_function` invokes any further tasks, we can + remember that it used their return values and that it will need + to be re-invoked again in the future if any of those other tasks + changes its return value. 
+ + """ + @wraps(task_function) + def wrapper(*args): + task = Task(wrapper, args) + + if self._task_stack: + self._graph.add_edge(task, self._task_stack[-1]) + + return_value = self._get_from_cache(task) + if self._trace is not None: + self._add_task_to_trace(task, return_value) + + if return_value is _unavailable: + self._graph.clear_inputs_of(task) + self._task_stack.append(task) + try: + return_value = task_function(*args) + finally: + self._task_stack.pop() + self.set(task, return_value) + + return return_value + + return wrapper + + def _get_from_cache(self, task): + """Return the output of the given `task`. + + If we do not have a current, valid cached value for `task`, + returns the singleton `_unavailable` instead. + + """ + if not self._cache_on: + return _unavailable + if task in self._todo: + return _unavailable + return self._cache.get(task, _unavailable) + + @contextmanager + def cache_off(self): + """Context manager that forces tasks to really be invoked. + + Even if the project has already cached the output of a + particular task, re-running the task inside of this context + manager will make the project re-invoke the task:: + + with project.cache_off(): + my_task() + + """ + original_value = self._cache_on + self._cache_on = False + try: + yield + finally: + self._cache_on = original_value + + def set(self, task, return_value): + """Add the `return_value` of `task` to our cache of return values. + + This gives us the opportunity to compare the new value against + the old one that had previously been returned by the task, to + determine whether the tasks that themselves use `task` as input + must be added to the to-do list for re-computation. + + """ + self._todo.discard(task) + if (task not in self._cache) or (self._cache[task] != return_value): + self._cache[task] = return_value + self._todo.update(self._graph.immediate_consequences_of(task)) + + def invalidate(self, task): + """Mark `task` as requiring recomputation on the next `rebuild()`. 
+ + There are two ways that code preparing for a call to `rebuild()` + can signal that the value we have cached for a given task is no + longer valid. The first is to run the task manually and then + use `set()` to unilaterally install the new value in our cache. + The other is to call this method to simply invalidate the `task` + and let `rebuild()` itself call it when it next runs. + + """ + self._todo.add(task) + + def rebuild(self): + """Repeatedly rebuild every out-of-date task until all are current. + + If nothing has changed recently, our to-do list will be empty, + and this call will return immediately. Otherwise we take the + tasks in the current to-do list, along with every consequence + anywhere downstream of them, and call `get()` on every single + one to force re-computation of the tasks that are either already + invalid or that become invalid as the first few in the list are + recomputed. + + Unless there are cycles in the task graph, this will eventually + return. + + """ + while self._todo: + tasks = self._graph.recursive_consequences_of(self._todo, True) + for function, args in tasks: + function(*args) + +# Helper functions. + +def task_key(task): + """Return a sort key for a given task.""" + function, args = task + return function.__name__, args + +class Task(namedtuple('Task', ('task_function', 'args'))): + """Turn a call to a function into a task 2-tuple. + + Given a task function and an argument list, returns a task 2-tuple + that encapsulates the call as a single object. `Project` uses these + task objects for consequence tracking and caching. + + Raises `ValueError` if `args` is not hashable. 
+ + """ + __slots__ = () + + def __new__(cls, task_function, args): + try: + hash(args) + except TypeError as e: + raise ValueError('arguments to project tasks must be immutable' + ' and hashable, not the {}'.format(e)) + + return super().__new__(cls, task_function, args) + + def __repr__(self): + "Produce a “syntactic,” source-like representation of the task." + + return '{}({})'.format(self.task_function.__name__, + ', '.join(repr(arg) for arg in self.args)) diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/rendering.py" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/rendering.py" new file mode 100644 index 0000000..75e6c33 --- /dev/null +++ "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/contingent/rendering.py" @@ -0,0 +1,44 @@ +"""Output routines related to the graph type.""" + +def as_graphviz(graph): + """Render this ``contingent.Graph`` object as graphviz code. 
+ + To turn the output of this routine into an image, you might save the + text in a file named "output.dot" and then run: + + $ dot -Tpng output.dot > output.png + + """ + edges = graph.edges() + inputs = set(input for input, consequence in edges) + consequences = set(consequence for input, consequence in edges) + lines = ['digraph {', 'graph [rankdir=LR];'] + append = lines.append + + def node(task): + return '"{}"'.format(task) + + append('node [fontname=Arial shape=rect penwidth=2 color="#DAB21D"') + append(' style=filled fillcolor="#F4E5AD"]') + + append('{rank=same') + for task in graph.sorted(inputs - consequences): + append(node(task)) + append('}') + + append('node [shape=rect penwidth=2 color="#708BA6"') + append(' style=filled fillcolor="#DCE9ED"]') + + append('{rank=same') + for task in graph.sorted(consequences - inputs): + append(node(task)) + append('}') + + append('node [shape=oval penwidth=0 style=filled fillcolor="#E8EED2"') + append(' margin="0.05,0"]') + + for task, consequence in edges: + append('{} -> {}'.format(node(task), node(consequence))) + + append('}') + return '\n'.join(lines) diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/ex6-contigent.ipynb" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/ex6-contigent.ipynb" new file mode 100644 index 0000000..24e34b4 --- /dev/null +++ "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/ex6-contigent.ipynb" @@ -0,0 +1,1386 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 练习6:动态构建系统\n", + "-------\n", + ">本节练习节选自书籍《500 lines or less》——Contingent: A Fully Dynamic Build System\n", + "\n", + "## 介绍\n", + "\n", + "构建系统(build system)用于将源代码生成用户可用的目标(如库、可执行文件、脚本等),常见的有 GNU Make、CMake、Apache Ant 等。Python 中的 PyInstaller 
也是构建系统的一种。本练习中,我们将实现一个构建系统,且试图对“动态交叉引用”问题提出一个解决方案。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "长期以来,构建系统一直是计算机编程中的标准工具。\n", + "\n", + "标准make构建系统的作者赢得了ACM软件系统奖,该标准构建系统于1976年首次开发。它不仅可以让您声明输出文件取决于一个(或多个)输入,还可以递归地进行操作。例如,程序可能取决于目标文件,而目标文件本身取决于相应的源代码:\n", + "```shell\n", + " prog: main.o\n", + " cc -o prog main.o\n", + "\n", + " main.o: main.c\n", + " cc -C -o main.o main.c\n", + "``` \n", + "如果make在下次调用时发现main.c源代码文件的修改时间比main.o的更新时间更长,那么它不仅会重建main.o对象文件,而且还会重建它。 也将自己重建prog。\n", + "\n", + "构建系统是分配给本科计算机科学专业学生的一个普通的学期项目,这不仅是因为构建系统几乎用在所有软件项目中,而且因为构建系统涉及基本数据结构和涉及有向图的算法(本章将在后面详细讨论) )。\n", + "\n", + "在构建系统背后经过数十年的使用和实践之后,人们可能会希望它们已完全成为通用的系统,甚至可以满足最奢侈的需求。\n", + "\n", + "但是,实际上,构建构件之间的一种常见交互作用(动态交叉引用问题)在大多数构建系统中都处理得很差,以至于在本章中,我们受到启发,不仅要练习经典的解决方案和用于解决问题的数据结构,的make问题,而是要显着延长该解决方案,以一个更为苛刻的领域。\n", + "\n", + "问题又是交叉引用。交叉引用会在哪里出现?在文本文档,文档和印刷书籍中!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1 问题:构建文档系统\n", + "\n", + "从源重建格式化文档的系统似乎总是做太多或做很少的工作。\n", + "\n", + "当他们响应较小的编辑时,使您等待不相关的章节被重新解析和重新设置格式时,它们会执行过多的工作。但是它们也可能重建得很少,从而给您带来不一致的最终产品。\n", + "\n", + "考虑一下Sphinx,它是用于正式Python语言文档和Python社区中许多其他项目的文档构建器。Sphinx项目的`index.rst`内容通常包括一个目录:\n", + "\n", + "```\n", + " Table of Contents\n", + " =================\n", + "\n", + " .. 
toctree::\n", + "\n", + " install.rst\n", + " tutorial.rst\n", + " api.rst\n", + "```\n", + "\n", + "该章节文件名列表告诉Sphinx在构建`index.html`输出文件时,包括指向三个命名章节中每个章节的链接。它还将包含指向每一章中任何部分的链接。除去其标记,上述标题和`toctree`命令产生的文本可能是:\n", + "\n", + "```\n", + " Table of Contents\n", + "\n", + " • Installation\n", + "\n", + " • Newcomers Tutorial\n", + " • Hello, World\n", + " • Adding Logging\n", + "\n", + " • API Reference\n", + " • Handy Functions\n", + " • Obscure Classes\n", + " `\n", + "```\n", + "如您所见,此目录是来自四个不同文件的信息的汇总。它的基本顺序和结构来自`index.rst`,而每章和节的实际标题均从这三章源文件本身中提取。\n", + "\n", + "如果您以后重新考虑本教程的章节标题,那么您将编辑第一行`tutorial.rst` 并写点更好的东西:\n", + "```\n", + " -Newcomers Tutorial\n", + " +Beginners Tutorial\n", + " ==================\n", + "\n", + " Welcome to the tutorial!\n", + " This text will take you through the basics of...\n", + "```\n", + "\n", + "当您准备重建时,Sphinx会做正确的事!它将重新构建教程章节本身和索引。(将输出管道输入到`cat`中使Sphinx成为在单独的行中宣布每个重建的文件,而不是使用空回车,用这些进度更新重复覆盖一行。)\n", + "\n", + "```\n", + " $ make html | cat\n", + " writing output... [ 50%] index\n", + " writing output... [100%] tutorial\n", + "``` \n", + "\n", + "因为Sphinx选择重建两个文档,所以`tutorial.html`现在不仅将其新标题放在顶部,而且输出`index.html`还将在目录中显示更新的章节标题。 Sphinx重建了所有内容,以使输出保持一致。\n", + "\n", + "如果对`tutorial.rst`的编辑较小,该怎么办?\n", + "\n", + "```\n", + " Beginners Tutorial\n", + " ==================\n", + "\n", + " -Welcome to the tutorial!\n", + " +Welcome to our project tutorial!\n", + " This text will take you through the basics of...\n", + "```\n", + "在这种情况下,无需重建`index.html`,因为对段落内部进行的较小编辑不会更改目录中的任何信息。\n", + "\n", + "但是事实证明,Sphinx并不像刚出现时那样聪明!\n", + "\n", + "即使结果完全一样,它将继续执行重建`index.html`的多余工作。\n", + "\n", + "```\n", + " writing output... [ 50%] index\n", + " writing output... 
[100%] tutorial\n", + "```\n", + "\n", + "您可以在`index.html`的“之前”和“之后”版本上运行“ diff”,以确认您的小修改对首页没有影响-但是Sphinx还是让您等待它的重建。\n", + "\n", + "您甚至可能没有注意到对于易于编译的小型文档的额外重建工作。但是,当您频繁调整和编辑冗长,复杂的文档或涉及诸如绘图或动画之类的多媒体生成的文档时,对工作流程的延迟会变得非常重要。\n", + "\n", + "尽管Sphinx至少会在不做任何更改的情况下努力不重建每一章-例如,它并没有响应`“ tutorial.rst”`编辑而重建`install.html`或`api.html`, 它所做的超出了必要。\n", + "\n", + "但是事实证明,Sphinx的作用甚至更糟:有时它做得太少,使您看到的输出不一致,用户可能会注意到。\n", + "\n", + "要查看其最简单的故障之一,请首先在您的API文档的顶部添加一个交叉引用:\n", + "\n", + "```\n", + " API Reference\n", + " =============\n", + "\n", + " +Before reading this, try reading our :doc:`tutorial`!\n", + " +\n", + " The sections below list every function\n", + " and every single class and method offered...\n", + "```\n", + "\n", + "对于目录,Sphinx通常会谨慎行事,将尽职地重建此API参考文档以及项目的`index.html`主页:\n", + "\n", + "\n", + "```\n", + " writing output... [ 50%] api\n", + " writing output... [100%] index\n", + "```\n", + "\n", + "在`api.html`输出文件中,您可以确认Sphinx是否已将标题包含在交叉引用的定位标记中:\n", + "\n", + "```html\n", + "

 <p>Before reading this, try reading our\n", + " <a class=\"reference internal\" href=\"tutorial.html\">\n", + " <em>Beginners Tutorial</em>\n", + " </a>!</p>
\n", + "```\n", + "\n", + "如果您现在再次对“ tutorial.rst”文件顶部的标题进行编辑怎么办?\n", + "\n", + "您将使*三个*输出文件无效:\n", + "\n", + "1.现在`tutorial.html`顶部的标题已过期,因此需要重建文件。\n", + "\n", + "2. `index.html`中的目录仍然具有旧标题,因此需要重建文档。\n", + "\n", + "3.`api.html`第一段中的嵌入式交叉引用仍然具有旧的章节标题,因此也需要重新构建。\n", + "\n", + "Sphinx会做什么呢?\n", + "```\n", + " writing output... [ 50%] index\n", + " writing output... [100%] tutorial\n", + "```\n", + "哎呀\n", + "\n", + "仅重建了两个文件,而不是三个。\n", + "\n", + "Sphinx无法正确重建您的文档。\n", + "\n", + "如果您现在将`HTML`推送到网络上,则用户将在`api.html`顶部的交叉引用中看到旧标题,但是一旦链接将其带到`tutorial.html`,用户将看到另一个标题(新标题)。 `本身。\n", + "\n", + "Sphinx支持的多种交叉引用可能会发生这种情况:章标题,节标题,段落,类,方法和函数。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2 构建系统和一致性\n", + "\n", + "上面概述的问题并非特定于Sphinx。它不仅困扰着其他文档系统(例如LaTeX),而且甚至会困扰那些只是试图用古老的make工具指导编译步骤的项目,如果它们的资产碰巧以交叉方式进行了交叉引用。\n", + "\n", + "由于该问题是古老且普遍存在的,因此其解决方案具有同样长的沿袭:\n", + "\n", + "```bash\n", + " $ rm -r _build /\n", + " $ make html\n", + "```\n", + "如果删除所有输出,则可以保证完全重建!有些项目甚至将别名`rm -r`命名为一个目标,`clean`因此只需快速`make clean`擦拭即可。\n", + "\n", + "通过消除每一个中间或输出资产的每个副本,一个庞大的`rm -r`团队能够迫使该构建重新开始,而不会缓存任何内容,而不会存储可能会导致产品过时的早期状态。\n", + "\n", + "但是我们可以开发出更好的方法吗?\n", + "\n", + "如果您的构建系统是一个持续的过程,当它从一个文档的源代码传递到另一个文档的文本时,注意到每个章节标题,每个章节标题和每个交叉引用的短语,该怎么办?它关于更改单个源文件后是否重建其他文档的决定可以是精确的,而不是仅仅猜测,而是可以纠正的,而不是使输出保持不一致状态。\n", + "\n", + "结果将是一个像旧的静态`make`工具一样的系统,但是该系统在构建文件时就了解了文件之间的依赖关系-在添加,更新和删除交叉引用时动态地添加和删除了依赖关系。\n", + "\n", + "在以下各节中,我们将**使用Python构造一个名为Contingent的工具**。\n", + "\n", + "Contingent在存在动态依赖项的情况下保证正确性,同时执行最少的重建步骤。尽管它可以应用于任何问题领域,但我们将针对上面概述的一小部分问题运行它。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3 链接任务以制作图形\n", + "任何构建系统都需要一种链接输入和输出的方法。例如,在我们上面的讨论中,三个标记文本分别产生一个相应的HTML输出文件。表达这些关系的最自然的方法是将它们组合成一个盒子和箭头(或者用数学术语来说是节点和边缘)来形成图形。\n", + "\n", + "![](figure1.png) \n", + "\n", + "通过解析三个输入文本生成的三个文件。\n", + "\n", + "程序员用来解决构建系统问题的每种语言都将提供各种数据结构,用这些数据结构可以表示节点和边的图形。\n", + "\n", + "我们如何用Python表示这样的图?\n", + "\n", + 
"Python语言通过直接支持四种通用数据结构的语言语法来赋予它们优先级。您可以通过简单地在源代码中键入它们的文字表示形式来创建这些四大数据结构的新实例,并且它们的四个类型对象可以作为内置符号使用,而无需导入即可使用。\n", + "\n", + "该元组是用于保存异构数据只读序列-在元组中的每个时隙典型地是指不同的东西。在这里,元组将主机名和端口号放在一起,如果重新排序元素,它将失去其含义:\n", + "\n", + "```python\n", + "('dropbox.com', 443)\n", + "```\n", + "\n", + "**list**是用于保存同质数据的可变序列-每个项目通常具有与对等项目相同的结构和含义。\n", + "\n", + "列表既可以用于保留数据的原始输入顺序,也可以重新排列或排序以建立新的更有用的顺序。\n", + "\n", + "```python\n", + "['C', 'Awk', 'TCL', 'Python', 'JavaScript']\n", + "```\n", + "\n", + "**set**不保留顺序。 集合仅记住是否已添加给定值,而不记住多少次,因此记住用于从数据流中删除重复项的数据结构。 例如,以下两个集合将各自包含三个元素:\n", + "\n", + "```python\n", + "{3, 4, 5}\n", + "{3, 4, 5, 4, 4, 3, 5, 4, 5, 3, 4, 5}\n", + "```\n", + "\n", + "**dict**是用于存储键可访问值的关联数据结构。Dicts允许程序员选择索引每个值的键,而不是像tuple和list那样使用自动整数索引。查找由一个散列表支持,这意味着无论dict有12个键还是有100万个键,查找dict键的速度都是相同的。\n", + "\n", + "\n", + "```python\n", + "{'ssh': 22, 'telnet': 23, 'domain': 53, 'http': 80}\n", + "```\n", + "\n", + "Python灵活性的关键在于这四个数据结构是可组合的。 程序员可以将它们彼此任意嵌套以产生更复杂的数据存储,其规则和语法仍然是基本元组,列表,集合和字典中的简单规则。\n", + "\n", + "假设我们的每个图形边缘都需要至少知道其原始节点和目标节点,那么最简单的表示可能就是元组。\n", + "\n", + "顶部可能看起来像:\n", + "\n", + "```python\n", + " ('tutorial.rst', 'tutorial.html')\n", + "```\n", + "\n", + "我们如何存储多个边缘? 虽然我们最初的冲动可能只是简单地将所有边缘元组放入列表中,但这会带来不利条件。 列表会谨慎地保持顺序,但是谈论图形中边的绝对顺序没有意义。 即使我们只希望能够在`tutorial.rst`和` tutorial.html`之间绘制单个箭头,列表也会非常乐意保存完全相同的边缘的多个副本。 因此,正确的选择是集合,这将使我们表示为:\n", + "\n", + "```python\n", + " {('tutorial.rst', 'tutorial.html'),\n", + " ('index.rst', 'index.html'),\n", + " ('api.rst', 'api.html')}\n", + "```\n", + "\n", + "这将允许我们所有边缘的快速迭代,单个边缘的快速插入和删除操作,以及一种检查特定边缘是否存在的快速方法。\n", + "\n", + "不幸的是,这些并不是我们唯一需要的操作。\n", + "\n", + "像Contingent这样的构建系统需要了解给定节点与连接到该节点的所有节点之间的关系。 例如,当`api.rst`更改时,Contingent需要知道哪些资产(如果有)受该更改影响,以最大程度地减少执行的工作并确保完整的构建。 要回答这个问题-`api.rst`下游有哪些节点?” —我们需要检查`api.rst`中的“出局”边缘。\n", + "\n", + "但是构建依赖关系图需要Contingent也要考虑节点的`inputs`。 例如,当构建系统组装输出文档`tutorial.html`时,使用了哪些输入? 
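The cost of answering such upstream and downstream questions against a flat set of edge tuples can be made concrete. The following standalone sketch (our own illustration, not Contingent code) has to scan every edge for each query:

```python
# A flat set of (input, consequence) edge tuples, as described above.
edges = {('tutorial.rst', 'tutorial.html'),
         ('index.rst', 'index.html'),
         ('api.rst', 'api.html')}

# Downstream: what might become stale when api.rst changes?
# The flat set offers no index, so every edge must be examined.
consequences = {b for (a, b) in edges if a == 'api.rst'}
print(consequences)   # {'api.html'}

# Upstream: which inputs were used to build tutorial.html?
# Again a full scan over the entire edge set.
inputs = {a for (a, b) in edges if b == 'tutorial.html'}
print(inputs)         # {'tutorial.rst'}
```

Both queries cost time proportional to the total number of edges, which is what motivates trading duplicated storage for fast per-node lookups.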
通过观察每个节点的输入,Contingent可以知道`api.html`依赖于`api.rst`,而`tutorial.html`则不依赖。\n", + "\n", + "当源发生更改并进行重建时,Contingent会重建每个更改的节点的传入边缘,以删除潜在的陈旧边缘,并重新学习任务这次使用的资源。\n", + "\n", + "我们的元组组很难回答这些问题中的任何一个。 如果我们需要了解`api.html`与图的其余部分之间的关系,则需要遍历整个集合以查找以`api.html`节点开头或结尾的边。\n", + "\n", + "像Python的`dict`这样的关联数据结构将允许直接从特定节点中查找所有边缘,从而使这些琐事变得更加容易:\n", + "```python\n", + " {'tutorial.rst': {('tutorial.rst', 'tutorial.html')},\n", + " 'tutorial.html': {('tutorial.rst', 'tutorial.html')},\n", + " 'index.rst': {('index.rst', 'index.html')},\n", + " 'index.html': {('index.rst', 'index.html')},\n", + " 'api.rst': {('api.rst', 'api.html')},\n", + " 'api.html': {('api.rst', 'api.html')}}\n", + "```\n", + "查找特定节点的边缘现在将非常快,其代价是必须将每个边缘存储两次:一次存储在一组传入边缘中,一次存储在一组向外边缘中。\n", + "\n", + "但是必须手动检查每组中的边缘,以查看哪些入站和哪些出站。 在节点的边缘集中不断重复命名节点也是有点多余的。\n", + "\n", + "这两个反对意见的解决方案是将传入和传出的边放置在它们自己单独的数据结构中,这也将使我们不必为涉及的每个边都一遍又一遍地提及该节点。\n", + "\n", + "```python\n", + " incoming = {\n", + " 'tutorial.html': {'tutorial.rst'},\n", + " 'index.html': {'index.rst'},\n", + " 'api.html': {'api.rst'},\n", + " }\n", + "\n", + " outgoing = {\n", + " 'tutorial.rst': {'tutorial.html'},\n", + " 'index.rst': {'index.html'},\n", + " 'api.rst': {'api.html'},\n", + " }\n", + "```\n", + "注意,“ outgoing”直接用Python语法表示了我们之前所写的内容:构建系统会将左侧的源文档转换为右侧的输出文档。\n", + "\n", + "对于这个简单的示例,每个源仅指向一个输出-所有输出集都只有一个元素-但是不久之后我们将看到示例,其中单个输入节点具有多个下游后果。\n", + "\n", + "该集合字典数据结构中的每个边都得到两次表示,一次是从一个节点的出站边缘(` tutorial.rst`→` tutorial.html`),另一次是到另一节点的传入边缘(`tutorial.html`←`tutorial.rst`)。\n", + "\n", + "只是从边缘任一端的两个节点的相反角度来看,这两种表示形式捕获了完全相同的关系。\n", + "\n", + "但是作为这种冗余的回报,数据结构支持Contingent需要的快速查找。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4 Class的使用\n", + "\n", + "您可能对以上关于Python数据结构的讨论中缺少类感到惊讶。毕竟,类是构建应用程序的一种常见机制,并且在其拥护者和批评者之间进行激烈辩论的频率并不高。曾经有人认为班级很重要,可以围绕它们设计整个教育课程,并且大多数流行的编程语言都包含用于定义和使用它们的专用语法。\n", + "\n", + "但是事实证明,类通常与数据结构设计问题正交。类没有为我们提供完全替代的数据建模范例,而是仅重复了我们已经看到的数据结构:\n", + "\n", + "- 类实例被*实现*为字典。\n", + "- 类实例的*使用*就像可变的元组。\n", + 
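Both bullet points can be checked directly in the interpreter. A small standalone sketch, using a toy `Address` class of our own:

```python
class Address:
    def __init__(self, host, port):
        self.host = host
        self.port = port

a = Address('dropbox.com', 443)

# A class instance is *implemented* as a dict of attributes:
print(a.__dict__)            # {'host': 'dropbox.com', 'port': 443}

# ...and attribute syntax is sugar for a key lookup in that dict:
print(a.__dict__['host'])    # dropbox.com

# The instance is *used* like a mutable record (a writable tuple):
a.port = 8443
print(a.port)                # 8443
```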
"该类通过更漂亮的语法提供键查找,您可以在其中用`graph.incoming`代替`graph[\"incoming\"]`。但是,实际上,类实例几乎从未用作通用键值存储。相反,它们用于按属性名称组织相关但异构的数据,实现细节封装在一致且令人难忘的接口后面。\n", + "\n", + "因此,您不必创建一个主机名和一个端口号在元组中,而是必须记住哪个名在前,哪个名在后,而创建一个`Address`类,其实例分别具有`host`和`port`属性。然后,您可以将`Address`对象传递到否则会有匿名元组的位置。代码变得更易于阅读和编写。但是,使用类实例并不能真正改变我们在进行数据设计时遇到的任何问题。它只是提供了一个更漂亮,更匿名的容器。\n", + "\n", + "因此,类的真正价值不是在于它们改变了数据设计的科学。类的价值在于它们使您可以从程序的其余部分隐藏数据设计!\n", + "\n", + "成功的应用程序设计取决于我们利用Python提供的强大的内置数据结构的能力,同时最大程度地减少了随时需要记住的细节量。类提供了解决这一明显难题的机制:有效使用类,可以围绕系统整体设计的一些小子集提供外观。在一个子集(`Graph`例如`a`)中工作时,只要记住其他子集的接口,我们就可以忘记其他实现的细节。这样,程序员通常会发现自己在编写系统的过程中处于多个抽象层次之间,现在正在使用特定子系统的特定数据模型和实现细节,现在通过其接口连接了较高层次的概念。\n", + "\n", + "例如,从外部,代码可以简单地请求一个新`Graph`实例:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from contingent import graphlib\n", + "g = graphlib.Graph()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "无需了解Graph工作原理的详细信息。仅使用图形的代码在处理图形时(例如添加边或执行其他一些操作时)仅看到接口动词(即方法调用):" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "g.add_edge('index.rst', 'index.html')\n", + "g.add_edge('tutorial.rst', 'tutorial.html')\n", + "g.add_edge('api.rst', 'api.html')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "在没有显式创建“ node”和“ edge”对象的情况下,我们在图形中添加了边,并且在这些早期示例中,节点本身只是字符串。\n", + "\n", + "Python语言和社区明确并有目的地强调使用简单的通用数据结构来解决问题,而不是为要解决的问题的每一个细节创建自定义类。这是`Pythonic`解决方案概念的一个方面:Pythonic解决方案试图最大程度地减少语法开销,并利用Python强大的内置工具和广泛的标准库。\n", + "\n", + "考虑到这些考虑因素,让我们回到`Graph`类,检查其设计和实现,以查看数据结构和类接口之间的相互作用。\n", + "\n", + "Graph构造新实例时,已经使用上一节中概述的逻辑构建了一对字典来存储边:\n", + "\n", + "```python\n", + "class Graph:\n", + " \"\"\"A directed graph of the relationships among build tasks.\"\"\"\n", + "\n", + " def __init__(self):\n", + " self._inputs_of = defaultdict(set)\n", + " self._consequences_of = defaultdict(set)\n", + "```\n", + "\n", + "在属性名称前面的前导下划线`_inputs_of`和`_consequences_of 
`是在Python社区信号共同约定的属性是私有的。这种约定是社区建议程序员通过空间和时间彼此传递消息和警告的一种方式。认识到需要指出公共对象属性和内部对象属性之间的差异,社区采用了单个前导下划线作为对其他程序员(包括我们将来的自己)的简洁一致的指示,即该属性最好被视为内部无形内部机制的一部分。班级。\n", + "\n", + "为什么我们使用`defaultdict`标准指令而不是标准指令?将字典与其他数据结构组成时的常见问题是处理缺少的键。在正常情况下,检索不存在的键将引发`KeyError`:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "ename": "KeyError", + "evalue": "'index.rst'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mKeyError\u001b[0m Traceback (most recent call last)", + "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[0mconsequences_of\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;33m{\u001b[0m\u001b[1;33m}\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mconsequences_of\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;34m'index.rst'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0madd\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'index.html'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[1;31mKeyError\u001b[0m: 'index.rst'" + ] + } + ], + "source": [ + "consequences_of = {}\n", + "consequences_of['index.rst'].add('index.html')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "这种需求非常普遍,以至于Python包含一个特殊的实用工具,`defaultdict`您可以通过它提供一个返回缺少键值的函数。当我们询问`Graph`尚未看到的边缘时,我们将得到一个空的`set`而不是一个异常:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "set()" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from collections import defaultdict\n", + "consequences_of = defaultdict(set)\n", + "consequences_of['api.rst']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
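One side effect of `defaultdict` is worth flagging before relying on it: even a bare *read* of a missing key inserts the freshly built default into the dictionary, whereas a membership test does not. A quick sketch:

```python
from collections import defaultdict

consequences_of = defaultdict(set)

consequences_of['api.rst']      # merely *reading* the missing key...
print(len(consequences_of))     # 1 -- an empty set was inserted for it

# A membership test avoids that side effect:
print('index.rst' in consequences_of)   # False
print(len(consequences_of))             # still 1
```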
"通过这种方式来构造我们的实现,意味着每个键的首次使用看上去都与使用特定键的第二次及以后相同:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "consequences_of['index.rst'].add('index.html')\n", + "'index.html' in consequences_of['index.rst']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "有了这些技术,我们就可以检查的实现`add_edge`,我们之前曾使用它来构建图形。\n", + "\n", + "```python\n", + " def add_edge(self, input_task, consequence_task):\n", + " \"\"\"Add an edge: `consequence_task` uses the output of `input_task`.\"\"\"\n", + " self._consequences_of[input_task].add(consequence_task)\n", + " self._inputs_of[consequence_task].add(input_task)\n", + "```\n", + "这种方法掩盖了以下事实:每个新边都需要两个(而不是一个)存储步骤,以便我们在两个方向上都知道。并注意如何`add_edge()`不知道或不在乎之前是否曾见过任何一个节点。由于输入和后果数据结构均为`a defaultdict(set)`,因此该`add_edge()`方法对于节点的新颖性仍然一无所知- `defaultdict`通过动态创建新`set`对象来解决差异。正如我们在上面看到的,如果不使用`defaultdict`,`add_edge()`时间将增加三倍。更重要的是,对结果代码的理解和推理将更加困难。此实现演示了`Pythonic`解决问题的方法:简单,直接和简洁。\n", + "\n", + "还应该为调用者提供一种访问每个边缘的简单方法,而不必学习如何遍历我们的数据结构:\n", + "```python\n", + " def edges(self):\n", + " \"\"\"Return all edges as ``(input_task, consequence_task)`` tuples.\"\"\"\n", + " return [(a, b) for a in self.sorted(self._consequences_of)\n", + " for b in self.sorted(self._consequences_of[a])]\n", + "```\n", + "\n", + "该`Graph.sorted()`方法尝试按照可以为用户提供稳定输出顺序的自然排序顺序(例如字母顺序)对节点进行排序。\n", + "\n", + "通过使用这种遍历方法,我们可以看到,在前面的三个` add`方法调用之后, `g`现在表示如下:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('api.rst', 'api.html'),\n", + " ('index.rst', 'index.html'),\n", + " ('tutorial.rst', 'tutorial.html')]\n" + ] + } + ], + "source": [ + "from pprint import pprint\n", + "pprint(g.edges())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + 
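The double bookkeeping that `add_edge()` performs can be sketched without the `Graph` class at all. This standalone miniature (our own simplification, not the Contingent implementation) shows why each edge is worth storing twice:

```python
from collections import defaultdict

inputs_of = defaultdict(set)
consequences_of = defaultdict(set)

def add_edge(input_task, consequence_task):
    # Every edge is recorded twice, once per direction of lookup.
    consequences_of[input_task].add(consequence_task)
    inputs_of[consequence_task].add(input_task)

add_edge('tutorial.rst', 'tutorial.html')
add_edge('tutorial.rst', 'tutorial-title')
add_edge('tutorial-title', 'index.html')

# Downstream question: what might be stale if tutorial.rst changes?
print(sorted(consequences_of['tutorial.rst']))
# ['tutorial-title', 'tutorial.html']

# Upstream question: which inputs were used to build index.html?
print(sorted(inputs_of['index.html']))
# ['tutorial-title']
```

Each answer is now a single dictionary lookup rather than a scan over every edge.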
"由于我们现在有了一个真实的实时Python对象,而不仅仅是一个图形,因此我们可以向它提出有趣的问题!例如,当Contingent从源文件构建博客时,它将需要知道诸如“什么取决于`api.rst`?”之类的内容。当`api.rst`内容更改时:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['api.html']" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "g.immediate_consequences_of('api.rst')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`g`告诉Contingent,当`api.rst`更改时,`api.html`就会过时,必须重新构建。\n", + "\n", + "`index.html`呢?" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "g.immediate_consequences_of('index.html')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "返回了一个空列表,表示`index.html`在图的右边缘,因此如果更改,则无需再构建任何东西。由于已经进行了布局数据的工作,因此可以非常简单地表示此查询:\n", + "```python\n", + " def immediate_consequences_of(self, task):\n", + " \"\"\"Return the tasks that use `task` as an input.\"\"\"\n", + " return self.sorted(self._consequences_of[task])\n", + "```\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from contingent.rendering import as_graphviz\n", + "open('figure1.dot', 'w').write(as_graphviz(g)) and None" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "我们将为每个需要通过解析输入文件生成然后传递给我们的其他例程之一的标题字符串创建一个节点:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "g.add_edge('api.rst', 'api-title')\n", + "g.add_edge('api-title', 'index.html')\n", + "g.add_edge('tutorial.rst', 'tutorial-title')\n", + "g.add_edge('tutorial-title', 'index.html')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "[('api-title', 'index.html'),\n", + " ('api.rst', 'api-title'),\n", + " ('api.rst', 'api.html'),\n", + " ('index.rst', 'index.html'),\n", + " ('tutorial-title', 'index.html'),\n", + " ('tutorial.rst', 'tutorial-title'),\n", + " ('tutorial.rst', 'tutorial.html')]\n" + ] + } + ], + "source": [ + "pprint(g.edges())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](figure2.png)\n", + "\n", + "只要提及的标题发生变化,`index.html`随时准备重建。\n", + "本手册演练说明了Contingent最终将为我们做些什么:该图`g`捕获了项目文档中各种工件的输入和后果。\n", + "\n", + "## 5 学习联系\n", + "现在,我们有了一种方法,让Contingent可以跟踪任务及其之间的关系。但是,如果我们更仔细地查看上图,我们会发现它实际上有点波折和模糊:`api.rst`是怎么产生`api.html`的?我们如何知道`index.html`需要教程中的标题?以及如何解决这种依赖性?\n", + "\n", + "当我们手动构建后果图时,我们对这些想法的直觉概念就起作用了,但是不幸的是,计算机并不是非常直观的,因此我们需要更精确地了解我们想要的东西。\n", + "\n", + "从源产生输出需要采取什么步骤?如何定义和执行这些步骤?Contingent如何知道它们之间的联系?\n", + "\n", + "在Contingent中,构建任务被定义为“函数加参数”。\n", + "\n", + "- 这些函数定义特定项目理解如何执行的动作。\n", + "- 这些参数提供了具体信息:应阅读哪个源文档,需要哪个博客标题。\n", + "\n", + "当它们运行时,这些函数可以依次调用其他任务函数,并传递它们需要答案的任何参数。\n", + "\n", + "为了了解它是如何工作的,我们现在实际上将实现开头描述的文档构建器。为了避免陷入困境,在本例中,我们将使用简化的输入和输出文档格式。我们的输入文档将在第一行包含一个标题,其余文本构成正文。交叉引用将只是反引号中包含的源文件名,在输出中将其替换为输出中相应文档的标题。\n", + "\n", + "以下是示例`index.txt`,`api.txt`和`tutorial.txt`的内容,包括格式的标题,文档正文和交叉引用:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "index = \"\"\"\n", + " Table of Contents\n", + " -----------------\n", + " * `tutorial.txt`\n", + " * `api.txt`\n", + " \"\"\"\n", + "\n", + "tutorial = \"\"\"\n", + " Beginners Tutorial\n", + " ------------------\n", + " Welcome to the tutorial!\n", + " We hope you enjoy it.\n", + " \"\"\"\n", + "\n", + "api = \"\"\"\n", + " API Reference\n", + " -------------\n", + " You might want to read\n", + " the `tutorial.txt` first.\n", + " \"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "现在我们有了一些可以使用的原始资料,基于Contingent的博客构建者需要哪些功能?\n", + "\n", + 
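Before wiring these sources into tasks, the miniature document format can be exercised by hand. This sketch is our own (though the `parse()` task defined in the notebook follows the same title/body split) and uses only the standard library:

```python
import re

tutorial = """\
Beginners Tutorial
------------------
Welcome to the tutorial!
We hope you enjoy it.
"""

# Title on the first line, an underline on the second, body afterwards.
lines = tutorial.strip().splitlines()
title, body = lines[0], '\n'.join(lines[2:])
print(title)    # Beginners Tutorial

# Cross-references are simply file names wrapped in backticks.
api_body = "You might want to read the `tutorial.txt` first."
print(re.findall(r'`([^`]+)`', api_body))   # ['tutorial.txt']
```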
"在上面的简单示例中,HTML输出文件直接从源代码开始,但是在实际的系统中,将源代码转换为标记涉及几个步骤:\n", + "- 从磁盘读取原始文本\n", + "- 将文本解析为方便的内部表示形式\n", + "- 处理所有指令。\n", + "\n", + "作者可能已经指定,解决了交叉引用或其他外部依赖项(例如include文件),并应用了一个或多个视图转换将内部表示形式转换为其输出形式。\n", + "\n", + "Contingent通过将任务分组到一个“Project”来管理任务,这是一种构建系统的多管闲事者,它将自己注入到构建过程的中间,注意到每次一个任务与另一个任务对话,以构建所有任务之间的关系图。\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "from contingent.projectlib import Project, Task\n", + "project = Project()\n", + "task = project.task" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "本练习开头给出的示例的构建系统可能涉及一些任务。\n", + "\n", + "我们的`read()`任务将假装从磁盘读取文件。 由于我们确实在变量中定义了源文本,因此只需将文件名转换为相应的文本即可。" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "filesystem = {'index.txt': index,\n", + " 'tutorial.txt': tutorial,\n", + " 'api.txt': api}\n", + "@task\n", + "def read(filename):\n", + " return filesystem[filename]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`parse()`任务根据我们文档格式的规范解释文件内容的原始文本。\n", + "\n", + "我们的格式非常简单:文档标题显示在第一行,其余内容被视为文档正文。" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "@task\n", + "def parse(filename):\n", + " lines = read(filename).strip().splitlines()\n", + " title = lines[0]\n", + " body = '\\n'.join(lines[2:])\n", + " return title, body" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "请注意`parse()`和`read()`之间的解析的第一个任务是将给定的文件名传递给`read()`,文件名将查找并返回该文件的内容。\n", + "\n", + "`title_of()`给定源文件名称的任务将返回文档的标题:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "@task\n", + "def title_of(filename):\n", + " title, body = parse(filename)\n", + " return title" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "最后的任务, 
`render()`将文档的内存表示形式转换为输出形式。实际上,它是`parse()`的逆操作:`parse()`接受符合规范的输入文档并将其转换为内存中表示,而`render()`接受内存中表示并生成符合规范的输出文档。" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "LINK = '<a href=\"{}\">{}</a>'\n", + "PAGE = '<h1>{}</h1>\\n<p>\\n{}\\n<p>'\n", + "\n", + "def make_link(match):\n", + " filename = match.group(1)\n", + " return LINK.format(filename, title_of(filename))\n", + "\n", + "@task\n", + "def render(filename):\n", + " title, body = parse(filename)\n", + " body = re.sub(r'`([^`]+)`', make_link, body)\n", + " return PAGE.format(title, body)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "

<h1>Beginners Tutorial</h1>\n", + "<p>\n", + " Welcome to the tutorial!\n", + " We hope you enjoy it.\n", + "<p>
\n" + ] + } + ], + "source": [ + "print(render('tutorial.txt'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](figure3.png)\n", + "说明任务图,该任务图可过渡地连接生成输出所需的所有任务,从读取输入文件到解析和转换文档并呈现文档。\n", + "\n", + "每次调用新任务时,Contingent都可以假定当前位于堆栈顶部的任务已调用该任务,并且将使用其输出。维护堆栈将需要几个额外的步骤来围绕任务$T$的调用:\n", + "\n", + "- 将$T$推入堆栈。\n", + "- 执行$T$,让它调用它需要的任何其他任务。\n", + "- 将$T$弹出堆栈。\n", + "- 返回其结果。\n", + "\n", + "为了拦截任务调用,Project利用了Python的一项关键功能:`function decorators`。装饰器可以在定义函数时对其进行处理或转换。该`Project.task`装饰用这个机会来包装的另一个功能,里面每个任务的包装,这使得包装之间的责任完全分离-这有可能会担心图形和堆栈管理代表项目-而我们的任务功能专注于文档处理。\n", + "\n", + "这是`task`装饰器样板的外观:\n", + "\n", + "```python\n", + " from functools import wraps\n", + "\n", + " def task(function):\n", + " @wraps(function)\n", + " def wrapper(*args):\n", + " # wrapper body, that will call function()\n", + " return wrapper\n", + "```\n", + "这是一个典型的Python装饰器声明。然后,可以通过`@`在`def`创建函数的字符顶部命名该函数,将其应用于函数:\n", + "\n", + "```python\n", + " @task\n", + " def title_of(filename):\n", + " title, body = parse(filename)\n", + " return title\n", + "```\n", + "完成此定义后,`title_of`将引用该函数的包装版本。包装器可以通过名称访问函数的原始版本`function`,并在适当的时间对其进行调用。Contingent包装器的主体运行如下内容:\n", + "\n", + "```python\n", + " def task(function):\n", + " @wraps(function)\n", + " def wrapper(*args):\n", + " #----------------\n", + " task = Task(wrapper, args)\n", + " if self.task_stack:\n", + " self._graph.add_edge(task, self.task_stack[-1])\n", + " self._graph.clear_inputs_of(task)\n", + " self._task_stack.append(task)\n", + " try:\n", + " value = function(*args)\n", + " finally:\n", + " self._task_stack.pop()\n", + "\n", + " return value\n", + " #---------------\n", + " return wrapper\n", + "```\n", + "\n", + "该包装器执行几个关键的维护步骤:\n", + "\n", + "1. 为方便起见,将任务(一个函数及其参数)打包到一个小对象中。wrapper在此命名为函数的包装版本。\n", + "\n", + "2. 如果此任务已由正在执行的当前任务调用,则添加一条边,以捕获该任务是已在运行的任务的输入这一事实。\n", + "\n", + "3. 忘记我们上一次可能在该任务上学到的知识,因为这一次可能会做出新的决定-例如,如果API指南的源文本不再提及该Tutorial,那么`render()`它将不再请求该`Tutorial`文档`title_of()`。\n", + "\n", + "4. 
将该任务推入任务堆栈的顶部,以防其决定在执行工作时调用其他任务。\n", + "\n", + "5. 在`try...finally`块内调用任务,以确保我们正确完成了从堆栈中删除的任务,即使该任务因引发异常而死亡。\n", + "\n", + "6. 返回任务的返回值,以便此包装的调用者将无法得知他们没有简单地调用普通任务函数本身。\n", + "\n", + "步骤4和5维护任务堆栈本身,然后由步骤2用于执行结果跟踪,这是我们首先构建任务堆栈的全部原因。\n", + "\n", + "由于每个任务都被其自身的包装函数副本所包围,因此,正常任务堆栈的单纯调用和执行将产生关系图,这是看不见的副作用。因此,我们谨慎地在定义的每个处理步骤周围使用包装器:\n", + "```python\n", + " @task\n", + " def read(filename):\n", + " # body of read\n", + "\n", + " @task\n", + " def parse(filename):\n", + " # body of parse\n", + "\n", + " @task\n", + " def title_of(filename):\n", + " # body of title_of\n", + "\n", + " @task\n", + " def render(filename):\n", + " # body of render\n", + "```\n", + "\n", + "当我们调用`parse('tutorial.txt') `装饰器时,我们了解了`parse`和`read`之间的联系。我们可以通过建立另一个`Task`元组来询问这种关系,并询问如果其输出值更改会带来什么后果:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "read('tutorial.txt')\n" + ] + } + ], + "source": [ + "task = Task(read, ('tutorial.txt',))\n", + "print(task)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[parse('tutorial.txt')]" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "project._graph.immediate_consequences_of(task)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "重新读取`tutorial.txt`文件并发现其内容已更改的结果时,我们需要重新执行该文档的`parse()`例程。\n", + "\n", + "如果我们渲染整个文档集会怎样?Contingent是否能够学习整个构建过程?" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "

<h1>Table of Contents</h1>\n", + "<p>\n", + " * <a href=\"tutorial.txt\">Beginners Tutorial</a>\n", + " * <a href=\"api.txt\">API Reference</a>\n", + "<p>\n", + "==============================\n", + "<h1>Beginners Tutorial</h1>\n", + "<p>\n", + " Welcome to the tutorial!\n", + " We hope you enjoy it.\n", + "<p>\n", + "==============================\n", + "<h1>API Reference</h1>\n", + "<p>\n", + " You might want to read\n", + " the <a href=\"tutorial.txt\">Beginners Tutorial</a> first.\n", + "<p>
\n", + "==============================\n" + ] + } + ], + "source": [ + "for filename in 'index.txt','tutorial.txt','api.txt':\n", + " print(render(filename))\n", + " print('=' * 30)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It works!\n", + "\n", + "从输出中,我们可以看到转换将源文档中的指令标题替换为文档标题,表明Contingent能够发现构建文档所需的各种任务之间的联系。\n", + "\n", + "![](figure4.png)\n", + "\n", + "通过观察一个任务,通过task包装机调用另一个任务, Project就自动了解了输入和后果图。由于它具有完整的结果图可供使用,如果任何任务的输入发生变化,Contingent都知道要重建的所有事物。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 6 追踪后果\n", + "\n", + "初始构建运行完成后,Contingent需要监视输入文件的更改。当用户完成一个新的编辑并运行“保存”时,该`read()`方法及其*后果*都需要被调用。\n", + "\n", + "这将要求我们以与创建图形相反的顺序移动图形。您会回想起,它是通过为API参考调用`render()`和`parse()`,并最终调用该`read()`任务而构建的。现在我们朝另一个方向前进:我们知道`read()`现在将返回新的内容,并且我们需要弄清楚其将产生什么后果。\n", + "\n", + "编译结果的过程是一个递归过程,因为每个结果本身可以有其他依赖于此的任务。我们可以通过重复调用图形来手动执行此递归。(请注意,我们在这里利用了Python提示符保存名称下显示的最后一个值_供后续表达式使用的事实。)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[parse('api.txt')]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "task = Task(read, ('api.txt',))\n", + "project._graph.immediate_consequences_of(task)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[render('api.txt'), title_of('api.txt')]" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t1, = _\n", + "project._graph.immediate_consequences_of(t1)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t2, t3 = _\n", + "project._graph.immediate_consequences_of(t2)" + ] + }, + { + "cell_type": "code", + 
"execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[render('index.txt')]" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "project._graph.immediate_consequences_of(t3)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "t4, = _\n", + "project._graph.immediate_consequences_of(t4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "这种不断寻找直接后果、直到到达不再有后果的任务才停止的递归,是一种足够基础的图操作,`Graph`类中的一个方法直接支持它:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[parse('api.txt'),\n", + " render('api.txt'),\n", + " title_of('api.txt'),\n", + " render('index.txt')]\n" + ] + } + ], + "source": [ + "# Secretly adjust pprint to a narrower-than-usual width:\n", + "_pprint = pprint\n", + "pprint = lambda x: _pprint(x, width=40)\n", + "pprint(project._graph.recursive_consequences_of([task]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "实际上,`recursive_consequences_of()`尝试变得聪明一点:如果某个任务作为其他多个任务的下游后果而重复出现,它会注意只在输出列表中提及一次,并把它移到列表末尾,使其只出现在作为其输入的那些任务之后。这种智能由经典的深度优先拓扑排序实现提供,借助一个隐藏的递归辅助函数,它在Python中写起来相当容易。详情请查看[graphlib.py](contingent/graphlib.py)的源代码。\n", + "\n", + "如果在检测到更改后,我们谨慎地重新运行其递归后果中的每个任务,Contingent就能避免重建得太少。但我们的第二个挑战是避免重建得太多。再次参考下图。\n", + "![](figure4.png)\n", + "\n", + "我们希望避免每次`tutorial.txt`更改时都重建全部三个文档,因为大多数编辑可能只影响其正文而不影响其标题。如何做到这一点?\n", + "\n", + "解决方案是让图的重新计算依赖于缓存:在逐步处理某次更改的递归后果时,只调用那些输入与上次不同的任务。\n", + "\n", + "此优化会引入最后一个数据结构。我们给`Project`一个`_todo`集合,用于记住至少有一个输入值已更改、因此需要重新执行的每个任务。因为只有`_todo`中的任务是过期的,所以构建过程可以跳过不在其中的任何任务。\n", + "\n", + "同样,Python方便且统一的设计使这些功能非常易于编码。由于任务对象是可散列的,因此 
`_todo`可以简单地是一个集合(set),它按同一性记住任务项(保证任务永远不会出现两次),而保存先前运行返回值的`_cache`则可以是一个以任务为键的字典(dict)。\n", + "\n", + "更准确地说,只要`_todo`非空,重建步骤就必须保持循环。在每轮循环中,它应该:\n", + "\n", + "- 调用`recursive_consequences_of()`并传入`_todo`中列出的每个任务。返回值不仅包含`_todo`中的任务本身,还包含它们下游的每个任务,也就是这次输出若有不同就可能需要重新执行的每个任务。\n", + "\n", + "- 对于列表中的每个任务,检查它是否列在`_todo`中。如果没有,我们就可以跳过运行它,因为在它上游被重新调用的任务中,没有任何一个产生了需要它重新计算的新返回值。\n", + "\n", + "- 但是,对于当我们到达它时确实列在`_todo`中的任务,我们需要让它重新运行并重新计算其返回值。如果任务包装函数发现该返回值与旧的缓存值不一致,那么在我们沿着递归后果列表继续向下、到达其下游任务之前,这些下游任务就会被自动加入`_todo`。\n", + "\n", + "当我们到达列表末尾时,所有实际需要重新运行的任务都应该已经重新运行过了。但为了以防万一,我们会检查`_todo`是否已清空,若尚未清空则重试。即使对于变化非常快的依赖树,这也应该会很快收敛。\n", + "\n", + "只有循环依赖(例如,任务A需要任务B的输出,而任务B本身又需要任务A的输出)才可能使构建器陷入无限循环,并且前提是任务的返回值永远不会稳定下来。幸运的是,实际的构建任务通常是无环的。\n", + "\n", + "让我们通过一个示例来跟踪该系统的行为。\n", + "```\n", + "tutorial = \"\"\"\n", + " Beginners Tutorial\n", + " ------------------\n", + " Welcome to the tutorial!\n", + " We hope you enjoy it.\n", + " \"\"\"\n", + "``` \n", + "假设您编辑`tutorial.txt`,更改了标题和正文内容。我们可以通过修改`filesystem`字典中的值来模拟这一点:" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "filesystem['tutorial.txt'] = \"\"\"\n", + " The Coder Tutorial\n", + " ------------------\n", + " This is a new and improved\n", + " introductory paragraph.\n", + " \"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "现在内容已更改,我们可以使用`cache_off()`上下文管理器要求项目重新运行`read()`任务;该上下文管理器会暂时禁止项目为给定的任务和参数返回旧的缓存结果:" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "with project.cache_off():\n", + " 
text = read('tutorial.txt')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "现在,新的教程文本已读入缓存。有多少下游任务需要重新执行?\n", + "\n", + "为了帮助我们回答这个问题,`Project`类支持一个简单的跟踪工具,它会告诉我们在重建过程中执行了哪些任务。由于上面对`tutorial.txt`的更改同时影响了它的标题和正文,因此其下游的所有内容都需要重新计算:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "calling parse('tutorial.txt')\n", + "calling render('tutorial.txt')\n", + "calling title_of('tutorial.txt')\n", + "calling render('api.txt')\n", + "calling render('index.txt')\n" + ] + } + ], + "source": [ + "project.start_tracing()\n", + "project.rebuild()\n", + "print(project.stop_tracing())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](figure4.png)\n", + "回顾上图,您会发现,正如预期的那样,这正是`read('tutorial.txt')`的直接或下游后果所包含的全部任务。\n", + "\n", + "但是,如果我们再次编辑它,而这次标题保持不变呢?" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "filesystem['tutorial.txt'] = \"\"\"\n", + " The Coder Tutorial\n", + " ------------------\n", + " Welcome to the coder tutorial!\n", + " It should be read top to bottom.\n", + " \"\"\"\n", + "\n", + "with project.cache_off():\n", + " 
text = read('tutorial.txt')" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "calling parse('tutorial.txt')\n", + "calling render('tutorial.txt')\n", + "calling title_of('tutorial.txt')\n" + ] + } + ], + "source": [ + "project.start_tracing()\n", + "project.rebuild()\n", + "print(project.stop_tracing())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "成功!\n", + "\n", + "这次只重建了一个文档。虽然`title_of()`面对新的输入文档重新运行了,但它返回了与之前相同的值,这意味着其余所有下游任务都不受此次更改的影响,因此没有被重新调用。\n", + "\n", + "## 结论\n", + "\n", + "在用Python实现Contingent时,我们跳过了创建`TaskArgument`、`CachedResult`和`ConsequenceList`等十几种可能的类。相反,我们借鉴了Python用通用数据结构解决一般性问题的悠久传统,让代码反复运用元组、列表、集合和字典这一小组核心数据结构。\n", + "\n", + "归功于严格的封装原则(只允许`Graph`的代码触碰图的集合,只允许`Project`的代码触碰项目的集合),如果某个集合操作在项目后期返回错误,出错之处永远不会有歧义:错误发生时最内层正在执行的方法名,必然会把我们引向出错所涉及的类和集合。只要我们愿意在数据结构属性名前加上约定的下划线前缀,并注意不从类外部的代码触碰它们,就不必为这种数据类型的每一种可能用途都创建`set`的子类。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "cgsource", + "language": "python", + "name": "cgsource" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure1.dot" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure1.dot" new file mode 100644 index 0000000..1965abb --- /dev/null +++ "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build 
System/figure1.dot" @@ -0,0 +1,22 @@ +digraph { +graph [rankdir=LR]; +node [fontname=Arial shape=rect penwidth=2 color="#DAB21D" + style=filled fillcolor="#F4E5AD"] +{rank=same +"api.rst" +"index.rst" +"tutorial.rst" +} +node [shape=rect penwidth=2 color="#708BA6" + style=filled fillcolor="#DCE9ED"] +{rank=same +"api.html" +"index.html" +"tutorial.html" +} +node [shape=oval penwidth=0 style=filled fillcolor="#E8EED2" + margin="0.05,0"] +"api.rst" -> "api.html" +"index.rst" -> "index.html" +"tutorial.rst" -> "tutorial.html" +} \ No newline at end of file diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure1.png" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure1.png" new file mode 100644 index 0000000..f3e5e84 Binary files /dev/null and "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure1.png" differ diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure2.png" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure2.png" new file mode 100644 index 0000000..b1bedf5 Binary files /dev/null and "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure2.png" differ diff --git "a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure3.png" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure3.png" new file mode 100644 index 0000000..5982e8f Binary files /dev/null and "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure3.png" differ diff --git 
"a/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure4.png" "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure4.png" new file mode 100644 index 0000000..dc5f415 Binary files /dev/null and "b/\345\212\250\346\200\201\346\236\204\345\273\272\347\263\273\347\273\237Contingent A Fully Dynamic Build System/figure4.png" differ
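作为全章思路的一个收尾示例,下面用几十行 Python 勾勒出本文介绍的两个核心机制:用任务栈记录"谁消费了谁的输出",以及在重建时仅当任务返回值发生变化才沿后果图向下传播。注意,这只是一个说明性草图,并非 contingent 库的真实代码;`TinyProject`、`invalidate()` 等名称均为本文为演示而假设的。

```python
# 说明性草图:不是 contingent 的真实实现;TinyProject、invalidate 等名称为演示而虚构。
from functools import wraps

class TinyProject:
    def __init__(self):
        self._stack = []         # 当前正在执行的任务栈
        self._consequences = {}  # 任务 -> 使用过其输出的下游任务集合
        self._cache = {}         # 任务 -> 上一次的返回值
        self.trace = []          # 实际执行过的任务名,便于观察重建范围

    def task(self, function):
        @wraps(function)
        def wrapper(*args):
            this = (wrapper, args)
            if self._stack:
                # 栈顶任务正在消费我们的输出:记录一条后果边
                self._consequences.setdefault(this, set()).add(self._stack[-1])
            self._stack.append(this)
            try:
                value = function(*args)
            finally:
                self._stack.pop()
            self.trace.append(function.__name__)
            self._cache[this] = value
            return value
        return wrapper

    def invalidate(self, task):
        """重新运行 task;仅当其返回值变化时,才递归传播到下游后果。"""
        wrapper, args = task
        old = self._cache.get(task)
        new = wrapper(*args)
        if new != old:
            for downstream in list(self._consequences.get(task, ())):
                self.invalidate(downstream)

project = TinyProject()
filesystem = {'doc.txt': 'Title\n-----\noriginal body'}

@project.task
def read(name):
    return filesystem[name]

@project.task
def title_of(name):
    return read(name).splitlines()[0]

@project.task
def link_to(name):
    return '<a>{}</a>'.format(title_of(name))

html = link_to('doc.txt')   # 初次构建:记录 read -> title_of -> link_to 的后果边

project.trace.clear()
filesystem['doc.txt'] = 'Title\n-----\na brand new body'
project.invalidate((read, ('doc.txt',)))
# 正文变了但标题没变:title_of 被重跑,但返回值相同,于是 link_to 被跳过
```

运行后可以看到,当只修改正文而标题不变时,`title_of` 会重新运行,但它的消费者 `link_to` 不会被重新调用:这正是上文 `_todo` 与 `_cache` 配合所实现的"避免重建过多"的选择性重建思想。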