<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title><![CDATA[Hadoop NameNode Startup: The Heartbeat Mechanism]]></title>
<url>http://yoursite.com/2018/02/02/Hadoop-NameNode%E5%90%AF%E5%8A%A8%E4%B9%8B%E5%BF%83%E8%B7%B3%E6%9C%BA%E5%88%B6/</url>
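<content type="html"><![CDATA[<h3 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h3><p>How is the HDFS heartbeat mechanism designed? If we built one ourselves, how would we decide whether a service is still alive?<br>Suppose that instead of a distributed program we have a multithreaded one with 10 threads: one master thread and 9 slave threads.<br>The master needs to know whether the 9 slave threads are still alive and working.<br>Links to some good articles on the topic are given at the end of this post.</p>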
<a id="more"></a>
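<p>To make the preface's thought experiment concrete, here is a minimal sketch (not taken from Hadoop) of how a master thread might track its 9 slave threads: each slave records a timestamp whenever it "beats", and the master treats any slave whose latest timestamp is older than a timeout as dead. The class <code>ThreadHeartbeatRegistry</code>, its methods, and the 3-second timeout used in the example are hypothetical; timestamps are passed in explicitly so the logic can be checked without sleeping.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each slave thread records a timestamp on every
// heartbeat; the master treats any slave whose last timestamp is older than
// the timeout as dead. Times are passed in explicitly to keep it testable.
class ThreadHeartbeatRegistry {
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    ThreadHeartbeatRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by a slave thread to report that it is alive at time nowMillis.
    void beat(String slave, long nowMillis) {
        lastBeat.put(slave, nowMillis);
    }

    // Called periodically by the master: returns the slaves whose last
    // heartbeat is older than the timeout, sorted by name.
    List<String> expired(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }
}
```

<p>In a real program each slave would call <code>beat(name, System.currentTimeMillis())</code> from its work loop and the master would call <code>expired(...)</code> on a timer. <code>ConcurrentHashMap</code> lets both run concurrently without an explicit lock; its iteration is weakly consistent, which is acceptable for a periodic liveness check.</p>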
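<h3 id="源码分析"><a href="#源码分析" class="headerlink" title="源码分析"></a>Source Code Analysis</h3><p>So who sends heartbeats to whom? The DataNode sends heartbeats to the NameNode, and the NameNode uses them to decide whether a DataNode has failed. How, then, does the DataNode send its heartbeats?</p>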
<p>DataNode.java<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</div><div class="line"> LOG.info(<span class="string">"Starting DataNode in: "</span>+data.data);</div><div class="line"> <span class="keyword">while</span> (shouldRun) { <span class="comment">// shouldRun flips from true to false when the DataNode shuts down</span></div><div class="line"> <span class="keyword">try</span> {</div><div class="line"> offerService(); <span class="comment">// look here: this is the heartbeat loop</span></div><div class="line"> } <span class="keyword">catch</span> (Exception ex) {</div><div class="line"> LOG.info(<span class="string">"Exception: "</span> + ex);</div><div class="line"> <span class="keyword">if</span> (shouldRun) {</div><div class="line"> LOG.info(<span
class="string">"Lost connection to namenode. Retrying..."</span>);</div><div class="line"> <span class="keyword">try</span> {</div><div class="line"> Thread.sleep(<span class="number">5000</span>); <span class="comment">// after a failure, wait 5 s before retrying (this is not the heartbeat interval)</span></div><div class="line"> } <span class="keyword">catch</span> (InterruptedException ie) {</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> LOG.info(<span class="string">"Finishing DataNode in: "</span>+data.data);</div><div class="line">}</div><div class="line"></div><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">offerService</span><span class="params">()</span> <span class="keyword">throws</span> Exception </span>{</div><div class="line"> <span class="keyword">long</span> wakeups = <span class="number">0</span>;</div><div class="line"> <span class="keyword">long</span> lastHeartbeat = <span class="number">0</span>, lastBlockReport = <span class="number">0</span>;</div><div class="line"> <span class="keyword">long</span> sendStart = System.currentTimeMillis();</div><div class="line"> <span class="keyword">int</span> heartbeatsSent = <span class="number">0</span>;</div><div class="line"> LOG.info(<span class="string">"using BLOCKREPORT_INTERVAL of "</span> + blockReportInterval + <span class="string">"msec"</span>);</div><div class="line"></div><div class="line"> <span class="keyword">while</span> (shouldRun) {</div><div class="line"> <span class="keyword">long</span> now = System.currentTimeMillis();</div><div class="line"> <span class="keyword">synchronized</span> (receivedBlockList) {</div><div class="line"> <span class="comment">// HEARTBEAT_INTERVAL = 3000 (3s)</span></div><div class="line"> <span class="comment">// send a heartbeat once more than 3 s have passed since the last one</span></div><div class="line"> <span 
class="keyword">if</span> (now - lastHeartbeat > HEARTBEAT_INTERVAL) {</div><div
class="line"> <span class="comment">// heartbeat to the NameNode, reporting total and remaining capacity</span></div><div class="line"> namenode.sendHeartbeat(localName, data.getCapacity(), data.getRemaining());</div><div class="line"> <span class="comment">// remember when the last heartbeat was sent</span></div><div class="line"> lastHeartbeat = now;</div><div class="line"> }</div><div class="line"> }</div><div class="line"> ...</div><div class="line"> } </div><div class="line">}</div></pre></td></tr></table></figure></p>
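<p>The timing rule inside <code>offerService()</code> can be isolated and tested on its own. The sketch below is an illustration, not Hadoop code: the class <code>HeartbeatScheduler</code> and the helper <code>sendTimes</code> are hypothetical, the RPC call <code>namenode.sendHeartbeat(...)</code> is left out, and loop times are supplied explicitly. Because the check uses a strict <code>&gt;</code>, a heartbeat goes out only once strictly more than 3 s have passed; and because <code>lastHeartbeat</code> starts at 0, with real wall-clock times the very first pass always sends.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the timing rule in offerService(): on each pass of
// the loop, a heartbeat is sent only if more than HEARTBEAT_INTERVAL ms have
// passed since the last one. Loop times are supplied explicitly for testing.
class HeartbeatScheduler {
    static final long HEARTBEAT_INTERVAL = 3000; // 3 s, as in the excerpt

    private long lastHeartbeat = 0; // same initial value as in offerService()

    // Returns true if a heartbeat should be sent at time `now`.
    boolean shouldSend(long now) {
        if (now - lastHeartbeat > HEARTBEAT_INTERVAL) {
            lastHeartbeat = now; // record when the last heartbeat went out
            return true;
        }
        return false;
    }

    // Which of the given loop times would trigger a heartbeat.
    static List<Long> sendTimes(long... loopTimes) {
        HeartbeatScheduler s = new HeartbeatScheduler();
        List<Long> sent = new ArrayList<>();
        for (long t : loopTimes) {
            if (s.shouldSend(t)) {
                sent.add(t);
            }
        }
        return sent;
    }
}
```

<p>For loop passes at 1000, 3500, 5000 and 7000 ms, heartbeats go out at 3500 and 7000 ms only.</p>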
<p>Next, let's look at how the NameNode handles the heartbeat.</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">sendHeartbeat</span><span class="params">(String sender, <span class="keyword">long</span> capacity, <span class="keyword">long</span> remaining)</span> </span>{</div><div class="line"> namesystem.gotHeartbeat(<span class="keyword">new</span> UTF8(sender), capacity, remaining);</div><div
class="line">}</div><div class="line"></div><div class="line"><span class="comment">// At first I thought that, since this method already holds the object lock, heartbeats and datanodeMap needed no locks of their own.</span></div><div class="line"><span class="comment">// On reflection that is wrong: holding the lock on this object does not stop other threads (which may lock only the collections) from modifying</span></div><div class="line"><span class="comment">// the shared collections, and once other objects hold references to them, thread-safety problems reappear [see Example 1]</span></div><div class="line"></div><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">gotHeartbeat</span><span class="params">(UTF8 name, <span class="keyword">long</span> capacity, <span class="keyword">long</span> remaining)</span> </span>{</div><div class="line"> <span class="keyword">synchronized</span> (heartbeats) { <span class="comment">// TreeSet of the DataNodes tracked by heartbeat</span></div><div class="line"> <span class="keyword">synchronized</span> (datanodeMap) {</div><div class="line"> LOG.info(<span class="string">"currThread Run => "</span> + Thread.currentThread().getName());</div><div class="line"> LOG.info(<span class="string">"cur sync ? "</span> + <span
class="keyword">this</span>);</div><div class="line"></div><div class="line"> <span class="keyword">long</span> capacityDiff = <span class="number">0</span>;</div><div class="line"> <span class="keyword">long</span> remainingDiff = <span class="number">0</span>;</div><div class="line"> <span class="comment">// look up this node's stored info</span></div><div class="line"> DatanodeInfo nodeinfo = (DatanodeInfo) datanodeMap.get(name);</div><div class="line"></div><div class="line"> <span class="comment">// a brand-new node</span></div><div class="line"> <span class="keyword">if</span> (nodeinfo == <span class="keyword">null</span>) {</div><div class="line"> LOG.info(<span class="string">"Got brand-new heartbeat from "</span> + name);</div><div class="line"> nodeinfo = <span class="keyword">new</span> DatanodeInfo(name, capacity, remaining);</div><div class="line"> <span class="comment">// add it to datanodeMap</span></div><div class="line"> datanodeMap.put(name, nodeinfo);</div><div class="line"> capacityDiff = capacity;</div><div class="line"> remainingDiff = remaining;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="comment">// the interesting part:</span></div><div class="line"> <span class="comment">// the node has reported its latest total and remaining capacity; diff them against the stored values</span></div><div class="line"> <span class="comment">// e.g. 
if the server gained a new, larger disk,</span></div><div class="line"> <span class="comment">// the change shows up in these two diffs</span></div><div class="line"> capacityDiff = capacity - nodeinfo.getCapacity();</div><div class="line"> remainingDiff = remaining - nodeinfo.getRemaining();</div><div class="line"> <span class="comment">// remove the node whether or not anything changed: heartbeats is ordered by lastUpdate, so the node must leave the set before its timestamp changes</span></div><div class="line"> heartbeats.remove(nodeinfo);</div><div class="line"> <span class="comment">// update the node's timestamp and capacity</span></div><div class="line"> nodeinfo.updateHeartbeat(capacity, remaining);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// re-insert the updated node into heartbeats</span></div><div
class="line"> heartbeats.add(nodeinfo);</div><div class="line"> <span class="comment">// and adjust the cluster-wide totals</span></div><div class="line"> totalCapacity += capacityDiff;</div><div class="line"> totalRemaining += remainingDiff;</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div><div class="line"></div><div class="line"></div><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">updateHeartbeat</span><span class="params">(<span class="keyword">long</span> capacity, <span class="keyword">long</span> remaining)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.capacityBytes = capacity; </div><div class="line"> <span class="keyword">this</span>.remainingBytes = remaining;</div><div class="line"> <span class="keyword">this</span>.lastUpdate = System.currentTimeMillis(); <span class="comment">// current time</span></div><div class="line">}</div></pre></td></tr></table></figure>
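<p>The remove, update, re-insert sequence in <code>gotHeartbeat</code> matters because <code>heartbeats</code> is a <code>TreeSet</code> ordered by last-update time (see the comparator in Example 2): changing a node's timestamp while it is still inside the set would break the set's ordering. The following is a simplified sketch under stated assumptions, not the Hadoop source: names are plain <code>String</code>s instead of <code>UTF8</code>, the time is passed in, and <code>HeartbeatTable</code>, <code>Node</code>, and <code>oldest()</code> are hypothetical.</p>

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of the bookkeeping in gotHeartbeat(): nodes live in a
// TreeSet ordered by last-update time, so a node must be removed BEFORE its
// timestamp changes and re-added afterwards.
class HeartbeatTable {
    static final class Node {
        final String name;
        long capacity, remaining, lastUpdate;
        Node(String name) { this.name = name; }
    }

    // Oldest heartbeat first; ties broken by name so distinct nodes never compare equal.
    private final TreeSet<Node> heartbeats = new TreeSet<>(
        Comparator.comparingLong((Node n) -> n.lastUpdate).thenComparing(n -> n.name));
    private final Map<String, Node> datanodeMap = new HashMap<>();
    long totalCapacity = 0, totalRemaining = 0;

    synchronized void gotHeartbeat(String name, long capacity, long remaining, long now) {
        Node node = datanodeMap.get(name);
        long capacityDiff, remainingDiff;
        if (node == null) {                // brand-new node: its full capacity is new
            node = new Node(name);
            datanodeMap.put(name, node);
            capacityDiff = capacity;
            remainingDiff = remaining;
        } else {                           // known node: only the change counts
            capacityDiff = capacity - node.capacity;
            remainingDiff = remaining - node.remaining;
            heartbeats.remove(node);       // remove while the old sort key is still valid
        }
        node.capacity = capacity;
        node.remaining = remaining;
        node.lastUpdate = now;
        heartbeats.add(node);              // re-insert under the new timestamp
        totalCapacity += capacityDiff;
        totalRemaining += remainingDiff;
    }

    String oldest() { return heartbeats.first().name; }

    int size() { return heartbeats.size(); }
}
```

<p>For example, if dn1 reports 100/80, dn2 reports 200/150, and dn1 later reports 150/120, the totals end at 350/270 and dn2 becomes the oldest entry.</p>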
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div class="line">71</div><div class="line">72</div><div class="line">73</div><div class="line">74</div><div
class="line">75</div><div class="line">76</div><div class="line">77</div><div class="line">78</div><div class="line">79</div><div class="line">80</div><div class="line">81</div></pre></td><td class="code"><pre><div class="line"><span class="comment">// Heartbeat monitor thread</span></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">HeartbeatMonitor</span> <span class="keyword">implements</span> <span class="title">Runnable</span> </span>{</div><div class="line"> <span class="comment">/**</span></div><div class="line"> */</div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">while</span> (fsRunning) {</div><div class="line"> heartbeatCheck();</div><div class="line"> <span class="keyword">try</span> {</div><div class="line"> Thread.sleep(heartBeatRecheck); <span class="comment">// wait heartBeatRecheck ms before the next check; this thread checks heartbeats, it does not send them</span></div><div class="line"> } <span class="keyword">catch</span> (InterruptedException ie) {</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div><div class="line"></div><div class="line"><span class="comment">/** Removes DataNodes whose heartbeats have expired */</span></div><div class="line"><span class="function"><span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">heartbeatCheck</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">synchronized</span> (heartbeats) {</div><div class="line"> DatanodeInfo nodeInfo = <span class="keyword">null</span>;</div><div class="line"> <span class="comment">// heartbeats is sorted oldest-first: while the oldest node's last update is before now - EXPIRE_INTERVAL (10 minutes), that node is considered lost</span></div><div 
class="line"> <span class="comment">// the 10-minute window is generous enough to ride out network delays</span></div><div class="line"> <span class="keyword">while</span> ((heartbeats.size() > <span class="number">0</span>) &&</div><div class="line"> ((nodeInfo = (DatanodeInfo) heartbeats.first()) != <span class="keyword">null</span>) &&</div><div class="line"> (nodeInfo.lastUpdate() < System.currentTimeMillis() - EXPIRE_INTERVAL)) {</div><div class="line"> LOG.info(<span class="string">"Lost heartbeat for "</span> + nodeInfo.getName());</div><div class="line"></div><div class="line"> <span class="comment">// drop the node from heartbeats (it is no longer tracked)</span></div><div class="line"> heartbeats.remove(nodeInfo);</div><div class="line"> <span class="comment">// remove the DataNode from datanodeMap</span></div><div class="line"> <span class="keyword">synchronized</span> (datanodeMap) {</div><div class="line"> datanodeMap.remove(nodeInfo.getName());</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// subtract its capacity from the cluster totals</span></div><div class="line"> totalCapacity -= nodeInfo.getCapacity();</div><div class="line"> totalRemaining -= nodeInfo.getRemaining();</div><div class="line"></div><div class="line"> <span class="comment">// get the blocks stored on this node</span></div><div class="line"> Block deadblocks[] = nodeInfo.getBlocks();</div><div class="line"> <span class="keyword">if</span> (deadblocks != <span class="keyword">null</span>) {</div><div class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < deadblocks.length; i++) {</div><div class="line"> <span class="comment">// drop this node as a location of each of its blocks</span></div><div class="line"> removeStoredBlock(deadblocks[i], nodeInfo);</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> (heartbeats.size() > <span 
class="number">0</span>) {</div><div class="line"> nodeInfo = (DatanodeInfo) heartbeats.first();</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">synchronized</span> <span
class="keyword">void</span> <span class="title">removeStoredBlock</span><span class="params">(Block block, DatanodeInfo node)</span> </span>{</div><div class="line"> <span class="comment">// look up the set of nodes holding this block in blocksMap</span></div><div class="line"> TreeSet containingNodes = (TreeSet) blocksMap.get(block);</div><div class="line"> <span class="comment">// throw if the block is not mapped to this node</span></div><div class="line"> <span class="keyword">if</span> (containingNodes == <span class="keyword">null</span> || ! containingNodes.contains(node)) {</div><div class="line"> <span class="keyword">throw</span> <span class="keyword">new</span> IllegalArgumentException(<span class="string">"No machine mapping found for block "</span> + block + <span class="string">", which should be at node "</span> + node);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// remove this node from the block's locations</span></div><div class="line"> containingNodes.remove(node);</div><div class="line"></div><div class="line"> <span class="comment">// if the block is still valid and now has fewer replicas than desiredReplication, queue it for re-replication</span></div><div class="line"> <span class="keyword">if</span> (dir.isValidBlock(block) && (containingNodes.size() < <span class="keyword">this</span>.desiredReplication)) {</div><div class="line"> <span class="keyword">synchronized</span> (neededReplications) {</div><div class="line"> neededReplications.add(block);</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// if this node was recorded as holding an excess replica of the block, clear that record</span></div><div class="line"> TreeSet excessBlocks = (TreeSet) excessReplicateMap.get(node.getName());</div><div class="line"> <span class="keyword">if</span> (excessBlocks != <span class="keyword">null</span>) {</div><div class="line"> excessBlocks.remove(block);</div><div 
class="line"> <span class="keyword">if</span> (excessBlocks.size() == <span class="number">0</span>) {</div><div class="line"> excessReplicateMap.remove(node.getName());</div><div class="line"> }</div><div
class="line"> }</div><div class="line"> }</div></pre></td></tr></table></figure>
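<p>Because <code>heartbeats</code> is sorted oldest-first, <code>heartbeatCheck</code> never scans the whole set: it keeps looking at <code>first()</code> and stops at the first node that is still fresh. The sketch below condenses <code>heartbeatCheck</code> and <code>removeStoredBlock</code> into one hypothetical class, <code>ExpiryScanner</code>. It is an illustration, not the Hadoop code: blocks and nodes are plain strings, the current time is passed in, and the <code>dir.isValidBlock</code> check and the excess-replica cleanup are omitted.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of heartbeatCheck() + removeStoredBlock(): expired nodes
// are popped from the front of an oldest-first set, and every block they held
// loses one replica and is queued if it falls below the replication target.
class ExpiryScanner {
    static final class Node {
        final String name;
        final long lastUpdate;
        final Set<String> blocks = new HashSet<>();
        Node(String name, long lastUpdate) { this.name = name; this.lastUpdate = lastUpdate; }
    }

    final TreeSet<Node> heartbeats = new TreeSet<>(
        Comparator.comparingLong((Node n) -> n.lastUpdate).thenComparing(n -> n.name));
    final Map<String, Set<String>> blocksMap = new HashMap<>(); // block -> names of holding nodes
    final TreeSet<String> neededReplications = new TreeSet<>();
    final int desiredReplication;
    final long expireInterval;

    ExpiryScanner(int desiredReplication, long expireInterval) {
        this.desiredReplication = desiredReplication;
        this.expireInterval = expireInterval;
    }

    void addNode(String name, long lastUpdate, String... blocks) {
        Node n = new Node(name, lastUpdate);
        for (String b : blocks) {
            n.blocks.add(b);
            blocksMap.computeIfAbsent(b, k -> new HashSet<>()).add(name);
        }
        heartbeats.add(n);
    }

    // Removes every node whose last update is before now - expireInterval
    // and returns their names, oldest first.
    synchronized List<String> heartbeatCheck(long now) {
        List<String> dead = new ArrayList<>();
        while (!heartbeats.isEmpty() && heartbeats.first().lastUpdate < now - expireInterval) {
            Node node = heartbeats.pollFirst(); // the oldest node has expired
            dead.add(node.name);
            for (String block : node.blocks) {
                Set<String> holders = blocksMap.get(block);
                if (holders == null || !holders.remove(node.name)) {
                    throw new IllegalArgumentException("No machine mapping found for block " + block);
                }
                if (holders.size() < desiredReplication) {
                    neededReplications.add(block); // schedule re-replication
                }
            }
        }
        return dead;
    }
}
```

<p>With <code>desiredReplication</code> = 2 and a 10-minute expiry, a node last heard from at t = 0 is dropped at t = 700001 ms, and each block it shared with one other node falls to a single replica and is queued.</p>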
<h3 id="样例"><a href="#样例" class="headerlink" title="样例"></a>Examples</h3><p>Example 1<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> java.util.ArrayList;</div><div class="line"><span class="keyword">import</span> java.util.List;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">LockDemo</span> </span>{ <span class="comment">// illustrative wrapper class so the example compiles on its own</span></div><div class="line"> <span class="keyword">public</span> <span class="keyword">static</span> <span 
class="class"><span class="keyword">class</span> <span class="title">Demo</span> </span>{</div><div class="line"> <span class="keyword">private</span> <span class="keyword">final</span> <span class="keyword">byte</span>[] b = <span class="keyword">new</span> <span class="keyword">byte</span>[<span class="number">1</span>];</div><div class="line"> <span class="comment">// res should not really be public: other objects could obtain a reference and modify its elements. It is public here only to keep the demo short</span></div><div class="line"> <span class="keyword">public</span> List<String> res = <span
class="keyword">new</span> ArrayList<String>();</div><div class="line"> <span class="comment">// With only the object lock (synchronized on this), write2, which never locks this, could modify res at the same moment; locking b in both methods prevents that</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">write1</span><span class="params">()</span> <span class="keyword">throws</span> InterruptedException </span>{</div><div class="line"> <span class="keyword">synchronized</span> (b) {</div><div class="line"> System.out.println(<span class="string">"w1 res = "</span> + res);</div><div class="line"> res.add(<span class="string">"1"</span>);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">write2</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">synchronized</span> (b) {</div><div class="line"> System.out.println(<span class="string">"w2 res = "</span> + res);</div><div class="line"> res.add(<span class="string">"2"</span>);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> InterruptedException </span>{</div><div class="line"> <span class="keyword">final</span> Demo demo = <span class="keyword">new</span> Demo();</div><div class="line"> Thread t1 = <span class="keyword">new</span> Thread(<span class="keyword">new</span> Runnable() {</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span 
class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span
class="title">run</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">try</span> {</div><div class="line"> demo.write1();</div><div class="line"> } <span class="keyword">catch</span> (InterruptedException e) {</div><div class="line"> <span class="comment">// TODO Auto-generated catch block</span></div><div class="line"> e.printStackTrace();</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }, <span class="string">"t1"</span>);</div><div class="line"> </div><div class="line"> Thread t2 = <span class="keyword">new</span> Thread(<span class="keyword">new</span> Runnable() {</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</div><div class="line"> demo.write2();</div><div class="line"> }</div><div class="line"> }, <span class="string">"t2"</span>);</div><div class="line"> </div><div class="line"> t1.start();</div><div class="line"> t2.start();</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure></p>
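<p>The comments in Example 1 point out that <code>res</code> should not be public. A minimal sketch of the encapsulated alternative follows; the class <code>SafeDemo</code> and its methods are hypothetical. The list is private, every access takes the same lock, and readers only receive a copy, so nothing outside the class can modify the list without holding the lock. <code>stress</code> runs several writer threads and returns the final size, which should equal threads * perThread.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: unlike Demo, the list is private, every read and
// write goes through one lock, and readers only ever see a copy.
class SafeDemo {
    private final Object lock = new Object();
    private final List<String> res = new ArrayList<>();

    void write(String value) {
        synchronized (lock) {
            res.add(value);
        }
    }

    List<String> snapshot() {
        synchronized (lock) {
            return new ArrayList<>(res); // defensive copy: callers cannot touch res
        }
    }

    // Starts `threads` writer threads that each add `perThread` items,
    // waits for all of them, and returns the final number of items.
    static int stress(int threads, int perThread) {
        SafeDemo demo = new SafeDemo();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final String tag = "t" + t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    demo.write(tag);
                }
            }, tag);
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException(e);
            }
        }
        return demo.snapshot().size();
    }
}
```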
<p>Example 2: the heartbeat-monitoring code extracted into a standalone class<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div class="line">71</div><div class="line">72</div><div class="line">73</div><div
class="line">74</div><div class="line">75</div><div class="line">76</div><div class="line">77</div><div class="line">78</div><div class="line">79</div><div class="line">80</div><div class="line">81</div><div class="line">82</div><div class="line">83</div><div class="line">84</div><div class="line">85</div><div class="line">86</div><div class="line">87</div><div class="line">88</div><div class="line">89</div><div class="line">90</div><div class="line">91</div><div class="line">92</div><div class="line">93</div><div class="line">94</div><div class="line">95</div><div class="line">96</div><div class="line">97</div><div class="line">98</div><div class="line">99</div><div class="line">100</div><div class="line">101</div><div class="line">102</div><div class="line">103</div><div class="line">104</div><div class="line">105</div><div class="line">106</div><div class="line">107</div><div class="line">108</div><div class="line">109</div><div class="line">110</div><div class="line">111</div><div class="line">112</div><div class="line">113</div><div class="line">114</div><div class="line">115</div><div class="line">116</div><div class="line">117</div><div class="line">118</div><div class="line">119</div><div class="line">120</div><div class="line">121</div><div class="line">122</div><div class="line">123</div><div class="line">124</div><div class="line">125</div><div class="line">126</div><div class="line">127</div><div class="line">128</div><div class="line">129</div><div class="line">130</div><div class="line">131</div><div class="line">132</div><div class="line">133</div><div class="line">134</div><div class="line">135</div><div class="line">136</div><div class="line">137</div><div class="line">138</div><div class="line">139</div><div class="line">140</div><div class="line">141</div><div class="line">142</div><div class="line">143</div><div class="line">144</div><div class="line">145</div><div class="line">146</div><div class="line">147</div><div class="line">148</div><div 
class="line">149</div><div class="line">150</div><div class="line">151</div><div class="line">152</div><div class="line">153</div><div class="line">154</div><div class="line">155</div><div class="line">156</div><div class="line">157</div><div class="line">158</div><div class="line">159</div><div class="line">160</div><div class="line">161</div><div class="line">162</div><div class="line">163</div><div class="line">164</div><div class="line">165</div><div class="line">166</div><div class="line">167</div><div class="line">168</div><div class="line">169</div><div class="line">170</div><div class="line">171</div><div class="line">172</div><div class="line">173</div><div class="line">174</div><div class="line">175</div><div class="line">176</div><div class="line">177</div><div class="line">178</div><div class="line">179</div><div class="line">180</div><div class="line">181</div><div class="line">182</div><div class="line">183</div><div class="line">184</div><div class="line">185</div><div class="line">186</div><div class="line">187</div><div class="line">188</div><div class="line">189</div><div class="line">190</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">HeartbeatMonitorTest</span> </span>{</div><div class="line"></div><div class="line"> <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> Logger LOG = LogFormatter.getLogger(<span class="string">"cn.base.test.HeartbeatMonitorTest"</span>);</div><div class="line"></div><div class="line"> <span class="keyword">private</span> <span class="keyword">int</span> heartBeatRecheck = <span class="number">3000</span>;</div><div class="line"> <span class="keyword">private</span> <span class="keyword">boolean</span> fsRunning = <span class="keyword">true</span>;</div><div class="line"></div><div class="line"> 
<span class="keyword">private</span> TreeMap datanodeMap = <span class="keyword">new</span> TreeMap();</div><div class="line"></div><div class="line"> <span class="keyword">private</span> TreeMap blocksMap = <span class="keyword">new</span> TreeMap();</div><div class="line"></div><div class="line"> <span class="keyword">private</span> TreeMap excessReplicateMap = <span class="keyword">new</span> TreeMap();</div><div class="line"></div><div class="line"> <span class="keyword">private</span> FSDirectory dir;</div><div class="line"> <span class="keyword">private</span> <span class="keyword">int</span> desiredReplication;</div><div class="line"></div><div class="line"> <span class="keyword">private</span> TreeSet neededReplications = <span class="keyword">new</span> TreeSet();</div><div class="line"></div><div class="line"> <span class="keyword">private</span> <span class="keyword">long</span> totalCapacity = <span class="number">0</span>, totalRemaining = <span class="number">0</span>;</div><div class="line"></div><div class="line"> <span class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">long</span> EXPIRE_INTERVAL = <span class="number">1</span> * <span class="number">60</span> * <span class="number">1000</span>;</div><div class="line"></div><div class="line"> <span class="keyword">private</span> Daemon hbthread = <span class="keyword">null</span>;</div><div class="line"> </div><div class="line"> <span class="keyword">private</span> <span class="keyword">long</span> capacityDiff = <span class="number">0</span>;</div><div class="line"> <span class="keyword">private</span> <span class="keyword">long</span> remainingDiff = <span class="number">0</span>;</div><div class="line"></div><div class="line"> TreeSet heartbeats = <span class="keyword">new</span> TreeSet(<span class="keyword">new</span> Comparator() {</div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span 
class="title">compare</span><span class="params">(Object o1, Object o2)</span> </span>{</div><div class="line"> DatanodeInfo d1 = (DatanodeInfo) o1;</div><div class="line"> DatanodeInfo d2 = (DatanodeInfo) o2;</div><div class="line"> <span class="keyword">long</span> lu1 = d1.lastUpdate();</div><div class="line"> <span class="keyword">long</span> lu2 = d2.lastUpdate();</div><div class="line"> <span class="keyword">if</span> (lu1 < lu2) {</div><div class="line"> <span class="keyword">return</span> -<span class="number">1</span>;</div><div class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (lu1 > lu2) {</div><div class="line"> <span class="keyword">return</span> <span class="number">1</span>;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="keyword">return</span> d1.getName().compareTo(d2.getName());</div><div class="line"> }</div><div class="line"> }</div><div class="line"> });</div><div class="line"></div><div class="line"> <span class="class"><span class="keyword">class</span> <span class="title">DatanodeInfo</span> </span>{</div><div class="line"></div><div class="line"> <span class="keyword">private</span> <span class="keyword">long</span> capacityBytes, remainingBytes, lastUpdate;</div><div class="line"> <span class="keyword">private</span> <span class="keyword">volatile</span> TreeSet blocks;</div><div class="line"> <span class="keyword">private</span> UTF8 name;</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">DatanodeInfo</span><span class="params">(UTF8 name, <span class="keyword">long</span> capacity, <span class="keyword">long</span> remaining)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.name = name;</div><div class="line"> <span class="keyword">this</span>.blocks = <span class="keyword">new</span> TreeSet();</div><div class="line"> updateHeartbeat(capacity, 
remaining);</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">updateHeartbeat</span><span class="params">(<span class="keyword">long</span> capacity, <span class="keyword">long</span> remaining)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.capacityBytes = capacity;</div><div class="line"> <span class="keyword">this</span>.remainingBytes = remaining;</div><div class="line"> <span class="keyword">this</span>.lastUpdate = System.currentTimeMillis();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">public</span> Block[] getBlocks() {</div><div class="line"> <span class="keyword">return</span> (Block[]) blocks.toArray(<span class="keyword">new</span> Block[blocks.size()]);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> Iterator <span class="title">getBlockIterator</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> blocks.iterator();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">long</span> <span class="title">getCapacity</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> capacityBytes;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">long</span> <span class="title">getRemaining</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> remainingBytes;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span 
class="keyword">long</span> <span class="title">lastUpdate</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> lastUpdate;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> UTF8 <span class="title">getName</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> name;</div><div class="line"> }</div><div class="line"></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="class"><span class="keyword">class</span> <span class="title">HeartbeatMonitorTask</span> <span class="keyword">implements</span> <span class="title">Runnable</span> </span>{</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">while</span> (<span class="keyword">true</span>) {</div><div class="line"> heartbeatCheck();</div><div class="line"> <span class="keyword">try</span> {</div><div class="line"> Thread.sleep(heartBeatRecheck);</div><div class="line"> } <span class="keyword">catch</span> (Exception e) {</div><div class="line"> e.printStackTrace();</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">heartbeatCheck</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">synchronized</span> (heartbeats) {</div><div class="line"> DatanodeInfo nodeInfo = <span class="keyword">null</span>;</div><div class="line"> </div><div class="line"> <span class="keyword">while</span> 
((heartbeats.size() > <span class="number">0</span>) && ((nodeInfo = (DatanodeInfo) heartbeats.first()) != <span class="keyword">null</span>)</div><div class="line"> && (nodeInfo.lastUpdate() < System.currentTimeMillis() - EXPIRE_INTERVAL)) {</div><div class="line"> LOG.info(<span class="string">"Lost heartbeat for "</span> + nodeInfo.getName());</div><div class="line"></div><div class="line"> heartbeats.remove(nodeInfo);</div><div class="line"> <span class="keyword">synchronized</span> (datanodeMap) {</div><div class="line"> datanodeMap.remove(nodeInfo.getName());</div><div class="line"> }</div><div class="line"> totalCapacity -= nodeInfo.getCapacity();</div><div class="line"> totalRemaining -= nodeInfo.getRemaining();</div><div class="line"></div><div class="line"> Block deadblocks[] = nodeInfo.getBlocks();</div><div class="line"> <span class="keyword">if</span> (deadblocks != <span class="keyword">null</span>) {</div><div class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < deadblocks.length; i++) {</div><div class="line"> removeStoredBlock(deadblocks[i], nodeInfo);</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> (heartbeats.size() > <span class="number">0</span>) {</div><div class="line"> nodeInfo = (DatanodeInfo) heartbeats.first();</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">synchronized</span> <span class="keyword">void</span> <span class="title">removeStoredBlock</span><span class="params">(Block block, DatanodeInfo node)</span> </span>{</div><div class="line"> TreeSet containingNodes = (TreeSet) blocksMap.get(block);</div><div class="line"> <span class="keyword">if</span> (containingNodes == <span class="keyword">null</span> || 
!containingNodes.contains(node)) {</div><div class="line"> <span class="keyword">throw</span> <span class="keyword">new</span> IllegalArgumentException(</div><div class="line"> <span class="string">"No machine mapping found for block "</span> + block + <span class="string">", which should be at node "</span> + node);</div><div class="line"> }</div><div class="line"> containingNodes.remove(node);</div><div class="line"></div><div class="line"> <span class="comment">//</span></div><div class="line"> <span class="comment">// It's possible that the block was removed because of a datanode</span></div><div class="line"> <span class="comment">// failure. If the block is still valid, check if replication is</span></div><div class="line"> <span class="comment">// necessary. In that case, put block on a possibly-will-</span></div><div class="line"> <span class="comment">// be-replicated list.</span></div><div class="line"> <span class="comment">//</span></div><div class="line"> <span class="keyword">if</span> (dir.isValidBlock(block) && (containingNodes.size() < <span class="keyword">this</span>.desiredReplication)) {</div><div class="line"> <span class="keyword">synchronized</span> (neededReplications) {</div><div class="line"> neededReplications.add(block);</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">//</span></div><div class="line"> <span class="comment">// We've removed a block from a node, so it's definitely no longer</span></div><div class="line"> <span class="comment">// in "excess" there.</span></div><div class="line"> <span class="comment">//</span></div><div class="line"> TreeSet excessBlocks = (TreeSet) excessReplicateMap.get(node.getName());</div><div class="line"> <span class="keyword">if</span> (excessBlocks != <span class="keyword">null</span>) {</div><div class="line"> excessBlocks.remove(block);</div><div class="line"> <span class="keyword">if</span> (excessBlocks.size() == <span 
class="number">0</span>) {</div><div class="line"> excessReplicateMap.remove(node.getName());</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">HeartbeatMonitorTest</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">this</span>.hbthread = <span class="keyword">new</span> Daemon(<span class="keyword">new</span> HeartbeatMonitorTask(), <span class="string">"A"</span>);</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">start</span><span class="params">()</span> <span class="keyword">throws</span> InterruptedException </span>{</div><div class="line"> String name = <span class="string">"JokerdeMacBook-Pro.local:50010"</span>;</div><div class="line"> <span class="keyword">long</span> capacity = <span class="number">250140434432L</span>;</div><div class="line"> <span class="keyword">long</span> remaining = <span class="number">98054681805L</span>;</div><div class="line"> UTF8 n = <span class="keyword">new</span> UTF8(name);</div><div class="line"> DatanodeInfo nodeinfo = <span class="keyword">new</span> DatanodeInfo(n, capacity, remaining);</div><div class="line"> <span class="keyword">this</span>.datanodeMap.put(n, nodeinfo);</div><div class="line"> <span class="keyword">this</span>.capacityDiff = capacity;</div><div class="line"> <span class="keyword">this</span>.remainingDiff = remaining;</div><div class="line"> <span class="keyword">this</span>.heartbeats.add(nodeinfo);</div><div class="line"> <span class="keyword">this</span>.hbthread.start();</div><div class="line"> <span class="keyword">this</span>.hbthread.join();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span 
class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> InterruptedException </span>{</div><div class="line"> HeartbeatMonitorTest task = <span class="keyword">new</span> HeartbeatMonitorTest();</div><div class="line"> task.start();</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure></p>
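<h3 id="结束语"><a href="#结束语" class="headerlink" title="结束语"></a>Conclusion</h3><p>That completes the whole flow: DataNodes send heartbeats to the NameNode.<br>The NameNode periodically checks whether each node has timed out. If it has, the NameNode drops the node, subtracts its capacity from the cluster totals, and queues any of its blocks that are now under-replicated for re-replication. Otherwise, it simply keeps monitoring.</p>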
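<p>The expiry check above works because <code>heartbeats</code> is a <code>TreeSet</code> ordered by <code>lastUpdate</code>. The stalest node is therefore always <code>first()</code>, and the monitor can stop at the first node that is still fresh. Below is a minimal sketch of that idea. It assumes a hypothetical <code>HeartbeatTracker</code> class (not Hadoop's API) and takes an explicit <code>now</code> argument instead of calling <code>System.currentTimeMillis()</code>, so the logic can be tested deterministically. Because the set is keyed on the timestamp, an entry must be removed before its timestamp changes. The sketch enforces this by making entries immutable.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of a NameNode-style liveness tracker (not Hadoop's API).
class HeartbeatTracker {
    // Immutable record of one node's most recent heartbeat.
    static final class Node {
        final String name;
        final long lastUpdate;
        Node(String name, long lastUpdate) { this.name = name; this.lastUpdate = lastUpdate; }
    }

    private final long expireIntervalMs;
    // Oldest heartbeat first; ties broken by name, like the comparator in the test class.
    private final TreeSet<Node> byLastUpdate = new TreeSet<Node>((a, b) ->
            a.lastUpdate != b.lastUpdate
                    ? Long.compare(a.lastUpdate, b.lastUpdate)
                    : a.name.compareTo(b.name));
    private final Map<String, Node> byName = new HashMap<>();

    HeartbeatTracker(long expireIntervalMs) { this.expireIntervalMs = expireIntervalMs; }

    // Records a heartbeat: remove the old entry first, then insert a fresh one,
    // so the TreeSet ordering is never corrupted by an in-place timestamp change.
    synchronized void heartbeat(String name, long now) {
        Node old = byName.remove(name);
        if (old != null) {
            byLastUpdate.remove(old);
        }
        Node fresh = new Node(name, now);
        byName.put(name, fresh);
        byLastUpdate.add(fresh);
    }

    // Removes and returns every node whose last heartbeat is older than the interval.
    // Only the head of the set is inspected, so fresh nodes are never scanned.
    synchronized List<String> expire(long now) {
        List<String> dead = new ArrayList<>();
        while (!byLastUpdate.isEmpty()
                && byLastUpdate.first().lastUpdate < now - expireIntervalMs) {
            Node n = byLastUpdate.pollFirst();
            byName.remove(n.name);
            dead.add(n.name);
        }
        return dead;
    }

    synchronized int liveCount() { return byName.size(); }

    public static void main(String[] args) {
        HeartbeatTracker t = new HeartbeatTracker(60_000);
        t.heartbeat("dn1:50010", 0);
        t.heartbeat("dn2:50010", 30_000);
        System.out.println(t.expire(70_000)); // [dn1:50010]
        System.out.println(t.liveCount());    // 1
    }
}
```

<p>In a real NameNode, <code>expire</code> would be the place to also drop the node's blocks and queue re-replication, as <code>heartbeatCheck</code> does above.</p>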
<p>Finally, here are some resources on designing heartbeat monitoring:<br><a href="http://liaojieliang.com/heartbeat-protocal-design/" target="_blank" rel="external">http://liaojieliang.com/heartbeat-protocal-design/</a><br><a href="http://blog.csdn.net/baidu20008/article/details/45022461" target="_blank" rel="external">http://blog.csdn.net/baidu20008/article/details/45022461</a><br><a href="http://www.raychase.net/3758" target="_blank" rel="external">http://www.raychase.net/3758</a></p>
]]></content>
</entry>
<entry>
<title><![CDATA[Hadoop NameNode Startup: FSDirectory]]></title>
<url>http://yoursite.com/2017/12/19/Hadoop-NameNode%E5%90%AF%E5%8A%A8%E4%B9%8BFSDirectiry/</url>
<content type="html"><![CDATA[<h3 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h3><p>This is the first post in a series analyzing how the Hadoop NameNode starts up.<br>We begin by looking at what FSDirectory does.</p>
<p>It all starts with the code below (everything is the choice of Steins;Gate, just kidding).</p>
<a id="more"></a>
<h3 id="进入主题"><a href="#进入主题" class="headerlink" title="进入主题"></a>Getting Into It</h3><p>Let's start with the NameNode constructors.<br>Note that the one taking a directory and a port creates an FSNamesystem object.</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div></pre></td><td class="code"><pre><div class="line"><span class="comment">/**</span></div><div class="line"> * Create a NameNode at the default location</div><div class="line"> */</div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">NameNode</span><span class="params">(Configuration conf)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> <span class="keyword">this</span>(getDir(conf), <span class="comment">// NameNode storage directory</span></div><div class="line"> DataNode.createSocketAddr <span class="comment">// RPC port, parsed from fs.default.name</span></div><div class="line"> (conf.get(<span class="string">"fs.default.name"</span>, <span class="string">"local"</span>)).getPort(), conf);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">/**</span></div><div class="line"> * Create a NameNode at the specified location and start it.</div><div class="line"> */</div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">NameNode</span><span class="params">(File dir, <span class="keyword">int</span> port, Configuration conf)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> <span class="keyword">this</span>.namesystem = <span class="keyword">new</span> FSNamesystem(dir, conf);</div><div class="line"> <span class="keyword">this</span>.handlerCount = conf.getInt(<span 
class="string">"dfs.namenode.handler.count"</span>, <span class="number">10</span>); <span class="comment">// number of threads that handle incoming RPC requests (from DataNodes and clients)</span></div><div class="line"> <span class="keyword">this</span>.server = RPC.getServer(<span class="keyword">this</span>, port, handlerCount, <span class="keyword">false</span>, conf);</div><div class="line"> <span class="keyword">this</span>.server.start();</div><div class="line"> }</div></pre></td></tr></table></figure>
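<p>The first constructor takes only the port from <code>fs.default.name</code>. That port is then used by the second constructor to start the RPC server, with <code>dfs.namenode.handler.count</code> (default 10) handler threads. Below is a minimal sketch of turning a <code>host:port</code> string into an address. It uses a hypothetical <code>SocketAddrSketch.parse</code> helper; this is not Hadoop's <code>DataNode.createSocketAddr</code>, whose exact behavior is not shown in this post. With this sketch, the default value <code>"local"</code> is rejected because it carries no port.</p>

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not Hadoop's DataNode.createSocketAddr.
class SocketAddrSketch {
    // Parses "host:port" without doing a DNS lookup; rejects values with no port.
    static InetSocketAddress parse(String target) {
        int colon = target.lastIndexOf(':');
        if (colon <= 0 || colon == target.length() - 1) {
            throw new IllegalArgumentException("Not a host:port pair: " + target);
        }
        String host = target.substring(0, colon);
        int port = Integer.parseInt(target.substring(colon + 1)); // NumberFormatException if not numeric
        return InetSocketAddress.createUnresolved(host, port);     // throws if port is out of range
    }

    public static void main(String[] args) {
        System.out.println(parse("namenode.example.com:9000").getPort()); // 9000
        try {
            parse("local");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Not a host:port pair: local
        }
    }
}
```

<p><code>createUnresolved</code> is used so that parsing never blocks on DNS. The sketch does not handle bracketed IPv6 literals.</p>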
<p>Next, let's step into the FSNamesystem constructor:</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="function"><span class="keyword">public</span> <span class="title">FSNamesystem</span><span class="params">(File dir, Configuration conf)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> <span class="keyword">this</span>.dir = <span class="keyword">new</span> FSDirectory(dir); <span class="comment">// loads fsimage and edits</span></div><div class="line"> <span class="keyword">this</span>.hbthread = <span class="keyword">new</span> Daemon(<span class="keyword">new</span> HeartbeatMonitor()); <span class="comment">// heartbeat monitor</span></div><div class="line"> <span class="keyword">this</span>.lmthread = <span class="keyword">new</span> Daemon(<span class="keyword">new</span> LeaseMonitor()); <span class="comment">// lease monitor</span></div><div class="line"> hbthread.start(); <span class="comment">// start the heartbeat monitor thread</span></div><div class="line"> lmthread.start(); <span class="comment">// start the lease monitor thread</span></div><div class="line"> <span class="keyword">this</span>.systemStart = System.currentTimeMillis();</div><div class="line"> <span class="keyword">this</span>.conf = conf;</div><div class="line"> </div><div class="line"> <span class="keyword">this</span>.desiredReplication = conf.getInt(<span class="string">"dfs.replication"</span>, <span class="number">3</span>); <span class="comment">// replication factor, default 3</span></div><div class="line"> <span class="keyword">this</span>.maxReplication = desiredReplication;</div><div 
class="line"> <span class="keyword">this</span>.maxReplicationStreams = conf.getInt(<span class="string">"dfs.max-repl-streams"</span>, <span class="number">2</span>);</div><div class="line"> <span class="keyword">this</span>.minReplication = <span class="number">1</span>; <span class="comment">// minimum replication</span></div><div class="line"> <span class="keyword">this</span>.heartBeatRecheck = <span class="number">1000</span>; <span class="comment">// the heartbeat monitor rechecks every 1 s (not the DataNode send interval)</span></div><div class="line"> }</div></pre></td></tr></table></figure>
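<p>The constructor wraps HeartbeatMonitor and LeaseMonitor in Daemon threads and starts them immediately. Each monitor is a background loop that runs one check and then sleeps. For the heartbeat monitor, the check is heartbeatCheck() and the pause is heartBeatRecheck milliseconds. Below is a minimal sketch of that loop shape. It assumes a hypothetical <code>PeriodicMonitor</code> class built on a plain <code>Thread</code> with <code>setDaemon(true)</code>, so Hadoop's own Daemon class is not reproduced here. The sketch also exits cleanly when interrupted instead of swallowing the exception.</p>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a "check, then sleep" monitor thread.
class PeriodicMonitor {
    private final Thread thread;

    PeriodicMonitor(String name, Runnable check, long recheckMs) {
        this.thread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                check.run();                            // e.g. heartbeatCheck()
                try {
                    Thread.sleep(recheckMs);            // e.g. heartBeatRecheck
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop exits
                }
            }
        }, name);
        this.thread.setDaemon(true); // a daemon thread does not keep the JVM alive
    }

    void start() { thread.start(); }

    // Interrupt the loop and wait for it to finish.
    void stop() throws InterruptedException {
        thread.interrupt();
        thread.join();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeChecks = new CountDownLatch(3);
        PeriodicMonitor m = new PeriodicMonitor("hb-monitor", threeChecks::countDown, 10);
        m.start();
        boolean done = threeChecks.await(5, TimeUnit.SECONDS);
        m.stop();
        System.out.println("three checks ran: " + done);
    }
}
```

<p>In new code, a <code>ScheduledExecutorService</code> with <code>scheduleWithFixedDelay</code> expresses the same "run, then wait" pattern without a hand-written loop.</p>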
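<p>For now we only care about what the FSDirectory class does.<br>Its constructor comes first. To make the rest easier to follow, I extracted the relevant code into a standalone test class.</p>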
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"></div><div class="line"><span class="function"><span class="keyword">public</span> <span class="title">FSDirectory</span><span class="params">(File dir)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> <span class="comment">// dir: the NameNode name directory from the configuration</span></div><div class="line"> File fullimage = <span class="keyword">new</span> File(dir, <span class="string">"image"</span>);</div><div class="line"> <span class="comment">// if "image" does not exist, the NameNode has never been formatted</span></div><div class="line"> <span class="keyword">if</span> (! 
fullimage.exists()) {</div><div class="line"> <span class="keyword">throw</span> <span class="keyword">new</span> IOException(<span class="string">"NameNode not formatted: "</span> + dir);</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="comment">// the edits log</span></div><div class="line"> File edits = <span class="keyword">new</span> File(dir, <span class="string">"edits"</span>);</div><div class="line"> <span class="comment">// load the image and replay edits; save a merged image if loadFSImage returns true</span></div><div class="line"> <span class="keyword">if</span> (loadFSImage(fullimage, edits)) {</div><div class="line"> saveFSImage(fullimage, edits);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">synchronized</span> (<span class="keyword">this</span>) {</div><div class="line"> <span class="keyword">this</span>.ready = <span class="keyword">true</span>;</div><div class="line"> <span class="keyword">this</span>.notifyAll();</div><div class="line"> <span class="keyword">this</span>.editlog = <span class="keyword">new</span> DataOutputStream(<span class="keyword">new</span> FileOutputStream(edits)); <span class="comment">// truncates edits and starts a fresh log</span></div><div class="line"> }</div><div class="line"> }</div></pre></td></tr></table></figure>
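<p>This constructor follows a snapshot-plus-log pattern. It refuses to start without an image, because that means the NameNode was never formatted. It then loads the image, replays the edits log on top of it, and saves a new image when <code>loadFSImage</code> returns true. Finally, it opens a fresh edits file.</p><p>Below is a minimal sketch of that pattern on a toy namespace that maps each path to its number of blocks. Several parts are assumptions for illustration and not Hadoop's on-disk format: the flattened two-file layout (<code>fsimage</code> plus <code>edits</code>), the record format, the names <code>ImageEditsSketch</code>, <code>load</code>, <code>checkpoint</code> and <code>logEdit</code>, and returning true only when edits were replayed. Writing to <code>fsimage.new</code> and then renaming echoes the <code>FS_IMAGE</code>/<code>NEW_FS_IMAGE</code> constants in the test class.</p>

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of "load image, replay edits, checkpoint" (not Hadoop's format).
class ImageEditsSketch {
    static final byte OP_ADD = 0, OP_DELETE = 1;

    // Loads dir/fsimage into ns, then replays dir/edits on top of it.
    // Returns true if at least one edit was replayed.
    static boolean load(Path dir, Map<String, Integer> ns) throws IOException {
        Path image = dir.resolve("fsimage");
        if (!Files.exists(image)) {
            throw new IOException("NameNode not formatted: " + dir);
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(image))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String path = in.readUTF();
                ns.put(path, in.readInt());
            }
        }
        Path edits = dir.resolve("edits");
        int replayed = 0;
        if (Files.exists(edits)) {
            try (DataInputStream in = new DataInputStream(Files.newInputStream(edits))) {
                while (true) {
                    byte op;
                    try {
                        op = in.readByte();
                    } catch (EOFException endOfLog) {
                        break; // clean end of the log
                    }
                    String path = in.readUTF();
                    if (op == OP_ADD) {
                        ns.put(path, in.readInt()); // path -> number of blocks
                    } else {
                        ns.remove(path);
                    }
                    replayed++;
                }
            }
        }
        return replayed > 0;
    }

    // Writes ns to fsimage.new, renames it over fsimage, then truncates edits.
    static void checkpoint(Path dir, Map<String, Integer> ns) throws IOException {
        Path tmp = dir.resolve("fsimage.new");
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tmp))) {
            out.writeInt(ns.size());
            for (Map.Entry<String, Integer> e : ns.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue());
            }
        }
        Files.move(tmp, dir.resolve("fsimage"), StandardCopyOption.REPLACE_EXISTING);
        Files.write(dir.resolve("edits"), new byte[0]); // fresh, empty log
    }

    // Appends one record to dir/edits.
    static void logEdit(Path dir, byte op, String path, int blocks) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                dir.resolve("edits"), StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeByte(op);
            out.writeUTF(path);
            if (op == OP_ADD) {
                out.writeInt(blocks);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nn");
        checkpoint(dir, new TreeMap<>());       // "format": write an empty image
        logEdit(dir, OP_ADD, "/user/a.txt", 2);
        Map<String, Integer> ns = new TreeMap<>();
        if (load(dir, ns)) {
            checkpoint(dir, ns);                // same shape as the constructor above
        }
        System.out.println(ns);                 // {/user/a.txt=2}
    }
}
```

<p>Writing the new image to a temporary file and then renaming it means a crash mid-write leaves the old image intact. A production version would also fsync before the rename.</p>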
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div class="line">71</div><div class="line">72</div><div class="line">73</div><div class="line">74</div><div 
class="line">75</div><div class="line">76</div><div class="line">77</div><div class="line">78</div><div class="line">79</div><div class="line">80</div><div class="line">81</div><div class="line">82</div><div class="line">83</div><div class="line">84</div><div class="line">85</div><div class="line">86</div><div class="line">87</div><div class="line">88</div><div class="line">89</div><div class="line">90</div><div class="line">91</div><div class="line">92</div><div class="line">93</div><div class="line">94</div><div class="line">95</div><div class="line">96</div><div class="line">97</div><div class="line">98</div><div class="line">99</div><div class="line">100</div><div class="line">101</div><div class="line">102</div><div class="line">103</div><div class="line">104</div><div class="line">105</div><div class="line">106</div><div class="line">107</div><div class="line">108</div><div class="line">109</div><div class="line">110</div><div class="line">111</div><div class="line">112</div><div class="line">113</div><div class="line">114</div><div class="line">115</div><div class="line">116</div><div class="line">117</div><div class="line">118</div><div class="line">119</div><div class="line">120</div><div class="line">121</div><div class="line">122</div><div class="line">123</div><div class="line">124</div><div class="line">125</div><div class="line">126</div><div class="line">127</div><div class="line">128</div><div class="line">129</div><div class="line">130</div><div class="line">131</div><div class="line">132</div><div class="line">133</div><div class="line">134</div><div class="line">135</div><div class="line">136</div><div class="line">137</div><div class="line">138</div><div class="line">139</div><div class="line">140</div><div class="line">141</div><div class="line">142</div><div class="line">143</div><div class="line">144</div><div class="line">145</div><div class="line">146</div><div class="line">147</div><div class="line">148</div><div class="line">149</div><div 
class="line">150</div><div class="line">151</div><div class="line">152</div><div class="line">153</div><div class="line">154</div><div class="line">155</div><div class="line">156</div><div class="line">157</div><div class="line">158</div><div class="line">159</div><div class="line">160</div><div class="line">161</div><div class="line">162</div><div class="line">163</div><div class="line">164</div><div class="line">165</div><div class="line">166</div><div class="line">167</div><div class="line">168</div><div class="line">169</div><div class="line">170</div><div class="line">171</div><div class="line">172</div><div class="line">173</div><div class="line">174</div><div class="line">175</div><div class="line">176</div><div class="line">177</div><div class="line">178</div><div class="line">179</div><div class="line">180</div><div class="line">181</div><div class="line">182</div><div class="line">183</div><div class="line">184</div><div class="line">185</div><div class="line">186</div><div class="line">187</div><div class="line">188</div><div class="line">189</div><div class="line">190</div><div class="line">191</div><div class="line">192</div><div class="line">193</div><div class="line">194</div><div class="line">195</div><div class="line">196</div><div class="line">197</div><div class="line">198</div><div class="line">199</div><div class="line">200</div><div class="line">201</div><div class="line">202</div><div class="line">203</div><div class="line">204</div><div class="line">205</div><div class="line">206</div><div class="line">207</div><div class="line">208</div><div class="line">209</div><div class="line">210</div><div class="line">211</div><div class="line">212</div><div class="line">213</div><div class="line">214</div><div class="line">215</div><div class="line">216</div><div class="line">217</div><div class="line">218</div><div class="line">219</div><div class="line">220</div><div class="line">221</div><div class="line">222</div><div class="line">223</div><div 
class="line">224</div><div class="line">225</div><div class="line">226</div><div class="line">227</div><div class="line">228</div><div class="line">229</div><div class="line">230</div><div class="line">231</div><div class="line">232</div><div class="line">233</div><div class="line">234</div><div class="line">235</div><div class="line">236</div><div class="line">237</div><div class="line">238</div><div class="line">239</div><div class="line">240</div><div class="line">241</div><div class="line">242</div><div class="line">243</div><div class="line">244</div><div class="line">245</div><div class="line">246</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">FSDirectorTest</span> </span>{</div><div class="line"></div><div class="line"> <span class="class"><span class="keyword">class</span> <span class="title">INode</span> </span>{</div><div class="line"> <span class="keyword">public</span> String name;</div><div class="line"> <span class="keyword">public</span> INode parent;</div><div class="line"> <span class="keyword">public</span> TreeMap children = <span class="keyword">new</span> TreeMap();</div><div class="line"> <span class="keyword">public</span> Block blocks[];</div><div class="line"></div><div class="line"> <span class="comment">/**</span></div><div class="line"> */</div><div class="line"> INode(String name, INode parent, Block blocks[]) {</div><div class="line"> <span class="keyword">this</span>.name = name;</div><div class="line"> <span class="keyword">this</span>.parent = parent;</div><div class="line"> <span class="keyword">this</span>.blocks = blocks;</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="comment">/**</span></div><div class="line"> * Adds a file node: creates an INode holding the file's blocks (blk_ids) and puts it into the parent's children map (used later by saveImage)</div><div class="line"> * <span class="doctag">@param</span> 
target full HDFS path of the new file</div><div class="line"> * <span class="doctag">@param</span> blks the file's blocks</div><div class="line"> * <span class="doctag">@return</span> the new node, or null if the path already exists or its parent is missing</div><div class="line"> */</div><div class="line"> <span class="function">INode <span class="title">addNode</span><span class="params">(String target, Block blks[])</span> </span>{</div><div class="line"> <span class="keyword">if</span> (getNode(target) != <span class="keyword">null</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">null</span>;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> String parentName = DFSFile.getDFSParent(target);</div><div class="line"> <span class="keyword">if</span> (parentName == <span class="keyword">null</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">null</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> INode parentNode = getNode(parentName);</div><div class="line"> <span class="keyword">if</span> (parentNode == <span class="keyword">null</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">null</span>;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="comment">// data read from the fsimage: targetName is the file's name on HDFS</span></div><div class="line"> <span class="comment">// blks identifies the block files (blk_*) stored on the DataNodes</span></div><div class="line"> <span class="comment">// parentNode is the parent directory node</span></div><div class="line"> String targetName = <span class="keyword">new</span> File(target).getName();</div><div class="line"> INode newItem = <span class="keyword">new</span> INode(targetName, parentNode, blks);</div><div class="line"> <span class="comment">// saveImage uses this later</span></div><div class="line"> parentNode.children.put(targetName, 
newItem);</div><div class="line"> <span class="keyword">return</span> newItem;</div><div class="line"> }</div><div class="line"> }</div><div 
class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">int</span> <span class="title">numItemsInTree</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">int</span> total = <span class="number">0</span>;</div><div class="line"> <span class="keyword">for</span> (Iterator it = children.values().iterator(); it.hasNext();) {</div><div class="line"> INode child = (INode) it.next();</div><div class="line"> total += child.numItemsInTree();</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> total + <span class="number">1</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">/**</span></div><div class="line"> * This is the external interface</div><div class="line"> */</div><div class="line"> <span class="function">INode <span class="title">getNode</span><span class="params">(String target)</span> </span>{</div><div class="line"> <span class="keyword">if</span> (!target.startsWith(<span class="string">"/"</span>) || target.length() == <span class="number">0</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">null</span>;</div><div class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (parent == <span class="keyword">null</span> && <span class="string">"/"</span>.equals(target)) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">this</span>;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> Vector components = <span class="keyword">new</span> Vector();</div><div class="line"> <span class="keyword">int</span> start = <span class="number">0</span>;</div><div class="line"> <span class="keyword">int</span> slashid = <span class="number">0</span>;</div><div class="line"> <span class="keyword">while</span> (start < target.length() && (slashid = target.indexOf(<span 
class="string">'/'</span>, start)) >= <span class="number">0</span>) {</div><div class="line"> components.add(target.substring(start, slashid));</div><div class="line"> start = slashid + <span class="number">1</span>;</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> (start < target.length()) {</div><div class="line"> components.add(target.substring(start));</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> getNode(components, <span class="number">0</span>);</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">/**</span></div><div class="line"> */</div><div class="line"> <span class="function">INode <span class="title">getNode</span><span class="params">(Vector components, <span class="keyword">int</span> index)</span> </span>{</div><div class="line"> <span class="keyword">if</span> (!name.equals((String) components.elementAt(index))) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">null</span>;</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> (index == components.size() - <span class="number">1</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">this</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// Check with children</span></div><div class="line"> INode child = (INode) children.get(components.elementAt(index + <span class="number">1</span>));</div><div class="line"> <span class="keyword">if</span> (child == <span class="keyword">null</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="keyword">null</span>;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="keyword">return</span> child.getNode(components, index + <span class="number">1</span>);</div><div class="line"> }</div><div 
class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">/**</span></div><div class="line"> * Recursively writes the metadata to the fsimage.new file;</div><div class="line"> * it simply reads back the data that loadFSImage loaded into memory and writes it out</div><div class="line"> */</div><div class="line"> <span class="function"><span class="keyword">void</span> <span class="title">saveImage</span><span class="params">(String parentPrefix, DataOutputStream out)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> String fullName = <span class="string">""</span>;</div><div class="line"> <span class="keyword">if</span> (parent != <span class="keyword">null</span>) {</div><div class="line"> fullName = parentPrefix + <span class="string">"/"</span> + name;</div><div class="line"> <span class="keyword">new</span> UTF8(fullName).write(out);</div><div class="line"> <span class="keyword">if</span> (blocks == <span class="keyword">null</span>) {</div><div class="line"> out.writeInt(<span class="number">0</span>);</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> out.writeInt(blocks.length);</div><div class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < blocks.length; i++) {</div><div class="line"> blocks[i].write(out);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> <span class="keyword">for</span> (Iterator it = children.values().iterator(); it.hasNext();) {</div><div class="line"> INode child = (INode) it.next();</div><div class="line"> child.saveImage(fullName, out);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">static</span> String FS_IMAGE = <span class="string">"fsimage"</span>;</div><div class="line"> <span class="keyword">static</span> String NEW_FS_IMAGE = <span class="string">"fsimage.new"</span>;</div><div 
class="line"> <span class="keyword">static</span> String OLD_FS_IMAGE = <span class="string">"fsimage.old"</span>;</div><div class="line"></div><div class="line"> <span class="comment">// i.e. the head (root) node</span></div><div class="line"> INode rootDir = <span class="keyword">new</span> INode(<span class="string">""</span>, <span class="keyword">null</span>, <span class="keyword">null</span>);</div><div class="line"> TreeSet activeBlocks = <span class="keyword">new</span> TreeSet();</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">boolean</span> <span class="title">unprotectedAddFile</span><span class="params">(UTF8 name, Block blocks[])</span> </span>{</div><div class="line"> <span class="keyword">synchronized</span> (rootDir) {</div><div class="line"> <span class="keyword">if</span> (blocks != <span class="keyword">null</span>) {</div><div class="line"> <span class="comment">// Add file->block mapping</span></div><div class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < blocks.length; i++) {</div><div class="line"> activeBlocks.add(blocks[i]);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> (rootDir.addNode(name.toString(), blocks) != <span class="keyword">null</span>);</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">boolean</span> <span class="title">loadFSImage</span><span class="params">(File fsdir, File edits)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> File curFile = <span class="keyword">new</span> File(fsdir, FS_IMAGE);</div><div class="line"> File newFile = <span class="keyword">new</span> File(fsdir, NEW_FS_IMAGE);</div><div class="line"> File oldFile = <span class="keyword">new</span> 
File(fsdir, OLD_FS_IMAGE);</div><div class="line"></div><div class="line"> <span class="comment">// These checks are interesting:</span></div><div class="line"> <span class="comment">// if saveFSImage failed midway, some files may not have been renamed yet while others already were; each branch repairs one of those states</span></div><div class="line"></div><div class="line"> <span class="keyword">if</span> (oldFile.exists() && curFile.exists()) {</div><div class="line"> oldFile.delete();</div><div class="line"> <span class="keyword">if</span> (edits.exists()) {</div><div class="line"> edits.delete();</div><div class="line"> }</div><div class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (oldFile.exists() && newFile.exists()) {</div><div class="line"> newFile.renameTo(curFile);</div><div class="line"> oldFile.delete();</div><div class="line"> } <span class="keyword">else</span> <span class="keyword">if</span> (curFile.exists() && newFile.exists()) {</div><div class="line"> newFile.delete();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> (curFile.exists()) {</div><div class="line"> DataInputStream in = <span class="keyword">new</span> DataInputStream(<span class="keyword">new</span> BufferedInputStream(<span class="keyword">new</span> FileInputStream(curFile)));</div><div class="line"> <span class="keyword">try</span> {</div><div class="line"> <span class="keyword">int</span> numFiles = in.readInt();</div><div class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">0</span>; i < numFiles; i++) {</div><div class="line"> UTF8 name = <span class="keyword">new</span> UTF8();</div><div class="line"> name.readFields(in);</div><div class="line"> <span class="keyword">int</span> numBlocks = in.readInt();</div><div class="line"> <span class="keyword">if</span> (numBlocks == <span class="number">0</span>) {</div><div class="line"> unprotectedAddFile(name, <span class="keyword">null</span>);</div><div class="line"> } <span class="keyword">else</span> 
{</div><div class="line"> Block blocks[] = <span class="keyword">new</span> Block[numBlocks];</div><div class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> j = <span class="number">0</span>; j < numBlocks; j++) {</div><div class="line"> blocks[j] = <span class="keyword">new</span> Block();</div><div class="line"> blocks[j].readFields(in);</div><div class="line"> }</div><div class="line"> unprotectedAddFile(name, blocks);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> } <span class="keyword">finally</span> {</div><div class="line"> in.close();</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">return</span> <span class="keyword">true</span>;</div><div class="line"></div><div class="line"> <span class="comment">// if (edits.exists() && loadFSEdits(edits) > 0) {</span></div><div class="line"> <span class="comment">// return true;</span></div><div class="line"> <span class="comment">// } else {</span></div><div class="line"> <span class="comment">// return false;</span></div><div class="line"> <span class="comment">// }</span></div><div class="line"></div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">saveFSImage</span><span class="params">(File fullimage, File edits)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> File curFile = <span class="keyword">new</span> File(fullimage, FS_IMAGE);</div><div class="line"> File newFile = <span class="keyword">new</span> File(fullimage, NEW_FS_IMAGE);</div><div class="line"> File oldFile = <span class="keyword">new</span> File(fullimage, OLD_FS_IMAGE);</div><div class="line"></div><div class="line"> <span class="comment">//</span></div><div class="line"> <span class="comment">// Write out data</span></div><div class="line"> <span 
class="comment">//</span></div><div class="line"> DataOutputStream out = <span class="keyword">new</span> DataOutputStream(<span class="keyword">new</span> BufferedOutputStream(<span class="keyword">new</span> FileOutputStream(newFile)));</div><div class="line"> <span class="keyword">try</span> {</div><div class="line"> out.writeInt(rootDir.numItemsInTree() - <span class="number">1</span>);</div><div class="line"> rootDir.saveImage(<span class="string">""</span>, out);</div><div class="line"> } <span class="keyword">finally</span> {</div><div class="line"> out.close();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">//</span></div><div class="line"> <span class="comment">// Atomic move sequence (I first took "atomic" to mean something transaction-like, so I threw an exception on purpose to see whether it would roll back; it did not)</span></div><div class="line"></div><div class="line"> <span class="comment">// Steps 1-4: rename the current fsimage to the .old file, rename the file saveImage just wrote to fsimage, then delete edits and the .old file</span></div><div class="line"> <span class="comment">// 1. Move cur to old</span></div><div class="line"> curFile.renameTo(oldFile);</div><div class="line"> </div><div class="line"><span class="comment">// int i = 1 / 0; // test atomicity</span></div><div class="line"></div><div class="line"> <span class="comment">// 2. Move new to cur</span></div><div class="line"> newFile.renameTo(curFile);</div><div class="line"></div><div class="line"> <span class="comment">// 3. Remove pending-edits file (it's been integrated with newFile)</span></div><div class="line"> edits.delete();</div><div class="line"></div><div class="line"> <span class="comment">// 4. 
Delete old</span></div><div class="line"> oldFile.delete();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> File dir = <span class="keyword">new</span> File(<span class="string">"tmp/hadoopx/dfs/name"</span>);</div><div class="line"> File fullimage = <span class="keyword">new</span> File(dir, <span class="string">"image"</span>);</div><div class="line"> File edits = <span class="keyword">new</span> File(dir, <span class="string">"edits"</span>);</div><div class="line"> FSDirectorTest fsDirectorTest = <span class="keyword">new</span> FSDirectorTest();</div><div class="line"> <span class="keyword">boolean</span> loadFSImage = fsDirectorTest.loadFSImage(fullimage, edits);</div><div class="line"> System.out.println(loadFSImage);</div><div class="line"> <span class="keyword">if</span> (loadFSImage) {</div><div class="line"> fsDirectorTest.saveFSImage(fullimage, edits);</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<h3 id="流程图"><a href="#流程图" class="headerlink" title="流程图?"></a>Flowchart?</h3><p>It isn't really a flowchart; it just presents part of the code as a picture.</p>
<p><img src="https://github.com/basebase/img_server/blob/master/Hadoop-NameNode%E5%90%AF%E5%8A%A8%E4%B9%8BFSDirectiry/addNode.png?raw=true" alt="addNode"></p>
<p>Once the data has been written to fsimage.new, that file is renamed to fsimage and the .old file is deleted; at that point the load is complete.</p>
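<p>You can try the rename sequence without Hadoop. Below is a minimal sketch that uses plain java.nio.file in a temporary directory in place of the real FSDirectory, UTF8 and Block classes. The class name ImageCheckpointSketch and its save/recover methods are hypothetical, and the image contents are just a string. save follows the four rename steps of saveFSImage. recover follows the repair checks at the top of loadFSImage.</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the fsimage checkpoint protocol; not Hadoop code.
public class ImageCheckpointSketch {
    static final String CUR = "fsimage", NEW = "fsimage.new", OLD = "fsimage.old";

    // Same order as saveFSImage: write .new, then steps 1-4.
    static void save(Path dir, String image, Path edits) throws IOException {
        Path cur = dir.resolve(CUR), neu = dir.resolve(NEW), old = dir.resolve(OLD);
        Files.write(neu, image.getBytes(StandardCharsets.UTF_8));
        if (Files.exists(cur)) {
            Files.move(cur, old);          // 1. cur -> old
        }
        Files.move(neu, cur);              // 2. new -> cur
        Files.deleteIfExists(edits);       // 3. edits are now part of the image
        Files.deleteIfExists(old);         // 4. drop old
    }

    // Same branches as the repair checks in loadFSImage.
    static void recover(Path dir, Path edits) throws IOException {
        Path cur = dir.resolve(CUR), neu = dir.resolve(NEW), old = dir.resolve(OLD);
        boolean o = Files.exists(old), c = Files.exists(cur), n = Files.exists(neu);
        if (o && c) {                      // crashed after step 2
            Files.delete(old);
            Files.deleteIfExists(edits);
        } else if (o && n) {               // crashed after step 1
            Files.move(neu, cur);
            Files.delete(old);
        } else if (c && n) {               // crashed while writing fsimage.new
            Files.delete(neu);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("name");
        Path edits = dir.resolve("edits");
        save(dir, "image-v1", edits);
        // Simulate a crash right after step 1 of a second save.
        Files.write(dir.resolve(NEW), "image-v2".getBytes(StandardCharsets.UTF_8));
        Files.move(dir.resolve(CUR), dir.resolve(OLD));
        recover(dir, edits);
        System.out.println(new String(Files.readAllBytes(dir.resolve(CUR)), StandardCharsets.UTF_8));
    }
}
```

<p>main fakes a crash right after step 1, when only fsimage.old and fsimage.new are on disk. recover then promotes fsimage.new, so the program prints image-v2. As in the original code, the renames are separate steps and not a transaction. That is why the load-time repair is needed.</p>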
]]></content>
</entry>
<entry>
<title><![CDATA[Java Threads: Introduction and Safety]]></title>
<url>http://yoursite.com/2017/04/16/Java%E7%BA%BF%E7%A8%8B%E4%BB%8B%E7%BB%8D%E5%8F%8A%E5%AE%89%E5%85%A8/</url>
<content type="html"><![CDATA[<h3 id="链接"><a href="#链接" class="headerlink" title="链接"></a>Links</h3><p><a href="https://github.com/forhappy/Cplusplus-Concurrency-In-Practice/blob/master/zh/chapter1-Introduction/1.1%20What%20is%20concurrency.md" target="_blank" rel="external">https://github.com/forhappy/Cplusplus-Concurrency-In-Practice/blob/master/zh/chapter1-Introduction/1.1%20What%20is%20concurrency.md</a></p>
]]></content>
</entry>
<entry>
<title><![CDATA[MySQL Study Notes: Transaction Isolation Levels]]></title>
<url>http://yoursite.com/2017/03/05/mysql%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0-%E4%BA%8B%E5%8A%A1%E9%9A%94%E7%A6%BB%E7%BA%A7%E5%88%AB/</url>
<content type="html"><![CDATA[
<a id="more"></a>
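<h3 id="事务"><a href="#事务" class="headerlink" title="事务"></a>Transactions</h3><h4 id="事务是什么?"><a href="#事务是什么?" class="headerlink" title="事务是什么?"></a>What is a transaction?</h4><p>A transaction is a group of SQL queries that are treated atomically, as a single independent unit of work. If the database engine can apply the entire group of statements to the database, it does so. If any statement cannot be executed because of a crash or for any other reason, none of them is applied. In other words, the statements in a transaction either all succeed or all fail.</p>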
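<h3 id="事务的特性ACID"><a href="#事务的特性ACID" class="headerlink" title="事务的特性ACID"></a>ACID properties of transactions</h3><p>A bank is the classic example for explaining transactions. Suppose a bank's database has two tables: checking and savings. To move 200 yuan from user Xiaomoyu's checking account to Xiaomei's savings account, at least three steps are needed:</p>
<p>1. Check that the checking account balance is higher than 200 yuan<br>2. Subtract 200 yuan from the checking account balance<br>3. Add 200 yuan to the savings account balance</p>
<p>These three steps must be grouped into one transaction, so that if any step fails, all of them are rolled back.</p>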
<p>We can use<br><figure class="highlight sql"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">start</span> <span class="keyword">transaction</span></div></pre></td></tr></table></figure></p>
<p>to start a transaction. We then either use COMMIT to commit it and make the changes permanent, or use ROLLBACK to undo all of them. A sample transaction in SQL:</p>
<figure class="highlight sql"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">start</span> <span class="keyword">transaction</span>;</div><div class="line"><span class="keyword">select</span> balance <span class="keyword">from</span> checking <span class="keyword">where</span> customer_id = <span class="number">111</span>;</div><div class="line"><span class="keyword">update</span> checking <span class="keyword">set</span> balance = balance - <span class="number">200</span> <span class="keyword">where</span> customer_id = <span class="number">111</span>;</div><div class="line"><span class="keyword">update</span> savings <span class="keyword">set</span> balance = balance + <span class="number">200</span> <span class="keyword">where</span> customer_id = <span class="number">111</span>;</div><div class="line"><span class="keyword">commit</span>;</div></pre></td></tr></table></figure>
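<p>The bare concept of a transaction is not the whole story. What happens if the server crashes while executing the fourth statement? Who knows; the customer has probably just lost 200 yuan. And what if, between the third and fourth statements, another process removes the entire checking account balance? The bank would end up giving the customer 200 yuan without even knowing it.</p>
<p>Talking about transactions is not enough unless the system passes the ACID test.<br>ACID stands for Atomicity, Consistency, Isolation and Durability. A well-behaved transaction processing system must have all of these standard properties.</p>
<p><font color="DeepPink" size="2"> Atomicity: </font><br> A transaction must be treated as a single, indivisible unit of work: either all of its operations are committed, or all of them are rolled back. A transaction can never execute only part of its operations; this is what atomicity means.</p>
<p><font color="DeepPink" size="2"> Consistency: </font><br> The database always moves from one consistent state to another. In the earlier example, consistency ensures that even if the system crashes between the third and fourth statements, no money disappears from the checking account: the transaction was never committed, so none of its changes are saved to the database.</p>
<p><font color="DeepPink" size="2"> Isolation:</font><br> In general, the changes a transaction makes are invisible to other transactions until it commits. In the earlier example, suppose an account-summary program starts running after the third statement has executed but before the fourth begins. It still sees the checking balance without the 200 yuan deducted.</p>
<p><font color="DeepPink" size="2"> Durability: </font><br> Once a transaction is committed, its changes are permanent: even if the system crashes afterwards, the modified data is not lost. Durability is a somewhat fuzzy concept, because it actually comes in many levels. Some durability strategies provide very strong safety guarantees and others do not, and no strategy can guarantee 100% durability.</p>
<p>A database that implements ACID generally needs more CPU power, more memory and more disk space than one that doesn't.</p>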
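<p>The all-or-nothing idea can be shown outside a database with a toy model. The sketch below is a simplification and not how MySQL or InnoDB is implemented. Every write pushes an undo action, "commit" discards the undo log, and "rollback" replays it in reverse order. The class name AtomicTransferSketch and the account keys such as checking:111 are made up for illustration. The failAfterDebit flag simulates a crash between the two UPDATE statements.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of atomicity (NOT MySQL/InnoDB internals):
// every write records an undo action so a failed transaction is undone in full.
public class AtomicTransferSketch {
    final Map<String, Integer> balances = new HashMap<>();
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    // Write a balance and remember how to restore the previous value.
    private void write(String account, int value) {
        int before = balances.get(account);
        undoLog.push(() -> balances.put(account, before));
        balances.put(account, value);
    }

    // Returns true if the transfer committed, false if it was rolled back.
    boolean transfer(String from, String to, int amount, boolean failAfterDebit) {
        undoLog.clear();                                   // "start transaction"
        try {
            if (balances.get(from) < amount) {             // step 1: check the balance
                throw new IllegalStateException("insufficient funds");
            }
            write(from, balances.get(from) - amount);      // step 2: debit checking
            if (failAfterDebit) {
                throw new IllegalStateException("simulated crash between the updates");
            }
            write(to, balances.get(to) + amount);          // step 3: credit savings
            undoLog.clear();                               // "commit": keep every change
            return true;
        } catch (IllegalStateException e) {
            while (!undoLog.isEmpty()) {
                undoLog.pop().run();                       // "rollback": newest change first
            }
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicTransferSketch db = new AtomicTransferSketch();
        db.balances.put("checking:111", 1000);
        db.balances.put("savings:111", 0);

        boolean first = db.transfer("checking:111", "savings:111", 200, true);
        System.out.println(first + " " + db.balances.get("checking:111") + " " + db.balances.get("savings:111"));
        boolean second = db.transfer("checking:111", "savings:111", 200, false);
        System.out.println(second + " " + db.balances.get("checking:111") + " " + db.balances.get("savings:111"));
    }
}
```

<p>In the first call the debit happens and then the simulated crash undoes it, so the program prints false 1000 0. The second call commits and prints true 800 200.</p>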
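<h3 id="隔离级别"><a href="#隔离级别" class="headerlink" title="隔离级别"></a>Isolation levels</h3><p>Isolation is more complex than it looks. The SQL standard defines four isolation levels. Each level specifies which changes made inside a transaction are visible, both within that transaction and across transactions, and which are not. Lower isolation levels typically allow higher concurrency and have lower overhead.</p>
<p>Each storage engine implements the isolation levels somewhat differently. If you are familiar with other database products, some features may not behave the way you expect; consult the manual for the engine you choose.</p>
<p>Below is a brief introduction to the four isolation levels.</p>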
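<p><font color="DeepPink" size="2"> READ UNCOMMITTED </font><br> At the READ UNCOMMITTED level, a transaction's changes are visible to other transactions even before they are committed. Transactions can read uncommitted data; this is called a "dirty read". This level causes many problems. Performance-wise, READ UNCOMMITTED is not much better than the other levels, and it lacks many of their benefits. Unless you have a really good reason, it is rarely used in practice.</p>
<p><font color="DeepPink" size="2"> READ COMMITTED </font><br> The default isolation level of most databases is READ COMMITTED (but not MySQL's). READ COMMITTED satisfies the simple definition of isolation given earlier: a transaction can only "see" changes made by transactions that have already committed. In other words, any changes a transaction makes between its start and its commit are invisible to other transactions. This level is also sometimes called "non-repeatable read", because running the same query twice may return different results.</p>
<p><font color="DeepPink" size="2"> REPEATABLE READ </font><br> REPEATABLE READ solves the problems of dirty reads and non-repeatable reads. It guarantees that reading the same rows several times within one transaction returns consistent results. In theory, however, it still cannot prevent another problem: phantom reads. A phantom read occurs when a transaction reads a range of rows and another transaction then inserts a new row into that range. When the first transaction reads the range again, it sees a "phantom" row. The InnoDB and XtraDB storage engines solve the phantom read problem with multiversion concurrency control (MVCC); this covers plain consistent reads, while locking reads such as SELECT ... FOR UPDATE rely on next-key locks instead. REPEATABLE READ is MySQL's default transaction isolation level.</p>
<p><font color="DeepPink" size="2"> SERIALIZABLE </font><br> SERIALIZABLE is the highest isolation level. It avoids the phantom read problem by forcing transactions to execute serially. Put simply, SERIALIZABLE places a lock on every row it reads, which can cause a lot of timeouts and lock contention. It is rarely used in practice. Consider it only when data consistency is absolutely required and you can accept having no concurrency.</p>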
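<p>The four levels can be summarized as a small lookup table. The sketch below follows the anomaly definitions of the SQL standard. The class and enum names are made up for illustration. A particular engine may prevent more anomalies than the standard requires, as InnoDB does for phantom reads under REPEATABLE READ.</p>

```java
import java.util.EnumSet;
import java.util.Set;

// Which read anomalies each SQL-standard isolation level still permits.
// Illustrative names; engines such as InnoDB may be stricter than the standard.
public class IsolationLevels {
    enum Anomaly { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM_READ }

    enum Level {
        READ_UNCOMMITTED(EnumSet.allOf(Anomaly.class)),
        READ_COMMITTED(EnumSet.of(Anomaly.NON_REPEATABLE_READ, Anomaly.PHANTOM_READ)),
        REPEATABLE_READ(EnumSet.of(Anomaly.PHANTOM_READ)),
        SERIALIZABLE(EnumSet.noneOf(Anomaly.class));

        private final Set<Anomaly> allowed;

        Level(Set<Anomaly> allowed) {
            this.allowed = allowed;
        }

        boolean allows(Anomaly anomaly) {
            return allowed.contains(anomaly);
        }
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            System.out.println(level + " allows " + level.allowed);
        }
    }
}
```

<p>From Java, the level itself is selected with JDBC's Connection.setTransactionIsolation(...), using constants such as Connection.TRANSACTION_REPEATABLE_READ. In MySQL, SET TRANSACTION ISOLATION LEVEL does the same from SQL.</p>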
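<p>The article could actually end here.<br>But there are a few more topics I'd like to cover while I'm at it 😏...</p>
<h3 id="死锁"><a href="#死锁" class="headerlink" title="死锁"></a>Deadlocks</h3><p>A deadlock occurs when two or more transactions each hold locks on some resources and request locks on resources the others hold, creating a vicious cycle. Deadlocks can occur when transactions try to lock resources in different orders. They can also occur when several transactions lock the same resource at the same time. For example, consider these two transactions running against the StockPrice table:</p>
<p>Transaction 1</p>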
<figure class="highlight sql"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">start</span> <span class="keyword">transaction</span>;</div><div class="line"><span class="keyword">update</span> stockprice <span class="keyword">set</span> <span class="keyword">close</span> = <span class="number">22</span> <span class="keyword">where</span> stock_id = <span class="number">4</span> <span class="keyword">and</span> <span class="built_in">date</span> = <span class="string">'2017-01-01'</span>;</div><div class="line"><span class="keyword">update</span> stockprice <span class="keyword">set</span> <span class="keyword">close</span> = <span class="number">11</span> <span class="keyword">where</span> stock_id = <span class="number">3</span> <span class="keyword">and</span> <span class="built_in">date</span> = <span class="string">'2017-02-03'</span>;</div><div class="line"><span class="keyword">commit</span>;</div></pre></td></tr></table></figure>
<p>Transaction 2</p>
<figure class="highlight sql"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">start</span> <span class="keyword">transaction</span>;</div><div class="line"><span class="keyword">update</span> stockprice <span class="keyword">set</span> <span class="keyword">high</span> = <span class="number">20.12</span> <span class="keyword">where</span> stock_id = <span class="number">3</span> <span class="keyword">and</span> <span class="built_in">date</span> = <span class="string">'2017-02-03'</span>;</div><div class="line"><span class="keyword">update</span> stockprice <span class="keyword">set</span> <span class="keyword">high</span> = <span class="number">47.20</span> <span class="keyword">where</span> stock_id = <span class="number">4</span> <span class="keyword">and</span> <span class="built_in">date</span> = <span class="string">'2017-01-01'</span>;</div><div class="line"><span class="keyword">commit</span>;</div></pre></td></tr></table></figure>
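<p>Suppose that, by bad luck, each transaction executes its first UPDATE statement, updating one row and locking it. Each then tries to execute its second UPDATE statement and finds that row already locked by the other. Now each transaction is waiting for the other to release its lock while holding the lock the other needs, so they wait in a cycle forever. The deadlock can only be broken by outside intervention.</p>
<p>InnoDB currently handles deadlocks by rolling back the transaction that holds the fewest exclusive row-level locks.</p>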
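<p>Deadlock detection is usually described in terms of a wait-for graph. The sketch below is a simplified model and does not use InnoDB's real data structures. An entry T1 -> T2 means "T1 is waiting for a lock that T2 holds", and a cycle in this graph is a deadlock. The victim is chosen by the rule stated above: the transaction in the cycle with the fewest exclusive row locks. The class name, method names and lock counts are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified wait-for graph model (not InnoDB's real implementation).
// waitsFor.get("T1") == "T2" means T1 waits for a lock held by T2.
public class DeadlockSketch {
    // Each transaction waits on at most one lock here, so following the chain
    // from `start` either ends (no deadlock) or loops back (a cycle).
    static List<String> findCycle(Map<String, String> waitsFor, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = waitsFor.get(current);
        }
        if (current == null) {
            return Collections.emptyList();
        }
        // Keep only the part of the path that forms the cycle.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    // Victim rule from the text: fewest exclusive row locks in the cycle.
    static String chooseVictim(List<String> cycle, Map<String, Integer> exclusiveLocks) {
        return Collections.min(cycle, Comparator.comparingInt((String tx) -> exclusiveLocks.get(tx)));
    }

    public static void main(String[] args) {
        // StockPrice example: each transaction waits for the row the other holds.
        Map<String, String> waitsFor = new HashMap<>();
        waitsFor.put("T1", "T2");
        waitsFor.put("T2", "T1");
        // Hypothetical counts, e.g. if T2 had locked two more rows earlier.
        Map<String, Integer> exclusiveLocks = new HashMap<>();
        exclusiveLocks.put("T1", 1);
        exclusiveLocks.put("T2", 3);

        List<String> cycle = findCycle(waitsFor, "T1");
        System.out.println("cycle: " + cycle);
        System.out.println("roll back: " + chooseVictim(cycle, exclusiveLocks));
    }
}
```

<p>With these inputs the program reports the cycle [T1, T2] and picks T1 as the victim. A real engine also has to handle transactions that wait on several locks at once, which turns the chain walk into a general graph search.</p>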
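<p>Locking behavior and lock order depend on the storage engine: the same sequence of statements may deadlock on some storage engines and not on others. Deadlocks have two kinds of causes. Some come from genuine data conflicts and are often hard to avoid. Others are caused entirely by the way a storage engine is implemented.</p>
<p>Once a deadlock occurs, it can only be broken by partially or fully rolling back one of the transactions. Deadlocks are unavoidable in transactional systems, so applications must be designed to handle them. In most cases it is enough to simply re-execute the transaction that was rolled back because of the deadlock.</p>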
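<p>Re-running the rolled-back transaction is often wrapped in a small retry helper. The sketch below rests on one assumption: the JDBC driver reports a deadlock victim as java.sql.SQLTransactionRollbackException. MySQL reports a deadlock as error 1213 with SQLState 40001. TransactionBody and runWithRetry are hypothetical names, and the body stands in for "begin, run the statements, commit" on a real connection.</p>

```java
import java.sql.SQLException;
import java.sql.SQLTransactionRollbackException;

// Sketch of "just re-run the transaction that was rolled back".
// Assumption: the driver throws SQLTransactionRollbackException for deadlocks.
public class DeadlockRetry {
    @FunctionalInterface
    interface TransactionBody<T> {
        T run() throws SQLException; // hypothetical: begin, execute statements, commit
    }

    static <T> T runWithRetry(TransactionBody<T> body, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try {
                return body.run();
            } catch (SQLTransactionRollbackException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller see the deadlock
                }
                // The whole transaction was rolled back, so it is safe to start over.
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        int[] calls = {0};
        String result = runWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                // Simulate being chosen as the deadlock victim twice.
                throw new SQLTransactionRollbackException(
                        "Deadlock found when trying to get lock", "40001", 1213);
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

<p>The program prints committed after 3 attempts. In production you would probably add a short randomized back-off between attempts, so that the two conflicting transactions do not collide again immediately.</p>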
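<h3 id="多版本并发控制"><a href="#多版本并发控制" class="headerlink" title="多版本并发控制"></a>Multiversion concurrency control</h3><p>Most of MySQL's transactional storage engines don't use a simple row-locking mechanism. To improve concurrency, they generally also implement multiversion concurrency control (MVCC). MVCC is not unique to MySQL: Oracle, PostgreSQL and other databases implement it too. Their mechanisms differ, though, because there is no single standard for MVCC.</p>
<p>You can think of MVCC as a variant of row-level locking. Because it avoids locking in many cases, its overhead is lower. Implementations differ, but most provide non-blocking reads, and writes lock only the rows they need.</p>
<p>MVCC works by keeping a snapshot of the data as it existed at some point in time. As a result, each transaction sees a consistent view of the data no matter how long it runs. It also means that, depending on when they started, different transactions can see different data in the same table at the same moment. The data may have gone through several snapshot versions over time, and under the default RR isolation level each transaction only sees the snapshot from when it started.</p>
<p>As mentioned, different storage engines implement MVCC differently; typical variants include optimistic and pessimistic concurrency control. Below, a simplified version of InnoDB's behavior illustrates how MVCC works.</p>
<p>InnoDB implements MVCC by storing two hidden columns with each row. One holds the row's creation time, and the other holds its expiration (or deletion) time. Instead of actual timestamps, these columns store system version numbers, and the system version number is incremented every time a new transaction starts. The system version number at the moment a transaction starts becomes that transaction's version number. It is then compared with the version numbers of each row the transaction reads. Here is how MVCC operates under the REPEATABLE READ isolation level.</p>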
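<p>SELECT<br> InnoDB checks each row against two conditions:<br> a) InnoDB only looks for rows whose version is earlier than the transaction's version, i.e. the row's version number is less than or equal to the transaction's system version number. This ensures that every row the transaction reads either existed before the transaction started or was inserted or modified by the transaction itself.<br> b) The row's deletion version is either undefined or greater than the transaction's version. This ensures that the rows the transaction reads had not been deleted before it started.</p>
<p>INSERT<br> InnoDB records the current system version number as the row version of each newly inserted row.</p>
<p>DELETE<br> InnoDB records the current system version number as the deletion marker of each deleted row.</p>
<p>UPDATE<br> InnoDB inserts a new copy of the row with the current system version number as its row version. It also records the current system version number as the deletion marker of the original row.</p>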
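<p>The SELECT rules a) and b) can be tried out with a small in-memory model. The sketch below makes two labeled assumptions. First, a transaction stamps rows with its own version number, which keeps its own changes visible to it. Second, commit status is ignored. Real InnoDB instead uses transaction IDs, undo logs and read views that also track which transactions are still active. All names are made up for illustration.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Simplified version-number model of the MVCC rules above (not real InnoDB).
// Assumptions: rows are stamped with the writing transaction's own version,
// and whether that transaction has committed is ignored.
public class MvccSketch {
    static final class RowVersion {
        final String value;
        final long created;   // hidden "row version" column
        Long deleted;         // hidden "deletion version" column; null = undefined

        RowVersion(String value, long created) {
            this.value = value;
            this.created = created;
        }
    }

    private long systemVersion = 0;
    final List<RowVersion> rows = new ArrayList<>();

    // Starting a transaction increments the system version number.
    long begin() { return ++systemVersion; }

    void insert(long tx, String value) { rows.add(new RowVersion(value, tx)); }

    void delete(long tx, RowVersion row) { row.deleted = tx; }

    // UPDATE = insert a new version + mark the old version as deleted.
    void update(long tx, RowVersion row, String newValue) {
        insert(tx, newValue);
        delete(tx, row);
    }

    // SELECT rules a) and b).
    static boolean visible(RowVersion row, long tx) {
        return row.created <= tx && (row.deleted == null || row.deleted > tx);
    }

    List<String> select(long tx) {
        List<String> result = new ArrayList<>();
        for (RowVersion row : rows) {
            if (visible(row, tx)) {
                result.add(row.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MvccSketch db = new MvccSketch();
        long t1 = db.begin();
        db.insert(t1, "balance=1000");
        long t2 = db.begin();                       // a reader that started earlier
        long t3 = db.begin();
        db.update(t3, db.rows.get(0), "balance=800");
        System.out.println(db.select(t2));          // t2 still sees its snapshot
        System.out.println(db.select(t3));          // t3 sees its own update
    }
}
```

<p>Running it prints [balance=1000] for t2 and [balance=800] for t3. Neither read takes a lock, which is the benefit the next paragraph describes. The cost is also visible: both row versions have to be kept around.</p>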
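<p>Storing these two extra system version numbers lets most read operations proceed without taking locks. Reads become simple and fast, and only rows that meet the criteria are returned. The downside is that every row needs extra storage, and the engine has to do more checking and some additional housekeeping.</p>
<p>MVCC works only with the REPEATABLE READ and READ COMMITTED isolation levels.<br>The other two levels are incompatible with MVCC. READ UNCOMMITTED always reads the newest version of a row, not the version that matches the current transaction, and SERIALIZABLE locks every row it reads.</p>
<h3 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h3><p><a href="http://www.jianshu.com/p/8d735db9c2c0" target="_blank" rel="external">http://www.jianshu.com/p/8d735db9c2c0</a> [Isolation levels in practice]<br><a href="http://tech.meituan.com/innodb-lock.html" target="_blank" rel="external">http://tech.meituan.com/innodb-lock.html</a> [Meituan's tech blog is consistently high quality]</p>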
]]></content>
</entry>
<entry>
<title><![CDATA[Hadoop Partitioner: Usage and Caveats]]></title>
<url>http://yoursite.com/2017/02/07/hadoop-Partitioner%E4%BD%BF%E7%94%A8%E5%8F%8A%E6%B3%A8%E6%84%8F%E7%82%B9/</url>
<content type="html"><![CDATA[<p><a id="more"></a></p>
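<h3 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h3><p>Hadoop has been around for a long time, and articles about partitioning are everywhere, so why write another one?<br>Mainly, this article points out a few things to watch out for when writing a custom partitioner for MapReduce.<br>Someone has probably pointed this out before, but let me keep a small note of it anyway, hahaha~~~</p>
<p>We know that map output is written into partitions. By default there is only one partition, but what if I want 10, or even 100? Is that possible?<br>Of course it is.</p>
<p>You just need to create a class that extends org.apache.hadoop.mapreduce.Partitioner<br>to define exactly the partitioning scheme you want,<br>and then set your custom Partitioner class on the job.<br>But is that really all it takes?</p>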
<h3 id="一个例子"><a href="#一个例子" class="headerlink" title="一个例子"></a>An example</h3><p>PartitionMapper.java<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(Object key, Text value, Context context)</span></span></div><div class="line"> <span class="keyword">throws</span> IOException, InterruptedException {</div><div class="line"> String[] tokens = value.toString().split(<span class="string">","</span>);</div><div class="line"> String gender = tokens[<span class="number">2</span>];</div><div class="line"> String nameAgeScore = tokens[<span class="number">0</span>] + <span class="string">","</span> + tokens[<span class="number">1</span>] + <span class="string">","</span> + tokens[<span class="number">3</span>];</div><div class="line"> context.write(<span class="keyword">new</span> Text(gender), <span class="keyword">new</span> Text(nameAgeScore));</div><div class="line"> }</div></pre></td></tr></table></figure></p>
<p>AgePartitioner.java<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">getPartition</span><span class="params">(Text key, Text value, <span class="keyword">int</span> numPartitions)</span> </span>{</div><div class="line"> </div><div class="line"> String[] nameAgeScore = value.toString().split(<span class="string">","</span>);</div><div class="line"> String age = nameAgeScore[<span class="number">1</span>];</div><div class="line"> <span class="keyword">int</span> ageInt = Integer.parseInt(age);</div><div class="line"></div><div class="line"> <span class="keyword">if</span> (numPartitions == <span class="number">0</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="number">0</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> (ageInt <= <span class="number">20</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="number">0</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keyword">if</span> (ageInt > <span class="number">20</span> && ageInt <= <span class="number">50</span>) {</div><div class="line"> <span class="keyword">return</span> <span class="number">1</span> % numPartitions;</div><div class="line"> }</div><div 
class="line"></div><div class="line"> <span class="keyword">return</span> <span class="number">2</span> % numPartitions;</div><div class="line"> }</div></pre></td></tr></table></figure></p>
<p>ParitionReducer.java<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Text key, Iterable<Text> values, Context context)</span></span></div><div class="line"> <span class="keyword">throws</span> IOException, InterruptedException {</div><div class="line"> </div><div class="line"> <span class="keyword">int</span> maxScore = Integer.MIN_VALUE;</div><div class="line"> String name = <span class="string">""</span>;</div><div class="line"> String age = <span class="string">""</span>;</div><div class="line"> String gender = <span class="string">""</span>;</div><div class="line"> </div><div class="line"> <span class="keyword">int</span> score = <span class="number">0</span>;</div><div class="line"> <span class="keyword">for</span> (Text val : values) {</div><div class="line"> String[] valTokens = val.toString().split(<span class="string">","</span>);</div><div class="line"> score = Integer.parseInt(valTokens[<span class="number">2</span>]);</div><div class="line"> </div><div class="line"> <span class="keyword">if</span> (score > maxScore) {</div><div class="line"> name = valTokens[<span class="number">0</span>];</div><div 
class="line"> age = valTokens[<span class="number">1</span>];</div><div class="line"> gender = key.toString();</div><div class="line"> maxScore = score;</div><div class="line"> }</div><div class="line"> }</div><div class="line"> </div><div class="line"> context.write(<span class="keyword">new</span> Text(name), <span class="keyword">new</span> Text(<span class="string">"age- "</span> + age + <span class="string">","</span> + gender + <span class="string">","</span> + <span class="string">" score-"</span> + maxScore));</div><div class="line"> }</div></pre></td></tr></table></figure></p>
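<p>The driver class: I'll skip the boilerplate and only show how the Partitioner is set<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">job.setPartitionerClass(AgePartitioner.class);</div></pre></td></tr></table></figure></p>
<p>I took the example above from another article; the link is in the Links section at the end.<br>Now run the example. You'll find that the reducers produce only one output file, and that the custom<br>Partitioner class is not actually being used.</p>
<p>At first I was confused. What? Didn't my setting take effect?<br>Your setting is fine, but you forgot something important. What is it? Come on, tell us already (you stinky fish).<br>Before revealing the secret, let's look at what the map side's context.write() method does.</p>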
<p>MapTask.java<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">write</span><span class="params">(K key, V value)</span> <span class="keyword">throws</span> IOException, InterruptedException </span>{</div><div class="line"> collector.collect(key, value,</div><div class="line"> partitioner.getPartition(key, value, partitions));</div><div class="line"> }</div></pre></td></tr></table></figure></p>
<p>Where is partitioner defined?<br>In the NewOutputCollector class, an inner class of MapTask.<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">private</span> <span class="class"><span class="keyword">class</span> <span class="title">NewOutputCollector</span><<span class="title">K</span>,<span class="title">V</span>></span></div><div class="line"> <span class="keyword">extends</span> <span class="title">org</span>.<span class="title">apache</span>.<span class="title">hadoop</span>.<span class="title">mapreduce</span>.<span class="title">RecordWriter</span><<span class="title">K</span>,<span class="title">V</span>> {</div><div class="line"> <span class="keyword">private</span> <span class="keyword">final</span> MapOutputCollector<K,V> collector;</div><div class="line"> <span class="keyword">private</span> <span class="keyword">final</span> org.apache.hadoop.mapreduce.Partitioner<K,V> partitioner;</div><div class="line"> <span class="keyword">private</span> <span class="keyword">final</span> <span class="keyword">int</span> partitions;</div><div class="line"></div><div class="line"> <span class="meta">@SuppressWarnings</span>(<span 
class="string">"unchecked"</span>)</div><div class="line"> NewOutputCollector(org.apache.hadoop.mapreduce.JobContext jobContext,</div><div class="line"> JobConf job,</div><div class="line"> TaskUmbilicalProtocol umbilical,</div><div class="line"> TaskReporter reporter</div><div class="line"> ) <span class="keyword">throws</span> IOException, ClassNotFoundException {</div><div class="line"> collector = createSortingCollector(job, reporter);</div><div class="line"> partitions = jobContext.getNumReduceTasks();</div><div class="line"> <span class="keyword">if</span> (partitions > <span class="number">1</span>) {</div><div class="line"> partitioner = (org.apache.hadoop.mapreduce.Partitioner<K,V>)</div><div class="line"> ReflectionUtils.newInstance(jobContext.getPartitionerClass(), job);</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> partitioner = <span class="keyword">new</span> org.apache.hadoop.mapreduce.Partitioner<K,V>() {</div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">getPartition</span><span class="params">(K key, V value, <span class="keyword">int</span> numPartitions)</span> </span>{</div><div class="line"> <span class="keyword">return</span> partitions - <span class="number">1</span>;</div><div class="line"> }</div><div class="line"> };</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line"> ...</div></pre></td></tr></table></figure></p>
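<p>The branch in this constructor can be modeled in plain Java to show which partition a record lands in. This is a simplified sketch and not Hadoop code. The class and method names are made up for illustration. hashPartition copies the formula of Hadoop's default HashPartitioner, which is the class getPartitionerClass() returns when no partitioner has been configured.</p>

```java
// Hypothetical model of how MapTask.NewOutputCollector picks a partition.
// Not Hadoop code: class and method names are made up for illustration.
class PartitionChoiceSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder.
    static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // More than one reducer: delegate to the configured partitioner
    // (modeled here by the default hash formula).
    // Otherwise: the anonymous partitioner returns partitions - 1,
    // which is 0 when there is exactly one reducer.
    static int choosePartition(Object key, int numReduceTasks) {
        if (numReduceTasks > 1) {
            return hashPartition(key, numReduceTasks);
        }
        return numReduceTasks - 1;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + ": 1 reducer -> " + choosePartition(key, 1)
                    + ", 4 reducers -> " + choosePartition(key, 4));
        }
    }
}
```

<p>With a single reducer the partitioner is never consulted, so a custom Partitioner only matters once the job has more than one reduce task.</p>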
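<p>Look at the constructor. If partitions is greater than 1, it reads the Partitioner class from the job configuration (our own, if we set one) and instantiates it via reflection. Otherwise it creates its own anonymous Partitioner that always returns partitions - 1, so with a single reducer every record goes to partition 0.<br>partitions itself comes from jobContext.getNumReduceTasks(). So how do we configure it?</p>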
<font color="DeepPink" size="3"><br>job.setNumReduceTasks(number);<br></font>
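<p>A custom partitioner only has to implement getPartition(key, value, numPartitions) and return an index in [0, numPartitions). The sketch below mimics that contract without a Hadoop dependency. SimplePartitioner is a hypothetical stand-in for org.apache.hadoop.mapreduce.Partitioner, and the first-letter routing rule is only an example.</p>

```java
// Hypothetical stand-in for org.apache.hadoop.mapreduce.Partitioner so the
// sketch runs without Hadoop; the real class is abstract with the same method shape.
interface SimplePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Illustrative rule: keys starting with a-m go to bucket 0, everything else to bucket 1.
// In a real job you would extend Partitioner<Text, Text> and register it, e.g.
//   job.setPartitionerClass(FirstLetterPartitioner.class);
//   job.setNumReduceTasks(2);
class FirstLetterPartitioner implements SimplePartitioner<String, String> {

    @Override
    public int getPartition(String key, String value, int numPartitions) {
        char first = key.isEmpty() ? '\0' : Character.toLowerCase(key.charAt(0));
        int bucket = (first >= 'a' && first <= 'm') ? 0 : 1;
        // Keep the result inside [0, numPartitions) even with a single reducer
        return bucket % numPartitions;
    }

    public static void main(String[] args) {
        FirstLetterPartitioner p = new FirstLetterPartitioner();
        for (String key : new String[] {"apple", "Melon", "nut", "zebra"}) {
            System.out.println(key + " -> " + p.getPartition(key, "", 2));
        }
    }
}
```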
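<p>Once the number of reduce tasks is greater than 1, our own partition function is used, provided it has been registered with job.setPartitionerClass(...). Otherwise the default HashPartitioner applies.<br>That's all for this post. Comments and criticism are welcome!</p>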
<h3 id="链接"><a href="#链接" class="headerlink" title="链接"></a>Links</h3><p><a color="red" href="https://hadooptutorial.wikispaces.com/Custom+partitioner" target="_blank" rel="external">The example used in this post, including its data</a></p>
]]></content>
</entry>
<entry>
<title><![CDATA[mapreduce join]]></title>
<url>http://yoursite.com/2017/02/03/mapreduce-join/</url>
<content type="html"><![CDATA[
<a id="more"></a>
<h3 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h3><p>Hive, MySQL and other SQL engines can all perform joins. So how does MapReduce do a join?<br>Before getting to MapReduce, let's first look at the SQL syntax.</p>
<figure class="highlight sql"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">SELECT</span> </div><div class="line"> tab1.*, </div><div class="line"> tab2.* </div><div class="line"><span class="keyword">FROM</span> </div><div class="line"> table1 tab1</div><div class="line"><span class="keyword">JOIN</span></div><div class="line"> table2 tab2 </div><div class="line"> <span class="keyword">on</span> tab1.id = tab2.id</div></pre></td></tr></table></figure>
<p>Once the two tables are joined, you simply take whatever columns you need from each table.</p>
<h3 id="mapreduce如何JOIN"><a href="#mapreduce如何JOIN" class="headerlink" title="mapreduce如何JOIN"></a>How MapReduce Does a JOIN</h3><p>As shown above, joining tables in SQL is simple. But how do we do it in a program?<br>Hive's data comes from HDFS, and HDFS stores files, so we can treat a table as a file.<br>Two tables are then simply two files, and Hive itself joins those two files.</p>
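<p>Now suppose we don't use MapReduce and instead write a standalone program to do the join.<br>Say file A is 1 GB and file B is 500 MB.<br>We can read file B into memory,<br>as a map from each join key to the list of B's values with that key:<br> <figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">Map<String, List<String>></div></pre></td></tr></table></figure></p>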
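<p>Then we read file A line by line and look up each line's key in the map. If the key exists, we output file A's value combined with each value in the list.<br>That produces the join as well.</p>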
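<p>The single-machine approach just described is a classic in-memory hash join: build a map from the smaller file, then probe it while streaming the larger one. This is a minimal sketch. It assumes both files are lists of "key,value" lines and that file B fits in memory. The class name and the line format are assumptions made for illustration.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InMemoryHashJoin {

    // Build phase: load the smaller file B into Map<key, List<value>>.
    static Map<String, List<String>> build(List<String> fileB) {
        Map<String, List<String>> index = new HashMap<>();
        for (String line : fileB) {
            String[] kv = line.split(",", 2);
            if (kv.length < 2) continue; // skip malformed lines
            index.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return index;
    }

    // Probe phase: stream file A line by line and emit one row per match.
    static List<String> join(List<String> fileA, Map<String, List<String>> index) {
        List<String> out = new ArrayList<>();
        for (String line : fileA) {
            String[] kv = line.split(",", 2);
            if (kv.length < 2) continue;
            List<String> matches = index.get(kv[0]);
            if (matches == null) continue; // inner join: drop keys missing from B
            for (String b : matches) {
                out.add(kv[0] + "," + kv[1] + "," + b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = List.of("1,alice", "2,bob", "3,carol");
        List<String> b = List.of("1,login-a", "1,login-b", "3,login-c");
        join(a, build(b)).forEach(System.out::println);
    }
}
```

<p>Memory use grows with the size of file B, which is exactly what breaks down once the data no longer fits on one machine.</p>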
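<p>Keeping this data in memory seems reasonable to me, since the data a single machine handles is usually not very large.<br>But what if your data is 1 TB? Could that still fit in memory?</p>
<p>Most of the blog posts I have read use essentially the same approach:</p>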
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">static</span> <span class="class"><span class="keyword">class</span> <span class="title">MyReducer</span> <span class="keyword">extends</span> <span class="title">Reducer</span><<span class="title">Text</span>, <span class="title">Text</span>, <span class="title">Text</span>, <span class="title">Text</span>> </span>{</div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Text key, Iterable<Text> values,</span></span></div><div class="line"> Reducer<Text, Text, Text, Text>.Context context)</div><div class="line"> <span class="keyword">throws</span> IOException, InterruptedException {</div><div class="line"></div><div class="line"> LinkedList<String> linkU = <span class="keyword">new</span> LinkedList<String>(); <span class="comment">// values from users</span></div><div class="line"> LinkedList<String> linkL = <span class="keyword">new</span> LinkedList<String>(); <span class="comment">// values from login_logs</span></div><div class="line"> </div><div class="line"> <span class="keyword">for</span> 
(Text tval : values) {</div><div class="line"> String val = tval.toString(); </div><div class="line"> <span class="keyword">if</span>(val.startsWith(<span class="string">"u#"</span>)) {</div><div class="line"> linkU.add(val.substring(<span class="number">2</span>));</div><div class="line"> } <span class="keyword">else</span> <span class="keyword">if</span>(val.startsWith(<span class="string">"l#"</span>)) {</div><div class="line"> linkL.add(val.substring(<span class="number">2</span>));</div><div class="line"> }</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="keyword">for</span> (String u : linkU) {</div><div class="line"> <span class="keyword">for</span> (String l : linkL) {</div><div class="line"> context.write(key, <span class="keyword">new</span> Text(u + DELIMITER + l));</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div></pre></td></tr></table></figure>
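<p>The plain-Java sketch below shows where the memory goes. It reproduces what that reducer does for a single join key: every tagged value in the group is buffered into one of two lists before the cross product is written. The u#/l# prefixes come from the code above. The class name, the tab used for DELIMITER (the original constant is not shown) and the sample values are assumptions.</p>

```java
import java.util.ArrayList;
import java.util.List;

class TaggedReduceSketch {

    static final String DELIMITER = "\t"; // assumed; the original constant is not shown

    // Mirrors the reducer above for one key: buffer both sides, then emit the cross product.
    static List<String> reduce(String key, Iterable<String> values) {
        List<String> users = new ArrayList<>();  // values tagged "u#"
        List<String> logins = new ArrayList<>(); // values tagged "l#"
        for (String val : values) {
            if (val.startsWith("u#")) {
                users.add(val.substring(2));
            } else if (val.startsWith("l#")) {
                logins.add(val.substring(2));
            }
        }
        // Both lists stay in memory until the loops below finish,
        // so a skewed key with millions of values can exhaust the heap.
        List<String> out = new ArrayList<>();
        for (String u : users) {
            for (String l : logins) {
                out.add(key + DELIMITER + u + DELIMITER + l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Within a group, values arrive in no guaranteed order
        List<String> group = List.of("l#2017-01-01", "u#alice", "l#2017-01-02");
        reduce("1", group).forEach(System.out::println);
    }
}
```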
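<p>Is this approach correct? Yes. Is there a better one? Yes, and I'll show it shortly 😏😏😏😏.<br>However, we often run into data skew, where one group of values is exceptionally large. If a JOIN hits a huge number of records with the same key,<br>will they still fit in memory?<br>Using LinkedList here is a reasonable choice (thumbs up), though its faster insertion and deletion only pay off in the middle of a list. For append-then-iterate buffering like this, ArrayList is usually just as fast and uses less memory per element.</p>
<p>Enough said. Can I really do better than these posts, or am I just bragging again? (awkward face)</p>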
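<p>Here is how I do it.<br>I define a custom composite key. Because I don't buffer the records in memory, the order in which the values arrive matters,<br>so I add a secondary sort. Each key carries a tag, "0" for the left table and "1" for the right, so that the left record of a join key always comes first. By the time a group reaches the reducer, all of its records are together, and I use a boolean flag.<br>On the first value I write nothing and just remember it in a string. Once the flag has flipped,<br>each following value is written out together with the remembered one.</p>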
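<p>The idea can be simulated without Hadoop. Each map output record carries a composite key (join key, tag). The shuffle sorts on both fields, but grouping uses only the join key. Within each group the left record (tag "0") therefore arrives first, and the reducer only ever holds that one value. Rec is a hypothetical stand-in for a Pair key plus its Text value. Like the approach described, the sketch assumes one left record per join key.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SecondarySortJoinSketch {

    // Hypothetical stand-in for a (Pair key, Text value) map output record.
    static final class Rec {
        final String joinKey; // Pair.first
        final String tag;     // Pair.second: "0" = left table, "1" = right table
        final String value;

        Rec(String joinKey, String tag, String value) {
            this.joinKey = joinKey;
            this.tag = tag;
            this.value = value;
        }
    }

    static List<String> join(List<Rec> mapOutput) {
        // Sort comparator: join key first, then tag, so "0" precedes "1".
        List<Rec> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.comparing((Rec r) -> r.joinKey).thenComparing(r -> r.tag));

        List<String> out = new ArrayList<>();
        String currentKey = null;
        String leftValue = null;
        boolean set = false;
        for (Rec r : sorted) {
            // Grouping comparator: a new group starts when the join key changes.
            if (!r.joinKey.equals(currentKey)) {
                currentKey = r.joinKey;
                set = false;
            }
            if (!set) {
                leftValue = r.value; // first value of the group: remember it, write nothing
                set = true;
            } else {
                out.add(currentKey + "," + leftValue + "," + r.value); // stream the rest
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> records = List.of(
                new Rec("1", "1", "r1"), new Rec("1", "0", "left1"),
                new Rec("2", "0", "left2"), new Rec("1", "1", "r2"));
        join(records).forEach(System.out::println);
    }
}
```

<p>Only one remembered string lives in memory per group, no matter how many right-side records share the key. That is the point of moving the tag into the sort key.</p>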
<p>Here's the code:</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div class="line">71</div><div class="line">72</div><div class="line">73</div><div class="line">74</div><div 
class="line">75</div><div class="line">76</div><div class="line">77</div><div class="line">78</div><div class="line">79</div><div class="line">80</div><div class="line">81</div><div class="line">82</div><div class="line">83</div><div class="line">84</div><div class="line">85</div><div class="line">86</div><div class="line">87</div><div class="line">88</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Pair</span> <span class="keyword">implements</span> <span class="title">WritableComparable</span><<span class="title">Pair</span>> </span>{</div><div class="line"></div><div class="line"> <span class="keyword">private</span> Text first;</div><div class="line"> <span class="keyword">private</span> Text second;</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Pair</span><span class="params">(String first, String second)</span> </span>{</div><div class="line"> set(<span class="keyword">new</span> Text(first), <span class="keyword">new</span> Text(second));</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Pair</span><span class="params">()</span> </span>{</div><div class="line"> set(<span class="keyword">new</span> Text(), <span class="keyword">new</span> Text());</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Pair</span><span class="params">(Text first, String second)</span> </span>{</div><div class="line"> set(first, <span class="keyword">new</span> Text(second));</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Pair</span><span 
class="params">(String first, Text second)</span> </span>{</div><div class="line"> set(<span class="keyword">new</span> Text(first), second);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">set</span><span class="params">(Text first, Text second)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.first = first;</div><div class="line"> <span class="keyword">this</span>.second = second;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> Text <span class="title">getFirst</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> first;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> Text <span class="title">getSecond</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> second;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">write</span><span class="params">(DataOutput out)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> first.write(out);</div><div class="line"> second.write(out);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">readFields</span><span class="params">(DataInput in)</span> <span class="keyword">throws</span> IOException </span>{</div><div class="line"> first.readFields(in);</div><div class="line"> 
second.readFields(in);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">hashCode</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> first.hashCode() * <span class="number">163</span> + second.hashCode();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">boolean</span> <span class="title">equals</span><span class="params">(Object obj)</span> </span>{</div><div class="line"> <span class="keyword">if</span> (obj <span class="keyword">instanceof</span> Pair) {</div><div class="line"> Pair pair = (Pair) obj;</div><div class="line"> <span class="keyword">return</span> first.equals(pair.first) && second.equals(pair.second);</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> <span class="keyword">false</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">toString</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> <span class="keyword">this</span>.first.toString();</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">compareTo</span><span class="params">(Pair pair)</span> </span>{</div><div class="line"> <span class="comment">// int cmp = first.compareTo(pair.first);</span></div><div class="line"> <span 
class="keyword">int</span> cmp = pair.first.compareTo(first);</div><div class="line"> <span class="keyword">if</span> (cmp != <span class="number">0</span>) {</div><div class="line"> <span class="keyword">return</span> cmp;</div><div class="line"> }</div><div class="line"> <span class="comment">// return second.compareTo(pair.second);</span></div><div class="line"> <span class="keyword">return</span> pair.second.compareTo(second);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">compareTo</span><span class="params">(Pair pair, <span class="keyword">int</span> index)</span> </span>{</div><div class="line"> <span class="keyword">if</span> (index == <span class="number">1</span>) {</div><div class="line"><span class="comment">// return this.first.compareTo(pair.first);</span></div><div class="line"> <span class="keyword">return</span> pair.first.compareTo(<span class="keyword">this</span>.first);</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"><span class="comment">// return this.second.compareTo(pair.second);</span></div><div class="line"> <span class="keyword">return</span> pair.second.compareTo(<span class="keyword">this</span>.second);</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">LJoinMapper</span> <span class="keyword">extends</span> <span class="title">Mapper</span><<span class="title">Object</span>, <span class="title">Text</span>, <span class="title">Pair</span>, <span class="title">Text</span>> </span>{</div><div class="line"></div><div class="line"> <span class="keyword">private</span> Text key = <span class="keyword">new</span> Text();</div><div class="line"> <span class="keyword">private</span> Text val = <span class="keyword">new</span> Text();</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(Object line, Text value, Context context)</span> <span class="keyword">throws</span> IOException, InterruptedException </span>{</div><div class="line"> String[] tokens = value.toString().split(<span class="string">","</span>);</div><div class="line"> <span 
class="keyword">if</span> (tokens.length < <span class="number">3</span>) {</div><div class="line"> <span class="keyword">return</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// key</span></div><div class="line"> String k = tokens[<span class="number">0</span>] + <span class="string">","</span> + tokens[<span class="number">1</span>];</div><div class="line"> </div><div class="line"> Pair pair = <span class="keyword">new</span> Pair(k, <span class="string">"0"</span>);</div><div class="line"></div><div class="line"> <span class="comment">// val</span></div><div class="line"><span class="comment">// String v = "left" + "," + tokens[2];</span></div><div class="line"> String v = tokens[<span class="number">2</span>];</div><div class="line"><span class="comment">// key.set(k);</span></div><div class="line"> val.set(v);</div><div class="line"><span class="comment">// context.write(key, val);</span></div><div class="line"> context.write(pair, val);</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">RJoinMapper</span> <span class="keyword">extends</span> <span class="title">Mapper</span><<span class="title">Object</span>, <span class="title">Text</span>, <span class="title">Pair</span>, <span class="title">Text</span>> </span>{</div><div class="line"></div><div class="line"><span class="comment">// private Text key = new Text();</span></div><div class="line"> <span class="keyword">private</span> Text val = <span class="keyword">new</span> Text();</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(Object line, Text value, Context context)</span> <span class="keyword">throws</span> IOException, InterruptedException </span>{</div><div class="line"> String[] tokens = value.toString().split(<span class="string">","</span>);</div><div class="line"> <span class="keyword">if</span> (tokens.length < <span class="number">5</span>) {</div><div 
class="line"> <span class="keyword">return</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// key</span></div><div class="line"> String k = tokens[<span class="number">0</span>] + <span class="string">","</span> + tokens[<span class="number">1</span>];</div><div class="line"> </div><div class="line"> Pair pair = <span class="keyword">new</span> Pair(k, <span class="string">"1"</span>);</div><div class="line"> </div><div class="line"> <span class="comment">// val</span></div><div class="line"><span class="comment">// String v = "right" + "," + tokens[2] + "," + tokens[3] + "," + tokens[4];</span></div><div class="line"> String v = tokens[<span class="number">2</span>] + <span class="string">","</span> + tokens[<span class="number">3</span>] + <span class="string">","</span> + tokens[<span class="number">4</span>];</div><div class="line"><span class="comment">// key.set(k);</span></div><div class="line"> val.set(v);</div><div class="line"><span class="comment">// context.write(key, val);</span></div><div class="line"> context.write(pair, val);</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">JoinReducer</span> <span class="keyword">extends</span> <span class="title">Reducer</span><<span class="title">Pair</span>, <span class="title">Text</span>, <span class="title">NullWritable</span>, <span class="title">Text</span>> </span>{</div><div class="line"></div><div class="line"> <span class="keyword">private</span> Text val = <span class="keyword">new</span> Text();</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Pair key, Iterable<Text> values, Context context)</span> <span class="keyword">throws</span> IOException, InterruptedException </span>{</div><div class="line"></div><div class="line"> <span class="comment">// Assumes one left-table record per join key; the sort comparator makes it arrive first</span></div><div class="line"> String deptName = <span class="keyword">null</span>;</div><div class="line"> <span class="keyword">boolean</span> set = <span class="keyword">false</span>;</div><div class="line"></div><div class="line"> <span class="keyword">for</span> (Text v : values) {</div><div class="line"> <span class="keyword">if</span> (!set) {</div><div class="line"> <span class="comment">// First value of the group: remember it, write nothing yet</span></div><div class="line"> deptName = v.toString();</div><div class="line"> set = <span class="keyword">true</span>;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> <span class="comment">// Write the remembered left value together with each following right value</span></div><div class="line"> val.set(key.toString() + <span class="string">","</span> + deptName + <span class="string">","</span> + v);</div><div class="line"> context.write(NullWritable.get(), val);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">JOINGroup</span> <span class="keyword">extends</span> <span class="title">WritableComparator</span> </span>{</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">JOINGroup</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">super</span>(Pair.class, <span class="keyword">true</span>);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">compare</span><span class="params">(WritableComparable a, WritableComparable b)</span> </span>{</div><div class="line"> Pair keyA = (Pair) a;</div><div class="line"> Pair keyB = (Pair) b;</div><div class="line"><span class="comment">// return keyA.compareTo(keyB, 1);</span></div><div class="line"> <span class="keyword">return</span> keyB.compareTo(keyA, <span class="number">1</span>);</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">JOINPartition</span> <span class="keyword">extends</span> <span class="title">Partitioner</span><<span class="title">Pair</span>, <span class="title">Text</span>> </span>{</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">getPartition</span><span class="params">(Pair key, Text value, <span class="keyword">int</span> numPartitions)</span> </span>{</div><div class="line"> <span class="comment">// Partition on the join key only, so both tags of a key reach the same reducer;</span></div><div class="line"> <span class="comment">// mask the sign bit because hashCode() may be negative</span></div><div class="line"> <span class="keyword">return</span> (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
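<p>The sign bit matters here. String.hashCode() can be negative, and Java's % keeps the sign of the dividend, so hashCode() % numPartitions can return a negative index. Hadoop rejects a negative index at runtime with an "Illegal partition" IOException. Hadoop's own HashPartitioner clears the sign bit with & Integer.MAX_VALUE before taking the remainder. The sketch below makes the difference visible with an Integer key, whose hash code is its own value. The class name is illustrative.</p>

```java
// Illustrative comparison of a raw modulo vs. the sign-masked modulo
// used by Hadoop's HashPartitioner.
class PartitionSignSketch {

    // Raw remainder: negative whenever hashCode() is negative.
    static int rawPartition(Object key, int numPartitions) {
        return key.hashCode() % numPartitions;
    }

    // Clearing the sign bit keeps the result in [0, numPartitions).
    static int maskedPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        Integer key = -7; // Integer.hashCode() returns the int value itself
        System.out.println("raw:    " + rawPartition(key, 4));    // -3
        System.out.println("masked: " + maskedPartition(key, 4)); // 1
    }
}
```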
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">JOINSort</span> <span class="keyword">extends</span> <span class="title">WritableComparator</span> </span>{</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">JOINSort</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">super</span>(Pair.class, <span class="keyword">true</span>);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">compare</span><span class="params">(WritableComparable a, WritableComparable b)</span> </span>{</div><div class="line"> Pair compositeKey1 = (Pair) a;</div><div class="line"> Pair compositeKey2 = (Pair) b;</div><div class="line"> <span class="keyword">return</span> compositeKey2.compareTo(compositeKey1);</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
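<p>The code contains two reversals that cancel out. Pair.compareTo compares the other pair against this one, which gives a descending natural order. JOINSort then calls compositeKey2.compareTo(compositeKey1), which flips it back. The net shuffle order is ascending on first and then on second, so the "0" (left) record precedes the "1" (right) records of the same join key. JOINGroup likewise ends up comparing first in ascending order, consistent with the sort. The plain-Java check below reproduces the two comparisons on string pairs. The names are illustrative.</p>

```java
// Reproduces Pair.compareTo and JOINSort.compare on {first, second} string pairs.
class PairOrderSketch {

    // Mirrors Pair.compareTo: arguments swapped, so the natural order is descending.
    static int pairCompareTo(String[] self, String[] other) {
        int cmp = other[0].compareTo(self[0]);
        if (cmp != 0) {
            return cmp;
        }
        return other[1].compareTo(self[1]);
    }

    // Mirrors JOINSort.compare(a, b): calls b.compareTo(a), undoing the reversal.
    static int joinSortCompare(String[] a, String[] b) {
        return pairCompareTo(b, a);
    }

    public static void main(String[] args) {
        String[] left = {"10,20", "0"};
        String[] right = {"10,20", "1"};
        System.out.println("left sorts first: " + (joinSortCompare(left, right) < 0));
    }
}
```

<p>Making Pair.compareTo ascending and having JOINSort call compositeKey1.compareTo(compositeKey2) would give the same order with less indirection, as long as both are changed together.</p>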
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Driver</span> <span class="keyword">extends</span> <span class="title">Configured</span> <span class="keyword">implements</span> <span class="title">Tool</span> </span>{</div><div class="line"></div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">run</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>{</div><div class="line"></div><div class="line"> <span class="keyword">if</span> (args.length != <span class="number">3</span>) {</div><div class="line"> System.out.printf(<span class="string">"Usage: %s [generic options] <left input dir> <right input dir> <output dir>\n"</span>, getClass().getSimpleName());</div><div class="line"> ToolRunner.printGenericCommandUsage(System.out);</div><div class="line"> <span class="keyword">return</span> -<span class="number">1</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="comment">// Use the Configuration prepared by ToolRunner so generic options such as -D take effect</span></div><div class="line"> Job job = Job.getInstance(getConf(), <span class="string">"join"</span>);</div><div class="line"> job.setJarByClass(getClass());</div><div class="line"></div><div class="line"> MultipleInputs.addInputPath(job, <span class="keyword">new</span> Path(args[<span class="number">0</span>]), TextInputFormat.class, LJoinMapper.class);</div><div class="line"> MultipleInputs.addInputPath(job, <span class="keyword">new</span> Path(args[<span class="number">1</span>]), TextInputFormat.class, RJoinMapper.class);</div><div class="line"> FileOutputFormat.setOutputPath(job, <span class="keyword">new</span> Path(args[<span class="number">2</span>]));</div><div class="line"></div><div class="line"> job.setReducerClass(JoinReducer.class);</div><div class="line"> </div><div class="line"> job.setGroupingComparatorClass(JOINGroup.class);</div><div class="line"> job.setPartitionerClass(JOINPartition.class);</div><div class="line"> job.setSortComparatorClass(JOINSort.class);</div><div class="line"> </div><div class="line"> job.setMapOutputKeyClass(Pair.class);</div><div class="line"> job.setMapOutputValueClass(Text.class);</div><div class="line"></div><div class="line"> job.setOutputKeyClass(NullWritable.class);</div><div class="line"> job.setOutputValueClass(Text.class);</div><div class="line"></div><div class="line"> <span class="keyword">return</span> job.waitForCompletion(<span class="keyword">true</span>) ? <span class="number">0</span> : <span class="number">1</span>;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>{</div><div class="line"> <span class="comment">// Local test paths, used only when no arguments are passed</span></div><div class="line"> <span class="keyword">if</span> (args.length == <span class="number">0</span>) {</div><div class="line"> args = <span class="keyword">new</span> String[] {<span class="string">"in/l"</span>, <span class="string">"in/r"</span>, <span class="string">"ljoinout"</span>};</div><div class="line"> }</div><div class="line"> System.exit(ToolRunner.run(<span class="keyword">new</span> Configuration(), <span class="keyword">new</span> Driver(), args));</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<p>Test data:</p>
<p>l.txt<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">1,job,beijing</div><div class="line">2,jue,shanghai</div><div class="line">3,role,shenzhen</div><div class="line">4,jie,guangzhou</div></pre></td></tr></table></figure></p>
<p>r.txt<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line">1,job,30,man,333330000</div><div class="line">2,jue,90,woman,9384832</div><div class="line">3,role,100,man,9103841038</div><div class="line">4,jie,0,man,103848103474</div><div class="line">1,job,20,man,333330000</div><div class="line">1,job,10,man,333330000</div></pre></td></tr></table></figure></p>
<p>Here is the output:<br><img src="https://github.com/basebase/img_server/blob/master/mapreduce-join/join-01.png?raw=true" alt="join"></p>
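<p>If you don't have a cluster at hand, you can still sketch the result this job is meant to produce in plain Java. This is a minimal sketch, not the LJoinMapper/RJoinMapper/JoinReducer classes the driver uses. It assumes that the first comma-separated field is the join key. It also assumes that each output row is the l.txt record followed by the r.txt record minus its key; the real JoinReducer's output format may differ.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMemoryJoin {

    // Inner join on the first comma-separated field (assumed to be the join key).
    // Each output row is the left record followed by the right record minus its key.
    static List<String> innerJoin(List<String> left, List<String> right) {
        // Index the left side by key; ids in l.txt are unique in the sample data
        Map<String, String> leftByKey = new HashMap<>();
        for (String line : left) {
            leftByKey.put(line.substring(0, line.indexOf(',')), line);
        }

        List<String> joined = new ArrayList<>();
        for (String line : right) {
            int comma = line.indexOf(',');
            String match = leftByKey.get(line.substring(0, comma));
            if (match != null) { // inner join: right rows without a match are dropped
                joined.add(match + line.substring(comma));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> l = Arrays.asList(
                "1,job,beijing", "2,jue,shanghai", "3,role,shenzhen", "4,jie,guangzhou");
        List<String> r = Arrays.asList(
                "1,job,30,man,333330000", "2,jue,90,woman,9384832",
                "3,role,100,man,9103841038", "4,jie,0,man,103848103474",
                "1,job,20,man,333330000", "1,job,10,man,333330000");
        innerJoin(l, r).forEach(System.out::println);
    }
}
```

<p>In the test data, every r.txt row has a matching id in l.txt, so all six r.txt rows appear in the output. The MapReduce job sorts and groups by key before reducing; this sketch does not, so it keeps the r.txt order.</p>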
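<font color="DeepPink" size="2"><br>The program above demonstrates an inner join. If you have questions about the article or better suggestions, or would like to leave a generous tip,<br>please don't hold back. Thanks!<br></font>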
<h3 id="还不错的文章链接"><a href="#还不错的文章链接" class="headerlink" title="还不错的文章链接"></a>Useful articles</h3><p><a href="http://codingjunkie.net/mapreduce-reduce-joins/" target="_blank" rel="external">http://codingjunkie.net/mapreduce-reduce-joins/</a><br><a href="https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch01.html" target="_blank" rel="external">https://www.safaribooksonline.com/library/view/data-algorithms/9781491906170/ch01.html</a><br><a href="https://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/" target="_blank" rel="external">https://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/</a></p>
<p>Written down for the benefit of others, and of myself!</p>
]]></content>
</entry>
<entry>
<title><![CDATA[Configuring Eclipse to run HDFS]]></title>
<url>http://yoursite.com/2017/01/23/eclipse%E9%85%8D%E7%BD%AE%E8%BF%90%E8%A1%8CHDFS/</url>
<content type="html"><![CDATA[<p>Preface<br>Normally, to run HDFS we build the source, configure the Hadoop environment variables, and then go into the sbin directory and run<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">start-dfs.sh</div></pre></td></tr></table></figure></p>
<a id="more"></a>
<h3 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h3><p>Normally, to run HDFS we build the source, configure the Hadoop environment variables, and then go into the sbin directory and run<br><figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">start-dfs.sh</div></pre></td></tr></table></figure></p>
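<p>This script starts HDFS. But to see what happens while the NameNode starts up, wouldn't you have to set up remote debugging? (I have done that before, but never wrote a blog post about it.)</p>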
<p>If you can debug all of this locally, you get a much better understanding of how things are handled internally.</p>
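<h3 id="实战"><a href="#实战" class="headerlink" title="实战"></a>Hands-on</h3><p>What do you need?<br>The Hadoop source code (I used 2.7.0; use whichever version you like)<br>An IDE such as IntelliJ IDEA or Eclipse to import the source and make it easy to read.</p>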
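<p>Assuming you have all of that, let's get started. From the root of the Hadoop source tree, build the project and generate the Eclipse project files:</p>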
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">mvn install -DskipTests</div><div class="line">mvn eclipse:eclipse -DdownloadSources=<span class="literal">true</span> -DdownloadJavadocs=<span class="literal">true</span></div></pre></td></tr></table></figure>
<p>After importing the projects, find the NameNode and DataNode classes in the hdfs project and run them.<br><img src="https://github.com/basebase/img_server/blob/master/eclipse%E9%85%8D%E7%BD%AE%E8%BF%90%E8%A1%8CHDFS/01.png?raw=true" alt="项目图"><br><img src="https://github.com/basebase/img_server/blob/master/eclipse%E9%85%8D%E7%BD%AE%E8%BF%90%E8%A1%8CHDFS/02.png?raw=true" alt="NN"><br><img src="https://github.com/basebase/img_server/blob/master/eclipse%E9%85%8D%E7%BD%AE%E8%BF%90%E8%A1%8CHDFS/03.png?raw=true" alt="DT"></p>
<p>Think that's all there is to it?</p>
<p>Next, let me go through the problems I ran into while deploying and starting things up.</p>
<p>1. webapps/hdfs not found in CLASSPATH<br>This exception is thrown when starting the NameNode. Here is the code that throws it (HttpServer2, in the http package of the hadoop-common project):</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">protected</span> String <span class="title">getWebAppsPath</span><span class="params">(String appName)</span> <span class="keyword">throws</span> FileNotFoundException </span>{</div><div class="line"> URL url = getClass().getClassLoader().getResource(<span class="string">"webapps/"</span> + appName);</div><div class="line"> <span class="keyword">if</span> (url == <span class="keyword">null</span>)</div><div class="line"> <span class="keyword">throw</span> <span class="keyword">new</span> FileNotFoundException(<span class="string">"webapps/"</span> + appName</div><div class="line"> + <span class="string">" not found in CLASSPATH"</span>);</div><div class="line"> String urlString = url.toString();</div><div class="line"> <span class="keyword">return</span> urlString.substring(<span class="number">0</span>, urlString.lastIndexOf(<span class="string">'/'</span>));</div><div class="line"> }</div></pre></td></tr></table></figure>
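<p>While debugging, I found that it kept looking for the hdfs directory under webapps in my hadoop-common jar, which is why I kept getting the not-found exception. I eventually got it running by overriding this class to point at the hdfs project's path. (If you know another solution, please describe it in the comments below; many thanks!) A similar exception, about the datanode directory not being found, occurs when running DataNode,<br>but I have forgotten exactly where it is thrown. 😝😝😝😝</p>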
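<p>To see why the lookup fails, you can reproduce the same logic outside Hadoop. The sketch below copies the body of getWebAppsPath, with one change: the class loader is passed in as a parameter, which is an assumption made for testability. A temporary directory stands in for one classpath entry. The lookup succeeds only when some classpath entry contains webapps/ followed by the app name. That suggests, untested here, an alternative to overriding the class: put the directory from the hdfs project that contains webapps/hdfs on the run configuration's classpath.</p>

```java
import java.io.FileNotFoundException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class WebAppsLookup {

    // Same steps as HttpServer2.getWebAppsPath, but the class loader is passed in
    static String getWebAppsPath(ClassLoader loader, String appName) throws FileNotFoundException {
        URL url = loader.getResource("webapps/" + appName);
        if (url == null) {
            throw new FileNotFoundException("webapps/" + appName + " not found in CLASSPATH");
        }
        String urlString = url.toString();
        // Drop the last path segment: ".../webapps/hdfs" -> ".../webapps"
        return urlString.substring(0, urlString.lastIndexOf('/'));
    }

    public static void main(String[] args) throws Exception {
        // A temporary directory plays the role of one classpath entry
        Path root = Files.createTempDirectory("classpath-entry");
        Files.createDirectories(root.resolve("webapps").resolve("hdfs"));

        // Parent loader null: only the bootstrap loader and our directory are searched
        try (URLClassLoader loader = new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
            System.out.println(getWebAppsPath(loader, "hdfs"));
            try {
                getWebAppsPath(loader, "datanode");
            } catch (FileNotFoundException e) {
                System.out.println(e.getMessage()); // webapps/datanode not found in CLASSPATH
            }
        }
    }
}
```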
<p>2. Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.</p>
<p>For this exception, I first searched Stack Overflow and found the following answer:<br><figure class="highlight xml"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"><span class="tag"><<span class="name">configuration</span>></span></div><div class="line"> <span class="tag"><<span class="name">property</span>></span></div><div class="line"> <span class="tag"><<span class="name">name</span>></span>fs.defaultFS<span class="tag"></<span class="name">name</span>></span></div><div class="line"> <span class="tag"><<span class="name">value</span>></span>hdfs://10.100.20.168/<span class="tag"></<span class="name">value</span>></span></div><div class="line"> <span class="tag"></<span class="name">property</span>></span></div><div class="line"><span class="tag"></<span class="name">configuration</span>></span></div></pre></td></tr></table></figure></p>
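<p>But it didn't work for me. I later found out why: the core-default.xml configuration I was using was not taking effect.<br>Yet at the very beginning, I had run with the default configuration without any problem... (a strange issue)</p>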
<p>Here is the code that throws the exception:</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">static</span> InetSocketAddress <span class="title">getAddress</span><span class="params">(URI filesystemURI)</span> </span>{</div><div class="line"> String authority = filesystemURI.getAuthority();</div><div class="line"> <span class="keyword">if</span> (authority == <span class="keyword">null</span>) {</div><div class="line"> <span class="keyword">throw</span> <span class="keyword">new</span> IllegalArgumentException(String.format(</div><div class="line"> <span class="string">"Invalid URI for NameNode address (check %s): %s has no authority."</span>,</div><div class="line"> FileSystem.FS_DEFAULT_NAME_KEY, filesystemURI.toString()));</div><div class="line"> }</div><div class="line"> <span class="keyword">if</span> (!HdfsConstants.HDFS_URI_SCHEME.equalsIgnoreCase(</div><div class="line"> filesystemURI.getScheme())) {</div><div class="line"> <span class="keyword">throw</span> <span class="keyword">new</span> IllegalArgumentException(String.format(</div><div class="line"> <span class="string">"Invalid URI for NameNode address (check %s): %s is not of scheme '%s'."</span>,</div><div class="line"> FileSystem.FS_DEFAULT_NAME_KEY, filesystemURI.toString(),</div><div class="line"> HdfsConstants.HDFS_URI_SCHEME));</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> getAddress(authority);</div><div class="line"> 
}</div></pre></td></tr></table></figure>
<p>As for why this happens: when fs.defaultFS is not set, FileSystem falls back to the default value file:///, a URI with no authority. The FileSystem class contains the following:<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String FS_DEFAULT_NAME_KEY = </div><div class="line"> CommonConfigurationKeys.FS_DEFAULT_NAME_KEY;</div><div class="line"> <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">final</span> String DEFAULT_FS = </div><div class="line"> CommonConfigurationKeys.FS_DEFAULT_NAME_DEFAULT;</div><div class="line"></div><div class="line"><span class="comment">/*</span></div><div class="line"> // CommonConfigura.java</div><div class="line"> public static final String FS_DEFAULT_NAME_KEY = "fs.defaultFS";</div><div class="line"> public static final String FS_DEFAULT_NAME_DEFAULT = "file:///";</div><div class="line"></div><div class="line">*/</div><div class="line"></div><div class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">static</span> URI <span class="title">getDefaultUri</span><span class="params">(Configuration conf)</span> </span>{</div><div class="line"> <span class="comment">// This just creates a URI object; you can write a small standalone test to inspect the object it creates.</span></div><div class="line"> <span class="comment">// URI uri = new URI("file:///");</span></div><div class="line"> <span class="keyword">return</span> URI.create(fixName(conf.get(FS_DEFAULT_NAME_KEY, DEFAULT_FS)));</div><div class="line"> 
}</div></pre></td></tr></table></figure></p>
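<p>Following the suggestion in the getDefaultUri comment above, here is a small standalone test that needs nothing from Hadoop. It parses the fallback value file:/// and the Stack Overflow value with java.net.URI. It then repeats the two checks from getAddress in simplified form. nameNodeAuthority is a hypothetical helper, and its messages are shortened versions of the real ones.</p>

```java
import java.net.URI;

public class DefaultFsUriDemo {

    // Simplified copy of the two checks in getAddress(URI) shown earlier
    static String nameNodeAuthority(String fsDefault) {
        URI uri = URI.create(fsDefault);
        String authority = uri.getAuthority();
        if (authority == null) {
            throw new IllegalArgumentException(uri + " has no authority.");
        }
        if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException(uri + " is not of scheme 'hdfs'.");
        }
        return authority;
    }

    public static void main(String[] args) {
        // The fallback used when fs.defaultFS is not configured
        URI fallback = URI.create("file:///");
        System.out.println(fallback.getScheme() + " / " + fallback.getAuthority()); // file / null

        System.out.println(nameNodeAuthority("hdfs://10.100.20.168/")); // 10.100.20.168
        try {
            nameNodeAuthority("file:///");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // file:/// has no authority.
        }
    }
}
```

<p>java.net.URI reports a null authority for file:///, which is exactly the value getAddress rejects. For hdfs://10.100.20.168/, the authority is the host.</p>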
<p>That covers all the problems I ran into.</p>
<p>Finally, let me show how I set breakpoints while running hadoop fs -put localPath hdfsPath.<br><img src="https://github.com/basebase/img_server/blob/master/eclipse%E9%85%8D%E7%BD%AE%E8%BF%90%E8%A1%8CHDFS/04.png?raw=true" alt="put"><br><img src="https://github.com/basebase/img_server/blob/master/eclipse%E9%85%8D%E7%BD%AE%E8%BF%90%E8%A1%8CHDFS/05.png?raw=true" alt="put-d"></p>
<h3 id="链接"><a href="#链接" class="headerlink" title="链接"></a>Links</h3><p><a href="https://wiki.apache.org/hadoop/EclipseEnvironment" target="_blank" rel="external">https://wiki.apache.org/hadoop/EclipseEnvironment</a></p>
]]></content>
</entry>
<entry>
<title><![CDATA[Java spin lock]]></title>
<url>http://yoursite.com/2016/11/14/java%E8%87%AA%E6%97%8B%E9%94%81/</url>
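<content type="html"><![CDATA[<p>This article does not compare spin locks with mutexes or other locks; it simply introduces a spin lock. To learn more, follow the reference link.<br><a id="more"></a></p>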
<h3 id="一个自旋锁例子"><a href="#一个自旋锁例子" class="headerlink" title="一个自旋锁例子"></a>A spin lock example</h3><p>I won't go over how spin locks work here; the reference below already explains it very well!</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> java.util.concurrent.atomic.AtomicReference;</div><div class="line"></div><div class="line"><span class="comment">/***</span></div><div class="line"> * Reentrant spin lock</div><div class="line"> * <span class="doctag">@author</span> Joker</div><div class="line"> *</div><div class="line"> */</div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">SpinLock</span> </span>{</div><div class="line"> </div><div class="line"> <span class="keyword">private</span> <span class="keyword">final</span> AtomicReference<Thread> owner = <span class="keyword">new</span> AtomicReference<Thread>();</div><div class="line"> <span class="keyword">private</span> <span class="keyword">int</span> count ;</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">lock</span><span class="params">()</span> </span>{</div><div class="line"> Thread currentThread = Thread.currentThread();</div><div class="line"> System.out.println(<span class="string">"lock() -> "</span> + currentThread.getName());</div><div class="line"> <span class="keyword">if</span> (currentThread == owner.get()) {</div><div class="line"> count++; <span class="comment">// re-entrant acquisition: count how many extra times the owner took the lock</span></div><div class="line"> <span class="keyword">return</span> ;</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="comment">// As more threads contend, this while loop wastes CPU time slices, and compareAndSet hits the same memory location over and over</span></div><div class="line"> <span class="keyword">while</span> (!owner.compareAndSet(<span class="keyword">null</span>, currentThread)) {</div><div class="line"> </div><div class="line"> }</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">unLock</span><span class="params">()</span> </span>{</div><div class="line"> Thread currentThread = Thread.currentThread();</div><div class="line"> System.out.println(<span class="string">"unLock() -> "</span> + currentThread.getName());</div><div class="line"> <span class="keyword">if</span> (currentThread == owner.get()) {</div><div class="line"> <span class="keyword">if</span> (count > <span class="number">0</span>) {</div><div class="line"> count--;</div><div class="line"> } <span class="keyword">else</span> {</div><div class="line"> owner.compareAndSet(currentThread, <span class="keyword">null</span>);</div><div class="line"> }</div><div class="line"> }</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">SpinLockTest</span> <span class="keyword">implements</span> <span class="title">Runnable</span></span>{</div><div class="line"></div><div class="line"> <span class="keyword">static</span> <span class="keyword">int</span> sum ;</div><div class="line"> <span class="keyword">private</span> SpinLock lock;</div><div class="line"> </div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">SpinLockTest</span><span class="params">(SpinLock lock)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.lock = lock;</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">run</span><span class="params">()</span> </span>{</div><div class="line"> <span class="keyword">this</span>.lock.lock();</div><div class="line"> <span class="keyword">this</span>.lock.lock();</div><div class="line"> System.out.println(<span class="string">"Current thread "</span> + Thread.currentThread().getName() + <span class="string">" start..."</span>);</div><div class="line"> sum++;</div><div class="line"> System.out.println(<span class="string">"Current thread "</span> + Thread.currentThread().getName() + <span class="string">" end..."</span>);</div><div class="line"> </div><div class="line"> <span class="keyword">this</span>.lock.unLock();</div><div class="line"> <span class="keyword">this</span>.lock.unLock();</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> InterruptedException </span>{</div><div class="line"> SpinLock lock = <span class="keyword">new</span> SpinLock();</div><div class="line"> Thread[] threads = <span class="keyword">new</span> Thread[<span class="number">4</span>];</div><div class="line"> <span class="keyword">for</span> (<span class="keyword">int</span> i = <span class="number">1</span>; i < <span class="number">5</span>; i++) {</div><div class="line"> SpinLockTest lockTest = <span class="keyword">new</span> SpinLockTest(lock);</div><div class="line"> threads[i - <span class="number">1</span>] = <span class="keyword">new</span> Thread(lockTest, <span class="string">"thread-lock-"</span>+i);</div><div class="line"> threads[i - <span class="number">1</span>].start();</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="comment">// Wait for every worker instead of sleeping a fixed time; join() also guarantees sum is visible here</span></div><div class="line"> <span class="keyword">for</span> (Thread t : threads) {</div><div class="line"> t.join();</div><div class="line"> }</div><div class="line"> System.out.println(sum);</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
<p>Finally, here is a diagram I drew of how it runs:</p>
<p><img src="https://github.com/basebase/img_server/blob/master/java%E8%87%AA%E6%97%8B%E9%94%81/java%E8%87%AA%E6%97%8B%E9%94%81.png?raw=true" alt="java自旋锁"></p>
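<p>The comment in lock() notes that the empty while loop burns CPU. Below is a minimal sketch of the same reentrant design, with two changes that are my own and not part of the original. First, the loop calls Thread.onSpinWait() (available since Java 9) to tell the runtime the thread is busy-waiting. Second, unlock() throws if the caller does not hold the lock, instead of silently ignoring the call. The stress method is a hypothetical helper: it increments a shared counter under the lock, using the same nested lock/unlock pattern as SpinLockTest, and waits with join() rather than a fixed sleep.</p>

```java
import java.util.concurrent.atomic.AtomicReference;

public class SpinLockSketch {

    private final AtomicReference<Thread> owner = new AtomicReference<>();
    private int holdCount; // extra re-entrant acquisitions; only touched by the owner

    public void lock() {
        Thread current = Thread.currentThread();
        if (owner.get() == current) {
            holdCount++; // already ours: just count the nested acquisition
            return;
        }
        while (!owner.compareAndSet(null, current)) {
            Thread.onSpinWait(); // Java 9+: hint that this thread is busy-waiting
        }
    }

    public void unlock() {
        if (owner.get() != Thread.currentThread()) {
            throw new IllegalMonitorStateException("current thread does not hold the lock");
        }
        if (holdCount > 0) {
            holdCount--;
        } else {
            owner.set(null); // volatile write publishes our changes to the next owner
        }
    }

    static int counter; // guarded by the lock

    // Hypothetical helper: `threads` workers each increment counter `perThread` times
    static int stress(int threads, int perThread) throws InterruptedException {
        SpinLockSketch lock = new SpinLockSketch();
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    lock.lock(); // nested acquisition, as in SpinLockTest
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                        lock.unlock();
                    }
                }
            }, "spin-" + i);
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join(); // join() makes the workers' writes visible here
        }
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(stress(4, 1000));
    }
}
```

<p>With correct mutual exclusion, stress(4, 1000) returns 4000 every time. The spin-wait hint makes each loop iteration cheaper, but the thread still spins. A spin lock only pays off when the lock is held for very short periods.</p>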
<h3 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h3><p><a href="http://www.cnblogs.com/cposture/p/SpinLock.html#_label0" target="_blank" rel="external">http://www.cnblogs.com/cposture/p/SpinLock.html#_label0</a></p>
]]></content>
</entry>
<entry>
<title><![CDATA[Java finalize method]]></title>
<url>http://yoursite.com/2016/08/27/java-finalize%E6%96%B9%E6%B3%95/</url>
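<content type="html"><![CDATA[<p>Why this article?<br>I think every Java developer has heard of the finalize method. But is finalize guaranteed to run? If so, when does it run? And what serious consequences can overriding it have? </p>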
<a id="more"></a>
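<h3 id="为什么写这篇文章"><a href="#为什么写这篇文章" class="headerlink" title="为什么写这篇文章?"></a>Why this article?</h3><p>I think every Java developer has heard of the finalize method. But is finalize guaranteed to run? If so, when does it run? And what serious consequences can overriding it have? </p>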
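<p>An aside: I used to read up on Java GC mostly to get through interviews. Recently, though, a colleague asked a question (I'll let you guess what it was). He only said that the object would be collected, without explaining the details, and that got me digging into GC again.<br>I'll probably discuss that question in a later post. I'm still a beginner myself, so please point out anything in this article that falls short. Thanks!</p>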
<h3 id="Java-finalize方法"><a href="#Java-finalize方法" class="headerlink" title="Java finalize方法"></a>The Java finalize method</h3><p>finalize is a method of the Object class, and any class can override it to implement whatever behavior it needs.<br>The default finalize does nothing:</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">finalize</span><span class="params">()</span> <span class="keyword">throws</span> Throwable </span>{ }</div></pre></td></tr></table></figure>
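<p>Overriding finalize yourself is generally discouraged, though, because there is no guarantee that finalize will actually run when an object is cleaned up. (Object.finalize has also been deprecated since Java 9.)</p>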
<p>Now suppose we have a very simple little program that just prints one line and then exits.<br>Will its objects be collected?</p>
<p>We don't know when GC will run. Different collectors schedule it differently: some are driven by time, others by memory usage.</p>
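<p>If it were up to me, I would lean toward the latter. No matter how long the program has been running, I would not collect until memory usage reaches a threshold I specify. The idea comes from Hadoop's spill: with 100 MB of memory, start GC as soon as usage reaches an 80 MB limit. (Just my own idea, haha~)</p>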
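<p>The threshold rule in that idea is simple arithmetic. Hadoop's map-side sort buffer uses exactly these defaults: mapreduce.task.io.sort.mb = 100 and mapreduce.map.sort.spill.percent = 0.80, so spilling starts at 80 MB. The sketch below expresses the rule with a hypothetical shouldTrigger helper and, for comparison, prints the current heap usage reported by Runtime. It only illustrates the idea; the JVM's collectors use their own, collector-specific triggers.</p>

```java
public class ThresholdTrigger {

    // Hypothetical helper: true once `used` reaches `percent`% of `capacity`.
    // Integer arithmetic avoids floating-point rounding at the boundary.
    static boolean shouldTrigger(long used, long capacity, int percent) {
        return used * 100 >= capacity * percent;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // The 100 MB / 80% numbers from the text
        System.out.println(shouldTrigger(79 * mb, 100 * mb, 80)); // false
        System.out.println(shouldTrigger(80 * mb, 100 * mb, 80)); // true

        // For comparison: this JVM's current heap usage (varies from run to run)
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap used: " + used / mb + " MB, max: " + rt.maxMemory() / mb + " MB");
    }
}
```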
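<p>Either way, whether GC is triggered by time or by memory usage, our program is a single print statement. It will probably exit before any GC thread runs to clean up!</p>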
<p>And if GC never runs, finalize is never reached.</p>
<p>So is there a way to make GC run? There is, but it does not guarantee that a collection will actually happen; it only urges the JVM to run one.</p>
<p>That is the familiar call<br><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">System.gc();</div></pre></td></tr></table></figure></p>
<p>There are other ways as well; the reference links at the end cover them.<br>So which objects get their finalize method run, and how many times can finalize run?</p>
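<h3 id="对象销毁过程"><a href="#对象销毁过程" class="headerlink" title="对象销毁过程"></a>The object destruction process</h3><p>While an object is being destroyed, it is in one of the following states, depending on whether its finalize has run. The system records each object's state.</p>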
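<p>1. unfinalized: finalize has not run, and the system is not preparing to run it either.<br>2. finalizable: finalize may now run; the system will run it at some later time.<br>3. finalized: the object's finalize has already run. </p>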
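<p>How does GC keep track of finalizable objects? It keeps a queue called the F-Queue. Every object that becomes finalizable is added to this queue and waits there for its finalize method to be run (in HotSpot, by a dedicated Finalizer thread).</p>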
<p>This brings in a second way of classifying objects; the system can check which category any object belongs to.</p>
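<p>a. reachable: objects reachable through the chain of live references, starting from roots such as the local variables on every thread's current stack, all static variables, and so on. </p>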
<p>b. finalizer-reachable: objects that are not reachable, but can be reached through references from the F-Queue.<br>c. unreachable: all other objects.</p>
<p><img src="https://github.com/basebase/img_server/blob/master/java-finalize%E6%96%B9%E6%B3%95/gc.gif?raw=true" alt="转换过程"></p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div></pre></td><td class="code"><pre><div class="line">1. Every object starts its road to death from Reachable+Unfinalized.</div><div class="line"></div><div class="line">2. When live objects can no longer reach it, an object can move from Reachable to F-Reachable or Unreachable.</div><div class="line"></div><div class="line">3. When an object is non-Reachable+Unfinalized, GC moves it into the F-Queue,</div><div class="line"> and its state becomes F-Reachable+Finalizable.</div><div class="line"></div><div class="line">4. Now the key part: at any time, GC can take a Finalizable object from the F-Queue,</div><div class="line"> mark it Finalized, and run its finalize method. Because the object is reachable again from that thread,</div><div class="line"> it becomes Reachable (and Finalized). While finalize runs, it may also make other F-Reachable objects Reachable again; this is called object resurrection.</div><div class="line"></div><div class="line">5. When an object is Unreachable+Unfinalized and uses the default Object.finalize,</div><div class="line"> or overrides it with an implementation that does nothing, GC can, for performance, move it straight to Reclaimed and destroy it without adding it to the F-Queue for further processing.</div><div class="line"></div><div class="line">6. As the state diagram shows, however things play out, an object's finalize runs at most once. Once an object becomes Finalized,</div><div class="line"> it never goes back to the F-Queue, so finalize never gets another chance to run. </div><div class="line"></div><div class="line">7. When an object is Unreachable+Finalized, its real death is near: GC can safely reclaim its</div><div class="line"> memory, and it enters Reclaimed.</div></pre></td></tr></table></figure>
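<p>You can condense these rules into a toy state machine. This is a simplified model written for illustration, not JVM code, and all of its names are hypothetical. The "collector" is told explicitly when an object becomes unreachable, and reachability is reduced to a single resurrectInFinalize flag. The model shows rule 6 directly: an object that resurrects itself is finalized once. The next time it becomes unreachable, it is reclaimed without a second finalize call.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FinalizeStateModel {

    enum Fin { UNFINALIZED, FINALIZABLE, FINALIZED }

    static final class Obj {
        final String name;
        final boolean overridesFinalize; // false = default, empty Object.finalize
        boolean resurrectInFinalize;     // does finalize() store `this` somewhere reachable?
        Fin fin = Fin.UNFINALIZED;
        boolean reclaimed;
        int finalizeCalls;

        Obj(String name, boolean overridesFinalize) {
            this.name = name;
            this.overridesFinalize = overridesFinalize;
        }
    }

    private final Deque<Obj> fQueue = new ArrayDeque<>();

    // The collector has found `o` unreachable from the GC roots.
    void onUnreachable(Obj o) {
        if (o.fin == Fin.UNFINALIZED && o.overridesFinalize) {
            o.fin = Fin.FINALIZABLE; // rule 3: F-Reachable + Finalizable, queued
            fQueue.add(o);
        } else if (o.fin != Fin.FINALIZABLE) {
            o.reclaimed = true;      // rule 5 (trivial finalize) or rule 7 (already Finalized)
        }
    }

    // Drain the F-Queue, running each pending finalize exactly once.
    void runFinalizers() {
        Obj o;
        while ((o = fQueue.poll()) != null) {
            o.fin = Fin.FINALIZED;   // rule 4: marked Finalized, then finalize runs
            o.finalizeCalls++;
            if (!o.resurrectInFinalize) {
                onUnreachable(o);    // nobody saved it: Unreachable + Finalized, so reclaim
            }
        }
    }

    public static void main(String[] args) {
        FinalizeStateModel gc = new FinalizeStateModel();

        Obj t1 = new Obj("t1", true);
        t1.resurrectInFinalize = true; // like Test1 below: Test3.t1 = this
        gc.onUnreachable(t1);
        gc.runFinalizers();
        System.out.println(t1.name + " reclaimed=" + t1.reclaimed + " calls=" + t1.finalizeCalls);

        gc.onUnreachable(t1);          // the reference is dropped again
        gc.runFinalizers();
        System.out.println(t1.name + " reclaimed=" + t1.reclaimed + " calls=" + t1.finalizeCalls);

        Obj plain = new Obj("plain", false);
        gc.onUnreachable(plain);
        System.out.println(plain.name + " reclaimed=" + plain.reclaimed + " calls=" + plain.finalizeCalls);
    }
}
```

<p>The output reads as follows:</p>
<ul>
<li>The first line mirrors Test1 in the practice section: finalized once and resurrected.</li>
<li>The second line shows t1 reclaimed without another finalize call.</li>
<li>The third line shows rule 5: an object with a trivial finalize is reclaimed without ever entering the F-Queue.</li>
</ul>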
<h3 id="实践"><a href="#实践" class="headerlink" title="实践"></a>Practice</h3><figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Test1</span> </span>{</div><div class="line"> Test2 t2 ;</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Test1</span><span class="params">(Test2 t2)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.t2 = t2;</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">finalize</span><span class="params">()</span> <span class="keyword">throws</span> Throwable </span>{</div><div class="line"> System.out.println(<span class="string">"Test1 finalize..."</span>);</div><div class="line"> Test3.t1 = <span class="keyword">this</span>;</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Test2</span> </span>{</div><div class="line"> String name;</div><div class="line"> <span class="keyword">int</span> age;</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Test2</span><span class="params">(String name, <span class="keyword">int</span> age)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.name = name;</div><div class="line"> <span class="keyword">this</span>.age = age;</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">finalize</span><span class="params">()</span> <span class="keyword">throws</span> Throwable </span>{</div><div class="line"> System.out.println(<span class="string">"Test2 finalize..."</span>);</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">public</span> String <span class="title">toString</span><span 
class="params">()</span> </span>{</div><div class="line"> <span class="keyword">return</span> <span class="keyword">this</span>.name + <span class="string">"is "</span> + age;</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Test3</span> </span>{</div><div class="line"> </div><div class="line"> <span class="keyword">static</span> Test1 t1;</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">Main</span> </span>{</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> InterruptedException </span>{</div><div class="line"> </div><div class="line"> Test1 t1 = <span class="keyword">new</span> Test1(<span class="keyword">new</span> Test2(<span class="string">"joker"</span>, <span class="number">18</span>));</div><div class="line"> System.out.println(t1);</div><div class="line"> t1 = <span class="keyword">null</span>;</div><div class="line"> </div><div class="line"> System.gc();</div><div class="line"> Thread.sleep(<span class="number">10000</span>);</div><div class="line"> System.out.println(Test3.t1);</div><div class="line"> System.out.println(Test3.t1.t2);</div><div class="line"> </div><div class="line"> Test3.t1 = <span class="keyword">null</span>; <span class="comment">// drop the resurrected reference (the local t1 is already null)</span></div><div class="line"> System.gc();</div><div class="line"> Thread.sleep(<span class="number">10000</span>); <span class="comment">// give a second finalize the chance to run (it will not)</span></div><div class="line"> System.out.println(<span class="string">"done."</span>);</div><div class="line"> </div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
<p>The output is as follows:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div></pre></td><td class="code"><pre><div class="line">cn.base.gc.test.Test1@6b04d3c8</div><div class="line">[GC 2642K->437K(251392K), 0.0012310 secs]</div><div class="line">[Full GC 437K->350K(251392K), 0.0102620 secs]</div><div class="line">Test1 finalize...</div><div class="line">Test2 finalize...</div><div class="line">cn.base.gc.test.Test1@6b04d3c8</div><div class="line">jokeris 18</div><div class="line">[GC 2992K->446K(251392K), 0.0006800 secs]</div><div class="line">[Full GC 446K->350K(251392K), 0.0069980 secs]</div><div class="line">done.</div></pre></td></tr></table></figure>
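<p>As you can see, when we released t1, its member object Test2 went through finalization together with the Test1 object (both finalize() methods ran). Because Test1 overrides finalize()<br>and stores itself back into the static field Test3.t1, the Test1 object was resurrected in the end.</p>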
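<p>The object comes back to life because a reference chain from a GC Root (the static field Test3.t1) reaches it again. However, when we set that reference to null once more and run GC, the Test1 object<br>does not enter finalize() again: the JVM invokes finalize() at most once per object.</p>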
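<p>The following minimal, self-contained sketch isolates the at-most-once rule. FinalizeOnceDemo is a hypothetical class, not the Test1 class above. Its finalize() counts its own invocations and resurrects the object through a static field. System.gc() is only a hint, so the timing is not guaranteed. On a typical JVM, the second printed line shows that the object was not resurrected again and that the counter is still 1. Object.finalize() has been deprecated since Java 9, so treat this purely as an illustration.</p>

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: counts finalize() calls on one object that
// resurrects itself the first time it is finalized.
class FinalizeOnceDemo {

    static volatile FinalizeOnceDemo saved;                       // static field = GC root
    static final AtomicInteger finalizeCalls = new AtomicInteger();

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        finalizeCalls.incrementAndGet();
        saved = this;                                             // resurrect the object
    }

    // System.gc() is only a request, so ask a few times and give the
    // finalizer thread a chance to run; nothing here is guaranteed.
    static void requestGc() {
        for (int i = 0; i < 3; i++) {
            System.gc();
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        new FinalizeOnceDemo();                                   // unreachable right away
        requestGc();
        System.out.println("1st: resurrected=" + (saved != null)
                + ", finalize calls=" + finalizeCalls.get());

        saved = null;                                             // unreachable again
        requestGc();
        System.out.println("2nd: resurrected=" + (saved != null)
                + ", finalize calls=" + finalizeCalls.get());
    }
}
```

Whatever the GC timing, the counter never exceeds 1. Once an object has been finalized, the JVM reclaims it without calling finalize() again.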
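<h3 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h3><p>finalize() is not guaranteed to run every time, and it never runs more than once per object. Calling System.gc()<br>only asks the JVM to collect sooner; it does not force a collection. When you override finalize(), it is best not to resurrect the object, because that easily makes the<br>object's lifecycle confusing!</p>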
<h3 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h3><p><a href="http://mazhuang.org/2015/12/15/java-object-finalize/" target="_blank" rel="external">http://mazhuang.org/2015/12/15/java-object-finalize/</a><br><a href="http://bijian1013.iteye.com/blog/2289661" target="_blank" rel="external">http://bijian1013.iteye.com/blog/2289661</a></p>
]]></content>
</entry>
<entry>
<title><![CDATA[Computing UV with MapReduce]]></title>
<url>http://yoursite.com/2016/08/23/mapreduce%E8%AE%A1%E7%AE%97uv/</url>
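<content type="html"><![CDATA[<p>Why write this post?<br>In analytics, PV (page views) and UV (unique visitors) are arguably the most basic and most common metrics; anyone who works with data knows them. For this kind of requirement<br>we usually just compute the numbers in Hive and are done. It is very simple.<br>The goal: for each URL, count the number of visits and the number of unique visitors.</p>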
<a id="more"></a>
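<h3 id="为什么写这篇文件"><a href="#为什么写这篇文件" class="headerlink" title="为什么写这篇文件?"></a>Why write this post?</h3><p>In analytics, PV (page views) and UV (unique visitors) are arguably the most basic and most common metrics; anyone who works with data knows them. For this kind of requirement<br>we usually just compute the numbers in Hive and are done. It is very simple.<br>The goal: for each URL, count the number of visits and the number of unique visitors.</p>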
<figure class="highlight sql"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">select</span> url, <span class="keyword">count</span>(gi) <span class="keyword">as</span> pv, <span class="keyword">count</span>(<span class="keyword">distinct</span> gi) <span class="keyword">as</span> uv</div><div class="line"><span class="keyword">from</span> browse_log <span class="comment">-- placeholder table name; gi identifies the visitor</span></div><div class="line"><span class="keyword">where</span> cdate = <span class="string">'2016-06-01'</span></div><div class="line"><span class="keyword">group</span> <span class="keyword">by</span> url;</div></pre></td></tr></table></figure>
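<p>So how do we compute this with MapReduce?<br>I suspect most people do it in the reducer: keep a Set or List of user IDs, check whether each ID is already in the collection,<br>and add 1 if it is not.<br>That is indeed the case. Searching around, I found many blogs using this method with nearly identical content, and the first MapReduce job I wrote<br>worked the same way.</p>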
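<p>However, while this approach shows no problems on small data, it breaks down as soon as the data gets very large,<br>because every ID for a key is held in memory, which easily causes an OOM (OutOfMemoryError).</p>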
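<p>To make the memory problem concrete, here is a rough plain-Java sketch of that Set-based reducer logic. It has no Hadoop dependency. It assumes the map step emits the URL as key and the uid as value, and it uses hypothetical names (NaiveUvReducer, uvForUrl). The Iterable stands in for the values a reducer receives for one URL. The HashSet must hold every distinct uid of a URL at once, so its size grows with the visitor count of the busiest page.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch of the "Set in the reducer" approach (no Hadoop needed).
// NaiveUvReducer/uvForUrl are hypothetical names; the Iterable plays the role
// of the values a reducer receives for one URL key.
class NaiveUvReducer {

    static long uvForUrl(Iterable<String> uidsForUrl) {
        // Every distinct uid for this URL stays in memory until the loop ends,
        // so memory use grows with the number of unique visitors of the page.
        Set<String> seen = new HashSet<>();
        long uv = 0;
        for (String uid : uidsForUrl) {
            if (seen.add(uid)) { // add() returns false when uid is already present
                uv++;
            }
        }
        return uv;
    }

    public static void main(String[] args) {
        System.out.println(uvForUrl(Arrays.asList("zhangsan", "zhangsan", "lisi", "zhangsan"))); // prints 2
    }
}
```

A real Hadoop reducer has one more pitfall. Hadoop reuses the same Text instance while iterating over the values, so each value would have to be copied (for example with toString()) before it is stored in the set.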
<p>Instead, we compute it with two MapReduce jobs.<br>The first map splits each line and emits url+uid as the key, with 1 as the value.<br>The data looks like this:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">http://www.google.com,zhangsan 1</div><div class="line">http://www.google.com,zhangsan 1</div><div class="line">http://www.google.com,zhangsan 1</div></pre></td></tr></table></figure>
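<p>Identical keys are sent to the same reduce call, so all of zhangsan's records for that URL collapse into a single group. The reducer does nothing except<br>write the key once.</p>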
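<p>In the second map, we split the first job's output back into url and uid.<br>The first job has already merged identical (url, uid) pairs, so no duplicates remain. From here on<br>it is just a word count over the URLs.</p>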
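<p>You can simulate the two-job pipeline in plain Java to check the logic before running it on a cluster. This is a sketch, not the Hadoop code. TwoStageUv and its methods are hypothetical names. A TreeSet stands in for job 1's shuffle, where identical url,uid keys collapse into one. A TreeMap stands in for job 2's shuffle plus its summing reducer. The sample lines come from the test data used later in this post.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of the two MapReduce jobs (no Hadoop needed).
class TwoStageUv {

    // Job 1: map each "url,date,uid" line to the key "url,uid"; the reducer
    // writes each distinct key once, so duplicate visits disappear here.
    static TreeSet<String> job1DistinctUrlUid(List<String> lines) {
        TreeSet<String> keys = new TreeSet<>();
        for (String line : lines) {
            String[] t = line.split(",");
            if (t.length < 3) {
                continue; // skip malformed lines
            }
            keys.add(t[0] + "," + t[2]);
        }
        return keys;
    }

    // Job 2: split each "url,uid" key, emit (url, 1) and sum per url, like
    // word count. Since the keys are already distinct, the sum is the UV.
    static Map<String, Long> job2CountPerUrl(Iterable<String> urlUidKeys) {
        Map<String, Long> uv = new TreeMap<>();
        for (String key : urlUidKeys) {
            String url = key.split(",")[0];
            uv.merge(url, 1L, Long::sum);
        }
        return uv;
    }

    public static void main(String[] args) {
        // A subset of the test data from this post.
        List<String> lines = Arrays.asList(
                "http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsdasdsa-dasdj",
                "http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsdasdsa-dasdj",
                "http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsddddd-dsss",
                "http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsdasdsa-dasdj",
                "http://item.jd.com/3148810.html,2016-01-02,d99dsa-dsdasdsa-dasdj",
                "http://item.jd.com/3148810.html,2016-01-02,d99dsa-dsdasdasda-sadas",
                "http://item.jd.com/3148762.html,2016-01-02,d99dsa-dsdasdsa-xxxx");
        System.out.println(job2CountPerUrl(job1DistinctUrlUid(lines)));
        // prints {http://item.jd.com/3148762.html=1, http://item.jd.com/3148810.html=2, http://mall.jd.com/index-56654.html=2}
    }
}
```

On a real cluster, job 1's shuffle is the only step that sees every url,uid pair. Hadoop handles that step by sorting and spilling to disk, so no per-URL set has to fit in a reducer's memory.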
<h3 id="实践"><a href="#实践" class="headerlink" title="实践"></a>Practice</h3><p>That was a lot of talk. Feeling bored yet? Let's look at some code to wake up, hehe~</p>
<p>Using Hadoop 2.7.0.</p>
<p>Test data:<br><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div></pre></td><td class="code"><pre><div class="line">http://www.google.com,2016-01-02,dsadasd-dasd-as-das</div><div class="line">https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-name-node/,2016-01-02,000-111-11-22</div><div class="line">http://www.jd.com/?keyword=dadas&keywordid=34879410794&re_dcp=202m0QjIIg==&traffic_source=1004&test=1&enc=utf8&cu=true&utm_source=baidu-search&utm_medium=cpc&utm_campaign=t_262767352_baidusearch&utm_term=34879410794_0_b0d37d1995654fdb9c013c4eb7544071,2016-01-02,dasdsa-ds-ad-as-da</div><div class="line">http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsdasdsa-dasdj</div><div class="line">http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsdasdsa-dasdj</div><div class="line">http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsddddd-dsss</div><div class="line">http://mall.jd.com/index-56654.html,2016-01-02,d99dsa-dsdasdsa-dasdj</div><div class="line">http://item.jd.com/3148810.html,2016-01-02,d99dsa-dsdasdsa-dasdj</div><div class="line">http://item.jd.com/3148810.html,2016-01-02,d99dsa-dsdasdasda-sadas</div><div class="line">http://item.jd.com/3148762.html,2016-01-02,d99dsa-dsdasdsa-xxxx</div></pre></td></tr></table></figure></p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">import</span> java.io.IOException;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.LongWritable;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Mapper;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">UVMapper</span> <span class="keyword">extends</span> <span class="title">Mapper</span><<span class="title">Object</span>, <span class="title">Text</span>, <span class="title">Text</span>, <span class="title">LongWritable</span>> </span>{</div><div class="line"> </div><div class="line"> <span class="keyword">private</span> Text k = <span class="keyword">new</span> Text();</div><div class="line"> <span class="keyword">private</span> LongWritable v = <span class="keyword">new</span> LongWritable(<span class="number">1</span>);</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(Object key, Text value, Context context)</span></span></div><div class="line"> <span class="keyword">throws</span> IOException, InterruptedException {</div><div class="line"> </div><div class="line"> String line = value.toString();</div><div class="line"> String[] tokens = line.split(<span class="string">","</span>);</div><div class="line"> </div><div class="line"> <span class="comment">// expected input: url,date,uid; skip malformed lines instead of failing the task</span></div><div class="line"> <span class="keyword">if</span> (tokens.length < <span class="number">3</span>) {</div><div class="line"> <span class="keyword">return</span>;</div><div class="line"> }</div><div class="line"> </div><div class="line"> <span class="comment">// url + uid</span></div><div class="line"> k.set(tokens[<span class="number">0</span>] + <span class="string">","</span> + tokens[<span class="number">2</span>]);</div><div class="line"> context.write(k, v);</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">import</span> java.io.IOException;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.LongWritable;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.NullWritable;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Reducer;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">UVReducer</span> <span class="keyword">extends</span> <span class="title">Reducer</span><<span class="title">Text</span>, <span class="title">LongWritable</span>, <span class="title">Text</span>, <span class="title">NullWritable</span>> </span>{</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Text key, Iterable<LongWritable> values, Context context)</span> <span class="keyword">throws</span> IOException, InterruptedException </span>{</div><div class="line"> context.write(key, NullWritable.get());</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">import</span> java.io.IOException;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.LongWritable;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Mapper;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">UVMapperUp</span> <span class="keyword">extends</span> <span class="title">Mapper</span><<span class="title">Object</span>, <span class="title">Text</span>, <span class="title">Text</span>, <span class="title">LongWritable</span>> </span>{</div><div class="line"> </div><div class="line"> <span class="keyword">private</span> Text k = <span class="keyword">new</span> Text();</div><div class="line"> <span class="keyword">private</span> LongWritable v = <span class="keyword">new</span> LongWritable(<span class="number">1</span>);</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">map</span><span class="params">(Object key, Text value, Context context)</span></span></div><div class="line"> <span class="keyword">throws</span> IOException, InterruptedException {</div><div class="line"> </div><div class="line"> String line = value.toString();</div><div class="line"> String[] tokens = line.split(<span class="string">","</span>);</div><div class="line"> </div><div class="line"> <span class="keyword">if</span> (tokens.length != <span class="number">2</span>) { <span class="comment">// job 1 writes "url,uid" lines; skip anything else</span></div><div class="line"> <span class="keyword">return</span>;</div><div class="line"> }</div><div class="line"> </div><div class="line"> String url = tokens[<span class="number">0</span>];</div><div class="line"> </div><div class="line"> k.set(url);</div><div class="line"> context.write(k, v);</div><div class="line"> </div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">import</span> java.io.IOException;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.LongWritable;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Reducer;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">UVReducerUp</span> <span class="keyword">extends</span> <span class="title">Reducer</span><<span class="title">Text</span>, <span class="title">LongWritable</span>, <span class="title">Text</span>, <span class="title">LongWritable</span>> </span>{</div><div class="line"> </div><div class="line"> <span class="keyword">private</span> LongWritable res = <span class="keyword">new</span> LongWritable();</div><div class="line"> </div><div class="line"> <span class="meta">@Override</span></div><div class="line"> <span class="function"><span class="keyword">protected</span> <span class="keyword">void</span> <span class="title">reduce</span><span class="params">(Text key, Iterable<LongWritable> values, Context context)</span> <span 
class="keyword">throws</span> IOException, InterruptedException </span>{</div><div class="line"> <span class="keyword">long</span> sum = <span class="number">0</span>;</div><div class="line"> <span class="keyword">for</span> (LongWritable val : values) {</div><div class="line"> sum += val.get();</div><div class="line"> }</div><div class="line"> </div><div class="line"> res.set(sum);</div><div class="line"> </div><div class="line"> context.write(key, res);</div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.hadoop.conf.Configuration;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.conf.Configured;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.fs.Path;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.LongWritable;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.NullWritable;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.io.Text;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.Job;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.input.FileInputFormat;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.util.Tool;</div><div class="line"><span class="keyword">import</span> org.apache.hadoop.util.ToolRunner;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">UVApp</span> <span class="keyword">extends</span> <span class="title">Configured</span> <span class="keyword">implements</span> <span class="title">Tool</span> </span>{</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>{</div><div class="line"> <span class="comment">// local test paths: input, job 1 output (url,uid), job 2 output (url, uv)</span></div><div class="line"> args = <span class="keyword">new</span> String[]{<span class="string">"in/browse.txt"</span>, <span class="string">"uv_out"</span>, <span class="string">"f_uv_out"</span>};</div><div class="line"> System.exit(ToolRunner.run(<span class="keyword">new</span> Configuration(), <span class="keyword">new</span> UVApp(), args));</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">int</span> <span class="title">run</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> Exception </span>{</div><div class="line"> <span class="comment">// use the Configuration handed over by ToolRunner so -D options are honored</span></div><div class="line"> Configuration conf = getConf();</div><div class="line"> </div><div class="line"> <span class="comment">// job 1: emit each distinct (url, uid) pair once</span></div><div class="line"> Job job1 = Job.getInstance(conf, <span class="string">"uv-distinct"</span>);</div><div class="line"> job1.setJarByClass(UVApp.class);</div><div class="line"> job1.setMapperClass(UVMapper.class);</div><div class="line"> job1.setReducerClass(UVReducer.class);</div><div class="line"> job1.setMapOutputKeyClass(Text.class);</div><div class="line"> job1.setMapOutputValueClass(LongWritable.class);</div><div class="line"> job1.setOutputKeyClass(Text.class);</div><div class="line"> job1.setOutputValueClass(NullWritable.class);</div><div class="line"> FileInputFormat.addInputPath(job1, <span class="keyword">new</span> Path(args[<span class="number">0</span>]));</div><div class="line"> FileOutputFormat.setOutputPath(job1, <span class="keyword">new</span> Path(args[<span class="number">1</span>]));</div><div class="line"> </div><div class="line"> <span class="comment">// job 2: word count over job 1's output gives the UV per url</span></div><div class="line"> Job job2 = Job.getInstance(conf, <span class="string">"uv-count"</span>);</div><div class="line"> job2.setJarByClass(UVApp.class);</div><div class="line"> job2.setMapperClass(UVMapperUp.class);</div><div class="line"> job2.setReducerClass(UVReducerUp.class);</div><div class="line"> job2.setMapOutputKeyClass(Text.class);</div><div class="line"> job2.setMapOutputValueClass(LongWritable.class);</div><div class="line"> job2.setOutputKeyClass(Text.class);</div><div class="line"> job2.setOutputValueClass(LongWritable.class);</div><div class="line"> FileInputFormat.addInputPath(job2, <span class="keyword">new</span> Path(args[<span class="number">1</span>]));</div><div class="line"> FileOutputFormat.setOutputPath(job2, <span class="keyword">new</span> Path(args[<span class="number">2</span>]));</div><div class="line"> </div><div class="line"> <span class="comment">// job 2 reads job 1's output, so run them one after the other</span></div><div class="line"> <span class="keyword">if</span> (!job1.waitForCompletion(<span class="keyword">true</span>)) {</div><div class="line"> <span class="keyword">return</span> <span class="number">1</span>;</div><div class="line"> }</div><div class="line"> <span class="keyword">return</span> job2.waitForCompletion(<span class="keyword">true</span>) ? <span class="number">0</span> : <span class="number">1</span>;</div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
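<h3 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h3><p> Reaching for a distributed framework usually means the data is fairly large, so holding it all in memory is clearly unreasonable.<br> Looks like my code quality still has room for improvement!!!</p>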
]]></content>
</entry>
<entry>
<title><![CDATA[Analyzing duplicate consumption when integrating Storm with Kafka]]></title>
<url>http://yoursite.com/2016/08/11/storm%E6%95%B4%E5%90%88kafka%E9%87%8D%E5%A4%8D%E6%B6%88%E8%B4%B9%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90/</url>
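<content type="html"><![CDATA[<p>Why write this post?<br>While integrating Storm with Kafka recently, I kept struggling with duplicate reads. Restarting the topology even re-scanned all of the data in Kafka.<br>[If the production logic is heavy and also inserts into a database, wouldn't that produce lots of duplicate rows!]</p>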
<a id="more"></a>
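<h3 id="为什么写这篇文章"><a href="#为什么写这篇文章" class="headerlink" title="为什么写这篇文章?"></a>Why write this post?</h3><p>While integrating Storm with Kafka recently, I kept struggling with duplicate reads. Restarting the topology even re-scanned all of the data in Kafka.<br>[If the production logic is heavy and also inserts into a database, wouldn't that produce lots of duplicate rows!]</p>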
<h3 id="软件环境"><a href="#软件环境" class="headerlink" title="软件环境"></a>Software environment</h3><p>zookeeper-3.4.6.tar.gz<br>kafka_2.9.2-0.8.1.1<br>apache-storm-1.0.1.tar.gz</p>
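<h3 id="实践出真知"><a href="#实践出真知" class="headerlink" title="实践出真知"></a>Learning by doing</h3><p>Both Kafka and Storm depend on ZooKeeper (zk), and when we create the topology we also configure the Kafka spout to write its offsets to zk.<br>But my first program behaved very strangely: zk never got the path and id I had specified.</p>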
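<p>It helps to model the commit rule to reason about when an offset actually reaches zk. The sketch below is a simplified model, not the storm-kafka source. It assumes the spout behaves the way I understand storm-kafka's PartitionManager to behave. The spout tracks offsets that were emitted but not yet acked, commits only up to the smallest pending one, and writes only when that value has changed. OffsetCommitModel and its methods are hypothetical names.</p>

```java
import java.util.TreeSet;

// Simplified model of a Kafka spout's offset bookkeeping (an assumption for
// illustration, not the storm-kafka source). OffsetCommitModel is hypothetical.
class OffsetCommitModel {

    private final TreeSet<Long> pending = new TreeSet<>(); // emitted but not yet acked
    private long emittedTo;   // next offset the spout would emit
    private long committedTo; // last offset value written to zk

    OffsetCommitModel(long startOffset) {
        this.emittedTo = startOffset;
        this.committedTo = startOffset;
    }

    // Spout emits the next message; it stays pending until a bolt acks it.
    long emit() {
        long offset = emittedTo++;
        pending.add(offset);
        return offset;
    }

    // Bolt acks a tuple (BaseBasicBolt does this automatically after execute()).
    void ack(long offset) {
        pending.remove(offset);
    }

    // Everything below this offset has been fully processed.
    long lastCompletedOffset() {
        return pending.isEmpty() ? emittedTo : pending.first();
    }

    // Writes to "zk" only when the completed offset has moved; returns whether it wrote.
    boolean commit() {
        long done = lastCompletedOffset();
        if (done == committedTo) {
            return false;
        }
        committedTo = done;
        return true;
    }

    long committedTo() {
        return committedTo;
    }

    public static void main(String[] args) {
        OffsetCommitModel noAck = new OffsetCommitModel(0);
        noAck.emit();
        noAck.emit();
        noAck.emit();
        System.out.println(noAck.commit() + " " + noAck.committedTo()); // prints "false 0"

        OffsetCommitModel someAcks = new OffsetCommitModel(0);
        long first = someAcks.emit();
        someAcks.emit();              // offset 1 is never acked
        long third = someAcks.emit();
        someAcks.ack(first);
        someAcks.ack(third);
        System.out.println(someAcks.commit() + " " + someAcks.committedTo()); // prints "true 1"
    }
}
```

In this model, a bolt that never acks leaves the commit point where it started, so nothing new is written. In real Storm, un-acked tuples also time out (topology.message.timeout.secs, 30 seconds by default) and are replayed by the spout.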
<p>First, let's look at a "wrong" example:</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div></pre></td><td class="code"><pre><div class="line"></div><div class="line"><span class="keyword">package</span> cn.base.sk.ex03;</div><div class="line"></div><div class="line"><span class="keyword">import</span> java.util.Map;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.storm.task.OutputCollector;</div><div class="line"><span class="keyword">import</span> org.apache.storm.task.TopologyContext;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.OutputFieldsDeclarer;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.base.BaseRichBolt;</div><div class="line"><span class="keyword">import</span> org.apache.storm.tuple.Tuple;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">SplitBolt</span> <span class="keyword">extends</span> <span class="title">BaseRichBolt</span> </span>{</div><div class="line"></div><div class="line"> <span 
class="keyword">private</span> <span class="keyword">static</span> <span class="keyword">final</span> <span class="keyword">long</span> serialVersionUID = -<span class="number">1380001209433177193L</span>;</div><div class="line"></div><div class="line"> <span class="keyword">private</span> OutputCollector collector = <span class="keyword">null</span>;</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">prepare</span><span class="params">(Map stormConf, TopologyContext context, OutputCollector collector)</span> </span>{</div><div class="line"> <span class="keyword">this</span>.collector = collector;</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">execute</span><span class="params">(Tuple input)</span> </span>{</div><div class="line"> String word = input.getString(<span class="number">0</span>);</div><div class="line"> System.out.println(<span class="string">"source data => "</span> + word);</div><div class="line"> <span class="comment">//collector.ack(input);</span></div><div class="line"> }</div><div class="line"></div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">declareOutputFields</span><span class="params">(OutputFieldsDeclarer declarer)</span> </span>{</div><div class="line"></div><div class="line"> }</div><div class="line"></div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> cn.base.sk.ex03;</div><div class="line"></div><div class="line"><span class="keyword">import</span> java.util.Arrays;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.storm.Config;</div><div class="line"><span class="keyword">import</span> org.apache.storm.LocalCluster;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.BrokerHosts;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.KafkaSpout;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.SpoutConfig;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.StringScheme;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.ZkHosts;</div><div class="line"><span class="keyword">import</span> org.apache.storm.spout.SchemeAsMultiScheme;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.TopologyBuilder;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">KafkaTopology</span> </span>{</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> </span>{</div><div class="line"> String zks = <span class="string">"localhost:2181/kafka"</span>;</div><div class="line"> String topic = <span class="string">"topic2"</span>;</div><div class="line"> String zkRoot = <span class="string">"/topic2"</span>; <span class="comment">// offsets are meant to be stored under zkRoot + "/" + id</span></div><div class="line"> String id = <span class="string">"split"</span>;</div><div class="line"></div><div class="line"> BrokerHosts brokerHosts = <span class="keyword">new</span> ZkHosts(zks);</div><div class="line"> SpoutConfig spoutConf = <span class="keyword">new</span> SpoutConfig(brokerHosts, topic, zkRoot, id);</div><div class="line"> spoutConf.scheme = <span class="keyword">new</span> SchemeAsMultiScheme(<span class="keyword">new</span> StringScheme());</div><div class="line"> spoutConf.zkServers = Arrays.asList(<span class="keyword">new</span> String[] {<span class="string">"localhost"</span>});</div><div class="line"> spoutConf.zkPort = <span class="number">2181</span>;</div><div class="line"></div><div class="line"></div><div class="line"> TopologyBuilder builder = <span class="keyword">new</span> TopologyBuilder();</div><div class="line"> builder.setSpout(<span class="string">"kafka-spoutx"</span>, <span class="keyword">new</span> KafkaSpout(spoutConf));</div><div class="line"> builder.setBolt(<span class="string">"word-splitx"</span>, <span class="keyword">new</span> SplitBolt()).shuffleGrouping(<span class="string">"kafka-spoutx"</span>);</div><div class="line"></div><div class="line"> Config conf = <span class="keyword">new</span> Config();</div><div class="line"> String name = KafkaTopology.class.getSimpleName();</div><div class="line"> conf.setMaxTaskParallelism(<span class="number">3</span>);</div><div class="line"></div><div class="line"> LocalCluster cluster = <span class="keyword">new</span> LocalCluster();</div><div class="line"> cluster.submitTopology(name, conf, builder.createTopology());</div><div class="line"></div><div class="line"><span class="comment">// Utils.sleep(10000);</span></div><div class="line"><span class="comment">// cluster.shutdown();</span></div><div class="line"></div><div class="line"> }</div><div class="line">}</div></pre></td></tr></table></figure>
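<p>The example above cannot create /topic2/split in ZooKeeper; I will explain why later.<br>Since I only started working with this a few days ago, I searched around and found an explanation in a blog post:</p>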
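<p><font color="DeepPink" size="2">Original text:<br> Note in particular that the bolt must extend backtype.storm.topology.base.BaseBasicBolt; otherwise no offset data is recorded in zk.<br></font><br>After I changed my bolt to extend that class, the path was indeed created in zk, but the post did not explain why.<br>Let's first look at the modified code.</p>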
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> cn.base.sk.ex02;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.BasicOutputCollector;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.OutputFieldsDeclarer;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.base.BaseBasicBolt;</div><div class="line"><span class="keyword">import</span> org.apache.storm.tuple.Fields;</div><div class="line"><span class="keyword">import</span> org.apache.storm.tuple.Tuple;</div><div class="line"><span class="keyword">import</span> org.apache.storm.tuple.Values;</div><div class="line"></div><div class="line"><span class="comment">// Splits each comma-separated Kafka message into words and emits (word, 1).</span></div><div class="line"><span class="comment">// As a BaseBasicBolt, the input tuple is acked automatically after execute() returns.</span></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">KafkaWordSplitter</span> <span class="keyword">extends</span> <span class="title">BaseBasicBolt</span> </span>{</div><div class="line"></div><div class="line">    <span class="meta">@Override</span></div><div class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">execute</span><span class="params">(Tuple input, BasicOutputCollector collector)</span> </span>{</div><div class="line">        String line = input.getString(<span class="number">0</span>);</div><div class="line">        System.out.println(<span class="string">"RECV[kafka -> splitter] "</span> + line);</div><div class="line"></div><div class="line">        String[] words = line.split(<span class="string">","</span>);</div><div class="line">        <span class="keyword">for</span> (String word : words) {</div><div class="line">            System.out.println(<span class="string">"EMIT[splitter -> counter] "</span> + word);</div><div class="line">            <span class="comment">// BasicOutputCollector anchors emitted tuples to the input automatically.</span></div><div class="line">            collector.emit(<span class="keyword">new</span> Values(word, <span class="number">1</span>));</div><div class="line">        }</div><div class="line">        <span class="comment">// No collector.ack(input) here: BaseBasicBolt acks implicitly.</span></div><div class="line">    }</div><div class="line"></div><div class="line">    <span class="meta">@Override</span></div><div class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">declareOutputFields</span><span class="params">(OutputFieldsDeclarer declarer)</span> </span>{</div><div class="line">        declarer.declare(<span class="keyword">new</span> Fields(<span class="string">"word"</span>, <span class="string">"count"</span>));</div><div class="line">    }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> cn.base.sk.ex02;</div><div class="line"></div><div class="line"><span class="keyword">import</span> java.util.HashMap;</div><div class="line"><span class="keyword">import</span> java.util.Map;</div><div class="line"><span class="keyword">import</span> java.util.concurrent.atomic.AtomicInteger;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.storm.task.TopologyContext;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.BasicOutputCollector;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.OutputFieldsDeclarer;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.base.BaseBasicBolt;</div><div class="line"><span class="keyword">import</span> org.apache.storm.tuple.Tuple;</div><div class="line"></div><div class="line"><span class="comment">// Counts words from KafkaWordSplitter; also a BaseBasicBolt, so acking is implicit.</span></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">WordCounter</span> <span class="keyword">extends</span> <span class="title">BaseBasicBolt</span> </span>{</div><div class="line">    <span class="keyword">private</span> Map&lt;String, AtomicInteger&gt; countMap;</div><div class="line"></div><div class="line">    <span class="meta">@Override</span></div><div class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">prepare</span><span class="params">(Map stormConf, TopologyContext context)</span> </span>{</div><div class="line">        countMap = <span class="keyword">new</span> HashMap&lt;String, AtomicInteger&gt;();</div><div class="line">    }</div><div class="line"></div><div class="line">    <span class="comment">// Prints the final counts; Storm only guarantees cleanup() when a topology is killed in local mode.</span></div><div class="line">    <span class="meta">@Override</span></div><div class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">cleanup</span><span class="params">()</span> </span>{</div><div class="line">        System.out.println(<span class="string">"The final result:"</span>);</div><div class="line">        <span class="keyword">for</span> (Map.Entry&lt;String, AtomicInteger&gt; entry : countMap.entrySet()) {</div><div class="line">            System.out.println(entry.getKey() + <span class="string">"\t:\t"</span> + entry.getValue().get());</div><div class="line">        }</div><div class="line">    }</div><div class="line"></div><div class="line">    <span class="meta">@Override</span></div><div class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">execute</span><span class="params">(Tuple input, BasicOutputCollector collector)</span> </span>{</div><div class="line">        String word = input.getString(<span class="number">0</span>);</div><div class="line">        Integer count = input.getInteger(<span class="number">1</span>);</div><div class="line">        System.out.println(<span class="string">"RECV[splitter -> counter] "</span> + word + <span class="string">" : "</span> + count);</div><div class="line"></div><div class="line">        AtomicInteger ai = countMap.get(word);</div><div class="line">        <span class="keyword">if</span> (ai == <span class="keyword">null</span>) {</div><div class="line">            <span class="comment">// First occurrence: start from the received count rather than a hard-coded 1.</span></div><div class="line">            ai = <span class="keyword">new</span> AtomicInteger(count);</div><div class="line">            countMap.put(word, ai);</div><div class="line">        } <span class="keyword">else</span> {</div><div class="line">            ai.addAndGet(count);</div><div class="line">        }</div><div class="line">        System.out.println(<span class="string">"CHECK statistics map: "</span> + countMap);</div><div class="line">    }</div><div class="line"></div><div class="line">    <span class="comment">// Terminal bolt: declares no output fields.</span></div><div class="line">    <span class="meta">@Override</span></div><div class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">declareOutputFields</span><span class="params">(OutputFieldsDeclarer declarer)</span> </span>{</div><div class="line">    }</div><div class="line">}</div></pre></td></tr></table></figure>
<figure class="highlight java"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">package</span> cn.base.sk.ex02;</div><div class="line"></div><div class="line"><span class="keyword">import</span> java.util.Arrays;</div><div class="line"></div><div class="line"><span class="keyword">import</span> org.apache.storm.Config;</div><div class="line"><span class="keyword">import</span> org.apache.storm.LocalCluster;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.BrokerHosts;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.KafkaSpout;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.SpoutConfig;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.StringScheme;</div><div class="line"><span class="keyword">import</span> org.apache.storm.kafka.ZkHosts;</div><div class="line"><span class="keyword">import</span> org.apache.storm.spout.SchemeAsMultiScheme;</div><div class="line"><span class="keyword">import</span> org.apache.storm.topology.TopologyBuilder;</div><div class="line"><span class="keyword">import</span> org.apache.storm.tuple.Fields;</div><div class="line"></div><div class="line"><span class="keyword">public</span> <span class="class"><span class="keyword">class</span> <span class="title">MyKafkaTopology</span> </span>{</div><div class="line"></div><div class="line">    <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title">main</span><span class="params">(String[] args)</span> </span>{</div><div class="line">        String zks = <span class="string">"localhost:2181/kafka"</span>; <span class="comment">// Kafka's ZooKeeper, under the /kafka chroot</span></div><div class="line">        String topic = <span class="string">"topic1"</span>;</div><div class="line">        String zkRoot = <span class="string">"/topic1"</span>; <span class="comment">// consumed offsets go under zkRoot/id</span></div><div class="line">        String id = <span class="string">"word"</span>;</div><div class="line"></div><div class="line">        BrokerHosts brokerHosts = <span class="keyword">new</span> ZkHosts(zks);</div><div class="line">        SpoutConfig spoutConf = <span class="keyword">new</span> SpoutConfig(brokerHosts, topic, zkRoot, id);</div><div class="line">        spoutConf.scheme = <span class="keyword">new</span> SchemeAsMultiScheme(<span class="keyword">new</span> StringScheme());</div><div class="line">        <span class="comment">// ZooKeeper in which the spout stores its offsets</span></div><div class="line">        spoutConf.zkServers = Arrays.asList(<span class="string">"localhost"</span>);</div><div class="line">        spoutConf.zkPort = <span class="number">2181</span>;</div><div class="line"></div><div class="line">        TopologyBuilder builder = <span class="keyword">new</span> TopologyBuilder();</div><div class="line">        builder.setSpout(<span class="string">"kafka-spout"</span>, <span class="keyword">new</span> KafkaSpout(spoutConf));</div><div class="line">        builder.setBolt(<span class="string">"word-split"</span>, <span class="keyword">new</span> KafkaWordSplitter()).shuffleGrouping(<span class="string">"kafka-spout"</span>);</div><div class="line">        builder.setBolt(<span class="string">"word-count"</span>, <span class="keyword">new</span> WordCounter()).fieldsGrouping(<span class="string">"word-split"</span>, <span class="keyword">new</span> Fields(<span class="string">"word"</span>));</div><div class="line"></div><div class="line">        Config conf = <span class="keyword">new</span> Config();</div><div class="line">        String name = MyKafkaTopology.class.getSimpleName();</div><div class="line">        conf.setMaxTaskParallelism(<span class="number">3</span>);</div><div class="line"></div><div class="line">        LocalCluster cluster = <span class="keyword">new</span> LocalCluster();</div><div class="line">        cluster.submitTopology(name, conf, builder.createTopology());</div><div class="line"></div><div class="line">        <span class="comment">// Utils.sleep(10000);</span></div><div class="line">        <span class="comment">// cluster.shutdown();</span></div><div class="line">    }</div><div class="line">}</div></pre></td></tr></table></figure>
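<p>Forgive me for writing two examples!<br>The large block of code above is the modified version. If you now open zkCli, you will see that the path we need has been created<br>and that the offset has been recorded.</p>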
<p><img src="https://github.com/basebase/img_server/blob/master/storm%E6%95%B4%E5%90%88kafka%E9%87%8D%E5%A4%8D%E6%B6%88%E8%B4%B9%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90_img/01.png?raw=true" alt="zk data"></p>
<p>From now on, reading resumes from this offset.<br>So why does extending BaseBasicBolt work while extending BaseRichBolt does not?</p>
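<h3 id="走进源码"><a href="#走进源码" class="headerlink" title="走进源码"></a>Into the Source Code</h3><p>KafkaSpout's open() method first does some initialization work; what we actually need to look at is nextTuple(), shown in the figure below:</p>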
<p><img src="https://github.com/basebase/img_server/blob/master/storm%E6%95%B4%E5%90%88kafka%E9%87%8D%E5%A4%8D%E6%B6%88%E8%B4%B9%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90_img/02.png?raw=true" alt="kafkaSpout!nextTuple"></p>
<p>Ignore the other methods and go straight into commit():<br><img src="https://github.com/basebase/img_server/blob/master/storm%E6%95%B4%E5%90%88kafka%E9%87%8D%E5%A4%8D%E6%B6%88%E8%B4%B9%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90_img/03.png?raw=true" alt="kafkaSpout!commit"></p>
<p>See? As soon as the if condition holds, the data is written to zk. So why is that branch never entered? Let's look at lastCompletedOffset():<br><img src="https://github.com/basebase/img_server/blob/master/storm%E6%95%B4%E5%90%88kafka%E9%87%8D%E5%A4%8D%E6%B6%88%E8%B4%B9%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90_img/04.png?raw=true" alt="kafkaSpout!lastCompletedOffset"></p>
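<p>When you step into this method in the debugger, it first takes the first key of a map whose keys are offsets and whose values are timestamps.<br>commit() then compares that offset with the previously committed one and, only if they differ, writes it and records it as the latest committed offset.</p>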
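<p>Look closely: if the bolt extends BaseRichBolt, the key is still in the map after the call, whereas with BaseBasicBolt it gets removed. If it is never removed, the two offsets compared in commit() stay equal, so nothing is ever written.</p>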
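<p>To make the comparison concrete, here is a minimal Java sketch of the bookkeeping described above. It is a toy model, not the storm-kafka source: the class <code>OffsetCommitModel</code> and its methods are hypothetical, and it assumes (as the screenshots suggest) that the pending map is keyed by offset, that the committed offset starts at the first offset to be read, and that an empty map falls back to the next offset to read.</p>

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of the spout-side offset bookkeeping (hypothetical names, not storm-kafka code).
final class OffsetCommitModel {
    // Emitted but not yet acked: offset -> emit timestamp.
    private final SortedMap<Long, Long> pending = new TreeMap<>();
    private long emittedTo;   // next offset to read from Kafka
    private long committedTo; // last offset written to ZooKeeper

    OffsetCommitModel(long startOffset) {
        this.emittedTo = startOffset;
        this.committedTo = startOffset;
    }

    void emit(long offset) {
        pending.put(offset, System.currentTimeMillis());
        emittedTo = offset + 1;
    }

    // Acking a tuple removes its offset from the pending map.
    void ack(long offset) {
        pending.remove(offset);
    }

    // First pending offset, or the next offset to read when nothing is pending.
    long lastCompletedOffset() {
        return pending.isEmpty() ? emittedTo : pending.firstKey();
    }

    // Returns true when the real spout would write the offset to ZooKeeper.
    boolean commit() {
        long last = lastCompletedOffset();
        if (last != committedTo) {
            committedTo = last; // the real code writes the offset data to zk here
            return true;
        }
        return false;
    }

    long committedTo() {
        return committedTo;
    }

    public static void main(String[] args) {
        OffsetCommitModel m = new OffsetCommitModel(0);
        m.emit(0);
        m.emit(1);
        m.emit(2);
        System.out.println(m.commit()); // false: offset 0 is pending and equals committedTo
        m.ack(0);
        System.out.println(m.commit() + " " + m.committedTo()); // true 1
        m.ack(1);
        m.ack(2);
        System.out.println(m.commit() + " " + m.committedTo()); // true 3
    }
}
```

<p>Running <code>main</code> prints <code>false</code>, then <code>true 1</code>, then <code>true 3</code>. As long as offset 0 is never acked, <code>commit()</code> cannot fire no matter how often it is called, which is the BaseRichBolt-without-ack situation.</p>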
<p>So when is the entry removed? If you were designing this, when would you remove it?<br>Exactly: once we have confirmed that the message has been consumed.</p>
<p>In the spout's ack() method we can see the entry being removed from the map, after which the data is created and written in zk as expected.<br><img src="https://github.com/basebase/img_server/blob/master/storm%E6%95%B4%E5%90%88kafka%E9%87%8D%E5%A4%8D%E6%B6%88%E8%B4%B9%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90_img/05.png?raw=true" alt="kafkaSpout!ack"></p>
<p>What if I still want to extend BaseRichBolt? That works too: you just have to call ack yourself.<br><img src="https://github.com/basebase/img_server/blob/master/storm%E6%95%B4%E5%90%88kafka%E9%87%8D%E5%A4%8D%E6%B6%88%E8%B4%B9%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90_img/06.png?raw=true" alt="UserBolt!ack"></p>
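<p>The difference between the two base classes can be sketched without Storm at all. In this toy model every name is hypothetical; it only mimics the idea that Storm wraps a basic bolt in an executor (BasicBoltExecutor) that acks after <code>execute()</code> returns, while a rich bolt is acked only if its own code calls <code>ack</code>.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names; no Storm dependency.
final class AckStyles {

    // Stands in for the spout side: records which tuple ids were acked.
    static final class Acker {
        final List<Long> acked = new ArrayList<>();

        void ack(long tupleId) {
            acked.add(tupleId);
        }
    }

    // "Basic" style: the bolt only sees the value and never acks itself.
    interface BasicBolt {
        void execute(String value);
    }

    // "Rich" style: the bolt gets the id and the acker and must ack on its own.
    interface RichBolt {
        void execute(long tupleId, String value, Acker acker);
    }

    // Mimics the wrapper around a basic bolt: run the body, then ack implicitly.
    static void runBasic(BasicBolt bolt, long tupleId, String value, Acker acker) {
        bolt.execute(value);
        acker.ack(tupleId);
    }

    // A rich bolt is called as-is; nothing acks on its behalf.
    static void runRich(RichBolt bolt, long tupleId, String value, Acker acker) {
        bolt.execute(tupleId, value, acker);
    }

    public static void main(String[] args) {
        Acker acker = new Acker();
        runBasic(v -> System.out.println("basic bolt saw " + v), 1L, "a,b", acker);
        runRich((id, v, a) -> System.out.println("rich bolt, no ack: " + v), 2L, "c", acker);
        runRich((id, v, a) -> {
            System.out.println("rich bolt, explicit ack: " + v);
            a.ack(id);
        }, 3L, "d", acker);
        System.out.println("acked tuple ids: " + acker.acked); // [1, 3]
    }
}
```

<p>Tuple 2 is never acked, so in the real spout its offset would stay in the pending map and block the commit until the message times out and is replayed.</p>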
<p>OK, if you open zkCli again now, the configured path and id are there, and restarting the topology no longer re-reads old messages.</p>
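<h3 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h3><p>BaseBasicBolt does not expose ack to you; it acks the input tuple implicitly after execute() returns, whereas a BaseRichBolt must call collector.ack(input) explicitly.</p>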
<h3 id="结尾"><a href="#结尾" class="headerlink" title="结尾"></a>Closing</h3><p>Reference: <a href="http://www.howardliu.cn/a-few-notes-about-storm/" target="_blank" rel="external">http://www.howardliu.cn/a-few-notes-about-storm/</a></p>
]]></content>
</entry>
</search>