#!/usr/bin/env python
# Chapter 11 Assignment: Extracting Data With Regular Expressions
# Finding Numbers in a Haystack
# In this assignment you will read through and parse a file with text and
# numbers. You will extract all the numbers in the file and compute the sum of
# the numbers.
# Data Files
# We provide two files for this assignment. One is a sample file where we give
# you the sum for your testing and the other is the actual data you need to
# process for the assignment.
# Sample data: http://python-data.dr-chuck.net/regex_sum_42.txt
# (There are 87 values with a sum=445822)
# Actual data: http://python-data.dr-chuck.net/regex_sum_201873.txt
# (There are 96 values and the sum ends with 156)
# Save the file into the same folder as your Python program. Note: Each student
# will have a distinct data file for the assignment, so only use your own data
# file for analysis.
# Data Format
# The file contains much of the text from the introduction of the textbook
# except that random numbers are inserted throughout the text. Here is a sample
# of the data you might see:
'''
Why should you learn to write programs? 7746
12 1929 8827
Writing programs (or programming) is a very creative
7 and rewarding activity. You can write programs for
many reasons, ranging from making your living to solving
8837 a difficult data analysis problem to having fun to helping 128
someone else solve a problem. This book assumes that
everyone needs to know how to program ...
'''
# The sum for the sample text above is 27486. The numbers can appear anywhere
# in the line. There can be any number of numbers in each line (including none).
# Handling The Data
# The basic outline of this problem is to read the file, use re.findall() with
# the regular expression '[0-9]+' to extract the integers on each line, convert
# the extracted strings to integers, and sum them.
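import re

# Running total of every integer found in the file.
total = 0

# Open the data file; the with block closes it automatically.
with open('regex_sum_201873.txt', 'r') as file:
    for line in file:
        # findall returns each run of digits on the line as a string.
        numbers = re.findall('[0-9]+', line)
        for number in numbers:
            total = total + int(number)

print(total)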
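
# Optional self-check (a sketch, not part of the original assignment): the
# sample excerpt quoted in the Data Format section is stated to sum to 27486,
# so running the same '[0-9]+' extraction over that excerpt should reproduce
# the figure. The names SAMPLE_EXCERPT and sum_numbers_in_text are
# hypothetical, introduced here only for illustration.
import re

SAMPLE_EXCERPT = '''
Why should you learn to write programs? 7746
12 1929 8827
Writing programs (or programming) is a very creative
7 and rewarding activity. You can write programs for
many reasons, ranging from making your living to solving
8837 a difficult data analysis problem to having fun to helping 128
someone else solve a problem. This book assumes that
everyone needs to know how to program ...
'''


def sum_numbers_in_text(text):
    # Add up every run of digits found anywhere in the given text.
    running_total = 0
    for number in re.findall('[0-9]+', text):
        running_total = running_total + int(number)
    return running_total


print(sum_numbers_in_text(SAMPLE_EXCERPT))  # 27486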