-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathnlp2.py
More file actions
48 lines (34 loc) · 3.86 KB
/
nlp2.py
File metadata and controls
48 lines (34 loc) · 3.86 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
# -*- coding: utf-8 -*-
"""
Created on Tue Jul 3 10:03:50 2018
@author: ahanmr
"""
import re
my_string="Let's 66 write 94 RegEx! Won't that be fun? I sure think so. Can you find 4 sentences? Or perhaps, all 19 words?"
# \ is used for or statement and can take multiple
#regular expressions and output the same.
patterns= ("\d+|\w+")
print(re.findall(patterns, my_string))
scene_one="SCENE 1: [wind] [clop clop clop] \nKING ARTHUR: Whoa there! [clop clop clop] \nSOLDIER #1: Halt! Who goes there?\nARTHUR: It is I, Arthur, son of Uther Pendragon, from the castle of Camelot. King of the Britons, defeator of the Saxons, sovereign of all England!\nSOLDIER #1: Pull the other one!\nARTHUR: I am, ... and this is my trusty servant Patsy. We have ridden the length and breadth of the land in search of knights who will join me in my court at Camelot. I must speak with your lord and master.\nSOLDIER #1: What? Ridden on a horse?\nARTHUR: Yes!\nSOLDIER #1: You're using coconuts!\nARTHUR: What?\nSOLDIER #1: You've got two empty halves of coconut and you're bangin' 'em together.\nARTHUR: So? We have ridden since the snows of winter covered this land, through the kingdom of Mercea, through--\nSOLDIER #1: Where'd you get the coconuts?\nARTHUR: We found them.\nSOLDIER #1: Found them? In Mercea? The coconut's tropical!\nARTHUR: What do you mean?\nSOLDIER #1: Well, this is a temperate zone.\nARTHUR: The swallow may fly south with the sun or the house martin or the plover may seek warmer climes in winter, yet these are not strangers to our land?\nSOLDIER #1: Are you suggesting coconuts migrate?\nARTHUR: Not at all. They could be carried.\nSOLDIER #1: What? A swallow carrying a coconut?\nARTHUR: It could grip it by the husk!\nSOLDIER #1: It's not a question of where he grips it! It's a simple question of weight ratios! A five ounce bird could not carry a one pound coconut.\nARTHUR: Well, it doesn't matter. Will you go and tell your master that Arthur from the Court of Camelot is here.\nSOLDIER #1: Listen. In order to maintain air-speed velocity, a swallow needs to beat its wings forty-three times every second, right?\nARTHUR: Please!\nSOLDIER #1: Am I right?\nARTHUR: I'm not interested!\nSOLDIER #2: It could be carried by an African swallow!\nSOLDIER #1: Oh, yeah, an African swallow maybe, but not a European swallow. That's my point.\nSOLDIER #2: Oh, yeah, I agree with that.\nARTHUR: Will you ask your master if he wants to join my court at Camelot?!\nSOLDIER #1: But then of course a-- African swallows are non-migratory.\nSOLDIER #2: Oh, yeah...\nSOLDIER #1: So they couldn't bring a coconut back anyway... [clop clop clop] \nSOLDIER #2: Wait a minute! Supposing two swallows carried it together?\nSOLDIER #1: No, they'd have to have it on a line.\nSOLDIER #2: Well, simple! They'd just use a strand of creeper!\nSOLDIER #1: What, held under the dorsal guiding feathers?\nSOLDIER #2: Well, why not?\n"
# Import necessary modules
from nltk.tokenize import word_tokenize
from nltk.tokenize import sent_tokenize
# Split scene_one into sentences: sentences
sentences = sent_tokenize(scene_one)
# Use word_tokenize to tokenize the fourth sentence: tokenized_sent
tokenized_sent = word_tokenize(sentences[3])
# Make a set of unique tokens in the entire scene: unique_tokens
unique_tokens = set(word_tokenize(scene_one))
# Print the unique tokens result
print(unique_tokens)
# Search for the first occurrence of "coconuts" in scene_one: match
match = re.search("coconuts", scene_one)
# Print the start and end indexes of match
print(match.start(),match.end())
# Write a regular expression to search for anything in square brackets: pattern1
pattern1 = r"\[.*]+"
# Use re.search to find the first text in square brackets
print(re.search(pattern1, scene_one))
# Find the script notation at the beginning of the fourth sentence and print it
pattern2 = r"[ARTHUR:]+"
print(re.match(pattern2, sentences[3]))