home | copyright ©2016, tim@menzies.us
overview |
syllabus |
src |
submit |
chat
Extend your Table reader (from last week) with methods to "normalize" numbers
and return distances between numbers
class Num
...
def norm(i,x):
tmp= (x - i.lo) / (i.up - i.lo + 10**-32)
if tmp > 1: return 1
elif tmp < 0: return 0
else: return tmp
def dist(i,x,y):
return i.norm(x) - i.norm(y)
def furthest(i,x) :
return i.up if x <(i.up-i.lo)/2 else i.loExtend your Sym class (from last week) to "normalize" symbols (actually, to do nothing at all)
class Sym
...
def norm(i,x) : return x
def dist(i,x,y) : return 0 if x==y else 1
def furthest(i,x): return "SoMEcrazyTHing"Implement Aha's algorithm (http://goo.gl/ZspOeL, p42, first para) for the distance between two rows in a table of data (where the columns are symbols or numbers)
For what its worth, here's that code from one of my tools:
- missing values are denoted "?"
- my
i.colsheaders contain the column numbers of the headers. Sor1[col.pos]means "ask my column header where to look in the row". - All my
i.colsNums orSyms so i can defer to them to find the distances - I divide Aha's term by n^1/2 (where n is th e number of variables) so that my distances range 0 to 1.
UNKNOWN = "?"
def distance(i,r1,r2,f=2):
d,n = 0, 10**-32
for col in i.cols:
x, y = r1[col.pos], r2[col.pos]
if x is UNKNOWN and y is UNKNOWN:
continue
if x is UNKNOWN: x=col.my.furthest(y)
if y is UNKNOWN: y=col.my.furthest(x)
n += 1
inc = col.dist(x,y)**f
d += inc
return (d**(1/f)) / (n**(1/f))For data/weather, for the first two rows, find the nearest and furthest rows
Important note: in the distance calc, do not include the dependent variable (the class). Distance is usually computed between independent variables.
Using the tricks explored last week.
Code
Example output from Test. Make you print out rows1,2 and, for each, print the row closest and furthest away.