Hi All,
I'm trying compare two files that in theory are exactly the same, but live in two different folders. Out of interest they are Google search pages. The files and folders are generated by another script and are:
Test_Folder/pies.html and
GM_HTML_FILES/pies.htmlIt seems that I can check there are two file names the same using:
a = cmpfiles("Test_Folder", "Test_Folder2", "cinema.html")
print aSo, looking at the
difflib and trawling -
http://www.thescripts.com I came up with this combo, that doesn't work due to the error below. Naturally, I've gotten into the habit of visiting
http://www.python.org/doc/2.4.2/lib/module-difflib.html for example,
clear as mud as usual#define the SEARCH TERMS we'll be using
searchTerms = ['pies', 'cars', 'boats']
urlIterate = 0
# define the 10 files we want to create
testFile = [(searchTerms[0]+'.html'), (searchTerms[1]+'.html'), (searchTerms[2]+'.html')]
testFileNumber = 0
gmFile = testFile
gmFileNumber = testFileNumber
while testFileNumber < 1:
a = open('Test_Folder/'+testFile[testFileNumber], 'r')
b = open('GM_HTML_Files/' + gmFile[gmFileNumber], 'r')
d = difflib.Differ()
for line in d.compare(a,b):
print line
a.close()
b.close()
testFileNumber += 1
urlIterate += 1
the good news is it DOES find the files but then spits this error code
Traceback (most recent call last):
File "C:\Python25\my_pythoning\taptu\file_compare.py", line 28, in <module>
for line in d.compare(a,b):
File "C:\Python25\lib\difflib.py", line 906, in compare
cruncher = SequenceMatcher(self.linejunk, a, b)
File "C:\Python25\lib\difflib.py", line 211, in __init__
self.set_seqs(a, b)
File "C:\Python25\lib\difflib.py", line 223, in set_seqs
self.set_seq2(b)
File "C:\Python25\lib\difflib.py", line 277, in set_seq2
self.__chain_b()
File "C:\Python25\lib\difflib.py", line 308, in __chain_b
n = len(b)
TypeError: object of type 'file' has no len()
Sorry! Complex one today... ANY pointers of diffing and reporting on two html files would be welcome!
