Code Point
September 08, 2010, 08:24:15 AM *
Topic: Compare (diff) two files - cmpfile and difflib?
MarkCrowther
« on: March 23, 2007, 01:19:25 PM »

Hi All,

I'm trying to compare two files that, in theory, are exactly the same but live in two different folders. Out of interest, they are Google search pages. The files and folders are generated by another script and are: Test_Folder/pies.html and GM_HTML_FILES/pies.html

It seems that I can check whether two files with the same name are identical using:
Code
a = cmpfiles("Test_Folder", "Test_Folder2", "cinema.html")
print a

So, after looking at difflib and trawling http://www.thescripts.com, I came up with this combination, which doesn't work because of the error below. Naturally, I've gotten into the habit of visiting http://www.python.org/doc/2.4.2/lib/module-difflib.html, for example, but it's as clear as mud as usual.

Code
import difflib

# define the search terms we'll be using
searchTerms = ['pies', 'cars', 'boats']
urlIterate = 0

# build the file names to compare (one per search term)
testFile = [(searchTerms[0]+'.html'), (searchTerms[1]+'.html'), (searchTerms[2]+'.html')]
testFileNumber = 0
 
gmFile = testFile
gmFileNumber = testFileNumber
 
while testFileNumber < 1:
   a = open('Test_Folder/'+testFile[testFileNumber], 'r')
   b = open('GM_HTML_Files/' + gmFile[gmFileNumber], 'r')
 
 
   d = difflib.Differ()
   for line in d.compare(a,b):
       print line
 
   a.close()
   b.close()
 
   testFileNumber += 1
   urlIterate += 1

The good news is that it DOES find the files, but then it spits out this error:
Code
Traceback (most recent call last):
 File "C:\Python25\my_pythoning\taptu\file_compare.py", line 28, in <module>
   for line in d.compare(a,b):
 File "C:\Python25\lib\difflib.py", line 906, in compare
   cruncher = SequenceMatcher(self.linejunk, a, b)
 File "C:\Python25\lib\difflib.py", line 211, in __init__
   self.set_seqs(a, b)
 File "C:\Python25\lib\difflib.py", line 223, in set_seqs
   self.set_seq2(b)
 File "C:\Python25\lib\difflib.py", line 277, in set_seq2
   self.__chain_b()
 File "C:\Python25\lib\difflib.py", line 308, in __chain_b
   n = len(b)
TypeError: object of type 'file' has no len()

Sorry! Complex one today... ANY pointers on diffing two HTML files and reporting the differences would be welcome! :)

When I grow up I'm going to be an idiot
Alter Lobo
Global Moderator
« Reply #1 on: March 23, 2007, 02:36:23 PM »

It seems that I can check there are two file names the same using:
Code
a = cmpfiles("Test_Folder", "Test_Folder2", "cinema.html")
print a

Not exactly like that. The third argument must be a list of files to be compared:
Code
>>> import filecmp 
>>> filecmp.cmpfiles('./', '../', ['07data.zip', 'commitedcfg.cfg', 'non_existent_file'], shallow=False)
(['07data.zip'], ['commitedcfg.cfg'], ['non_existent_file'])
 

The returned tuple contains three lists. The first is a list of the files that matched, the second is a list of the files that did not match, and the third is a list of the files that could not be compared for some reason; in this case, the file didn't exist.
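Here is a self-contained sketch of the same call in Python 3 syntax. The two temporary directories and the file names match.txt, differ.txt and missing.txt are hypothetical and exist only for the demo. Passing shallow=False makes cmpfiles() compare the file contents; with the default shallow=True, files whose os.stat() signatures (type, size and modification time) agree are treated as equal without being read.

```python
import filecmp
import os
import tempfile


def make_sample_dirs():
    """Create two temporary directories holding hypothetical sample files.

    match.txt is identical in both, differ.txt has the same size but
    different content, and missing.txt is never created.
    """
    dir_a = tempfile.mkdtemp()
    dir_b = tempfile.mkdtemp()
    for directory, text in ((dir_a, "old\n"), (dir_b, "new\n")):
        with open(os.path.join(directory, "match.txt"), "w") as f:
            f.write("same content\n")
        with open(os.path.join(directory, "differ.txt"), "w") as f:
            f.write(text)
    return dir_a, dir_b


dir_a, dir_b = make_sample_dirs()

# The third argument is a list of names looked up in BOTH directories.
match, mismatch, errors = filecmp.cmpfiles(
    dir_a, dir_b, ["match.txt", "differ.txt", "missing.txt"], shallow=False
)
print(match)     # ['match.txt']
print(mismatch)  # ['differ.txt']
print(errors)    # ['missing.txt']
```

Because shallow=False is passed, differ.txt is reported as a mismatch even though both copies have the same size.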

The compare() method of the Differ class takes sequences of lines as parameters, not file objects (note the len() call in the traceback). I didn't test this, but try it like this:

Code
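   # Differ.compare() needs sequences of lines, so read each file into a list.
   # a and b are now lists, so remove the a.close() and b.close() calls from the loop.
   a = open('Test_Folder/' + testFile[testFileNumber], 'r').readlines()
   b = open('GM_HTML_Files/' + gmFile[gmFileNumber], 'r').readlines()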
 
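Putting the readlines() fix together, here is a minimal sketch in Python 3 syntax that diffs two files with difflib.Differ and keeps only the changed lines, which is one simple way to report on the differences. The helper name changed_lines() and the two temporary HTML files are hypothetical stand-ins for Test_Folder/pies.html and GM_HTML_Files/pies.html.

```python
import difflib
import os
import tempfile


def changed_lines(path_a, path_b):
    """Diff two text files and return only the removed/added lines.

    Differ.compare() needs sequences of lines (it calls len() on them),
    so each file is read into a list with readlines() first; passing the
    open file objects directly raises a TypeError like the one in the
    traceback above.
    """
    with open(path_a) as fa:
        a_lines = fa.readlines()
    with open(path_b) as fb:
        b_lines = fb.readlines()
    result = difflib.Differ().compare(a_lines, b_lines)
    # Differ prefixes each line with '  ' (same), '- ', '+ ' or '? ' (hints).
    return [line for line in result if line.startswith(("- ", "+ "))]


# Hypothetical stand-ins for the two pies.html files.
work = tempfile.mkdtemp()
old_path = os.path.join(work, "old_pies.html")
new_path = os.path.join(work, "new_pies.html")
with open(old_path, "w") as f:
    f.write("<title>pies</title>\n<p>a</p>\n")
with open(new_path, "w") as f:
    f.write("<title>pies</title>\n<p>b</p>\n")

for line in changed_lines(old_path, new_path):
    print(line, end="")
```

Differ also emits '? ' hint lines that mark the changed characters; the filter above drops them. For a browsable report, difflib.HtmlDiff().make_file(a_lines, b_lines) returns a complete HTML page with a side-by-side comparison, and difflib.unified_diff() gives the compact format used by patch tools.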
MarkCrowther
« Reply #2 on: March 24, 2007, 03:05:22 AM »

Hiya,

Thanks for that; adding .readlines() etc. worked. I just need to play with the output now :)

I tried:
Code
import filecmp
filecmp.cmpfiles('../', '../', ['file1.txt', 'file2.txt', 'file3.txt'], shallow=False)(['file1.txt'], ['file2.txt'], ['file3.txt'])
 

but I'm getting TypeError: 'tuple' object is not callable

God... is it always this much of a pain in the a**e?
Alter Lobo
Global Moderator
« Reply #3 on: March 24, 2007, 11:37:41 AM »

I had a try of
Code
import filecmp
filecmp.cmpfiles('../', '../', ['file1.txt', 'file2.txt', 'file3.txt'], shallow=False)(['file1.txt'], ['file2.txt'], ['file3.txt'])
 
but am getting TypeError: 'tuple' object is not callable

That last tuple is the output of the command, not part of the input:

Code
import filecmp
filecmp.cmpfiles('../', '../', ['file1.txt', 'file2.txt', 'file3.txt'], shallow=False)
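To see where the error comes from, here is a short sketch in Python 3 syntax (the temporary folder and file1.txt are hypothetical, and file2.txt deliberately does not exist). cmpfiles() returns a plain tuple, which can be unpacked into its three lists but not called, so writing another (...) after the call raises the TypeError.

```python
import filecmp
import os
import tempfile

# Hypothetical folder containing a single file, compared against itself.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "file1.txt"), "w") as f:
    f.write("hello\n")

result = filecmp.cmpfiles(folder, folder, ["file1.txt", "file2.txt"], shallow=False)
print(type(result).__name__)  # tuple

# Unpack the three lists instead of writing another (...) after the call.
match, mismatch, errors = result
print(match, mismatch, errors)  # ['file1.txt'] [] ['file2.txt']

# Appending a parenthesised argument list tries to *call* the tuple.
try:
    result(["file1.txt"])
except TypeError as exc:
    print(exc)  # 'tuple' object is not callable
```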
 

God... is it always this much of a pain in the a**e?

I think you are trying things that are too advanced. If you have never programmed before, you have gone very far very quickly. Just step back and go a little more slowly.

On a side note, a programmer will never have an easy life, even after many years of experience. Programming is solving problems on a daily basis. Sometimes they are easy; sometimes they drive you nuts. It is just a matter of developing a resistance to frustration, and there are no good programmers without that. The bright side is that a job well done and appreciated by clients gives you a great feeling of accomplishment, and maybe a good salary.
MarkCrowther
« Reply #4 on: March 25, 2007, 09:08:44 PM »

Yes, I probably am, but I'm *this* close :P

Once I'm done with this little project, I'll head back to doing tutorials and reading Beginning Python. :D