Singular file finder

Moderator: jsachs

tomczak
Posts: 1372
Joined: April 25th, 2009, 12:56 am
What is the make/model of your primary camera?: Fuji X-E2

Singular file finder

Post by tomczak »

There are several duplicate file finders that find files with the same hash across directories. I can't figure out how to do the opposite: find singular files.

I have two drives with lots of images on each, but they are arranged in different folder structures, renamed, and likely have different dates. How do I find images that exist on one drive but not on the other? Cheers!
Maciej Tomczak
Phototramp.com
Bob Walker
Posts: 78
Joined: April 25th, 2009, 9:08 am
What is the make/model of your primary camera?: Canon R5
Location: Los Alamos, New Mexico

Re: Singular file finder

Post by Bob Walker »

Maciej,

It sounds like you are looking for a file synchronization tool of some sort. I use something called "Folderclone". As a preliminary step to synchronizing folders, it finds files that exist in one directory but not in another (matching by file name only). There are lots of other synchronization tools, like Microsoft's SyncToy, but I doubt they will all do what you want.
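To show why a name-based tool falls short here, below is a rough sketch of the comparison such sync tools make before copying, assuming Python 3.9+. The helper names are hypothetical and this is not Folderclone's or SyncToy's actual logic. It only matches identical relative paths, so a renamed or moved file counts as "missing" even when its content is on the other drive:

```python
from pathlib import Path

# Hypothetical helpers for illustration; not part of any tool named in this thread.


def _relative_files(root: Path) -> set[str]:
    """All regular files under root, as forward-slash paths relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def names_only_in_first(dir_a: str, dir_b: str) -> list[str]:
    """Relative paths that exist under dir_a but not under dir_b.

    Matching is by folder and file name only; file contents are never read.
    """
    return sorted(_relative_files(Path(dir_a)) - _relative_files(Path(dir_b)))
```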

If you want to chase any of these tools down, just use a search engine.

Now, for renamed files with different modification times, it gets tougher. Are the file contents identical? You can set the timestamp to the EXIF time, which I often like to do even for my post-processed images. Of course, if you have post-processed the file, it will not have the same CRC/hash as an original, so I am not sure what you mean here about hashes.

If the file contents are identical and only the names and timestamps differ, you can use something like QuickSFV, which will calculate a CRC or MD5 hash for each file and write the results to a text file. Then you could compare the hash lists with something like the Unix diff utility to find same/different files.
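For anyone without QuickSFV, here is a minimal Python sketch of the same idea: hash every file under a folder and write one line per file, sorted by hash. The `hash *relative/path` line layout imitates the common md5sum convention, and the function names are made up for illustration:

```python
import hashlib
from pathlib import Path

# Hypothetical helpers for illustration; output layout mimics md5sum ("hash *name").


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large RAW files are not read into memory at once."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_md5_manifest(root: str, out_path: str) -> int:
    """Write one 'hash *relative/path' line per file under root, sorted by hash.

    Returns the number of files hashed. The manifest itself is skipped if it
    already exists inside root.
    """
    root_path = Path(root)
    out = Path(out_path).resolve()
    lines = []
    for p in root_path.rglob("*"):
        if p.is_file() and p.resolve() != out:
            rel = p.relative_to(root_path).as_posix()
            lines.append(f"{md5_of_file(p)} *{rel}")
    lines.sort()  # the hash comes first on each line, so this sorts by checksum
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Run it once per drive (for example `write_md5_manifest("D:/", "d_drive.md5")`, paths hypothetical). Because the names still differ between drives, a plain line-by-line diff will flag almost every line; the comparison only becomes useful when it looks at the hash column alone.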

The photo database program IMatch used to have a function that would look for similar images across directories, but I never tried that.

Are any of these ideas in the right direction?

Bob W
ksinkel
Posts: 594
Joined: April 2nd, 2009, 11:58 am

Re: Singular file finder

Post by ksinkel »

You can use Picture Window's workflow window for identifying duplicate files located in different folders.

To do so:

1. Open a new workflow window. (Click on the wave button in the toolbar.)
2. Use the Add Files button to add images from as many folders as you like.
3. Sort the images by create date. This will place the same images next to each other, even if they have been subsequently modified.

Now you can open duplicates for detailed inspection. You can also move them to other folders or delete unwanted duplicates directly from the workflow window.

Kiril
Kiril Sinkel
Digital Light & Color
tomczak
Posts: 1372
Joined: April 25th, 2009, 12:56 am
What is the make/model of your primary camera?: Fuji X-E2

Re: Singular file finder

Post by tomczak »

Thanks a lot Bob,

What happened in this case is that I made a compilation of images from several different sources. In the process, the names (and sometimes the dates) and the directory structure were changed. The file content remains the same (hence I mentioned CRC or hashes).

I want to delete the sources, but before I do, I want to double-check that I didn't miss anything when making this compilation. There are lots of files, and doing it by hand is hopeless.

I know how to find duplicates by content regardless of directory structure. I also know how to check whether the contents of two drives are the same, provided that the directories on both drives match, but they don't. What I was hoping for was something similar to finding duplicates by content (which can ignore directory structure), except that rather than listing the duplicates, it would list files that exist in one copy only (this way I could tell what I have missed). I can't figure out how to do it gracefully... or perhaps there is another way of doing it?

Cheers and thanks.
Maciej Tomczak
Phototramp.com
tomczak
Posts: 1372
Joined: April 25th, 2009, 12:56 am
What is the make/model of your primary camera?: Fuji X-E2

Re: Singular file finder

Post by tomczak »

Thanks Kiril,

It's a long story, but the dates have been modified in the process (though the EXIF dates should be the same). The problem is that without some automation I'm more or less doomed: there are some 20,000 files to compare...

Cheers.
Maciej Tomczak
Phototramp.com
ksinkel
Posts: 594
Joined: April 2nd, 2009, 11:58 am

Re: Singular file finder

Post by ksinkel »

The create date uses the EXIF date unless the image does not have one. Only in that case is the Windows creation date used.

Kiril
Kiril Sinkel
Digital Light & Color
Bob Walker
Posts: 78
Joined: April 25th, 2009, 9:08 am
What is the make/model of your primary camera?: Canon R5
Location: Los Alamos, New Mexico

Re: Singular file finder

Post by Bob Walker »

Maciej,

When you edit an image, PWP (and everyone else) will change the timestamp that Windows normally uses (i.e., the modification date).

If you want to reset the timestamp to the EXIF date, you can use BreezeBrowser (costs $$), or Exifer (free).

You could also use BreezeBrowser or IrfanView to batch-rename files to %yymmddhhmmss-%filename, where %yymmddhhmmss is the EXIF year-month-day-hours-minutes-seconds, so that files with the same EXIF time end up next to each other in an ordinary directory listing. That might help: you could just look through your 20,000 entries for singletons. Unix utilities like "sort" (together with "uniq -c") could then automate that, giving you a count of how many times each %yymmddhhmmss prefix is repeated so you can scan for singletons. I don't know if you want to install Unix-like utilities on your machine, or how familiar you are with them.
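The counting step could also be scripted without Unix tools. Below is a minimal Python sketch, assuming the files from both drives have already been renamed with a 12-character `yymmddhhmmss` prefix and their names collected into one list. The helper names and sample file names are hypothetical; the only format taken as given is EXIF's `YYYY:MM:DD HH:MM:SS` date string:

```python
from collections import Counter
from datetime import datetime

# Hypothetical helpers for illustration; sample file names below are made up.


def exif_prefix(exif_datetime: str) -> str:
    """Turn an EXIF date such as '2009:04:25 12:56:00' into 'yymmddhhmmss'."""
    taken = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
    return taken.strftime("%y%m%d%H%M%S")


def singleton_files(names: list[str], prefix_len: int = 12) -> list[str]:
    """Names whose timestamp prefix occurs exactly once across the list."""
    counts = Counter(name[:prefix_len] for name in names)
    return sorted(name for name in names if counts[name[:prefix_len]] == 1)


names = [
    exif_prefix("2009:04:25 12:56:00") + "-IMG_0001.CR2",  # on the source drive
    exif_prefix("2009:04:25 12:56:00") + "-trip_001.CR2",  # same shot, renamed copy
    exif_prefix("2009:04:25 13:01:12") + "-IMG_0002.CR2",  # no counterpart
]
print(singleton_files(names))  # ['090425130112-IMG_0002.CR2']
```

One caveat with matching on time alone: burst shots taken within the same second share a prefix, so a missing frame from a burst can hide behind its neighbour.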

Bob
keithrj
Posts: 71
Joined: April 27th, 2009, 7:35 pm
What is the make/model of your primary camera?: Canon 40D
Location: Perth, Australia

Re: Singular file finder

Post by keithrj »

IMatch will do what you want but you do have to buy it. IMatch is a great tool and hugely powerful but with that it can be a little difficult to use. You can also write scripts to do anything you like.

The matching facility (hence the name) can be used to find duplicates across your whole directory structure as well as offline media (as it keeps a database of thumbnails) and can even be set to use a specific tolerance for matching. In other words you can search for pictures that are 'almost' the same - cropped pictures or different sizes of the same picture.

IMatch can be downloaded here: www.photools.com

If you need more info or assistance let me know.
tomczak
Posts: 1372
Joined: April 25th, 2009, 12:56 am
What is the make/model of your primary camera?: Fuji X-E2

Re: Singular file finder

Post by tomczak »

Thank you,

The RAW files in the sources and on the destination drive have the same content (but different names, dates and folders), so I was thinking of using a duplicate finder (I use either Duplicate Finder or Nir Sofer's FindMyFiles; both are small, free, portable and excellent), but it turns out that this is not what I need. With duplicates I can tell which files were copied to the destination drive (and even how many of them), but how do I isolate those that were not duplicated?

It's quite easy to inspect by hand with a few files, and it's not that difficult if the source and destination directory trees are the same (most synchronization software can do that). But with a large number of files pooled from several different drives and folders into one drive with a new directory structure, how can I find what I have left behind? It was a fairly complex manual job, and I'm sure I missed some files in the process.

I called it the opposite to the duplicates finding because I'm trying to find the files that were not duplicated (i.e. exist in the source drives/folders but not in the destination).
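The "opposite of a duplicate finder" can be expressed as a set difference on content hashes: hash everything on the destination, then report every source file whose hash is not in that set. Below is a minimal Python sketch, assuming all drives are mounted and readable. `find_singular` and the example paths are hypothetical:

```python
import hashlib
from pathlib import Path

# Hypothetical helpers for illustration; not part of any tool named in this thread.


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large RAW files are not read into memory at once."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_singular(source_roots: list[str], dest_root: str) -> list[str]:
    """Source files whose content (MD5) appears nowhere under dest_root.

    Names, dates and folder layout are ignored; only file contents count.
    """
    dest_hashes = {md5_of_file(p) for p in Path(dest_root).rglob("*") if p.is_file()}
    missing = []
    for root in source_roots:
        for p in sorted(Path(root).rglob("*")):
            if p.is_file() and md5_of_file(p) not in dest_hashes:
                missing.append(str(p))
    return missing
```

Usage would look like `find_singular(["E:/old_disk1", "F:/old_disk2"], "G:/compilation")` (paths hypothetical). Only the destination's hashes are kept in memory, which is small even for 20,000 files; the time goes into reading every file once.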

Cheers!
Maciej Tomczak
Phototramp.com
tomczak
Posts: 1372
Joined: April 25th, 2009, 12:56 am
What is the make/model of your primary camera?: Fuji X-E2

Re: Singular file finder

Post by tomczak »

I think I got it. Christian Ghisler, the author of Total Commander, told me how to do it.

This is indeed difficult. I would do it like this:
1. Create an MD5 checksum file for each disk via Files - Create checksums
2. Sort the two files with the command
sort list.md5 > sorted.md5
This sorts them by checksum!
3. Use Files - Compare by content to compare the two

I was tempted to use CRC32 instead since it should be faster, but using MD5 is important: the .md5 file lists one hash per line, with the hash value at the start of the line, so it can be sorted properly (a CRC32 .sfv file starts each line with the file name instead). The checksum file can cover files across the whole directory tree. The process takes a while (it hashes 3-4 CR2 files per second on my computer), but it's easy and it works.
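The final comparison could also be done on the hash column alone, so the differing file names don't show up as differences. Below is a sketch assuming md5sum-style lines (`hash *name` or `hash  name`). The exact layout and encoding Total Commander writes may differ, and `read_md5_list` and `only_in_first` are hypothetical names:

```python
from collections import defaultdict

# Hypothetical helpers for illustration; assumes md5sum-style checksum lines.


def read_md5_list(path: str, encoding: str = "utf-8") -> dict[str, list[str]]:
    """Map each hash to the file names listed with it in a checksum file."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    with open(path, encoding=encoding) as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line or line.startswith((";", "#")):
                continue  # skip blank and comment lines
            digest, _, rest = line.partition(" ")
            # "hash *name" (binary mode) or "hash  name" (text mode)
            name = rest[1:] if rest[:1] in ("*", " ") else rest
            by_hash[digest.lower()].append(name)
    return dict(by_hash)


def only_in_first(list_a: str, list_b: str) -> list[str]:
    """File names from list_a whose hash never appears in list_b."""
    a, b = read_md5_list(list_a), read_md5_list(list_b)
    return sorted(
        name for digest, names in a.items() if digest not in b for name in names
    )
```

Calling `only_in_first("source.md5", "dest.md5")` would list the source files that never made it into the compilation, regardless of how they were renamed or moved.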
Maciej Tomczak
Phototramp.com