What is De-Duplication?
De-duplication, or “de-duping,” is a way to clean up your files by finding and removing duplicates. Over time, computers and storage drives collect duplicate files—whether from downloads, backups, or shared folders. These duplicates take up space and make it harder to find what you are looking for. De-duping removes duplicate files and thus frees up storage space, speeds up searches and backups, and reduces clutter.
Duplicate files come in different types, and understanding them is key to choosing the right approach.
1. Exact duplicates (file-level duplicates)
These are the easiest to understand—they’re identical files: duplicate content, metadata, everything. You might end up with these when people copy, forward, or download the same document multiple times.
- How to spot them: Exact duplicates have the same digital fingerprint, called a hash value (e.g., MD5, SHA-1). If two files have the same hash, they are, for all practical purposes, exact duplicates.
2. Near duplicates (document-level duplicates)
These are files that are similar but not completely identical. Someone may have made a small edit, added a comment, or saved the document in a different format.
- How to spot them: Near duplicates are found using content comparison tools that give you a similarity score (e.g., 90% match).
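To make the idea of a similarity score concrete, here is a minimal sketch using Python's standard difflib module. It is only an illustration of the concept, not how any particular comparison tool works; the function names and the 0.9 threshold are assumptions chosen to mirror the "90% match" example above.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a similarity score between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Treat two texts as near duplicates when the score meets the threshold.

    The 0.9 default is an assumed cut-off, not a standard value.
    """
    return similarity(text_a, text_b) >= threshold


if __name__ == "__main__":
    original = "The quarterly report is due on Friday."
    edited = "The quarterly report is due on Monday."
    print(f"Similarity: {similarity(original, edited):.0%}")
    print("Near duplicate:", is_near_duplicate(original, edited))
```

Real near-duplicate tools usually compare extracted document text and use faster techniques for large collections, so treat this as a way to understand the score rather than a replacement for them.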
In this blog post, we will look at only two free programs that eliminate exact duplicates: dupeGuru and DupScout.
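Before turning to the tools, it may help to see what hash-based detection of exact duplicates looks like in practice. The sketch below is a rough illustration, not how dupeGuru or DupScout work internally. It uses SHA-256 instead of the MD5 or SHA-1 mentioned earlier, and the function names are made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents hash to the same value."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_hash[file_hash(path)].append(path)
    # Only hashes shared by two or more files indicate duplicates.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_exact_duplicates(Path(".")):
        print("Duplicate group:")
        for path in group:
            print("  ", path)
```

Note that the hash is computed from file contents only, so two files with different names or dates still land in the same group if their contents match. This sketch only reports duplicates; it does not delete anything.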
dupeGuru (Free): https://dupeguru.voltaicideas.net/
dupeGuru is an easy-to-use tool that helps find duplicate files based on name or content. It works on Windows, macOS, and Linux and can:
✔ Find duplicates even if the names are slightly different
✔ Scan file content for duplicates
✔ Work with network drives
✔ Export results as an HTML or CSV file
Limitations:
✘ Cannot search by file type
✘ No detailed reports or charts
dupeGuru finds both identical and similar filenames. It can recognize if a file has been renamed or if the file extension was changed. It groups the files and lets you select “Dupes Only,” which can then be marked for deletion, moved, or copied to another location. You can also rename a file from within the Results tab.

dupeGuru will also search across network drives.
One limitation is that it does not allow searching by file type. It does offer three application modes (Standard, Music, or Picture) and allows filtering by file size. dupeGuru will also find email messages and files (msg, pst, and mbox).

Although it does not have reporting or fancy charts, it does have the ability to export your search results as an HTML or CSV file.

DupScout (std free version): https://www.dupscout.com/index.html
DupScout is another tool that works on Windows. It can:
✔ Search for duplicates across drives, servers, and networks
✔ Replace duplicates with shortcuts or move them
✔ Create detailed reports with charts
✔ Sort duplicates by file type, size, or creation date
Limitations:
✘ The free version is limited to 2 TB of storage
✘ Advanced reporting is only available in the paid version
✘ No Mac support
DupScout is a duplicate file search and removal tool that lets you search for duplicate files on disks, directories, server drives, and other Network-Attached Storage (NAS) devices. It will search for audio, images, or documents, and will find email messages and files (msg, pst, and mbox). It can recognize if a file has been renamed or if the extension was changed. The free version limits the total disk space to 2 TB across all connected drives and/or devices, and the Reports menu feature is only available in the Pro version. There is no macOS version available.
The interface is user-friendly. Under the Actions menu, you can replace duplicate files with shortcuts or hard links, move duplicate files to another directory, compress and move duplicates, or delete all duplicate files.
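If you are wondering what "replacing a duplicate with a hard link" means, the sketch below shows the general idea in Python. It is an assumption-laden illustration and not DupScout's actual implementation: the function name is hypothetical, and hard links only work when both files are on the same volume.

```python
import filecmp
import os
from pathlib import Path


def replace_with_hard_link(keep: Path, duplicate: Path) -> None:
    """Replace `duplicate` with a hard link pointing at the data of `keep`.

    Both paths must be on the same volume. The contents are compared
    byte for byte first, so a mismatched pair is never replaced.
    """
    if not filecmp.cmp(keep, duplicate, shallow=False):
        raise ValueError(f"{duplicate} is not identical to {keep}")
    # Hypothetical temporary name used only during the swap.
    temp = duplicate.with_name(duplicate.name + ".dedup-tmp")
    os.link(keep, temp)          # create the link under a temporary name
    os.replace(temp, duplicate)  # then swap it over the duplicate
```

After the call, both names still appear in their folders, but they share a single copy of the data on disk. Because this changes files in place, try it on test copies first.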

Using DupScout, you can save search results to HTML, PDF, Excel, text, CSV, and XML reports. You can create rules to find files by file extension, file size, file creation date, and so on. DupScout also provides several types of statistical pie charts and timeline charts that show the amount of duplicate disk space and the number of duplicate files per directory, file extension, file type, file size, file owner, and creation, modification, and last access time.
To open the charts dialog, press the ‘Charts’ button located on the main toolbar and select an appropriate chart type.


You can save the pie chart information in a variety of formats, as shown above. You can display the results as duplicate disk space per file type or as the number of duplicates per file type. You can also toggle between a pie chart and a bar chart.
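For readers curious about what a "duplicate disk space per file type" figure represents, here is a small Python sketch of the calculation. It is not taken from DupScout. The function name is hypothetical, and it assumes you already have groups of identical files and plan to keep the first file in each group.

```python
from collections import Counter
from pathlib import Path


def duplicate_space_by_extension(groups: list[list[Path]]) -> Counter:
    """Sum the bytes used by redundant copies, keyed by file extension."""
    wasted = Counter()
    for group in groups:
        # The first file in each group is treated as the copy to keep.
        for extra in group[1:]:
            extension = extra.suffix.lower() or "(none)"
            wasted[extension] += extra.stat().st_size
    return wasted
```

Calling `duplicate_space_by_extension(groups).most_common()` lists the extensions that waste the most space first, which is essentially the data behind a chart of this kind.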
In Summary:
Both tools are great for cleaning up your files and keeping your system organized. If you have any questions or are interested in trying this for yourself, feel free to reach out to Nelson_Garcia@fd.org.