FIND MOST FRAGMENTED FILES – mostfragged – using filefrag to analyze file fragmentation

The script you're looking for is at the bottom; the rest is explanation.

UPDATES TO MOSTFRAGGED SCRIPT

UPDATE 5-14-2015: added a ReadyNAS version of the script at the very bottom. Also added a fragmentation percentage to the STATS files & run output. Check out my own personal NAS's fragmentation being plotted every day; I did that using these bash and python scripts (read through the article and check out “Example 2 plotting fragmentation”). Also uploaded the scripts to /scripts so that you can “wget” them: mostfragged (use on any server that has the filefrag command – analyzes every file) and mostfragged-for-readynas-os6 (use on ReadyNAS OS6 devices – analyzes every file but skips snapshots)

UPDATE 8-21-2014: added a way for the script to skip analyzing snapshot folders (just read the “Pick a find” comment section in the script)

UPDATE 8-20-2014: works with files that have spaces in their names; all results are now dumped to the same folder & usability is better

UPDATE 5-27-2014: the output filenames of the script are better now (they make more sense)

UPDATE 10-9-2014: verbose and silent find & less bogus error output. Also added all sorts of new outputs and better output comments, plus the ability to dump to a different directory with a 2nd argument.

UPDATE 10-10-2014: cleared up outputs and added a FOOTER

REQUIREMENTS FOR SCRIPT: filefrag, awk, sed, echo, cat (the typical), readlink

Each file can be fragmented. A non-fragmented file has 1 extent: just the 1 extent that contains the whole file. An extent is a start and stop location on the storage media where the filesystem lives; everything between the start and stop location is data that belongs to the file.

Some filesystems fragment more than others, like the COW (copy-on-write) filesystems, which tend not to overwrite old data, so new data ends up all over the place. Examples: BTRFS and ZFS. With BTRFS you can give files that you want to stay non-fragmented the NODATACOW attribute (check out this article: NODATACOW BTRFS). This is recommended for files that require fast IO (like vmdks and other VM disk files).
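
As a quick illustration (assuming a BTRFS filesystem and the chattr/lsattr tools; the directory path is just an example), NODATACOW can be set on a directory so that files created inside it afterwards are not copy-on-write:

# mark a directory NODATACOW so files created inside it later are not COW
# (existing files are not converted by this; illustration only)
mkdir -p /data/vm-images
chattr +C /data/vm-images
lsattr -d /data/vm-images    # the 'C' (No_COW) attribute should now show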

NOTE: If you have a BTRFS filesystem with a lot of fragmentation in a folder and you need to defragment it, try uncow.py (here is my script that runs uncow.py for every file & folder in the current dir, fixing fragmentation: UNCOW). This is like a makeshift defragmenter that works file by file (a real defragmenter would work block by block, but this works file by file).

==========
EXAMPLE
==========

To understand extents, let's look at this kitty.txt file.

For example, here is the file:
# cat kitty.txt
This cat of mine likes soup

# filefrag kitty.txt

Let's say it was 1 extent.
It could be like this:
* First and only extent: This cat of mine likes soup
The first and only extent could be at the end of the disk (or wherever). The contents of the entire file would be read from about the same location, so theoretically the drive head could pick up all the data in one fell swoop.

# filefrag kitty.txt

Let's say it was 2 extents.
It could be like this:
* First extent: This cat of
* Second extent: mine likes soup
The first extent could be at the end of the disk, and the second extent at the front of the disk. This will cause the disk head to go back and forth.

Thus the kitty.txt file with 2 extents is the more fragmented one, and in general:

More fragmented = more extents

Fragmentation density measures how many fragments (extents) a file has per megabyte on average. That's what the scripts below compute.
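
In other words (a sketch of the arithmetic, not the script's exact code): extents per megabyte = extents / (bytes / 1048576). For example, a hypothetical 8 MiB file with 16 extents:

bytes=8388608
extents=16
awk -v b="$bytes" -v e="$extents" 'BEGIN{ printf "%.2f extents per MiB\n", e/(b/1048576) }'
# prints: 2.00 extents per MiB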

Let's find out which files have the most extents per megabyte (and are therefore the most fragmented):

====================
INTERACTIVE EXAMPLE
====================

Unfortunately this interactive example doesn't work well on files with spaces in the name. However, the script at the bottom works great with files that have spaces (the issue was fixed in the 8-21-2014 update).
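
The interactive approach is roughly the following (a sketch only, not the exact commands from the original example; the unquoted for loop is exactly why filenames with spaces break, as noted above):

cd /data                        # folder to analyze
for f in $(find . -type f); do
    echo "$(stat -c %s "$f") $(filefrag "$f" | awk '{print $(NF-2)}') $f"
done | sort -n -k2 | tail -20   # the 20 files with the most extents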

=====================================
ONE-LINER FRAGMENTATION ANALYZERS
=====================================

These are taken from the script below; I just connected some of the commands with pipes. There was no point in tying the script below together with pipes, because you would have to wait for sort to complete before seeing any output. Since we are not sorting here, we get a live line-by-line update.

Just set the PATH1 variable below to the folder you want to analyze.

NOTE: Pick one of the output sections below (to show different results), then pick one of the 3 find variations of the one-liner (edit the custom one) to control which files are read (which directories are skipped from scanning). This happens with a modified find command which is set to skip certain phrases in the pathname (such as /snapshot/). Why 3 different find variations (because the name of your snapshots could be different)? A default one that doesn't skip anything, a ReadyNAS one that's custom made for OS6, and a custom one in case you want to use it on your own system (most likely you don't follow the same snapshot naming as the ReadyNAS). For more info on the different find command variations, read the comment section in the script at the bottom of the page; the comment section is titled “Pick your find”.

FILEFRAG ORGANIZED BETTER, order of output (per line): bytes, extents, filename
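
A sketch of what this one-liner can look like (the original may differ slightly; /data and the exact awk are assumptions):

PATH1=/data
find "$PATH1" -type f -print0 | while IFS= read -r -d '' f; do
    echo "$(stat -c %s "$f") $(filefrag "$f" 2>/dev/null | awk '{print $(NF-2)}') $f"
done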

FILEFRAG WITH EXTENTS/MEGABYTE (binary), order of output (per line): bytes, extents, extents/megabyte(binary), filename
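
Similarly, a sketch that adds extents per mebibyte (again an assumption of the exact awk used):

PATH1=/data
find "$PATH1" -type f -print0 | while IFS= read -r -d '' f; do
    b=$(stat -c %s "$f"); e=$(filefrag "$f" 2>/dev/null | awk '{print $(NF-2)}')
    awk -v b="$b" -v e="$e" -v f="$f" 'BEGIN{ mb=b/1048576; r=(mb>0)?e/mb:0; printf "%d %d %.3f %s\n", b, e, r, f }'
done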

FILEFRAG WITH BYTES/EXTENT, order of output (per line): bytes, extents, bytes/extent, filename
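
And a sketch that adds bytes per extent instead (a bigger number means less dense fragmentation):

PATH1=/data
find "$PATH1" -type f -print0 | while IFS= read -r -d '' f; do
    b=$(stat -c %s "$f"); e=$(filefrag "$f" 2>/dev/null | awk '{print $(NF-2)}')
    awk -v b="$b" -v e="$e" -v f="$f" 'BEGIN{ r=(e>0)?b/e:0; printf "%d %d %.1f %s\n", b, e, r, f }'
done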

 

===================================================
THE SCRIPT – GENERAL – APPLIES TO ALL LINUX SERVERS
===================================================

Here is a script that can analyze the fragmentation in any folder and dump the results into another folder. It classifies fragmented files as any file with more than 1 extent. Please check out the informative comments at the top of the script to see how to run it and to understand its output (it writes several files that can be used for analysis; the most informative output files have “STATS” in the file name).

Note that this script looks through any folder you ask it to look through. It does so with the default find command (which simply enumerates all the files); default because it is a plain find without any filters to skip files or folders. There are 3 more find commands that follow it for specific use cases (with different filters), such as one for ReadyNAS OS 6 (you wouldn't want to enumerate files in snapshot folders, so it filters out the possible snapshot folder names). Likewise, you can use those 4 find commands as templates to make your own. In the end just make sure only 1 find command is active (the others should be commented out). The ReadyNAS OS6 find command is the one that is uncommented in the script in the section below this one.

Here is a summary of the 4 find commands:

  • Default find command: enumerates all files (no filters). This is the find command we use in the mostfragged.sh script; it is uncommented and the 3 below are commented out (as we can only have 1 active find command).
  • Relative ReadyNAS OS6 find command: enumerates all files but skips snapshots by using relative filters. More details in the section below.
  • Absolute ReadyNAS OS6 find command: enumerates all files but skips snapshots by using absolute filters. More details in the section below. This is the find command that we use in the ReadyNAS OS6 mostfragged script in the section below.
  • Custom find command: use this to make your own filter, in case you don't want to enumerate all of the files & want to skip certain files or folders in the target folder. You can use this and the above commands as a guide/template for making your own (see the sketch right after this list). Just don't forget that in the end you can only have 1 active find command.
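
Here is a rough sketch of what those find variations can look like (the paths and the exact snapshot phrases are assumptions; the real commands live in the script's “Pick your find” comment section):

PATH1=/data

# 1. default find: enumerate every file under $PATH1, no filters
find "$PATH1" -type f

# 2/3. ReadyNAS-style find: skip any path that contains a snapshot folder.
#      A relative filter matches the phrase anywhere in the path; an absolute
#      filter anchors it to the volume (e.g. /data/*/snapshot/*).
find "$PATH1" -type f ! -path "*/snapshot/*" ! -path "*/.snapshot/*" ! -path "*/.snapshots/*"

# 4. custom find: change the skipped phrase(s) to match your own snapshot naming
find "$PATH1" -type f ! -path "*/my-snapshot-dir/*"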

So why do we use a find command at all? The find command enumerates the files it is asked to, and the results are then pumped, file by file, into a program called “filefrag”, which outputs the name of each file and its number of extents. We use the number of extents in the calculation of fragmentation.

Make the file mostfragged.sh with this content. You can also download mostfragged.sh like this: wget http://ram.kossboss.com/scripts/mostfragged.sh. Here is the content of the script:
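
If you just want the shape of it, here is a minimal sketch of how such a script can be structured (this is not the full mostfragged.sh; use the wget command above to get the real script with all of its outputs, comments and find variations):

#!/bin/bash
# usage: ./mostfragged.sh /folder/to/analyze [/folder/for/results]
# sketch only: count extents per file with filefrag, write sorted results
# plus a small STATS summary (fragmented = more than 1 extent)

TARGET=$(readlink -f "$1")
OUT=$(readlink -f "${2:-.}")
mkdir -p "$OUT"
RAW="$OUT/raw.txt"
> "$RAW"

# Pick your find: this is the default one (no filters); swap it for a
# filtered find if you need to skip snapshot folders
find "$TARGET" -type f -print0 | while IFS= read -r -d '' f; do
    bytes=$(stat -c %s "$f")
    extents=$(filefrag "$f" 2>/dev/null | awk '{print $(NF-2)}')
    [ -z "$extents" ] && continue
    echo "$bytes $extents $f" >> "$RAW"
done

# sort by extent count, most fragmented files at the bottom
sort -n -k2 "$RAW" > "$OUT/sorted-by-extents.txt"

# STATS: total files, fragmented files (>1 extent), fragmentation percentage
awk '{ total++; if ($2 > 1) frag++ }
     END { pct = (total ? 100 * frag / total : 0);
           printf "files: %d\nfragmented (>1 extent): %d\nfragmentation: %.2f%%\n", total, frag, pct }' "$RAW" > "$OUT/STATS.txt"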

===========================
THE SCRIPT – READYNAS OS6 
===========================

This script will work on all ReadyNAS OS6 units, as it will enumerate all of the files in the specified folder (point it at a share or a volume). It differs from the script above in that the default find command is commented out (the default find command enumerates every file without a filter) and instead a modified ReadyNAS find command is used which enumerates all files except BTRFS snapshot files. In other words, this ReadyNAS OS6 mostfragged variation enumerates all files in the folder but skips snapshot folders (using find filters to skip any possible type of snapshot folder name). Note that there are 2 ReadyNAS find commands, the relative and the absolute. The relative one filters out snapshots based on relative snapshot folder paths; the absolute one filters out snapshots based on absolute snapshot folder paths. We use (and uncomment) the absolute ReadyNAS find command. The relative one is there for my own knowledge (it was a less efficient way of doing it). The problem with the relative one is that if a share or subfolder actually contains a folder with one of the catch words in the filter (such as “snapshot”), that folder is skipped even though it does not contain the BTRFS snapshots that we want skipped.

The only thing different in this script from the script in the section above is that this one comments out the default find command and uncomments the absolute ReadyNAS find command (see the comments). The other find commands remain commented out.
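
To make the relative-versus-absolute distinction concrete, here is a hedged illustration (the snapshot folder names are examples; the real filter phrases in the script may differ):

# relative filter: skips ANY path containing the phrase, even a user's own
# folder that just happens to be named "snapshot"
find /data -type f ! -path "*/snapshot/*"

# absolute filter: only skips the locations where the BTRFS snapshots actually
# live on the volume (e.g. directly under each share)
find /data -type f ! -path "/data/*/snapshot/*"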

You can also download the mostfragged-rnos6.sh script right here, or from linux: wget http://ram.kossboss.com/scripts/mostfragged-rn6.sh. Here is the content of this script:

 

The end.

One thought on “FIND MOST FRAGMENTED FILES – mostfragged – using filefrag to analyze file fragmentation”

  1. Here is how I monitor my fragmentation
    ########################################

    Just made an interesting crontab that runs every 6 hours. It basically runs mostfragged (with max ionice and max nice so as to not take up resources) every 6 hours and saves the output to the /root/docs/frag/ folder. I only keep the STATS files and the files that show the most fragmented files, so I save the files listing 100+ and 1000+ extents as well.

    NOTE: running this will degrade performance and might crash the system, so make sure you have a backup of your data. Also, you will need to edit the scripts to match whatever directory paths you're using (remember to edit mostfragged.sh and “pick the find”, or else the default find might look through your millions of snapshot folders forever).

    Here is the cron entry (this is the root user's crontab): “sudo -i” then “crontab -e” to edit it, “crontab -l” to check it out. This entry runs the script every 6 hours (at 00:30, 06:30, 12:30 and 18:30):

    30 0,6,12,18 * * * /root/scripts/frag/cron.sh

    The crontab entry above runs cron.sh at those intervals.
    And here is cron.sh (which runs mostfragged; note I had to set the PATH variable so that certain commands would run, or else the output files contain bad values like NaN and 0s):

    #!/bin/bash
    VOL=/data/
    D82=$(date +s%s_d%Y-%m-%d_t%H-%M-%S)
    SAVE=/root/docs/frag/$D82
    export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    logger "Starting mostfragged $SAVE"
    /usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /root/scripts/frag/mostfragged.sh $VOL $SAVE
    # delete all output files besides the 4 important files (those matching STATS or plus-SORTED)
    for i in $(find $SAVE -type f | egrep -v "STATS|plus-SORTED"); do
        ls -lisah "$i" >> $SAVE/deleted.txt
        rm -rf "$i"
    done
    logger "Done mostfragged $SAVE"
