Why do disks write in a way that causes fragmentation?
#1
Posted 16 May 2010 - 07:46 AM
I've been watching how quickly and easily disks frag up, and I'm wondering why OSes - all the Windows I've used, anyway, which is up to XP - write to disk in such a way that files immediately begin to fragment. Let's say I have a 1M Word document that is written to disk, with no extra elbow room, and I open it and add one word on the first page. As I understand it, Windows will save the file to the same spot, and place the overflow somewhere else.
Why doesn't it instead write the file to a larger slot that will accept the whole thing, and free up the old space for a smaller file? Doing so would cause more free-space gaps but far less fragmentation. File tables would have to be updated frequently, but I think it would be less than maintaining hundreds of locations for a large file.
This seems to be very common-sense to me, so there must be a good reason why it's not the case, if I understand correctly what's happening.
#2
Posted 16 May 2010 - 08:30 AM
The OS does reserve sectors within blocks for future data. However, once those blocks are filled, it moves on to the next set of available blocks on the disk, and so on.
Once this has happened, the file's data is split between two different blocks in two different areas of the hard drive, and that is where fragmentation starts.
As more data is written to the drive for the same file, it goes to the next available block.
In theory, your operating system has no way to know ahead of time how much data your files will hold, so it uses the next available set of blocks as it continues to store the file's data, and it records those locations in the file system's file table (the FAT or MFT on Windows) so the data can be found again later.
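To make the "next available block" behaviour concrete, here is a minimal sketch in Python. It is a toy model, not how any real file system is implemented: the disk is just a list of 12 blocks, and the names `allocate` and `fragments` are made up for illustration.

```python
def allocate(disk, name, count):
    """Give `name` the first `count` free blocks, scanning from the start."""
    placed = []
    for i, owner in enumerate(disk):
        if len(placed) == count:
            break
        if owner is None:
            disk[i] = name
            placed.append(i)
    return placed


def fragments(disk, name):
    """Count the contiguous runs of blocks owned by `name`."""
    runs = 0
    previous = None
    for owner in disk:
        if owner == name and previous != name:
            runs += 1
        previous = owner
    return runs


disk = [None] * 12          # a toy 12-block disk, all free
allocate(disk, "A", 3)      # file A is written first
allocate(disk, "B", 3)      # file B lands right behind it
allocate(disk, "A", 2)      # A grows later; its new blocks go after B
print(disk)
print(fragments(disk, "A"))  # A now lives in two separate runs
```

File A ends up in two pieces simply because B took the blocks right behind it before A grew.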
A disk defragmenting program reads the file system's records of where each file's pieces live, which work like a library catalog that tells you which shelf and section a book is in, and looks for files whose data segments sit in different sections of the drive. It then collects those pieces and moves them into adjacent blocks, rejoining the broken-up file in one continuous run on the disk.
It is sort of like a librarian going around the library and reorganizing the books that a kid had strewn about, putting them back on the shelf in the correct order.
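As a rough illustration of what the defragmenter is aiming for, the sketch below takes a toy block list and computes a layout in which each file's blocks sit together and the free space is gathered at the end. The function name `defragment` and the list-of-blocks model are assumptions for the example; a real tool also has to move the data safely and update the file system's records as it goes.

```python
def defragment(disk):
    """Return a new layout with each file's blocks contiguous and free space last."""
    order = []                      # file names in order of first appearance
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)

    packed = []
    for name in order:
        # Put all of this file's blocks side by side.
        packed.extend([name] * disk.count(name))

    free = len(disk) - len(packed)  # whatever is left over is free space
    return packed + [None] * free


before = ["A", "A", "B", "A", None, "B", None, "A"]
after = defragment(before)
print(before)
print(after)
```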
My explanation may not be the most technical way of putting it, but if I were too technical, those who were not lucky enough, as I was, to go to trade school for computer electronics would not understand the lingo we were taught to use.
Thank you for understanding my absence, it is job and college related, so all is good. If I do not answer your PMs this is the reason why. See you all soon!
Bruce.
#3
Posted 16 May 2010 - 09:11 AM
A library receives a shipment of books and has to shelve them, but this particular library has its shelves divided into 2-inch sectors (disk sectors), and books can only be placed sector by sector. This is because the librarians and pages who shelve the books have no idea how much space the books will take up before shelving them (the computer doesn't know the file size).
So the staff goes about shelving all the books (writing data) and instead of placing them cover to cover they shelve the books in line with the next available section (because this is quicker than adjusting the placement of each book individually). So when the shelving is completed we end up with a bunch of gaps between the books.
So then Mr. Manager (Hard Drive Defrag tool) comes around and shoves all of the books together so they are cover to cover. He is upset his staff wasted so much shelf space. Then at the end of many of the shelves there are large spaces of varying sizes and so he goes around and finds the books that will fit there best. Then in the end we end up with all of the books cover to cover on the shelves, and all of the free shelf space in one spot.
At this point we just installed the OS and programs for the first time, and defragged.
Then the doors of the library open for business and patrons remove books and place them on tables (RAM?), the staff of the library replaces the items on the shelves again but only by the sectors. Also many patrons bring in donations which must be fitted into the system (data creation). So then in a few weeks we need to do the process of organization all over again because things aren't in order anymore.
There, now you can set up defragmentation of your local library.
8GB DDR3 RAM
XFX ATI Radeon HD6850 1 GB DDR5, 26" Widescreen HDMI
500GB + 80GB HDD
Windows 7 Pro, Mozilla Firefox, AutoCAD 2011, Solidworks 2009
1/19/2012
#4
Posted 16 May 2010 - 09:31 AM
p.
#5
Posted 16 May 2010 - 10:01 AM
Team work!
That's what I love about the cool members we have here at BC!! YOU GUYS ROCK!!!!!
#6
Posted 16 May 2010 - 01:57 PM
#7
Posted 17 May 2010 - 01:07 AM
Paul, the two approaches you mention to locating files on disk are in fact two legitimate storage allocation strategies used in file systems. They're called "First (or Next) Fit" and "Best Fit". Each has advantages and disadvantages, the most obvious being the one you've observed - the conflicting outcome regarding file fragmentation and drive (free-space) fragmentation.
First fit is faster to allocate locations. Next fit should be faster again as it works the same way, but the next available physical location is used instead of continually re-scanning from the beginning.
Best fit is slower because potentially all of the available empty slots have to be assessed in order to find the one closest to the required size.
File/operating systems tend to use faster strategies to enhance performance.
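A minimal sketch of the three strategies, assuming the free space is kept as a simple list of (starting block, length) holes. The function names and the sample numbers are made up for illustration; real file systems track free space with more elaborate structures.

```python
def first_fit(holes, size):
    """Return the index of the first hole big enough, scanning from the start."""
    for i, (_start, length) in enumerate(holes):
        if length >= size:
            return i
    return None


def next_fit(holes, size, last):
    """Like first fit, but resume scanning at index `last`, wrapping around once."""
    n = len(holes)
    for step in range(n):
        i = (last + step) % n
        if holes[i][1] >= size:
            return i
    return None


def best_fit(holes, size):
    """Return the index of the smallest hole that still fits; checks every hole."""
    best = None
    for i, (_start, length) in enumerate(holes):
        if length >= size and (best is None or length < holes[best][1]):
            best = i
    return best


# Hypothetical free-space list: (starting block, length in blocks)
holes = [(0, 8), (20, 3), (40, 5), (60, 4)]
request = 4
print(first_fit(holes, request))         # takes the big hole at the front
print(next_fit(holes, request, last=2))  # continues from where it left off
print(best_fit(holes, request))          # finds the exact 4-block hole
```

First fit and next fit can stop at the first hole that is large enough, while best fit has to look at every hole before it can decide, which is the speed difference described above.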
The extension of an existing file (appending) that you've mentioned is another consideration. Adding content to a file like a Word .doc file may or may not change its size. A plain text file grows in direct response to its content, whereas a .doc file is structured more like a database, in which adding a word may not affect the file size at all, or may cause a disproportionate adjustment in space allocation. This is complicated further by Word creating a backup file while an existing file is being edited, which also shifts where free space is available for the active file.
The choice is made to avoid the complexity and processing overhead of fragmentation reduction during file access, as computers are productivity orientated, especially in commercial environments. The "time-wasting" overhead is offloaded into the separate process of defragmentation, which can then be done in otherwise unproductive "offline" time. The strategies actually trace right back to early computing days when processing time was very expensive and clients paid for it by the minute.
This post has been edited by Platypus: 17 May 2010 - 01:08 AM
I pressed F5, and I'm feeling refreshed...
#8
Posted 26 May 2010 - 09:23 PM
#9
Posted 26 May 2010 - 09:37 PM
What you're saying makes a lot of sense. It speaks of real-world trade-offs among some interesting potential solutions, and I like the way you describe what's happening as "off-loading" the problem via defragmentation.
I think implicit in all that is the fact that the OS does indeed have an idea of how large the file's footprint will be on disk. Otherwise the various strategies would be meaningless.
Anyway, when I see fragmentation pile up so quickly, at least I will no longer feel that it's completely unnecessary, though I personally would wish for a more "best-fit" solution.
BW,
p.
#10
Posted 27 May 2010 - 01:12 AM
Boredom Software Stop Highlighting Things
#11
Posted 27 May 2010 - 07:26 AM
Also interesting are the comments to the article, which I'm in the process of reading. The noise filtered out, they add more insight.
bw,
p.
