Monday, April 2, 2012

Efficient File I/O From C#

This article describes and benchmarks different ways to do file I/O from C#.  All of the code referenced in this article is available for download and is free to use.
There are many different ways to do file I/O in C#.  The .NET framework provides the following classes:
  • File – Provides static methods for creating, opening, copying, deleting, moving, reading, and writing files.
  • BinaryReader – Reads primitive data types as binary values using a specific encoding.
  • BinaryWriter – Writes primitive types in binary to a stream and supports writing strings using a specific encoding.
  • StreamReader – Implements a TextReader that reads characters from a byte stream using a particular encoding.
  • StreamWriter – Implements a TextWriter for writing characters to a stream using a particular encoding.
  • FileStream   – Exposes a Stream around a file, supporting both synchronous and asynchronous read and write operations.
The Windows operating system also provides at least two functions to read and write files:
  • ReadFile  – Reads data from a file.  This function supports both synchronous and asynchronous operations.
  • WriteFile – Writes data to a file.  This function supports both synchronous and asynchronous operations.
One small problem with using the Windows ReadFile and WriteFile functions is that the interface requires pointers.  In C#, pointers can only be used in a method or class declared unsafe.  Since some organizations are leery of pointers and unsafe code, all of this code has been put into its own class, and the calling methods do not have to be declared unsafe.  The name of this class is:
WinFileIO – provides the capability to use the Windows ReadFile and WriteFile I/O functions.  Based upon extensive testing, these functions are the most efficient way to perform file I/O from C# and C++.
How The Test Program Works
The user interface provides the following buttons:
  • Run All - tests the Read File and Write File functions.  Similar to pressing the Read File and Write File buttons.
  • Read File - tests the read file methods for each class listed above.  Each test consists of reading in 3 text files with sizes of roughly < 1 MB, 10 MB, and 50 MB.
  • Write File - tests the write file methods for each class listed above.  Each test consists of writing out 3 text files with sizes of roughly < 1 MB, 10 MB, and 50 MB.
  • WinFileIO Unit Tests - tests each public interface I/O method in the class to show that it works correctly.
Testing Methodology:
Each test is consistent in the way benchmarking is done.  For each test function, the current time is obtained before the I/O function begins, and is then retrieved right after the test ends.  Benchmarking includes the time it takes to open and close the file, except for the last set of tests which only measure the time it takes to read or write the files.
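The timing pattern described above can be sketched roughly as follows.  This is a minimal illustration and not the article's actual harness: the Benchmark class and its method names are hypothetical, and DateTime.Now is used only because the text describes obtaining the current time.  System.Diagnostics.Stopwatch would give finer resolution.

```csharp
using System;
using System.IO;

static class Benchmark
{
    // Hypothetical helper: returns the wall-clock time taken by one test.
    public static TimeSpan Time(Action test)
    {
        DateTime start = DateTime.Now;   // current time before the I/O begins
        test();
        return DateTime.Now - start;     // elapsed time right after the test ends
    }

    // Example test body: File.ReadAllBytes opens, reads, and closes the file,
    // so all three steps fall inside the timed region.
    public static int ReadWholeFile(string path)
    {
        return File.ReadAllBytes(path).Length;
    }
}
```

A call such as Benchmark.Time(() => Benchmark.ReadWholeFile(path)) returns a TimeSpan that prints in the same hh:mm:ss.fffffff form as the results listed later in this article.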
Each test consists of reading in or writing out 3 files:
  • < 1 MB - approximately 0.66 megabytes.
  • 10 MB - approximately 10 megabytes.
  • 50 MB - approximately 50 megabytes.
The test machine consists of the following parts:
  • CPU – Intel i7-950.
  • Memory – 6 GB.
  • OS – Windows 7 Pro, installed on a solid state drive.
  • Hard drive – Western Digital 1 TB SATA III 6 Gb/s 7200 RPM with 64 MB cache, which is where the data files reside.
  • IDE – Visual Studio 2008 standard edition with .NET framework version 3.5 SP1.
Read Methods Tested:
File class methods:
  • ReadAllLines – reads all lines of the file into a string array.  See TestReadAllLines.
  • ReadAllText - reads the entire contents of the file into a string.  See TestReadAllText.
  • ReadAllBytes - reads the entire contents of the file into a byte array. See TestReadAllBytes.
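For reference, the three File class read methods can be called as in the short sketch below.  The FileClassReads class and its method name are hypothetical and are not taken from the downloadable test code.

```csharp
using System;
using System.IO;

static class FileClassReads
{
    // Reads the same file three ways; each call opens and closes the file itself.
    public static void ReadThreeWays(string path, out string[] lines,
                                     out string text, out byte[] bytes)
    {
        lines = File.ReadAllLines(path);   // one string per line
        text  = File.ReadAllText(path);    // whole file decoded into one string
        bytes = File.ReadAllBytes(path);   // raw bytes, no decoding
    }
}
```

ReadAllBytes skips character decoding entirely, which is one plausible reason it comes out ahead of the other two File methods in the read results.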
BinaryReader methods:
  • Read - reads the entire contents of the file into a character array.  See TestBinaryReader1.

StreamReader methods:
  • Read(1) - reads the entire contents of the file into a character array using the single argument constructor.  See TestStreamReader1.
  • Read(2) – reads the entire contents of the file into a character array.  Uses a constructor to specify a sector aligned buffer size.  See TestStreamReader2.
  • Read(3) – a loop is used to read in the entire contents of the file into a character array.  Uses a constructor to specify a sector aligned buffer size.  The loop is terminated when the Peek function indicates there is no more data.  See TestStreamReader5.
  • ReadBlock - reads the entire contents of the file into a character array.  Uses a constructor to specify a sector aligned buffer size.  See TestStreamReader3.
  • ReadToEnd - reads the entire contents of the file into a string.  Uses a constructor to specify a sector aligned buffer size.  See TestStreamReader4.
  • ReadLine - reads the entire contents of the file line by line into a string.  Uses a constructor to specify a sector aligned buffer size.  See TestStreamReader6.
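As a rough illustration of the "sector aligned buffer size" constructor used by several of these tests, the sketch below opens a StreamReader with an explicit buffer size and fills a character array with ReadBlock.  The 64 KB size, the UTF-8 encoding, and the class and method names are assumptions made for this example; the actual tests may use different values.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamReaderSketch
{
    // Assumed size: 64 KB, a multiple of the common 512 and 4096 byte sector sizes.
    const int BufferSize = 65536;

    // Reads the whole file into a char array; charsRead is the number of
    // characters actually decoded.  Assumes the file is smaller than 2 GB.
    public static char[] ReadAllChars(string path, out int charsRead)
    {
        // For UTF-8 and ASCII text the byte length is an upper bound on the char count.
        char[] chars = new char[(int)new FileInfo(path).Length];
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8, true, BufferSize))
        {
            // ReadBlock keeps reading until the array is full or the file ends.
            charsRead = reader.ReadBlock(chars, 0, chars.Length);
        }
        return chars;
    }
}
```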
FileStream methods:
  • Read(1) – reads the entire contents of the file into a byte array.  See TestFileStreamRead1.
  • Read(2) - reads the entire contents of the file into a byte array and parses it into lines.  See TestFileStreamRead2.
  • Read(3) – reads the entire contents of the file into a byte array and parses it into lines.  See TestFileStreamRead2A.
  • Read(4) - reads the entire contents of the file into a byte array.  Uses the RandomAccess option in the constructor to determine if there is any impact on performance.  See TestFileStreamRead3.
  • BeginRead(1) – reads the entire contents of the file into a byte array asynchronously.  See TestFileStreamRead4.
  • BeginRead(2) – reads the entire contents of the file into a byte array asynchronously and parses into lines.  See TestFileStreamRead5.
  • BeginRead(3) – reads the entire contents of the file into a byte array asynchronously and parses into lines.  Identical to TestFileStreamRead5 except that it uses a different thread locking mechanism.  See TestFileStreamRead6.
  • BeginRead(4) - reads the entire contents of the file into a byte array asynchronously.  Uses no locking mechanism.  See TestFileStreamRead7.
  • BeginRead(5) - reads the entire contents of the file into a byte array asynchronously and parses into lines.  See TestFileStreamRead8.
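A synchronous FileStream read of the kind listed above might look like the following sketch.  It is not the code from TestFileStreamRead1 or TestFileStreamRead3; the class name, method name, and 4096 byte internal buffer size are assumptions.  The FileOptions argument is where SequentialScan or RandomAccess would be passed.

```csharp
using System;
using System.IO;

static class FileStreamSketch
{
    // Reads the whole file into a byte array.  Assumes the file is smaller than 2 GB.
    public static byte[] ReadAll(string path, FileOptions options)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, 4096, options))
        {
            byte[] data = new byte[(int)fs.Length];
            int total = 0;
            // Read may return fewer bytes than requested, so loop until the array is full.
            while (total < data.Length)
            {
                int n = fs.Read(data, total, data.Length - total);
                if (n == 0)
                    throw new EndOfStreamException("File ended before all bytes were read.");
                total += n;
            }
            return data;
        }
    }
}
```

Calling ReadAll(path, FileOptions.SequentialScan) and ReadAll(path, FileOptions.RandomAccess) on the same file gives the two variants compared in the analysis.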
WinFileIO methods:
  • Read - reads the specified number of bytes into an array.
  • ReadUntilEOF - reads the entire contents of a file into an array.
  • ReadBlocks - reads the specified number of bytes into an array.
Write Methods Tested:
File class methods:
  • WriteAllLines - writes all lines in a string array to a file.  See TestWriteAllLines.
  • WriteAllText - writes the entire contents in a string to a file.  See TestWriteAllText.
  • WriteAllBytes - writes the entire contents of a byte buffer to a file.  See TestWriteAllBytes.
BinaryWriter methods:
  • Write - writes the entire contents of a character array to a file.  See TestBinaryWriter1.
StreamWriter methods:
  • Write - writes the entire contents of a character array to a file.  See TestStreamWriter1.
FileStream methods:
  • Write(1) - writes the entire contents of a byte array to a file.  See TestFileStreamWrite1.
  • Write(2) - writes the entire contents of a byte array to a file.  See TestFileStreamWrite2.
  • Write(3) - writes the entire contents of a byte array to a file.  See TestFileStreamWrite3.
WinFileIO methods:
  • Write - writes the entire contents of an array to a file.  See TestWriteFileWinAPI1.
  • WriteBlocks - writes the entire contents of an array to a file.  See TestWriteFileWinAPI2.
WinFileIO Class:
This class was designed to make it easy to use the Windows ReadFile and WriteFile functions.  It handles all of the unsafe pointer operations, so calling methods do not have to be declared unsafe.  It implements the IDisposable interface, which means that the Dispose method should be called when the object is no longer needed.  If there is a problem in any method of this class, it throws an exception containing the Windows error information; if the method returns, the call succeeded.  The only exception to this is the Close method, which reports success or failure through its return value.
Constructors:
  • WinFileIO() - default.  If this constructor is used, then the PinBuffer function must be called.
  • WinFileIO(Array Buffer) – this constructor should be used most of the time.  The Buffer is used to read in or write out the data.  The array passed in can be of any type provided it does not contain any references or pointers.  So, byte, char, int, long, and double arrays should all work.  But string arrays will not since strings use pointers.  The code has only been tested with byte arrays.
Methods:
  • void PinBuffer(Array Buffer) – this method pins the buffer in memory and retrieves a pointer to it which is used for all I/O operations.  UnpinBuffer is called by this function so it need not be called by the user.  This function only needs to be called if the default constructor is used or a different buffer needs to be used for reading or writing.
  • void OpenForReading(string FileName) – opens a file for reading.  The argument FileName must contain the path and filename of the file to be read.
  • void OpenForWriting(string FileName) – opens a file for writing.  If the file exists, it will be overwritten.
  • int Read(int BytesToRead) – reads up to BytesToRead bytes from the file.  The return value is the number of bytes read.  BytesToRead must not be larger than the size of the buffer specified in the constructor or PinBuffer.
  • int ReadUntilEOF() - reads in the entire contents of the file.  The file must be <= 2 GB.  If the buffer is not large enough to read the file, then an ApplicationException will be thrown.  No check is made to see if the buffer is large enough to hold the file.  If this is needed, then use the ReadBlocks method.
  • int ReadBlocks(int BytesToRead) – reads a total of BytesToRead at a time.  There is a limit of 2 GB per call.  BytesToRead should not be larger than the size of the buffer specified in the constructor or PinBuffer.
  • int Write(int BytesToWrite) – writes a buffer out to a file.  The return value is the number of bytes written to the file.
  • int WriteBlocks(int NumBytesToWrite) - writes a buffer out to a file.  The return value is the number of bytes written to the file.
  • bool Close() - closes the file.  If this method succeeds, then true is returned.  Otherwise, false is returned.
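To give a feel for what pinning a buffer and calling ReadFile involves, here is a minimal sketch.  It is not the WinFileIO implementation: it uses GCHandle and IntPtr instead of unsafe pointers, opens the file through FileStream rather than CreateFile, and the NativeReadSketch class and its Read method are hypothetical names.  It runs only on Windows.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class NativeReadSketch
{
    // P/Invoke declaration for the Win32 ReadFile function (synchronous use only here).
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool ReadFile(SafeFileHandle hFile, IntPtr lpBuffer,
        uint nNumberOfBytesToRead, out uint lpNumberOfBytesRead, IntPtr lpOverlapped);

    // Reads up to buffer.Length bytes from the start of the file; returns the bytes read.
    public static int Read(string path, byte[] buffer)
    {
        // Pin the array so the garbage collector cannot move it during the native call.
        GCHandle pin = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                uint bytesRead;
                bool ok = ReadFile(fs.SafeFileHandle, pin.AddrOfPinnedObject(),
                                   (uint)buffer.Length, out bytesRead, IntPtr.Zero);
                if (!ok)
                    throw new Win32Exception(Marshal.GetLastWin32Error());
                return (int)bytesRead;
            }
        }
        finally
        {
            pin.Free();   // always unpin, even if the read throws
        }
    }
}
```

Pinning once and reusing the same buffer for many reads, as the PinBuffer method describes, avoids paying the pin and unpin cost on every call.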
Benchmark Read Results:
Running the read file tests:

Total time reading < 1MB with File.ReadAllLines                            = 00:00:00.0030002

Total time reading 10MB with File.ReadAllLines                             = 00:00:00.0640037

Total time reading 50MB with File.ReadAllLines                             = 00:00:00.3540202

Total time reading < 1MB with File.ReadAllText                             = 00:00:00.0040002

Total time reading 10MB with File.ReadAllText                              = 00:00:00.0360020

Total time reading 50MB with File.ReadAllText                              = 00:00:00.1630093

Total time reading < 1MB with File.ReadAllBytes                            = 00:00:00

Total time reading 10MB with File.ReadAllBytes                             = 00:00:00.0050003

Total time reading 50MB with File.ReadAllBytes                             = 00:00:00.0260015
Total time reading < 1MB with BinaryReader.Read                         = 00:00:00.0020001

Total time reading 10MB with BinaryReader.Read                          = 00:00:00.0270016

Total time reading 50MB with BinaryReader.Read                          = 00:00:00.1260072
Total time reading < 1MB with StreamReader1.Read                      = 00:00:00.0010001

Total time reading 10MB with StreamReader1.Read                       = 00:00:00.0200011

Total time reading 50MB with StreamReader1.Read                       = 00:00:00.0960055

Total time reading < 1MB with StreamReader2.Read(large buf)    = 00:00:00.0010001

Total time reading 10MB with StreamReader2.Read(large buf)     = 00:00:00.0160009

Total time reading 50MB with StreamReader2.Read(large buf)     = 00:00:00.0750043

Total time reading < 1MB with StreamReader3.ReadBlock             = 00:00:00.0010001

Total time reading 10MB with StreamReader3.ReadBlock              = 00:00:00.0150008

Total time reading 50MB with StreamReader3.ReadBlock              = 00:00:00.0750043

Total time reading < 1MB with StreamReader4.ReadToEnd           = 00:00:00.0020001

Total time reading 10MB with StreamReader4.ReadToEnd            = 00:00:00.0320018

Total time reading 50MB with StreamReader4.ReadToEnd            = 00:00:00.1720099

Total time reading < 1MB with mult StreamReader5.Read              = 00:00:00.0020001

Total time reading 10MB with mult StreamReader5.Read               = 00:00:00.0430025

Total time reading 50MB with mult StreamReader5.Read               = 00:00:00.0850048

Total time reading < 1MB with StreamReader6.ReadLine                = 00:00:00.0020002

Total time reading 10MB with StreamReader6.ReadLine                 = 00:00:00.0310017

Total time reading 50MB with StreamReader6.ReadLine                 = 00:00:00.1510087

Total time reading < 1MB with StreamReader7.Read parsing          = 00:00:00.1470084

Total time reading 10MB with StreamReader7.Read parsing           = 00:00:00.1600091

Total time reading 50MB with StreamReader7.Read parsing           = 00:00:00.2260129
Total time reading < 1MB with FileStream1.Read no parsing           = 00:00:00.0080005

Total time reading 10MB with FileStream1.Read no parsing            = 00:00:00.0040002

Total time reading 50MB with FileStream1.Read no parsing            = 00:00:00.0190011

Total time reading < 1MB with FileStream2.Read parsing                 = 00:00:00.1220070

Total time reading 10MB with FileStream2.Read parsing                  = 00:00:00.1220069

Total time reading 50MB with FileStream2.Read parsing                  = 00:00:00.1370079

Total time reading < 1MB with multiFileStream2A.Read parsing     = 00:00:00.1180067

Total time reading 10MB with multiFileStream2A.Read parsing      = 00:00:00.1210070

Total time reading 50MB with multiFileStream2A.Read parsing      = 00:00:00.1320075

Total time reading < 1MB with FileStream3.Read(Rand) no parsing= 00:00:00

Total time reading 10MB with FileStream3.Read(Rand) no parsing = 00:00:00.0030002

Total time reading 50MB with FileStream3.Read(Rand) no parsing = 00:00:00.0170009

Total time reading < 1MB with FileStream4.BeginRead no parsing  = 00:00:00.0020001

Total time reading 10MB with FileStream4.BeginRead no parsing   = 00:00:00.0040002

Total time reading 50MB with FileStream4.BeginRead no parsing   = 00:00:00.0180011

Total time reading < 1MB with FileStream5.BeginRead parsing        = 00:00:00.0020001

Total time reading 10MB with FileStream5.BeginRead parsing         = 00:00:00.0280016

Total time reading 50MB with FileStream5.BeginRead parsing         = 00:00:00.1370079

Total time reading < 1MB with FileStream6.BeginRead parsing       = 00:00:00.0030002

Total time reading 10MB with FileStream6.BeginRead parsing         = 00:00:00.0280016

Total time reading 50MB with FileStream6.BeginRead parsing         = 00:00:00.1360077

Total time reading < 1MB with FileStream7.BeginRead                       = 00:00:00

Total time reading 10MB with FileStream7.BeginRead                      = 00:00:00.0050003

Total time reading 50MB with FileStream7.BeginRead                      = 00:00:00.0240014

Total time reading < 1MB with FileStream8.BeginRead parsing       = 00:00:00.0020001

Total time reading 10MB with FileStream8.BeginRead parsing        = 00:00:00.0310018

Total time reading 50MB with FileStream8.BeginRead parsing        = 00:00:00.1480085
Total time reading < 1MB with WFIO1.Read No Parsing                   = 00:00:00.0020001

Total time reading 10MB with WFIO1.Read No Parsing                    = 00:00:00.0020001

Total time reading 50MB with WFIO1.Read No Parsing                    = 00:00:00.0120007

Total time reading < 1MB with WFIO2.ReadUntilEOF No Parsing  = 00:00:00.0010001

Total time reading 10MB with WFIO2.ReadUntilEOF No Parsing   = 00:00:00.0030001

Total time reading 50MB with WFIO2.ReadUntilEOF No Parsing   = 00:00:00.0140008

Total time reading < 1MB with WFIO3.ReadBlocks API No Parsing= 00:00:00.0010001

Total time reading 10MB with WFIO3.ReadBlocks API No Parsing = 00:00:00.0030002

Total time reading 50MB with WFIO3.ReadBlocks API No Parsing = 00:00:00.0130008
Total time reading < 1MB with BinaryReader.Read                            = 00:00:00.0010001

Total time reading 10MB with BinaryReader.Read                             = 00:00:00.0220012

Total time reading 50MB with BinaryReader.Read                             = 00:00:00.1080062

Total time reading < 1MB with StreamReader2.Read(large buf)      = 00:00:00.0010001

Total time reading 10MB with StreamReader2.Read(large buf)       = 00:00:00.0150008

Total time reading 50MB with StreamReader2.Read(large buf)       = 00:00:00.0690040

Total time reading < 1MB with FileStream1.Read no parsing            = 00:00:00.0010000

Total time reading 10MB with FileStream1.Read no parsing             = 00:00:00.0030002

Total time reading 50MB with FileStream1.Read no parsing             = 00:00:00.0130008

Total time reading < 1MB with WFIO.Read No Open/Close              = 00:00:00.0010001

Total time reading 10MB with WFIO.Read No Open/Close               = 00:00:00.0030001

Total time reading 50MB with WFIO.Read No Open/Close               = 00:00:00.0130008

Read file tests have completed.
Analysis Of Read Results:
The File class provides the simplest way to read in a file.  The ReadAllBytes method of this class provides a fairly efficient way to read in a file and is only bested by the read methods in the FileStream and WinFileIO classes.  From the results, it appears that the best StreamReader and BinaryReader read methods are roughly 3 to 5 times slower than the ReadAllBytes method.
The FileStream read methods were shown to be the fastest way to read a file into memory using a method from the .NET Framework.  The synchronous method of reading the entire file into memory in TestFileStreamRead1 and TestFileStreamRead3 proved to be the best of this set of tests, with TestFileStreamRead3 taking top honors by a hair.  The only difference between these two tests is that the file is opened with the SequentialScan option in TestFileStreamRead1 and with the RandomAccess option in TestFileStreamRead3.  Since there are always other OS activities going on while running a benchmark, it is hard to know if one method is superior to another when it is this close.  However, these tests have been run multiple times on other systems with different Windows OSs with the same results, so in this case it appears that the TestFileStreamRead3 method is marginally superior.
The biggest disappointment came with the 5 FileStream asynchronous tests given in TestFileStreamRead4 – TestFileStreamRead8.  These tests all show that reading in a file asynchronously is inferior to reading it in synchronously.  This is true even if other activities, like parsing the file, are done in between reads.  For example, compare the results of TestFileStreamRead2A, which reads in a file synchronously and parses the data, against the results of TestFileStreamRead5, which reads in a file asynchronously and parses the data while the next block is read in asynchronously.  Even when the locks have been removed (see TestFileStreamRead8), it is still at least 10% slower than reading the file in synchronously and then parsing the file afterwards (see TestFileStreamRead2A).
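The overlap pattern discussed above, parsing one block while the next is being read, can be sketched as follows.  This is an illustration only and not the code from TestFileStreamRead5: the class and method names are hypothetical, and counting newline bytes stands in for the real parsing.  No lock is needed here because EndRead blocks until the pending read has finished.

```csharp
using System;
using System.IO;

static class OverlappedReadSketch
{
    // Counts '\n' bytes in the file, processing one block while the next is read.
    public static int CountLines(string path, int blockSize)
    {
        byte[] current = new byte[blockSize];
        byte[] next = new byte[blockSize];
        int lines = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.Read, blockSize, FileOptions.Asynchronous))
        {
            int n = fs.Read(current, 0, blockSize);          // first block, synchronously
            while (n > 0)
            {
                // Start reading the next block before processing the current one.
                IAsyncResult pending = fs.BeginRead(next, 0, blockSize, null, null);
                for (int i = 0; i < n; i++)
                    if (current[i] == (byte)'\n') lines++;
                n = fs.EndRead(pending);                     // wait for the read to finish
                byte[] swap = current; current = next; next = swap;
            }
        }
        return lines;
    }
}
```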
The WinFileIO class proved to be the fastest way to read a file in.  It is between 33% and 50% faster than the fastest FileStream read method based upon the measured times above.  However, the last set of tests, TestBinaryReader1NoOpenClose through TestReadFileWinAPINoOpenClose, measures how quickly the files are read in after the file is opened.  According to those results, the FileStream read method is just as fast as any of the WinFileIO read methods.  So, it looks like the .NET Framework takes longer to open a file than the Windows CreateFile function.
Benchmark Write Results:
Running the write file Tests:

Total time writing < 1MB with File.WriteAllLines                             = 00:00:00.0050003

Total time writing 10MB with File.WriteAllLines                              = 00:00:00.0350020

Total time writing 50MB with File.WriteAllLines                              = 00:00:00.1620093

Total time writing < 1MB with File.TestWriteAllText                      = 00:00:00.0040002

Total time writing 10MB with File.TestWriteAllText                       = 00:00:00.0270016

Total time writing 50MB with File.TestWriteAllText                       = 00:00:00.1440082

Total time writing < 1MB with File.WriteAllBytes                             = 00:00:00.3560204

Total time writing 10MB with File.WriteAllBytes                              = 00:00:00.3390194

Total time writing 50MB with File.WriteAllBytes                              = 00:00:00.3530202
Total time writing < 1MB with BinaryWriter.Write                           = 00:00:00.0010001

Total time writing 10MB with BinaryWriter.Write                            = 00:00:00.0050003

Total time writing 50MB with BinaryWriter.Write                            = 00:00:00.3040174
Total time writing < 1MB with StreamWriter1.Write                        = 00:00:00.0030002

Total time writing 10MB with StreamWriter1.Write                         = 00:00:00.0230013

Total time writing 50MB with StreamWriter1.Write                         = 00:00:00.1140065
Total time writing < 1MB with FileStream1.Write no parsing          = 00:00:00.0010001

Total time writing 10MB with FileStream1.Write no parsing           = 00:00:00.0050003

Total time writing 50MB with FileStream1.Write no parsing           = 00:00:00.3670210

Total time writing < 1MB with FileStream2.Write no parsing          = 00:00:00.0070004

Total time writing 10MB with FileStream2.Write no parsing           = 00:00:00.1060061

Total time writing 50MB with FileStream2.Write no parsing           = 00:00:00.5000286

Total time writing < 1MB with FileStream3.Write no parsing          = 00:00:00.0100006

Total time writing 10MB with FileStream3.Write no parsing           = 00:00:00.1150066

Total time writing 50MB with FileStream3.Write no parsing           = 00:00:00.5840334
Total time writing < 1MB with WFIO1.Write No Parsing                  = 00:00:00.0020001

Total time writing 10MB with WFIO1.Write No Parsing                   = 00:00:00.0050003

Total time writing 50MB with WFIO1.Write No Parsing                   = 00:00:00.3530202

Total time writing < 1MB with WFIO2.WriteBlocks No Parsing       = 00:00:00.0010001

Total time writing 10MB with WFIO2.WriteBlocks No Parsing        = 00:00:00.0060003

Total time writing 50MB with WFIO2.WriteBlocks No Parsing        = 00:00:00.0260015

Write file tests have completed.
Analysis Of Write Results:
The File class provides the simplest way to write a file out.  Unlike the ReadAllBytes method for reading files, WriteAllBytes is less efficient than WriteAllText according to the results above.
One interesting result is that the times to write out the < 1 MB file and the 10 MB file for the BinaryWriter, FileStream, and WinFileIO classes are quite similar and fast.  However, writing out the 50 MB file takes around 60 times longer than writing out the 10 MB file.  This does not apply to the WinFileIO.WriteBlocks method, which proved to be the fastest way to write a file out.  The most likely reason for this is that WriteBlocks writes the file out in 65,536 byte chunks.  However, the TestFileStreamWrite3 test also writes out the file in 65,536 byte chunks and proved to be the slowest method.  I can’t think of a good explanation for this other than perhaps the FileStream.Write method has some issues.
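For readers who want to experiment with chunked writes, a loop that writes a byte array out in 65,536 byte blocks through FileStream might look like the sketch below.  It is neither TestFileStreamWrite3 nor WinFileIO.WriteBlocks; the class and method names are hypothetical, and matching the stream's internal buffer size to the block size is an assumption.

```csharp
using System;
using System.IO;

static class ChunkedWriteSketch
{
    const int BlockSize = 65536;   // chunk size mentioned in the analysis above

    // Writes data to path in BlockSize chunks, overwriting any existing file.
    public static void WriteInBlocks(string path, byte[] data)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                              FileShare.None, BlockSize))
        {
            for (int offset = 0; offset < data.Length; offset += BlockSize)
            {
                // The final chunk may be shorter than BlockSize.
                int count = Math.Min(BlockSize, data.Length - offset);
                fs.Write(data, offset, count);
            }
        }
    }
}
```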


Conclusion:
Any time file I/O methods are benchmarked, it is very difficult to completely trust the results due to the operating system caching files and other OS activities.  If different files are used, they can land on areas of the drive that the drive can access faster, which can also affect the results.  So, take these benchmark results with a grain of salt.  To achieve the best performance for your environment, I would recommend trying out different classes in your production environment to see which yields the best performance.
Having said that and after testing on 3 different machines with similar results, I believe that the best performing file I/O can be obtained from the WinFileIO read methods for reading a file and the WinFileIO.WriteBlocks method for writing files.
I have done similar tests with C++ which are not shown here and believe that the Windows ReadFile and WriteFile methods are the most efficient way to do file I/O from that language as well.
Download and installation:
Click this link or the download button near the top right of this post to download the file.  This file is a zipped file containing 2 Visual Studio projects.  Extract it into the folder of your choice and leave the folder hierarchy intact.  The code was built and tested with Visual Studio 2008, but it should work with Visual Studio 2010 with little if any modification.  To make it work with previous versions of Visual Studio, you may have to open a new project and add the individual files to each project.
The following code files are contained in the FileTestsForEfficiency folder:
  • TestsForEfficiency.cs - entry point to the application.
  • MainForm.cs - holds the UI designer and button events.
  • FileEffTests.cs - holds the file I/O benchmark tests, which is contained in the FileEfficientTests class.
  • Win32FileIO.cs - holds the class used to implement the Windows ReadFile and WriteFile functionality.
  • WinFileIOUnitTests.cs - holds the unit tests that test out the I/O methods of the WinFileIO class.
The following code files are contained in the FileTestsForEfficiency folder:
  • Win32FileIO.cs - holds the class used to implement file I/O using the Windows ReadFile and WriteFile functions.