Some beginners would like to ask but do not dare:

What is a file format?

To start, let's say it is only the way the data are written in your file. In principle, it should not affect the image itself.

OK, but an image is only a 2D array (the image is a rectangle), optionally with a third dimension to store bands. A very straightforward way of saving this data in a file is simply to copy the matrix into the file as is: 1st pixel, 2nd pixel, 3rd pixel, etc. up to the last pixel of the first line, then continue with the next line, and so on to the end of the image. This is what the Envi format does. When you have bands, you can even choose whether to write all the band values of the same pixel in a row, or to write the image of the first band first, then append the second band, etc. In any case, you get something very similar to a copy of the computer memory where the file has to be placed when it is read.
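
To make the two orderings concrete, here is a minimal sketch in Python. The tiny image and the function names are made up for the illustration; "bsq" (band sequential) and "bip" (band interleaved by pixel) are the names Envi uses for these two layouts.

```python
# A tiny image: 2 bands, 2 lines, 3 samples, stored as image[band][line][sample].
# The values are arbitrary and only chosen to make the ordering visible.
image = [
    [[1, 2, 3], [4, 5, 6]],        # band 0
    [[10, 20, 30], [40, 50, 60]],  # band 1
]

def flatten_bsq(img):
    """Band sequential: the whole first band, then the whole second band, etc."""
    return [value for band in img for line in band for value in line]

def flatten_bip(img):
    """Band interleaved by pixel: all band values of a pixel written in a row."""
    n_bands, n_lines, n_samples = len(img), len(img[0]), len(img[0][0])
    return [img[b][l][s]
            for l in range(n_lines)
            for s in range(n_samples)
            for b in range(n_bands)]

# With one byte per value, these byte strings are exactly what the raw file holds.
bsq_bytes = bytes(flatten_bsq(image))
bip_bytes = bytes(flatten_bip(image))
print(list(bsq_bytes))
print(list(bip_bytes))
```

Both byte strings hold the same twelve values; only the order changes, which is why the reader has to be told which layout was used.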


In raw file formats like Envi/IDL, the meta-information is written in a second file, a text file. The rule is usually that the text file has the same name as the data file and only the extension changes (.hdr for Envi).
What is this for? Well, the data stored in the binary file is only a series of numbers (the pixel value for each band), written one after another. To display it as an image, you need to know the image's dimensions. Basically, when you start drawing the image on screen, you must know when to move to the next line. In addition, the image can have several bands, so you need to know where a given band starts and stops (or to which channel, red, green or blue, to copy the data you are reading from the file). The pixel values can be encoded as bytes, integers or floats in your file, and you need to save this information too. For geo-referenced images, you also have to give the projection, the pixel size and at least the coordinates of the upper left corner of the image.
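
As an illustration of why the dimensions matter, the sketch below computes where a given pixel sits in a raw band-sequential file. The function name and its parameters are hypothetical; it only assumes the plain line-by-line, band-by-band layout described above.

```python
def bsq_offset(line, sample, band, n_lines, n_samples,
               bytes_per_value=1, header_offset=0):
    """Byte position of one pixel value in a raw band-sequential file."""
    # Skip the complete bands before this one, then the complete lines,
    # then the samples that precede the pixel on its own line.
    index = (band * n_lines + line) * n_samples + sample
    return header_offset + index * bytes_per_value

# Toy single-band "file": 3 lines of 4 samples, one byte per pixel.
data = bytes(range(12))
value = data[bsq_offset(line=2, sample=1, band=0, n_lines=3, n_samples=4)]
print(value)
```

Without the number of samples per line, there is no way to tell where line 2 begins; this is exactly the information the header file carries.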

If you have an Envi file whose header is, say, africa.hdr, simply display the content of the hdr file:
more africa.hdr
the content should be something like:

ENVI
description = {
ENVI File, Created [Wed Aug 20 10:07:18 2008]}
samples = 9633
lines = 8177
bands = 1
header offset = 0
file type = ENVI Standard
data type = 1
interleave = bsq
sensor type = Unknown
byte order = 0
map info = {Geographic Lat/Lon, 1.0000, 1.0000, -26.00446429, 38.00446429, 8.9285714286e-03, 8.9285714286e-03, WGS-84, units=Degrees}
wavelength units = Unknown
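
Because the header is plain text, reading it takes only a few lines of home-made code. The sketch below is not an official parser: the function name is made up, it only handles the "key = value" lines shown above, and it assumes that data type 1 means one byte per pixel, as in the Envi header convention.

```python
def parse_envi_header(text):
    """Collect 'key = value' pairs; values in { } may span several lines."""
    fields = {}
    lines = iter(text.splitlines())
    for line in lines:
        if "=" not in line:
            continue  # skips the leading 'ENVI' line and blank lines
        key, value = line.split("=", 1)
        key, value = key.strip().lower(), value.strip()
        # A value opened with '{' continues until the closing '}'.
        while value.startswith("{") and not value.endswith("}"):
            following = next(lines, None)
            if following is None:
                break  # unterminated block: keep what we have
            value += " " + following.strip()
        fields[key] = value
    return fields

header_text = """ENVI
description = {
ENVI File, Created [Wed Aug 20 10:07:18 2008]}
samples = 9633
lines = 8177
bands = 1
header offset = 0
data type = 1
interleave = bsq
byte order = 0
"""

fields = parse_envi_header(header_text)
# Assumption: data type 1 is stored on one byte per value.
bytes_per_value = {"1": 1}[fields["data type"]]
expected_size = (int(fields["samples"]) * int(fields["lines"])
                 * int(fields["bands"]) * bytes_per_value
                 + int(fields["header offset"]))
print(fields["interleave"], expected_size)
```

A quick sanity check is to compare expected_size with the actual size of the data file on disk: if they differ, the header and the data do not belong together.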


Now, the second beginner's question is

Why do we have several file formats?

In the past, most formats used to be proprietary, i.e. they were designed by a company to be used with a given software and the specifications of the format (how you read and write the file) were not publicly available.

Although the idea of making a copy of the memory on disk and of storing the meta-information in a text file is easy to understand and to handle with a simple piece of home-made code, it is not robust enough for most uses, whether in photography or in satellite imagery. A single file that also contains the meta-information of the image is far more robust: it is easier to carry, and there is no chance to lose the meta-information text file, to rename the image and forget to rename the text file, or to corrupt the file when editing it! In addition, a plain copy of the memory is not very efficient in terms of disk usage and access.

To make a long story short, more sophisticated ways of storing files were found.
First, most file formats store their meta-information and the image in a single file. Second, the arrangement of the data (I mean the pixel values) inside the file is also optimized. For example, GeoTIFF stores values in groups of lines, with a table indicating where to jump in the file to access a given chunk of data.
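
The idea of a jump table can be sketched in a few lines of Python. This is a toy illustration only, not the actual TIFF byte layout: the function names are invented, and an in-memory buffer stands in for the file.

```python
import io

def write_strips(rows, rows_per_strip):
    """Write rows (bytes objects) in groups of lines; return buffer and table."""
    buf = io.BytesIO()
    table = []  # one (offset, length) entry per group of lines
    for start in range(0, len(rows), rows_per_strip):
        strip = b"".join(rows[start:start + rows_per_strip])
        table.append((buf.tell(), len(strip)))
        buf.write(strip)
    return buf, table

def read_strip(buf, table, strip_index):
    """Jump straight to one group of lines without reading the others."""
    offset, length = table[strip_index]
    buf.seek(offset)
    return buf.read(length)

# 6 lines of 4 samples; every pixel holds its own line number.
rows = [bytes([line] * 4) for line in range(6)]
buf, table = write_strips(rows, rows_per_strip=2)
last = read_strip(buf, table, 2)  # lines 4 and 5 only
print(table, list(last))
```

The table costs a few bytes, but it lets a reader fetch only the part of the image it needs, and it still works when the chunks no longer all have the same size on disk.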

Of course, with data formats that are not a plain copy of the memory, you need a driver to read/write the file. Learning how to use those libraries is the price to pay for more efficiency. I'll say more in future posts.

Among the tons of options allowed by modern formats, there is one I appreciate a lot: internal compression. Satellite images, and especially derived products like classifications and masks, can contain highly redundant information: a group of pixels can share the same value (like the same class code), or large surfaces in an image, like the ocean, can have a single value. In this case a lossless compression can save a lot of space on your hard drive. But if you zip your file, you need to unzip it before each use, which quickly becomes tedious. It is far better to have the data compressed inside the image, so that the decompression is done on the fly by the driver, seamlessly (and very quickly).
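
To get a feeling for how much a redundant image shrinks, here is a small experiment with Python's zlib module. The mask is synthetic (a made-up "ocean" of zeros with one rectangle of class 3), and zlib only stands in for whatever lossless codec a given format uses internally.

```python
import zlib

n_samples, n_lines = 200, 100

# Synthetic mask: value 0 everywhere, with a small rectangle of class 3.
mask = bytearray(n_samples * n_lines)
for line in range(40, 60):
    start = line * n_samples + 50
    mask[start:start + 30] = bytes([3]) * 30

raw = bytes(mask)
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

# Lossless: decompressing gives back exactly the original bytes.
print(restored == raw, len(raw), len(packed))
```

A driver that supports internal compression does the same round trip chunk by chunk, so you never handle the compressed bytes yourself.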

I'll say more on lossless compression in a future post. In the meantime, have a look at the format details on http://www.gdal.org/formats_list.html (click on each format name to know more).