Briefly, frequency refers to the rate of change. More precisely, the frequency is the inverse of the period of the change: the amount of time (or, in an image, the distance in pixels) it takes to cycle from one brightness (or whatever) to a different brightness and back again. The faster the change (e.g. from light to dark), the higher the visual “frequency” required to represent that part of the image.
In other words, you can think of frequency in an image as the rate of change. Parts of the image that change rapidly from one color to another (e.g. sharp edges) contain high frequencies, and parts that change gradually (e.g. large surfaces with solid colors) contain only low frequencies.
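To make “rate of change” concrete, here’s a rough Python sketch. The helper name `max_step` is made up for this example, and the largest jump between neighbouring pixels is only a crude stand-in for frequency, not a real frequency measurement. A sharp edge packs the whole black-to-white swing into a single step, while a gentle ramp spreads it out:

```python
def max_step(row):
    """Largest brightness change between neighbouring pixels in a row."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))


# Brightness values run from 0.0 (black) to 1.0 (white).
edge = [0.0] * 8 + [1.0] * 8        # sharp black-to-white edge
ramp = [i / 15 for i in range(16)]  # gentle black-to-white ramp

print(max_step(edge))            # 1.0
print(round(max_step(ramp), 3))  # 0.067
```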
When we talk about DCT and FFT and other similar transforms, we’re usually doing them on a portion of an image (e.g. for JPEG compression, edge detection, and so on). It makes the most sense to talk about the transforms, then, in the context of a transform block of a given size.
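Cutting an image into transform blocks might look like this minimal sketch. It assumes a greyscale image stored as a list of rows whose width and height are exact multiples of the block size; `split_into_blocks` is a hypothetical helper, not a library function:

```python
def split_into_blocks(image, size):
    """Split a 2D list of pixels into size x size blocks, left to right, top to bottom.

    Assumes the image dimensions are exact multiples of `size`; real codecs
    such as JPEG pad partial blocks at the edges instead of rejecting them.
    """
    height, width = len(image), len(image[0])
    if height % size or width % size:
        raise ValueError("image dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in image[top:top + size]])
    return blocks


# A 16 x 16 test image where each pixel value encodes its position: row * 16 + column.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
blocks = split_into_blocks(image, 8)
print(len(blocks))       # 4
print(blocks[1][0][:3])  # [8, 9, 10]  (the top-right block starts at column 8)
```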
Imagine, if you will, a 32 × 32 pixel block of image data. (This number is arbitrary.) Suppose that the image is a simple gradient that is white on the left side, black in the middle, and white on the right side. We would say that this signal has a period of 32 pixels, because it goes through one complete cycle from white to black to white again every 32 pixels.
We might arbitrarily call this frequency “1”: 1 cycle per 32 pixels, that is. Textbooks often express the same thing as a normalized frequency of 1/32 cycles per pixel, or as an angular frequency of 2π/32 radians per pixel. Either way, we’ll call it 1 for now, because the absolute unit truly is arbitrary; what matters is the relationship between frequencies. :-)
Suppose you have a second image that is white at one edge, then fades twice as quickly, so that it goes from white to black, to white, to black, and to white again at the other edge. We would then call that frequency “2” because it completes two cycles over the width of that 32-pixel block.
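Here’s a sketch that generates those two rows. It assumes the “gradient” has a cosine shape (the text doesn’t pin down the exact curve) and that brightness runs from 0.0 (black) to 1.0 (white); the function name `cosine_row` is made up for illustration:

```python
import math


def cosine_row(cycles, width=32):
    """Row of brightness values that starts white and completes `cycles`
    full white-black-white cycles across `width` pixels (assumed cosine shape)."""
    return [0.5 + 0.5 * math.cos(2 * math.pi * cycles * x / width)
            for x in range(width)]


freq1 = cosine_row(1)  # white -> black -> white
freq2 = cosine_row(2)  # white -> black -> white -> black -> white

print(round(freq1[0], 3), round(freq1[16], 3))  # 1.0 0.0  (white at the left, black in the middle)
print(round(freq2[8], 3), round(freq2[16], 3))  # 0.0 1.0  (black a quarter of the way in, white in the middle)
```

The last pixel (index 31) comes out very slightly darker than pure white, because a 33rd pixel, if there were one, would be the start of the next cycle.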
If we wanted to reproduce those simple images, we could literally say that every row consists of a signal with a frequency of 1 or 2, and you would know what the images look like. If the images went from black to 50% gray, you could do the same thing, but you’d have to say that they had a frequency of 1 or 2 at an intensity of 50%.
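In code, “frequency 1 at an intensity of 50%” is just a scaled copy of the full-contrast row. This sketch assumes a row is described by a base brightness plus a list of (cycles per block, amplitude) pairs; `render_row` and that description format are hypothetical:

```python
import math


def render_row(base, components, width=32):
    """Rebuild a row of pixels from a base brightness plus a list of
    (cycles_per_block, amplitude) cosine components."""
    return [base + sum(amp * math.cos(2 * math.pi * cycles * x / width)
                       for cycles, amp in components)
            for x in range(width)]


# Full-contrast frequency-1 row: white -> black -> white.
full = render_row(0.5, [(1, 0.5)])
# The same pattern at 50% intensity: 50% grey -> black -> 50% grey.
half = render_row(0.25, [(1, 0.25)])

print(round(full[0], 3), round(full[16], 3))  # 1.0 0.0
print(round(half[0], 3), round(half[16], 3))  # 0.5 0.0
```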
Real-world images, of course, aren’t just simple gradients. The brightness changes frequently and non-periodically as you scan from left to right. However, within a small enough block (e.g. 8 or 16 pixels), you can approximate that row of pixels as the sum of a series of signals: start with the average of the pixel values in the row, then blend in some amount of the “frequency 0.5” signal (black on one side, fading to white; a negative amount means subtracting that signal), then some amount of frequency 1, frequency 1.5, frequency 2, and so on in half-cycle steps. With as many signals as there are pixels in the row, the sum reproduces the row exactly.
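The DCT (specifically the DCT-II, the variant JPEG uses) computes exactly these amounts. Below is a slow, direct Python sketch using the orthonormal scaling, which is an assumption; JPEG and many libraries scale the coefficients differently. Coefficient k measures a cosine with k/2 cycles per block, so coefficient 1 is the “frequency 0.5” signal, coefficient 2 is frequency 1, and so on:

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for DCT coefficient k of an n-point transform."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(row):
    """Orthonormal DCT-II: how much of each cosine basis signal is in `row`."""
    n = len(row)
    return [_scale(k, n) * sum(v * math.cos(math.pi * k * (2 * x + 1) / (2 * n))
                               for x, v in enumerate(row))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct(): add up every basis signal weighted by its coefficient."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * k * (2 * x + 1) / (2 * n))
                for k, c in enumerate(coeffs))
            for x in range(n)]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # an arbitrary 8-pixel row
coeffs = dct(row)

# Coefficient 0 is the average, scaled by sqrt(n) in this normalization.
print(round(coeffs[0] / math.sqrt(len(row)), 3), sum(row) / len(row))  # 62.75 62.75
# Summing the weighted basis signals gives back the original pixels.
print([round(v, 6) for v in idct(coeffs)])
```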
An image differs from a one-dimensional signal in that it has frequency in both directions: it can get lighter and darker when moving both horizontally and vertically. For this reason, we use 2D DCT or FFT transforms instead of 1D ones. But the principle is basically the same: you can precisely represent an 8×8 image with an 8×8 grid of coefficients (buckets), one per pixel.
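The 2D DCT is separable: you can transform every row, then transform every column of the result. This sketch builds it from a direct orthonormal DCT-II (the scaling and the function names are assumptions for illustration) and checks that an 8×8 block turns into exactly 64 coefficients that reproduce the block:

```python
import math


def _scale(k, n):
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_1d(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    return [_scale(k, n) * sum(v * math.cos(math.pi * k * (2 * x + 1) / (2 * n))
                               for x, v in enumerate(values))
            for k in range(n)]


def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * k * (2 * x + 1) / (2 * n))
                for k, c in enumerate(coeffs))
            for x in range(n)]


def transpose(block):
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """2D DCT: transform every row, then every column of the result."""
    rows_done = [dct_1d(row) for row in block]
    return transpose([dct_1d(col) for col in transpose(rows_done)])


def idct_2d(coeffs):
    """Undo dct_2d by inverting the columns, then the rows."""
    cols_done = transpose([idct_1d(col) for col in transpose(coeffs)])
    return [idct_1d(row) for row in cols_done]


block = [[(r * 37 + c * 11) % 256 for c in range(8)] for r in range(8)]
coeffs = dct_2d(block)
restored = idct_2d(coeffs)

print(len(coeffs), len(coeffs[0]))  # 8 8
print(max(abs(restored[r][c] - block[r][c]) for r in range(8) for c in range(8)) < 1e-9)  # True
```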
Images are also more complex because of color, but we’ll ignore that for now and assume that we’re looking only at a single greyscale channel, such as you might get by looking at the red channel of a photograph in isolation.
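Pulling out a single channel is simple. This sketch assumes pixels are (R, G, B) tuples with values from 0 to 255; `channel_plane` is a hypothetical helper. (Real JPEG encoders typically convert to a brightness/colour-difference space such as YCbCr rather than transforming R, G and B directly, but the one-plane-at-a-time idea is the same.)

```python
# A tiny 2 x 2 RGB block: each pixel is an (R, G, B) tuple with values 0-255.
rgb_block = [
    [(255, 0, 0), (200, 100, 50)],
    [(10, 20, 30), (0, 0, 255)],
]


def channel_plane(block, index):
    """Pull one colour channel (0 = red, 1 = green, 2 = blue) out of an RGB block,
    giving a single greyscale plane that can be transformed on its own."""
    return [[pixel[index] for pixel in row] for row in block]


red = channel_plane(rgb_block, 0)
print(red)  # [[255, 200], [10, 0]]
```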
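As for how to read the results of a transform, that depends on whether you’re looking at a 1D transform or a 2D transform. For a 1D transform, you have a series of bins. The first is the average of all the input values (up to a scale factor that depends on how the transform is normalized). Each following bin holds the amount of a progressively higher-frequency signal to add. For a DCT, the second bin is the frequency 0.5 signal, the third is frequency 1, and so on in half-cycle steps; for an FFT, the second bin is frequency 1, the third is frequency 2, etc.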
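To see which bin a given frequency lands in, this sketch (again a direct DCT-II, with orthonormal scaling as an assumption) feeds in a row containing exactly one white-black-white cycle on a mid-grey average, sampled at pixel centres so it lines up with one of the DCT’s basis signals. The average shows up in bin 0 and the “frequency 1” signal lands entirely in bin 2:

```python
import math


def dct(row):
    """Orthonormal DCT-II; bin k measures a cosine with k/2 cycles per block."""
    n = len(row)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(v * math.cos(math.pi * k * (2 * x + 1) / (2 * n))
                  for x, v in enumerate(row))
            for k in range(n)]


n = 8
# One full cycle across the block (frequency 1), sampled at pixel centres,
# on top of a mid-grey average of 0.5.
row = [0.5 + 0.5 * math.cos(2 * math.pi * (x + 0.5) / n) for x in range(n)]
bins = dct(row)

largest = max(range(1, n), key=lambda k: abs(bins[k]))
print(largest)                            # 2
print(round(bins[0] / math.sqrt(n), 6))  # 0.5  (the average)
print(round(bins[2], 6))                 # 1.0
```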
For a 2D transform, you have an n×n grid of values. The upper left is typically that average. As you go in the horizontal direction, each bucket contains the amount of signal to mix in with a progressively higher horizontal frequency, and as you go in the vertical direction, each bucket contains the amount of signal to mix in with a progressively higher vertical frequency. Buckets away from the top row and left column combine both, holding patterns that vary horizontally and vertically at once.
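A quick way to see that layout: build a block that varies only horizontally (every row identical) and check which coefficients are non-zero. This sketch assumes the orthonormal DCT-II and indexes the result as `coeffs[v][u]`, with `v` the vertical and `u` the horizontal frequency index:

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(v * math.cos(math.pi * k * (2 * x + 1) / (2 * n))
                  for x, v in enumerate(values))
            for k in range(n)]


def dct_2d(block):
    """Rows first, then columns; result[v][u] has vertical index v, horizontal index u."""
    rows_done = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows_done)]
    return [list(r) for r in zip(*cols)]


n = 8
# Every row is identical, so the block changes only from left to right.
row = [0.5 + 0.5 * math.cos(2 * math.pi * (x + 0.5) / n) for x in range(n)]
block = [row[:] for _ in range(n)]
coeffs = dct_2d(block)

nonzero = [(v, u) for v in range(n) for u in range(n) if abs(coeffs[v][u]) > 1e-9]
print(nonzero)  # [(0, 0), (0, 2)]  (only the top row, vertical frequency 0, is used)
```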
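That is the complete story if you’re talking about a DCT. By contrast, each bin of an FFT contains a complex number with real and imaginary parts, which together encode both how much of a frequency is present and its phase (how far the pattern is shifted). The FFT is still based on the same basic idea, but it uses whole sine and cosine cycles per bin instead of the DCT’s half-cycle steps, and for real-valued input the upper half of the bins mirrors the lower half (as complex conjugates), so the math is hairier. :-)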
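Here’s a sketch of that difference using a direct discrete Fourier transform (an FFT computes the same bins, just faster; the function name `dft` is made up here). For a real-valued row with one cycle per block, bin 1 and its mirror image, bin 7, each pick up half of the signal, and shifting the pattern’s phase moves the value from the real part to the imaginary part:

```python
import cmath
import math


def dft(values):
    """Direct (slow) discrete Fourier transform; an FFT returns the same bins."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * x / n) for x, v in enumerate(values))
            for k in range(n)]


n = 8
# One full cycle per block (frequency 1) on a mid-grey background.
row = [0.5 + 0.5 * math.cos(2 * math.pi * x / n) for x in range(n)]
bins = dft(row)
print(round(abs(bins[0]), 6), round(abs(bins[1]), 6), round(abs(bins[7]), 6))  # 4.0 2.0 2.0

# For real input, bin n - k is the complex conjugate of bin k.
print(all(abs(bins[n - k] - bins[k].conjugate()) < 1e-9 for k in range(1, n)))  # True

# Same frequency, shifted a quarter cycle (sine instead of cosine).
shifted = [0.5 + 0.5 * math.sin(2 * math.pi * x / n) for x in range(n)]
b1 = dft(shifted)[1]
print(abs(b1.real) < 1e-9, round(b1.imag, 6))  # True -2.0
```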
Of course, the most common reason to generate these sorts of transforms is to go one step further and throw some of the data away. For example, the DCT is used in JPEG compression. By reading the values in a zig-zag pattern starting with the upper left (the average) and moving towards the lower right, the most important data (the average and low-frequency information) gets recorded first, followed by progressively higher-frequency data. At some point, you basically say “this is good enough” and throw away the highest-frequency data. (Strictly speaking, JPEG quantizes the coefficients, storing the high-frequency ones more coarsely so that many of them become zero, and those trailing zeros then compress very well.) This essentially smooths the image by discarding its fine detail, but still gives you approximately the correct image.
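This sketch imitates that idea: it computes a 2D DCT of an 8×8 block, keeps only the first few coefficients in zig-zag order, zeroes the rest, and transforms back. It’s a simplification (real JPEG quantizes each coefficient rather than cutting off at a fixed count), and the test block and helper names are made up for illustration:

```python
import math


def _scale(k, n):
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_1d(v):
    n = len(v)
    return [_scale(k, n) * sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                               for i, x in enumerate(v)) for k in range(n)]


def idct_1d(c):
    n = len(c)
    return [sum(_scale(k, n) * ck * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for k, ck in enumerate(c)) for i in range(n)]


def apply_2d(fn, block):
    """Apply a 1D transform to every row, then to every column."""
    rows = [fn(r) for r in block]
    cols = [fn(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def zigzag_order(n):
    """(row, col) positions in JPEG-style zig-zag order, starting at the top-left."""
    order = []
    for s in range(2 * n - 1):  # s = row + col along each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        order.extend((r, s - r) for r in (rows if s % 2 else reversed(rows)))
    return order


def keep_first(coeffs, count):
    """Zero every coefficient after the first `count` in zig-zag order."""
    n = len(coeffs)
    kept = [[0.0] * n for _ in range(n)]
    for r, c in zigzag_order(n)[:count]:
        kept[r][c] = coeffs[r][c]
    return kept


n = 8
block = [[128 + 60 * math.cos(r / 2) * math.sin(c / 3) + 5 * ((r * c) % 3)
          for c in range(n)] for r in range(n)]
coeffs = apply_2d(dct_1d, block)
print(zigzag_order(n)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

rms_errors = []
for count in (1, 6, 15, 64):
    approx = apply_2d(idct_1d, keep_first(coeffs, count))
    sq = sum((approx[r][c] - block[r][c]) ** 2 for r in range(n) for c in range(n))
    rms_errors.append(math.sqrt(sq / (n * n)))
    print(count, round(rms_errors[-1], 3))
```

Because this DCT is orthonormal, each extra coefficient can only remove error, so the RMS error never goes up as `count` grows, and keeping all 64 reproduces the block.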
And IIRC, FFTs are also sometimes used for edge detection, where you throw away all but the high-frequency components (in effect, a high-pass filter) as a means of detecting the areas of high contrast at sharp edges.
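A one-dimensional sketch of that idea: take the DFT of a row containing a single sharp edge, zero every bin below a cutoff (including the average), and transform back. The `cutoff` value and helper names are assumptions for illustration, and this is a crude ideal high-pass filter rather than a production edge detector:

```python
import cmath
import math


def dft(values):
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * x / n) for x, v in enumerate(values))
            for k in range(n)]


def idft(bins):
    n = len(bins)
    return [sum(b * cmath.exp(2j * math.pi * k * x / n) for k, b in enumerate(bins)) / n
            for x in range(n)]


def high_pass(values, cutoff):
    """Zero every bin below `cutoff` cycles per block (including bin 0), then invert."""
    n = len(values)
    bins = dft(values)
    kept = [b if min(k, n - k) >= cutoff else 0 for k, b in enumerate(bins)]
    return [v.real for v in idft(kept)]


row = [0.0] * 8 + [1.0] * 8  # dark half, light half: one sharp edge between indices 7 and 8
edges = high_pass(row, cutoff=3)
print([round(v, 2) for v in edges])

strongest = sorted(range(len(row)), key=lambda i: abs(edges[i]), reverse=True)[:4]
print(sorted(strongest))  # [0, 7, 8, 15]
```

The largest outputs land at indices 7 and 8, on either side of the edge, and also at 0 and 15: the DFT treats the row as repeating, so the jump from the light end back to the dark start counts as an edge too.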
National Instruments has a nice article that explains this with pictures. :-)