A gentle introduction to video technology. Although it's aimed at software developers / engineers, we want to make it easy for anyone to learn. This idea was born during a mini workshop for newcomers to video technology.

The goal is to introduce some digital video concepts with a simple vocabulary, lots of visual elements and practical examples when possible, and make this knowledge available everywhere. Please, feel free to send corrections and suggestions and to improve it.
There will be hands-on sections which require you to have docker installed and this repository cloned.

```bash
git clone https://github.com/leandromoreira/digital_video_introduction.git
cd digital_video_introduction
./setup.sh
```
WARNING: when you see a `./s/ffmpeg` or `./s/mediainfo` command, it means we're running a containerized version of that program, which already includes all the needed requirements.

All the hands-on should be performed from the folder where you cloned this repository. For the jupyter examples you must start the server with `./s/start_jupyter.sh`, then copy the URL and use it in your browser.
- added DRM system
- released version 1.0.0
- added simplified Chinese translation
- Intro
- Index
- Basic terminology
- Redundancy removal
- How does a video codec work?
- Online streaming
- How to use jupyter
- Conferences
- References
An image can be thought of as a 2D matrix. If we think about colors, we can extrapolate this idea, seeing this image as a 3D matrix where the additional dimensions are used to provide color data.

If we choose to represent these colors using the primary colors (red, green and blue), we define three planes: the first one for red, the second for green, and the last one for blue.

We'll call each point in this matrix a pixel (picture element). One pixel represents the intensity (usually a numeric value) of a given color. For example, a red pixel means 0 of green, 0 of blue and maximum of red. The pink color pixel can be formed with a combination of the three colors. Using a representative numeric range from 0 to 255, the pink pixel is defined by Red=255, Green=192 and Blue=203.
Other ways to encode a color image
Many other possible models may be used to represent the colors that make up an image. We could, for instance, use an indexed palette where we’d only need a single byte to represent each pixel instead of the 3 needed when using the RGB model. In such a model we could use a 2D matrix instead of a 3D matrix to represent our color, this would save on memory but yield fewer color options.
For instance, look at the picture down below. The first face is fully colored. The others are the red, green, and blue planes (shown as gray tones).
We can see that the red color will be the one that contributes more (the brightest parts in the second face) to the final color, while the blue color contribution can be mostly seen only in Mario's eyes (last face) and part of his clothes. See how all planes contribute less (darkest parts) to Mario's mustache.
Each color intensity requires a certain amount of bits; this quantity is known as the bit depth. Let's say we spend 8 bits (accepting values from 0 to 255) per color (plane); therefore we have a color depth of 24 bits (8 bits * 3 planes R/G/B), and we can also infer that we could use 2 to the power of 24 different colors.
It’s greatto learnhow an image is captured from the world to the bits.
Another property of an image is theresolution, which is the number of pixels in one dimension. It is often presented as width × height, for example, the4 × 4image below.
Hands-on: play around with image and color
You can play around with image and colors using jupyter (python, numpy, matplotlib, etc.).

You can also learn how image filters (edge detection, sharpen, blur, ...) work.
Another property we can see while working with images or video is the aspect ratio, which simply describes the proportional relationship between the width and height of an image or pixel.

When people say this movie or picture is 16x9 they usually are referring to the Display Aspect Ratio (DAR); however, we also can have different shapes of individual pixels, which we call the Pixel Aspect Ratio (PAR).
DVD is DAR 4:3

Although the real resolution of a DVD is 704x480, it still keeps a 4:3 aspect ratio because it has a PAR of 10:11 (704x10 / 480x11).
Finally, we can define a video as a succession of n frames in time, where time can be seen as another dimension; n is then the frame rate or frames per second (FPS).

The number of bits per second needed to show a video is its bit rate.

bit rate = width * height * bit depth * frames per second
For example, a video with 30 frames per second, 24 bits per pixel, and a resolution of 480x240 will need 82,944,000 bits per second or 82.944 Mbps (30 x 480 x 240 x 24) if we don't employ any kind of compression.
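The formula above can be checked with a couple of lines of Python (a minimal sketch reproducing the 82.944 Mbps number):

```python
width, height = 480, 240
bit_depth = 24  # bits per pixel
fps = 30

# bit rate = width * height * bit depth * frames per second
bit_rate = width * height * bit_depth * fps
print(bit_rate)               # 82944000 bits per second
print(bit_rate / 1_000_000)   # 82.944 Mbps
```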
When the bit rate is nearly constant it's called constant bit rate (CBR), but it also can vary, which is then called variable bit rate (VBR).
This graph shows a constrained VBR which doesn’t spend too many bits while the frame is black.
In the early days, engineers came up with a technique for doubling the perceived frame rate of a video display without consuming extra bandwidth. This technique is known as interlaced video; it basically sends half of the screen in 1 "frame" and the other half in the next "frame".
Today, screens render mostly using the progressive scan technique. Progressive is a way of displaying, storing, or transmitting moving images in which all the lines of each frame are drawn in sequence.

Now we have an idea about how an image is represented digitally, how its colors are arranged, how many bits per second we spend to show a video, whether it's constant (CBR) or variable (VBR), with a given resolution using a given frame rate, and many other terms such as interlaced, PAR and others.
Hands-on: Check video properties
You can check most of the explained properties with ffmpeg or mediainfo.
We learned that it’s not feasible to use video without any compression;a single one hour videoat 720 p resolution with 30 fps wouldrequire (GB) *. Sinceusing solely lossless data compression algorithmslike DEFLATE (used in PKZIP, Gzip, and PNG),won’tdecrease the required bandwidth sufficiently we need to find other ways to compress the video.
*We found this number by multiplying (x) ******************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************* (x) ************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************ (x) x 3600 ( width, height, bits per pixel, fps and time in seconds)
In order to do this, we can exploit how our vision works. We're better at distinguishing brightness than colors. There are repetitions in time: a video contains a lot of images with few changes. And there are repetitions within the image: each frame also contains many areas using the same or similar color.
Colors, Luminance and our eyes
Our eyes are more sensitive to brightness than colors; you can test it for yourself by looking at this picture.

If you are unable to see that the colors of squares A and B are identical on the left side, that's fine, it's our brain playing tricks on us to pay more attention to light and dark than to color. There is a connector, with the same color, on the right side, so we (our brain) can easily spot that, in fact, they're the same color.
Simplistic explanation of how our eyes work

The eye is a complex organ composed of many parts, but we are mostly interested in the cone and rod cells. The eye contains about 120 million rod cells and 6 million cone cells.

To oversimplify, let's try to put colors and brightness in the eye's parts' functions. The rod cells are mostly responsible for brightness while the cone cells are responsible for color. There are three types of cones, each with a different pigment, namely: S-cones (blue), M-cones (green) and L-cones (red).
Since we have many more rod cells (brightness) than cone cells (color), one can infer that we are more capable of distinguishing dark and light than colors.
Contrast sensitivity functions
Researchers of experimental psychology and many other fields have developed many theories on human vision. One of them is called contrast sensitivity functions. They relate to the spatial and temporal frequency of light and describe, for a given initial light level, how much change is required before an observer reports a change. Notice the plural of the word "function": this is because we can measure contrast sensitivity functions not only with black and white but also with colors. The results of these experiments show that in most cases our eyes are more sensitive to brightness than to color.
Once we know that we're more sensitive to luma (the brightness in an image) we can try to exploit it.
Color model
We first learned how colored images work using the RGB model, but there are other models too. In fact, there is a model that separates luma (brightness) from chrominance (colors) and it is known as YCbCr*.
*there are more models which do the same separation.
This color model uses Y to represent the brightness and two color channels: Cb (chroma blue) and Cr (chroma red). YCbCr can be derived from RGB and it also can be converted back to RGB. Using this model we can create fully colored images as we can see down below.
Converting between YCbCr and RGB
Some may argue: how can we produce all the colors without using green?

To answer this question, we'll walk through a conversion from RGB to YCbCr. We'll use the coefficients from the standard BT.601 that was recommended by the ITU-R group*. The first step is to calculate the luma; we'll use the constants suggested by ITU and replace the RGB values.
Y = 0.299R + 0.587G + 0.114B
Once we have the luma, we can split the colors (chroma blue and red):

Cb = 0.564(B - Y)
Cr = 0.713(R - Y)
And we can also convert it back and even get the green by using YCbCr:

R = Y + 1.402Cr
B = Y + 1.772Cb
G = Y - 0.344Cb - 0.714Cr
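A minimal sketch of the conversion above in Python (the function names are ours; the coefficients are the BT.601 ones from the text). Note how the round trip recovers green without ever having transmitted it:

```python
def rgb_to_ycbcr(r, g, b):
    # BT.601 luma, then the two chroma difference channels
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    # inverse conversion: green is derived from Y, Cb and Cr
    r = y + 1.402 * cr
    b = y + 1.772 * cb
    g = y - 0.344 * cb - 0.714 * cr
    return r, g, b

y, cb, cr = rgb_to_ycbcr(0, 255, 0)  # pure green
r, g, b = ycbcr_to_rgb(y, cb, cr)
print(round(r), round(g), round(b))  # roughly 0 255 0
```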
*groups and standards are common in digital video; they usually define the standards, for instance: what is 4K? What frame rate should we use? Resolution? Color model?

Generally, displays (monitors, TVs, screens, etc.) utilize only the RGB model, organized in different manners; see some of them magnified below:
Chroma subsampling
With the image represented as luma and chroma components, we can take advantage of the human visual system's greater sensitivity for luma resolution rather than chroma to selectively remove information. Chroma subsampling is the technique of encoding images using less resolution for chroma than for luma.
How much should we reduce the chroma resolution?! It turns out that there are already some schemas that describe how to handle resolution and the merge (final color = Y + Cb + Cr).

These schemas are known as subsampling systems and are expressed as a 3 part ratio - a:x:y - which defines the chroma resolution in relation to an a x 2 block of luma pixels.

- a is the horizontal sampling reference (usually 4)
- x is the number of chroma samples in the first row of a pixels (horizontal resolution in relation to a)
- y is the number of changes of chroma samples between the first and second rows of a pixels.
An exception to this exists with 4:1:0, which provides a single chroma sample within each 4 x 4 block of luma resolution.

Common schemes used in modern codecs are: 4:4:4 (no subsampling), 4:2:2, 4:1:1, 4:2:0, 4:1:0 and 3:1:1.
YCbCr 4:2:0 merge

Here's a merged piece of an image using YCbCr 4:2:0; notice that we only spend 12 bits per pixel.
You can see the same image encoded by the main chroma subsampling types, images in the first row are the final YCbCr while the last row of images shows the chroma resolution. It’s indeed a great win for such small loss.
Previously we had calculated that we needed 278 GB of storage to keep a video file with one hour at 720p resolution and 30 FPS. If we use YCbCr 4:2:0 we can cut this size in half (139 GB)*, but it is still far from ideal.
*we found this value by multiplying width, height, bits per pixel and fps. Previously we needed 24 bits, now we only need 12.
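That footnote's math can be sketched for both cases, 24 bits per pixel (RGB) and 12 bits per pixel (YCbCr 4:2:0):

```python
width, height, fps, seconds = 1280, 720, 30, 3600  # one hour of 720p at 30fps

def storage_gb(bits_per_pixel):
    # width * height * bits per pixel * fps * time, then bits -> GB
    bits = width * height * bits_per_pixel * fps * seconds
    return bits / 8 / 1024 / 1024 / 1024

print(round(storage_gb(24)))  # 278 GB with 24 bits per pixel (RGB)
print(round(storage_gb(12)))  # 139 GB with 12 bits per pixel (YCbCr 4:2:0)
```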
Hands-on: Check YCbCr histogram
You can check the YCbCr histogram with ffmpeg. This scene has a higher blue contribution, which is shown by the histogram.
Hands-on: A video with a single I-frame

Since a P-frame uses less data, why can't we encode an entire video with a single I-frame and all the rest being P-frames?

After you encode this video, start to watch it and do a seek to an advanced part of the video; you'll notice it takes some time to really move to that part. That's because a P-frame needs a reference frame (an I-frame, for instance) to be rendered.

Another quick test you can do is to encode a video using a single I-frame and then encode it inserting an I-frame every 2s, and check the size of each rendition.
B Frame (bi-predictive)
What about referencing the past and future frames to provide even better compression?! That's basically what a B-frame is.
Hands-on: Compare videos with B-frame
You can generate two renditions, first with B-frames and other withno B-frames at alland check the size of the file as well as the quality.
Summary
These frame types are used to provide better compression. We'll look at how this happens in the next section, but for now we can think of the I-frame as expensive, while the P-frame is cheaper, but the cheapest is the B-frame.
Temporal redundancy (inter prediction)
Let’s explore the options we have to reduce therepetitions in time, this type of redundancy can be solved with techniques ofinter prediction.
We will try tospend fewer bitsto encode the sequence of frames 0 and 1.
One thing we can do it’s a subtraction, we simplysubtract frame 1 from frame 0and we get just what we need toencode the residual.
But what if I tell you that there is a better method which uses even fewer bits?! First, let's treat frame 0 as a collection of well-defined partitions and then we'll try to match the blocks from frame 0 on frame 1. We can think of it as motion estimation.
We could estimate that the ball moved from x=0, y=25 to x=6, y=26; the x and y values are the motion vectors. One further step we can do to save bits is to encode only the motion vector difference between the last block position and the predicted one, so the final motion vector would be x=6 (6-0), y=1 (26-25).
In a real-world situation, this ball would be sliced into n partitions, but the process is the same.

The objects on the frame move in a 3D way; the ball can become smaller when it moves to the background. It's normal that we won't find a perfect match for the block we tried to match. Here's a superposed view of our estimation vs the real picture.

But we can see that when we apply motion estimation the data to encode is smaller than when using simple delta frame techniques.
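A minimal sketch of the matching step: a naive full search over a numpy frame, keeping the position with the smallest sum of absolute differences (SAD). Real encoders use much smarter search strategies; the names and toy data here are ours:

```python
import numpy as np

def best_match(block, frame, block_size):
    """Naive full search: slide the block over the frame and keep
    the (x, y) position with the smallest sum of absolute differences."""
    h, w = frame.shape
    best_pos, best_sad = (0, 0), float("inf")
    for y in range(h - block_size + 1):
        for x in range(w - block_size + 1):
            candidate = frame[y:y + block_size, x:x + block_size]
            sad = np.abs(candidate.astype(int) - block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_pos = sad, (x, y)
    return best_pos

frame1 = np.zeros((8, 8), dtype=np.uint8)
frame1[3:5, 4:6] = 200                   # the "ball" in frame 1
block = np.full((2, 2), 200, np.uint8)   # the ball block taken from frame 0

print(best_match(block, frame1, 2))  # (4, 3): where the block moved to
```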
How real motion compensation would look
This technique is applied to all blocks; very often a ball would be partitioned into more than one block. Source: https://web.stanford.edu/class/ee398a/handouts/lectures/EE398a_MotionEstimation_2012.pdf

You can play around with these concepts using jupyter.
Hands-on: See the motion vectors
We can generate a video with the inter prediction (motion vectors) with ffmpeg.

Or we can use the Intel Video Pro Analyzer (which is paid, but there is a free trial version which limits you to only the first 10 frames).
Spatial redundancy (intra prediction)
If we analyze each frame in a video we'll see that there are also many areas that are correlated.
Let’s walk through an example. This scene is mostly composed of blue and white colors.
This is an I-frame and we can't use previous frames to predict from, but we still can compress it. We will encode the red block selection. If we look at its neighbors, we can estimate that there is a trend of colors around it.

We will predict that the frame will continue to spread the colors vertically; it means that the colors of the unknown pixels will hold the values of their neighbors.
Our prediction can be wrong; for that reason, we need to apply this technique (intra prediction) and then subtract the real values, which gives us the residual block, resulting in a much more compressible matrix compared to the original.
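A toy sketch of vertical intra prediction with numpy: spread the neighbors above the block downwards, then keep only the residual (the 4x4 values here are made up for illustration):

```python
import numpy as np

# the reconstructed neighbor pixels just above our 4x4 block
top_row = np.array([50, 50, 60, 60], dtype=np.int16)

# the actual 4x4 block we want to encode
block = np.array([[50, 50, 60, 61],
                  [50, 51, 60, 60],
                  [49, 50, 60, 60],
                  [50, 50, 59, 60]], dtype=np.int16)

# vertical prediction: repeat the top neighbors down the block
prediction = np.tile(top_row, (4, 1))
residual = block - prediction
print(residual)  # mostly zeros: much easier to compress than the block itself
```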
Hands-on: Check intra predictions
You can generate a video with macro blocks and their predictions with ffmpeg. Please check the ffmpeg documentation to understand the meaning of each block color.

Or we can use the Intel Video Pro Analyzer (which is paid, but there is a free trial version which limits you to only the first 10 frames).
What? Why? How?
What?It’s a piece of software / hardware that compresses or decompresses digital video.Why?Market and society demands higher quality videos with limited bandwidth or storage. Remember when wecalculated the needed bandwidthfor 30 frames per second, 24 bits per pixel, resolution of a 480 x 240 video ? It was82. (Mbps) with no compression applied. It’s the only way to deliver HD / FullHD / 4K in TVs and the Internet.How?We’ll take a brief look at the major techniques here.
CODEC vs Container
One common mistake that beginners often make is to confuse digital video CODEC and digital video container. We can think of containers as a wrapper format which contains metadata of the video (and possibly audio too), and the compressed video can be seen as its payload.

Usually, the extension of a video file defines its video container. For instance, the file video.mp4 is probably a MPEG-4 Part 14 container, and a file named video.mkv is probably a Matroska. To be completely sure about the codec and container format, we can use ffmpeg or mediainfo.
History
Before we jump into the inner workings of a generic codec, let’s look back to understand a little better about some old video codecs.
The video codec H.261 was born in 1990 (technically 1988), and it was designed to work with data rates of 64 kbit/s. It already uses ideas such as chroma subsampling, macroblocks, etc. In 1995, the H.263 video codec standard was published and continued to be extended until 2001.
In 2003, the first version of H.264/AVC was completed. In the same year, a company called TrueMotion released their royalty-free lossy video compression codec called VP3. In 2008, Google bought this company, releasing VP8 in the same year. In December of 2012, Google released VP9, and it's supported by roughly ¾ of the browser market (mobile included).
AV1 is a new royalty-free and open source video codec that's being designed by the Alliance for Open Media (AOMedia), which is composed of companies such as Google, Mozilla, Microsoft, Amazon, Netflix, AMD, ARM, NVidia, Intel and Cisco, among others. The first version (0.1.0) of the reference codec was published on April 7, 2016.
Xiph (Mozilla) was working on Daala, and Cisco open-sourced its royalty-free video codec called Thor.

Then MPEG LA first announced annual caps for HEVC (H.265) and fees 8 times higher than H.264, but soon they changed the rules again:
- no annual cap,
- content fee (0.5% of revenue) and
- per-unit fees about 10 times higher than H.264.
The Alliance for Open Media was created by companies from hardware manufacturing (Intel, AMD, ARM, Nvidia, Cisco), content delivery (Google, Netflix, Amazon), browser maintenance (Google, Mozilla), and others.

The companies had a common goal, a royalty-free video codec, and then AV1 was born with a much simpler patent license. Timothy B. Terriberry did an awesome presentation, which is the source of this section, about the AV1 conception, its license model and its current state.
You’ll be surprised to know that you cananalyze the AV1 codec through your browser, go tohttp://aomanalyzer.org/
PS: If you want to learn more about the history of the codecs you must learn the basics behindvideo compression patents.
| | a | e | r | t |
|---|---|---|---|---|
| Probability | 0.3 | 0.3 | 0.2 | 0.2 |
We can assign unique binary codes (preferably small ones) to the most probable symbols and bigger codes to the least probable ones.
| | a | e | r | t |
|---|---|---|---|---|
| Probability | 0.3 | 0.3 | 0.2 | 0.2 |
| Binary code | 0 | 10 | 110 | 1110 |
Let’s compress the stream (eat) , assuming we would spend 8 bits for each symbol, we would spend24 bitswithout any compression. But in case we replace each symbol for its code we can save space.
The first step is to encode the symbolewhich is10
and the second symbol isawhich is added (not in a mathematical way)[10] [0]
and finally the third symboltwhich makes our final compressed bitstream to be[0] [1110]
or1001110
which only requires (7 bits) (3.4 times less space than the original).
Notice that each code must be a uniquely prefixed code; Huffman coding can help you to find these numbers. Though it has some issues, there are video codecs that still offer this method, and it's the algorithm for many applications which require compression.

Both encoder and decoder must know the symbol table with its codes; therefore, you need to send the table too.
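The eat example above can be sketched in a few lines of Python (the code table is the one from the text; the function names are ours):

```python
codes = {"a": "0", "e": "10", "r": "110", "t": "1110"}

def encode(stream):
    return "".join(codes[symbol] for symbol in stream)

def decode(bits):
    # prefix codes need no separators: read bits until a code matches
    inverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

compressed = encode("eat")
print(compressed)          # 1001110 (7 bits instead of 24)
print(decode(compressed))  # eat
```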
Arithmetic coding:
Let’s suppose we have a stream of the symbols:a, (e) , (r) ,sandtand their probability is represented by this table.
s | |||||
---|---|---|---|---|---|
Probability | 0.3 | 0.3 | 0. 15 | 0. 05 | 0.2 |
With this table in mind, we can build ranges containing all the possible symbols, sorted by the most frequent.
Now let’s encode the streameat, we pick the first symbol (e) which is located within the subrange0.3 to 0.6(but not included) and we take this subrange and split it again using the same proportions used before but within this new range.
Let’s continue to encode our streameat, now we take the second symbolawhich is within the new subrange0.3 to 0. 39and then we take our last symbol (t) and we do the same process again and we get the last subrange0. 354 to 0. 372.
We just need to pick a number within the last subrange0. 354 to 0. 372, let’s choose0. 36but we could choose any number within this subrange. Withonlythis number we’ll be able to recover our original streameat. If you think about it, it’s like if we were drawing a line within ranges of ranges to encode our stream.
The reverse process (AKA decoding) is equally easy: with our number 0.36 and our original range, we can run the same process, but now using this number to reveal the stream encoded behind it.

With the first range, we notice that our number fits the slice of the symbol e, therefore it's our first symbol. Now we split this subrange again, doing the same process as before, and we'll notice that 0.36 fits the symbol a; after we repeat the process, we come to the last symbol t (forming our original encoded stream eat).
Both encoder and decoder must know the symbol probability table; therefore, you need to send the table.
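The range-narrowing walkthrough above can be sketched like this (not a real arithmetic coder, which works with integers and renormalization; this just reproduces the 0.354 to 0.372 interval):

```python
# cumulative ranges built from the probability table, sorted by most frequent
ranges = {"a": (0.0, 0.3), "e": (0.3, 0.6), "t": (0.6, 0.8),
          "r": (0.8, 0.95), "s": (0.95, 1.0)}

def encode(stream):
    low, high = 0.0, 1.0
    for symbol in stream:
        span = high - low
        sym_low, sym_high = ranges[symbol]
        # narrow the current interval to the symbol's slice of it
        low, high = low + span * sym_low, low + span * sym_high
    return low, high

low, high = encode("eat")
print(low, high)           # about 0.354 and 0.372
print(low <= 0.36 < high)  # True: 0.36 alone identifies the stream "eat"
```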
Pretty neat, isn’t it? People are damn smart to come up with a such solution, somevideo codecs usethis technique (or at least offer it as an option).
The idea is to lossless compress the quantized bitstream, for sure this article is missing tons of details, reasons, trade-offs and etc. Butyou should learn moreas a developer. Newer codecs are trying to use differententropy coding algorithms like ANS.
Hands-on: CABAC vs CAVLC
You can generate two streams, one with CABAC and the other with CAVLC, and compare the time it took to generate each of them as well as the final size.
6th step – bitstream format
After we did all these steps we need to pack the compressed frames and the context of these steps. We need to explicitly inform the decoder about the decisions taken by the encoder, such as bit depth, color space, resolution, prediction info (motion vectors, intra prediction direction), profile, level, frame rate, frame type, frame number and much more.
We’re going to study, superficially, the H. 264 Bitstream. Our first step is togenerate a minimal H. 264* (bitstream) , we can do that using our own repository andffmpeg.
./ s / ffmpeg -i /files/i/minimal.png -pix_fmt yuv 420 p / files / v / minimal_yuv 420 .H 264
*ffmpeg adds, by default, all the encoding parameter as aSEI NAL, soon we’ll define what is a NAL.
This command will generate a raw h 264 bitstream with asingle frame, (x) , with color space yuv 420 and using the following image as the frame.
H. 264 Bitstream
The AVC (H.264) standard defines that the information will be sent in macro frames (in the network sense), called NAL (Network Abstraction Layer). The main goal of the NAL is the provision of a "network-friendly" video representation; this standard must work on TVs (stream based), the Internet (packet based) and others.
There is a synchronization marker to define the boundaries of the NAL units. Each synchronization marker holds the value 0x00 0x00 0x01, except for the very first one, which is 0x00 0x00 0x00 0x01. If we run the hexdump on the generated h264 bitstream, we can identify at least three NALs at the beginning of the file.
As we said before, the decoder needs to know not only the picture data but also the details of the video: frames, colors, used parameters, and others. The first byte of each NAL defines its category and type.
| NAL type id | Description |
|---|---|
| 0 | Undefined |
| 1 | Coded slice of a non-IDR picture |
| 2 | Coded slice data partition A |
| 3 | Coded slice data partition B |
| 4 | Coded slice data partition C |
| 5 | IDR Coded slice of an IDR picture |
| 6 | SEI Supplemental enhancement information |
| 7 | SPS Sequence parameter set |
| 8 | PPS Picture parameter set |
| 9 | Access unit delimiter |
| 10 | End of sequence |
| 11 | End of stream |
| ... | ... |
Usually, the first NAL of a bitstream is an SPS; this type of NAL is responsible for informing the general encoding variables like profile, level, resolution and others.

If we skip the first synchronization marker, we can decode the first byte to know what type of NAL the first one is.
For instance, the first byte after the synchronization marker is 01100111, where the first bit (0) is the forbidden_zero_bit field, the next 2 bits (11) tell us the nal_ref_idc field, which indicates whether this NAL is a reference field or not, and the remaining 5 bits (00111) inform us of the nal_unit_type field, in this case an SPS (7) NAL unit.
The second byte (binary=01100100, hex=0x64, dec=100) of an SPS NAL is the profile_idc field, which shows the profile that the encoder has used; in this case, we used the constrained high profile, a high profile without support for B (bi-predictive) slices.
When we read the H.264 bitstream spec for an SPS NAL, we'll find for each parameter a name, a category and a description; for instance, let's look at the pic_width_in_mbs_minus_1 and pic_height_in_map_units_minus_1 fields.

| Parameter name | Category | Description |
|---|---|---|
| pic_width_in_mbs_minus_1 | 0 | ue(v) |
| pic_height_in_map_units_minus_1 | 0 | ue(v) |

ue(v): unsigned integer Exp-Golomb-coded
If we do some math with the values of these fields, we will end up with the resolution. We can represent a 1920 x 1080 using a pic_width_in_mbs_minus_1 with the value of 119 ((119 + 1) * macroblock_size = 120 * 16 = 1920), again saving space; instead of encoding 1920, we did it with 119.
If we continue to examine our created video with a binary view (e.g. xxd -b -c 11 v/minimal_yuv420.h264), we can skip to the last NAL, which is the frame itself.
We can see its first 6 bytes' values: 01100101 10001000 10000100 00000000 00100001 11111111. As we already know, the first byte tells us what type of NAL it is; in this case (00101) it's an IDR Slice (5) and we can further inspect it:
Using the spec info, we can decode the type of slice (slice_type) and the frame number (frame_num), among other important fields.
In order to get the values of some fields (ue(v), me(v), se(v) or te(v)), we need to decode them using a special decoder called Exponential-Golomb. This method is very efficient at encoding variable values, mostly when there are many default values.

The values of slice_type and frame_num of this video are 7 (I slice) and 0 (the first frame).
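A sketch of an ue(v) Exponential-Golomb decoder: count the leading zeros, then read that many more bits (this is the textbook algorithm working on a bit string, not production parsing code):

```python
def decode_ue(bits):
    """Decode one unsigned Exp-Golomb value from a string of bits,
    returning (value, remaining_bits)."""
    leading_zeros = 0
    while bits[leading_zeros] == "0":
        leading_zeros += 1
    # value = 2^zeros - 1 + the next `zeros` bits read as an integer
    prefix_end = leading_zeros + 1
    suffix = bits[prefix_end:prefix_end + leading_zeros]
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, bits[prefix_end + leading_zeros:]

print(decode_ue("1"))      # (0, ''): small values use very few bits
print(decode_ue("010"))    # (1, '')
print(decode_ue("0000001111000"))  # (119, ''): e.g. pic_width_in_mbs_minus_1
```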
We can see the bitstream as a protocol, and if you want or need to learn more about this bitstream, please refer to the ITU H.264 spec. Here's a macro diagram which shows where the picture data (compressed YUV) resides.
We can explore other bitstreams like the VP9 bitstream, H.265 (HEVC) or even our new best friend AV1 bitstream. Do they all look similar? No, but once you learn one you can easily get the others.
Hands-on: Inspect the H.264 Bitstream
We can generate a single frame video and use mediainfo to inspect its H.264 bitstream. In fact, you can even see the source code that parses h264 (AVC) bitstream.
We can also use the Intel Video Pro Analyzer, which is paid, but there is a free trial version limited to the first 10 frames; that's okay for learning purposes.
Review
We'll notice that many of the modern codecs use this same model we learned. In fact, let's look at the Thor video codec block diagram; it contains all the steps we studied. The idea is that you should now be able to better understand the innovations and papers in the area.
Previously we had calculated that we needed 139 GB of storage to keep a video file with one hour at 720p resolution and 30 FPS. If we use the techniques we learned here, like inter and intra prediction, transform, quantization, entropy coding and others, we can achieve, assuming we are spending 0.031 bit per pixel, the same perceivable quality video requiring only 367.82 MB vs 139 GB of storage.
We chose to use 0.031 bit per pixel based on the example video provided here.
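A quick sanity check of this kind of saving, as a sketch: it assumes raw 720p YUV 4:2:0 frames (1.5 bytes per pixel) and roughly 0.031 bit per pixel after encoding.

```python
# One hour of 720p at 30 fps
pixels_per_frame = 1280 * 720
frames = 30 * 60 * 60

# Raw YUV 4:2:0 spends 12 bits (1.5 bytes) per pixel
raw_bytes = pixels_per_frame * 1.5 * frames
print(raw_bytes / 2**30)  # ~139 GB

# Encoded at roughly 0.031 bit per pixel
encoded_bytes = pixels_per_frame * 0.031 * frames / 8
print(encoded_bytes / 2**20)  # ~368 MB
```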
How does H.265 achieve a better compression ratio than H.264?
Now that we know more about how codecs work, it is easier to understand how new codecs are able to deliver higher resolutions with fewer bits.
We will compare AVC and HEVC; let's keep in mind that it is almost always a trade-off between more CPU cycles (complexity) and compression rate.
HEVC has bigger and more partition (and sub-partition) options than AVC, more intra prediction directions, improved entropy coding and more; all these improvements made H.265 capable of compressing 50% more than H.264.
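Just counting blocks hints at why bigger partitions pay off on smooth areas: HEVC coding tree units can be up to 64x64 pixels while AVC macroblocks are fixed at 16x16, so a flat region can be signaled with far fewer per-block headers.

```python
from math import ceil

def blocks(width, height, block_size):
    # How many blocks of block_size x block_size cover a frame
    return ceil(width / block_size) * ceil(height / block_size)

print(blocks(1920, 1080, 16))  # -> 8160 AVC macroblocks per 1080p frame
print(blocks(1920, 1080, 64))  # -> 510 HEVC CTUs per 1080p frame
```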
General architecture
[TODO]
Progressive download and adaptive streaming
[TODO]
Content protection
We can use a simple token system to protect the content. A user without a token tries to request a video and the CDN forbids her or him, while a user with a valid token can play the content; it works pretty similarly to most web authentication systems.
The sole use of this token system still allows a user to download a video and distribute it. DRM (digital rights management) systems can then be used to try to avoid this.
In real life production systems, people often use both techniques to provide authorization and authentication.
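A minimal sketch of such a token system, assuming the origin signs the path plus an expiry time with a secret shared with the CDN edge. The names and scheme here are illustrative, not any real CDN's API.

```python
import hmac, hashlib, time

SECRET = b"shared-secret-with-the-cdn"  # assumption: pre-shared key

def make_token(path, expires_at):
    # Sign the requested path and its expiry time
    msg = f"{path}:{expires_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def is_valid(path, expires_at, token, now=None):
    # The edge recomputes the signature and checks the expiry
    now = now if now is not None else int(time.time())
    expected = make_token(path, expires_at)
    return hmac.compare_digest(expected, token) and now < expires_at

exp = int(time.time()) + 3600
token = make_token("/videos/movie.m3u8", exp)
print(is_valid("/videos/movie.m3u8", exp, token))  # True
print(is_valid("/videos/other.m3u8", exp, token))  # False: wrong path
```

Note how this gives authorization per URL and per time window, but once the bytes are downloaded nothing stops redistribution, which is the gap DRM tries to fill.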
DRM
Main systems
- FPS - FairPlay Streaming
- PR - PlayReady
- WV - Widevine
What?
DRM means Digital rights management. It's a way to provide copyright protection for digital media, for instance, digital video and audio. Although it's used in many places, it's not universally accepted.
Why?
Content creators (mostly studios) want to protect their intellectual property against copying, to prevent unauthorized redistribution of digital media.
How
We’re going to describe an abstract and generic form of DRM in a very simplified way.
Given a content C1 (i.e. an HLS or DASH video stream), with a player P1 (i.e. shaka-clappr, exo-player or iOS) in a device D1 (i.e. a smartphone, TV, tablet or desktop/notebook), using a DRM system DRM1 (Widevine, PlayReady or FairPlay).
The content C1 is encrypted with a symmetric-key K1 from the system DRM1, generating the encrypted content C'1.
The player P1, of a device D1, has two keys (asymmetric): a private key PRK1 (this key is protected (1) and only known by D1) and a public key PUK1.
(1) protected: this protection can be via hardware, for instance, this key can be stored inside a special (read-only) chip that works like a black-box to provide decryption, or by software (less safe); the DRM system provides means to know which type of protection a given device has.
When the player P1 wants to play the content C'1, it needs to deal with the DRM system DRM1, giving it its public key PUK1. The DRM system DRM1 returns the key K1 encrypted with the client's public key PUK1. In theory, this response is something that only D1 is capable of decrypting.
K1P1D1 = enc(K1, PUK1)
P1 uses its local DRM system (it could be a SOC, a specialized hardware or software). This system is able to decrypt the content using its private key PRK1: it can decrypt the symmetric-key K1 from the K1P1D1 and play C'1. In the best case, the keys are not exposed through RAM.
K1 = dec(K1P1D1, PRK1)
P1.play(dec(C'1, K1))
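The whole exchange can be followed in a toy script. Here XOR with a hashed key stands in for both the symmetric and the asymmetric encryption; this only illustrates who holds which key in the abstract flow above, it is not how real DRM cryptography works.

```python
import hashlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy cipher: XOR the data with a hash-derived key stream.
    # Stand-in only -- real DRM uses real symmetric/asymmetric crypto.
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ stream[i % len(stream)] for i, b in enumerate(data))

toy_decrypt = toy_encrypt  # XOR is its own inverse

K1 = b"symmetric-content-key"   # DRM1's content key
PRK1 = b"device-private-key"    # known only to D1
PUK1 = PRK1                     # toy stand-in for the asymmetric key pair

C1 = b"compressed video payload"
C_1 = toy_encrypt(C1, K1)       # C'1 = enc(C1, K1)

K1P1D1 = toy_encrypt(K1, PUK1)            # license: K1 wrapped for D1
K1_recovered = toy_decrypt(K1P1D1, PRK1)  # K1 = dec(K1P1D1, PRK1)
print(toy_decrypt(C_1, K1_recovered))     # prints b'compressed video payload'
```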
Make sure you have docker installed and just run ./s/start_jupyter.sh
and follow the instructions on the terminal.
The richest content is here; it's where all the info we saw in this text was extracted, based or inspired from. You can deepen your knowledge with these amazing links, books, videos, etc.
Online Courses and Tutorials:
- https://www.coursera.org/learn/digital/
- https://people.xiph.org/~tterribe/pubs/lca(/Auckland/intro_to_video1.pdf)
- https://xiph.org/video/vid1.shtml
- https://xiph.org/video/vid2.shtml
- http://slhck.info/ffmpeg-encoding-course
- http://www.cambridgeincolour.com/tutorials/camera-sensors.htm
- http://www.slideshare.net/vcodex/a-short-history-of-video-coding
- http://www.slideshare.net/vcodex/introduction-to-video-compression-13394338
- https://developer.android.com/guide/topics/media/media-formats.html
- http://www.slideshare.net/MadhawaKasun/audio-compression-23398426
- http://inst.eecs.berkeley.edu/~ee290T/SP04/lectures/02-Motion_Compensation_girod.pdf
Books:
- https://www.amazon.com/Understanding-Compression-Modern-Developers/dp/1491961538/ref=sr_1_1?s=books&ie=UTF8&qid=1486395327&sr=1-1
- https://www.amazon.com/H-264-Advanced-Video-Compression-Standard/dp/()
- https://www.amazon.com/Practical-Guide-Video-Audio-Compression/dp/0240806301/ref=sr_1_3?s=books&ie=UTF8&qid=1486396914&sr=1-3&keywords=A+PRACTICAL+GUIDE+TO+VIDEO+AUDIO
- https://www.amazon.com/Video-Encoding-Numbers-Eliminate-Guesswork/dp/()/ref=sr_1_1?s=books&ie=UTF8&qid=1486396940&sr=1-1&keywords=jan+ozer
Bitstream Specifications:
- http://www.itu.int/rec/T-REC-H.264
- http://www.itu.int/ITU-T/recommendations/rec.aspx?rec=12904&lang=en
- https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf
- http://iphome.hhi.de/wiegand/assets/pdfs/2012_12_IEEE-HEVC-Overview.pdf
- http://phenix.int-evry.fr/jct/doc_end_user/current_document.php?id=7243
- http://gentlelogic.blogspot.com.br/2011/11/exploring-h264-part-2-h264-bitstream.html
- https://forum.doom9.org/showthread.php?t=167081
- https://forum.doom9.org/showthread.php?t=168947
Software:
- https://ffmpeg.org/
- https://ffmpeg.org/ffmpeg-all.html
- https://ffmpeg.org/ffprobe.html
- https://trac.ffmpeg.org/wiki/
- https://software.intel.com/en-us/intel-video-pro-analyzer
- https://medium.com/@mbebenita/av1-bitstream-analyzer-d25f1c27072b
Non-ITU Codecs:
- https://aomedia.googlesource.com/
- https://github.com/webmproject/libvpx/tree/master/vp9
- https://people.xiph.org/~xiphmont/demo/daala/demo1.shtml
- https://people.xiph.org/~jm/daala/revisiting/
- https://www.youtube.com/watch?v=lzPaldsmJbk
- https://fosdem.org/2017/schedule/event/om_av1/
- https://jmvalin.ca/papers/AV1_tools.pdf
Encoding Concepts:
- http://x265.org/hevc-h265/
- http://slhck.info/video/2017/03/01/rate-control.html
- http://slhck.info/video/2017/02/24/vbr-settings.html
- http://slhck.info/video/2017/02/24/crf-guide.html
- https://arxiv.org/pdf/1702.00817v1.pdf
- https://trac.ffmpeg.org/wiki/Debug/MacroblocksAndMotionVectors
- http://web.ece.ucdavis.edu/cerl/ReliableJPEG/Cung/jpeg.html
- http://www.adobe.com/devnet/adobe-media-server/articles/h264_encoding.html
- https://prezi.com/8m7thtvl4ywr/mp3-and-aac-explained/
- https://blogs.gnome.org/rbultje/2016/12/13/overview-of-the-vp9-video-codec/
- https://videoblerg.wordpress.com/2017/11/10/ffmpeg-and-how-to-use-it-wrong/
Video Sequences for Testing:
- http://bbb3d.renderfarming.net/download.html
- https://www.its.bldrdoc.gov/vqeg/video-datasets-and-organizations.aspx
Miscellaneous:
- https://github.com/Eyevinn/streaming-onboarding
- http://stackoverflow.com/a/24890903
- http://stackoverflow.com/questions/38094302/how-to-understand-header-of-h264
- http://techblog.netflix.com/2016/08/a-large-scale-comparison-of-x264-x265.html
- http://vanseodesign.com/web-design/color-luminance/
- http://www.biologymad.com/nervoussystem/eyenotes.htm
- http://www.compression.ru/video/codec_comparison/h264_2012/mpeg4_avc_h264_video_codecs_comparison.pdf
- http://www.csc.villanova.edu/~rschumey/csc4800/dct.html
- http://www.explainthatstuff.com/digitalcameras.html
- http://www.hkvstar.com
- http://www.hometheatersound.com/
- http://www.lighterra.com/papers/videoencodingh264/
- http://www.red.com/learn/red-101/video-chroma-subsampling
- http://www.slideshare.net/ManoharKuse/hevc-intra-coding
- http://www.slideshare.net/mwalendo/h264-vs-hevc
- http://www.slideshare.net/rvarun7777/final-seminar-46117193
- http://www.springer.com/cda/content/document/cda_downloaddocument/9783642147029-c1.pdf
- http://www.streamingmedia.com/Articles/Editorial/Featured-Articles/A-Progress-Report-The-Alliance-for-Open-Media-and-the-AV1-Codec-110383.aspx
- http://www.streamingmediaglobal.com/Articles/ReadArticle.aspx?ArticleID=116505&PageNum=1
- http://yumichan.net/video-processing/video-compression/introduction-to-h264-nal-unit/
- https://cardinalpeak.com/blog/the-h-264-sequence-parameter-set/
- https://cardinalpeak.com/blog/worlds-smallest-h-264-encoder/
- https://codesequoia.wordpress.com/category/video/
- https://developer.apple.com/library/content/technotes/tn2224/_index.html
- https://en.wikibooks.org/wiki/MeGUI/x264_Settings
- https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming
- https://en.wikipedia.org/wiki/AOMedia_Video_1
- https://en.wikipedia.org/wiki/Chroma_subsampling#/media/File:Colorcomp.jpg
- https://en.wikipedia.org/wiki/Cone_cell
- https://en.wikipedia.org/wiki/File:H.264_block_diagram_with_quality_score.jpg
- https://en.wikipedia.org/wiki/Inter_frame
- https://en.wikipedia.org/wiki/Intra-frame_coding
- https://en.wikipedia.org/wiki/Photoreceptor_cell
- https://en.wikipedia.org/wiki/Pixel_aspect_ratio
- https://en.wikipedia.org/wiki/Presentation_timestamp
- https://en.wikipedia.org/wiki/Rod_cell
- https://it.wikipedia.org/wiki/File:Pixel_geometry_01_Pengo.jpg
- https://leandromoreira.com.br/2016/10/09/how-to-measure-video-quality-perception/
- https://sites.google.com/site/linuxencoding/x264-ffmpeg-mapping
- https://softwaredevelopmentperestroika.wordpress.com/2014/02/11/image-processing-with-python-numpy-scipy-image-convolution/
- https://tools.ietf.org/html/draft-fuldseth-netvc-thor-03
- https://www.encoding.com/android/
- https://www.encoding.com/http-live-streaming-hls/
- https://web.archive.org/web/20150129171151/https://www.iem.thm.de/telekom-labor/zinke/mk/mpeg2beg/whatisit.htm
- https://www.lifewire.com/cmos-image-sensor-493271
- https://www.linkedin.com/pulse/brief-history-video-codecs-yoav-nativ
- https://www.linkedin.com/pulse/video-streaming-methodology-reema-majumdar
- https://www.vcodex.com/h264avc-intra-precition/
- https://www.youtube.com/watch?v=9vgtJJ2wwMA
- https://www.youtube.com/watch?v=LFXN9PiOGtY
- https://www.youtube.com/watch?v=Lto-ajuqW3w&list=PLzH6n4zXuckpKAj1_88VS-8Z6yn9zX_P6
- https://www.youtube.com/watch?v=LWxu4rkZBLw
- https://web.stanford.edu/class/ee398a/handouts/lectures/EE(a_MotionEstimation_).pdf