Welcome to Part II of our case study, where we take three training DVDs and convert them to multiple H.264 files for adaptive streaming (Click here if you missed Part I). In this section, I’ll detail how we chose the number of files to offer for adaptive streaming and their configurations, and how we tested the multiple-file offering. As a reminder, finishing the project depended upon meeting the client’s quality-related goals, which I discuss at the end of this part.
I’ve written multiple times on adaptive streaming, most comprehensively in Adaptive Streaming in the Field for Streaming Media Magazine. In that article, I asked a number of producers, including Turner Broadcasting, NBC (through Microsoft), Harvard University, and Deutsche Welle, specific questions relating to stream count, data rate, resolution, and encoding methods. If you’re unfamiliar with adaptive streaming, I recommend reading that article to get a good feel for the compression-related decisions, then coming back here to see how they applied to this project.
Stream Count and Data Rate
Our first decision was the number of streams, which ranged from three to eight among the respondents to that article. In making this decision, we primarily considered the target viewer, how SunCam envisioned that they would watch the videos, and the resolution of the source video footage.
Specifically, because the viewers were professional engineers, the client felt that connection bandwidths would all be very high. In addition, since the video was very long form and for training purposes, the client felt that the quality had to be good, so dropping down to ultra-low data rates was neither necessary nor advisable.
Finally, since the video was 4:3 SD source, the maximum resolution would be 640×480, so we could produce only three or four streams and still cover the most relevant viewers. Originally, we decided to try four streams at 400 kbps, 700 kbps, 1.0 Mbps, and 1.5 Mbps, but ended up dropping the lowest-quality stream after our first demo to the client. More on that experience below.
The next question was the encoding resolution. Again, because of the nature of the video, SunCam felt that the viewing resolution had to be 640×480. We could get there two ways: by encoding at that resolution or by encoding at smaller resolutions and zooming to fit the display window, which is the technique used by many adaptive streaming producers.
To a degree, our decision here was driven by some of the introductory footage located at the very start of the video, which you can see in Figure 2. Basically, it looked like crap when encoded at 480×360 resolution and then zoomed to 640×480 (shown on the left), but presentable at the same data rate when encoded at 640×480 (on the right). Since the client’s goal was a Netflix-like experience, we didn’t want to blow it during the intro, so we went with native resolution. Interestingly, this footage also dictated where in the workflow we deinterlaced the ripped DVD footage, and which algorithm we used.
Note that this decision might have been different if we were targeting a much lower data rate or if the content didn’t include sections with this kind of fine detail. Our original plan had been to test the 400 kbps stream at both 640×480 and 480×360, both displayed at 640×480, and choose the highest quality option. If you’re trying to decide between the two approaches, you should run some side-by-side tests and see what works best for your content.
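A side-by-side test like the one just described is easy to script. The sketch below simply prints the two encoding commands you might run, assuming an ffmpeg/x264 workflow; ffmpeg, the clip name, and the output names are my illustrations, not the tools or files actually used on the project. The idea is to encode the same clip at the same data rate both at native 640×480 and at 480×360 (which the player then zooms to 640×480), and compare the results visually in a 640×480 window.

```shell
#!/bin/sh
# Sketch of a resolution side-by-side test (ffmpeg and file names assumed).
# Same clip, same data rate; only the encoding resolution differs.
side_by_side() {
  # Encoded and displayed at native 640x480.
  echo "ffmpeg -i clip.mov -c:v libx264 -b:v 400k -s 640x480 native_480.mp4"
  # Encoded at 480x360; the player zooms it to 640x480 on display.
  echo "ffmpeg -i clip.mov -c:v libx264 -b:v 400k -s 480x360 zoom_360.mp4"
}
side_by_side
```

Whichever output looks better in the display window at your target data rate wins; with this project’s fine-detail intro footage, native resolution did.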
Other Encoding Parameters
When encoding for adaptive streaming, there are several considerations to keep in mind, all fully fleshed out in the aforementioned article. First, I recommend constant bit rate encoding, since the data rate swings of variable bit rate encoding can trigger stream switches artificially.
Second, when producing multiple files for adaptive streaming, you need a consistent key frame interval with no key frames at scene changes. Since Adobe’s Dynamic Streaming can only switch streams at a key frame, the key frame interval should be shorter than usual. Normally I recommend a key frame interval of ten seconds; with these videos, I went with three seconds.
Note that when producing for other technologies, like Apple’s HTTP Live Streaming, Microsoft’s Smooth Streaming, and Adobe’s HTTP-based Dynamic Streaming, other considerations, like chunk size, also matter. I cover some of this in the aforementioned article, but provide much more information in my book, Video Compression for Adobe Flash, Apple iDevices and HTML5.
The other area where you have to encode specially for adaptive streaming relates to audio parameters, which should be identical for all streams. Otherwise, you may hear popping or other artifacts upon stream switches. Since most of the audio in the videos was monaural speech, I encoded at 64 kbps, mono, in all files.
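Pulling the parameters above together, the renditions might be produced with something like the following. This is a sketch of an ffmpeg/x264 command generator, my illustration only: the article doesn’t say which encoder was used, and the source file name, 29.97 fps frame rate, and output names are all assumptions. It captures the key decisions: CBR-style rate control, a three-second key frame interval with scene-change key frames disabled, and identical 64 kbps mono audio in every rendition.

```shell
#!/bin/sh
# Sketch: one ffmpeg command per rendition of the adaptive set
# (ffmpeg and the file names are assumptions).
#   - CBR-style rate control: -b:v matched by -maxrate, tight -bufsize
#   - 3-second key frame interval at ~29.97 fps source: -g 90
#   - no key frames at scene changes: -sc_threshold 0
#   - identical audio in all streams: 64 kbps mono AAC
build_cmds() {
  for rate in 700 1000 1500; do   # kbps ladder after dropping the 400 kbps rung
    echo "ffmpeg -i source.mov -c:v libx264" \
         "-b:v ${rate}k -maxrate ${rate}k -bufsize ${rate}k" \
         "-g 90 -keyint_min 90 -sc_threshold 0 -s 640x480" \
         "-c:a aac -b:a 64k -ac 1 out_${rate}k.mp4"
  done
}
build_cmds
```

Because `-g` and `-keyint_min` are equal and scene-change detection is off, key frames land at exactly the same frames in every rendition, which is what lets the player switch streams cleanly.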
Testing the Adaptive Streams
While I was messing around on the encoding side, Stefan Richter, who brought me into the project, was working on the player in parallel. When we finalized our thinking regarding stream count and configuration, we wanted to show the client what stream switching would look like. In Stefan’s final player, however, there would be no mechanism to force a stream switch, since that’s all handled by the player and server automatically.
Fortunately, Stefan found a player in Adobe’s Open Source Media Framework (OSMF) that offered controls to force a stream switch, and that he could modify to play our files. This is shown in Figure 4, using a video supplied by Adobe. On the player control bar, you’ll see the letter Q; click it to reveal the Q- and Q+ controls, which switch between bitstreams. As you can see, the player displays, between those controls, the bitrate of the file currently playing (that’s the 408 kbps).
Most of us have watched video distributed in one form of adaptive streaming or another, but it’s hard to tell when streams are switched, and it’s tough to force a stream switch without sophisticated network gear. While the OSMF player was admittedly crude, the stream switching itself was absolutely flawless, with no stops, skips, or audio artifacts. I was impressed, and so was the client — with the stream switching, but not the quality of some of the streams. As I mentioned in the first installment, the client didn’t want to go forward if we couldn’t match or come close to Netflix quality. Some of the streams that I produced didn’t meet that standard.
Here’s why. Most of the video we were encoding was simple talking head shots interspersed with slides, some figures, and b-roll of the audience — really easy to compress stuff. On the other hand, the introductions of the videos were pretty highly produced, with some very artsy pans, zooms, transitions, and hard-to-compress footage like the text shown in Figure 2.
My original test videos grabbed a section from the middle, and the video looked great at all data rates. But then I realized that this wouldn’t be representative of overall quality, so I reencoded a clip containing a mix of hard-to-encode introductory footage and easy-to-encode talking head footage, and that’s what we showed the client in the OSMF player. Overall, the quality was good, except for one fast zoom-out at the start of the clip, which looked awful at all data rates (Figure 5), and some highly detailed footage discussed below. Right at the start of the video, it stuck out like a sore thumb, making an immediate bad impression.
Bill, the client, was resolute, matter of factly stating, “The sample is not really very close to the Netflix experience even at 1500. If this is the best that we can get from Flash then you should not take it any further. I may need to rethink this project and find a Silverlight host to replace AWS.”
To explain the last comment, since Netflix uses Silverlight, Bill thought the problem was technology, not encoding technique or footage. At this point, I called Bill and we discussed three points.
First, that whether he used Silverlight or Flash, H.264 would likely be the codec, so changing the platform wouldn’t affect quality at all. Second, that the footage at the start of the project was simply impossible to compress at high quality, irrespective of platform and technology. Third, that the original encodes were more or less draft encodes to test the operation of adaptive streaming, and that I thought I could do better, quality-wise, for the final versions.
We agreed to remove the first fast zoom-out from the test clip, and that I would reencode the resulting test clip to the proposed final parameters before he looked at it again. This is also where we decided to drop the 400 kbps clip and go with the remaining three at 700 kbps and 1.0 and 1.5 Mbps.
Overall, Stefan had done his job — the adaptive streaming bit looked great. But if I couldn’t get the quality up, the project was going down the drain. Tune in next week to see how I did.