Data Storage and Video Technology
Relating display rates and Video transfers
Modern video editing involves working with the video data in digital format. It is much easier to work with digitally encoded data rather than analogue data laid down on tapes.
The first step in manipulating video data is to digitize the footage, either directly with a digital video camera or by passing the output of an analogue camera through an analogue-to-digital converter, which converts the analogue video signal to a digital format that a computer can use. The volume of video data depends on a number of factors such as color depth, image resolution and frame rate. Consider a 24 bit (3 bytes) color scheme (approximately 16 million colors) at a resolution of 640 x 480 pixels shown at a frame rate of 30 frames per second (NTSC): the data rate is 3 x 640 x 480 x 30 = 27,648,000 bytes (approximately 27 MB) per second. Refer to Table 1 for some other examples. It is essential that margins are built into the calculations as there are many variables that can affect real world performance.
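The arithmetic above can be sketched in Python as follows. The function names and the 25% safety margin are illustrative assumptions, not figures taken from the text.

```python
def video_data_rate(width, height, bytes_per_pixel, frames_per_second):
    """Return the uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_second


def rate_with_margin(bytes_per_second, margin=0.25):
    """Add a safety margin to a data rate (the 25% default is an assumed value)."""
    return bytes_per_second * (1 + margin)


rate = video_data_rate(640, 480, 3, 30)
print(rate)                    # 27648000 bytes per second
print(rate_with_margin(rate))  # 34560000.0 bytes per second
```

The margin is kept as a separate step so that the raw figure can still be compared directly with Table 1.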
Since frames need to be displayed at a high frequency to “fool” the brain into perceiving continuous motion, video consumes a large amount of bandwidth. There is a constant tradeoff, as too slow a repetition rate will produce “flicker” and result in poor quality. There are a number of different “standards” in use such as NTSC TV at 30 frames per second, PAL TV at 25 frames per second and film at 24 frames per second. To reduce bandwidth requirements “tricks” such as interlacing can be used. With interlacing, alternate lines are displayed on the screen to build up the image. This means that instead of displaying line 1, followed by line 2, followed by line 3 until all lines have been shown, the system will display line 1, followed by line 3 and then line 5 and so on. When the system has finished displaying all the odd lines it will return to the beginning and display the even lines: line 2, followed by line 4 and so on. This technique reduces the performance requirements of the system. Of course we cannot neglect the audio component of a system. Typically audio will consume far less bandwidth than video, as the ear is receptive to only a narrow frequency band; however, newer formats such as surround sound use more bandwidth than ordinary mono sound. As a result we normally talk in terms of Audio/Video (A/V) rather than just video.
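The line ordering used by interlacing can be illustrated with a short sketch. The function name is hypothetical and lines are numbered from 1, as in the description above.

```python
def interlaced_line_order(total_lines):
    """Return the display order for an interlaced frame:
    all odd-numbered lines first, then all even-numbered lines."""
    odd_field = list(range(1, total_lines + 1, 2))
    even_field = list(range(2, total_lines + 1, 2))
    return odd_field + even_field


print(interlaced_line_order(8))  # [1, 3, 5, 7, 2, 4, 6, 8]
```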
How do disk based systems differ from tape systems?
The random access nature of disk devices allows the user to reach any portion of an edit sequence in minimal time, unlike tape, where the sequential nature of the medium incurs delays. Tape editing is termed linear editing. Disk devices allow the use of more productive editing techniques such as those employed within non linear editing systems. The software used within a non linear editing system allows video and audio clips to be cut and pasted as simply as text can be manipulated within a word processor program. This is important since most video productions are filmed in a non linear sequence and need to be assembled in the correct order to make sense to the viewing audience. Video effects such as transitions between scenes can be easily added and backgrounds can be inserted over “blue screen shots”.
Physical data storage characteristics
Consideration must be given to how the data is distributed across volumes, and this is where insight into the underlying physical disk layout is invaluable. Many users partition disk devices, but this practice can lead to excessive disk seeks which may cause unacceptable latencies. To take a specific example, if a disk drive is partitioned then a sequential application confined to one partition naturally uses minimal head movement, as the data should be collected into contiguous space. When an application uses the other partition (which it sees as a totally distinct volume) the heads will have to move right over to the other half of the disk. Compounding the problem, the disk will then also have to wait for the appropriate sector to come under the read/write head. This becomes more acute when high resolution formats are used.
Arrays of disk devices in the form of RAID are normally used to enhance performance. RAID allows data to be spread across multiple disks. The idea is that n disk drives can, ideally, perform at n times the speed of a single disk, so for example an array of 5 drives should give 250 MB/s if each disk is capable of 50 MB/s throughput. Of course the server infrastructure must also be capable of supporting this throughput.
Video on Demand
A more recent development is the concept of Video on Demand (VOD) where multiple viewers may simultaneously view different stages of a movie. Figure 1 shows a two hour movie being accessed by 6 viewers. This can be achieved by having a number of servers available to consumers, serving data over a medium such as Ethernet. However, fast disk arrays can also service multiple users from a single copy of a file across a high speed interface such as Fibre Channel. This is possible because the random access features of disk, coupled with very fast performance, naturally lend themselves to multi-tasking, unlike the sequential methods of tape systems.
Figure 1 Viewing different parts of a movie simultaneously
[Figure: the movie duration shown as a timeline of 15 minute segments (0 – 15 mins through 91 – 105 mins), with Viewer 1, Viewers 2/3, Viewer 4 and Viewer 6 each positioned at a different point along it.]
In this example, viewer 6 has nearly finished watching the movie and viewer 1 has only just started.
There are a number of ways in which the performance and capacity demands of video applications can be reduced. In order to decrease the amount of data being transferred (which affects the transfer time) a technique known as data compression is used. There are two types of compression generally available: lossy and lossless. The former method uses approximations to represent the data and is appropriate for image processing where the human brain is good at re-constructing image data.
Lossless data compression is more common in commercial applications where data must be re-constructed exactly as it was prior to the compression process. An example of lossless compression is one that exploits text character frequency in a language. Typically eight bits of information are used to represent English language characters within the ASCII code. Since the character “E” appears far more often than the character “X” we can allocate fewer than eight bits to represent “E” and more than eight bits to represent “X”. Because “E” occurs more often, the shorter bit pattern is used more frequently, which reduces the overall data volume.
The ISO MPEG (Moving Picture Experts Group) standards define lossy compression schemes that are widely used throughout the industry to reduce data rates whilst retaining a high degree of image quality. By compressing the video information less traffic has to be moved across the storage bus, which results in higher effective throughput. The downside is that the time taken to compress the data and subsequently decompress it may be substantial, and a number of manufacturers provide dedicated high speed hardware devices to help alleviate this penalty. It is important to achieve this in real time. Using the NTSC standard, the system must be capable of capturing, compressing, decompressing and playing back the frames at a rate of approximately 30 frames per second. If this rate is not achieved then dropped frames will result, which show up as a loss of smoothness and a lack of synchronization between audio and video, such as when a person is talking.
Techniques used to reduce data traffic include taking advantage of redundancy between frames. Typically only a small amount of information varies between an individual frame and subsequent ones. The differences are normally caused by the representation of motion.
A good example of this is an interview where the only part of the frame that changes might be the head of the person speaking. The technique here is to send only the difference information between successive frames. The initial (starting) frame is known as a key frame and the difference frames are termed delta frames. MPEG4 is the latest standard at the time of writing, and it can use interpolation to reconstruct data between frames.
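The key frame and delta frame idea can be sketched as below. This is a deliberately simplified, lossless illustration in which a frame is a flat list of pixel values; it is not how MPEG itself encodes frames, and all names are hypothetical.

```python
def encode_delta(frames):
    """Encode frames as one key frame plus delta frames.

    Each delta is a dict {pixel_index: new_value} that holds only the
    pixels that changed relative to the previous frame.
    """
    key_frame = list(frames[0])
    deltas = []
    previous = key_frame
    for frame in frames[1:]:
        delta = {i: new for i, (old, new) in enumerate(zip(previous, frame)) if old != new}
        deltas.append(delta)
        previous = frame
    return key_frame, deltas


def decode_delta(key_frame, deltas):
    """Rebuild every frame by applying each delta to the frame before it."""
    frames = [list(key_frame)]
    for delta in deltas:
        frame = list(frames[-1])
        for index, value in delta.items():
            frame[index] = value
        frames.append(frame)
    return frames


clip = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 9, 5, 0]]
key, deltas = encode_delta(clip)
print(deltas)  # [{1: 9}, {2: 5}]
```

When little changes between frames, as in the interview example, each delta holds far fewer values than a full frame.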
Another way of reducing the amount of data is to reduce the color depth. This will reduce the overall quality but depending on the context it may be acceptable to do this. One such application is streaming video over the Internet, where typically the content does not need to be high resolution. Compression and decompression are performed by devices known as Codecs (Compression/Decompression). Codecs may be implemented in software or by dedicated hardware devices.
One way of representing color is to use three channels, one for each of the primary colors Red, Green and Blue. This representation is termed RGB. Each channel can be made up of 8 bits or 10 bits. The 8 bit representation provides 256 steps (shades) and the 10 bit scheme provides 1024 steps. Some software only handles 8 bits, so sampling at 10 bit resolution can consume unnecessary bandwidth. When discussing video transmission, rates are often quoted using terminology such as two streams of 8 bit uncompressed or one stream of 10 bit compressed. From a data storage perspective we can no longer regard multiple streams as purely sequential or purely random I/O patterns; what we really have is a random sequence of sequential patterns.
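As an illustration of the lossless side of a codec, the character frequency idea described earlier (fewer bits for “E”, more for “X”) can be sketched with Huffman coding. The text does not name a specific algorithm, so Huffman coding is an illustrative choice here, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code in which frequent characters get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie breaker, {character: code so far})
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        freq_a, _, codes_a = heapq.heappop(heap)
        freq_b, _, codes_b = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in codes_a.items()}
        merged.update({ch: "1" + code for ch, code in codes_b.items()})
        heapq.heappush(heap, (freq_a + freq_b, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(text, code):
    """Concatenate the bit string of every character."""
    return "".join(code[ch] for ch in text)


def huffman_decode(bits, code):
    """Walk the bit string, emitting a character whenever a full code is matched."""
    reverse = {bit_string: ch for ch, bit_string in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    return "".join(decoded)


sample = "E" * 20 + "T" * 5 + "X"
code = huffman_code(sample)
bits = huffman_encode(sample, code)
print(len(bits), "bits instead of", 8 * len(sample))
```

Because no code is a prefix of another, the decoder can recover the original text exactly, which is what makes the scheme lossless.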
RGB color space
The RGB color space is simple, but the human eye is less receptive to certain colors and it may therefore be wasteful to provide high resolution where the eye can barely detect it. The eye is most sensitive to green shades and least sensitive to blue shades, so it might make sense to dedicate more detail to the green parts of an image and less to the blue parts. In practical terms, though, different color models are used.
YUV color space
The YUV color space is a better model (though it may be more complex to implement). In this scheme Y is the luminance (brightness) component and U and V make up the chrominance (color) part. The Y component is generated from the RGB parts by combining them to derive the brightness. The RGB parts are weighted, so the Y component is 0.299R + 0.587G + 0.114B. The U and V components are derived from the Y signal and the B and R signals of the RGB stream. A closely related scheme used in component video is Y'CbCr.
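The luminance weighting quoted above can be checked with a minimal sketch. Only the Y component is computed, since the text gives no coefficients for U and V; the function name is hypothetical.

```python
def luminance(r, g, b):
    """Return the Y (brightness) component using the weights quoted in the text."""
    return 0.299 * r + 0.587 * g + 0.114 * b


# Pure green contributes far more brightness than pure red or pure blue.
print(luminance(0, 255, 0))
print(luminance(255, 0, 0))
print(luminance(0, 0, 255))
```

The three weights sum to 1, so a full-scale white pixel (255, 255, 255) maps to a full-scale luminance value.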
To utilize resources more effectively, each component may be given a different amount of bandwidth. For example with 4:2:2 the Y component is sampled at every pixel whereas the Cb and Cr components are sampled on alternate pixels. Another technique, 4:1:1, only samples the Cb and Cr components at every fourth pixel. High end editors may use 4:4:4, which samples each component equally but at the expense of higher bandwidth requirements. The numbers quoted usually refer to the ratio of the luminance sampling (1st number) to the chroma sampling (2nd and 3rd numbers). This is termed chroma subsampling and works because the eye is less sensitive to color than to luminance information.
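The bandwidth effect of the schemes described above can be estimated with a small sketch. It assumes the simple model given in the text (Y at every pixel, Cb and Cr at every pixel, every second pixel or every fourth pixel); the table and function names are hypothetical.

```python
# Horizontal interval, in pixels, between chroma samples for each scheme,
# following the description in the text.
CHROMA_SAMPLE_INTERVAL = {"4:4:4": 1, "4:2:2": 2, "4:1:1": 4}


def average_bits_per_pixel(scheme, bits_per_sample):
    """Average storage per pixel: one Y sample plus a share of Cb and Cr."""
    interval = CHROMA_SAMPLE_INTERVAL[scheme]
    samples_per_pixel = 1 + 2 / interval
    return samples_per_pixel * bits_per_sample


for scheme in ("4:4:4", "4:2:2", "4:1:1"):
    print(scheme, average_bits_per_pixel(scheme, 8))
# 4:4:4 24.0
# 4:2:2 16.0
# 4:1:1 12.0
```

Under this model 4:2:2 needs two thirds of the bandwidth of 4:4:4, and 4:1:1 needs half.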
Figure 2 Chroma subsampling
Higher transfer rates (in the order of 200 MB/s or more) are required for digitally encoded High Definition (HD). These applications are demanding both in terms of performance and capacity, since a sustained transfer rate of 200 MB/s uses 1 GB of storage in 5 seconds. A single disk drive of 400 GB capacity will therefore provide only a little over half an hour of uncompressed HDTV playback time. To circumvent this, many video developers will only work on a small amount of footage at a time during editing sessions and then save the work to a secondary medium such as tape. Today, however, SATA RAID arrays with their large capacities (750 GB+ per drive bay) are ideally suited to the video market and can emulate tape devices if need be.
Two common High Definition standards are 1080i and 720p.
The first, 1080i, has a frame size of 1,920×1,080 pixels and uses interlaced scanning (alternate lines); 720p uses progressive scan (non interlaced) techniques and has a resolution of 1,280×720 pixels. In both cases the aspect ratio (width:height) is 16:9. High definition may also be expressed as <frame height>/<field or frame rate> such as 720/60p or 1080/60i.
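The capacity figures above follow from simple division, sketched here. Decimal units are assumed (1 GB = 1000 MB) and the function names are hypothetical.

```python
def seconds_to_fill(capacity_gb, rate_mb_per_s):
    """Seconds of sustained transfer needed to fill (or play back) the capacity."""
    return capacity_gb * 1000 / rate_mb_per_s


def playback_minutes(capacity_gb, rate_mb_per_s):
    """Minutes of footage that fit on the given capacity at the given rate."""
    return seconds_to_fill(capacity_gb, rate_mb_per_s) / 60


print(seconds_to_fill(1, 200))     # 5.0 seconds per GB
print(playback_minutes(400, 200))  # roughly 33.3 minutes on a 400 GB drive
```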
One common capture format is HD-SDI which stands for high definition serial digital interface. This format is typically 10 bits per channel and it uses 4:2:2 sampling.
Table 1 shows some approximate throughput requirements for a subset of uncompressed video formats, which should be taken as minimum values for guideline purposes only; the data storage device employed should be capable of exceeding these rates with a healthy margin.
Table 1 Data throughput rates
| Frame size and format | MB/Second | GB/Hour |
| 640×480 NTSC frame rate | 27 | 97 |
| 1080/50i 4:4:4 RGB | 200 | 720 |
| 720/60p YUV 4:2:2 | 140 | 500 |
Pre-Production Vs Post production
The process of capturing footage is part of the pre-production stage. Post production involves editing and the final edited footage is rendered to become the finished product. During the Render stage decisions must be made relating to the quality of the final product – film, broadcast, Internet streaming etc.
During a live broadcast it is essential that there is no interruption to service. A good example is a live news broadcast, which must be delivered in real time. Service Level Agreements may impose severe financial (and other) penalties on suppliers of broadcasting equipment that does not perform reliably. Typically this environment features redundant equipment which can continue to function when individual components fail.
Adapting Data Storage to the Video Market
Disk Drive Optimization
As discussed earlier, modern video equipment uses disk drives rather than tape technology. The requirements of vertical markets such as video are similar to those of the enterprise but differ somewhat in the way that data is accessed.
Enterprise class disk drives allow for optimization methods when used with various types of A/V applications. This is achieved by altering parameters that are accessed via SCSI mode pages. This is a specialized area and may require third party software to access these pages. The mode page parameters that are important for a disk to be optimized for video applications include:
Enabling write caching (often disabled by default) and maximizing the pre-fetch capabilities of the read ahead cache.
Adjusting memory buffer full/buffer empty ratios.
Lowering the retry counts.
SATA drives are still playing catch-up in this respect, but a number of vendors are beginning to announce features which are optimized for the video market. A recent feature is Native Command Queuing (NCQ). This is similar to Tagged Command Queuing, which is a staple of the SCSI protocol. The idea behind NCQ is to optimize the order of I/O processes to minimize the overall response time to multiple tasks. This is well suited to small block random transfers but is of less benefit for the large block sequential operations that are more typical of video applications. Consideration should also be given to the amount of free space on disks. Disk performance can degrade as a disk fills up and data becomes fragmented across it, and physical performance characteristics also vary depending on where on the disk the data is located.
RAID within video applications
RAID came into being because of the requirement to enhance I/O performance. One method of achieving this is to spread the data across a number of array members and thus reduce the amount of data being transferred by a single disk. The disadvantage of using a number of disks in a parallel configuration is that the increased component count reduces the overall system reliability. RAID is often thought of as a means of enhancing data reliability by incorporating extra hardware that allows data recovery should one of the individual storage members fail. In fact, the original RAID paper did not include the RAID 0 level, as RAID 0 does not include any redundancy features. Whilst this is seen as a major disadvantage in a commercial environment such as banking, it may not be as big an issue in the pre-production phase of the A/V market. This is because the data may be transient during an editing session and can be re-constructed.
RAID 0 is often achieved by selecting drives and using the Operating System to build stripe sets which spread the data across all the drives. This can be a cost-effective way of achieving high throughput but ties the computer and storage hardware together in a single domain. This may or may not be a disadvantage, depending on a particular application’s requirements.
In situations where data protection is required, such as the post-production phase, parity protected RAID is used. This is in the form of RAID 3 or RAID 5. RAID 3 uses a dedicated disk to maintain redundant information that allows data recovery should an individual disk fail. It is cost effective in that in an array consisting of n members only 1/n of the raw capacity is given over to redundancy. For example, with a 6 drive RAID set consisting of 6 x 146 GByte drives, 730 GBytes are available to the user. The design of RAID 3 is such that data is striped across disks at a small chunk size, which guarantees that all disks will be involved in a single I/O operation. The strategy here is essentially Divide and Conquer, as the file is divided up into portions and each portion of the file is dealt with by an individual disk. Since the disks are operating in parallel the data transfer will take place much faster. RAID 3 is somewhat slower than RAID 0 but it does have the advantage of data protection.
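The recovery property of a dedicated parity disk can be sketched as follows. The text does not spell out how the redundant information is calculated; the sketch assumes the conventional bytewise XOR parity, and the block contents and names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Five data disks, each holding one small chunk of the stripe.
data_blocks = [b"VIDE", b"O_DA", b"TA_1", b"2345", b"6789"]
# The sixth, dedicated disk holds the parity of the other five.
parity = xor_blocks(data_blocks)

# Simulate the loss of the third data disk and rebuild its chunk
# from the surviving data disks plus the parity disk.
surviving = data_blocks[:2] + data_blocks[3:]
rebuilt = xor_blocks(surviving + [parity])
print(rebuilt)  # b'TA_1'
```

XOR is its own inverse, so combining the parity with the surviving chunks cancels them out and leaves the missing chunk.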
RAID 5 does not use a dedicated disk for parity but interleaves the parity information across multiple disks instead, so a 6 drive RAID set will use 1/6 of the capacity of each disk to hold parity information.
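Whether parity sits on a dedicated disk (RAID 3) or is interleaved (RAID 5), the usable capacity of a single parity set is the same. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def usable_capacity_gb(drive_count, drive_size_gb, parity_drives=1):
    """Usable capacity of a parity protected set.

    parity_drives is the number of drives' worth of capacity used for
    redundancy: 1 for RAID 3 and RAID 5.
    """
    if drive_count <= parity_drives:
        raise ValueError("the set needs more drives than it spends on parity")
    return (drive_count - parity_drives) * drive_size_gb


print(usable_capacity_gb(6, 146))  # 730
```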
RAID 6 is a dual data recovery scheme which uses two independent redundancy calculations. With disk capacities soon to approach 750 GB, the rebuild time after a disk failure is substantial. If, during this period, another drive were to fail or unrecoverable read errors were encountered on another member disk, data loss would occur in a single parity array. Because its implementation is more resource intensive, RAID 6 tends to be reserved for mission critical applications.
Within A/V applications the RAID levels that are commonly deployed are RAID 0, RAID 3 and RAID 5.
Storage systems must be able to deliver and accept data at rates sufficient to satisfy the application. It is necessary to take into account all paths to and from the storage. To take a specific example, if the storage array is capable of streaming data at 250 MB/s and it is connected to a U160 SCSI host bus adapter (HBA), then the limiting point is the HBA. Similarly, if the computer bus structure is 32 bit PCI then the PCI bus will become the bottleneck. To ensure that the storage bus is sufficient to meet the demands of HD, storage interfaces such as U320 SCSI and 4 gigabit Fibre Channel are deployed. Within the computer system, PCI-X and PCI Express architectures should be used with a corresponding Host Bus Adapter.
Typical performance requirements of HD range from just over 100 MB/s to around 275 MB/s depending on the format. The actual figures are basically derived by multiplying the pixel resolution (such as 1,280×720) by the frame rate and then by the number of bits per pixel (such as 24 for 8 bit RGB), and dividing by 8 to convert bits to bytes.
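Finding the limiting point in a data path amounts to taking the minimum over its components, as sketched below. The 250 MB/s array figure comes from the text; the interface figures are nominal peak rates supplied here as assumptions (160 MB/s for U160 SCSI, 320 MB/s for U320 SCSI, about 133 MB/s for 32 bit PCI), and real sustained rates will be lower.

```python
def limiting_component(path):
    """Return (name, rate) of the slowest component in a data path.

    path maps a component name to its rate in MB/s.
    """
    name = min(path, key=path.get)
    return name, path[name]


# Array behind a U160 SCSI host bus adapter.
print(limiting_component({"disk array": 250, "U160 SCSI HBA": 160}))
# ('U160 SCSI HBA', 160)

# Faster HBA, but plugged into a 32 bit PCI bus.
print(limiting_component({"disk array": 250, "U320 SCSI HBA": 320, "32 bit PCI bus": 133}))
# ('32 bit PCI bus', 133)
```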
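The multiplication described above can be sketched as a small function. The bits per pixel values in the examples assume 10 bit samples (30 bits per pixel for 4:4:4, 20 for 4:2:2); that sample depth is an assumption made here, not something stated in Table 1, and an interlaced 50 field format is treated as 25 full frames per second.

```python
def data_rate_mb_per_s(width, height, frames_per_second, bits_per_pixel):
    """Uncompressed video data rate in decimal megabytes per second."""
    bytes_per_second = width * height * frames_per_second * bits_per_pixel / 8
    return bytes_per_second / 1_000_000


# 720/60p, 4:2:2, 10 bit samples (assumed): 20 bits per pixel on average
print(data_rate_mb_per_s(1280, 720, 60, 20))   # 138.24

# 1080/50i, 4:4:4, 10 bit samples (assumed): 50 fields = 25 frames per second
print(data_rate_mb_per_s(1920, 1080, 25, 30))  # 194.4
```

Under these assumptions the results land close to the approximate 140 MB/s and 200 MB/s entries in Table 1.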
Cache Policies within multi-streaming applications
Data transfers within many video applications do not always fall into neat categories of random or sequential. Instead an application may involve pseudo random interleaving of sequential streams. An example of this is Video on Demand, which may involve a number of users each asking for different large sequential transfers of a file (or of different files). In this case default caching policies may not work so well. One basic heuristic might be to segment the cache into sections and allow each stream to use its own allocated section. The algorithm has to decide how much cache to allocate and also how to recognize an individual stream. This may be done through I/O pattern recognition, such as observing that a request for perhaps 10 MB of data at a particular incrementing sequence of block addresses occurs every tenth I/O request. The approach then might be to divide the cache into ten sections. The ability to understand the I/O patterns can lead to substantial performance gains by optimizing read look ahead parameters and making best use of the available cache in multi-streaming applications.
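The stream recognition heuristic described above might be sketched as follows. This is an illustrative model only, not the algorithm of any particular array controller: a request is treated as part of an existing stream when it starts exactly where that stream's previous request ended, and the cache is split evenly between the streams seen so far. All names are hypothetical, and stream expiry is omitted for brevity.

```python
class StreamTracker:
    """Group read requests into sequential streams and share cache between them."""

    def __init__(self, cache_size_mb):
        self.cache_size_mb = cache_size_mb
        self.next_block = {}  # stream id -> block address where the stream should continue
        self._next_id = 0

    def observe(self, start_block, block_count):
        """Record one read request and return the id of the stream it belongs to."""
        for stream_id, expected in self.next_block.items():
            if expected == start_block:
                self.next_block[stream_id] = start_block + block_count
                return stream_id
        # No stream continues at this address, so treat it as a new stream.
        stream_id = self._next_id
        self._next_id += 1
        self.next_block[stream_id] = start_block + block_count
        return stream_id

    def cache_per_stream_mb(self):
        """Evenly divide the cache between the streams recognized so far."""
        return self.cache_size_mb / max(1, len(self.next_block))


tracker = StreamTracker(cache_size_mb=1000)
# Two viewers reading different parts of a file, with their requests interleaved.
requests = [(0, 100), (5000, 100), (100, 100), (5100, 100)]
print([tracker.observe(start, count) for start, count in requests])  # [0, 1, 0, 1]
print(tracker.cache_per_stream_mb())  # 500.0
```

With ten interleaved viewers the same logic would arrive at the ten cache sections mentioned in the text.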
Data transfer and Sharing
Large projects such as movie editing are normally split into multiple stages. For example, editing of special effects may take place on an Apple computer system but the final rendering may take place on a high end PC. Also, multiple special effects editors or graphic artists may wish to work on various sections of a movie. To do this effectively, data needs to be shared and transferred in an efficient manner. There are two main implementations that may be used for this purpose: Ethernet, which may be used to transfer files to other machines for different aspects of the process, and Fibre Channel Storage Area Networks (SANs) for high speed data transfers.
Ethernet typically uses file oriented protocols such as NFS or CIFS, whereas Fibre Channel uses faster block oriented protocols such as SCSI. As well as implementing the associated hardware, software is required to ensure that the data is shared in a coordinated manner. This may be done by using one particular system as a master node which controls all the resources and prevents users from overwriting changes made on other machines. There are a number of vendors that supply solutions for this type of application that work well in homogeneous or heterogeneous environments.
Assessing array Performance
It is important to ensure that the storage array being used is adequate to meet the stringent demands of high performance sequential streaming applications. There are a number of ways of doing this, but usually purpose designed benchmarks are deployed to emulate the actual environment. Two such tests are Disk Speed Test by Blackmagic Design and Iometer, which was originally designed by Intel.