Metadata in RIFF (.avi) Files (Under Construction)

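Please note: four-character codes (FourCCs) such as RIFF, LIST, avih or strh have special significance in this document. They are chunk identifiers (chunk headers) and appear byte for byte in the original RIFF file, so their case matters: the file stores avih, not AVIH.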

This page gives a brief overview of the RIFF (Resource Interchange File Format) container, whose most popular use for video is the AVI file.


Because a video file usually carries several kinds of data (video, audio, subtitles, and metadata that describes the video), it is convenient to store them in a structured layout rather than in arbitrary order.

What is RIFF?

RIFF is a generic, chunk-based container format issued jointly by IBM Corporation and Microsoft Corporation in 1991. It is not specific to video: AVI (video) and WAV (audio) files are both RIFF files.

How is it structured?

Everything in RIFF is organized as a hierarchy of chunks and sub-chunks. Every chunk contains a ChunkId, a ChunkSize, and ChunkData. The first four bytes of a RIFF file must be the ASCII letters RIFF; this is the ChunkId of the root chunk, the parent of every other sub-chunk. The rules that govern the structure of a typical RIFF file are:

  • Every chunk starts with a ChunkId (four bytes) and a ChunkSize (four bytes, a little-endian unsigned integer). ChunkSize counts only the ChunkData; it does not include the ChunkId and ChunkSize fields themselves.
  • If ChunkData has an odd number of bytes, one dummy pad byte is appended so that the next chunk starts at an even offset. The pad byte is not counted in ChunkSize, so ChunkSize itself can be odd.
  • ChunkData can be simple data such as video frames (encoded or raw) or audio samples, or it can be other chunks with the same structure as the parent chunk. ChunkData can therefore contain sub-chunks recursively; for example, a LIST chunk can contain another LIST chunk.
  • The data of every chunk has to be decoded according to either a predefined schema (fixed-size structured metadata such as the avih and strh primitive chunks) or as variable-size metadata, as in the INFO list chunk.
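The rules above can be sketched in a few lines of Python. This is a minimal illustration, not Exiv2 code, and it assumes the whole file is already in memory as a bytes object. It reads each ChunkSize as a 32-bit little-endian value, skips the pad byte after odd-sized data, and descends into RIFF and LIST chunks, whose ChunkData begins with a four-character list type (such as 'AVI ' or 'hdrl') followed by sub-chunks. The names walk_chunks and make_chunk are hypothetical helpers.

```python
import struct

# Chunk IDs whose ChunkData starts with a 4-byte list type followed by sub-chunks
CONTAINER_IDS = {b"RIFF", b"LIST"}


def make_chunk(chunk_id, payload):
    """Build one chunk: ChunkId, little-endian ChunkSize, data, optional pad byte."""
    pad = b"\0" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def walk_chunks(data, start=0, end=None, depth=0):
    """Return (depth, chunk_id, list_type, size) for every chunk in data[start:end]."""
    if end is None:
        end = len(data)
    found = []
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        # ChunkSize excludes the 8-byte header and the pad byte
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = pos + 8
        body_end = min(body + size, end)  # do not read past a truncated parent
        if chunk_id in CONTAINER_IDS and size >= 4:
            list_type = data[body:body + 4]
            found.append((depth, chunk_id, list_type, size))
            found.extend(walk_chunks(data, body + 4, body_end, depth + 1))
        else:
            found.append((depth, chunk_id, None, size))
        # Skip the data, plus one pad byte when ChunkSize is odd
        pos = body + size + (size & 1)
    return found


# A tiny hand-built file: RIFF 'AVI ' > LIST 'hdrl' > (IDIT with odd size, JUNK)
hdrl = make_chunk(b"LIST", b"hdrl" + make_chunk(b"IDIT", b"abc") + make_chunk(b"JUNK", b"\0\0"))
riff = make_chunk(b"RIFF", b"AVI " + hdrl)
for depth, cid, ltype, size in walk_chunks(riff):
    print("  " * depth, cid, ltype, size)
```

Clamping body_end to the parent's end keeps a corrupt or truncated ChunkSize from making the walker read into a sibling chunk; a production reader would also report such inconsistencies.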

The following picture shows a typical RIFF file.

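Here the tags RIFF, AVI (note the single trailing space after AVI), LIST, hdrl and movi are four-character codes. They appear at the start of their chunks exactly as shown (as the corresponding ASCII bytes) in the file: RIFF and LIST are chunk IDs, while 'AVI ', 'hdrl' and 'movi' are list types stored in the first four bytes of the ChunkData. The following are some of the predefined primitive chunk structures.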
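Because the form type includes a trailing space, a reader must compare all four bytes against b"AVI ". A minimal sketch of checking the 12-byte file header, assuming the start of the file is in memory; read_riff_header and is_avi are hypothetical names.

```python
import struct


def read_riff_header(data):
    """Return (riff_size, form_type) from the first 12 bytes of a RIFF file."""
    if len(data) < 12 or data[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    (riff_size,) = struct.unpack_from("<I", data, 4)  # little-endian ChunkSize
    return riff_size, data[8:12]


def is_avi(data):
    """True if the RIFF form type is 'AVI ' (with the trailing space)."""
    try:
        _, form_type = read_riff_header(data)
    except ValueError:
        return False
    return form_type == b"AVI "


header = b"RIFF" + struct.pack("<I", 4) + b"AVI "
print(is_avi(header))
```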

avih Primitive Chunk (first child of the hdrl LIST chunk)

Bytes       Field                     Description
0 to 3      avih                      Primitive chunk tag, marks the start of the avih chunk
4 to 7      size                      56 (fixed)
8 to 11     Microseconds Per Frame    Time between video frames, in microseconds
12 to 15    Maximum Bytes Per Second  Approximate maximum data rate of the file
16 to 19    Padding Granularity       Data is padded to multiples of this value
20 to 23    Flags                     Global file flags (e.g. whether an index is present)
24 to 27    Total Frames              Total number of frames in the file
28 to 31    Initial Frames            Initial frame for interleaved files (0 otherwise)
32 to 35    Streams                   Number of streams in the file
36 to 39    Suggested Buffer Size     Suggested buffer size for reading the file
40 to 43    Width                     Video width in pixels
44 to 47    Height                    Video height in pixels
48 to 63    Reserved                  Reserved (four 32-bit values, set to zero)

For More Detail refer
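A sketch of decoding the avih table with Python's struct module, assuming chunk holds the avih chunk starting at its tag (byte 0 in the table). Every field is a 32-bit little-endian unsigned integer. The frame-rate and duration helpers are illustrative derivations from Microseconds Per Frame and Total Frames, and the sample values are made up.

```python
import struct
from collections import namedtuple

# The ten meaningful 32-bit fields of avih, in file order (bytes 8 to 47)
MainAVIHeader = namedtuple("MainAVIHeader", [
    "micro_sec_per_frame", "max_bytes_per_sec", "padding_granularity", "flags",
    "total_frames", "initial_frames", "streams", "suggested_buffer_size",
    "width", "height",
])


def parse_avih(chunk):
    """Decode an avih chunk that starts with its 'avih' tag."""
    if chunk[:4] != b"avih":
        raise ValueError("not an avih chunk")
    (size,) = struct.unpack_from("<I", chunk, 4)
    if size < 56 or len(chunk) < 8 + 56:
        raise ValueError("avih chunk is too short")
    # Bytes 48 to 63 (the reserved fields) are ignored
    return MainAVIHeader(*struct.unpack_from("<10I", chunk, 8))


def frames_per_second(h):
    return 1_000_000 / h.micro_sec_per_frame if h.micro_sec_per_frame else None


def duration_seconds(h):
    return h.total_frames * h.micro_sec_per_frame / 1_000_000


sample = (b"avih" + struct.pack("<I", 56)
          + struct.pack("<10I", 40000, 0, 0, 0x10, 250, 0, 2, 0, 640, 480)
          + bytes(16))
hdr = parse_avih(sample)
print(hdr.width, hdr.height, frames_per_second(hdr), duration_seconds(hdr))
```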

strh Primitive Chunk (first child of the strl LIST chunk)

Bytes       Field                   Description
0 to 3      strh                    Stream header chunk tag
4 to 7      size                    56
8 to 11     Type                    Stream type: 'vids' (video), 'auds' (audio),
                                    'mids' (MIDI) or 'txts' (text)
12 to 15    Handler                 For audio and video, the codec name (FourCC)
                                    that audio/video players will use
16 to 19    Flags                   Stream flags, e.g. whether the stream should be
                                    enabled by default
20 to 21    Priority                Stream priority (16-bit)
22 to 23    Language                Stream language (16-bit)
24 to 27    Initial Frames          Significant for interleaved files; specifies the
                                    audio position relative to the video
28 to 31    Scale                   Time scale; Rate / Scale gives samples per second
                                    (frames per second for video)
32 to 35    Rate                    Stream rate (see Scale)
36 to 39    Start                   Starting time of the stream, in Rate / Scale units
40 to 43    Length                  Stream length, in Rate / Scale units
44 to 47    Suggested Buffer Size   Buffer size suggested to video players
48 to 51    Quality                 Stream quality (-1 selects the default)
52 to 55    Sample Size             Size of a single sample (0 if samples vary in size)
56 to 63    Frame                   Destination rectangle as four 16-bit values
                                    (left, top, right, bottom)

For More Detail refer
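A sketch of decoding strh, assuming chunk starts at the 'strh' tag. The layout follows the AVISTREAMHEADER structure from Microsoft's AVI documentation, where Priority and Language are 16-bit fields and Frame holds four signed 16-bit values. Rate divided by Scale gives the stream's samples per second, which is the frame rate for a 'vids' stream. The helper names and the sample values are hypothetical.

```python
import struct
from collections import namedtuple

AVIStreamHeader = namedtuple("AVIStreamHeader", [
    "fcc_type", "fcc_handler", "flags", "priority", "language",
    "initial_frames", "scale", "rate", "start", "length",
    "suggested_buffer_size", "quality", "sample_size", "frame",
])

# type, handler, flags, priority, language, eight 32-bit fields, four shorts = 56 bytes
STRH_FORMAT = "<4s4sIHH8I4h"


def parse_strh(chunk):
    """Decode a strh chunk that starts with its 'strh' tag."""
    if chunk[:4] != b"strh":
        raise ValueError("not a strh chunk")
    (size,) = struct.unpack_from("<I", chunk, 4)
    if size < struct.calcsize(STRH_FORMAT):
        raise ValueError("strh chunk is too short")
    values = struct.unpack_from(STRH_FORMAT, chunk, 8)
    return AVIStreamHeader(*values[:13], frame=values[13:])


def samples_per_second(h):
    """Rate / Scale: frames per second for 'vids', samples per second for 'auds'."""
    return h.rate / h.scale if h.scale else None


body = struct.pack(STRH_FORMAT, b"vids", b"XVID", 0, 0, 0,
                   0, 1001, 30000, 0, 300, 0, 0xFFFFFFFF, 0,
                   0, 0, 640, 480)
strh = parse_strh(b"strh" + struct.pack("<I", len(body)) + body)
print(strh.fcc_type, strh.fcc_handler, round(samples_per_second(strh), 2))
```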

strf Primitive Chunk (second child of the strl LIST chunk, located right after strh)

For a video stream (strh Type 'vids'), the strf data is a BITMAPINFOHEADER:

Bytes       Field                   Description
0 to 3      strf                    Stream format chunk tag
4 to 7      size                    Chunk size (40 or more for video)
8 to 11     Header Size             Size of the BITMAPINFOHEADER (40)
12 to 15    Width                   Number of horizontal pixels
16 to 19    Height                  Number of vertical pixels
20 to 21    Bit Planes              Number of planes for the target device (1)
22 to 23    Bits Per Pixel          Bits per pixel (the average, for compressed formats)
24 to 27    Compression             Image compression (codec FourCC for compressed video)
28 to 31    Image Size              Size of the image in bytes
32 to 35    X Pixels Per Meter      Horizontal resolution
36 to 39    Y Pixels Per Meter      Vertical resolution
40 to 43    Colors Used             Number of color indices in the color table
44 to 47    Important Colors        Number of important color indices in the bitmap
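A sketch of decoding a video strf chunk as a 40-byte BITMAPINFOHEADER, assuming chunk starts at the 'strf' tag and the matching strh Type is 'vids'. Width and Height are signed in the Windows definition, so they are read as signed here. For compressed video, Compression usually holds the codec FourCC packed as a little-endian 32-bit value, which fourcc_text (a hypothetical helper) turns back into text. The sample values are made up.

```python
import struct
from collections import namedtuple

BitmapInfoHeader = namedtuple("BitmapInfoHeader", [
    "header_size", "width", "height", "planes", "bit_count", "compression",
    "image_size", "x_pels_per_meter", "y_pels_per_meter", "clr_used", "clr_important",
])

BIH_FORMAT = "<IiiHHIIiiII"  # 40 bytes, bytes 8 to 47 of the chunk


def parse_video_strf(chunk):
    """Decode a 'vids' strf chunk that starts with its 'strf' tag."""
    if chunk[:4] != b"strf":
        raise ValueError("not a strf chunk")
    (size,) = struct.unpack_from("<I", chunk, 4)
    if size < struct.calcsize(BIH_FORMAT):
        raise ValueError("strf chunk is too short for a BITMAPINFOHEADER")
    return BitmapInfoHeader(*struct.unpack_from(BIH_FORMAT, chunk, 8))


def fourcc_text(value):
    """Show a 32-bit compression value as its four ASCII characters."""
    return struct.pack("<I", value).decode("ascii", errors="replace")


body = struct.pack(BIH_FORMAT, 40, 640, 480, 1, 24,
                   struct.unpack("<I", b"XVID")[0], 640 * 480 * 3, 0, 0, 0, 0)
vid = parse_video_strf(b"strf" + struct.pack("<I", len(body)) + body)
print(vid.width, vid.height, fourcc_text(vid.compression))
```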
For an audio stream (strh Type 'auds'), the strf data is a WAVEFORMATEX-style structure:

Bytes       Field                   Description
0 to 3      strf                    Stream format chunk tag
4 to 7      size                    16 or more (larger when a cbSize field and
                                    codec-specific data follow)
8 to 9      Compressor              Predefined code of the compressor used
10 to 11    Channels                Number of channels in the audio stream
12 to 15    Sample Rate             Number of samples played per second
16 to 19    Bytes Per Sec           Average bytes per second
20 to 21    Block Align             Block alignment of the audio data, in bytes
22 to 23    Bits Per Sample         Size of one sample, in bits
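A sketch of decoding the first 16 bytes of an audio strf chunk, assuming chunk starts at the 'strf' tag and the matching strh Type is 'auds'. Any cbSize field and codec-specific bytes after these 16 bytes are ignored. WAVE_FORMAT_PCM (1) is the standard code for uncompressed PCM; the consistency helper only holds for PCM and is an illustrative check, not part of any reader.

```python
import struct
from collections import namedtuple

WaveFormat = namedtuple("WaveFormat", [
    "format_tag", "channels", "samples_per_sec",
    "avg_bytes_per_sec", "block_align", "bits_per_sample",
])

WAVE_FORMAT = "<HHIIHH"  # 16 bytes, bytes 8 to 23 of the chunk
WAVE_FORMAT_PCM = 1


def parse_audio_strf(chunk):
    """Decode an 'auds' strf chunk that starts with its 'strf' tag."""
    if chunk[:4] != b"strf":
        raise ValueError("not a strf chunk")
    (size,) = struct.unpack_from("<I", chunk, 4)
    if size < struct.calcsize(WAVE_FORMAT):
        raise ValueError("strf chunk is too short for a wave format")
    return WaveFormat(*struct.unpack_from(WAVE_FORMAT, chunk, 8))


def pcm_fields_consistent(w):
    """For PCM: Block Align = channels * bytes per sample,
    and Bytes Per Sec = Sample Rate * Block Align."""
    expected_align = w.channels * w.bits_per_sample // 8
    return (w.block_align == expected_align
            and w.avg_bytes_per_sec == w.samples_per_sec * w.block_align)


body = struct.pack(WAVE_FORMAT, WAVE_FORMAT_PCM, 2, 44100, 176400, 4, 16)
aud = parse_audio_strf(b"strf" + struct.pack("<I", len(body)) + body)
print(aud.channels, aud.samples_per_sec, pcm_fields_consistent(aud))
```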

IDIT Chunk

IDIT chunks can exist on their own inside a LIST chunk.

Bytes             Field     Description
0 to 3            IDIT      IDIT chunk tag
4 to 7            size      Size of the IDIT data in bytes
8 to 7+size       <Data>    Data that can be decoded as a string value
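A sketch of reading the IDIT data as text, assuming chunk starts at the 'IDIT' tag and the value is single-byte text that may end in a newline and/or NUL byte; both the Latin-1 decoding and the date string used below are hypothetical examples, not values taken from a real file.

```python
import struct


def read_idit(chunk):
    """Return the IDIT data as text, assuming `chunk` starts at the 'IDIT' tag."""
    if chunk[:4] != b"IDIT":
        raise ValueError("not an IDIT chunk")
    (size,) = struct.unpack_from("<I", chunk, 4)
    raw = chunk[8:8 + size]  # bytes 8 to 7+size
    # Assumption: single-byte text, possibly terminated by newline and/or NUL
    return raw.rstrip(b"\0\r\n ").decode("latin-1")


text = b"MON JAN 02 15:04:05 2012\n\0"  # hypothetical example value
print(read_idit(b"IDIT" + struct.pack("<I", len(text)) + text))
```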

INFO List Chunk

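Offsets are relative to the 'INFO' list type, which follows the LIST tag and its size field. In the table, p is 1 when size is odd (the pad byte) and 0 otherwise.

Bytes                       Field          Description
0 to 3                      INFO           INFO list type tag
4 to 7                      Info tag       Predefined tag of the first Info primitive chunk
8 to 11                     size           Size of the first Info primitive chunk's data
12 to 11+size               Info value     String array of metadata
12+size+p to 15+size+p      Info tag       Tag of an optional second Info primitive chunk
16+size+p to 19+size+p      size           Size of the second Info primitive chunk's data
...                                        Further primitive chunks follow the same pattern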

INFO is a slightly special metadata chunk because it can store variable-sized metadata. This is mostly user-related data and information about the subject of the video. Tags can be added flexibly, and there are standard tag IDs for comments (ICMT), copyright (ICOP), location description, genre (IGNR), country and more.
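A sketch of collecting the INFO primitive chunks into a dictionary, assuming data is the ChunkData of a LIST chunk starting at the 'INFO' list type. Each value is treated as a NUL-terminated string; decoding it as Latin-1 is an assumption, since the character set is not fixed by the layout above. The ICMT and ICOP values are made up.

```python
import struct


def parse_info(data):
    """Map each Info tag to its string value.

    `data` is the ChunkData of a LIST chunk, starting at the 'INFO' list type.
    """
    if data[:4] != b"INFO":
        raise ValueError("not an INFO list")
    tags = {}
    pos = 4
    while pos + 8 <= len(data):
        tag = data[pos:pos + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack_from("<I", data, pos + 4)
        value = data[pos + 8:pos + 8 + size]
        # Values are NUL-terminated strings; Latin-1 decoding is an assumption
        tags[tag] = value.split(b"\0", 1)[0].decode("latin-1")
        pos += 8 + size + (size & 1)  # skip the pad byte after odd sizes
    return tags


info = (b"INFO"
        + b"ICMT" + struct.pack("<I", 6) + b"hello\0"
        + b"ICOP" + struct.pack("<I", 5) + b"ACME\0" + b"\0")  # trailing pad byte
print(parse_info(info))
```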

In Exiv2, all video files are treated as image files, and the RIFF read flow is as follows:


Updated by Mahesh Hegde about 8 years ago · 6 revisions