The unofficial MO3 format description

open source MO3 decompression and decoding support (but not playback)

News:

2015Nov10 : updated specification by Johannes Schultz. OpenMPT library is now able to decode and play MO3 music using open source code (since 0.2-beta17) !
2014May21 : code and specification pushed to Github, to ease third party contributions (Hi Johannes Schultz)
2009Jul19 : v2.4 MO3 encoder updates: code updated to v0.6, and specification updated to 0.91.
2006feb26 : unmo3 source code v0.5 is released (for technical people only). See section 4. and Readme.
2006feb15 : format description updates (information about IT MIDI macros and instrument envelopes from Ian Luck)
2006feb05 : the unpack algorithms source code (5 functions) is released. The unpack algorithms are explained.

See

Version 0.92
May 18th, 2014

0. Credits

Laurent Laubin for the PEtite decompression
Matthew T. Russotto for the unpack compression explanation
Ian Luck for answering questions about missing parts of the spec: IT macros, IT instrument structure, module flags, IT fields of samples, plus some others...
Stuart for initial discussion about the unpack compression.

1. Introduction

This description is applicable to mo3 encoder v1.8, v2.1, v2.2 and v2.4.

1.1 Overview

The name MO3 means "MOdule with MP3", because the initial idea was to reduce the size of a module (.mod, IT, XM) by compressing the samples using MPEG audio layer 3.

This format was created by Ian Luck (http://www.un4seen.com/).

The samples are not the only part that is compressed: the music data (mainly notes, instrument numbers and effects) and the instrument information are compressed too.
A lot of effort has gone into reducing the size of this part before a specific lossless compression is applied to it.

Samples can be compressed using OGG, MP3, or one of two specific lossless algorithms.

We can define "compression" as a scheme that detects all kinds of repetition in some data, then encodes these repetitions in a more compact way. This principle is applied very well in MO3 for music data encoding.

1.2 The music data size reduction

The first idea of the MO3 encoder is to parse the music data and detect unused samples, which are then removed from the module.

When composing a module, musicians often cut and paste the content of a channel (later called a "voice") from one pattern to another, for example drums and bass. This kind of repetition is detected by the encoder: a list of unique voices is built, then the patterns are encoded using pointers to these unique voice data. This idea is also used in the mtm format.
This is especially efficient for the "empty" voice: for example in a module with 4 channels, often only 3 voices are used per pattern, so the "empty" voice is repeated many times.

In a voice, one row can be repeated several times. This is especially true for the empty row. This kind of repetition is detected and stored in a compact way.

The last size optimisation is how the row in a voice data is encoded, using a list of type/value items.

1.3 The sample delta encodings

Digitized audio data are generally signed values stored in 8 or 16 bits. Compressed directly with a general-purpose lossless algorithm, the best results are about a 10% reduction, which is poor.
Audio data are roughly sine waves (or sums of them), so they contain few exact repetitions.
However, successive values are close to each other, so a good idea is to encode them as delta values first (starting from 0, for example) before compression. This is done in the 'delta' version of the MO3 lossless packers, and in XM modules.
There is a smarter approach, since the slope of audio data is often nearly constant: encode the error against the predicted next delta instead of the delta itself. The prediction is then adjusted with the error, so that it converges towards the right next delta value.
This method is more efficient than simple delta encoding, especially on 16-bit data.
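
The two ideas above can be sketched in C as follows. This is only an illustration for 8-bit samples: the function names are made up for this example, and the predictor used here (the next delta is assumed equal to the previous one) is an assumption, not necessarily the predictor used by the MO3 packers.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain delta: out[i] = in[i] - in[i-1], the sample before the first being 0.
   Arithmetic is done modulo 256 so that decoding is always exact. */
static void delta_encode8(const int8_t *in, int8_t *out, size_t n) {
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t cur = (uint8_t)in[i];
        out[i] = (int8_t)(uint8_t)(cur - prev);
        prev = cur;
    }
}

static void delta_decode8(const int8_t *in, int8_t *out, size_t n) {
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = (uint8_t)(acc + (uint8_t)in[i]);
        out[i] = (int8_t)acc;
    }
}

/* Delta prediction (assumed predictor: constant slope, i.e. the next delta
   is predicted to equal the previous delta). Only the error is stored. */
static void predict_encode8(const int8_t *in, int8_t *out, size_t n) {
    uint8_t prev = 0, prev_delta = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t cur = (uint8_t)in[i];
        uint8_t delta = (uint8_t)(cur - prev);
        out[i] = (int8_t)(uint8_t)(delta - prev_delta);
        prev_delta = delta;
        prev = cur;
    }
}

static void predict_decode8(const int8_t *in, int8_t *out, size_t n) {
    uint8_t acc = 0, delta = 0;
    for (size_t i = 0; i < n; i++) {
        delta = (uint8_t)(delta + (uint8_t)in[i]); /* correct the prediction */
        acc = (uint8_t)(acc + delta);              /* rebuild the sample */
        out[i] = (int8_t)acc;
    }
}
```

On a ramp such as 0, 2, 4, 6, 8 the plain delta output is 0, 2, 2, 2, 2 while the prediction output is 0, 2, 0, 0, 0, which shows why the second form compresses better when the slope is steady.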

2. The file format

2.1 Compressed form

0x10 is the notation for a hexadecimal value (16 in decimal).
short (2 bytes) and long (4 bytes) values are stored in the file in little-endian order (Intel x86).

Address	Length	Type	Description
0x0000	3	char	"MO3"
0x0003	1	byte	version, related to sample compression: 0 = mp3 and lossless, 1 = ogg, 3 = encoder v2.2, 4 = encoder v2.1 (4 should mean "with no LAME header"), 5 = encoder v2.4
0x0004	1	long	uncompressed length of header (music data)

Encoder version 2.2 and earlier (version == 0, 1, 3 or 4):

Address	Length	Type	Description
0x0008	computed	byte[]	compressed header (see sections 2.2 and 2.3)
computed	computed	byte[]	samples, compressed or not, using lossless, mp3 or OGG

Encoder version 2.4 (version == 5):

Address	Length	Type	Description
0x0008	1	long	data offset in compressed data after decompression
0x000c	computed	byte[]	compressed header (see sections 2.2 and 2.3)
computed	computed	byte[]	samples, compressed or not, using lossless, mp3 or OGG
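
As a minimal sketch of the layout above, the fixed part of the file can be read like this. The structure and function names (mo3_file_header, mo3_parse_file_header) are hypothetical and only serve this example; they are not taken from the unmo3 sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  version;       /* byte at 0x0003 */
    uint32_t unpacked_len;  /* uncompressed length of the music data */
    uint32_t data_offset;   /* only present when version == 5, else 0 */
    size_t   packed_start;  /* file offset of the compressed header */
} mo3_file_header;

/* Read a little-endian 32-bit value. */
static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or not an MO3 file. */
static int mo3_parse_file_header(const uint8_t *buf, size_t len,
                                 mo3_file_header *h) {
    if (len < 8 || memcmp(buf, "MO3", 3) != 0)
        return -1;
    h->version = buf[3];
    h->unpacked_len = rd_le32(buf + 4);
    h->data_offset = 0;
    h->packed_start = 8;
    if (h->version == 5) {          /* encoder v2.4 adds one long */
        if (len < 12)
            return -1;
        h->data_offset = rd_le32(buf + 8);
        h->packed_start = 12;
    }
    return 0;
}
```

The caller would then pass the bytes starting at packed_start to the unpack routine of section 2.2, expecting unpacked_len bytes of output.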

2.2 The music data decompression algorithm

Here is Matthew's explanation:

"The first byte is always uncompressed. After that, you've got two interleaved streams of control bytes and data bytes. The control bytes are read by the shift_dl routine.
In the unpack routine, the control bits are read most-significant first.
A zero bit indicates "uncompressed byte". A one bit indicates compressed data.
The next two control bits control which kind of compression
-- if they are '00' it's LZ with the same (relative) pointer as a previous LZ.
The next two bits of the control stream are the length, unless they are both zero.

If they are both zero, the true length minus 2 is encoded in the control stream, two bits per bit.
The first bit in each pair is the actual data, the second bit is 0 on the last pair.

If the first control bits are '11', '10' or '01', then the LZ pointer is in the control stream.
The most significant bit of the pointer is a '1', then the next most significant bits of the pointer are read from the control stream two bits at a time as described above (including the initial 11 or 01 or 10). Then 3 is subtracted from that value and it is shifted left by 8 bits, and the 8 least significant bits of the pointer are taken from the data stream. The one's-complement of the result is taken.
The length adjustment for -1280 and -32000 is saved and added back in later (it's always at least one). Then it goes into the same LZ as before, with the next two bits of the control stream being the length unless they are both zero, etc.

Example:

64 6d 08 69 61

64 = 01100100
0 = next byte is literal 0x6d
1 = compressed data
10 = LZ with MSB of pointer zero after subtracting 3

08 -- byte from data stream, pointer to -9 bytes back (points to the 'a' in Danny)

01 -- from control stream, a length of 1, plus the adjustment 1 from earlier = 2.
0 -- indicates a literal 69
0 -- indicates a literal 61. "

For more details, look in the source code here.
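
To make the explanation more concrete, here is a small sketch of the control-bit reader and of the LZ pointer computation, written from one reading of the text above rather than from the released unpack code. All names (mo3_reader, mo3_ctrl_bit, mo3_ctrl_pairs, mo3_lz_distance) are hypothetical, the length handling is left out, and no bounds checking is done, which a real decoder would need.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *src;  /* next unread byte of the interleaved stream */
    uint8_t ctrl;        /* current control byte, shifted as bits are used */
    int bits_left;       /* control bits still available in ctrl */
} mo3_reader;

/* Control bits are consumed most-significant first; a new control byte
   is fetched from the stream when the current one is exhausted. */
static int mo3_ctrl_bit(mo3_reader *r) {
    if (r->bits_left == 0) {
        r->ctrl = *r->src++;
        r->bits_left = 8;
    }
    int bit = (r->ctrl >> 7) & 1;
    r->ctrl = (uint8_t)(r->ctrl << 1);
    r->bits_left--;
    return bit;
}

static uint8_t mo3_data_byte(mo3_reader *r) {
    return *r->src++;
}

/* "Two bits per bit": the first bit of each pair is data, the second
   bit is 0 on the last pair. */
static uint32_t mo3_ctrl_pairs(mo3_reader *r, uint32_t value) {
    int more;
    do {
        value = (value << 1) | (uint32_t)mo3_ctrl_bit(r);
        more = mo3_ctrl_bit(r);
    } while (more);
    return value;
}

/* Pointer after a '1' control bit. A first pair of '00' yields 2, which
   is taken here to mean "reuse the previous pointer". */
static int32_t mo3_lz_distance(mo3_reader *r, int32_t previous) {
    uint32_t high = mo3_ctrl_pairs(r, 1);  /* implicit leading '1' */
    if (high == 2)
        return previous;
    uint32_t raw = ((high - 3) << 8) | mo3_data_byte(r);
    return (int32_t)~raw;                  /* one's complement */
}
```

Applied to the example bytes 64 6d 08 69 61, this reads a literal 0x6d, then a compressed item whose pointer is -9, then the two length bits '01', then the literals 0x69 and 0x61.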

This algorithm is copyrighted by Ian Luck, and is also used in PEtite.

2.3 The music data, after decompression

2.3.1 General data

Address	Length	Type	Description
0x0000	variable	char[]	song name (terminated by 00)
computed	variable	char[]	message (in IT, terminated by 00)

then, 0x1a6 bytes :

Address	Length	Type	Description
0x0000	1	byte	number of channels (for example 04 for .mod, 0x20 for .xm)
0x0001	1	short	song len (at 0x3b6 in .mod, at 0x40 in .xm)
0x0003	1	short	restart position
0x0005	1	short	pattern number
0x0007	1	short	unique voice number
0x0009	1	short	instrument number
0x000b	1	short	sample number
0x000d	1	byte	ticks/row
0x000e	1	byte	initial tempo (default = 125)
0x000f	1	long	flags

if (mo3Hdr->flags & 0x0100) printf("IT");
else if (mo3Hdr->flags & 0x0002) printf("S3M");
else if (mo3Hdr->flags & 0x0080) printf("MOD");
else if (mo3Hdr->flags & 0x0008) printf("MTM");
else printf("XM");
bit#0 : 1 means linear frequency table, 0 means Amiga table (cleared in .mod)
bit#14: (0x00004000) currently used
bit#17: (0x00020000) always set
examples: 0x00024001 for .xm, 0x00020088 for .mod

0x0013	1	byte	global volume
0x0014	1	byte	pan separation
0x0015	1	byte	internal volume (could be ignored)
0x0016	64	byte[]	default channel volumes (for 64 channels).
0x0056	64	byte[]	default channel panning (for 64 channels).
0x0096	16	byte[]	IT MIDI macros : SF0-SFF settings (equate to "F0F0<value-1>z" in IT)
0x00a6	128*2	byte	IT MIDI macros : Z80-ZFF settings (2 bytes each, equate to "F0F0<value1><value2>")
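
The general data block can be read with a few offset lookups once the two zero-terminated strings have been skipped. The sketch below assumes the decompressed music data is already in memory; the names mo3_song_info and mo3_parse_song_info are hypothetical and only the fields up to the flags are extracted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO3_FIXED_HEADER_SIZE 0x1a6

typedef struct {
    const char *song_name;   /* points into the input buffer */
    const char *message;     /* points into the input buffer */
    uint8_t  channels;
    uint16_t song_len, restart_pos, patterns, voices, instruments, samples;
    uint8_t  ticks_per_row, tempo;
    uint32_t flags;
    const uint8_t *next;     /* first byte after the 0x1a6-byte block */
} mo3_song_info;

static uint16_t rd_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer ends too early. */
static int mo3_parse_song_info(const uint8_t *buf, size_t len,
                               mo3_song_info *s) {
    const uint8_t *end = buf + len;
    const uint8_t *nul = memchr(buf, 0, len);       /* end of song name */
    if (!nul) return -1;
    const uint8_t *msg = nul + 1;
    nul = memchr(msg, 0, (size_t)(end - msg));      /* end of message */
    if (!nul) return -1;
    const uint8_t *h = nul + 1;
    if ((size_t)(end - h) < MO3_FIXED_HEADER_SIZE) return -1;

    s->song_name     = (const char *)buf;
    s->message       = (const char *)msg;
    s->channels      = h[0x00];
    s->song_len      = rd_le16(h + 0x01);
    s->restart_pos   = rd_le16(h + 0x03);
    s->patterns      = rd_le16(h + 0x05);
    s->voices        = rd_le16(h + 0x07);
    s->instruments   = rd_le16(h + 0x09);
    s->samples       = rd_le16(h + 0x0b);
    s->ticks_per_row = h[0x0d];
    s->tempo         = h[0x0e];
    s->flags         = rd_le32(h + 0x0f);
    s->next          = h + MO3_FIXED_HEADER_SIZE;
    return 0;
}
```

Fields are read byte by byte instead of casting to a packed struct, because several shorts and the flags long sit at odd offsets.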

then :

2.3.2 Song and pattern data

Address	Length	Type	Description
0x0000	songlen	byte[]	song sequence (pattern #)
computed	nb unique voice	short[]	voice seq (for each pattern, and each channel, number of voice data) : identical voices are detected and factorized at compression.
computed	nb pattern	short[]	pattern length table (size=nb_patt*2)

In .mod each pattern has 64 rows and 4 channels (for protracker). Each row is coded using 4 bytes, so a voice takes 4*64 bytes, and a pattern 4*64*4 bytes.
In MO3, the number of rows per pattern is variable (like in XM) and stored in the 'pattern length table'. To rebuild a pattern you have to use the voice seq table. The voice data are stored as described below:

For each voice data, repeated "nb unique voice" times.

0x0000 1 long voice data encoded as type/value list
(one empty voice is encoded "len=7, 10 f0 f0 f0 f0 30 00" or "len=5, 10 f0 f0 10 00" depending on the pattern length by v1.8 and v2.1;
with the v2.2 encoder, the empty voice is really empty : only the ending 00)

The first byte of a row encodes both the number of type/value items in the row (low 4 bits) and how many times the row is repeated (high 4 bits).
For example 0x30 means "an empty row, 3 times", and 0x32 means "this row has 2 type/value items and is repeated 3 times".

type value	type description	value value	Description
1	note	note number	0=C-0, 1=C#0, 2=D-0, 3=D#0, 4=E-0, ..., 0x58=E-7, 0x59=F-7, 0x5a=F#7, 0x5b=G-7. 0xff means "note off, ==", 0xfe means "^^"
2	instrument	instrument number-1	instrument 1 is coded "0".

MOD/MTM effects

type value	type description	value value	Description
3	0	effect parameter	arpeggio
4	1	effect parameter	portamento up
5	2	effect parameter	portamento down
6	3	effect parameter	tone portamento
7	4	effect parameter	vibrato
8	5	effect parameter	volume slide + tone portamento
9	6	effect parameter	volume slide + vibrato
0xc	9	effect parameter	set offset
0xd	A	effect parameter	volume slide
0xf	C	effect parameter	set volume
0x10	D	effect parameter	pattern break
0x11	E	effect + effect parameter	extended effect (E0->EF)
0x12	F	effect parameter	set speed

A good example (almost all effects, in a great piece of music) is Danny Elfman by Moby.

XM effects

type value	type description	value value	Description
4	1	effect parameter	portamento up
6	3	effect parameter	tone portamento
7	4	effect parameter	vibrato
0xb	p	effect parameter	set panning. 'p02' is coded '0b 00', 'p62' '0b f0', 'p10' '0b 20'
0xd	A	effect parameter	volume slide
0xf	v	effect parameter	set volume
0x10	D	effect parameter	pattern break
0x11	E	effect parameter	pattern delay
0x12	F	effect parameter	set speed
0x14	c	effect parameter	volume slide up. 'c03' is coded "14 30"
0x15	b	effect parameter	fine volume down
0x16	G	effect parameter	set global volume

IT effects

type value	type description	value value	Description
6	G	effect parameter	tone portamento
7	H	effect parameter	vibrato
0xb	X	effect parameter	set panning
0xf	v	effect parameter	set volume
0x22	D	effect parameter	volume slide
0x22	K	effect parameter	volume slide + vibrato
0x28	M	effect parameter	set channel volume
0x30	a	effect parameter	fine volume up
0x30	b	8 + effect parameter	fine volume down
0x30	d	(effect parameter)<<4 + 0xf	volume slide down. 'd01' is '0x30 0x1f'

S3M effects

type value	type description	value value	Description
6	G	effect parameter	tone portamento
7	H	effect parameter	vibrato
7 and 0x22	K	effect parameter	vibrato + volume slide (K 01 is coded "07 00 22 01")
0xa	R	effect parameter	tremolo
0xc	O	effect parameter	set offset
0xf	v	effect parameter	set volume
0x10	C	effect parameter	pattern break
0x12	T	effect parameter	set tempo
0x16	V	effect parameter	set global volume
0x21	A	effect parameter	set speed
0x22	D	effect parameter	volume slide
0x23	E	effect parameter	portamento down
0x26	Q	effect parameter	retrigger note
0x2b	S	effect parameter	set high offset

Example (in hexa): 13 01 38 02 0d 0f 20

13 : 1 row of 3 type/value
01 38 : note is G#5
02 0d : instrument is 14
0f 20 : effect : C20

Data for one voice is terminated with 00
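
The row encoding can be checked by walking a voice and adding up the repeat counts. The function below is a sketch with a hypothetical name (mo3_voice_rows); it only counts rows and skips over the type/value pairs without interpreting them.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of rows described by one voice, or -1 if the data
   ends before the terminating 00 byte. */
static long mo3_voice_rows(const uint8_t *p, size_t len) {
    long rows = 0;
    size_t i = 0;
    while (i < len) {
        uint8_t head = p[i++];
        if (head == 0)
            return rows;              /* end of this voice */
        size_t items = head & 0x0f;   /* number of type/value pairs */
        if (len - i < items * 2)
            return -1;                /* truncated row */
        i += items * 2;               /* skip the pairs */
        rows += head >> 4;            /* repeat count */
    }
    return -1;
}
```

With this reading, "10 f0 f0 f0 f0 30 00" adds up to 1 + 4*15 + 3 = 64 rows and "10 f0 f0 10 00" to 1 + 15 + 15 + 1 = 32 rows, which matches the two empty voices quoted earlier, and the example "13 01 38 02 0d 0f 20" followed by 00 gives a single row.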

2.3.3 Instruments data

Instrument data takes 0x33a bytes, after the instrument name (0 terminated)

Address	Length	Type	Description
?	variable	char[]	instrument name (0 terminated)
0x0000	1	long	flags : 1 = play on MIDI, 2 = mute. These are hardly ever used
0x0004	10*12*4	byte	sample map : 10 octaves * 12 notes * 4 bytes (1 byte, 1 byte, 1 short = sample number)
0x01e4	1	byte	volume envelope : flags
0x01e5	1	byte	volume envelope : number of node points
0x01e6	1	byte	volume envelope : loop beginning
0x01e7	1	byte	volume envelope : loop end
0x01e8	1	byte	volume envelope : sustain loop beginning
0x01e9	1	byte	volume envelope : sustain loop end
0x01ea	25*2	short	volume envelope, 25 nodes : position (short), value 0->64 (short)
0x024e	1	byte	panning envelope : flags
0x024f	1	byte	panning envelope : number of node points
0x0250	1	byte	panning envelope : loop beginning
0x0251	1	byte	panning envelope : loop end
0x0252	1	byte	panning envelope : sustain loop beginning
0x0253	1	byte	panning envelope : sustain loop end
0x0254	25*2	short	panning envelope, 25 nodes : position (short), value +32/-32 (short)
0x02b8	1	byte	pitch envelope : flags
0x02b9	1	byte	pitch envelope : number of node points
0x02ba	1	byte	pitch envelope : loop beginning
0x02bb	1	byte	pitch envelope : loop end
0x02bc	1	byte	pitch envelope : sustain loop beginning
0x02bd	1	byte	pitch envelope : sustain loop end
0x02be	25*2	short	pitch envelope, 25 nodes : position (short), value +32/-32 (short)
0x0322	1	byte	vibrato type (0=sine, 1=ramp down, 2=square, 3=random)
0x0323	1	byte	vibrato sweep
0x0324	1	byte	vibrato depth
0x0325	1	byte	vibrato rate
0x0326	1	short	fade out
0x0328	1	byte	midi channel
0x0329	1	byte	midi bank
0x032a	1	byte	midi patch
0x032b	1	byte	midi bend
0x032c	1	byte	global volume *2
0x032d	1	short	panning
0x032f	1	byte	New Note Action [IT]
0x0330	1	byte	Pitch Pan Separation [IT]
0x0331	1	byte	Pitch Pan Center [IT]
0x0332	1	byte	Duplicate Check Type [IT]
0x0333	1	byte	Duplicate Check Action [IT]
0x0334	1	short	Random Volume variation (%) [IT]
0x0336	1	short	Random Panning variation [IT]
0x0338	1	byte	Initial Filter Cutoff [IT]
0x0339	1	byte	Initial Filter Resonance [IT]

The number of samples for a given instrument is deduced from the sample map.
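
One possible way to deduce the samples of an instrument is to scan the 120 map entries and collect the distinct sample numbers. This is only a sketch: the function name mo3_map_samples is hypothetical, and since the text does not say whether some value marks "no sample", every value found in the map is counted here.

```c
#include <stddef.h>
#include <stdint.h>

#define MO3_MAP_ENTRIES (10 * 12)   /* 10 octaves * 12 notes */

/* map points to the 480-byte sample map (instrument offset 0x0004).
   Distinct sample numbers are written to out (at most cap of them);
   the number of entries written is returned. */
static size_t mo3_map_samples(const uint8_t *map, uint16_t *out, size_t cap) {
    size_t count = 0;
    for (size_t i = 0; i < MO3_MAP_ENTRIES; i++) {
        const uint8_t *e = map + i * 4;
        /* bytes 0 and 1 are skipped, bytes 2-3 are the sample number */
        uint16_t sample = (uint16_t)(e[2] | (e[3] << 8));
        size_t j = 0;
        while (j < count && out[j] != sample)
            j++;                     /* already seen? */
        if (j == count && count < cap)
            out[count++] = sample;
    }
    return count;
}
```

A linear search is enough here because a map has only 120 entries.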

2.3.4 Samples data

Samples data takes 0x29 bytes after the sample name, which is 0 terminated.

Address	Length	Type	Description
?	variable	char[]	sample name (0 terminated)
?	variable	char[]	sample filename (0 terminated)
Update: with the v2.4 encoder, Johannes Schultz pointed out to me: "this 0-byte comes from the fact that v2.4 can also store sample filenames right after sample names, so a double-zero simply means that the sample filename is empty."
0x0000	1	long	finetune (0x00 in file = -128, 0x80 = 0, 0x76 = -10, 0x90 = 16) [MOD, MTM, XM]
or "C4/5 speed" for S3M and IT (with Amiga slides), unless linear bit is also set.
0x0004	1	byte	transpose
0x0005	1	byte	volume (max 64)
0x0006	1	short	panning
0x0008	1	long	size (in bytes for 8bits, in shorts for 16bits).
if size==0 and end!=0 : means removed sample (not used)
0x000c	1	long	start
0x0010	1	long	end
0x0014	1	short	flags
bit #0 (0x0001): 1=16bits, 0=8bits
bit #4 (0x0010): 1=loop
bit #5 (0x0020): 1=bi-loop, set together with bit #4 (giving 0x0030) [IT]
bit #12 (0x1000): 1=lossy compression (0x1000 alone means mp3; together with bit #13, i.e. 0x3000, means ogg)
bit #13 (0x2000): lossless compression 'delta' (with bit #12 cleared)
bit #14 (0x4000): lossless compression 'delta prediction'
0x0000 means "not compressed" if size!=0
0x0016	1	byte	vibrato type (0=sine, 1=ramp down, 2=square, 3=random) [IT]
0x0017	1	byte	vibrato sweep [IT]
0x0018	1	byte	vibrato depth [IT]
0x0019	1	byte	vibrato rate [IT]
0x001a	1	byte	global volume [IT]
0x001b	1	long	sustain loop start [IT]
0x001f	1	long	sustain loop end [IT]
0x0023	1	long	compressed size (lossless, mp3 or ogg)
0x0027	1	short	encoder delay
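
The flag bits and the finetune bias can be decoded as in the sketch below, which takes a pointer to the 0x29-byte block that follows the sample name and filename. The enum and function names are hypothetical, and the order in which the bits are tested is an assumption for combinations that the table does not describe.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    MO3_SAMPLE_REMOVED,
    MO3_SAMPLE_RAW,
    MO3_SAMPLE_MP3,
    MO3_SAMPLE_OGG,
    MO3_SAMPLE_DELTA,
    MO3_SAMPLE_DELTA_PREDICTION
} mo3_sample_kind;

static uint16_t rd_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify the storage of a sample from its size, end and flags fields. */
static mo3_sample_kind mo3_sample_storage(const uint8_t *hdr) {
    uint32_t size  = rd_le32(hdr + 0x08);
    uint32_t end   = rd_le32(hdr + 0x10);
    uint16_t flags = rd_le16(hdr + 0x14);

    if (size == 0 && end != 0)       return MO3_SAMPLE_REMOVED;
    if ((flags & 0x3000) == 0x3000)  return MO3_SAMPLE_OGG;
    if (flags & 0x1000)              return MO3_SAMPLE_MP3;
    if (flags & 0x2000)              return MO3_SAMPLE_DELTA;
    if (flags & 0x4000)              return MO3_SAMPLE_DELTA_PREDICTION;
    return MO3_SAMPLE_RAW;
}

/* Finetune for MOD, MTM and XM: the stored value is biased by 128. */
static int32_t mo3_sample_finetune(const uint8_t *hdr) {
    return (int32_t)rd_le32(hdr + 0x00) - 128;
}
```

For S3M and IT modules the first long would be read as the C4/5 speed instead, so mo3_sample_finetune only applies to the formats listed in the table.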

3. Samples

The lossless sample algorithms are here (mo3_unpack.c).
There is one version for 8-bit samples and another one for 16-bit samples.
There are 2 kinds of algorithm: one based on delta encoding (as in xm), and a second one based on delta prediction encoding. The resulting delta values are then stored in a compact way.

4. Source code

The source code version 0.6 is available here. Here is an extract of the 'readme' file:

"The piece of code has been written as a companion (validation code) of the document "the unofficial MO3 specification".

It is targeted at developers or technical people, not at end users. It can be used by IT/XM/S3M module specialists (developers of tracker editors or module players) to write a MO3 import loader, or more generally to handle MO3 modules in any way.

The MO3 format has been created by Ian Luck (http://www.un4seen.com). If you are looking for a good encoder and decoder (but without the source code) and a good module player, Ian's web site is the right place to go.

The features of unmo3 (opensource version) are:

uncompress the MO3 header and samples with lossless compressions.
able to save to a file uncompressed header and samples
able to extract mp3 and ogg compressed samples
can display a channel of a given pattern into 2 forms
- as encoded inside MO3 file
- as it usually appears in a tracker editor (for .mod only)

You can see output of the 'demo' here and the auto tests here. For tests, a second archive 'unmo3_test.zip' is required (has to be uncompressed in the same place as the 'unmo3_src.zip' archive).

The Win32 binary is here unmo3.exe.

5. Other information

unmo3 Win32 executable is compressed with PEtite, and Linux executable with UPX.

Other modules file formats are available here (from Wotsit):

Previous work on Amiga module compression is available here (Sylvain Chipaux and Gryzor).

You can find on Exotica a huge collection of Amiga music formats descriptions.

State of the art waveform compression is explained here : TTA, Shorten, AudioPak, and FLAC.

DUMB is a free opensource library to replay XM, IT, MOD and S3M modules.

XMP is a portable and opensource module player.

-end of the document-
