Launched in 2006, the prize awards 5000 euros for each one percent improvement (with 500,000 euros total funding) in the compressed size of the file enwik9, which is the larger of two files used in the Large Text Compression Benchmark; enwik9 consists of the first 1,000,000,000 characters of a specific version of English Wikipedia. The ongoing competition is organized by Hutter, Matt Mahoney, and Jim Bowery.
The contest is open-ended. It is open to everyone. To enter, a competitor must submit a compression program and a decompressor that decompresses to the file enwik9. It is also possible to submit a compressed file instead of the compression program. The total size of the compressed file and decompressor (as a Win32 or Linux executable) must not be larger than 99% of the previous prize winning entry. For each one percent improvement, the competitor wins 5,000 euros. The decompression program must also meet execution time and memory constraints.
The prize was announced on August 6, 2006 with a smaller text file: enwik8 consisting of 100MB. On February 21, 2020 it was expanded by a factor of 10, to enwik9 of 1GB, similarly, the prize goes from 50,000 to 500,000 euros. The original prize baseline was 18,324,887 bytes, achieved by PAQ8F. The expanded prize baseline was 116MB.
On August 20 of that same year, Alexander Ratushnyak submitted PAQ8HKCC, a modified version of PAQ8H, which improved compression by 2.6% over PAQ8F. He continued to improve the compression to 3.0% with PAQ8HP1 on August 21, 4% with PAQ8HP2 on August 28, 4.9% with PAQ8HP3 on September 3, 5.9% with PAQ8HP4 on September 10, and 5.9% with PAQ8HP5 on September 25. At that point he was declared the first winner of the Hutter prize, awarded 3416 euros, and the new baseline was set to 17,073,018 bytes.
Lossy compression consists of a transform to separate important from unimportantdata, followed by lossless compression of the important part and discarding therest. The transform is an AI problem because it requires understanding what thehuman brain can and cannot perceive.
Lossless compressors ignore meaningless data in selecting contexts. Meaninglessor random data has no predictive value and is itself not compressible. A lossy compressornot only ignores the meaningless data, but also discards it completely. Decidingwhich data is meaningful is a hard AI problem that applies to both lossless andlossy compression. Both require a deep understanding of human cognitive psychology. 2b1af7f3a8