Large Text Compression Benchmark
(mattmahoney.net) 10 points by redeux 2 days ago | 3 comments
pama 12 minutes ago | prev | next |
It would be nice to also have a competition of this type where, within reasonable limits, the size of the compressor does not matter and the material to be compressed is hidden and varied over time. For example, up to 10GB compressor size, with the dataset being a different random chunk of fineweb every week.
pmayrgundter 13 minutes ago | prev |
The very notable thing here is that the best method uses a Transformer, and no other entry does.
hyperpape 15 minutes ago | next |
It's worth noting that the benchmark has not been updated as frequently for the past several years, and some versions of compressors are quite far behind the current implementations (http://www.mattmahoney.net/dc/text.html#history).
In the one instance I double-checked (zstd), I don't recall it making a massive difference, but it did make a difference (iirc, the output of the current version was slightly smaller than what was listed in the benchmark).
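For anyone who wants to repeat that kind of spot check, here is a minimal sketch of comparing a measured compressed size against a listed one. It is an illustration only: it uses Python's standard-library codecs (zlib, bz2, lzma) as stand-ins rather than zstd, the helper names `compressed_sizes` and `relative_difference` are made up for this example, and the sample input is a hypothetical placeholder, not the benchmark's data.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes for each stdlib codec at its highest level."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


def relative_difference(measured: int, listed: int) -> float:
    """Relative difference of a measured size versus a listed size (negative means smaller)."""
    return (measured - listed) / listed


if __name__ == "__main__":
    # Hypothetical sample input; substitute the file you actually want to check.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 2000
    for name, size in compressed_sizes(sample).items():
        print(f"{name}: {len(sample)} -> {size} bytes")
```

Running the same measurement with an older and a newer build of a compressor, then passing the two sizes to `relative_difference`, would show how far a listed figure has drifted.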