dropshell release 2025.0513.2134

This commit is contained in:
Your Name
2025-05-13 21:34:59 +12:00
parent adcb3567d4
commit bd1ad20990
1055 changed files with 168339 additions and 0 deletions

@ -0,0 +1,26 @@
Zstandard Documentation
=======================
This directory contains material defining the Zstandard format,
as well as detailed instructions for using the `zstd` library.
__`zstd_manual.html`__ : Documentation of the `zstd.h` API, in HTML format.
Unfortunately, GitHub displays `html` files as source code rather than rendering them.
For a readable display of the HTML API documentation of the latest release,
use this link: [https://raw.githack.com/facebook/zstd/release/doc/zstd_manual.html](https://raw.githack.com/facebook/zstd/release/doc/zstd_manual.html).
__`zstd_compression_format.md`__ : This document defines the Zstandard compression format.
Compliant decoders must adhere to this document,
and compliant encoders must generate data that follows it.
If you are looking for resources to develop your own port of the Zstandard algorithm,
you may find the following useful:
__`educational_decoder`__ : This directory contains an implementation of a Zstandard decoder,
compliant with the Zstandard compression format.
It can be used, for example, to better understand the format,
or as the basis for a separate implementation of a Zstandard decoder.
[__`decode_corpus`__](https://github.com/facebook/zstd/tree/dev/tests#decodecorpus---tool-to-generate-zstandard-frames-for-decoder-testing) :
This tool, stored in the `/tests` directory, is able to generate random valid frames,
which is useful if you wish to test your decoder and verify it fully supports the specification.

@ -0,0 +1,148 @@
Decompressor Errata
===================
This document captures known decompressor bugs, where the decompressor rejects a valid zstd frame.
Each entry will contain:
1. The last affected decompressor versions.
2. The decompressor components affected.
3. Whether the compressed frame could ever be produced by the reference compressor.
4. An example frame (hexadecimal string when it can be short enough, link to golden file otherwise).
5. A description of the bug.
The document is in reverse chronological order, with the bugs that affect the most recent zstd decompressor versions listed first.
No sequence using the 2-bytes format
------------------------------------------------
**Last affected version**: v1.5.5
**Affected decompressor component(s)**: Library & CLI
**Produced by the reference compressor**: No
**Example Frame**: see zstd/tests/golden-decompression/zeroSeq_2B.zst
The zstd decoder incorrectly expects FSE tables when there are 0 sequences present in the block
if the value 0 is encoded using the 2-bytes format.
Instead, it should immediately end the sequence section and move on to the next block.
This situation was never generated by the reference compressor,
because representing 0 sequences with the 2-bytes format is inefficient
(the 1-byte format is always used in this case).
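As a rough illustration of the encoding involved, the sketch below parses the `Number_of_Sequences` field using the 1-, 2- and 3-byte layouts given in the Zstandard format specification. The names `read_nb_sequences` and `sequences_section_ends_here` are made up for this example and are not taken from the reference decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: parses Number_of_Sequences at the start of a
 * Sequences section. Returns the number of header bytes consumed
 * (1, 2 or 3), or 0 if `len` is too short. */
static size_t read_nb_sequences(const unsigned char *src, size_t len,
                                unsigned *nb_seq)
{
    assert(src != NULL && nb_seq != NULL);
    if (len < 1) return 0;
    if (src[0] < 128) {                  /* 1-byte format */
        *nb_seq = src[0];
        return 1;
    }
    if (src[0] < 255) {                  /* 2-byte format */
        if (len < 2) return 0;
        *nb_seq = ((unsigned)(src[0] - 128) << 8) + src[1];
        return 2;
    }
    if (len < 3) return 0;               /* 3-byte format */
    *nb_seq = (unsigned)src[1] + ((unsigned)src[2] << 8) + 0x7F00;
    return 3;
}

/* Whatever width was used to encode it, a count of zero ends the
 * Sequences section: no FSE tables follow. */
static int sequences_section_ends_here(unsigned nb_seq)
{
    return nb_seq == 0;
}
```

With this layout, the bytes `80 00` encode 0 sequences in the 2-byte format, which is the case the affected decoders mishandled.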
Compressed block with a size of exactly 128 KB
------------------------------------------------
**Last affected version**: v1.5.2
**Affected decompressor component(s)**: Library & CLI
**Produced by the reference compressor**: No
**Example Frame**: see zstd/tests/golden-decompression/block-128k.zst
The zstd decoder incorrectly rejected blocks of type `Compressed_Block` when their size was exactly 128 KB.
Note that `128 KB - 1` was accepted, and `128 KB + 1` is forbidden by the spec.
This type of block was never generated by the reference compressor.
These blocks used to be disallowed by the spec up until spec version 0.3.2 when the restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689).
> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).
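The boundary behavior described in this entry can be modeled with two size checks. This is only a sketch of the observable behavior, not the code of the reference decoder; `BLOCK_SIZE_MAX` and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* 128 KB; the macro name is an assumption for this sketch. */
#define BLOCK_SIZE_MAX ((size_t)128 * 1024)

/* Models affected versions: a block of exactly 128 KB is rejected. */
static int block_size_ok_affected(size_t block_size)
{
    return block_size < BLOCK_SIZE_MAX;
}

/* Models the behavior the spec asks for: 128 KB is the largest valid size. */
static int block_size_ok_expected(size_t block_size)
{
    return block_size <= BLOCK_SIZE_MAX;
}

/* Walks the three boundary cases named in the text. */
static void check_boundaries(void)
{
    assert(block_size_ok_affected(BLOCK_SIZE_MAX - 1));
    assert(!block_size_ok_affected(BLOCK_SIZE_MAX));
    assert(block_size_ok_expected(BLOCK_SIZE_MAX));
    assert(!block_size_ok_expected(BLOCK_SIZE_MAX + 1));
}
```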
Compressed block with 0 literals and 0 sequences
------------------------------------------------
**Last affected version**: v1.5.2
**Affected decompressor component(s)**: Library & CLI
**Produced by the reference compressor**: No
**Example Frame**: `28b5 2ffd 2000 1500 0000 00`
The zstd decoder incorrectly rejected blocks of type `Compressed_Block` that encode literals as `Raw_Literals_Block` with no literals, and have no sequences.
This type of block was never generated by the reference compressor.
Additionally, these blocks were disallowed by the spec up until spec version 0.3.2 when the restriction was lifted by [PR#1689](https://github.com/facebook/zstd/pull/1689).
> A Compressed_Block has the extra restriction that Block_Size is always strictly less than the decompressed size. If this condition cannot be respected, the block must be sent uncompressed instead (Raw_Block).
First block is RLE block
------------------------
**Last affected version**: v1.4.3
**Affected decompressor component(s)**: CLI only
**Produced by the reference compressor**: No
**Example Frame**: `28b5 2ffd a001 0002 0002 0010 000b 0000 00`
The zstd CLI decompressor rejected cases where the first block was an RLE block whose `Block_Size` is 131072, and the frame contains more than one block.
This only affected the zstd CLI, and not the library.
The example is an RLE block with 131072 bytes, followed by a second RLE block with 1 byte.
The compressor currently works around this limitation by explicitly avoiding producing RLE blocks as the first
block.
https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L3527-L3535
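To read the example frame by hand, it helps to decode the 3-byte block headers. The sketch below follows the `Block_Header` layout of the format specification (little-endian, with `Last_Block` in bit 0, `Block_Type` in bits 1-2 and `Block_Size` in bits 3-23). The type `block_header_t` and the function `parse_block_header` are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int last_block;       /* bit 0 */
    int block_type;       /* bits 1-2: 0 Raw, 1 RLE, 2 Compressed, 3 Reserved */
    unsigned block_size;  /* bits 3-23 */
} block_header_t;         /* hypothetical type, for illustration only */

/* Decodes a 3-byte little-endian Block_Header. `src` must point to at
 * least 3 readable bytes. */
static block_header_t parse_block_header(const unsigned char *src)
{
    assert(src != NULL);
    unsigned const raw = (unsigned)src[0]
                       | ((unsigned)src[1] << 8)
                       | ((unsigned)src[2] << 16);
    block_header_t h;
    h.last_block = (int)(raw & 1);
    h.block_type = (int)((raw >> 1) & 3);
    h.block_size = raw >> 3;
    return h;
}
```

Applied to the example frame, the header bytes `02 00 10` decode to a non-last RLE block of 131072 bytes, and `0b 00 00` to a last RLE block of 1 byte.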
Tiny FSE Table & Block
----------------------
**Last affected version**: v1.3.4
**Affected decompressor component(s)**: Library & CLI
**Produced by the reference compressor**: Possibly until version v1.3.4, but probably never
**Example Frame**: `28b5 2ffd 2027 c500 0080 f3f1 f0ec ebc6 c5c7 f09d 4300 0000 e0e0 0658 0100 603e 52`
The zstd library rejected blocks of type `Compressed_Block` whose offset of the last table with type `FSE_Compressed_Mode` was less than 4 bytes from the end of the block.
In more depth, let `Last_Table_Offset` be the offset in the compressed block (excluding the header) at which
the last table with type `FSE_Compressed_Mode` starts. If `Block_Content - Last_Table_Offset < 4` then
the buggy zstd decompressor would reject the block. This occurs when the last serialized table is 2 bytes
and the bitstream size is 1 byte.
For example:
* There is 1 sequence in the block
* `Literals_Lengths_Mode` is `FSE_Compressed_Mode` & the serialized table size is 2 bytes
* `Offsets_Mode` is `Predefined_Mode`
* `Match_Lengths_Mode` is `Predefined_Mode`
* The bitstream is 1 byte. E.g. there is only one sequence and it fits in 1 byte.
The total `Block_Content` is `5` bytes, and `Last_Table_Offset` is `2`.
See the compressor workaround code:
https://github.com/facebook/zstd/blob/8814aa5bfa74f05a86e55e9d508da177a893ceeb/lib/compress/zstd_compress.c#L2667-L2682
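The rejection condition above is simple arithmetic, sketched here as a predicate. The function name is hypothetical and the code only restates the inequality from the text; it is not the decoder's actual check.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when an affected decoder would reject the block, i.e. when
 * Block_Content - Last_Table_Offset < 4. Both values are in bytes and
 * exclude the block header. */
static int affected_decoder_rejects(size_t block_content,
                                    size_t last_table_offset)
{
    assert(last_table_offset <= block_content);
    return (block_content - last_table_offset) < 4;
}
```

For the example in this entry, `Block_Content` is 5 and `Last_Table_Offset` is 2, so the difference is 3 and the block is rejected.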
Magicless format
----------------------
**Last affected version**: v1.5.5
**Affected decompressor component(s)**: Library
**Produced by the reference compressor**: Yes (example: https://gist.github.com/embg/9940726094f4cf2cef162cffe9319232)
**Example Frame**: `27 b5 2f fd 00 03 19 00 00 66 6f 6f 3f ba c4 59`
v1.5.6 fixes several bugs in which the magicless-format decoder rejects valid frames.
These include but are not limited to:
* Valid frames that happen to begin with a legacy magic number (little-endian)
* Valid frames that happen to begin with a skippable magic number (little-endian)
If you are affected by this issue and cannot update to v1.5.6 or later, there is a
workaround to recover affected data. Simply prepend the ZSTD magic number
`0xFD2FB528` (little-endian) to your data and decompress using the standard-format
decoder.
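A minimal sketch of the workaround, assuming the magicless data is already in memory: it builds a new buffer that starts with the magic number in little-endian byte order (`28 B5 2F FD`) followed by the original bytes. The helper name `prepend_zstd_magic` is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated buffer holding the
 * 4-byte Zstandard magic number followed by `src`, or NULL on failure.
 * The caller owns the result and must free() it. */
static unsigned char *prepend_zstd_magic(const unsigned char *src,
                                         size_t src_len, size_t *out_len)
{
    /* 0xFD2FB528 stored in little-endian order */
    static const unsigned char magic[4] = { 0x28, 0xB5, 0x2F, 0xFD };
    assert(src != NULL && out_len != NULL);
    if (src_len > (size_t)-1 - sizeof magic) return NULL;  /* overflow guard */
    unsigned char *const out = malloc(src_len + sizeof magic);
    if (!out) return NULL;
    memcpy(out, magic, sizeof magic);
    memcpy(out + sizeof magic, src, src_len);
    *out_len = src_len + sizeof magic;
    return out;
}
```

The resulting buffer can then be handed to a standard-format decoder, for example `ZSTD_decompress` from `zstd.h`.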

@ -0,0 +1,80 @@
Decompressor Permissiveness to Invalid Data
===========================================
This document describes the behavior of the reference decompressor in cases
where it accepts formally invalid data instead of reporting an error.
While the reference decompressor *must* decode any compliant frame following
the specification, its ability to detect erroneous data is on a best effort
basis: the decoder may accept input data that is formally invalid
when doing so poses no risk to the decoder and detecting it would cost too much
in complexity or speed.
In practice, the vast majority of invalid data are detected, if only because
many corruption events are dangerous for the decoder process (such as
requesting an out-of-bound memory access) and many more are easy to check.
This document lists a few known cases where invalid data was formerly accepted
by the decoder, and what has changed since.
Truncated Huffman states
------------------------
**Last affected version**: v1.5.6
**Produced by the reference compressor**: No
**Example Frame**: `28b5 2ffd 0000 5500 0072 8001 0420 7e1f 02aa 00`
When using FSE-compressed Huffman weights, the compressed weight bitstream
could contain fewer bits than necessary to decode the initial states.
The reference decompressor up to v1.5.6 will decode truncated or missing
initial states as zero, which can result in a valid Huffman tree if only
the second state is truncated.
In newer versions, truncated initial states are reported as a corruption
error by the decoder.
Offset == 0
-----------
**Last affected version**: v1.5.5
**Produced by the reference compressor**: No
**Example Frame**: `28b5 2ffd 0000 4500 0008 0002 002f 430b ae`
If a sequence is decoded with `literals_length = 0` and `offset_value = 3`
while `Repeated_Offset_1 = 1`, the computed offset will be `0`, which is
invalid.
The reference decompressor up to v1.5.5 processes this case as if the computed
offset was `1`, including inserting `1` into the repeated offset list.
This prevents the output buffer from remaining uninitialized, thus denying a
potential attack vector from an untrusted source.
However, in the rare case where this scenario would be the outcome of a
transmission or storage error, the decoder relies on the checksum to detect
the error.
In newer versions, this case is always detected and reported as a corruption error.
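To see how an offset of `0` can arise, the sketch below computes the offset of one sequence using the repeat-offset rules of the format specification. The function `compute_offset` is a hypothetical helper; it does not update the repeat-offset history and is not the reference decoder's code.

```c
#include <assert.h>
#include <stddef.h>

/* rep[0], rep[1], rep[2] hold Repeated_Offset_1, _2 and _3.
 * Returns the computed offset; a result of 0 is the invalid case. */
static size_t compute_offset(size_t offset_value, size_t literals_length,
                             const size_t rep[3])
{
    assert(rep != NULL && offset_value >= 1);
    if (offset_value > 3)
        return offset_value - 3;         /* regular offset */
    if (literals_length != 0)
        return rep[offset_value - 1];    /* 1, 2, 3 select Repeated_Offset_1, _2, _3 */
    if (offset_value == 3)
        return rep[0] - 1;               /* Repeated_Offset_1 - 1 */
    return rep[offset_value];            /* 1, 2 select Repeated_Offset_2, _3 */
}
```

With `Repeated_Offset_1 = 1`, `literals_length = 0` and `offset_value = 3`, the result is `1 - 1 = 0`, the invalid offset this entry describes.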
Non-zero reserved bits
----------------------
**Last affected version**: v1.5.5
**Produced by the reference compressor**: No
The Sequences section of each block has a header, and one of its elements is a
byte, which describes the compression mode of each symbol.
This byte contains 2 reserved bits which must be set to zero.
The reference decompressor up to v1.5.5 just ignores these 2 bits.
This behavior has no consequence for the rest of the frame decoding process.
In newer versions, the 2 reserved bits are actively checked for value zero,
and the decoder reports a corruption error if they are not.
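As an illustration, the byte in question (`Symbol_Compression_Modes` in the format specification) can be split as follows: `Literals_Lengths_Mode` in bits 7-6, `Offsets_Mode` in bits 5-4, `Match_Lengths_Mode` in bits 3-2, and the reserved field in bits 1-0. The function below is a hypothetical sketch of the stricter check, not the reference implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Splits the compression modes byte into its three 2-bit modes.
 * Returns 1 on success, or 0 if the two reserved bits are not zero,
 * which newer decoders report as corruption. */
static int decode_compression_modes(unsigned char modes_byte,
                                    unsigned *ll_mode,
                                    unsigned *of_mode,
                                    unsigned *ml_mode)
{
    assert(ll_mode != NULL && of_mode != NULL && ml_mode != NULL);
    if ((modes_byte & 0x03) != 0)
        return 0;                          /* reserved bits set */
    *ll_mode = (modes_byte >> 6) & 3;      /* Literals_Lengths_Mode */
    *of_mode = (modes_byte >> 4) & 3;      /* Offsets_Mode */
    *ml_mode = (modes_byte >> 2) & 3;      /* Match_Lengths_Mode */
    return 1;
}
```

A decoder matching the older behavior would simply skip the reserved-bits test and decode the three modes regardless.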

@ -0,0 +1,2 @@
# Build artifacts
harness

@ -0,0 +1,36 @@
Educational Decoder
===================
`zstd_decompress.c` is a self-contained implementation in C99 of a decoder,
according to the [Zstandard format specification].
While it does not implement as many features as the reference decoder,
such as the streaming API or content checksums, it is written to be easy to
follow, as an aid to understanding how the Zstandard format works.
It's laid out to match the [format specification],
so it can be used to understand how complex segments could be implemented.
It also contains implementations of Huffman and FSE table decoding.
[Zstandard format specification]: https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md
[format specification]: https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md
While the library's primary objective is code clarity,
it also happens to compile into a small object file.
The object file can be made even smaller by removing error messages,
using the macro directive `ZDEC_NO_MESSAGE` at compilation time.
This can be reduced even further by forgoing dictionary support,
by defining `ZDEC_NO_DICTIONARY`.
`harness.c` provides a simple test harness around the decoder:
    harness <input-file> <output-file> [dictionary]
As an additional resource to be used with this decoder,
see the `decodecorpus` tool in the [tests] directory.
It generates valid Zstandard frames that can be used to verify
a Zstandard decoder implementation.
Note that to use the tool to verify this decoder implementation,
the --content-size flag should be set,
as this decoder does not handle streaming decoding,
and so it must know the decompressed size in advance.
[tests]: https://github.com/facebook/zstd/blob/dev/tests/

@ -0,0 +1,119 @@
/*
* Copyright (c) Meta Platforms, Inc. and affiliates.
* All rights reserved.
*
* This source code is licensed under both the BSD-style license (found in the
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
* in the COPYING file in the root directory of this source tree).
* You may select, at your option, one of the above-listed licenses.
*/
#include <stdio.h>
#include <stdlib.h>
#include "zstd_decompress.h"
typedef unsigned char u8;
// If the data doesn't have decompressed size with it, fallback on assuming the
// compression ratio is at most 16
#define MAX_COMPRESSION_RATIO (16)
// Protect against allocating too much memory for output
#define MAX_OUTPUT_SIZE ((size_t)1024 * 1024 * 1024)
// Error message then exit
#define ERR_OUT(...) { fprintf(stderr, __VA_ARGS__); exit(1); }
typedef struct {
u8* address;
size_t size;
} buffer_s;
static void freeBuffer(buffer_s b) { free(b.address); }
static buffer_s read_file(const char *path)
{
FILE* const f = fopen(path, "rb");
if (!f) ERR_OUT("failed to open file %s \n", path);
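if (fseek(f, 0L, SEEK_END) != 0) ERR_OUT("failed to seek in file %s \n", path);
long const pos = ftell(f);
// ftell reports failure as -1, which must not be cast to a huge size_t
if (pos < 0) ERR_OUT("failed to determine size of file %s \n", path);
size_t const size = (size_t)pos;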
rewind(f);
void* const ptr = malloc(size);
if (!ptr) ERR_OUT("failed to allocate memory to hold %s \n", path);
size_t const read = fread(ptr, 1, size, f);
if (read != size) ERR_OUT("error while reading file %s \n", path);
fclose(f);
buffer_s const b = { ptr, size };
return b;
}
static void write_file(const char* path, const u8* ptr, size_t size)
{
FILE* const f = fopen(path, "wb");
if (!f) ERR_OUT("failed to open file %s \n", path);
size_t written = 0;
while (written < size) {
// Only request the bytes that remain, so a short write cannot read past `ptr`
written += fwrite(ptr+written, 1, size - written, f);
if (ferror(f)) ERR_OUT("error while writing file %s\n", path);
}
fclose(f);
}
int main(int argc, char **argv)
{
if (argc < 3)
ERR_OUT("usage: %s <file.zst> <out_path> [dictionary] \n", argv[0]);
buffer_s const input = read_file(argv[1]);
buffer_s dict = { NULL, 0 };
if (argc >= 4) {
dict = read_file(argv[3]);
}
size_t out_capacity = ZSTD_get_decompressed_size(input.address, input.size);
if (out_capacity == (size_t)-1) {
out_capacity = MAX_COMPRESSION_RATIO * input.size;
fprintf(stderr, "WARNING: Compressed data does not contain "
"decompressed size, going to assume the compression "
"ratio is at most %d (decompressed size of at most "
"%zu) \n",
MAX_COMPRESSION_RATIO, out_capacity);
}
if (out_capacity > MAX_OUTPUT_SIZE)
ERR_OUT("Required output size too large for this implementation \n");
u8* const output = malloc(out_capacity);
if (!output) ERR_OUT("failed to allocate memory \n");
dictionary_t* const parsed_dict = create_dictionary();
if (dict.size) {
#if defined (ZDEC_NO_DICTIONARY)
printf("dict.size = %zu \n", dict.size);
ERR_OUT("no dictionary support \n");
#else
parse_dictionary(parsed_dict, dict.address, dict.size);
#endif
}
size_t const decompressed_size =
ZSTD_decompress_with_dict(output, out_capacity,
input.address, input.size,
parsed_dict);
free_dictionary(parsed_dict);
write_file(argv[2], output, decompressed_size);
freeBuffer(input);
freeBuffer(dict);
free(output);
return 0;
}

File diff suppressed because it is too large

@ -0,0 +1,61 @@
/*
* Copyright (c) Meta Platforms, Inc. and affiliates.
* All rights reserved.
*
* This source code is licensed under both the BSD-style license (found in the
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
* in the COPYING file in the root directory of this source tree).
* You may select, at your option, one of the above-listed licenses.
*/
#include <stddef.h> /* size_t */
/******* EXPOSED TYPES ********************************************************/
/*
* Contains the parsed contents of a dictionary
* This includes Huffman and FSE tables used for decoding and data on offsets
*/
typedef struct dictionary_s dictionary_t;
/******* END EXPOSED TYPES ****************************************************/
/******* DECOMPRESSION FUNCTIONS **********************************************/
/// Zstandard decompression functions.
/// `dst` must point to a space at least as large as the reconstructed output.
size_t ZSTD_decompress(void *const dst, const size_t dst_len,
const void *const src, const size_t src_len);
/// Does the same thing as `ZSTD_decompress` but uses the dictionary held in
/// `parsed_dict`, as prepared by `create_dictionary` and, optionally,
/// `parse_dictionary`
size_t ZSTD_decompress_with_dict(void *const dst, const size_t dst_len,
const void *const src, const size_t src_len,
dictionary_t* parsed_dict);
/// Get the decompressed size of an input stream so memory can be allocated in
/// advance
/// Returns `(size_t)-1` if the size can't be determined
/// Assumes decompression of a single frame
size_t ZSTD_get_decompressed_size(const void *const src, const size_t src_len);
/******* END DECOMPRESSION FUNCTIONS ******************************************/
/******* DICTIONARY MANAGEMENT ***********************************************/
/*
* Return a valid dictionary_t pointer for use with dictionary initialization
* or decompression
*/
dictionary_t* create_dictionary(void);
/*
* Parse a provided dictionary blob for use in decompression
* `src` -- must point to memory space representing the dictionary
* `src_len` -- must provide the dictionary size
* `dict` -- will contain the parsed contents of the dictionary and
* can be used for decompression
*/
void parse_dictionary(dictionary_t *const dict, const void *src,
size_t src_len);
/*
* Free internal Huffman tables, FSE tables, and dictionary content
*/
void free_dictionary(dictionary_t *const dict);
/******* END DICTIONARY MANAGEMENT *******************************************/
