MXQuantizeDequantize#
MXQuantizeDequantize - 1#
Version#
name: MXQuantizeDequantize
domain: com.amd.quark
Summary#
Microscaling, also known as OCP (Open Compute Project) MX, is a block-based format: the elements of a block share one exponent (the block scale), and each element is additionally stored in its own narrow data type. An element type that has exponent bits therefore keeps a private micro exponent on top of the shared one. This finer granularity improves precision, because each value can adapt to its own range within the block. The OCP MX specification defines several concrete formats: MXFP8(E5M2), MXFP8(E4M3), MXFP6(E3M2), MXFP6(E2M3), MXFP4(E2M1), and MXINT8.
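As an illustration of the shared exponent, the sketch below derives it for one block using the rule published in the OCP MX v1.0 specification: the floor of log2 of the block's largest magnitude, minus the largest exponent (emax) of the element format. The `MX_FORMATS` table and the `shared_exponent` helper are hypothetical names introduced here, and whether this operator follows exactly the same rule is an assumption.

```python
import math

# Assumed per-format constants taken from the OCP MX v1.0 specification:
# (emax of the element format, largest representable magnitude).
MX_FORMATS = {
    "fp8_e5m2": (15, 57344.0),
    "fp8_e4m3": (8, 448.0),
    "fp6_e3m2": (4, 28.0),
    "fp6_e2m3": (2, 7.5),
    "fp4_e2m1": (2, 6.0),
    "int8": (0, 127 / 64),
}

def shared_exponent(block, element_dtype):
    """Return the shared power-of-two exponent for one block (illustrative)."""
    emax, _ = MX_FORMATS[element_dtype]
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return -127  # smallest exponent an 8-bit shared scale can hold
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1.
    _, e = math.frexp(amax)
    return max(-127, min(127, (e - 1) - emax))

print(shared_exponent([0.1, 3.0, -5.0], "fp4_e2m1"))  # 0
print(shared_exponent([0.1, 3.0, -5.0], "fp8_e4m3"))  # -6
```

Under this rule the scaled maximum can still exceed the element format's largest value (for example 7.0 in a fp4_e2m1 block), in which case the element saturates.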
This operator converts floating-point values (typically 32-bit floating-point numbers) into Microscaling values and then converts them back. It simulates the Quantize-Dequantize process, so the output carries the resulting quantization error.
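The round trip can be sketched for a single MXINT8 block as follows. This is a minimal pure-Python illustration, not the operator's implementation: it assumes the OCP MX v1.0 conventions (a power-of-two shared scale, INT8 elements with an implicit scale of 2^-6, symmetric clamping to [-127, 127]) and uses round-half-to-even. The function name `mx_int8_qdq_block` is hypothetical.

```python
import math

def mx_int8_qdq_block(block):
    """Quantize one block to MXINT8 and dequantize it again (illustrative)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared scale: 2**floor(log2(amax)); the INT8 element format has emax = 0.
    _, e = math.frexp(amax)
    scale = 2.0 ** (e - 1)
    out = []
    for v in block:
        # INT8 elements have 6 fractional bits, i.e. 64 integer steps per unit.
        q = round(v / scale * 64)       # Python's round() is half-to-even
        q = max(-127, min(127, q))      # saturate to the symmetric INT8 range
        out.append(q / 64 * scale)      # dequantize back to a float
    return out

print(mx_int8_qdq_block([1.0, 0.5, -0.3, 0.0]))
# [1.0, 0.5, -0.296875, 0.0]
```

Values that are exactly representable (1.0, 0.5, 0.0) survive unchanged, while -0.3 moves to the nearest representable step, which is the quantization error the operator introduces.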
Note
Compared with MicroeXponents, Microscaling applies linear scaling per element and is simpler, more hardware-friendly, and more stable, which makes it well suited to production and general-purpose deployment. In contrast, MicroeXponents shares a dynamic exponent within groups, offering better compression and dynamic range at the cost of higher complexity and potential numerical instability. For most applications, especially those that target standard inference engines or require robustness, Microscaling is likely the preferred choice.
Attributes#
element_dtype - STRING (default is ‘int8’):
(Optional) Specify the type of elements, options are ‘fp8_e5m2’, ‘fp8_e4m3’, ‘fp6_e3m2’, ‘fp6_e2m3’, ‘fp4_e2m1’, ‘int8’.
axis - INT (default is ‘1’):
(Optional) The axis along which the input tensor is split into blocks.
block_size - INT (default is ‘32’):
(Optional) Number of elements in the block.
rounding_mode - INT (default is ‘0’):
(Optional) Rounding mode: 0 rounds half away from zero, 1 rounds half upward, and 2 rounds half to even.
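The three rounding modes differ only in how they treat exact ties. The sketch below shows one way to express them in plain Python; `round_with_mode` is a hypothetical helper, and reading "half upward" as "ties go toward positive infinity" is an assumption about the operator's behavior.

```python
import math

def round_with_mode(x, rounding_mode=0):
    """Round x to an integer value using one of the three tie rules (illustrative)."""
    if rounding_mode == 0:
        # Half away from zero: 2.5 -> 3, -2.5 -> -3
        return math.copysign(math.floor(abs(x) + 0.5), x)
    if rounding_mode == 1:
        # Half upward (assumed: toward +infinity): 2.5 -> 3, -2.5 -> -2
        return float(math.floor(x + 0.5))
    if rounding_mode == 2:
        # Half to even: 2.5 -> 2, 3.5 -> 4, -2.5 -> -2
        return float(round(x))
    raise ValueError(f"unsupported rounding_mode: {rounding_mode}")

for mode in (0, 1, 2):
    print(mode, round_with_mode(2.5, mode), round_with_mode(-2.5, mode))
# 0 3.0 -3.0
# 1 3.0 -2.0
# 2 2.0 -2.0
```

The `x + 0.5` trick is adequate for illustration but can misround values extremely close to a tie because of floating-point addition, so a production implementation would likely handle that case explicitly.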
Table 1. Configurations of OCP MX data types
| | MXFP8(E5M2) | MXFP8(E4M3) | MXFP6(E3M2) | MXFP6(E2M3) | MXFP4(E2M1) | MXINT8 |
|---|---|---|---|---|---|---|
| element_dtype | fp8_e5m2 | fp8_e4m3 | fp6_e3m2 | fp6_e2m3 | fp4_e2m1 | int8 |
| axis | 1 | 1 | 1 | 1 | 1 | 1 |
| block_size | 32 | 32 | 32 | 32 | 32 | 32 |
| rounding_mode | 2 | 2 | 2 | 2 | 2 | 2 |
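To show how axis and block_size interact, the sketch below splits one slice taken along the chosen axis (for axis 1 of a 2-D tensor, one row) into consecutive blocks of block_size elements. The helper `split_into_blocks` is hypothetical, and leaving the last block shorter when the length is not a multiple of block_size is an assumption; an actual implementation may pad instead.

```python
def split_into_blocks(values, block_size=32):
    """Split one slice along the blocking axis into consecutive blocks (illustrative)."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]

row = [float(i) for i in range(70)]   # one row of a tensor with 70 columns
blocks = split_into_blocks(row, block_size=32)
print([len(b) for b in blocks])       # [32, 32, 6]
```

Each block would then receive its own shared exponent, so a single outlier only affects the precision of the 32 elements that share its block.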
Inputs#
x (heterogeneous) - T:
N-D input tensor.
Outputs#
y (heterogeneous) - T:
N-D output tensor. It loses some accuracy relative to the input tensor x because of quantization.
Type Constraints#
T in ( tensor(float) ):
Constrain input and output types to float tensors.