Development:MXU
From DingooWiki
MXU is the name for the XBurst SIMD instructions. SIMD means Single Instruction Multiple Data and is often used to speed up audio/video processing. Examples of SIMD instruction sets for other CPUs are MMX, SSE and AltiVec.
[edit] Instruction Naming
The initial letter indicates the number of elements in the vector(s) operated upon: S(ingle) for 1, D(ual) for 2, Q(uad) for 4. The letter is followed by a number, which denotes the length of the input elements in bits. The number is followed by the name of the operation that will be performed.
[edit] Register Naming
There is a dedicated register set for the MXU operations. It contains 17 registers which will be referred to as xr0..xr16. Registers xr0..xr15 are used in computations, xr16 is a control register. MXU register xr0 always has value 0; writes to it have no effect.
The main MIPS registers will be referred to as r0..r31.
[edit] Enabling MXU
Before the MXU can be used, it must be enabled. This is done by setting bit 0 (the lowest bit) of xr16 to 1.
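The register rules above can be sketched as a small software model. This is only an illustration in C, not an Ingenic API: the type `mxu_regs` and the functions `mxu_write`, `mxu_read`, `mxu_enable` and `mxu_is_enabled` are hypothetical names, and preserving the other bits of xr16 when enabling is an assumption. On real hardware the same step would presumably be a read-modify-write using S32M2I and S32I2M.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the MXU register file. */
typedef struct {
    uint32_t xr[17]; /* xr[0..15]: data registers, xr[16]: control register */
} mxu_regs;

#define MXU_ENABLE_BIT 0x1u /* bit 0 (the lowest bit) of xr16 */

/* Write an MXU register; writes to xr0 have no effect. */
void mxu_write(mxu_regs *m, unsigned idx, uint32_t value) {
    assert(idx <= 16);
    if (idx != 0)
        m->xr[idx] = value;
}

/* Read an MXU register; xr0 always reads as 0. */
uint32_t mxu_read(const mxu_regs *m, unsigned idx) {
    assert(idx <= 16);
    return idx == 0 ? 0 : m->xr[idx];
}

/* Enable the MXU by setting bit 0 of xr16.
   Assumption: the remaining bits of xr16 are left unchanged. */
void mxu_enable(mxu_regs *m) {
    mxu_write(m, 16, mxu_read(m, 16) | MXU_ENABLE_BIT);
}

int mxu_is_enabled(const mxu_regs *m) {
    return (mxu_read(m, 16) & MXU_ENABLE_BIT) != 0;
}
```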
[edit] Load and Store Instructions
[edit] S32I2M
S32I2M xr, r
Assigns the value of main register r to MXU register xr.
[edit] S32M2I
S32M2I xr, r
Assigns the value of MXU register xr to main register r.
[edit] S32LDD
S32LDD xr, p, o
Loads the contents of the memory at p + o (pointer + offset) into MXU register xr.
[edit] S32LDDV
S32LDDV xr, p, o, s
Loads the contents of the memory at p + o * 2^s (pointer + shifted offset) into MXU register xr.
[edit] S32LDI
S32LDI xr, p, o
Loads the contents of the memory at p + o (pointer + offset) into MXU register xr. After that, p is incremented by o.
[edit] S32LDIV
S32LDIV xr, p, o, s
Loads the contents of the memory at p + o * 2^s (pointer + shifted offset) into MXU register xr.
After that, p is incremented by o * 2^s.
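The difference between the plain and the incrementing load forms can be sketched in C. This is a model under stated assumptions, not the hardware definition: memory is a byte array, p is a byte offset into it, offsets are non-negative, and the function names `s32lddv_model` and `s32ldiv_model` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of S32LDDV: load the 32-bit word at p + o * 2^s; p is unchanged. */
uint32_t s32lddv_model(const uint8_t *mem, uint32_t p, uint32_t o, unsigned s) {
    uint32_t word;
    assert(s < 32); /* C requires the shift count to be below the word width */
    memcpy(&word, mem + p + (o << s), sizeof word);
    return word;
}

/* Model of S32LDIV: same load, after which p is advanced by o * 2^s. */
uint32_t s32ldiv_model(const uint8_t *mem, uint32_t *p, uint32_t o, unsigned s) {
    uint32_t word = s32lddv_model(mem, *p, o, s);
    *p += o << s;
    return word;
}
```

With s = 2 and o = 1, repeated calls to the incrementing form walk through an array of 32-bit words, with p tracking the most recently loaded element.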
[edit] S32STD
S32STD xr, p, o
Stores the contents of MXU register xr into the memory at p + o (pointer + offset).
[edit] S32STDV
S32STDV xr, p, o, s
Stores the contents of MXU register xr into the memory at p + o * 2^s (pointer + shifted offset).
[edit] S32SDI
S32SDI xr, p, o
Stores the contents of MXU register xr into the memory at p + o (pointer + offset). After that, p is incremented by o.
[edit] S32SDIV
S32SDIV xr, p, o, s
Stores the contents of MXU register xr into the memory at p + o * 2^s (pointer + shifted offset). After that, p is incremented by o * 2^s.
[edit] Addition and Subtraction Instructions
[edit] D32ADD, Q16ADD
D32ADD xra, xrb, xrc, xrd, addsub
Q16ADD xra, xrb, xrc, xrd, addsub, swizzle
Performs addition and/or subtraction on vectors xrb and xrc and writes the results to vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xra := xrb + xrc; | xrd := xrb + xrc |
| addsub = AS: | xra := xrb + xrc; | xrd := xrb - xrc |
| addsub = SA: | xra := xrb - xrc; | xrd := xrb + xrc |
| addsub = SS: | xra := xrb - xrc; | xrd := xrb - xrc |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
| swizzle = WW: | xrb.hl | (as-is) |
| swizzle = XW: | xrb.lh | (exchanged) |
| swizzle = HW: | xrb.hh | (clone high) |
| swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
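How addsub and swizzle combine in Q16ADD can be sketched in C. The names (`swizzle`, `swizzle16`, `addsub16x2`, `q16add_model`) are hypothetical, and the model assumes that each 16-bit lane wraps around on overflow, which the description above does not state.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SWZ_WW, SWZ_XW, SWZ_HW, SWZ_LW } swizzle;

/* Apply the swizzle to the two 16-bit halves of xrb. */
uint32_t swizzle16(uint32_t x, swizzle s) {
    uint32_t h = x >> 16, l = x & 0xFFFFu;
    switch (s) {
    case SWZ_XW: return (l << 16) | h; /* exchanged */
    case SWZ_HW: return (h << 16) | h; /* clone high */
    case SWZ_LW: return (l << 16) | l; /* clone low */
    default:     return x;             /* WW: as-is */
    }
}

/* Lane-wise 16-bit add or subtract (assumed to wrap around). */
uint32_t addsub16x2(uint32_t b, uint32_t c, int subtract) {
    uint32_t bh = b >> 16, bl = b & 0xFFFFu;
    uint32_t ch = c >> 16, cl = c & 0xFFFFu;
    uint32_t h = subtract ? bh - ch : bh + ch;
    uint32_t l = subtract ? bl - cl : bl + cl;
    return ((h & 0xFFFFu) << 16) | (l & 0xFFFFu);
}

/* Model of Q16ADD: sub_a selects A (0) or S (1) for xra, sub_d for xrd. */
void q16add_model(uint32_t *xra, uint32_t *xrd, uint32_t xrb, uint32_t xrc,
                  int sub_a, int sub_d, swizzle s) {
    uint32_t b = swizzle16(xrb, s);
    *xra = addsub16x2(b, xrc, sub_a);
    *xrd = addsub16x2(b, xrc, sub_d);
}

/* Usage: addsub = AS, swizzle = WW gives the sum and the difference at once. */
void q16add_example(void) {
    uint32_t a, d;
    q16add_model(&a, &d, 0x00030001u, 0x00020005u, 0, 1, SWZ_WW);
    assert(a == 0x00050006u); /* 3 + 2, 1 + 5 */
    assert(d == 0x0001FFFCu); /* 3 - 2, 1 - 5 (wraps to 0xFFFC) */
}
```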
[edit] D32ACC, Q16ACC
D32ACC xra, xrb, xrc, xrd, addsub
Q16ACC xra, xrb, xrc, xrd, addsub, swizzle
Performs addition and/or subtraction on vectors xrb and xrc and adds the results to vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xra += xrb + xrc; | xrd += xrb + xrc |
| addsub = AS: | xra += xrb + xrc; | xrd += xrb - xrc |
| addsub = SA: | xra += xrb - xrc; | xrd += xrb + xrc |
| addsub = SS: | xra += xrb - xrc; | xrd += xrb - xrc |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
| swizzle = WW: | xrb.hl | (as-is) |
| swizzle = XW: | xrb.lh | (exchanged) |
| swizzle = HW: | xrb.hh | (clone high) |
| swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
[edit] Q8ADD
Q8ADD xra, xrb, xrc, addsub
Adds or subtracts the four 8-bit values in the vectors xrb and xrc. The four 8-bit results are stored in the vector xra.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xra.h := xrb.h + xrc.h; | xra.l := xrb.l + xrc.l |
| addsub = AS: | xra.h := xrb.h + xrc.h; | xra.l := xrb.l - xrc.l |
| addsub = SA: | xra.h := xrb.h - xrc.h; | xra.l := xrb.l + xrc.l |
| addsub = SS: | xra.h := xrb.h - xrc.h; | xra.l := xrb.l - xrc.l |
[edit] Q8ADDE
Q8ADDE xra, xrb, xrc, xrd, addsub
Adds or subtracts the four 8-bit unsigned values in the vectors xrb and xrc. The four 16-bit results are stored in the vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xra := xrb.h + xrc.h; | xrd := xrb.l + xrc.l |
| addsub = AS: | xra := xrb.h + xrc.h; | xrd := xrb.l - xrc.l |
| addsub = SA: | xra := xrb.h - xrc.h; | xrd := xrb.l + xrc.l |
| addsub = SS: | xra := xrb.h - xrc.h; | xrd := xrb.l - xrc.l |
[edit] Q8ACCE
Q8ACCE xra, xrb, xrc, xrd, addsub
Adds or subtracts the four 8-bit unsigned values in the vectors xrb and xrc. The four 16-bit results are added to the vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xra += xrb.h + xrc.h; | xrd += xrb.l + xrc.l |
| addsub = AS: | xra += xrb.h + xrc.h; | xrd += xrb.l - xrc.l |
| addsub = SA: | xra += xrb.h - xrc.h; | xrd += xrb.l + xrc.l |
| addsub = SS: | xra += xrb.h - xrc.h; | xrd += xrb.l - xrc.l |
[edit] D16AVG, Q8AVG
D16AVG xra, xrb, xrc
Q8AVG xra, xrb, xrc
Computes the average, rounded down, of the unsigned values in vectors xrb and xrc and assigns the result to vector xra.
[edit] D16AVGR, Q8AVGR
D16AVGR xra, xrb, xrc
Q8AVGR xra, xrb, xrc
Computes the average, rounded up, of the unsigned values in vectors xrb and xrc and assigns the result to vector xra.
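The two rounding modes differ only in whether 1 is added before halving. A minimal C sketch for the 8-bit case follows; `q8avg_model` is a hypothetical name, with `round_up` = 0 modelling Q8AVG and `round_up` = 1 modelling Q8AVGR.

```c
#include <assert.h>
#include <stdint.h>

/* Average of four unsigned 8-bit lanes, rounded down (0) or up (1). */
uint32_t q8avg_model(uint32_t xrb, uint32_t xrc, unsigned round_up) {
    uint32_t result = 0;
    assert(round_up <= 1);
    for (int i = 0; i < 4; i++) {
        uint32_t x = (xrb >> (8 * i)) & 0xFFu;
        uint32_t y = (xrc >> (8 * i)) & 0xFFu;
        /* The 9-bit intermediate sum cannot overflow in a 32-bit variable. */
        result |= ((x + y + round_up) >> 1) << (8 * i);
    }
    return result;
}
```

For example, averaging the bytes 3 and 4 gives 3 when rounding down and 4 when rounding up.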
[edit] Q8SAD
Q8SAD xra, xrb, xrc, xrd
Computes the absolute differences of the four pairs of unsigned 8-bit values in vectors xrb and xrc. The sum of these four differences is assigned to the full register xra and added to the full register xrd.
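A sum of absolute differences is a common building block for block matching in video encoders. The following C sketch models the behaviour described above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of the absolute differences of four unsigned 8-bit lanes. */
uint32_t q8sad_sum(uint32_t xrb, uint32_t xrc) {
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (xrb >> (8 * i)) & 0xFFu;
        uint32_t y = (xrc >> (8 * i)) & 0xFFu;
        sum += x > y ? x - y : y - x;
    }
    return sum;
}

/* Model of Q8SAD: xra receives the sum, xrd accumulates it. */
void q8sad_model(uint32_t *xra, uint32_t *xrd, uint32_t xrb, uint32_t xrc) {
    uint32_t sum = q8sad_sum(xrb, xrc);
    *xra = sum;
    *xrd += sum;
}

/* Usage: differences are 16, 255, 10 and 0, so the sum is 281. */
void q8sad_example(void) {
    uint32_t a = 0, d = 100;
    q8sad_model(&a, &d, 0x10FF0005u, 0x20000A05u);
    assert(a == 281);
    assert(d == 381);
}
```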
[edit] Multiply Instructions
[edit] D16MUL, Q8MUL
D16MUL xra, xrb, xrc, xrd, swizzle
Q8MUL xra, xrb, xrc, xrd
Multiplies the signed values in vector xrb by the signed values in vector xrc and assigns the results to vectors xra and xrd.
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
| swizzle = WW: | xrb.hl | (as-is) |
| swizzle = XW: | xrb.lh | (exchanged) |
| swizzle = HW: | xrb.hh | (clone high) |
| swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
[edit] D16MAC, Q8MAC
D16MAC xra, xrb, xrc, xrd, addsub, swizzle
Q8MAC xra, xrb, xrc, xrd, addsub
Multiplies the signed values in vector xrb by the signed values in vector xrc and adds the results to, or subtracts them from, vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xra += xrb.h * xrc.h; | xrd += xrb.l * xrc.l |
| addsub = AS: | xra += xrb.h * xrc.h; | xrd -= xrb.l * xrc.l |
| addsub = SA: | xra -= xrb.h * xrc.h; | xrd += xrb.l * xrc.l |
| addsub = SS: | xra -= xrb.h * xrc.h; | xrd -= xrb.l * xrc.l |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
| swizzle = WW: | xrb.hl | (as-is) |
| swizzle = XW: | xrb.lh | (exchanged) |
| swizzle = HW: | xrb.hh | (clone high) |
| swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
[edit] D16MADL, Q8MADL
D16MADL xra, xrb, xrc, xrd, addsub, swizzle
Q8MADL xra, xrb, xrc, xrd, addsub
Multiplies the signed values in vector xrb by the signed values in vector xrc. The results of the multiplication are added to or subtracted from the values in vector xra, and that final result is written to vector xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xrd.h := xra.h + xrb.h * xrc.h; | xrd.l := xra.l + xrb.l * xrc.l |
| addsub = AS: | xrd.h := xra.h + xrb.h * xrc.h; | xrd.l := xra.l - xrb.l * xrc.l |
| addsub = SA: | xrd.h := xra.h - xrb.h * xrc.h; | xrd.l := xra.l + xrb.l * xrc.l |
| addsub = SS: | xrd.h := xra.h - xrb.h * xrc.h; | xrd.l := xra.l - xrb.l * xrc.l |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
| swizzle = WW: | xrb.hl | (as-is) |
| swizzle = XW: | xrb.lh | (exchanged) |
| swizzle = HW: | xrb.hh | (clone high) |
| swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
[edit] D16MULF
D16MULF xra, xrb, xrc, swizzle
Multiplies the signed values in vector xrb by the signed values in vector xrc. The highest 16 bits of the results of the multiplication are written to vector xra. Note that the result of multiplying two 16-bit signed numbers is a 31-bit signed number (bit 30 being the sign bit), so vector xra will contain bits 30..15 of the two multiplication results, not bits 31..16.
It is possible to swizzle the values read from vector xrb as follows:
| swizzle = WW: | xrb.hl | (as-is) |
| swizzle = XW: | xrb.lh | (exchanged) |
| swizzle = HW: | xrb.hh | (clone high) |
| swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
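The remark about bits 30..15 means that D16MULF behaves like a Q15 fixed-point multiply: 0.5 * 0.5 yields 0.25 in the same format. A C sketch for the as-is (WW) case follows; the names are hypothetical and the swizzle step is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed value. */
int32_t sign_extend16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* One lane: bits 30..15 of the signed 16x16 product. */
uint32_t d16mulf_lane(uint32_t b, uint32_t c) {
    int32_t product = sign_extend16(b) * sign_extend16(c);
    return ((uint32_t)product >> 15) & 0xFFFFu;
}

/* Model of D16MULF with swizzle = WW. */
uint32_t d16mulf_ww_model(uint32_t xrb, uint32_t xrc) {
    return (d16mulf_lane(xrb >> 16, xrc >> 16) << 16) | d16mulf_lane(xrb, xrc);
}

/* Usage in Q15: 0x4000 is 0.5, 0xC000 is -0.5, 0x2000 is 0.25, 0xE000 is -0.25. */
void d16mulf_example(void) {
    assert(d16mulf_ww_model(0x40004000u, 0x4000C000u) == 0x2000E000u);
}
```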
[edit] D16MACF
D16MACF xra, xrb, xrc, xrd, addsub, swizzle
Multiplies the signed values in vector xrb by the signed values in vector xrc. The two products are doubled to give two 32-bit signed numbers, which are then added to or subtracted from the full registers xra and xrd. The upper 16 bits of the two sums, rounded up, are written to the two halves of vector xra.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
| addsub = AA: | xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16) |
| addsub = AS: | xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16) |
| addsub = SA: | xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16) |
| addsub = SS: | xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16) |
It is possible to swizzle the values read from vector xrb as follows:
| swizzle = WW: | xrb.hl | (as-is) |
| swizzle = XW: | xrb.lh | (exchanged) |
| swizzle = HW: | xrb.hh | (clone high) |
| swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
[edit] S16MAD
S16MAD xra, xrb, xrc, xrd, addsub, select
Multiplies a 16-bit signed value from vector xrb with a 16-bit signed value from vector xrc. The result is added to or subtracted from xra and the final result is written to xrd.
Whether the multiplication result is added or subtracted is controlled by addsub, as shown in the following table:
| addsub = A: | xrd := xra + x * y |
| addsub = S: | xrd := xra - x * y |
Which parts of xrb and xrc are used is controlled by select, as shown in the following table:
| select = HH: | x := xrb.h; | y := xrc.h |
| select = HL: | x := xrb.h; | y := xrc.l |
| select = LH: | x := xrb.l; | y := xrc.h |
| select = LL: | x := xrb.l; | y := xrc.l |
[edit] Other Math
[edit] S32MAX, D16MAX, Q8MAX
S32MAX xra, xrb, xrc
D16MAX xra, xrb, xrc
Q8MAX xra, xrb, xrc
Takes the maximum of the signed values of vector xrb and vector xrc and assigns those to vector xra.
[edit] S32MIN, D16MIN, Q8MIN
S32MIN xra, xrb, xrc
D16MIN xra, xrb, xrc
Q8MIN xra, xrb, xrc
Takes the minimum of the signed values of vector xrb and vector xrc and assigns those to vector xra.
[edit] Q16SAT
Q16SAT xra, xrb, xrc
Saturate: The values in xrb and xrc are taken as four 16-bit signed integers and clamped to the range [0..255]. The four 8-bit results are written to xra, from the highest byte to the lowest: upper half of xrb, lower half of xrb, upper half of xrc, lower half of xrc.
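This kind of saturation is typically what is needed when converting 16-bit intermediate results back to 8-bit pixel values. A C sketch of the clamping and packing order described above, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp one signed 16-bit value (given as a raw half-word) to [0..255]. */
uint32_t clamp_s16_to_u8(uint32_t half) {
    half &= 0xFFFFu;
    if (half & 0x8000u)
        return 0;   /* negative values clamp to 0 */
    if (half > 255u)
        return 255; /* large positive values clamp to 255 */
    return half;
}

/* Model of Q16SAT: pack the four clamped values, xrb halves first. */
uint32_t q16sat_model(uint32_t xrb, uint32_t xrc) {
    return (clamp_s16_to_u8(xrb >> 16) << 24) |
           (clamp_s16_to_u8(xrb)       << 16) |
           (clamp_s16_to_u8(xrc >> 16) << 8)  |
            clamp_s16_to_u8(xrc);
}

/* Usage: 256 -> 255, -1 -> 0, 127 -> 127, 0 -> 0. */
void q16sat_example(void) {
    assert(q16sat_model(0x0100FFFFu, 0x007F0000u) == 0xFF007F00u);
}
```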
[edit] S32CPS, D16CPS
S32CPS xra, xrb, xrc
D16CPS xra, xrb, xrc
Copy Sign: For each signed value in vector xrc: if it is non-negative, the corresponding value from vector xrb is assigned, unmodified, to vector xra. Otherwise, the corresponding value from vector xrb is negated and assigned to vector xra.
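The 16-bit variant can be sketched in C as follows. The names are hypothetical, and the model assumes that negation wraps around within 16 bits (so negating -32768 gives -32768 again), which the description does not specify.

```c
#include <assert.h>
#include <stdint.h>

/* One lane: keep b if c is non-negative, otherwise negate b (16-bit wrap). */
uint32_t d16cps_lane(uint32_t b, uint32_t c) {
    b &= 0xFFFFu;
    return (c & 0x8000u) ? (0u - b) & 0xFFFFu : b;
}

/* Model of D16CPS on both halves of the registers. */
uint32_t d16cps_model(uint32_t xrb, uint32_t xrc) {
    return (d16cps_lane(xrb >> 16, xrc >> 16) << 16) | d16cps_lane(xrb, xrc);
}

/* Usage: xrb = (5, 3), xrc = (-1, 1) gives (-5, 3). */
void d16cps_example(void) {
    assert(d16cps_model(0x00050003u, 0xFFFF0001u) == 0xFFFB0003u);
}
```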
[edit] Q8ABD
Q8ABD xra, xrb, xrc
Absolute difference: Computes the absolute value of the difference of the unsigned values in vector xrb and vector xrc and assigns the result to vector xra.
[edit] Q8SLT
Q8SLT xra, xrb, xrc
Set on Less Than: Compares the signed values in vector xrb and vector xrc. If the value from xrb is less than the value from xrc, 1 is assigned to the corresponding position in vector xra, otherwise 0 is assigned.
This is a vectorized version of the MIPS instruction SLT.
[edit] Shift and Shuffle Instructions
[edit] D32SLL
D32SLL xra, xrb, xrc, xrd, S
Shift Logical Left: The value of xrb is shifted S bits to the left and the result is assigned to xra. Also, the value of xrc is shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..31].
[edit] D32SLLV
D32SLLV xra, xrb, rs
Shift Logical Left: The value of xra is shifted S bits to the left and the result is assigned to xra. Also, the value of xrb is shifted S bits to the left and the result is assigned to xrb. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.
[edit] D32SLR
D32SLR xra, xrb, xrc, xrd, S
Shift Logical Right: The unsigned value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].
[edit] D32SLRV
D32SLRV xra, xrb, rs
Shift Logical Right: The unsigned value of xra is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrb is shifted S bits to the right and the result is assigned to xrb. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.
[edit] D32SAR
D32SAR xra, xrb, xrc, xrd, S
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].
[edit] D32SARV
D32SARV xra, xrb, rs
Shift Arithmetic Right: The signed value of xra is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrb is shifted S bits to the right and the result is assigned to xrb. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.
[edit] D32SARL
D32SARL xra, xrb, xrc, S
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is a constant in the range [0..31].
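D32SARL combines a shift with packing two 32-bit values into one register of two 16-bit values, which looks useful for scaling down fixed-point products. A C sketch with hypothetical names follows; the arithmetic shift is written out by hand because right-shifting negative signed values is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift of a 32-bit value held in an unsigned variable. */
uint32_t sar32(uint32_t v, unsigned s) {
    uint32_t r;
    assert(s <= 31);
    r = v >> s;
    if ((v & 0x80000000u) && s > 0)
        r |= (uint32_t)~(0xFFFFFFFFu >> s); /* fill vacated bits with the sign */
    return r;
}

/* Model of D32SARL: low 16 bits of each shifted value, xrb in the upper half. */
uint32_t d32sarl_model(uint32_t xrb, uint32_t xrc, unsigned s) {
    return ((sar32(xrb, s) & 0xFFFFu) << 16) | (sar32(xrc, s) & 0xFFFFu);
}
```

For instance, with xrb = 0xFFFF0000 (-65536), xrc = 0x00123400 and S = 8, the shifted values are 0xFFFFFF00 and 0x00001234, so xra becomes 0xFF001234.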
[edit] D32SARW
D32SARW xra, xrb, xrc, rs
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is [0..31]: the value of the lowest 5 bits of main MIPS register rs.
[edit] Q16SLL
Q16SLL xra, xrb, xrc, xrd, S
Shift Logical Left: The values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrc are shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..15].
[edit] Q16SLLV
Q16SLLV xra, xrb, rs
Shift Logical Left: The values of the upper and lower halves of xra are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xrb. S is [0..15]: the value of the lowest 4 bits of main MIPS register rs.
[edit] Q16SLR
Q16SLR xra, xrb, xrc, xrd, S
Shift Logical Right: The unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].
[edit] Q16SLRV
Q16SLRV xra, xrb, rs
Shift Logical Right: The unsigned values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is [0..15]: the value of the lowest 4 bits of main MIPS register rs.
[edit] Q16SAR
Q16SAR xra, xrb, xrc, xrd, S
Shift Arithmetic Right: The signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].
[edit] Q16SARV
Q16SARV xra, xrb, rs
Shift Arithmetic Right: The signed values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is [0..15]: the value of the lowest 4 bits of main MIPS register rs.
[edit] S32ALN
S32ALN xra, xrb, xrc, s
Takes the value of xrb:xrc, shifts it s bytes (0..4) to the left and assigns the highest 32 bits of the result to xra. Can be used to realign values that are not aligned in memory.
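The realignment can be modelled in C with a 64-bit intermediate. The function name `s32aln_model` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of S32ALN: shift xrb:xrc left by s bytes and keep the upper 32 bits. */
uint32_t s32aln_model(uint32_t xrb, uint32_t xrc, unsigned s) {
    uint64_t wide;
    assert(s <= 4);
    wide = ((uint64_t)xrb << 32) | xrc;
    return (uint32_t)((wide << (8 * s)) >> 32);
}
```

With xrb = 0x11223344 and xrc = 0x55667788, s = 0 returns xrb, s = 4 returns xrc, and s = 1 returns 0x22334455.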
[edit] S32SFL
S32SFL xra, xrb, xrc, xrd, ptn
Shuffles (swizzles) the bytes of xrb and xrc as indicated in the table below and writes the result into xra and xrd.
| Input | xrb | xrc |
|---|---|---|
| | b3 b2 b1 b0 | c3 c2 c1 c0 |
| Output | xra | xrd |
| ptn=0 | b3 c3 b2 c2 | b1 c1 b0 c0 |
| ptn=1 | b3 b1 c3 c1 | b2 b0 c2 c0 |
| ptn=2 | b3 c3 b1 c1 | b2 c2 b0 c0 |
| ptn=3 | b3 b2 c3 c2 | b1 b0 c1 c0 |
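The four S32SFL patterns can be written out in C as a sketch. The names `pack4` and `s32sfl_model` are hypothetical, and the model assumes that the bytes in the table are listed from most significant to least significant within each register.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 32-bit value from four bytes, most significant first. */
uint32_t pack4(uint32_t m3, uint32_t m2, uint32_t m1, uint32_t m0) {
    return (m3 << 24) | (m2 << 16) | (m1 << 8) | m0;
}

/* Model of S32SFL for ptn = 0..3. */
void s32sfl_model(uint32_t *xra, uint32_t *xrd,
                  uint32_t xrb, uint32_t xrc, unsigned ptn) {
    uint32_t b[4], c[4]; /* index 0 is the least significant byte */
    assert(ptn <= 3);
    for (int i = 0; i < 4; i++) {
        b[i] = (xrb >> (8 * i)) & 0xFFu;
        c[i] = (xrc >> (8 * i)) & 0xFFu;
    }
    switch (ptn) {
    case 0: /* interleave bytes */
        *xra = pack4(b[3], c[3], b[2], c[2]);
        *xrd = pack4(b[1], c[1], b[0], c[0]);
        break;
    case 1: /* odd bytes to xra, even bytes to xrd */
        *xra = pack4(b[3], b[1], c[3], c[1]);
        *xrd = pack4(b[2], b[0], c[2], c[0]);
        break;
    case 2:
        *xra = pack4(b[3], c[3], b[1], c[1]);
        *xrd = pack4(b[2], c[2], b[0], c[0]);
        break;
    default: /* ptn 3: interleave 16-bit halves */
        *xra = pack4(b[3], b[2], c[3], c[2]);
        *xrd = pack4(b[1], b[0], c[1], c[0]);
        break;
    }
}
```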

