Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize bit reverse table #1

Open
X-Ryl669 opened this issue Apr 28, 2020 · 1 comment
Open

Optimize bit reverse table #1

X-Ryl669 opened this issue Apr 28, 2020 · 1 comment

Comments

@X-Ryl669
Copy link

Although the BitReverseTable is only used in a single function, it's possible to optimize it away either by doing the bit test in the function itself, or to use this code:

uint8_t reverseByte(const uint8_t a)
{
    uint8 v = a;
    v = ((v >> 1) & 0x55555555) | ((v & 0x55555555) << 1);
    v = ((v >> 2) & 0x33333333) | ((v & 0x33333333) << 2);
    v = ((v >> 4) & 0x0F0F0F0F) | ((v & 0x0F0F0F0F) << 4);
    return (uint8)v;
}

This compiles to a lot less than 256 bytes, it's using 15 cycles with no multiply or division, so is probably faster than accessing the SPI flash for the table.

There's also a version with 2 multiplication:

uint8_t reverveByte2(const uint8_t b)
{
    return ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
}

It's using only 7 operations. Those operations are described in Bit Twiddling Hacks

By the way, even better the code could probably be replaced by:

		for (int r = Height >> 3; --r >= 0;) {
			uint8_t Byte = *Data++; 
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth; Byte <<= 1;
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth; Byte <<= 1;
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth; Byte <<= 1;
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth; Byte <<= 1;
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth; Byte <<= 1;
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth; Byte <<= 1;
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth; Byte <<= 1;
			*optr = (*optr & bit) | ((Byte & 0x80) >> shift); optr += DWidth;
		}	
@philippe44
Copy link
Owner

It's difficult to say what's really going to be faster as SPI and cache make things more difficult to evaluate. Maybe you could try to measure it using xthal_get_ccount()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants