Cross compiler and a few other fixes on top of #4 #5
Closed
This commit implements CRC32 using POWER8 vector intrinsics
and GCC builtins instead of pure assembly. The performance is
the same as the .S version:
time ./vec_crc32_bench 32768 5000000
CRC: 165b4c91
real 0m2.799s
user 0m2.799s
sys 0m0.000s
time ./crc32_bench 32768 5000000
CRC: 165b4c91
real 0m2.803s
user 0m2.803s
sys 0m0.000s
Perf results:
perf stat -a ./vec_crc32_bench 32768 5000000
CRC: 165b4c91
Performance counter stats for 'system wide':
360774.660732 task-clock (msec) # 128.683 CPUs utilized
529 context-switches # 0.001 K/sec
8 cpu-migrations # 0.000 K/sec
208 page-faults # 0.001 K/sec
12,468,436,530 cycles # 0.035 GHz (66.62%)
18,068,249 stalled-cycles-frontend # 0.14% cycles idle
466,739,548 stalled-cycles-backend # 3.74% cycles idle
49,670,139,591 instructions # 3.98 insns per cycle
# 0.01 stalled cycles per insn (66.82%)
1,370,729,619 branches # 3.799 M/sec (50.09%)
5,759,980 branch-misses # 0.42% of all branches
2.803581718 seconds time elapsed
perf stat -a ./crc32_bench 32768 5000000
CRC: 165b4c91
Performance counter stats for 'system wide':
360942.638504 task-clock (msec) # 128.498 CPUs utilized
535 context-switches # 0.001 K/sec
12 cpu-migrations # 0.000 K/sec
287 page-faults # 0.001 K/sec
12,476,309,108 cycles # 0.035 GHz (66.67%)
17,688,340 stalled-cycles-frontend # 0.14% cycles idle
477,872,611 stalled-cycles-backend # 3.83% cycles idle
48,459,294,347 instructions # 3.88 insns per cycle
# 0.01 stalled cycles per insn (66.69%)
1,371,856,316 branches # 3.801 M/sec (50.01%)
5,771,271 branch-misses # 0.42% of all branches
2.808943029 seconds time elapsed
Tested on (tulibee): P8 / LE DD2.1 Murano 32G RAM, 16 Cores.
RHEL7.2 LE
Signed-off-by: Rogerio Alves <[email protected]>
Included quickstart instructions for vec_crc32.c in the README. Signed-off-by: Rogerio Alves <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
This ensures that defining __ASSEMBLY__ (gcc builtin) isn't needed for the C implementation, and that MAX_SIZE is defined in both the C and __ASSEMBLY__ generations. Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
…ures Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
Add example crc32_two_implementations on how to use this. Signed-off-by: Daniel Black <[email protected]>
Signed-off-by: Daniel Black <[email protected]>
all part of #4 now.
Added the ability to cross-build crc32_constants.h in 41d1bd2.
Added a few other minor changes.