
@grooverdan
Contributor

Added the ability to cross-build crc32_constants.h in 41d1bd2.

Added a few other minor changes.

Rogerio Alves and others added 15 commits August 22, 2017 10:42
This commit implements CRC32 using POWER8 vector intrinsics
and gcc builtins instead of pure assembly. Performance is
the same as the .S version:

time ./vec_crc32_bench 32768 5000000
CRC: 165b4c91
real  0m2.799s
user  0m2.799s
sys   0m0.000s

time ./crc32_bench 32768 5000000
CRC: 165b4c91
real  0m2.803s
user  0m2.803s
sys   0m0.000s
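Both binaries print the same value, `CRC: 165b4c91`. To cross-check such values on a machine without POWER8, a plain bit-at-a-time CRC-32 can serve as a reference. This is a minimal sketch, assuming the benchmarks compute the reflected IEEE CRC-32 (polynomial 0x04C11DB7, bit-reversed 0xEDB88320) with zlib-style pre- and post-inversion. `crc32_ref` is a hypothetical name, not part of this PR, and the buffer contents the benchmarks hash are not given here, so the sketch is only checked against the standard CRC-32 check value (0xCBF43926 for the ASCII string "123456789").

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine, not part of the PR: bit-at-a-time CRC-32
 * with the reflected IEEE polynomial (0x04C11DB7 reversed = 0xEDB88320).
 * Pass crc = 0 for a fresh computation, or a previous result to continue. */
uint32_t crc32_ref(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = ~crc;                       /* undo the previous final inversion */
    while (len--) {
        crc ^= *p++;                  /* fold in the next byte, LSB first */
        for (int bit = 0; bit < 8; bit++)
            /* shift right; if the bit shifted out was 1, XOR in the polynomial */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                      /* final inversion */
}
```

Because the initial and final inversions cancel between calls, hashing a buffer in pieces (`crc32_ref(crc32_ref(0, a, n), b, m)`) gives the same result as hashing it in one call, which is the property a chunked vector implementation has to preserve as well.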

Perf results:

perf stat -a ./vec_crc32_bench 32768 5000000
CRC: 165b4c91
Performance counter stats for 'system wide':
360774.660732   task-clock (msec)   #   128.683 CPUs utilized
529             context-switches    #   0.001 K/sec
8               cpu-migrations      #   0.000 K/sec
208             page-faults         #   0.001 K/sec
12,468,436,530  cycles              #   0.035 GHz (66.62%)
18,068,249      stalled-cycles-frontend #   0.14% cycles idle
466,739,548     stalled-cycles-backend  #   3.74% cycles idle
49,670,139,591  instructions        #   3.98  insns per cycle
                                    #   0.01  stalled cycles per insn  (66.82%)
1,370,729,619   branches            #   3.799 M/sec (50.09%)
5,759,980       branch-misses       #   0.42% of all branches

2.803581718 seconds time elapsed

perf stat -a ./crc32_bench 32768 5000000
CRC: 165b4c91
Performance counter stats for 'system wide':
360942.638504   task-clock (msec)   #   128.498 CPUs utilized
535             context-switches    #   0.001 K/sec
12              cpu-migrations      #   0.000 K/sec
287             page-faults         #   0.001 K/sec
12,476,309,108  cycles              #   0.035 GHz (66.67%)
17,688,340      stalled-cycles-frontend #   0.14% cycles idle
477,872,611     stalled-cycles-backend  #   3.83% cycles idle
48,459,294,347  instructions        #   3.88  insns per cycle
                                    #   0.01  stalled cycles per insn  (66.69%)
1,371,856,316   branches            #   3.801 M/sec (50.01%)
5,771,271       branch-misses       #   0.42% of all branches

2.808943029 seconds time elapsed
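The two `perf stat` runs can be compared directly from their counters. The sketch below simply recomputes ratios from the numbers printed above (the struct and function names are illustrative). It shows that the intrinsics version retires about 2.5% more instructions, but at a higher IPC (3.98 vs 3.88), so its cycle count and elapsed time are both within 0.2% of the assembly version.

```c
#include <stdio.h>

/* Counter values copied from the two `perf stat -a` runs above. */
struct perf_run {
    const char *name;
    double instructions;
    double cycles;
    double seconds;   /* "seconds time elapsed" */
};

static const struct perf_run vec_run = {
    "vec_crc32_bench", 49670139591.0, 12468436530.0, 2.803581718 };
static const struct perf_run asm_run = {
    "crc32_bench", 48459294347.0, 12476309108.0, 2.808943029 };

/* Instructions per cycle, the quantity perf labels "insns per cycle". */
static double ipc(const struct perf_run *r)
{
    return r->instructions / r->cycles;
}

/* Relative difference of a over b, in percent. */
static double pct_diff(double a, double b)
{
    return 100.0 * (a - b) / b;
}

/* Print IPC for both runs and the vector run's relative differences. */
void print_comparison(void)
{
    printf("IPC: %s %.2f, %s %.2f\n",
           vec_run.name, ipc(&vec_run), asm_run.name, ipc(&asm_run));
    printf("instructions: %+.2f%%\n",
           pct_diff(vec_run.instructions, asm_run.instructions));
    printf("cycles:       %+.3f%%\n", pct_diff(vec_run.cycles, asm_run.cycles));
    printf("elapsed:      %+.3f%%\n", pct_diff(vec_run.seconds, asm_run.seconds));
}
```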

Tested on tulibee: POWER8 LE (DD2.1 Murano), 32 GB RAM, 16 cores,
RHEL 7.2 LE.

Signed-off-by: Rogerio Alves <[email protected]>
Include quickstart instructions for vec_crc32.c in the README.

Signed-off-by: Rogerio Alves <[email protected]>
This ensures that:

defining __ASSEMBLY__ (gcc builtin) isn't needed for the C implementation;

MAX_SIZE is defined in both the C and __ASSEMBLY__ generations.

Signed-off-by: Daniel Black <[email protected]>
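One way a single generated header can serve both the C and the assembly builds is sketched below. It assumes the guard keys on gcc's predefined `__ASSEMBLER__` macro (gcc defines it to 1 only when preprocessing assembly), which is an assumption: the commit text only says "gcc builtin". The `MAX_SIZE` value and the `crc_max_size_c` name are hypothetical placeholders, not the generator's real output.

```c
/* Hypothetical sketch of a generated header shared by C and .S sources.
 * Assumption: the guard uses gcc's predefined __ASSEMBLER__ macro, so
 * neither build needs a hand-supplied -D__ASSEMBLY__. */

/* Outside the guard, so both the C and the assembly builds see it.
 * The value is illustrative only. */
#define MAX_SIZE 32768

#ifndef __ASSEMBLER__
/* C-only content: tables, prototypes, typedefs. */
static const unsigned long crc_max_size_c = MAX_SIZE;   /* hypothetical */
#else
/* Assembly-only content: data directives (.long, .octa, ...) would go here. */
#endif
```

Keeping `MAX_SIZE` above the guard means the C and assembly sides cannot drift apart: both are preprocessed from the same definition.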
Add an example, crc32_two_implementations, showing how to use this.

Signed-off-by: Daniel Black <[email protected]>
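The contents of crc32_two_implementations are not shown in this thread. A common pattern for shipping two implementations is to pick one at runtime, sketched below under stated assumptions: on Linux/ppc64 it checks `getauxval(AT_HWCAP2) & PPC_FEATURE2_VEC_CRYPTO` (the glibc hwcap bit for the ISA 2.07 vector crypto facility). Elsewhere it falls back to a portable byte-at-a-time table CRC-32. The `crc32_vpmsum` prototype is an assumed signature to verify against the repository's header, and the sketch also assumes both routines use the same CRC inversion convention.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc64__) && defined(__linux__)
#include <sys/auxv.h>
/* Assumed prototype of the vector routine; check the repository's header. */
unsigned int crc32_vpmsum(unsigned int crc, const unsigned char *p, unsigned long len);
#endif

typedef uint32_t (*crc32_fn)(uint32_t crc, const unsigned char *p, size_t len);

/* Portable fallback: byte-at-a-time table CRC-32 (reflected 0xEDB88320). */
static uint32_t crc_table[256];

static void crc_table_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[n] = c;
    }
}

static uint32_t crc32_table(uint32_t crc, const unsigned char *p, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = crc_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}

#if defined(__powerpc64__) && defined(__linux__)
/* Adapter so the vector routine matches the crc32_fn signature. */
static uint32_t crc32_vector(uint32_t crc, const unsigned char *p, size_t len)
{
    return crc32_vpmsum(crc, p, (unsigned long)len);
}
#endif

/* Hypothetical selector: call once at startup, before spawning threads. */
crc32_fn crc32_select(void)
{
    crc_table_init();
#if defined(__powerpc64__) && defined(__linux__) && defined(PPC_FEATURE2_VEC_CRYPTO)
    if (getauxval(AT_HWCAP2) & PPC_FEATURE2_VEC_CRYPTO)
        return crc32_vector;
#endif
    return crc32_table;
}
```

Selecting once through a function pointer keeps the per-call cost to an indirect call, and the table fallback still builds on non-POWER hosts, which also suits cross-building.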
@grooverdan
Contributor Author

All part of #4 now.

grooverdan closed this Sep 28, 2017