Optimize kernel #2, opened by TaehyunKimMotif
Forward results (Before: fused kernel before this change; After: fused kernel after this change)
polynorm-perf:
dim | batch_size | seq_len | Before | After |
---|---|---|---|---|
2048 | 1 | 1024 | 108.096004 | 19.648001 |
2048 | 1 | 2048 | 201.056004 | 33.472002 |
2048 | 1 | 4096 | 385.055989 | 59.648000 |
2048 | 1 | 8192 | 742.079973 | 128.175996 |
2048 | 2 | 1024 | 200.736001 | 33.504002 |
2048 | 2 | 2048 | 384.959996 | 59.360001 |
2048 | 2 | 4096 | 741.952002 | 128.191993 |
2048 | 2 | 8192 | 1458.288014 | 231.168002 |
2048 | 4 | 1024 | 384.992003 | 59.455998 |
2048 | 4 | 2048 | 742.016017 | 128.224000 |
2048 | 4 | 4096 | 1458.335996 | 231.279999 |
2048 | 4 | 8192 | 2887.520075 | 469.232008 |
2048 | 8 | 1024 | 741.855979 | 128.431998 |
2048 | 8 | 2048 | 1457.744002 | 231.104001 |
2048 | 8 | 4096 | 2887.648106 | 469.247997 |
2048 | 8 | 8192 | 5744.448185 | 966.560006 |
4096 | 1 | 1024 | 211.648002 | 39.168000 |
4096 | 1 | 2048 | 395.904005 | 71.007997 |
4096 | 1 | 4096 | 757.503986 | 135.647997 |
4096 | 1 | 8192 | 1461.183965 | 264.640003 |
4096 | 2 | 1024 | 395.520002 | 70.975997 |
4096 | 2 | 2048 | 757.344007 | 135.647997 |
4096 | 2 | 4096 | 1461.407959 | 264.607996 |
4096 | 2 | 8192 | 2872.575998 | 523.199975 |
4096 | 4 | 1024 | 757.535994 | 135.327995 |
4096 | 4 | 2048 | 1461.632013 | 264.975995 |
4096 | 4 | 4096 | 2872.096062 | 523.519993 |
4096 | 4 | 8192 | 5689.792156 | 1042.639971 |
4096 | 8 | 1024 | 1461.840034 | 264.928013 |
4096 | 8 | 2048 | 2872.096062 | 523.455977 |
4096 | 8 | 4096 | 5689.536095 | 1042.528033 |
4096 | 8 | 8192 | 11328.287601 | 2081.568003 |
Backward results (Before: fused kernel before this change; After: fused kernel after this change)
polynorm-perf:
dim | batch_size | seq_len | Before | After |
---|---|---|---|---|
2048 | 1 | 1024 | 109.024003 | 52.255999 |
2048 | 1 | 2048 | 178.783998 | 76.191999 |
2048 | 1 | 4096 | 319.200009 | 119.231999 |
2048 | 1 | 8192 | 570.464015 | 207.424000 |
2048 | 2 | 1024 | 178.847998 | 76.143999 |
2048 | 2 | 2048 | 318.816006 | 119.231999 |
2048 | 2 | 4096 | 570.464015 | 207.424000 |
2048 | 2 | 8192 | 1099.695981 | 433.183998 |
2048 | 4 | 1024 | 317.856014 | 119.167998 |
2048 | 4 | 2048 | 568.831980 | 207.839996 |
2048 | 4 | 4096 | 1101.007998 | 432.976007 |
2048 | 4 | 8192 | 2172.784090 | 811.999977 |
2048 | 8 | 1024 | 569.887996 | 207.519993 |
2048 | 8 | 2048 | 1100.048006 | 432.767987 |
2048 | 8 | 4096 | 2174.511909 | 810.176015 |
2048 | 8 | 8192 | 4274.255991 | 1632.272005 |
4096 | 1 | 1024 | 203.408003 | 89.887999 |
4096 | 1 | 2048 | 361.359999 | 139.904007 |
4096 | 1 | 4096 | 665.983975 | 247.999996 |
4096 | 1 | 8192 | 1282.112002 | 463.007987 |
4096 | 2 | 1024 | 362.304002 | 139.936000 |
4096 | 2 | 2048 | 666.832000 | 248.032004 |
4096 | 2 | 4096 | 1282.112002 | 462.976009 |
4096 | 2 | 8192 | 2541.552067 | 888.351977 |
4096 | 4 | 1024 | 665.440023 | 248.032004 |
4096 | 4 | 2048 | 1282.271981 | 462.976009 |
4096 | 4 | 4096 | 2541.231990 | 888.000011 |
4096 | 4 | 8192 | 4977.663994 | 1746.320009 |
4096 | 8 | 1024 | 1281.631947 | 462.848008 |
4096 | 8 | 2048 | 2541.312099 | 888.383985 |
4096 | 8 | 4096 | 4976.480007 | 1746.352017 |
4096 | 8 | 8192 | 10092.736244 | 3456.832051 |
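To make the tables easier to read at a glance, the Before/After columns can be turned into speedup ratios. The snippet below is a minimal sketch, not part of the PR: the `speedup` and `summarize` helpers are hypothetical names, and only the smallest and largest configuration for each `dim` are copied from the tables above. The ratio is unit-free, so no assumption about the timing unit is needed.

```python
# Rows copied verbatim from the tables above:
# (dim, batch_size, seq_len, before, after).
FORWARD = [
    (2048, 1, 1024, 108.096004, 19.648001),
    (2048, 8, 8192, 5744.448185, 966.560006),
    (4096, 1, 1024, 211.648002, 39.168000),
    (4096, 8, 8192, 11328.287601, 2081.568003),
]
BACKWARD = [
    (2048, 1, 1024, 109.024003, 52.255999),
    (2048, 8, 8192, 4274.255991, 1632.272005),
    (4096, 1, 1024, 203.408003, 89.887999),
    (4096, 8, 8192, 10092.736244, 3456.832051),
]


def speedup(before: float, after: float) -> float:
    """Return how many times faster the 'after' measurement is."""
    return before / after


def summarize(name: str, rows) -> None:
    """Print one speedup line per benchmark configuration."""
    for dim, batch_size, seq_len, before, after in rows:
        ratio = speedup(before, after)
        print(f"{name} dim={dim} batch={batch_size} seq={seq_len}: {ratio:.2f}x")


summarize("forward", FORWARD)
summarize("backward", BACKWARD)
```

The same helper can be applied to the remaining rows to get the full picture; the selected rows are only a sample.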
Could you add some information about pre-commit to the README? Other than that, LGTM!
TaehyunKimMotif changed pull request status to open
TaehyunKimMotif changed pull request status to merged