MEGA is an interesting project, as it sits right on the line between running in L3 cache and hitting ram. At 2MB/unit, with one unit per core on a quad core, it will just squeeze into the 8MB cache of an i7, but not quite into the 6MB of an i5. Does it work out that way?
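As a rough sanity check of the sizing above, here is a minimal Python sketch. It assumes one unit per core on a quad core and counts only the 2MB of task data, ignoring code and other overheads; the function name is made up for illustration.

```python
def fits_in_l3(tasks, mb_per_task, l3_mb):
    """True if the combined task data fits in L3 (ignores code and other overheads)."""
    return tasks * mb_per_task <= l3_mb


# 2 MB per unit and the L3 sizes are taken from the post; 4 tasks = one per core.
for name, l3_mb in [("i7, 8MB L3", 8), ("i5, 6MB L3", 6)]:
    verdict = "fits" if fits_in_l3(4, 2, l3_mb) else "spills to ram"
    print(f"{name}: 4 x 2MB {verdict}")
```

The i7 case only fits with zero headroom, which is why it is arguably already over the limit once anything else needs cache space.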
My systems are a bit of a mix as follows:
i7-6700k @ 4.2 GHz, 3200 dual rank ram
i7-6700k @ 4.0 GHz, 3000 dual rank ram
i5-6600k @ 3.6 GHz, 3000 single rank ram
i5-5675C @ 3.5 GHz, ram doesn't matter
All are dual channel ram.
After a recent post saying the next mega prime will be PrimeGrid's 100th, I thought I'd have a go and see if that could be me finding it. I don't do GFN so MEGA is the obvious choice. In my haste to get started, there were some configuration mistakes. Both i7s had HT enabled, and I didn't find out until later that the slower one was running 5 units at a time, not 4. The faster one I knew was set to 4, and I had also set affinity on it to prevent the performance loss HT otherwise causes.
Case 1: i7-6700k running 4 tasks, HT on, using affinity, vs HT off. Result: no significant difference, with average differing by less than 0.2%.
Case 2: i7-6700k running 5 tasks, HT on, no affinity, vs HT off. Result: running 5 tasks with HT on was 7.5% slower in total throughput. This roughly matches what I've seen before, which was about a 10% slowdown when running with HT on, one task per core, without affinity. I believe running the 5th unit negates some of that disadvantage.
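For anyone wanting to repeat this kind of comparison, total throughput needs to account for the number of units running at once, not just the time per unit. A minimal sketch follows; the function names and the run times in the example are hypothetical placeholders, not my measurements.

```python
SECONDS_PER_DAY = 86400


def tasks_per_day(concurrent, seconds_per_task):
    """Total throughput when `concurrent` tasks run at once, each taking `seconds_per_task`."""
    return concurrent * SECONDS_PER_DAY / seconds_per_task


def percent_slower(baseline, candidate):
    """How much lower `candidate` throughput is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100


# Hypothetical times for illustration only: 4 tasks with HT off vs 5 tasks with HT on.
ht_off = tasks_per_day(4, 10000)
ht_on = tasks_per_day(5, 13500)
print(f"HT on with 5 tasks is {percent_slower(ht_off, ht_on):.1f}% slower")
```

Each unit takes longer when 5 run at once, so the comparison only makes sense on units per day.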
Case 3: Comparing the two i7-6700k systems with their different CPU and ram clocks. I normalised the CPU clock, so the question then is, does the slightly higher ram to CPU ratio of the 1st system provide any benefit? That depends on whether you think a 0.5% improvement is significant or not. Remember these tasks are right on the limit, arguably over the limit since the task size doesn't allow for code or other possible overheads. So a small contribution from ram performance isn't unexpected, small being the keyword here.
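To put a number on "slightly higher", the ram to CPU ratio can be worked out from the system list. A small sketch, assuming the ratio is simply the ram rating divided by the CPU clock in GHz (the function name is made up):

```python
def ram_per_ghz(ram_rating, cpu_ghz):
    """Ram rating per GHz of CPU clock, as a rough memory-to-compute ratio."""
    return ram_rating / cpu_ghz


fast_i7 = ram_per_ghz(3200, 4.2)  # i7-6700k @ 4.2 GHz, 3200 ram
slow_i7 = ram_per_ghz(3000, 4.0)  # i7-6700k @ 4.0 GHz, 3000 ram

advantage = (fast_i7 / slow_i7 - 1) * 100
print(f"{fast_i7:.0f} vs {slow_i7:.0f}, {advantage:.1f}% higher")  # 762 vs 750, 1.6% higher
```

So the 1st system has roughly 1.6% more ram speed per CPU clock, against the 0.5% difference seen in results.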
Case 4: Comparing the i5-6600k to the faster i7-6700k. Again, I normalised for CPU clock, and this should work somewhat in favour of the i5, although it may still be hindered by the single rank ram. With CPU clock normalisation, the i5 was still 12% slower than the i7. This is the difference between staying in L3 cache, and having to hit ram. Note the difference would be expected to vary according to the ram speed, so it could be higher or lower than this.
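The clock normalisation used in these comparisons can be sketched as below. This assumes run time scales inversely with CPU clock, so that seconds multiplied by GHz gives a measure of cycles per task; the function names and the example times are hypothetical, not my measured values.

```python
def per_clock_throughput(seconds_per_task, ghz):
    """Tasks completed per billion CPU cycles, i.e. throughput with clock speed divided out."""
    return 1.0 / (seconds_per_task * ghz)


def percent_slower_per_clock(ref_seconds, ref_ghz, other_seconds, other_ghz):
    """How much lower the other system's per-clock throughput is than the reference, in percent."""
    ref = per_clock_throughput(ref_seconds, ref_ghz)
    other = per_clock_throughput(other_seconds, other_ghz)
    return (ref - other) / ref * 100


# Hypothetical times for illustration: reference at 4.2 GHz, other system at 3.6 GHz.
print(f"{percent_slower_per_clock(10000, 4.2, 13000, 3.6):.1f}% slower per clock")
```

Two systems that take the same number of cycles per task come out at 0% regardless of clock, so whatever is left over is down to cache, ram and architecture.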
Case 5: Comparing the i5-5675C to the faster i7-6700k. This is possibly a more extreme version of Case 4, as this Broadwell i5 only has 4MB of L3 cache, but makes up for it with a ton of L4 cache. The L4 cache bandwidth would be comparable to 3200 dual channel ram, hence me saying ram doesn't matter in this case. How did it perform? Clock normalised, 19% slower than the 6700k! There are multiple things happening here. The small L3 cache pushes it into L4 cache, which is faster than the 6600k's ram. Past testing showed a ~20% drop in IPC relative to Skylake, so being 19% slower is in line with that, assuming it means the L4 is as good as the L3. I'm not sure that's the case, but the lower CPU performance overall would also lower demands on the memory system.