20030310, 09:46  #1 
Mar 2003
121_{8} Posts 
Other mersenne progs, why so slow?
I'm not a Gimps member, because the
probability to be a winner is zero, zero, module zero and and zero periodically... I have tried other progs available in Internet, but they as slow as continent drift. When I wrote my own, but it still 10 or 100 times slower. I don't know why and what is the secret of Prime95? It seems the 39th number was discovered without any SSE for only 1.5 month. But my prog requier 41 year to check it!!! I use two FFT for squaring, and modular reduction which worth nothing for 11111111...b. 
20030310, 18:00  #2 
"Richard B. Woods"
Aug 2002
Wisconsin USA
1111000001100_{2} Posts 
The "secret" of Prime95 is twofold  hardware and software.
Intel Pentium CPUs execute floatingpoint (FP) instructions more rapidly and with higher precision than most other manufacturers' CPUs. They can pipeline as fast as one FP instruction completion per clock cycle, whereas many others can execute FP no faster than two, three, or more cycles per instruction completed. In addition, Intel FP units have 80bit internal registers, longer than the 64bit registers common on other brands. With proper programming to keep computations inside those internal registers as long as possible before the eventual conversion to a 64bit external result, Intel CPUs can produce 16 more bits of numerical result per FP instruction than other brands. George Woltman has intensively studied the details of Intel CPUs and worked for years to maximize the throughput of Prime95 software, programming the most speedcritical modules in assembler language in order to carefully interleave machine instructions so as to keep the various parts of the CPU as busy as possible churning out results with minimum wasted time. (I'm quite envious of George's work because I've done similar software optimization for speed, in the past, on older computers, but by the time I discovered what George was up to with Prime95, he was waaay ahead of where I'd have started. :) ) 
20030311, 16:42  #3  
∂^{2}ω=0
Sep 2002
República de California
2^{2}·3·7·139 Posts 
Quote:
In fact George's code is not better on a percycle basis than other topquality LL test codes (Guillermo Valor's Glucas and my Mlucas)  on an Alpha ev6 I get even better percycle performance than George's code does on the P4  but P4's are way cheaper and more numerous, and kudos to George for getting such good performance out of a chip with smaller caches and lower bandwidth to main memory. Also, the speed demons at Intel have been able to ratchet up chip speeds much faster than the RISC manufacturers. Quote:
Quote:
Quote:


20030311, 17:59  #4  
Oct 2002
43 Posts 
Quote:


20030311, 18:26  #5  
P90 years forever!
Aug 2002
Yeehaw, FL
5×29×53 Posts 
Quote:


20030311, 23:35  #6  
Oct 2002
43 Posts 
Quote:
The critical point is that complex multiplication and addition introduce up to sqrt(5)+1 ulp of error, while loading and storing introduce only 1 ulp. (Unavoidable inaccuracies in the trig data contributes the remaining sqrt(1/2) ulp.) 

20030311, 23:41  #7 
Oct 2002
43 Posts 
(If anyone hasn't already read it, my paper on FFT error bounds has *finally* been published in Math. Comp.)

20030312, 00:42  #8  
P90 years forever!
Aug 2002
Yeehaw, FL
5·29·53 Posts 
Quote:
When using 64bit FP, each radix4 "unit" does a complex multiply (4 muls, 2 adds) and two complex adds. That is 2 muls and 3 adds for each data value. In the 1M FFT squaring that means there are about 105 opportunities for error to accumulate. Are you saying this 5 to 1 ratio should result in an extra 1.5 bits of precision? Does that match the 1% to 2% larger exponents the x86 FFTs support? 

20030312, 00:54  #9 
Oct 2002
101011_{2} Posts 
It's somewhat more complicated than that; the (not perfectly accurate) trig data contributes some error, and the error introduced in complex multiplication can be bounded somewhat more closely than a naive operation count would indicate.
But to answer your second question, in the 20M range you're packing about 20 bits per double, so reducing the error by one bit would mean a change of 2.5% in the size of numbers for which you can use a given FFT length. That said, all my experience is with worstcase bounds; in practice the errors are much smaller; indeed, I could construct values for which the prime95 multiplication would fail  but it doesn't matter since doublechecking would find those anyway. 
20030312, 01:55  #10  
∂^{2}ω=0
Sep 2002
República de California
11676_{10} Posts 
Quote:
Now that you mention it, I see that Crandall, Papadopoulos' and my paper on F24 has also appeared, at long last  just over 3 years since we actually submitted the initial draft: http://www.ams.org/journalgetitem?pii=S0025571802014795 The LaTeX manuscript can be gotten for free from http://hogranch.com/mayer/F24.rev3.tex Math. Comp. is very slow, and (at least based on our experience, where one reviewer didn't do anything with the manuscript for over a year, without MC doing anything to remind them of their responsibility as a reviewer) has poor timelinessofreview controls. But grousing about tardiness aside, that paper also has some words about errors in floatingpoint FFTs, with an approach that is more heuristic in nature and more geared to mostlikely error behavior than Colin's. The error estimates thus derived seem to match realworld experience quite closely. 

20030312, 02:05  #11 
Oct 2002
43 Posts 
http://www.ams.org/journalgetitem?pii=S0025571802014199
Or, for those of you without a subscription, a preprint is at http://www.sfu.ca/~cperciva/mpaper.ps I refer to your F24 paper in a section discussing previous error bound work  Crandall sent me a copy a couple years ago. 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
why is http://www.mersenne.org/ so slow  Unregistered  Information & Answers  19  20120417 03:12 
Slow Down?  R.D. Silverman  GMPECM  55  20111016 17:28 
How hot is too hot? Slow is too slow?  petrw1  Hardware  13  20081110 23:25 
Slow computer  Housemouse  Hardware  7  20080215 18:18 
Really slow machines?  Doorbasher  Hardware  5  20040823 22:18 