ToDo's (see also TODO.sp): Table of contents: 1) efficiency/memory 2) interface 3) documentation 4) installation 5) bugs 6) others 1) efficiency/memory - use a random sigma value of 64 bits by default - try the mpn/generic/{sb,dc,mu}_bdiv_qr.c functions in GMP >= 4.3.0 for REDC - the conversion from NTT primes to mpz_t in function mpzspv_to_mpzv() (file mpzspv.c) is quadratic. A faster conversion is possible with a product tree (already done for the mpz_t -> NTT conversion). - even worse, mpzspm_init seems to be cubic in the input size (because the CRT algorithm used is quadratic in sp_num). We should use a subquadratic CRT. - the "Reducing G * H" step is faster in NTT than with KS. This is probably due to the fact that some transforms are cached in the NTT mode. - the "Reducing G * H" step can be improved as follows: first compute D = GH*I mod (x^d+1) where d = deg(F), and I = 1/F mod (x^d+1); then compute E = D*F mod (x^d-1); finally compute T = (GH-E)/2 mod (x^d+1). T equals the Montgomery product GH/(x^d+1) mod F. See the paper "Fast convolution meets Montgomery" by Preda Mihailescu (Mathematics of Computation). - slowdown in stage 1 with REDC between a 58672-digit number and a 58688-digit number [reported by Christophe.CLAVIER@gemalto.com, 29 Aug 2007] (((2003663613*2^195000-2)/(2*23*173*3863))/1954173900202379)/3612632846010637 ((2003663613*2^195000-2)/(2*23*173*3863))/1954173900202379 with B1=1000 on an Opteron (44.2s for c58672, 67.5s for c58688). The culprit seems to be the REDC routine in mpmod.c: indeed, in case the modulus has n limbs, but the most significant one has only a few bits, the product (called x in REDC) has only 2n-1 limbs, and we never call Mulders's short product in ecm_redc_n (however the else-code using full products seem faster in that case). For c58672, if one replaces if (xn == 2 * n) in mpmod.c/REDC by if (xn >= 2 * n - 1), the time of stage 1 grows from 44s to 64s, whereas ecm_redc_n should be faster... This problem is still present in 6.2, ecm_redc_n should be better tuned, in particular the choice k=0.75*n in ecm_mul_lo_n() is far from optimal. - in Brent-Suyama's extension, the evaluation of a polynomial of degree k over N consecutive values is currently done using a O(k N) algorithm [table of differences]. One can do O(N/k M(k)), cf Section 3 from "Linear recurrences with polynomial coefficients and application to integer factorization and Cartier-Manin operator", by Alin Bostan, Pierrick Gaudry and Eric Schost, SIAM journal on computing, vol. 36, no. 6, pp. 1777 - 1806, 2007. It is not clear if this result also applies to ECM, but at least it should word for P-1 and P+1. - why restrict the use of mpn_mul_fft to Fermat numbers? We could use it for any cofactor of 2^(n*BITS_PER_MP_LIMB)+1, as long as mpn_fft_next_size (n, mpn_fft_best_k (n, S1 == S2)) == n. - use mpres in step 2 (Target: 7.0) - write a mpn version of add3 and duplicate - rewrite entire mpmod.c to be based on mpn_* functions, not mpz_* - take relative speed of multiplying/squaring into account in PRAC (DN: couldn't get any significant speed increase) - use/implement a mpn_mul_hi_n routine for use in mpn_REDC - use mpn_addmul_2, mpn_addmul_4 in the basecase REDC [for machines where it exists]. ASM code should perhaps be moved into GMP. - try McLaughlin's algorithm for Montgomery's modular multiplication (http://www.ams.org/mcom/0000-000-00/S0025-5718-03-01543-6/home.html) - consider Colin Percival's generalized DWT for multiplication modulo k*a^n+b, where k*a*b is highly composite. May belong to GMP rather than GMP-ECM. - implement assembly code (redc.asm) for other architectures - allow composite d2, or better use the S1+S2 idea from the P+-1 algorithm of Montgomery and Kruppa. - init mpz_t's with correct amount of memory allocated to avoid reallocs. Check for reallocs with GMP's memory interface routines. (Partly done.) - try sliding window multiplication for ECM stage 1 (Target: 7.0) - choose Brent/Suyama polynomial according to B2/k and not B2! - Adjust estimated memory to take into account -treefile and NTT (done but improvement possible) - when GWNUM is used, lower the default B2 (James Wanless, 17 Mar 2006, james at grok.ltd.uk) - implement enhanced standard continuation? With graph cover algorithm? - parallel/distributed stage 2? - add curve selection for torsion group of order 8 or 16, see Montgomery's thesis (request of Peter-Lawrence Montgomery) - Torbj"orn Granlund suggested faster code for mpn_mod_1(), used extensively in NTT. See https://sympa.inria.fr/sympa/arc/ecm-discuss/2008-05/msg00000.html - try to implement the algorithms from https://eprint.iacr.org/2024/1044.pdf (if applicable) to replace PRAC 2) interface - from Mark Rodenkirch 08 April 2011: print messages like "Step 1: 1500000/100000000" with a command-line option (or with -v) https://sympa.inria.fr/sympa/arc/ecm-discuss/2011-04/msg00000.html - with -resume, print %time for THIS RUN instead of total run? [suggested by SleepHound ] Add CPUTIME=... in the save file, to take into account the total cpu time spend so far (in seconds). George Woltman agrees for that change. It won't hurt prime95/mprime -> will be added for his next version. - when resuming, print the *initial* x0 for P-1/P+1? - [from Jakub Pawlewicz ] add an option -stage1time t to tell the step 1 time, when done by another program. PZ: or better have it in resume file? (Target: 6.1. Command line option done) 3) documentation 4) installation - check for __builtin_constant_p and __builtin_expect at configure time - [suggested by Peter Montgomery] add the possibility to compile a "fat" binary, which automatically selects the best mulredc assembly code depending on the cpuid [see TODO.fat] - [suggested by Thomas Kunz, who did port GMP-ECM to the PS3, i.e., to the Cell architecture]: several changes to make it easier to port GMP-ECM to specific architectures. Cf TODO.kunz. 5) bugs 6) others - add point counting algorithm? SEA implementation exists for Pari/GP, use that? - let user specify previous factoring work, compute distribution of candidate factors, compute probability of/est. time to finding a factor with given parameters. - re-write in C++? Lots of work, but would make parts of the code much cleaner.