How can I speed up the algorithm for finding all primes?


I use the Eratosthenes sieve algorithm. A search in the range from 2 to 10000 takes 280 ms. I need to find all the primes in the range from 2 to the maximum value of the variable INT. Use multithreaded programming? Optimize the algorithm?

import java.util.LinkedList;
import java.util.concurrent.LinkedBlockingDeque;

public class Primes {
private long timeout = System.nanoTime();
LinkedList<Integer> lprimes = new LinkedList<Integer>();
LinkedList<Integer> lnums = new LinkedList<Integer>();
public long getTimeout() {
    return timeout;
}

public void setTimeout(long timeout) {
    this.timeout = timeout;
}

public int getCountPrimes(){
    return this.lprimes.size();
}
public void timeRun() {
    System.out.println(System.nanoTime());
    System.out.println(timeout);
    System.out.println("Время выполнения равно " + String.valueOf(System.nanoTime() - this.timeout)+" ns или " + String.valueOf((System.nanoTime() - this.timeout)/1000000));
}

public void keepPrimes() {
    lnums.add(2);
    for(int i = 3;i <= 10000;i += 2){
           lnums.add(i);
    }
    /*for (int i = 2; i <= 10000; i++) {
        lnums.add(i);
    }*/
    while(lnums.size()>0){
        int nextPrime = lnums.remove();
        for (int i = nextPrime*nextPrime; i <=10000; i+=nextPrime) {
            lnums.removeFirstOccurrence(i);
        }
        lprimes.add(nextPrime);
        System.out.println(nextPrime);
    }
}

public static void main(String[] args) {
    // TODO Auto-generated method stub
    System.out.println("Максимальное значение INT: "+Integer.MAX_VALUE);
    Primes primes = new Primes();
    primes.keepPrimes();
    System.out.println("Формирование коллекции закончено!");
    primes.timeRun();
    System.out.println(primes.getCountPrimes());
}

}

Author: Harry, 2016-09-18

3 answers

Alas, I'm not good at Java, except to read someone else's code...

Yes, there is an algorithm for the Eratosthenes sieve O(n), but here the gain compared to a good implementation - O(n lg lg n) - is very small.

But optimizing your code is not something that can be done - it is necessary! For some reason, you collect all the odd numbers in a linked list, and then check and remove them from it. What for? Just go through all the odd numbers in the list. And go not to the last number (10000), but to the square root! This enough. Look, any composite number greater than the root must have a divisor less than the root - which means it will already be crossed out. All checks for numbers from 100 to 10000 are simply not necessary...

And also a very big minus - you use a linked list with constant dynamic memory allocation, indirect accesses, long access to elements, etc. - which also does not allow you to use the data cache-the solution, as for me, is not good. If it was C++, I would use compact vector<bool> by reserving the required amount of memory in advance.

P.S. In C++ I sketched - on my home page it calculated up to 10000 in 16 microseconds, up to 100000000-in 557 ms (without output to the screen).

 8
Author: Harry, 2016-09-18 15:58:59

Output of primes from 2 to 2* * 31-1 on C

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#define lim INT32_MAX
#define BITS 32
int main()
{
    uint32_t *x= calloc(sizeof*x, ((unsigned)lim-3+BITS*2-1)/(BITS*2));
    fputs("2\n", stdout);
    for(unsigned i=3;i<=lim;i+=2) {
        unsigned b= i-3 >> 1;
        if(x[b/BITS] & 1<<b%BITS) continue;
        printf("%d\n", i);
        while( (b+=i) <= (lim-3>>1) ) x[b/BITS] |= 1<<b%BITS;
    }
}

Counting time (Celeron T3100 @ 1.90 GHz) 1.5 minutes, memory approximately 128MB. Without information output 58c.

105097565 numbers found

Improved algorithm: on the advice of @KoVadim, search only among numbers of the form 6n±1.

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#define lim INT32_MAX
#define BITS 32
int main()
{
    uint32_t *x= calloc(sizeof*x, ((unsigned)lim-5+BITS*3-3)/(BITS*3));
    fputs("2\n3\n", stdout);
    for(unsigned i=5, b0=0; i<=lim; i+=6, b0++) {
        unsigned b=b0;
        if(!( x[b/(BITS/2)] & 1<<b%(BITS/2)*2 )) {
            printf("%d\n", i);
            while( (b+=i) <= (lim-5)/6 ) x[b/(BITS/2)] |= 1<<b%(BITS/2)*2;
            for(b=b0+2*i/3; b <= (lim-5)/6; b+=i) x[b/(BITS/2)] |= 2<<b%(BITS/2)*2;
        }
        b=b0;
        if(!( x[b/(BITS/2)] & 2<<b%(BITS/2)*2 )) {
            printf("%d\n", i+2);
            while( (b+=i+2) <= (lim-5)/6 ) x[b/(BITS/2)] |= 2<<b%(BITS/2)*2;
            for(b=b0+2*i/3+2; b <= (lim-5)/6; b+=i+2) x[b/(BITS/2)] |= 1<<b%(BITS/2)*2;
        }
    }
}

Counting time is 73 seconds, memory is 86MB. Without information output 43c.

Both programs produce the same result in the form of text that is slightly larger than 1GB. Storing numbers as a bitmap sets turn out to be more compact than an array of 105 million numbers. Maybe it makes sense to use it in this form and continue to use it.

 4
Author: sercxjo, 2016-09-18 21:19:40

Interested in the answer sercxjo. I played around a bit. On my machine, its code, compiled by VC++ 2015, runs (without output) 12 s. There's also my option

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

constexpr inline unsigned long pow2(unsigned long i) { return 1 << i; }

unsigned long isqrt(unsigned long a)
{
    unsigned long x = a;
    for(unsigned long z = 0; x != z; )
    {
        z = x;
        x = (x + a/x)/2;
    }
    return x;
}

constexpr unsigned long MAX_LIM = pow2(31) - 1;
constexpr unsigned long ARR_LIM = (MAX_LIM >> 6) + 1;
const     unsigned long SQR_LIM = isqrt(MAX_LIM);;

unsigned long primes[ARR_LIM] = { 0 }; // 0 - простое, 1 - составное

auto set_primes = [](unsigned long idx)
    { primes[idx >> 6] |= pow2((idx&0x0000003F)>>1); };
auto get_primes = [](unsigned long idx)
    { return primes[idx >> 6] &  pow2((idx&0x0000003F)>>1); };

int main(int argc, const char * argv[])
{
    for(unsigned long i = 3; i <= SQR_LIM; i += 2)
    {
        if (get_primes(i)) continue;
        for(unsigned long j = i * i; j <= MAX_LIM; j += 2*i)
        {
            set_primes(j);
        }
    }

    puts("2");
    for(unsigned long i = 3; i <= MAX_LIM; i+=2)
    {
        if (get_primes(i)) continue;
        printf("%lu\n",i);
    }
}

Counts (again, without output) 7.82 s. Results match :)

It is interesting that if we iterate over the simple types enter a description of the image here, then there is even a loss in time...

Lambdas were used deliberately - the compiler optimizes them very well.

The next number should be sifting through the sieve of Eratosthenes using templates at compile time :) Who's ready to take it?

 1
Author: Harry, 2017-04-13 12:53:29