So my last article on prime numbers led me to ask the question "How many prime numbers are there between 2 and 232". I know the quick answer to this question is "a lot". 232 is over four billion—and that's a fair bit of space to check. It would be easy to make a for-loop and brute force the entire number range, but this is the age of multi-core CPUs—and single threaded is so 2000. Besides, my computer hasn't broken a sweat in awhile.
I decided to do a straight C implementation. I've been doing C and C++ at work, and C++ has been irritating me lately because I tried to do a "quick and dirty" implementation without too much thought into the classes I'd need. That's a recipe for disaster, and how you end up with things like microsoft's windows ME. So vanilla C it was.
Threads and semaphores I used the POSIX standards, and I've tried to demonstrate how to make a dispatcher foreground task with one or more worker threads. The key to this is the dispatch semaphore. Normally, semaphores are used to for mutually exclusive lock, or to send signals between threads. However, they can also be used as a count. Think of them like a library that has a set number (say 5) of books. That means 5 people can check that book out, but if a 6th person wants to check that book out, they will have to wait for one of the first 5 to return their copy. The semaphore does this by keeping a count. When that count reaches zero, the next task that tries to pend the semaphore will have to wait. So the dispatcher is a loop that starts by pending the semaphore. When it gets the semaphore, it will create a task. The task will do some work, then post the semaphore. The semaphore count determines how many tasks can run at the same time. If we limit the number of tasks that can run at once to the number of CPU cores, then we can utilize the CPU to 100%.
First, the dispatcher skeleton code:
sem_init( &Semaphore, NUMBER_OF_THREADS, NUMBER_OF_THREADS );
unsigned ThreadIndex = 0;
while ( NumbersLeft )
{
// Wait for a free worker thread.
sem_wait( &Semaphore );
// Create a worker thread to check this number set.
pthread_create
(
&Threads[ ThreadIndex ],
NULL,
PrimeThread,
(void *)&Data[ ThreadIndex ]
);
++ThreadIndex;
if ( ThreadIndex >= NUMBER_OF_THREADS )
ThreadIndex = 0;
}
Then, the skeleton of the worker thread:
{
//
// Do prime check.
//
// This thread is now complete. Release one count from the dispatch
// semaphore.
sem_post( &Semaphore );
// End this thread.
pthread_exit( 0 );
return 0;
}
Now just launching a thread to check one number is a bit of a waste—it takes some overhead to start and stop the task. So we give each thread some range of numbers to check. A pthread can be passed parameters, and we pass the starting number, along with the amount of numbers to check. The parameter block is also used for storage of the results. So here is the full worker thread:
{
// Get the work data passed to the thread.
WORK_TYPE * Data = (WORK_TYPE *)ArgumentPointer;
uint32_t Number = Data->StartNumber;
uint32_t Count = 0;
unsigned Index;
// For all the numbers to check...
for ( Index = 0; Index < Data->NumberToCheck; ++Index )
{
// Is this number a prime?
if ( IsPrime( Number ) )
// Then count it.
++Count;
// Next number.
++Number;
}
// Save results.
Data->NumberFound = Count;
// This thread is now complete. Release one count from the dispatch
// semaphore.
sem_post( &Semaphore );
// End this thread.
pthread_exit( 0 );
// Never reached--here for language consistency.
return 0;
} // PrimeThread
The dispatcher has to accumulate the results. Our could try using a global variable, but you can run into issues with non-atomic access. Adding 1 might seem a basic operation, but (at least in a RISC system) it takes 3 instructions to add one to a variable in memory: load the data to a register, add the register by 1, and store the register. You can not (without a semaphore) guarantee that those instructions will not be interrupted by an other thread. So, the accumulation happens in just one place—the dispatch loop.
The dispatcher will assign all the work, but it is finished once all the work has been assigned. We still need to wait for the work to finish. So the last part of the process looks like this:
{
if ( Data[ ThreadIndex ].NumberToCheck )
{
// Wait for thread to finish.
pthread_join( Threads[ ThreadIndex ], NULL );
// Accumulate the number of primes found in this thread.
NumberOfPrimes += Data[ ThreadIndex ].NumberFound;
}
}
// Let go of dispatch semaphore.
sem_destroy( &Semaphore );
Now we can print the results to the screen, and we're done.
I knew this code would tax the CPU rather hard, so I wanted to measure how long the process would take. "time.h" is a C standard unit. Unfortunately, "time_spec", which has a high-resolution method for checking the time, it's part of the C99 standard. But we can get time in the resolution of seconds, and that's good enough. To do this, we simply mark the time when the process begins, and mark it again at the end. The difference (for which there is a function to compute) is the total number of seconds that has elapsed.
The answer: There are 203,280,221 primes between 2 and 4,294,967,295, and it took 6,054 seconds (1 hour, 41 minutes) on a quad-core clocked at 2.5 GHz.
2 comments have been made.
From Antony (http://rapid4me.com)
LA
February 17, 2010 at 3:10 AM
From Neso Web Development (http://neso.io)
Thames Valley
September 20, 2018 at 12:11 PM
Great Reading!