I have two small programs.
First:
// compile with -lpthread
// TEST:
// basename
#include <pthread.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <libgen.h>
#include <limits.h>
#include <inttypes.h>
// DATASET_LEN
#ifndef DATASET_LEN
#define DATASET_LEN 10000
#endif
// THREADS_NUM
#ifndef THREADS_NUM
#define THREADS_NUM 16
#endif
// need to call free(3) after
char** generateArray() {
    char** dataset = (char**)malloc(sizeof(char*) * DATASET_LEN);
    // fill dataset
    for (size_t i = 0; i < DATASET_LEN; ++i) {
        dataset[i] = (char*)malloc(sizeof(char) * CHAR_MAX);
        sprintf(dataset[i], "%i/%i/", rand(), rand());
    }
    return dataset;
}
// pthread_create(3) callback
void* run(void* args) {
    char** dataset = generateArray();
    char* baseName;
    for (size_t i = 0; i < DATASET_LEN; ++i) {
        baseName = basename(dataset[i]);
        printf("%s\n", baseName);
        free(dataset[i]);
    }
    free(dataset);
    // a void* function must return a value
    return NULL;
}
// main
int main(int argc, char** argv) {
    pthread_t* threads = (pthread_t*)malloc(sizeof(pthread_t) * THREADS_NUM);
    // threads start
    for (int i = 1; i <= THREADS_NUM; ++i) {
        pthread_create(&threads[i-1], NULL, run, NULL);
        fprintf(stderr, "Thread %d started\n", i);
    }
    // threads join
    for (int i = 1; i <= THREADS_NUM; ++i) {
        pthread_join(threads[i-1], NULL);
        fprintf(stderr, "Thread %d finished\n", i);
    }
    free(threads);
    return EXIT_SUCCESS;
}
Second:
// compile with -lpthread
// TEST:
// basename
#include <pthread.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <libgen.h>
#include <limits.h>
#include <inttypes.h>
#include <string>
// DATASET_LEN
#ifndef DATASET_LEN
#define DATASET_LEN 10000
#endif
// THREADS_NUM
#ifndef THREADS_NUM
#define THREADS_NUM 16
#endif
// need to call free(3) after
char** generateArray() {
    char** dataset = (char**)malloc(sizeof(char*) * DATASET_LEN);
    // fill dataset
    for (size_t i = 0; i < DATASET_LEN; ++i) {
        dataset[i] = (char*)malloc(sizeof(char) * CHAR_MAX);
        sprintf(dataset[i], "%i/%i/", rand(), rand());
    }
    return dataset;
}
// pthread_create(3) callback
void* run(void* args) {
    char** dataset = generateArray();
    char* baseName;
    std::string tmpStr;
    for (size_t i = 0; i < DATASET_LEN; ++i) {
        baseName = basename(dataset[i]);
        tmpStr = std::string(baseName);
        printf("%s\n", tmpStr.c_str());
        free(dataset[i]);
    }
    free(dataset);
    // a void* function must return a value
    return NULL;
}
// main
int main(int argc, char** argv) {
    pthread_t* threads = (pthread_t*)malloc(sizeof(pthread_t) * THREADS_NUM);
    // threads start
    for (int i = 1; i <= THREADS_NUM; ++i) {
        pthread_create(&threads[i-1], NULL, run, NULL);
        fprintf(stderr, "Thread %d started\n", i);
    }
    // threads join
    for (int i = 1; i <= THREADS_NUM; ++i) {
        pthread_join(threads[i-1], NULL);
        fprintf(stderr, "Thread %d finished\n", i);
    }
    free(threads);
    return EXIT_SUCCESS;
}
Both programs work correctly on Linux, but on FreeBSD the first one (without std::string) does not.
Can anybody explain why?
I looked at the FreeBSD source at /usr/src/lib/libc/gen/basename.c and saw a static variable in the function.
But if that were the cause, the program with std::string should misbehave as well.
By "correctly" I mean that it outputs only numbers and newlines.
For tests I use:
./freebsd-threaded-basename | egrep -av '^[0-9\n\s]+$' | env LANG=c less
UPD: I tried using strdup() or strcpy(); the result is the same, still incorrect.
UPD: *Every* time the version with std::string is run, it works as expected.
The reason your programs behave unpredictably is basename, which is not thread-safe.
basename is a little bit outdated. Modern C++ applications tend to use other means for parsing file paths. The Boost Filesystem Library is popular and may be used for this.
If you insist on using basename, place it in a critical section together with the code that reads basename's result (be it printf, strcpy, or something else). This guarantees that basename's result is not accessed from several threads simultaneously, which means correct behavior.
Now some guesswork about "why". (Only guesswork, because it is not possible to predict exactly how a non-thread-safe multithreaded program will behave.)
The first version of your program performs the basename loop partially in parallel (the basename function and the loop itself) and partially sequentially (printf and free are thread-safe functions; their implementation is protected by critical sections).
The second version adds std::string, which means a lot more sequential code. It allocates memory for a new string and deallocates the old memory (both operations are thread-safe and protected by critical sections). Also, in some implementations, it uses atomic operations to update a shared counter, and this decreases parallelism as well. All this effectively transforms your program from parallel to almost fully sequential. The threads are mostly waiting for some mutex, or sometimes performing relatively complicated printf, memory, or std::string work. Only very rarely does one of the threads run the relatively simple basename computation. It is almost as if you had added a critical section around basename.
Possibly, the correct results in the Linux tests are because printf and free are enough to make the program almost sequential in this case (because something is done differently in Linux, or because of different hardware).
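To make the critical-section suggestion concrete, here is a minimal sketch of two possible approaches. Both helper names (locked_basename and path_basename) and the mutex are hypothetical and do not appear in the original programs. The first wraps basename and the copy of its result in one lock. The second avoids basename entirely by using plain std::string operations instead of Boost, so it touches no shared state. It assumes '/' is the only path separator.

```cpp
#include <libgen.h>   // basename(3)
#include <mutex>
#include <string>
#include <vector>

// Hypothetical global mutex serializing every call to basename(3).
static std::mutex basename_mutex;

// Approach 1: keep basename, but call it and copy its result under one lock.
std::string locked_basename(const std::string& path) {
    // basename may modify its argument, so hand it a private writable copy.
    std::vector<char> buf(path.begin(), path.end());
    buf.push_back('\0');

    std::lock_guard<std::mutex> lock(basename_mutex);
    // The result is copied into a std::string before the lock is released,
    // because the returned pointer may refer to storage shared by all threads.
    return std::string(basename(buf.data()));
}

// Approach 2: no basename at all, only local variables, so no lock is needed.
// Assumption: '/' is the only separator; mirrors basename for common inputs.
std::string path_basename(const std::string& path) {
    if (path.empty()) {
        return ".";
    }
    // Skip trailing slashes.
    const std::string::size_type end = path.find_last_not_of('/');
    if (end == std::string::npos) {
        return "/";  // the path consisted only of slashes
    }
    // Find the slash that precedes the last component, if any.
    const std::string::size_type slash = path.find_last_of('/', end);
    const std::string::size_type start =
        (slash == std::string::npos) ? 0 : slash + 1;
    return path.substr(start, end - start + 1);
}
```

In run(), the loop body could then read std::string name = locked_basename(dataset[i]); (or path_basename(dataset[i])) followed by printf("%s\n", name.c_str());. The important point in the first helper is that the copy happens inside the lock; locking only the basename call and reading the returned pointer afterwards would leave the race in place.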