Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 3.0.6
Component/s: library
Labels:
- technical-debt

Story Points:
1

Description

One of the c++ transaction beta customers was doing some benchmarking, and ran into some issues. When exploring their issues, I did some things like create 100 threads, each of which creates a cluster, ultimately calling lcb_connect on their own instances. I found that I'd sometimes get an EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0) and crash.

Looking at the stack trace, it seems possible that 2 or more threads will call gethrtime simultaneously. Ultimately, it seems that one thread can call mach_timebase_info(&tmbi) while another one is using that (as they both raced on start, both seeing 0). My guess is that call results in the tmbi.denom briefly being 0 (before it puts a 1 in there), and that's the divide-by-zero we see.

The stack trace looks like:

Process 44694 stopped

* thread #2, stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)    frame #0: 0x0000000100412cf5 libtransactions_cxx.1.dylib`gethrtime at gethrtime.c:58:40

   55      }

   57      now = mach_absolute_time();

-> 58      return ((now - start) * tmbi.numer / tmbi.denom) + CLOCK_START_OFFSET;

   60   #elif defined(HAVE_CLOCK_GETTIME)

   61      struct timespec tm;'

Target 0: (bench2) stopped.

(lldb) bt

* thread #2, stop reason = EXC_ARITHMETIC (code=EXC_I386_DIV, subcode=0x0)

* frame #0: 0x0000000100412cf5 libtransactions_cxx.1.dylib`gethrtime at gethrtime.c:58:40

 frame #1: 0x0000000100455a6c libtransactions_cxx.1.dylib`lcb::Bootstrap::bootstrap(this=0x0000000104605f50, options=2) at bootstrap.cc:224:20

frame #2: 0x00000001004a1583 libtransactions_cxx.1.dylib`lcb_st::bootstrap(this=0x0000000101c046a0, options=2) at internal.h:177:26

frame #3: 0x0000000100446c5a libtransactions_cxx.1.dylib`::lcb_connect(instance=0x0000000101c046a0) at instance.cc:740:22

frame #4: 0x00000001003c9436 libtransactions_cxx.1.dylib`couchbase::cluster::connect(this=0x000070000e8c8e70) at cluster.cxx:89:10

frame #5: 0x00000001003c8f33 libtransactions_cxx.1.dylib`couchbase::cluster::cluster(this=0x000070000e8c8e70, cluster_address=<unavailable>, user_name="", password="") at cluster.cxx:21:5    frame #6: 0x00000001003c9725 libtransactions_cxx.1.dylib`couchbase::cluster::cluster(this=0x000070000e8c8e70, cluster_address=<unavailable>, user_name=<unavailable>, password=<unavailable>) at cluster.cxx:19:1

frame #7: 0x0000000100029577 bench2`main::$_0::operator(this=0x0000000101e041a8)() const at bench2.cxx:72:25

frame #8: 0x00000001000294ad bench2`decltype(__f=0x0000000101e041a8)()) std::__1::__invoke<main::$_0>(main::$_0&&) at type_traits:4361:1

The code looks like:

 36 hrtime_t gethrtime(void)

 37 {

 38 #ifdef __APPLE__

 40     /* Most things expect a pretty large timestamp - even though a smaller one

 41      * may be perfectly valid. Initialize the default with an offset of one day,

 42      * in nanoseconds

 43      */

 45     /* Use the various mach stuff:

 46      * https://developer.apple.com/library/mac/qa/qa1398/_index.html */

 48     static uint64_t start = 0;

 49     uint64_t now;

 50     static mach_timebase_info_data_t tmbi;

 52     if (start == 0) {

 53         start = mach_absolute_time();

 54         mach_timebase_info(&tmbi);

 55     }

 57     now = mach_absolute_time();

 58     return ((now - start) * tmbi.numer / tmbi.denom) + CLOCK_START_OFFSET;

 60 #elif defined(HAVE_CLOCK_GETTIME)

which is just the apple part. There are statics in the other #defines below this, for any OS that has no gethrtime builtin (presumably that is reentrant). Could be the HAVE_QUERYPERFORMANCE_COUNTER section has this issue as well, on the LARGE_INTEGER pf as I'm just a bit suspicious of QueryPerformanceFrequency.

Since this is C, there is a compare-and-swap available for all the platforms, but platform dependent. Doing a compare-and-swap on start seems sufficient. Maybe there is a better way? Maybe LCB already has a compare-and-swap implemented for various platforms, so we don't have to make that?

Attachments

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews

No reviews matched the request. Check your Options in the drop-down menu of this sections header.

Activity

People

Assignee:: David Kelly (Inactive)

Reporter:: David Kelly (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 06/Nov/20 8:27 AM

Updated:: 04/Oct/21 8:53 AM

Gerrit Reviews

There are no open Gerrit changes

gethrtime appears to have a race.

Details

Description

Attachments

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty