Description
My unofficial Couchbase Lite performance test shows our iOS 1.3 candidate running at only 60% of the speed of 1.2.1 when using ForestDB storage. (On my iPhone 6 the test time went from 16 sec to 27 sec.) SQLite performance is unaffected (still 21 sec). This means that ForestDB storage is now slower than SQLite.
I ran the test in Apple's Instruments tool and found that 30% of the total run time of the test is being spent in `malloc`, `free`, and `memcpy` calls made by a handful of ForestDB functions:
10.3% `free` calls in _hbtrie_find
5.7% `malloc` calls in _hbtrie_find
4.6% `memcpy` calls in _docio_read_doc_component
3.8% `free` calls in _hbtrie_insert
2.9% `memcpy` calls in _hbtrie_reform_key
2.5% `malloc` calls in _hbtrie_insert
____
29.8%
memcpy has always been the single hottest function when running this benchmark, but the malloc/free overhead is new. It seems to come from these lines found in both _hbtrie_find and _hbtrie_insert:
uint8_t *docrawkey = (uint8_t *) malloc(HBTRIE_MAX_KEYLEN);
uint8_t *dockey = (uint8_t *) malloc(HBTRIE_MAX_KEYLEN);
...
free(docrawkey);
free(dockey);
The value of HBTRIE_MAX_KEYLEN is 65536. It appears that on iOS (and macOS?) requests this large go through a slower code path in malloc/free, which uses Mach system calls (mach_vm_map, mach_vm_deallocate) to map and unmap the address space directly from the VM system. Regardless, it would be good to avoid any memory allocation in a hot code path like this.
Would it be possible to use a per-handle or per-thread buffer instead? Or at least to allocate only as much memory as needed for the key being read?
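To make the per-handle suggestion concrete, here is a minimal sketch of a reusable scratch buffer that grows only when a key needs more room than any earlier key did. The names `key_scratch`, `key_scratch_get`, and `key_scratch_free` are hypothetical and are not part of ForestDB; where such a buffer would live in the handle, and how it interacts with concurrent use of a handle, are assumptions that would need checking against the real code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HBTRIE_MAX_KEYLEN 65536  /* value quoted in this report */

/* Hypothetical scratch space owned by one handle (not an existing ForestDB type). */
struct key_scratch {
    uint8_t *buf;  /* reused across calls; NULL until first use */
    size_t   cap;  /* current capacity in bytes */
};

/* Return a buffer of at least `needed` bytes, reallocating only when the
 * request exceeds the current capacity. Returns NULL on allocation failure,
 * in which case the previous buffer is left intact. */
static uint8_t *key_scratch_get(struct key_scratch *s, size_t needed)
{
    assert(needed <= HBTRIE_MAX_KEYLEN);
    if (needed == 0) {
        needed = 1;  /* always hand back a valid pointer */
    }
    if (needed > s->cap) {
        uint8_t *p = realloc(s->buf, needed);
        if (p == NULL) {
            return NULL;
        }
        s->buf = p;
        s->cap = needed;
    }
    return s->buf;
}

/* Release the buffer when the owning handle is closed. */
static void key_scratch_free(struct key_scratch *s)
{
    free(s->buf);
    s->buf = NULL;
    s->cap = 0;
}
```

With something like this, _hbtrie_find and _hbtrie_insert would each take two scratch buffers from the handle (one for docrawkey, one for dockey) instead of calling malloc and free on every invocation. After the first few calls the buffers stop growing, so the hot path performs no allocation at all, and typical short keys never trigger a 64 KB request.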