MB-43967 is complete. The problem I saw on Thursday afternoon was a curl problem, not a code problem; the code was working fine. (The StackOverflow comment I was working from was adamant about how to correctly quote the If-None-Match header in curl so it would work, but those details were incorrect and quoting it as they stated made curl not send that header field at all.)
I have merged Part 3, the final part of the GSI implementation, to GSI's unstable branch. It now awaits only Jeelan Poola's merge of unstable to master.
Example of how to use it with curl:
1. Request status with no ETag – will always return full status (host is an indexer node and port is GSI's "httpPort"):
% curl --dump-header curl.header.etag.out -X GET -u Administrator:password "http://localhost:9120/getIndexStatus"
{"code":"success","status":[{"defnId":948398447208412855,"instId":18064156543217770273,"name":"idx_3","bucket":"default","scope":"_default","collection":"_default","secExprs":["`col_3`"],"indexType":"plasma","status":"Ready","definition":"CREATE INDEX `idx_3` ON `default`(`col_3`) WITH { \"nodes\":[ \"127.0.0.1:9001\" ] }","hosts":["127.0.0.1:9001"],"completion":0,"progress":0,"scheduled":false,"partitioned":false,"numPartition":1,"partitionMap":{"127.0.0.1:9001":[0]},"numReplica":0,"indexName":"idx_3","replicaId":0,"stale":false,"lastScanTime":"NA"}]}
% cat curl.header.etag.out
HTTP/1.1 200 OK
Content-Type: application/json
ETag: 97b7a0cfd4ad04b9
Date: Mon, 15 Mar 2021 18:17:22 GMT
Content-Length: 559
2. Request status again, passing the returned ETag in the If-None-Match field (NOT the ETag field, because REST is by nature an ex post facto hack on top of HTTP and this is the officially designated header field for callers to pass in an ETag):
% curl --dump-header curl.header.etag.out -X GET -u Administrator:password "http://localhost:9120/getIndexStatus" --header "If-None-Match: 97b7a0cfd4ad04b9"
% cat curl.header.etag.out
HTTP/1.1 304 Not Modified
ETag: 97b7a0cfd4ad04b9
Date: Mon, 15 Mar 2021 18:17:53 GMT
In #2, the ETag was still fresh and the index metadata had not changed, so the GSI server responded with code 304 "Not Modified" and an empty body. This is why the curl call in #2 does not produce any output on the command line – all it returned was a header which was dumped to the file curl.header.etag.out. The header again contains the ETag.
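For callers that are not using curl, the two steps above can be sketched in Go roughly as follows. This is only an illustration: the helper names newStatusRequest and fetchStatus are made up for this example, and the URL and credentials are simply copied from the curl commands.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// newStatusRequest (hypothetical helper) builds the GET request. A non-empty
// etag is sent in the If-None-Match header, not in an ETag request header.
func newStatusRequest(url, user, pass, etag string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	return req, nil
}

// fetchStatus (hypothetical helper) sends the request and returns the status
// code, the ETag response header, and the body (empty on 304 Not Modified).
func fetchStatus(client *http.Client, url, user, pass, etag string) (int, string, []byte, error) {
	req, err := newStatusRequest(url, user, pass, etag)
	if err != nil {
		return 0, "", nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", nil, err
	}
	return resp.StatusCode, resp.Header.Get("ETag"), body, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	// Host, port, and credentials are taken from the curl example; adjust as needed.
	url := "http://localhost:9120/getIndexStatus"
	etag := "" // no ETag on the first call, so a full response is expected
	for i := 0; i < 2; i++ {
		code, newETag, body, err := fetchStatus(client, url, "Administrator", "password", etag)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("status=%d etag=%s bodyBytes=%d\n", code, newETag, len(body))
		etag = newETag // pass the returned string back unchanged on the next call
	}
}
```

If the index metadata is unchanged and the ETag has not expired, the second iteration should report status 304 with zero body bytes, matching the curl output above.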
The ETags are set to expire on average every 4 minutes (240 seconds). This is controlled by the config setting indexer.settings.eTagPeriod in indexing/secondary/common/config.go:
"indexer.settings.eTagPeriod": ConfigValue{
    240,
    "Average ETag expiration period in seconds",
    240,
    false, // mutable
    false, // case-insensitive
},
If after expiry absolutely nothing changed (including the lastScanTime of all the indexes), then the new ETag will come out the same as the prior ETag (since it's a checksum) and the response will still be 304 Not Modified with the same ETag. If there are scans going on, however, the lastScanTime will change for any indexes being hit and the new ETag will be different from the old one, prompting a full 200 OK response and payload with the new status data, plus the new ETag in the header.
Passing no ETag or a 0 ETag (which is treated as invalid) will always force a full response to be returned.
The ETag in the header is a hex string representation of a uint64 in Go. Since it is already a string in the headers, the caller never needs to convert it and only has to pass it back unchanged.
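The ETag format and the "no ETag or 0 ETag forces a full response" rule can be illustrated with a small Go sketch. The function names formatETag, parseETag, and statusFor are hypothetical and are not the GSI code; the sketch also assumes plain unpadded lowercase hex and leaves out expiry handling entirely.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// formatETag renders a uint64 checksum as lowercase hex.
// Assumption: no zero padding; the real server-side formatting may differ.
func formatETag(sum uint64) string {
	return strconv.FormatUint(sum, 16)
}

// parseETag converts a header value back to a uint64. An empty, malformed,
// or zero value is reported as invalid.
func parseETag(s string) (uint64, bool) {
	v, err := strconv.ParseUint(s, 16, 64)
	if err != nil || v == 0 {
		return 0, false
	}
	return v, true
}

// statusFor picks the response code: 304 only when the caller sent a valid
// ETag equal to the current one, otherwise 200 with a full payload.
func statusFor(ifNoneMatch string, current uint64) int {
	if v, ok := parseETag(ifNoneMatch); ok && v == current {
		return http.StatusNotModified
	}
	return http.StatusOK
}

func main() {
	current, _ := parseETag("97b7a0cfd4ad04b9")
	fmt.Println(formatETag(current))                    // round trip of the header value
	fmt.Println(statusFor("97b7a0cfd4ad04b9", current)) // matching ETag
	fmt.Println(statusFor("", current))                 // no ETag passed
	fmt.Println(statusFor("0", current))                // 0 ETag is treated as invalid
}
```

A caller only ever needs the string form; the parsing here just shows what the server side is assumed to do with it.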
Details of ETag expiry times:
// generateETagExpiry returns the next expiration UnixNano time of ETags based on
// the current time. The expiry will be the next future rounded-to-S-seconds time
// that is at least S/2 seconds away, where S is specified by config variable
// indexer.settings.eTagPeriod (currently 240). The expiry will thus average S
// seconds in the future but can be anything between S/2 and 3S/2. (Most of the
// time it will be very close to S because ns_server calls getIndexStatus much more
// frequently, triggering new ETag creations soon after the prior expiry.) The rounding
// is done to try to keep expiry times aligned across all nodes (unfortunately
// jittered by any internode clock differences), so getIndexStatus for many nodes
// will likely have either all or none of its individual LocalIndexMetadata ETags
// unexpired, as if even one's ETag is expired we must send full results to caller.
|
So GSI ETags always expire at times that fall on 4-minute boundaries (e.g. hh:mm:ss 10:00:00.000, 10:04:00.000, 10:08:00.000, etc.), regardless of when they are generated. If the next expiry boundary is less than 2 minutes in the future, the one following it is used instead. Thus an ETag generated at 10:01:59.999 will expire at 10:04:00.000, while one generated at 10:02:00.001 will expire at 10:08:00.000.
MB-43967, which supports this from the GSI side, is almost complete. It has three parts. Parts 1 and 2 are fully working and merged to GSI's unstable branch, and Jeelan Poola is planning to merge them to master soon.
Part 3 is coded and in Gerrit, but it does not seem to obey an ETag passed in by an external caller, so it may need an additional patch set before merging. I am out of the office until Mon 2021-03-15 and will look deeper into this then.
FYI Deepkaran Salooja