If you store 1,000 documents with the key "foo", then "foo" is stored 1,000 times in your data set.
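A sketch of the key-deduplication idea under discussion, assuming a simple in-memory key table (all names here are illustrative, not any real database's implementation): repeated keys are replaced by small integer ids, so "foo" is stored once no matter how many documents use it.

```python
key_table = {}   # key string -> integer id
key_lookup = []  # integer id -> key string

def intern_key(key):
    """Return a small integer id for the key, assigning one on first use."""
    if key not in key_table:
        key_table[key] = len(key_lookup)
        key_lookup.append(key)
    return key_table[key]

def encode(doc):
    """Store a document with interned key ids instead of key strings."""
    return {intern_key(k): v for k, v in doc.items()}

def decode(encoded):
    """Reconstruct the original document from the interned ids."""
    return {key_lookup[k]: v for k, v in encoded.items()}

docs = [encode({"foo": i}) for i in range(1000)]
# "foo" now lives once in key_lookup; each document stores only the id 0.
```

The trade-off the replies raise still applies: the savings only materialize if the lookup table itself is cheap to maintain and query at scale.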
This sounds like adding a schema/tables, and I'm not sure how you would do it efficiently. If those 1,000 documents are part of a trillion documents, how do you find them efficiently?
It would be expensive to detect such documents when there are billions of them, and I doubt the space savings would be worth it (we might get the same benefit just by compressing the documents in memory).