High-performance duplicate-removal function for nat-listps.
(nat-list-remove-duplicates x) is logically equal to:
(revappend (hons-remove-duplicates x) nil)
We leave this definition enabled, so you will typically want to reason about hons-remove-duplicates and revappend instead.
For execution, we first inspect the list and then heuristically decide how to carry out the computation.
When the list is very short, we use a naive, quadratic function in the spirit of remove-duplicates. This avoids the cost of allocating any seen table and, practically speaking, seems to be the fastest approach for short lists.
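As an illustration of this quadratic scheme (a Python sketch, not the ACL2 code), each element is kept only if it does not already appear among the elements kept so far, so no auxiliary table is ever allocated:

```python
def dedup_naive(xs):
    """Quadratic duplicate removal: no seen table, just a linear
    scan of the already-kept elements for each new element."""
    kept = []
    for x in xs:
        if x not in kept:  # O(len(kept)) membership test
            kept.append(x)
    return kept
```

The membership test makes the total work quadratic in the list length, which is fine (and cache-friendly) precisely because the list is short.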
For longer lists, our usual approach is to implement the seen table as a bit array, using a local stobj that is kept hidden from the caller. This approach performs very well for most lists that we encounter in practice, and its memory overhead is relatively low: at least on CCL, bit arrays really are implemented efficiently and, aside from some header information, require only one bit per entry.
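The idea behind the bit-array seen table can be sketched as follows (a Python illustration using a bytearray, not the stobj-based ACL2 implementation): the table is indexed directly by element value, so each membership test and insertion is a constant-time bit operation.

```python
def dedup_bitset(xs):
    """Duplicate removal with a bit array indexed by element value.
    Assumes xs contains natural numbers; storage is one bit per
    possible value up to the maximum element."""
    if not xs:
        return []
    seen = bytearray((max(xs) >> 3) + 1)  # one bit per value, 8 per byte
    kept = []
    for x in xs:
        byte, bit = x >> 3, 1 << (x & 7)
        if not seen[byte] & bit:
            seen[byte] |= bit
            kept.append(x)
    return kept
```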
Of course, a bit array is not appropriate when the list contains elements that are particularly large numbers, because the storage required for the bit array is determined by the maximum element in the list. Here are some examples:

2^21: 256 KB
2^22: 512 KB
2^23: 1 MB
2^24: 2 MB
2^25: 4 MB
2^26: 8 MB
2^27: 16 MB
2^28: 32 MB
2^29: 64 MB
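These figures follow directly from the one-bit-per-value representation: covering values up to 2^n takes 2^n bits, i.e. 2^(n-3) bytes. A quick check of the arithmetic:

```python
def bit_array_kib(n):
    """KiB needed for a bit array covering values below 2^n:
    2^n bits -> 2^n / 8 bytes -> divide by 1024 for KiB."""
    return (2 ** n) // 8 // 1024
```

For instance, bit_array_kib(21) gives 256 KB and bit_array_kib(23) gives 1024 KB, i.e. 1 MB, matching the table above.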
In some rare cases, it may actually be worth allocating hundreds of megabytes of memory to remove duplicates from a list. But most of the time, if a list contains numbers in the millions or more, an array-based approach probably isn't what we want, because such a list is probably relatively sparse. Instead, we should just use a hash table.
We therefore adopt a heuristic for choosing which algorithm to use. If the maximum element is under 2^17, then we always use the array, since it requires us to allocate at most 16 KB. Since 2^17 = 131,072, this is sufficient to cover a great many lists that actually arise in practice.
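Putting the pieces together, the overall dispatch might look like this (a self-contained Python sketch; the short-list cutoff is an assumption for illustration, not the implementation's actual constant):

```python
def dedup(xs, short_cutoff=40):
    """Heuristic duplicate removal over natural numbers.
    short_cutoff is an illustrative guess, not the real threshold."""
    if len(xs) < short_cutoff:
        kept = []                        # naive quadratic scan
        for x in xs:
            if x not in kept:
                kept.append(x)
        return kept
    if xs and max(xs) < (1 << 17):       # 2^17 = 131,072
        seen = bytearray(1 << 14)        # 2^17 bits = 16 KB, fixed cost
        kept = []
        for x in xs:
            byte, bit = x >> 3, 1 << (x & 7)
            if not seen[byte] & bit:
                seen[byte] |= bit
                kept.append(x)
        return kept
    return list(dict.fromkeys(xs))       # sparse/large values: hash table
```

The hash-table branch here leans on Python dicts preserving insertion order; the real implementation's ordering is fixed by the logical definition in terms of hons-remove-duplicates and revappend.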
If the maximum is over 2^17, we: