[SOLVED] An efficient algorithm to sample non-duplicate random elements from an array

Issue

This Content is from Stack Overflow. Question asked by Kaiyakha

I’m looking for an algorithm to pick M random elements from a given array. The prerequisites are:

  • the sampled elements must be unique,
  • the array to sample from may contain duplicates.

This is what I’ve managed to come up with. Here I’m also making an assumption that the amount of unique elements in the array is greater (or equal) than M.

#include <random>
#include <vector>
#include <algorithm>
#include <iostream>


const std::vector<int> sample(const std::vector<int>& input, size_t n) {
    std::random_device rd;
    std::mt19937 engine(rd());
    std::uniform_int_distribution<int> dist(0, input.size() - 1);
    
    std::vector<int> result;
    result.reserve(n);

    size_t id;
    do {
        id = dist(engine);
        if (std::find(result.begin(), result.end(), input[id]) == result.end())
            result.push_back(input[id]);
    } while (result.size() < n);

    return result;
}


int main() {
    std::vector<int> input{0, 0, 1, 1, 2, 2, 3, 3, 4, 4};

    std::vector<int> result = sample(input, 3);

    for (const auto& item : result)
        std::cout << item << ' ';
    std::cout << std::endl;
}

This algorithm does not seem to be the best. Is there a more efficient algorithm to solve this task? It would be good if this algorithm could also assert the amount of unique elements in the input array is not less than M.



Solution

The comments already note the use of std::set. The additional request to check for M unique elements in the input make that a bit more complicated. Here’s an alternative implementation:

  1. Put all inputs in a std::set or std::unordered_set. This removes duplicates.
  2. Copy all elements to the return vector
  3. If that has more than M elements, std::shuffle it and resize it to M elements.
  4. Return it.


This Question was asked in StackOverflow by Kaiyakha and Answered by MSalters It is licensed under the terms of CC BY-SA 2.5. - CC BY-SA 3.0. - CC BY-SA 4.0.

people found this article helpful. What about you?