Wednesday, September 14, 2016

Find Duplicate in Array

Given a read only array of n + 1 integers between 1 and n, find one number that repeats in linear time using less than O(n) space and traversing the stream sequentially O(1) times.
Sample Input:
[3 4 1 4 1]
Sample Output:
1
If there are multiple possible answers ( like in the sample case above ), output any one.
If there is no duplicate, output -1

Solution:

Had it not been a read-only array, we could have solved it in-place using the algorithm here.
http://www.geeksforgeeks.org/find-duplicates-in-on-time-and-constant-extra-space/

Since it is a read-only array, we cannot make changes to the array itself.
We can treat this question as finding the starting point of a loop in a linked.

Explanation:

Let us take an array A = [3, 4, 1, 4, 2]
Treat the elements as individual nodes. If there is a repetition , we will get a loop.
The above array can be converted to a linked list as shown: 



We get 2 linked lists in this case. Since we need to output any one duplicate number, we can do so by finding the start of the loop in linked list. We do not actually need to create linked list for it. The slow and fast pointer can be simulated in the array itself. The solution contains 2 parts as is the case when we need to find the start node of a loop in a linked list.

Phase1: Take 2 pointers slow and fast. Fast moves with double the speed of slow. If there is a cycle the two pointers will meet somewhere.

Phase2: Keep the fast pointer where it is. Move the slow pointer to the start of the linked list. Now advance both the pointers at the same speed. The point where they meet is the start node of a loop.
In our case, that is our duplicate node.

Note: This does not take care of the case when there is no duplicate element. However, interviewbit seems to have no test cases for this scenario and the code passes. 

Please see the code below:


Please comment if you can  explain it better.


18 comments:

  1. can u explain how does this return -1 if no loop is found?

    ReplyDelete
    Replies
    1. I don't think that we need to check for that case as there will be at least one duplicate element based on the condition that the elements will be in the range 1 to n and the number of elements will be n+1.

      Delete
  2. For this you can initially check whether the sum is (n)*(n+1)/2 or not. If the sum is equal to this return -1.

    ReplyDelete
    Replies
    1. This may not be true always, because if instead of elements 4,6 we may have 5,5 still the sum can be n*(n+1)/2 .and we will still be having duplicates.
      So initial check may not be a solution.

      Delete
  3. For the given question, a test case such that there is no duplicate is not possible. Because we have an array size of n+1 and we only have n numbers. So, at least one number has to repeat.

    ReplyDelete
    Replies
    1. if replace fast by A[1] instead of A[A[fast]] in line 7,it is giving time exceed.Why so?

      Delete
    2. This comment has been removed by the author.

      Delete
  4. can't understand the logic behind the implementation of the slow and fast pointers, can you please explain it a bit more in detail?

    ReplyDelete
  5. Why do we need to find a start node of loop?

    ReplyDelete
    Replies
    1. That is where u will find the duplicate element

      Delete
  6. No need to complicate everything.. a simple solution in Cpp:
    int Solution::repeatedNumber(const vector &A) {
    vector v(A.size());
    fill(v.begin(), v.end(), true);

    for(int i=0;i<A.size();i++){
    if(v[A[i]]){
    v[A[i]]=false;
    }

    else
    return A[i];
    }
    }

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. Dude, the question says to solve it using less than O(n) space. You are using an entirely new array.

      Delete
  7. Found a really good explaination after working very hard
    https://www.youtube.com/watch?v=MrHbSiQGvLk&t=14s
    its really unique way to solve this problem by buketing

    ReplyDelete
  8. If we are making the linked list then still I can see one problem here that is the space complexity will still not be less than O(n). can you please explain ?

    ReplyDelete