## Data Engineering Interview Questions and Tutorial

March 20, 2020

Passing a coding interview is a different skill from writing code. Even a CS degree holder who has failed 8+ coding interviews just needs the practice and the right opportunity!

I recently did a coding interview for a Data Engineer position at one of the top tech companies in the Bay Area (one of FANG). By sharing the exact Python questions and my "solutions", I hope to give some insight into how to solve coding problems on the fly. Try the questions yourself to test your knowledge, or keep reading to see how I tried to solve them.

Disclaimer: I received 3 Python questions during my 1-hr coding interview, which was conducted on CoderPad. Each question required me to write a function that had to pass all of the listed test cases, which were checked with `assert` statements. The solutions provided here are the ones I came up with during the interview. They are not the only solutions, nor the "fastest".

## Question 1

Return the count of a given char that exists in a string

```
"""
Example:
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1
"""

# Try solving this function!

def countChar(s, char):
    # your code here
    return num

assert countChar("mississippi", "s") == 4
```

## Question 2

Return a list of mismatched words between two strings

```
"""
Example:
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

# Try solving this function!

def findMismatch(s1, s2):
    # your code here
    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']
```

## Question 3

Return a list that replaces any 'None' element with the previous non-None element.

```
"""
Example:
input_list = [1, 1, 8, None]
Expected Output = [1, 1, 8, 8]
"""

# Try solving this function!
def update_list(input_list):
    # your code here
    return output_list

# Also consider the following test cases
assert update_list(None) is None
assert update_list([None,5,4,None]) == [None,5,4,4]
```

Give yourself about 7-10 minutes to solve each question. If it takes more than 15 minutes, try reviewing data structures in Python.

## Solving these questions

The goal for each function is to pass all of the provided test cases within a "reasonable" timeframe. Keep practicing!

“I have not failed. I've just found 10,000 ways that won't work.” - Thomas Edison

Question 1: Return the count of a given char that exists in a string.

First, we need somewhere to store the count of the target letter (`char`). I used a dictionary, which can be initialized with `dict()` or `{}`. I tend to prefer the former, just to make reading the code easier.

```
"""
Example:
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1
"""

def countChar(s, char):
    # initialize a dictionary
    output_count = dict()

    return num

assert countChar("mississippi", "s") == 4
```

Using the dictionary, we can create a new counter starting at 0 associated with the target letter. We can do this by creating a key-value pair with the key as "char" and value as 0.

```
"""
Example:
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1
"""

def countChar(s, char):
    # initialize a dictionary
    output_count = dict()

    # create a key-value pair with the target letter and counter at 0
    output_count[char] = 0

    return num

assert countChar("mississippi", "s") == 4
```

Now that the dictionary has been set up, we can iterate over each letter of the string with a `for` loop. Inside the loop, we add a condition that checks whether the letter matches the target letter, `char`. This works because a Python string behaves like a sequence of characters (e.g., "hello" -> ["h", "e", "l", "l", "o"]), so the `in` operator can check whether one string occurs inside another. Because `char` is always a single character, `letter in char` is equivalent to `letter == char` here.
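As a quick illustration of how strings behave with iteration and the `in` operator, here is a small sketch. The variable names are made up for this example and are not part of the interview question.

```python
word = "hello"

# Iterating over a string yields one character at a time.
letters = [letter for letter in word]

# On strings, `in` is a substring test, so both of these are True.
single_match = "l" in word
substring_match = "ell" in word

# With a one-character target, `in` and `==` give the same answer.
target = "s"
agree = all((letter in target) == (letter == target) for letter in "mississippi")

print(letters, single_match, substring_match, agree)
```

Because `in` is a substring test, it only matches `==` when the target is a single character, which is exactly the assumption stated in the question.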

```
"""
Example:
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1
"""

def countChar(s, char):
    # initialize a dictionary
    output_count = dict()

    # create a key-value pair with the target letter and counter at 0
    output_count[char] = 0

    # create a for loop to iterate each letter of the given string
    for letter in s:

        # setup IF condition when the letter "exists" in char
        if letter in char:
            pass  # the counter is added in the next step

    return num

assert countChar("mississippi", "s") == 4
```

We want to count every time the target char exists in the given string. We can use the abbreviated syntax for counting, `count += 1`, which is identical to `count = count + 1`.

To add the number of "counts" into the key-value pair of our dictionary, we can write `output_count[char] += 1`, which is identical to `output_count[char] = output_count[char] + 1`.

Finally, we want the function to return the total count of the target char, so we assign `output_count[char]` to the `num` variable and return it.

```
"""
Example:
s = "mississippi"
char = "s"
Expected Output = 4

Assumptions: Length of char is always 1
"""

def countChar(s, char):
    # initialize a dictionary
    output_count = dict()

    # create a key-value pair with the target letter and counter at 0
    output_count[char] = 0

    # create a for loop to iterate each letter of the given string
    for letter in s:

        # setup IF condition when the letter "exists" in char
        if letter in char:

            # incremental counter
            output_count[char] += 1

    # assign the num variable with the count of the key-value pair
    num = output_count[char]

    return num

assert countChar("mississippi", "s") == 4
```

Let's try to run this code! If the `assert` statement does not raise an `AssertionError`, the function passes!
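For comparison, here are two shorter variants that were not part of my interview answer: one with a plain integer counter instead of a dictionary, and one that relies on the built-in `str.count` method. The function names `count_char_simple` and `count_char_builtin` are hypothetical. Treat this as a sketch of alternatives, not "the" expected answer.

```python
def count_char_simple(s, char):
    """Count occurrences of a single character with a plain integer counter."""
    count = 0
    for letter in s:
        if letter == char:
            count += 1
    return count


def count_char_builtin(s, char):
    """Same result, delegating to the built-in str.count method."""
    return s.count(char)


assert count_char_simple("mississippi", "s") == 4
assert count_char_builtin("mississippi", "s") == 4
assert count_char_simple("mississippi", "z") == 0
```

In an interview, it can be worth asking whether built-ins like `str.count` are allowed, since some interviewers want to see the loop written out.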

Question 2: Return a list of mismatched words between two strings.

First, let's create a list called `output_list` to store any mismatched words found.

Secondly, let's convert each given string into a list of words using the `split()` method. By default, `split()` separates the string wherever there is whitespace between words.

For instance, if the string is "hello world", calling `split()` on it returns the list `["hello", "world"]`.
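To make the default behaviour of `split()` concrete, here is a small sketch. The sample strings and variable names are invented for illustration.

```python
sentence = "i think dogs are cute"
words = sentence.split()

# With no argument, runs of whitespace (spaces, tabs) count as one separator,
# and leading/trailing whitespace is ignored.
messy = "  dogs   like to\teat "
messy_words = messy.split()

# Passing an explicit separator behaves differently: empty strings appear
# wherever two separators are adjacent.
explicit = "a  b".split(" ")

print(words)
print(messy_words)
print(explicit)
```

Calling `split()` with no argument is usually the safer choice for word lists, because stray double spaces do not produce empty "words".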

```
"""
Example:
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

def findMismatch(s1, s2):
    # create new list to store mismatch words
    output_list = []

    # create a list for each string
    split_s1 = s1.split()
    split_s2 = s2.split()

    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']
```

We want to create a `for` loop for each list so that we can iterate over every word and use an `if` condition to check that the word does not exist in the other list.

```
"""
Example:
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

def findMismatch(s1, s2):
    # create new list to store mismatch words
    output_list = []

    # create a list for each string
    split_s1 = s1.split()
    split_s2 = s2.split()

    # for loop to iterate every word from first list
    for word in split_s1:

        # conditional to check if the word does not exist in second list
        if word not in split_s2:
            pass  # filled in during the next step

    # for loop to iterate every word from second list
    for word in split_s2:

        # conditional to check if the word does not exist in first list
        if word not in split_s1:
            pass  # filled in during the next step

    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']
```

When a word does not exist in the other list, we want to add it to our `output_list` using the `append()` method. Finally, the function returns the list of mismatched words, which may be in any order.

```
"""
Example:
s1 = "i think dogs are cute"
s2 = "dogs like to eat"
Expected Output = ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']

Assumptions: List of words can be returned in any order
"""

def findMismatch(s1, s2):
    # create new list to store mismatch words
    output_list = []

    # create a list for each string
    split_s1 = s1.split()
    split_s2 = s2.split()

    # for loop to iterate every word from first list
    for word in split_s1:

        # conditional to check if the word does not exist in second list
        if word not in split_s2:

            # add word to output_list
            output_list.append(word)

    # for loop to iterate every word from second list
    for word in split_s2:

        # conditional to check if the word does not exist in first list
        if word not in split_s1:

            # add word to output_list
            output_list.append(word)

    return output_list

assert findMismatch("i think dogs are cute", "dogs like to eat") == ['i', 'think', 'are', 'cute', 'like', 'to', 'eat']
```

You can check that this function returns the list of mismatched words for the test case by running the `assert` statement.
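Since the words may be returned in any order, another way to approach this question is with sets. The sketch below is not what I wrote in the interview. The name `find_mismatch_sets` is hypothetical, and the result is sorted only to make the output predictable. Note one behavioural difference: a set drops duplicates, so a mismatched word that appears twice in a string is reported once.

```python
def find_mismatch_sets(s1, s2):
    """Return the words that appear in exactly one of the two strings."""
    words1 = set(s1.split())
    words2 = set(s2.split())
    # `^` is the symmetric difference: items in either set but not in both.
    return sorted(words1 ^ words2)


result = find_mismatch_sets("i think dogs are cute", "dogs like to eat")
print(result)
```

Set membership checks are typically faster than scanning a list, which can matter for long inputs. For an order-independent test, comparing `set(...)` of the result against a set of expected words avoids depending on how the list is ordered.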

Question 3: Return a list that replaces any 'None' element with the previous non-None element.

Note: this was a fairly challenging question as there are multiple test cases to consider.

First, let's create a new list called `output_list` for the function to return.

Consider the first test case. If the input list is None, the function should return None. Let's write that condition using an `if else` statement.

```
"""
Example:
input_list = [1, 1, 8, None]
Expected Output = [1, 1, 8, 8]
"""

def update_list(input_list):

    # create new list to store output
    output_list = []

    # condition to satisfy the first test case
    if input_list is None:
        return None

    else:
        pass  # the remaining logic is added in the next steps

    return output_list

# Also consider the following test cases
assert update_list(None) is None
assert update_list([None,5,4,None]) == [None,5,4,4]
```

Next, we want to iterate over each item of `input_list` using a `for` loop and check whether the item is None with an `if else` statement. If the item is not None, we can add it to `output_list` using the `append()` method.

```
"""
Example:
input_list = [1, 1, 8, None]
Expected Output = [1, 1, 8, 8]
"""

def update_list(input_list):

    # create new list to store output
    output_list = []

    # condition to satisfy the first test case
    if input_list is None:
        return None

    else:
        # iterate each element of the input_list
        for element in input_list:

            # conditional if the element is not None
            if element is not None:
                output_list.append(element)

            else:
                pass  # handled in the next step

    return output_list

# Also consider the following test cases
assert update_list(None) is None
assert update_list([None,5,4,None]) == [None,5,4,4]
```

The tricky part is figuring out how to add the previous element into the `output_list` if the current element is None.

One strategy is to store the previous element in a variable, such as `previous_element = element`, whenever the element is not None. The variable needs an initial value before the `for` loop starts, because the first element of the list may be None, in which case `previous_element` is read before the loop has ever assigned it.

What should `previous_element` start as? An empty string? Recall that in the second test case, the first element of the list is None and has no previous element, and the expected output keeps it as None. So we can initialize `previous_element` to None.
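To see why the starting value matters, here is a small sketch with two hypothetical helper functions (not part of the interview solution). Python does not require declaring variables ahead of time, but reading a local variable that was never assigned raises an error.

```python
def last_seen_without_init(items):
    """Return the last non-None item; breaks if there is none."""
    for item in items:
        if item is not None:
            previous = item
    # If no item was non-None, `previous` was never assigned.
    return previous


def last_seen_with_init(items):
    """Return the last non-None item, or None if there is none."""
    previous = None  # safe starting value
    for item in items:
        if item is not None:
            previous = item
    return previous


print(last_seen_with_init([None, None]))

try:
    last_seen_without_init([None, None])
except UnboundLocalError as error:
    print("failed as expected:", type(error).__name__)
```

Initializing to None rather than an empty string also keeps the output honest: a leading None in the input stays None instead of turning into a different placeholder value.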

```
"""
Example:
input_list = [1, 1, 8, None]
Expected Output = [1, 1, 8, 8]
"""

def update_list(input_list):

    # create new list to store output
    output_list = []

    # create a new variable as None
    previous_element = None

    # condition to satisfy the first test case
    if input_list is None:
        return None

    else:
        # iterate each element of the input_list
        for element in input_list:

            # conditional if the element is not None
            if element is not None:
                output_list.append(element)

                # also store the element
                previous_element = element

            else:
                # add the previous element when the element is None
                output_list.append(previous_element)

    return output_list

# Also consider the following test cases
assert update_list(None) is None
assert update_list([None,5,4,None]) == [None,5,4,4]
```

Finally, the function should return a list that satisfies the two test cases listed here. Try running this function with the test cases using the `assert` statements!
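The same idea can be written a little more compactly by updating the remembered value first and then appending it on every iteration. This is a sketch of an equivalent variant, not what I submitted. The name `forward_fill` and the extra test cases are my own additions for illustration.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value, if any."""
    if values is None:
        return None

    filled = []
    previous = None  # nothing seen yet, so leading Nones stay None
    for value in values:
        if value is not None:
            previous = value
        # After the update, `previous` is the right value in both branches.
        filled.append(previous)
    return filled


assert forward_fill(None) is None
assert forward_fill([1, 1, 8, None]) == [1, 1, 8, 8]
assert forward_fill([None, 5, 4, None]) == [None, 5, 4, 4]
assert forward_fill([]) == []
assert forward_fill([3, None, None, 7]) == [3, 3, 3, 7]
```

Edge cases such as an empty list or several None values in a row are worth raising with the interviewer, since they are easy to miss under time pressure.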

## Concluding Thoughts

It's a rough process to "crack" the coding interview. Even if you don't do well, treat every coding interview as a learning experience. No matter how many times you have failed, think about how to prepare better for the next one! Great resources include the discussions on LeetCode and Glassdoor, which collect many technical interview experiences.