February 15, 2023
You have two arrays of strings,
parts. Return an array that contains the strings from
words, modified so that any occurrences of the substrings from
parts are surrounded by square brackets
, following these guidelines:
parts substrings occur in one string in
words, choose the longest one. If there is still more than one such part, then choose the one that appears first in the string.
words = ["Apple", "Melon", "Orange", "Watermelon"] and
parts = ["a", "mel", "lon", "el", "An"], the output should be
solution(words, parts) = ["Apple", "Me[lon]", "Or[a]nge", "Water[mel]on"].
"Watermelon" contains three substrings from the
"mel" is the longest substring that appears first in the string.
We will tackle this problem in a straightforward manner by iterating through each word in the list of words and checking for the presence of substrings from the list of parts in each word. We will record the longest substring that appears earliest in the word.
Let's analyze the performance of the code. With
W as the number of words and
P as the number of parts, the running time would be at least O(WP) due to the nested loop. In addition, the function calls
if p in w and
w.index(p) within the loop will have a running time of O(len(W)) as they scan the word w to find p. Although the performance can be improved by calling
w.index(p) once and saving the result, the running time would still be O(WPN), where N is the length of a typical word in w.
According to the given constraints, W is at most 5000, N is at most 30 and P is at most 5000. This leads to 750 million loops, which means each loop would have to run in about 5 picoseconds to be completed within 4 seconds. This is a tough requirement, especially considering that the worst case scenario has been taken into account.
However, it's important to note that in real-world scenarios, the worst case scenario may not always be encountered. In the context of job interviews, companies are looking for candidates who can quickly solve the problems of today, rather than waiting for the perfect solution.
It's good to show the interviewer that you understand the complexity of the algorithm and that it's a combination of three multiplicative factors (length of a typical word, number of words, and number of substrings). You can also mention that the code passes the tests, but may not handle the worst case in a reasonable amount of time.
If you want to impress the interviewer, you can mention that there are better algorithms that can be used if there is a valid business case for investing more time in the solution. However, it's important to keep in mind that premature optimization is not advisable.
The main function
solution(words string, parts string) string will focus on constructing the trie and bringing together all the words at the end.
Here's the plan for
- Initialize the variable
- Iterate through each character in the word, using it as the starting point for finding matching substrings in
- For each character, utilize the trie to locate the longest substring in
partsthat begins with that character. If it's longer than the previous longest one, record the current position in the word and the length of the substring.
- Use the length of the
longestSeenSubstringand its starting position to properly place the square brackets.
You'll notice that going through the word in order and only updating the position when we encounter a longer substring automatically satisfies the tie-breaking rule. The trickiest part is finding the longest substring from
parts that starts at a particular location, as we need to keep track of the final terminal node that we passed before failing to match in the trie. It would be much simpler if we only had to keep track of the last node we were on before the match failed.
Our approach continues to loop through each word, but there is one limitation that hasn't been utilized yet - the length of each element in
parts. In the worst-case scenario, we may need to check up to 5 levels at each position in each word. To be precise:
- The number of words, W = 5000;
- The number of letters in a word, N = 30;
- The number of levels to check in the trie, P = 5.
Compared to the initial naive implementation, this is 1000 times more efficient.