Recently, I was involved with a case that produced some interesting behaviour in Azure Data Factory ("ADF").
In this case, the only way to access the data we needed was through two API endpoints. Given the volume of data to be consumed, we wanted to process it in parallel to avoid bottlenecks.
The pipeline was structured as follows:
Call the first API endpoint;
If the results of the first API call meet certain conditions, pass the array of ids it returned to the second API, then store the second API's results in a staging environment for further processing;
If the results from the first API call did not meet certain conditions, do nothing.
During testing, we discovered that the id wasn't incrementing, so the same data was staged repeatedly. On further investigation, this turned out to be a documented limitation of ADF: pipeline variables are scoped to the whole pipeline run, so they are not thread-safe when they are set inside a ForEach that runs in parallel. Below, we'll work through an example of this limitation; in my next post, we'll look at the workaround.
The Setup
Let's start by creating a simple pipeline that reproduces this limitation (I've attached the JSON for this pipeline at the bottom of this post).
In this example, we start with an array variable, v_outer_array, that contains two values: "a" and "b".
Using a ForEach activity, we iterate through v_outer_array, setting a second variable, v_foreach, to the current item.
Next, an If Condition activity checks whether v_foreach equals "b" and runs its True or False branch accordingly.
Whichever branch runs, its final task is to set a third variable, v_ifcondition, to the current value of v_foreach.
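To make the intended behaviour concrete, here is a minimal Python sketch of the same logic, run one iteration at a time. It is not ADF code: the shared dictionary pipeline_vars stands in for the pipeline-scoped variables, the function name run_sequential is hypothetical, and the activity name "Set v_ifcondition if True" is a hypothetical counterpart to the "Set v_ifcondition if False" activity seen in the run.

```python
# Minimal model of the example pipeline, executed sequentially.
# Assumption: pipeline variables behave like one shared dictionary for the
# whole run. This illustrates the logic only; it is not ADF's implementation.

def run_sequential(outer_array):
    pipeline_vars = {"v_foreach": None, "v_ifcondition": None}
    activity_inputs = []  # (branch activity name, value it received)

    for item in outer_array:
        # ForEach body, step 1: set v_foreach to the current item
        pipeline_vars["v_foreach"] = item

        # If Condition: does v_foreach equal "b"?
        if pipeline_vars["v_foreach"] == "b":
            branch = "Set v_ifcondition if True"  # hypothetical name
        else:
            branch = "Set v_ifcondition if False"

        # Either branch: set v_ifcondition to the current value of v_foreach
        activity_inputs.append((branch, pipeline_vars["v_foreach"]))
        pipeline_vars["v_ifcondition"] = pipeline_vars["v_foreach"]

    return activity_inputs


print(run_sequential(["a", "b"]))
# [('Set v_ifcondition if False', 'a'), ('Set v_ifcondition if True', 'b')]
```

Run in order, the False branch receives "a" and the True branch receives "b", which is what we would expect to see in the pipeline run.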
The Results
Looking closely at the pipeline run, we can see two iterations of the ForEach activity, with v_foreach set to "a" and "b" as we would expect.
However, the activity Set v_ifcondition if False received "a" as its input on both occasions. So even though the output of the v_foreach activity shows it was set to both "a" and "b", in practice only the value "a" was ever passed through.
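One way to picture what went wrong is sketched below in Python. It assumes the ForEach ran its iterations in parallel (as we intended in the original case) and that both iterations share the single pipeline-scoped v_foreach. The schedule is also an assumption, chosen to match the observed run: the "b" iteration writes v_foreach first, the "a" iteration overwrites it, and only then do both If Conditions read it. The function run_interleaved and its set_order parameter are hypothetical, and this models the symptom only, not how ADF actually schedules activities.

```python
# Simplified model of two ForEach iterations running in parallel while
# sharing one pipeline-scoped variable. The interleaving is an assumed
# schedule, not ADF's real scheduler.

def run_interleaved(outer_array, set_order):
    pipeline_vars = {"v_foreach": None, "v_ifcondition": None}
    activity_inputs = []  # (iteration's own item, branch taken, value read)

    # Phase 1: each iteration runs its Set v_foreach activity, in set_order.
    # Every write overwrites the previous one because the variable is shared.
    for item in set_order:
        pipeline_vars["v_foreach"] = item

    # Phase 2: each iteration evaluates its If Condition by reading v_foreach.
    # All iterations now see whichever value was written last.
    for item in outer_array:
        current = pipeline_vars["v_foreach"]
        if current == "b":
            branch = "Set v_ifcondition if True"
        else:
            branch = "Set v_ifcondition if False"
        activity_inputs.append((item, branch, current))
        pipeline_vars["v_ifcondition"] = current

    return activity_inputs


print(run_interleaved(["a", "b"], set_order=["b", "a"]))
# [('a', 'Set v_ifcondition if False', 'a'), ('b', 'Set v_ifcondition if False', 'a')]
```

Under this schedule, both iterations take the False branch and pass through "a", matching the run above. Swapping set_order to ["a", "b"] sends both iterations down the True branch with "b" instead, which illustrates how the outcome depends on timing rather than on each iteration's own item.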
So how do we overcome this and make sure our variables are set to the correct values? We'll look at the workaround next time.