
Automating Purview Integration Runtime with the Proxy API

Organizations often need to automate the management of Self-hosted Integration Runtime (SHIR) nodes - for example, when VM1 (a Windows machine) needs to be replaced by VM2 (a newer Windows machine built from a patched/updated base image). Without automation, every such replacement involves a series of manual steps: RDP-ing into both machines, clicking around within Purview Studio to delete or modify scans, and so on.

Data Factory SHIR offers the az datafactory integration-runtime command group for day-to-day SHIR management, as well as PowerShell scripts for first-time installations. This allows us to leverage an existing CI/CD pipeline (Azure DevOps Pipeline tasks, GitHub Actions, Jenkins etc.) that offers some form of stateless script runtime (e.g. Bash or PowerShell) to automate these day-to-day operations. As Azure Purview rapidly moves towards GA, we can expect the same capabilities to become available for Purview in a similar manner - but at the time of writing (July 2021) they are not there yet.

As a workaround, the rest of this article offers a scripted approach built on the https://<your-purview-acct>.proxy.purview.azure.com endpoint (note the "proxy" segment), which allows us to achieve end-to-end automation that imitates what's available for Data Factory SHIR today.

Automation steps

We're going to demonstrate the SHIR automation lifecycle end-to-end, as follows:

SHIR automation - demonstration steps

  1. Create new SHIR1: create a new SHIR1 entity against an existing Purview Account.
  2. Get all SHIRs: obtain a JSON payload of all SHIR entities.
  3. Unregister VM1: we first onboard VM1 from scratch using PowerShell (executed inside the VM), then showcase how to unregister it via bash (executed from the CICD pipeline).
  4. Register VM2: with VM1 unregistered, onboard VM2 against SHIR1 using the same PowerShell script.

At this point, the automation cycle is complete - i.e. VM1 can be deleted, and this can be repeated going forward (i.e. VM2 → VM3 → and so on).

1. Create new SHIR1

Get OAuth Token

We run the following inside our CICD pipeline (in this case running bash) to obtain an OAuth Token against Purview (note that the jq usage is completely optional):

# Localize for your environment
tenantId=<your-tenant-id>
clientId=<service-principal-client-id>
clientSecret=<service-principal-client-secret>
resource=https://purview.azure.net

# Get OAuth Token
response=$( curl -X POST "https://login.microsoftonline.com/$tenantId/oauth2/token" \
--data-urlencode "grant_type=client_credentials" \
--data-urlencode "client_id=$clientId" \
--data-urlencode "client_secret=$clientSecret" \
--data-urlencode "resource=$resource")

token=$( jq -r ".access_token" <<<"$response" )

If we call echo $token, we should see the Token value if authentication was successful - e.g. something like this:

Get OAuth Token
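If authentication fails, jq -r prints the literal string null for a missing .access_token, and every later call would then fail with a less obvious error. The following is a minimal sketch of a fail-fast guard, not part of the original pipeline; require_token is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper (not from the original pipeline): succeed only when the
# value looks like a usable token, i.e. it is neither empty nor the literal
# string "null" that `jq -r` prints when .access_token is absent.
require_token() {
  case "$1" in
    ""|null)
      echo "Failed to obtain an OAuth token." >&2
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}

# Usage in the pipeline, right after the token has been extracted:
#   require_token "$token" || exit 1
```

Stopping the pipeline at this point keeps the failure close to its cause, rather than surfacing later as an authorization error from the Purview endpoint.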

Create new SHIR Object in Purview

The following creates a SHIR entity named SHIR1 against an existing Purview Account:

# Localize for your environment
purviewAccount=<your-purview-acct>
SHIRName=SHIR1

# Create Self Hosted Integration Runtime Object
response=$(curl -X PUT "https://$purviewAccount.proxy.purview.azure.com/integrationRuntimes/$SHIRName?api-version=2020-12-01-preview" \
-H "Authorization: Bearer $token" \
-H "Content-Type: application/json" \
-d "{
   \"name\": \"$SHIRName\",
   \"properties\": {
     \"type\": \"SelfHosted\"
   }
 }"
)

And we see SHIR1 get created within Purview Studio:

Create SHIR1
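Every call in this article targets the same proxy endpoint and API version, so a pipeline script might centralise the URL construction. The following is a minimal sketch under that assumption; shir_url and apiVersion are hypothetical names introduced for illustration and do not appear in the original scripts.

```shell
# The API version used by every request in this article.
apiVersion="2020-12-01-preview"

# Hypothetical helper: build a proxy-endpoint URL for an integration runtime.
#   $1 = Purview account name
#   $2 = SHIR name
#   $3 = optional sub-path (e.g. listAuthKeys, monitoringData, nodes/<name>)
# The body runs in a subshell so its variables do not leak into the caller.
shir_url() (
  base="https://$1.proxy.purview.azure.com/integrationRuntimes/$2"
  if [ -n "${3:-}" ]; then
    base="$base/$3"
  fi
  printf '%s?api-version=%s\n' "$base" "$apiVersion"
)

# Usage sketch:
#   curl -X POST "$(shir_url "$purviewAccount" "$SHIRName" listAuthKeys)" \
#     -H "Authorization: Bearer $token"
```

Keeping the API version in one variable makes it easier to move off the preview version later without touching each request.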

2. Get all SHIRs

We can get a list of all our SHIR entities (not the individual VMs - but the Purview Objects) as follows:

response=$(curl -X GET "https://$purviewAccount.proxy.purview.azure.com/integrationRuntimes?api-version=2020-12-01-preview" \
-H "Authorization: Bearer $token")

# List of SHIR objects
jq -r "." <<< "$response"

This can be useful if we were performing management across multiple SHIR objects (in this demo, we're only using SHIR1 - but this can be easily extended by adding another layer to our automation logic.)

All SHIRs on Purview
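The extra layer for managing several SHIR objects could be as small as a loop over their names. The sketch below is illustrative only: for_each_shir is a hypothetical helper, and the jq filter in the usage comment assumes the list response wraps the entities in a top-level value array with a name field, which should be checked against the actual payload.

```shell
# Hypothetical helper: run the given command once per SHIR name read from
# stdin (one name per line), appending the name as the last argument.
# Blank lines are skipped, and a final line without a trailing newline is
# still processed.
for_each_shir() {
  while IFS= read -r name || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    # Redirect stdin so the command cannot swallow the remaining names.
    "$@" "$name" </dev/null
  done
}

# Usage sketch. ASSUMPTION: the response has the shape {"value":[{"name":...}]}.
#   jq -r '.value[].name' <<<"$response" | for_each_shir echo "Managing"
```

In a real pipeline, echo would be replaced by a function that performs the unregister and register steps for one SHIR.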

2½. Onboard VM1 for demonstration

Before we can unregister VM1, for this demo we first need a VM1 to be registered against SHIR1 (since we created SHIR1 from scratch above). In reality, we would have already onboarded VM1 at some previous point in time (that's the whole idea for us wanting to replace VM1 with a newer, patched VM2).

At a high level, there are two patterns we can leverage at this point:

  1. Grab SHIR Key from Purview inside CICD pipeline as our Service Principal, then pass in SHIR Key1 to the VM to register itself.
  2. Pass in Service Principal creds inside the VM, have it gather SHIR Key1 from Purview, then have it register itself.

We proceed with Method 1 - as it doesn't involve us injecting our Service Principal creds into the VM, but rather just the SHIR Key1 (which is limited to the scope of SHIR1) - i.e. it's arguably more secure.

Gather SHIR Keys

We can grab Key1 and Key2 from SHIR1 inside our CICD pipeline as follows:

response=$(curl -X POST "https://$purviewAccount.proxy.purview.azure.com/integrationRuntimes/$SHIRName/listAuthKeys?api-version=2020-12-01-preview" \
-H "Authorization: Bearer $token")

# All Keys
jq -r "." <<< "$response"

# Keys
Key1=$( jq -r ".authKey1" <<<"$response" )
Key2=$( jq -r ".authKey2" <<<"$response" )

Get SHIR Keys

Now, we can pass in any one of the Keys, e.g. Key1, as well as a PowerShell script that contains the SHIR onboarding logic (i.e. latest executable download → installation → registration) into VM1 (e.g. as a Custom Script Extension if we're deploying an Azure VM). Here, we can make use of the PowerShell scripts provided by the Data Factory team.
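For the Custom Script Extension route, one option is to hand the key over through the extension's protected settings, so it stays out of the public settings. This is a rough sketch and not the author's deployment method: cse_protected_settings is a hypothetical helper, the resource group, VM name and script URL are placeholders, and the helper assumes the key contains no double quotes or backslashes.

```shell
# Hypothetical helper: build the protected settings JSON for a Windows Custom
# Script Extension that runs the onboarding script with the SHIR key.
# ASSUMPTION: the key contains no double quotes or backslashes, so it can be
# embedded in the JSON string without escaping.
cse_protected_settings() {
  printf '{"commandToExecute":"powershell -ExecutionPolicy Unrestricted -File SHIRInstall.ps1 %s"}\n' "$1"
}

# Usage sketch (placeholders in angle brackets must be filled in):
#   az vm extension set \
#     --resource-group <your-resource-group> \
#     --vm-name <your-vm-name> \
#     --publisher Microsoft.Compute \
#     --name CustomScriptExtension \
#     --settings '{"fileUris":["<url-of-SHIRInstall.ps1>"]}' \
#     --protected-settings "$(cse_protected_settings "$Key1")"
```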

Register VM1 with SHIR Key

We run the SHIR onboarding PowerShell script inside VM1 (once again, for this demo we showcase via RDP - but in reality this would be done as part of the CICD pipeline):

# Download script
$SHIRInstallScriptURL = "https://gist.githubusercontent.com/mdrakiburrahman/cc99928d639fa10e905d36d2ed844429/raw/SHIRInstall.ps1"
$ScriptPath = "$PWD\SHIRInstall.ps1"

$client = New-Object System.Net.WebClient
$client.DownloadFile($SHIRInstallScriptURL, $ScriptPath)

# Execute script
.\SHIRInstall.ps1 $Key # Pass in SHIR Key1 via CICD pipeline

# Tail logs from script (for demo only)
Get-Content -Path ".\tracelog.log" -Wait # tracelog.log is generated by the script

VM1 SHIR onboarding (takes about 2.5 minutes)

And we can programmatically get the VM1 registration info back in bash:

response=$(curl -X POST "https://$purviewAccount.proxy.purview.azure.com/integrationRuntimes/$SHIRName/monitoringData?api-version=2020-12-01-preview" \
-H "Authorization: Bearer $token")

node=$( jq -r ".nodes" <<<"$response" )

jq -r "." <<<"$node"

VM1 Registration against SHIR1
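Because onboarding takes a couple of minutes, a pipeline would likely need to poll the monitoringData call until a node shows up instead of querying it once. The following is a minimal polling sketch; retry_until and the node_registered function in the usage comment are hypothetical names, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, and succeed as soon as the command succeeds.
# The body runs in a subshell, so `exit` only ends the helper itself.
retry_until() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    # Only sleep when another attempt is still to come.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1
)

# Usage sketch: node_registered would repeat the monitoringData call and
# succeed once at least one node is present (jq's length of null is 0).
#   node_registered() {
#     response=$(curl -s -X POST "<monitoringData URL>" -H "Authorization: Bearer $token")
#     [ "$(jq -r '.nodes | length' <<<"$response")" -gt 0 ]
#   }
#   retry_until 20 15 node_registered || { echo "Node never registered." >&2; exit 1; }
```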

3. Unregister VM1

The following script will unregister the VM that's registered against SHIR1 (currently, this is VM1). We have the script query the Purview API for the currently registered VM's name to keep the logic stateless (i.e. otherwise, we'd need to pass in the name of the VM from a state-store somewhere; e.g. a database - which adds unnecessary complexity):

# Get name of currently registered VM
response=$(curl -X POST "https://$purviewAccount.proxy.purview.azure.com/integrationRuntimes/$SHIRName/monitoringData?api-version=2020-12-01-preview" \
-H "Authorization: Bearer $token")

node=$( jq -r ".nodes" <<<"$response" )

# Unregister VM if one exists
if [ "$node" == "null" ] || [ "$node" == "[]" ] # jq -r prints the string "null" (not an actual null) when ".nodes" is absent; an empty array is also treated as "no node"
then
 # No VM exists
 echo "No SHIR node registered, no need to unregister."
else
 # VM exists
 nodeName=$( jq -r ".[0].nodeName" <<<"$node" )
 echo "$nodeName is registered. Unregistering now..."
 # Remove VM from SHIR
 curl -X DELETE "https://$purviewAccount.proxy.purview.azure.com/integrationRuntimes/$SHIRName/nodes/$nodeName?api-version=2020-12-01-preview" \
-H "Authorization: Bearer $token"
 echo "Unregistering successful."
fi

We see it picks up the fact that VM1 is registered, and unregisters it:

VM1 unregistered against SHIR1
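Note that the script above prints its success message regardless of how the DELETE call went. One way to tighten this, sketched here under the assumption that the endpoint answers a successful delete with a 2xx status code, is to capture the HTTP status and check it; is_success_status is a hypothetical helper name.

```shell
# Hypothetical helper: treat any three-digit 2xx HTTP status code as success.
is_success_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch: -s silences progress output, -o /dev/null discards the body,
# and -w '%{http_code}' prints only the status code.
#   status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE "<node URL>" \
#     -H "Authorization: Bearer $token")
#   if is_success_status "$status"; then
#     echo "Unregistering successful."
#   else
#     echo "Unregistering failed with HTTP $status." >&2
#     exit 1
#   fi
```

Failing the pipeline here prevents VM1 from being deleted while it is still registered against SHIR1.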

At this point, we can deallocate and delete VM1 as needed - as it's no longer required for our SHIR scanning pipeline.

4. Register VM2

Finally, to register our new node, we leverage the script from above on VM2 (a new, patched Windows VM), to register it against SHIR1:

VM2 onboarding (takes about 2.5 minutes)

And we see our VM2 registered:

VM2 Registration against SHIR1

And that's it! We can now repeat Step 3 and Step 4 above every time we need to replace a VM without any manual steps involved.

Get in touch 👋

If you have any questions or suggestions, feel free to open an issue on GitHub!

© 2023 Raki Rahman.