Screen Scrape with PowerShell

I am sure there are all sorts of technical, legal or ethical reasons why you shouldn’t do it; however, sometimes it’s much needed. PowerShell is the knight in shining armor that will save your day.

In this example I am snatching information from Wikipedia’s ISO 3166-1 article.

As you can see in the code, -split is a very useful way to pin out just the information needed. Do be aware, though, that your code will fail as soon as the provider changes the layout of the page.
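To see how the chained -split calls pin out a value, here is a minimal sketch that runs against a hard-coded snippet instead of the live page. The $row string is made up for illustration; it only imitates the shape of the cells the script expects and is not copied from Wikipedia.

```powershell
# Hypothetical table cells, shaped like the ones the script parses (not real page content)
$row = '<td><a href="/wiki/Norway" title="Norway">Norway</a></td><td><span style="font-family: monospace, monospace;">NO</span></td>'

# Step 1: split on "<td>"; index 0 is the text before the first cell, index 1 is the first cell
$firstCell = ($row -split "<td>")[1]
# -> '<a href="/wiki/Norway" title="Norway">Norway</a></td>'

# Step 2: keep everything before "</td>"
$firstCell = ($firstCell -split "</td>")[0]
# -> '<a href="/wiki/Norway" title="Norway">Norway</a>'

# Step 3: the text between the first ">" and the next "<" is the link text
$country = (($firstCell -split ">")[1] -split "<")[0]

# Same idea for the second cell, using the <span> opening tag as the left-hand delimiter
$secondCell = (($row -split "<td>")[2] -split "</td>")[0]
$alpha2 = (($secondCell -split '<span style="font-family: monospace, monospace;">')[1] -split "</span>")[0]

"$country = $alpha2"
```

Keep in mind that -split treats its delimiter as a regular expression and matches case-insensitively by default, so a delimiter containing characters such as . ( or ? has to be passed through [regex]::Escape() first.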

The code…

# Windows PowerShell 5.1 may not offer TLS 1.2 by default, which HTTPS sites such as Wikipedia expect
[Net.ServicePointManager]::SecurityProtocol = [Net.ServicePointManager]::SecurityProtocol -bor [Net.SecurityProtocolType]::Tls12

# Download the raw HTML of the page
$ISO3166 = Invoke-WebRequest -Uri "https://en.wikipedia.org/wiki/ISO_3166-1" -UseBasicParsing

# Take the second "wikitable sortable" table, cut it at its closing </table>, split it into rows
# and skip the first two chunks (the text before the first row and the header row)
$Countries = ((($ISO3166.Content -split "<table class=`"wikitable sortable`"")[2] -split "</table>")[0]) -split "<tr>" | Select-Object -Skip 2

if ($Countries.Count -eq 0) {
    Write-Warning "Oh crap, no countries found"
    return   # 'continue' is meant for loops; 'return' stops the script cleanly
}

# Opening tag that wraps the code values in the alpha-2, alpha-3 and numeric cells
$codeSpan = "<span style=`"font-family: monospace, monospace;`">"

foreach ($Cty in $Countries) {

    # Split the row into its cells; index 0 is the text before the first <td>
    $cells = $Cty -split "<td>"

    # Decode html: keep the content of the first four cells
    $country = ($cells[1] -split "</td>")[0]
    $alpha02 = ($cells[2] -split "</td>")[0]
    $alpha03 = ($cells[3] -split "</td>")[0]
    $numeric = ($cells[4] -split "</td>")[0]

    # Special decoding: link text for the country, text inside the <span> for the codes
    $country = (($country -split ">")[1] -split "<")[0]
    $alpha02 = (($alpha02 -split $codeSpan)[1] -split "</span>")[0]
    $alpha03 = (($alpha03 -split $codeSpan)[1] -split "</span>")[0]
    $numeric = (($numeric -split $codeSpan)[1] -split "</span>")[0]

    # Output results: one object per country
    [PSCustomObject]@{
        country = $country
        alpha02 = $alpha02
        alpha03 = $alpha03
        numeric = $numeric
    }
}
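The chained -split calls work, but every cell costs several lines. As an alternative (a different technique from the one the script uses), a single regular expression with named groups can pull all four values out of one row. This is only a sketch: ConvertFrom-IsoRow is a hypothetical helper name, the pattern assumes the same cell layout the loop above relies on (link text in the first cell, the three codes wrapped in the monospace <span>), and the sample row is made up, so it breaks just as easily when Wikipedia changes its markup.

```powershell
# Hypothetical helper (not part of the original script); assumes the same row layout as the loop above
function ConvertFrom-IsoRow {
    param([Parameter(Mandatory)][string]$Row)

    # (?s) lets . match newlines, because the cells of a row sit on separate lines
    $span = [regex]::Escape('<span style="font-family: monospace, monospace;">')
    $pattern = '(?s)<td>.*?>(?<country>[^<]+)<.*?</td>\s*' +
               '<td>' + $span + '(?<alpha02>[A-Z]{2})</span></td>\s*' +
               '<td>' + $span + '(?<alpha03>[A-Z]{3})</span></td>\s*' +
               '<td>' + $span + '(?<numeric>\d{3})</span></td>'

    $m = [regex]::Match($Row, $pattern)
    if (-not $m.Success) { return }   # row does not have the expected layout: emit nothing

    [PSCustomObject]@{
        country = $m.Groups['country'].Value
        alpha02 = $m.Groups['alpha02'].Value
        alpha03 = $m.Groups['alpha03'].Value
        numeric = $m.Groups['numeric'].Value
    }
}

# Hypothetical sample row, shaped like the rows the script parses
$sampleRow = @'
<td><a href="/wiki/Norway" title="Norway">Norway</a></td>
<td><span style="font-family: monospace, monospace;">NO</span></td>
<td><span style="font-family: monospace, monospace;">NOR</span></td>
<td><span style="font-family: monospace, monospace;">578</span></td>
'@

$result = ConvertFrom-IsoRow -Row $sampleRow
$result
```

With the live page, the foreach loop could then be replaced by $Countries | ForEach-Object { ConvertFrom-IsoRow -Row $_ }, and the objects piped on to Export-Csv -Path .\iso3166.csv -NoTypeInformation if you want a file. Rows that do not match are silently skipped, which is convenient but also hides layout changes, so compare the row count against what you expect.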
