Last week I wrote a Python script that successfully pulled the server ID from the HTML of Netflix’s landing page. I ran the script in a loop to build up a data set, from which I was able to identify four or five distinct servers, although a quick manual inspection did not reveal any obvious pattern in which server responded. Next, I decided to set up VMs in several EC2 regions and have them poll the Netflix site in order to build a larger data set. While I was able to pull data successfully from VMs in several US regions, I ran into errors when polling Netflix from EC2 regions outside the US. I am still looking into what happens differently when the requests originate outside the country.
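The polling approach described here can be sketched roughly as follows. This is a minimal sketch, not the actual script. The `serverId` pattern is a hypothetical placeholder, because the real markup on the landing page isn't reproduced in this post, and the names `extract_server_id`, `tally_server_ids`, and `poll` are illustrative. The fetch function is passed in as a parameter so the same loop can run unchanged on a VM in any region.

```python
import re
import time
import urllib.request
from collections import Counter
from typing import Callable, Iterable, Optional

# HYPOTHETICAL pattern: assumes the page embeds something like serverId="..."
# or "serverId": "...". The real markup on Netflix's landing page may differ.
SERVER_ID_PATTERN = re.compile(r'serverId["\']?\s*[:=]\s*["\']([^"\']+)["\']')


def fetch_landing_page(url: str = "https://www.netflix.com/") -> str:
    """Download one copy of the landing page as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_server_id(html: str) -> Optional[str]:
    """Return the first server ID found in the page, or None if absent."""
    match = SERVER_ID_PATTERN.search(html)
    return match.group(1) if match else None


def tally_server_ids(pages: Iterable[str]) -> Counter:
    """Count how often each server ID appears across the collected pages."""
    counts: Counter = Counter()
    for html in pages:
        server_id = extract_server_id(html)
        if server_id is not None:
            counts[server_id] += 1
    return counts


def poll(fetch: Callable[[], str], samples: int, delay_s: float = 1.0) -> Counter:
    """Fetch the page `samples` times, pausing between requests, and tally IDs."""
    pages = []
    for _ in range(samples):
        pages.append(fetch())
        time.sleep(delay_s)  # be polite; don't hammer the site
    return tally_server_ids(pages)
```

On a VM, something like `poll(fetch_landing_page, samples=500)` would collect one region's data set, and the number of keys in the resulting `Counter` gives the count of distinct servers observed.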
On a side note, I also checked whether any other software services that AWS advertises as running on EC2 reveal load-balancing information or server IDs, but so far only Netflix appears to expose this information.
This week I will continue to pull data as well as examine the intermediate data sets I have collected.