This week was spent setting up and running the PoC (Proof of Concept) experiment in Amazon EC2. I created a load balancer connected to two virtual machines acting as simple web servers. When I made a request to the load balancer’s DNS address, I would receive a simple HTML page from one instance or the other.
The goal was to examine the packets returning to my client machine to see whether they contain identifying information that would distinguish the two VMs. To capture the packets, I used the Linux command-line tools ‘curl’ to make HTTP requests to the load balancer and ‘tcpdump’ to capture incoming and outgoing packets and write them to a .pcap file. I can then use ‘wireshark’ to examine the packets individually.
This coming week will be spent examining the packets, particularly the IPids, to identify patterns, if any, that would distinguish the two VMs.
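As a complement to clicking through packets in wireshark, here is a rough sketch of how the IPids could be pulled out of the capture with a short Python script. It is only an illustration, not the script I actually used: the function name read_ip_ids is made up, and it assumes a classic (non-pcapng) .pcap file with Ethernet framing, IPv4 packets, and no VLAN tags.

```python
import struct

def read_ip_ids(pcap_bytes):
    """Return (source_ip, ip_id) pairs for the IPv4 packets in a classic pcap file.

    Assumptions: classic pcap format with microsecond timestamps,
    Ethernet link type (1), and no VLAN tags.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")

    # The link-layer type is the last field of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet captures")

    results = []
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16])
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len

        # Need 14 bytes of Ethernet header plus a 20-byte minimum IPv4 header.
        if len(frame) < 34:
            continue
        ethertype = struct.unpack("!H", frame[12:14])[0]
        if ethertype != 0x0800:  # skip anything that is not IPv4
            continue

        ip_header = frame[14:]
        ip_id = struct.unpack("!H", ip_header[4:6])[0]  # Identification field
        source_ip = ".".join(str(b) for b in ip_header[12:16])
        results.append((source_ip, ip_id))
    return results
```

Reading the capture with open(path, "rb").read() and passing the bytes to this function gives a list that can be filtered by source address, so that only the packets coming back from the load balancer are compared.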
Last meeting, we decided to focus on identifying deployments behind load balancers in the cloud. I read a couple of papers to familiarize myself with existing research and glean ideas for techniques for measuring web deployments. “WhoWas: A Platform for Measuring Web Deployments on IaaS Clouds” describes the WhoWas platform, which uses active probing to perform network measurements and provide the history of an IP address over time. “A Technique for Counting NATted Hosts” by Steve Bellovin describes his process of using the unique sequences of IPids to attempt to identify the number of hosts behind a NAT (Network Address Translator).
We decided to explore the applicability of Bellovin’s IPid technique. This week we will work on setting up a small trial experiment in Amazon’s EC2 with a few instances and a load balancer to see if we can detect unique hosts based on the IPid.
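To make the idea concrete, here is a simplified sketch of the intuition behind the IPid technique: if each host stamps its packets from its own incrementing counter, the observed IPids should fall into a few interleaved, slowly increasing sequences, and the number of sequences hints at the number of hosts. This is not the procedure from Bellovin's paper, which is more involved. The function name and the max_gap threshold are my own assumptions for illustration.

```python
def count_ipid_sequences(ip_ids, max_gap=64):
    """Greedily group observed IPids into increasing sequences and count them.

    Assumption: each host uses a simple incrementing 16-bit counter, so a new
    packet from a known host has an IPid slightly ahead of that host's previous
    one. max_gap is an arbitrary illustrative threshold, not a tuned value.
    """
    last_values = []  # most recent IPid seen in each inferred sequence
    for ip_id in ip_ids:
        best_index = None
        best_gap = None
        for index, last in enumerate(last_values):
            # Forward distance on the 16-bit counter, allowing wraparound.
            gap = (ip_id - last) % 65536
            if 0 < gap <= max_gap and (best_gap is None or gap < best_gap):
                best_index, best_gap = index, gap
        if best_index is None:
            last_values.append(ip_id)         # no sequence fits: start a new one
        else:
            last_values[best_index] = ip_id   # extend the closest sequence
    return len(last_values)
```

For example, count_ipid_sequences([100, 5000, 101, 5001, 103, 5004]) groups the values into two sequences. Hosts that randomize their IPids, or set them to zero, would break the assumption entirely, which is one of the things the trial experiment should reveal.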
Hello and Welcome!
My name is Maimuna Lubega (Mai, pronounced “My”, for short). On this blog I will be documenting my experience as an Undergrad Researcher as a part of CRA-W’s 2014 Collaborative Research Experience for Undergraduates program. I will be conducting research on Public Infrastructure-as-a-Service clouds under UW-Madison Professor Aditya Akella and Graduate Researcher Aaron Gember-Jacobsen. Here is an abstract of our research project:
Cloud services are a popular web hosting and data storage option for several companies and organizations. This study aims to infer the back-end configuration of web services deployed in Public IaaS clouds: for instance, how many back-end servers support a hosted front-end service, the geographic distribution of these back-end resources, and whether and how web services are using load balancing and content distribution networks (CDNs). First, we will explore identifying back-ends behind load balancers or VMs by examining metadata in HTTP headers, along with other techniques, to try to infer the configuration and number of these back-end servers. Second, we will explore how to determine whether web services that utilize CDNs are also hosting their content in the cloud or elsewhere. Examining the DNS lookups of a client to infer the content source is one possible technique we will investigate.
More to follow!