We were able to run a controlled experiment in EC2 and gather response timing data from the information returned by cURL. As we suspected, the response times from the load balancer versus its back-end instance were not significantly different.
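For reference, a minimal sketch of the kind of timing comparison we ran. The endpoint URLs below are hypothetical placeholders, and our actual measurements came from cURL's reported timings rather than this code; this just shows the shape of the comparison using only Python's standard library:

```python
import statistics
import time
import urllib.request

def time_request(url, n=10):
    """Issue n GET requests to url and return the elapsed wall-clock
    times (in seconds) of the requests that succeeded."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, timeout=5).read()
        except OSError:
            continue  # skip failed requests
        samples.append(time.perf_counter() - start)
    return samples

def summarize(samples):
    """Return (median, population stdev) for a list of timing samples."""
    return statistics.median(samples), statistics.pstdev(samples)

# Hypothetical endpoints: the load balancer's DNS name versus the
# back-end instance's own address.
# lb_stats = summarize(time_request("http://my-elb.example.com/"))
# direct_stats = summarize(time_request("http://ec2-host.example.com/"))
```

Comparing the medians (rather than means) makes the result less sensitive to occasional slow outliers.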
For our second idea, identifying web deployment misconfigurations, we made a list of ports we could probe in order to determine how prevalent the corresponding services are in the cloud: FTP (21), SSH (22), MySQL (3306), memcached (11211), Redis (6379), NFS (2049), OpenVPN (1194), HDFS DataNode (50075), and MapReduce TaskTracker (50060). This is the first step in deciding which services are worth exploring, which includes figuring out whether and how they are misconfigured and insecure. We intend to look at using WhoWas to conduct the probing in the coming weeks.
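As a sketch, the prevalence check for these ports could start with a simple TCP connect probe like the one below. The target host is a placeholder, and this is only a first approximation: a successful connect shows something is listening, not which service it is, and OpenVPN typically runs over UDP, so a TCP connect is at best a rough signal for it:

```python
import socket

# Services of interest and their default ports, from our probing list.
PORTS = {
    "ftp": 21, "ssh": 22, "mysql": 3306, "memcached": 11211,
    "redis": 6379, "nfs": 2049, "openvpn": 1194,
    "hdfs-datanode": 50075, "mapreduce-tasktracker": 50060,
}

def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def survey(host):
    """Map each service name to whether its default port accepts connections."""
    return {name: probe(host, port) for name, port in PORTS.items()}
```

A real measurement run would rate-limit these probes and restrict them to hosts we are permitted to scan.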
Last week we were able to critically examine the ideas from the previous week and narrow them down to the two best options, summarized below:
1) Timing Measurements of Request-Response Packets through a Load Balancer versus Direct-to-Machine Communication (in a Cloud)
This idea would involve measuring the time it takes to receive a response packet from a VM behind a load balancer in the cloud versus communicating directly with that machine. The original hypothesis is that the timing information in the two situations will be similar and therefore indistinguishable, but if the response times are dissimilar, it would open up more room for exploration. The plan is to conduct the testing in a controlled environment in the EC2 cloud. Since the methods (packet capture and inspection) are quite similar to our work earlier in the semester, this component shouldn't take long to complete.
2) Determine Prevalence of Misconfigurations in Cloud Deployments
The WhoWas paper included a case study of the "software ecosystem of web services running in the cloud," which involved identifying the web servers and templates, back-end techniques, and tracking behaviour. As a continuation of that inquiry, we thought it would be interesting to study misconfigurations in these software ecosystems. Our ideas include:
1- Identifying set-ups where we are able to communicate directly with a VM that is supposed to be hidden behind a load balancer.
2- Identifying unprotected memcached instances. Memcached is an in-memory object caching system used to speed up websites, but it purportedly has little built-in security, which could lead to unwanted leakage of data. Determining the extent of memcached usage and protection would provide insight into the security of some software deployments in the cloud.
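To illustrate the memcached idea: the service speaks a simple text protocol, so a sketch of an exposure check could just send the `stats` command and see whether the server answers without any authentication. The host and port here are placeholders:

```python
import socket

def memcached_exposed(host, port=11211, timeout=2.0):
    """Return True if host answers the memcached 'stats' command,
    i.e. the instance is reachable without authentication."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"stats\r\n")
            reply = s.recv(4096)
            # An open memcached instance replies with lines like
            # "STAT pid 1234\r\n ... END\r\n".
            return reply.startswith(b"STAT")
    except OSError:
        return False
```

An instance that answers `stats` to an arbitrary remote client is, by definition, leaking at least its operational metadata, and likely its cached objects too.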
In our search for new research questions, we contacted one of the co-authors of the WhoWas paper and held a productive brainstorming session. The WhoWas paper was a measurement study of web deployments in the EC2 and Azure clouds using light active probing over an extended period of time. We looked at the unresolved questions from the study and were able to come up with several ideas for further research.
For example, the study only probed ports 80 and 443, so we could try other ports and see how they affect the results. We could also conduct measurements to see if there is a timing difference between accessing a VM directly versus accessing it via a load balancer. Additionally, we could study misconfigured deployments in the cloud, for instance instances running unprotected memcached.
At the end, we made plans to meet in the following weeks and narrow down which problem we intend to focus on.
For our first meeting of the new semester, we took time to evaluate our research goals to determine whether they were attainable or whether we should look at a slightly different research problem. While our original idea was to determine how much content served by CDNs originates in the cloud, we decided that reverse engineering the behavior of the CDNs might prove too complex and difficult. We decided to reexamine the WhoWas paper and look for related research questions that would be more feasible.