I have recently returned from NSDI 2015 in Oakland, CA. Overall, I enjoyed the experience; it was very interesting to listen to the paper presentations and learn about the types of research being conducted at such a high level. Two presentations in particular stood out to me. PhD researchers from the University of Cambridge gave a presentation on how jumping queues (QJump) can lower latency in data centers. I especially enjoyed this talk because I was familiar with several of the OS-related concepts and was therefore able to follow along easily. During the Wireless Track, researchers from MIT presented a system that tracks several people, both stationary and moving, using wireless signals. They even gave a real-life demonstration in which they were able to localize a stationary volunteer based on his heartbeat. The paper was titled Multi-Person Localization via RF Body Reflections.
On Monday night I attended a BoF (Birds of a Feather) session for Students and Young Professionals. There, I was able to meet other young researchers like myself and trade stories, discuss the presentations of the day, and generally socialize. Finding such a great group of people with similar experiences made my time at NSDI truly enjoyable.
Since nginx (pronounced ‘engine-x’) was the second most common web server/load balancer in use, we thought it would be a good idea to try some of the previous techniques (checking IPid sequences, TCP timestamps, etc.) on servers running nginx to see if the results hold. I spent this week attempting to configure a couple of instances in EC2 running nginx. The set-up is slightly more complicated than for a basic Apache server, so additional troubleshooting was needed.
This week I continued to look through the list (I pulled approximately 3,000 server fields). To make things easier, I removed the Apache results and sorted the list alphabetically. Most of the names listed in the server field are those of general web servers, not specific open source load-balancing services such as balance, HAProxy, Linux Virtual Server, etc. Since we are looking at old WhoWas data, the results do not reflect the services currently running in the cloud, but they are nevertheless a good starting point.
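As a reminder of what the TCP timestamp technique measures, here is a minimal sketch of a clock skew estimate. It assumes we have already collected pairs of (local receive time in seconds, remote TSval) from one connection, and that the remote timestamp clock ticks at a nominal 1000 Hz; both the sample format and the `hz` default are assumptions for illustration. It uses a plain least-squares slope, which is a simpler stand-in and not necessarily the exact fit we used.

```python
def estimate_skew_ppm(samples, hz=1000.0):
    """Estimate remote clock skew in parts per million.

    samples: list of (local_time_seconds, tsval) pairs, at least two,
    with distinct local times. hz is the assumed tick rate of the
    remote TCP timestamp clock.
    """
    t0, v0 = samples[0]
    # Elapsed local time for each sample.
    xs = [t - t0 for t, _ in samples]
    # Offset = elapsed remote time minus elapsed local time.
    # TSval is a 32-bit counter, so the difference is taken modulo 2**32.
    ys = [((v - v0) % 2**32) / hz - x for (_, v), x in zip(samples, xs)]

    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope of offset against local time.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return (num / den) * 1e6
```

If two responses from the same public IP give clearly different skew estimates, that hints that they came from different machines behind the load balancer.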
This week, along with other student researchers, I was able to meet with a visiting researcher, Saikat Guha, who came to give a talk about his paper, “Characterizing Large-Scale Click Fraud in ZeroAccess”. This was an opportunity to discuss my research and receive feedback and insights. Having previously written the poster abstract was quite helpful, as I was able to succinctly explain the goals and methods of my research.
In terms of the WhoWas data, as suspected, the reverse DNS lookups on a few IP clusters resolved to the generic format and not a shared domain name. I have extracted a list of unique servers from the WhoWas data and am in the process of looking through it to identify load-balancing servers. At first glance it appears that Apache, followed by nginx, is the most common server, but we will continue to look for other services.
This week I finished registration and booking to attend the NSDI 2015 conference in Oakland, CA. Fortunately, the cost of the trip will be covered by a combination of CREU funds and the NSDI travel grant I received.
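For anyone curious what pulling the server list looks like, here is a minimal sketch. The real data lives in MySQL; this uses Python's built-in sqlite3 with an in-memory database so it runs anywhere. The table name `responses`, its columns, and the sample rows are all made up for illustration and are not the actual WhoWas schema or data.

```python
import sqlite3


def server_counts(conn):
    """Count how often each server value appears, most common first."""
    query = (
        "SELECT server, COUNT(*) AS n FROM responses "
        "WHERE server IS NOT NULL "
        "GROUP BY server ORDER BY n DESC, server ASC"
    )
    return conn.execute(query).fetchall()


# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (ip TEXT, server TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [
        ("192.0.2.1", "Apache"),
        ("192.0.2.2", "nginx"),
        ("192.0.2.3", "Apache"),
        ("192.0.2.4", None),  # no server field recorded
    ],
)
counts = server_counts(conn)
```

Sorting by count makes the dominant servers obvious, and the long tail of rarer names is where any load-balancing services would show up.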
We also discussed looking at some old WhoWas data to investigate some preliminary hypotheses. First, we thought it would be useful to search the cloud space to see what sort of load-balancing services, if any, were in use. The old WhoWas data is stored in a MySQL database, so this task is as simple as running a SQL query on the server field of the IP header packet tables. Second, we wish to explore the following scenario: if VMs are behind a load balancer, would the reverse DNS lookup of those VMs’ IPs resolve to the same domain name? The potential issue is that domain names in the cloud may follow a generic, individualized format (“www.some-ip.aws.amazon.com”) rather than a shared domain name. Without a shared domain name, we cannot map or cluster those IPs. Therefore, by performing reverse DNS lookups on IP clusters from the old WhoWas data, we can try to establish ground truth on this issue.
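The reverse DNS check can be sketched roughly as follows. This is a minimal illustration, not our actual script: the function names are hypothetical, and the resolver is passed in as a parameter so the grouping logic can be tried without touching the network.

```python
import socket
from collections import defaultdict


def reverse_lookup(ip):
    """Return the reverse DNS hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers socket.herror and socket.gaierror (no PTR record, etc.).
        return None


def group_by_hostname(ips, resolver=reverse_lookup):
    """Map each resolved hostname to the list of IPs that resolve to it."""
    groups = defaultdict(list)
    for ip in ips:
        groups[resolver(ip)].append(ip)
    return dict(groups)
```

If every IP in a cluster lands in its own group with a generic per-IP name, reverse DNS cannot be used to tie those IPs together; a group with several IPs under one name is the shared-domain case we were hoping for.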
Happy Spring Break everyone!
In other good news, my NSDI grant application was accepted. Registration and a small travel stipend were included in the award. Thank you NSDI! We plan to discuss how to cover the rest of the funding next week after spring break.
This week I completed a travel grant application, which included writing another application essay. Fortunately, this turned out to be a great deal shorter than the poster abstract, but it still required a concise summary of my work and an explanation of how attending the conference would be relevant to my field of study. The following is a snippet from my essay:
Attending the 12th USENIX Symposium on Networked Systems Design and Implementation will be a valuable capstone to my first undergraduate research experience. I am currently a senior at the University of Wisconsin-Madison and expect to graduate in May 2015. I was selected to be among the 2014-15 Collaborative Research Experience for Undergraduates (CREU) cohort spearheaded by CRA-W. Working under faculty advisor Aditya Akella and graduate mentor Aaron Gember-Jacobson, I spent most of the year researching networked systems in public Infrastructure-as-a-Service (IaaS) clouds. Particularly, I researched methods to discover hosts behind a load-balancer configured in Amazon EC2. I’ve evaluated several techniques including counting IPid sequences in IP packets sent between the server and client, using TCP timestamps to calculate the clock skew of individual machines, looking for HTTP cookies from the server and more. As a result, over the past year I have had the opportunity to explore the many aspects of scientific research including formulating and vetting research ideas, gathering, processing and interpreting data, finding and reading related works, writing a poster abstract, and summarizing and presenting my work to others. Attending NSDI 2015 would allow me to see not only what other ideas in the field people have been researching, but also how students, professors, and industry professionals share and use the discoveries of academic research. It will also be a valuable opportunity to receive feedback on my own research, and I would be willing to be a scribe for one of the sessions.
I spent this week putting the final touches on the poster abstract and then successfully submitted it to the Grace Hopper portal. I found the experience of writing a poster abstract very informative in terms of how to organize and present data, summarize research methods, explain results, reference prior work and other research papers, and more. It was an interesting look into the other side of scientific research, which isn’t just about doing experiments but also about effectively presenting your ideas to others.
In addition, we discussed attending the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2015) in Oakland, CA. My next step is to write an essay to apply for the travel grant for NSDI.
For the time being, I have paused work on the WhoWas project in order to draft and submit a poster abstract to Grace Hopper 2015. After consultation, we decided it would be best to write about our research attempts to discover hosts behind a load-balancer, where we have more data and results. The poster abstract will detail our four major approaches: 1) counting IPid sequences, 2) calculating the clock skew from TCP timestamps, 3) searching for front-end identifiers, and 4) searching for AWSELB cookies.
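To give a flavor of the first approach, here is a minimal sketch of counting IPid sequences. It assumes each back-end host stamps outgoing packets from its own incrementing 16-bit IP ID counter, so responses seen through one load-balanced address interleave several increasing sequences. The greedy assignment and the `max_gap` threshold are illustrative assumptions, not necessarily the exact method from our experiments.

```python
def count_ipid_sequences(ipids, max_gap=1000):
    """Greedily split observed IP IDs into increasing sequences.

    ipids: IP ID values in the order the packets arrived.
    max_gap: largest forward jump still treated as the same counter
    (a hypothetical threshold chosen for illustration).
    Returns the number of sequences, a rough estimate of the host count.
    """
    last_seen = []  # last IP ID assigned to each sequence so far
    for ipid in ipids:
        best, best_gap = None, None
        for i, last in enumerate(last_seen):
            gap = (ipid - last) % 65536  # IP ID is 16 bits and wraps around
            if 0 < gap <= max_gap and (best_gap is None or gap < best_gap):
                best, best_gap = i, gap
        if best is None:
            last_seen.append(ipid)  # no sequence fits: start a new one
        else:
            last_seen[best] = ipid  # extend the closest matching sequence
    return len(last_seen)
```

For example, the arrival order 100, 40000, 101, 40003 splits into two sequences, which suggests two hosts. Hosts that randomize or zero their IP IDs break the assumption, which is one reason to combine this with the other three approaches.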
As the first step, we were able to fork the existing WhoWas code from a git repository and get the code up and running in its original form. Currently, the WhoWas scanner accepts a list of IP ranges and attempts to initiate a connection on port 80 (HTTP), 443 (HTTPS), or 22 (SSH). If the connection is successful, it stores the relevant information (header, IP, etc.) in a SQL database. This week we hope to look through the existing code and identify where we can make changes for our specific experiment.
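The probing step can be pictured with a small sketch like the one below. This is not the WhoWas code itself, just an illustration under the assumption that the scanner tries the standard ports 80, 443, and 22; the function name, the `PORTS` mapping, and the timeout are hypothetical.

```python
import socket

# Assumed standard ports for the three services; illustrative only.
PORTS = {80: "HTTP", 443: "HTTPS", 22: "SSH"}


def probe(ip, ports=PORTS, timeout=2.0):
    """Return the list of ports on ip that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            # create_connection completes the TCP handshake or raises OSError.
            with socket.create_connection((ip, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, unreachable, or timed out: treat the port as closed.
            continue
    return open_ports
```

A real scanner would go on to fetch the response headers and write a row to the database for each responsive IP; this sketch stops at discovering which ports answer.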