Blog

Google failed! Read all about it!

Sure, it's all over the news: Tuesday's major Google glitch, which affected about 14% of its global user base, was embarrassing, disappointing and plain dumb.

And of course, the question now arises: how good an idea is cloud computing, really, when even Google can go down?

Well, let's think about it: what are the alternatives to cloud computing? Are they really better in terms of availability?

Because that's the real question. In terms of general availability, who is better equipped to respond to the glitches we know will happen: a company that doesn't specialize in IT infrastructure but merely uses it, or the cloud provider, a company that specializes precisely in IT infrastructure?

No matter how you look at it, non-specialized staff in a non-specialized company will always react more slowly and take longer to figure things out than their specialized counterparts in a specialized company. This doesn't just apply to cloud computing but to the fastest-growing business trend of the past 50 years: outsourcing.

We just have to keep in mind that cloud computing is not going to give us 100% availability but, big surprise, nothing will! We all know the benefits of cloud computing; now here we have one of its issues. Not a small one, but not a new one either!

Google failed, AWS failed. The question is: has your own IT infrastructure ever failed on you?

Note that we are not claiming this should be taken lightly. A failure at your IT infrastructure provider should worry you, but what's more important than the failure itself is how they respond to it: did they disregard users' complaints? Did they know about it before you did? How a provider handles the aftermath should drive our decision-making when choosing a cloud provider.

We just need to remember: Google failed (#googlefail), and so will your IT infrastructure provider.

 

Network reconnaissance is the basic key to remote attacks. New scanning methods and techniques are regularly developed by people wearing hats of every color. Each of these methods and techniques focuses on one or two features at the expense of others. Take for example the TCP connect scan, the most basic form of network reconnaissance, where a full connection is opened to each scanned port. Its main strength is that it's fully reliable; its main weakness is that it's easily detected by any IDS.

There are also scans such as the Null scan or the FIN scan that are stealthier than the half-open scan (also known as the SYN scan), but they cannot differentiate between open and filtered ports.

There are many techniques for hiding a port scan. One is the use of decoys, where the scan is replicated from spoofed IPs in order to keep the IDS and the administrators from correctly identifying the true source of the scan. Another is the use of fragmented packets, to bypass firewalls that don't reassemble them.

Introduction to the Proxy scanning method
Proxy scanning is a brand new scanning technique, developed by the author of this article, that focuses on two features: firewall bypassing and blind TCP port scanning.

As its name implies, the Proxy scanning method relies on the use of proxies. Even though the implementation used in this article only supports HTTP 1.x proxies, the method itself can also be used with other proxy types such as SOCKS.

The idea behind this scanning method resembles the FTP bounce attack, with the important difference that administrators all over the world have taken care to prevent exposure to this vulnerability, while the proxy scanning method is new and completely usable nowadays.

Blind TCP port scanning
This feature of the Proxy scanning method relies on the use of public or private proxies (see the “Introduction to proxies” inset). While the mechanism is very simple, it's also a very powerful way of scanning without sending a single packet from the true source to the target host or network.

The proxy scanning method simply connects to a standard HTTP proxy server and sends a GET request (or any other HTTP request) whose URI points to the target IP and port.
# nc proxy 8080
GET http://target:port/ HTTP/1.0
Listing 1. GET request to the proxy server

After this request is sent to the proxy server, if the proxy accepts requests from an unauthenticated source and is willing to connect to a non-standard HTTP port (see the “Finding proxies” inset), it will try to connect to the requested port on the target. From the proxy's point of view, this is a standard, old-fashioned TCP full-connect scan: the target sends a SYN/ACK back if the port is open, a RST if the port is closed, and no packet at all if the port is filtered or the target is not reachable from the proxy's location.

502 Bad Gateway
The server, while acting as a gateway or proxy,
received an invalid response from the upstream server
it accessed in attempting to fulfill the request.

Chains

Even more secure blind TCP scanning is possible using a technique often referred to as “proxy chaining” or the “condom technique”. With this technique, the attacker uses public or private proxies as hops to hide the real source of the connections.

Each hop added to the chain affects:
  • Security for the attacker (increases);
  • Latency (increases);
  • Throughput (decreases).
Each hop adds security for the attacker, in the sense that the attack is less likely to be traced back to him or her: a connection that jumps around the world several times requires a lot of time, effort, money and resources to trace. Even with plenty of all of these, the connection can be practically untraceable if the attacker chooses proxies well. A dumb attacker would use a couple of proxies all within the target's country, while a smart one would accept waiting several hours or days for a simple scan to complete while the packets bounce all over the world, through different countries and preferably ones that don't have relations with the target's country.

With all this in mind, attackers decide how many hops to use according to the type of target. If an attacker is working on a dangerous or powerful target (say, a government agency or a military force), he or she will most probably use several hops, sacrificing latency and throughput for the sake of security.

The theory behind combining this technique with proxy scanning is straightforward: instead of connecting directly to a single proxy server, the attacker connects through n proxies in sequence, and the scan is launched from the last proxy in the chain.

Not all proxies are suitable for this technique but, luckily, finding valid ones is not really hard. The proxies that can be used are HTTP proxies that support the CONNECT method (see the “Connect method” inset) and SOCKS proxies. This article focuses on HTTP proxies with the CONNECT method.

When the attacker creates the chain, each node only knows about the previous and the next node, and does not know it's being used as part of a chain.

Given this, the real source of the attack is going to be known only by the first node, and the target of the attack is going to be known only by the last proxy.

We will continue this article in the next post.

Now that we have each image set up with its appropriate software and ready to be deployed, we have to give some thought to how High Availability will be taken care of.

For this we are reusing the VPN connection we have already established. Each server is started with an index (passed through the user data that Amazon Web Services makes available to an instance at startup). That index establishes the order in which servers take on the different roles, using the following IDs:

Frontend1 = OpenVPN server

Frontend n (n > 1) = OpenVPN client, monitors Frontend n-1

Backend1 = Active MySQL server, OpenVPN client

Backend n (n > 1) = Passive MySQL server, OpenVPN client
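As a rough illustration of the scheme above, the mapping from a role and an index to the duties of an instance can be written as a small shell function. The name describe_role is hypothetical and not part of the original setup; it only restates the list above in executable form.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original setup): prints the duties
# of an instance, given its role (frontend|backend) and its index (>= 1).
describe_role() {
    local role=$1 index=$2

    case "$role" in
        frontend)
            if [ "$index" -eq 1 ]; then
                echo "OpenVPN server"
            else
                echo "OpenVPN client, monitors frontend$((index-1))"
            fi
            ;;
        backend)
            if [ "$index" -eq 1 ]; then
                echo "Active MySQL server, OpenVPN client"
            else
                echo "Passive MySQL server, OpenVPN client"
            fi
            ;;
        *)
            echo "Unknown role: $role" >&2
            return 1
            ;;
    esac
}
```

For example, describe_role frontend 3 prints "OpenVPN client, monitors frontend2".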

So, regarding high availability, each server has an ID as well as an IP, which the first frontend assigns according to the certificate used (each possible role has a different certificate issued by the frontend). Since this is not going to be a large deployment, we can simply write out the possible values:

/etc/hosts

127.0.0.1 localhost localhost.localdomain

70.xxx.xxx.xxx frontend1

192.168.21.1 frontend1.internal
192.168.21.11 frontend2.internal
192.168.21.12 frontend3.internal
192.168.21.13 frontend4.internal
192.168.21.14 frontend5.internal
192.168.21.100 backend1.internal
192.168.21.101 backend2.internal
192.168.21.102 backend3.internal
192.168.21.103 backend4.internal
192.168.21.104 backend5.internal

Monitoring

Now, every time an instance starts with its corresponding ID, it logs in through the VPN and the VPN server assigns its IP according to its role. This is done using the client-connect setting to launch a script:

/etc/openvpn/pool
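#!/bin/bash
# OpenVPN client-connect script: $1 is the temporary file OpenVPN reads
# per-client directives from; $common_name is set by OpenVPN from the
# client's certificate.

FILE=$1

# Look up the static IP reserved for this client in /etc/hosts
ip=$(grep -F "$common_name.internal" /etc/hosts | awk '{print $1; exit}')

if [ -z "$ip" ]; then
    echo "Unable to find IP of $common_name" >&2
    exit 1
fi

echo "ifconfig-push $ip 255.255.255.0" > "$FILE"

exit 0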
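The lookup in the script matches the client name as a pattern anywhere on the line. A stricter variant compares whole hostname fields instead, which avoids partial matches. The following is only a sketch: the function name lookup_ip and its optional second argument are assumptions made for illustration, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: prints the IP mapped to an exact hostname in a
# hosts-style file (default /etc/hosts); returns 1 if there is no entry.
lookup_ip() {
    local name=$1 hosts_file=${2:-/etc/hosts}

    awk -v name="$name" '
        $1 !~ /^#/ {
            # Field 1 is the IP, every following field is a hostname
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { exit !found }
    ' "$hosts_file"
}
```

Inside the client-connect script it could be called as ip=$(lookup_ip "$common_name.internal"), with the script aborting when the call fails.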

That forces the VPN to assign the right IP address. Once the server has its IP, it starts monitoring the server that was started right before it in the same role. This applies to every instance except the first of each role.

When a server detects that its monitored server has not responded for some time, it assumes that server is down and i) completely stops the instance, then ii) takes over that server's role:

#!/bin/bash

ID=$(cat /tmp/id)
MONITOR_ID=$((ID-1))
ROLE=frontend
REQUIRED_FAILS=5
SLEEP_TIME=5
TIMEOUT_IN_SECONDS=4

missed_pings=0

while true; do
    if [ "$ID" -gt 1 ]; then
        if ping -c1 -W "$TIMEOUT_IN_SECONDS" "$ROLE$MONITOR_ID.internal" >/dev/null 2>&1; then
            missed_pings=0
        else
            # Count the miss only if frontend1 is still accessible; otherwise
            # the problem is likely our own connectivity.
            # Note: if the monitored peer is frontend1 itself and it is really
            # down, this ping fails too and the miss is not counted.
            if ping -c1 -W "$TIMEOUT_IN_SECONDS" frontend1.internal >/dev/null 2>&1; then
                missed_pings=$((missed_pings+1))
            fi
        fi

        # Fail over once REQUIRED_FAILS consecutive pings have been missed
        if [ "$missed_pings" -ge "$REQUIRED_FAILS" ]; then
            echo "Monitoring as $ID, detected that $ROLE$MONITOR_ID hasn't replied for $((SLEEP_TIME*REQUIRED_FAILS)) seconds, doing fail over"
            missed_pings=0

            # Destroy instance (not implemented yet, only logs the intent)
            echo "Would destroy instance $ROLE$MONITOR_ID"

            # Set new ID: take over the ID of the monitored server
            echo "$MONITOR_ID" > /tmp/id
            ID=$(cat /tmp/id)
            MONITOR_ID=$((ID-1))

            # Start OpenVPN
            if [ "$ID" != "1" ]; then
                # OpenVPN seems to have an issue when a client changes its credentials too quickly, work around it
                while true; do
                    /etc/init.d/openvpn stop
                    /etc/init.d/openvpn start-client
                    if grep -q AUTH_FAILED /var/log/openvpn.log; then
                        rm -f /var/log/openvpn.log
                        echo "Authentication failed, retrying"
                        sleep 2
                    else
                        echo "Client authenticated"
                        break
                    fi
                done
            else
                /etc/init.d/openvpn stop
                /etc/init.d/openvpn start-server
            fi

            # The first instance of a role has nothing left to monitor
            if [ "$ID" = "1" ]; then
                echo "Monitor became $ROLE$ID, stopping monitor role"
                exit 0
            fi
        fi
    fi

    sleep "$SLEEP_TIME"
done

Some features are still missing from that script, but it does most of what's described above. Most notably, the instance that is not responding stays alive, which could cause problems, especially for the first server of each role.
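The counting logic of the monitor can also be isolated into pure functions, which makes it possible to check the fail over decision without any network access. This is a minimal sketch: the names next_missed_pings and should_fail_over are hypothetical, and it assumes the fail over should trigger as soon as the counter reaches the required number of misses.

```shell
#!/bin/bash
# Hypothetical helper: given the current miss counter and the exit codes of
# the two pings (monitored peer, then frontend1), prints the updated counter.
next_missed_pings() {
    local missed=$1 peer_status=$2 vpn_status=$3

    if [ "$peer_status" -eq 0 ]; then
        echo 0                    # peer answered: reset the counter
    elif [ "$vpn_status" -eq 0 ]; then
        echo $((missed + 1))      # peer silent, frontend1 reachable: count it
    else
        echo "$missed"            # frontend1 unreachable too: do not count
    fi
}

# Hypothetical helper: succeeds once the counter ($1) reaches the
# required number of misses ($2).
should_fail_over() {
    [ "$1" -ge "$2" ]
}
```

In the monitoring loop, the two ping exit codes would be fed to next_missed_pings on every iteration, and should_fail_over "$missed_pings" "$REQUIRED_FAILS" would guard the fail over branch.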

Next: Database high availability

Once we had our base image ready to go, we rolled out two instances, each to be shaped to the role it will serve: frontend and backend.

Customizing each image was pretty straightforward: install Apache with PHP on the frontend and MySQL on the backend, copy Drupal over to the frontend, and set the location of user files to the S3-mounted filesystem.

Regarding MySQL, we have a simple design: MySQL runs off the local drive and binary logs are written to the S3-mounted filesystem (a different filesystem from the one used for the frontend). At a later stage we will set up database replication across the backend servers.

Since the database servers are not going to be overwhelmed with requests, we are keeping an active-passive configuration for them rather than going with a more complex design (KISS!!!)

Next:

  • VPN configuration

We've been working on the base image, a stripped-down CentOS 5 with some tweaks here and there. We installed Snort, ClamAV, Tripwire, Syslog-NG and OpenVPN.

For storage we are going to be using Elastic Drive. Elastic Drive allows mounting S3 as a local partition. It is very efficient and allows a number of options to be tweaked to adjust the performance depending on usage.

We have two roles, frontends and backends. On the frontends, Apache serves Drupal, which uses MySQL on the backends, with everything running over an OpenVPN network. Multiple instances can run to provide high availability and load balancing for the frontends, and high availability for the backends (to be expanded to load balancing later).

Within the base image we have created an init script that will mount S3 as well as a script that will create an AMI from the current instance.

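# cat create_image
#!/bin/bash
# Bundles the running instance into an AMI, uploads it to S3 and registers it.
# Expects AWS_PK_PEM, AWS_CERT_PEM, AWS_USER, AWS_AWSID and AWS_SECRET in the environment.

set -e

if [ -z "$1" ]; then
    echo "Usage: $0 revision"
    exit 3
fi

# Bundle the root volume into /mnt as centos-<revision>
ec2-bundle-vol -d /mnt -k "$AWS_PK_PEM" -c "$AWS_CERT_PEM" -u "$AWS_USER" -r i386 -p "centos-$1"
# Upload the bundle to the baseimage S3 bucket
ec2-upload-bundle -b baseimage -m "/mnt/centos-$1.manifest.xml" -a "$AWS_AWSID" -s "$AWS_SECRET"
# Register the uploaded manifest as an AMI
ec2-register "baseimage/centos-$1.manifest.xml"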

What's next:

  • Branch the base image into Frontend and Backend;
  • Create Frontend and Backend groups;
  • Deploy software to each image.
