Friday, September 16, 2016

Install PowerShell v5 on Windows Server 2008 R2 or 2012 R2

Prior to updating to PowerShell version 5, make sure you are running the latest version of the .NET Framework (.NET 4.6.1 as of this writing). See Checking .NET Versions.

Check Current Version of PowerShell
Run either $Host.Version or $PSVersionTable.PSVersion
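For example, from any PowerShell prompt:

```powershell
# Either of these shows the version of the current PowerShell engine/host
$PSVersionTable.PSVersion
$Host.Version
```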

PowerShell 5 is part of Windows Management Framework (WMF) version 5 and can be downloaded from Microsoft (2008 R2 or 2012 R2). Select the correct download for your version of Windows Server and run the installer. After completion verify the version.

Wednesday, September 14, 2016

Install .NET 4.6.1 on Windows Server 2008 R2 or 2012 R2

Start by downloading .NET 4.6.1 from Microsoft. Launch the installer - I typically do this by running the command from an elevated PowerShell. Accept the license agreement and wait....
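As a sketch, assuming the 4.6.1 offline installer's file name (substitute the name of the file you actually downloaded):

```powershell
# Launch the downloaded installer from an elevated PowerShell prompt
# (file name is an example and may differ for your download)
Start-Process -FilePath .\NDP461-KB3102436-x86-x64-AllOS-ENU.exe -Wait
```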

Check Version(s) of .NET Installed
From an elevated PowerShell run the following set of commands:

Get-ChildItem 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP' -Recurse |
Get-ItemProperty -Name Version,Release -EA 0 |
Where-Object { $_.PSChildName -match '^(?!S)\p{L}' } |
Select-Object PSChildName, Version, Release, @{
  Name = "Product"
  Expression = {
    switch ($_.Release) {
      378389 { [Version]"4.5" }
      378675 { [Version]"4.5.1" }
      378758 { [Version]"4.5.1" }
      379893 { [Version]"4.5.2" }
      393295 { [Version]"4.6" }
      393297 { [Version]"4.6" }
      394254 { [Version]"4.6.1" }
      394271 { [Version]"4.6.1" }
    }
  }
}
The result will look something like this.

Thursday, May 14, 2015

Use PowerShell to Save Excel Worksheet as CSV

I'm working on a PowerShell script to launch some virtual servers on VMware based on input from a CSV file. To create the CSV file I'm using Excel, and I have more than one worksheet in the Excel workbook. Since it's a bit of a hassle to use Excel's GUI to save a given worksheet as a CSV, especially when this has to be done more than once, I set out to find a way to do this programmatically. At first I was torn between using a macro within Excel and a PowerShell method. I settled on PowerShell because I'm already using a PS script to launch the servers, so I wrote a block of code to open my spreadsheet and create a CSV from the desired worksheet.

Excel spreadsheet file.
In my quest to figure this out I ran across a lot of pieces, but not a complete solution, and as is often the case on the web I couldn't find explanations of some of the solutions. That's what prompted me to save this all here. I'll step through some of the logic, then post the script in its entirety at the bottom. The best resource I found, although not complete (for my needs), was this great post from LazyWinAdmin.

First thing is to set some variables specific to my environment, the path and name to both my input (xlsx) and output (csv) files, and the name of the tab or worksheet in the spreadsheet.
$Sheet = "CreateVM-CA"
$xls = "C:\Temp\PC\PCTest.xlsx"
$csv = "C:\Temp\PC\$Sheet.csv"

NOTE: Make sure you save your Excel file as this script reads the file from disk & not what's in an open workbook.

Next we will open Excel, the workbook, and the specific worksheet we want to export. I'm also using $objExcel.Visible = $False to prevent the Excel GUI from opening, and $objExcel.DisplayAlerts = $False to prevent messages such as being prompted to overwrite the file if it already exists (see below).
$objExcel = New-Object -ComObject Excel.Application
$objExcel.Visible = $False
$objExcel.DisplayAlerts = $False
$WorkBook = $objExcel.Workbooks.Open($xls)
$WorkSheet = $WorkBook.sheets.item("$Sheet")

Prompted to replace existing file (when not using DisplayAlerts=$False)
A useful note (again thanks to LazyWinAdmin) is that we can verify Excel has opened, and that we are working with the desired workbook, using the following, which lists the available worksheets within the opened workbook.
$WorkBook.sheets | Select-Object -Property Name

Listing Excel worksheets with PowerShell.
Saving the file. When saving the file we will, of course, specify that it is a CSV, but the file type (CSV) is also designated with the number 6. For some reason you can't pass the number 6 directly in the SaveAs command, so we are using the variable $xlCSV. (The number comes from Excel's XlFileFormat enumeration, which defines a constant for each file type.)
$xlCSV = 6
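With the format constant in a variable, the actual save is a single call; a minimal sketch using the variables defined above:

```powershell
# Save the worksheet as CSV; 6 (xlCSV) selects the CSV file format
$WorkSheet.SaveAs($csv, $xlCSV)
```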

Closing/quitting Excel. Initially I was using just the first command to quit Excel. However, I discovered that it remained open and subsequent executions of my script would fail to write the file because the open Excel process had the file locked. So after a little research and testing I found the second command which kills the process, and prevents the CSV file from being locked.
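Here is a sketch of those two commands (note the second forcibly stops every running Excel process, so use it with care if you have other workbooks open):

```powershell
# First command: ask Excel to quit cleanly via COM
$objExcel.Quit()

# Second command: make sure no EXCEL process lingers and keeps the CSV locked
Stop-Process -Name "EXCEL" -Force -ErrorAction SilentlyContinue
```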

This little script works beautifully now and does just what I need by opening a specific worksheet from an Excel workbook, then saving it as a CSV file. Here's the entire script.
$Sheet = "CreateVM-CA"
$xls = "C:\Temp\PC\PCTest.xlsx"
$csv = "C:\Temp\PC\$Sheet.csv"
$objExcel = New-Object -ComObject Excel.Application
$objExcel.Visible = $False
$objExcel.DisplayAlerts = $False
$WorkBook = $objExcel.Workbooks.Open($xls)
$WorkSheet = $WorkBook.sheets.item("$Sheet")
$xlCSV = 6
$WorkSheet.SaveAs($csv, $xlCSV)
$objExcel.Quit()
Stop-Process -Name "EXCEL" -Force -ErrorAction SilentlyContinue

Output CSV file.

Sunday, April 19, 2015

History of Internet Addressing

Volumes have been written about the Internet and its protocols so what I have here is nothing new or ground-breaking. It is, however, a good overview of the history of IP addressing, related Request For Comments (RFC) and other links.

I've been working with the Internet Protocols for a while, including data center builds, network operations and teaching various Cisco (CCNA), Microsoft (MCSE), Novell (CNE), etc. certification and training courses over the years. And although I do know a fair bit about the Internet and IP in particular I actually learned a few new things recently. Or I should say while doing a little research for a new project I had some thoughts gel in my mind which led me to record this information. This will be a useful resource for me, my future students, and hopefully you as well.

If you'd like to cut to the chase scroll down to the summary/timeline at the end.

Origins of The Internet
Host-to-host communication via packet-switched networks was in its infancy in the late 1960's with the Advanced Research Projects Agency Network (ARPANET), from which grew TCP/IP, AKA "The Internet Protocol Suite", with IP and IP addressing at its core. Two key pioneers of the Internet and the godfathers of IP, Bob Kahn and Vint Cerf, had to go through a lot of experimentation and work before finally coming up with what we know today as IPv4. First we'll discuss the IP versions prior to 4, then subsequent enhancements of IPv4, and finally the next generation of IP. The versions prior to 4 were part of the initial experimentation. An example of one of these earlier versions is IPv2, as documented in Draft Internetwork Protocol Description Version 2. This and the other preceding versions are listed in table 1 along with the dates they were published.

Table 1
Version  Date
0        March 1977
1        January 1978
2        February 1978 version A
3        February 1978 version B
4        August 1979 version 4

In January of 1980 Jon Postel, who was editor and/or writer of many RFCs, published RFC 760, "DOD STANDARD, INTERNET PROTOCOL," the first IPv4 RFC. It is important to note that address classes, subnetting, CIDR, etc. had not yet been defined at this point; all of those things came along later to enhance Internet Protocol addressing. So in this original definition IPv4 used the first 8 bits as the network address and the last 24 bits as the "host" portion of the address, which at the time was termed the Source or Destination Local Address (see RFC 760, pages 14 and 22 for exact wording).

In a 2011 interview Vint Cerf, one of the Internet's founding fathers, said of IPv4, "I thought it was an experiment and I thought that 4.3 billion [IPv4 addresses] would be enough to do an experiment." He said he never thought "this experiment wouldn't end." So for an "experiment" IPv4 is doing pretty damn good! Albeit with a few enhancements over the next several years.

Looking back to the 1970's, computers hadn't yet reached the mainstream, so the pioneers didn't imagine that within a few short years we would see an explosion of networks and devices connecting to the Internet. This is clearly evident in RFC 762, titled "Assigned Numbers," which lists 33 "networks" out of a possible 254 (0 and 255 being reserved). Again, IP address classes hadn't yet been defined.... This RFC was published in January 1980, but it also lists when IP versions 0-3 were initially defined, giving us a little more insight into the timeline of IP address versions.

From this we can see that IP addressing began to be defined around March 1977 with IPv0. Then by August 1979 IPv4, the main version on which the Internet was built and is still widely in use today, was defined. Now, over 35 years on, the Internet runs mainly with the "experimental" IP (v4) addressing created by Vint Cerf and Bob Kahn.

Classful IPv4 Addresses
Fairly quickly it was apparent to these Founding Fathers that the initial 8/24 (nnnnnnnn.hhhhhhhh.hhhhhhhh.hhhhhhhh) addressing scheme defined by IPv4 needed an enhancement. In September 1981 RFC 791 defined IP address classes in order to "provide for flexibility in assigning address to networks and allow for the large number of small to intermediate sized networks the interpretation of the address field is coded to specify a small number of networks with a large number of host, a moderate number of networks with a moderate number of hosts, and a large number of networks with a small number of hosts." Since this was just an enhancement or further definition of the use of the 32-bit IPv4 address scheme, it wasn't necessary to increment the version.

This enhancement to IPv4 is known as classful IP addressing. Table 2 provides a quick view of classes A, B and C as a reference. Much more about these classes is available elsewhere on the Internet.

Table 2
Class  Leading bits  Network bits  Host bits  Number of networks  Addresses per network  Start address  End address
A      0             8             24         128 (2^7)           16,777,216 (2^24)
B      10            16            16         16,384 (2^14)       65,536 (2^16)
C      110           24            8          2,097,152 (2^21)    256 (2^8)
NOTE: classful addressing was superseded by classless (AKA CIDR) addressing. Even so, it remains common to discuss classful addresses and to see them referenced in documentation. While technically this shouldn't be done, this network engineer hasn't seen much, if any, decrease in classful address references in my 25 years in the business.

In 1984 another evolution of IPv4 addressing was introduced: Internet subnets, with RFC 917. This explains both the need for and the method of using part of the host portion of the address for a subnet, effectively creating something like nnnnnnnn.ssssssss.hhhhhhhh.hhhhhhhh, where a "Class A" address is subnetted by taking 8 bits (as an example; more or fewer could be used, of course) of the host address field to allow for up to 256 subnets. This necessitated a way for hosts to know which portion of their address was used for the network and subnet vs. the host portion of the address, hence the birth of the subnet mask. In the case listed above just about everyone familiar with IP addressing will know the mask would be In August 1985 RFC 950, "Internet Standard Subnetting Procedure," further defined subnetting and subnet masks.

NOTE: In several early RFCs you'll see the term "catenet." This deprecated term was defined in the early days of packet-switched networking and referred to a network of networks, or an internetwork or internet (lower case "i").

Governing Bodies
By the mid- to late-1980's organizations like the Internet Engineering Task Force (IETF), Internet Assigned Numbers Authority (IANA), its parent Internet Corporation for Assigned Names and Numbers (ICANN), and others were formed to steer, define and coordinate protocols, growth and enhancements of the Internet and Internet Protocols.

Classless IP Addressing - CIDR
In September 1993 the IETF introduced Classless Inter-Domain Routing (CIDR), a method for allocating IP addresses and routing IP packets. CIDR replaced the previous classful addressing architecture, with the goals of slowing the growth of routing tables on routers across the Internet and helping slow the rapid exhaustion of IPv4 addresses. Once again the "experimental" IPv4 address space was modified to extend its life. Due to the lack of scalability of classful (A, B, C, etc.) addresses, CIDR provides the ability to "subnet" an IPv4 address at virtually any bit boundary. This is done by appending the number of bits used for the network (technically network/subnet) to the IP address. For example, a /22 network (e.g. provides for up to 1022 hosts. Much is available on CIDR around the web, including numerous subnetting/CIDR charts, but a few references are Wikipedia's "Classless Inter-Domain Routing", RFC 1518, "An Architecture for IP Address Allocation with CIDR", and RFC 1519, "Classless Inter-Domain Routing (CIDR): an Address Assignment and Aggregation Strategy."
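The 1022-host figure follows directly from the bits left over for hosts; a quick PowerShell sketch (the helper function is mine, not part of any standard module):

```powershell
# Usable hosts in a CIDR block = 2^(32 - prefix length) - 2
# (subtracting the network and broadcast addresses)
function Get-UsableHostCount {
    param([int]$PrefixLength)
    [math]::Pow(2, 32 - $PrefixLength) - 2
}

Get-UsableHostCount -PrefixLength 22   # 1022
Get-UsableHostCount -PrefixLength 24   # 254
```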

Private Addresses, NAT & PAT
As early as the 1980's it was apparent that IPv4's 32-bit address space would become exhausted. A variety of factors contributed to this, the main one being the explosive growth of networks and devices worldwide participating in and connecting to the Internet. So in addition to the aforementioned classful, classless and subnetted enhancements of IPv4, network address translation (NAT) and port address translation (PAT) were developed. Currently NAT and PAT are widely deployed, so much so that they are the norm. But they actually break the end-to-end communications originally envisioned with the Internet and introduce other problems or challenges. This is something IPv6 addresses, but let's not get ahead of ourselves. NAT, and more specifically PAT, has become a popular and essential tool in conserving global address space allocations in the face of IPv4 address exhaustion.

Of course in order to effectively use NAT/PAT a set of private, non-publicly routable IP addresses had to be defined, which was done in February 1996 in RFC 1918, "Address Allocation for Private Internets." (See also, RFC 5735, "Special Use IPv4 Addresses.")

Table 3 - Private IPv4 Address Ranges     -  (10/8 prefix)   -  (172.16/12 prefix)  - (192.168/16 prefix)

This discussion wouldn't be complete without touching on version 5. It was an experimental family of protocols for streaming voice, video, etc. called the Internet Stream Protocol, first defined in 1979, but it was never fully developed. If you are so inclined see RFC 1190, "Experimental Internet Stream Protocol."

Along with enhancing IPv4, by 1994 the IETF began to define the next generation of IP, IPv6, sometimes called IPng. This took several years, and in December 1998 RFC 2460, "Internet Protocol, Version 6 (IPv6) Specification," was published. Since IPv6 was defined about 20 years after IPv4, a lot of the older protocol's shortcomings were addressed. In addition to adding native security to IP with IPsec and restoring the end-to-end communications model (doing away with NAT), IPv6 increases the address space. A lot!

By using a 32-bit address IPv4 has a total of about 4.3 billion (2^32) available numbers. IPv6 uses 128 bits, which provides a bit over 340 undecillion (2^128) addresses. Table 4 shows just how large this address space is. With such a large address space it is not necessary to define address classes (no "classful") in IPv6, nor is it necessary to use a subnet mask. Rather, since IPv6 was built upon the concept of Classless Inter-Domain Routing, IPv6 addresses are written with a trailing /xx (example: 2001:db8:abcd:3f00::/64). In most cases organizations will receive a /64 or /48 address space from either an ISP or IANA, then they will be able to use CIDR to suit their needs.

Table 4
IPv4 addresses (2^32)  - 4,294,967,296
IPv6 addresses (2^128) - 340,282,366,920,938,463,463,374,607,431,768,211,456

Although IPv4 was an "experimental" addressing scheme born three decades ago it has seen numerous enhancements with classes, subnetting, CIDR and NAT/PAT to extend its lifespan, and it's still going strong. IPv6 has been around for nearly two decades and its use is definitely picking up steam but I believe IPv4 will be around for quite some time. In fact, it's likely the two will run in parallel perhaps for as long as the Internet exists.

  • Mar 1977: IP addressing first defined and documented, starting with IPv0
  • Aug 1979: IPv4 defined (RFC 760)
  • Sep 1981: IPv4 classful addresses defined (RFC 791)
  • Oct 1984: IPv4 subnetting introduced (RFC 917)
  • Aug 1985: IPv4 subnetting further defined (RFC 950)
  • Sep 1993: IPv4 classless (AKA CIDR) addresses defined (RFC 1518, and RFC 1519)
  • Feb 1996: IPv4 private addresses and NAT/PAT (RFC 1918, and RFC 5735)
  • Dec 1998: IPv6 defined (RFC 2460)

Thursday, March 6, 2014

How Did One Guy Build A World Class Web Infrastructure Serving Over One Billion Calls Per Day?

On Monday, December 15, 2008 I went to work for an online video start-up. On my first day our video players loaded on 384 of our partners' web pages. As you can see from the first chart, that was actually a pretty good day for us (our biggest day to date, in fact), as we were averaging fewer than 100 pages a day. During my interview with the founder and CEO he confidently told me the company would be one of the 1000 busiest sites within one year. I believed him.

Chart 1 - 384 page views on Dec 15, 2008
I had a comfortable job as a Network Operations Director with a mature and stable company, but I felt like I needed a change, so I was casually looking for something new. A colleague had recently found a job which had been posted on craigslist.org, so I was trawling IT position postings there. I was frustrated because most jobs wanted specific skills, like Active Directory, Exchange, routers, firewalls, etc. I didn't want to do just one thing or focus on a narrow area. Then I came across a post on craigslist looking for a utility player, one who had varied experience and could wear many hats. It was a match made in heaven. (I later found out this company was frustrated by not being able to find someone with a wide range of experience, as all previous applicants had been siloed in specific areas. They had actually given up and pulled the post less than a day after I ran across it.)

Based on the CEO's statement that we were going to become one of the one thousand busiest web services on the Internet I was tasked with building a scale-able system that could grow rapidly (along with all other IT related duties, but that's another story...). Oh, and because we were a start-up funded solely by friends and family of our founder I had an extremely lean budget for equipment, facilities and personnel. Basically the budget was zero.

Admittedly I was a little naive, but I'm also optimistic and very determined. So I set out to do what I was asked.

At the time we were sharing a handful of servers with another start-up in a colo across the country. I had 20 year-old 1U Dell servers, a couple of gigabit switches, two entry-level Cisco firewalls, and two low-end load balancers. I quickly put together a fairly lean list of servers and networking equipment I needed and tried to get a few hundred grand to buy it and set up at least one separate location. The answer came back that I couldn't even spend one tenth of what I needed, and I had to figure out how to make things work without any capital expenditure.

Then on January 19-24, 2009, while I was trying to figure out how to work miracles, we had our first Slashdot-effect event when one of our partners had an article containing our player featured on politico.com (note: at the time we were mainly politically oriented; now we are a broad-based news, entertainment and sports organization). We went from averaging fewer than 100 player loads (AKA page views) per day to over 500,000 in a single day. Needless to say our small company was ecstatic, but I was a bit nervous. While our small infrastructure handled the spike, it did so just barely.

Chart 2 - January 19-24, 2009 Slashdot effect
It was at this new company that I was introduced to Amazon Web Services, and I started dabbling with EC2 and S3 right away. In fact, we started running our corporate website on EC2 a little over a month before I started, and we ran it on the exact same server for just over five years.

Admittedly I was somewhat hesitant to use AWS. First there was the concept of every server having a public IP address, then the fact that they didn't have an SLA, and finally that the only way to load balance was to build your own with something like HAProxy on EC2 servers. But the compelling factors (elasticity, pay as you go, no CapEx, etc.) were really attractive, especially to someone like me who didn't have any money for equipment, nor could I hire anyone to help build and maintain the infrastructure.

Sometime in the spring of 2009 when AWS announced Elastic Load Balancing I was swayed and fully embraced moving to "the cloud." I started right away copying our (~200 GB) video library and other assets to S3, and started a few EC2 servers on which I started running our web, database and application stacks. By August of 2009 we were serving our entire customer-facing infrastructure on AWS, and averaging a respectable quarter million page views per day. In October of that year we had our second 500,000+ day, and that was happening consistently.

Chart 3 - 2009 Traffic
Through most of 2009 our company had 1 architect, 0 DBA's (so this job defaulted to me), and 1 operations/infrastructure guy (me), and we were outsourcing our development. We finally started to hire a few developers and brought all development in-house, and we hired our first DBA, but it was still a skeleton crew. By the end of that year we were probably running 20-30 EC2 servers, had a couple ELB's, and stored and served (yes, served) static content on S3. Things were doing fairly well and we were handling the growth.

Chart 4 - Explosive Growth in 2010
2010 was a banner year for us. In Q1 we surpassed 1 million, 2 million and even 5 million page views per day. And by Q3 we were regularly hitting 10 million per day. Through it all we leveraged AWS to handle this load, adding EC2 servers, up-sizing servers, etc. And (this is one of my favorite parts) didn't have to do a thing with ELB as AWS scaled that up for us as needed, automatically.

We were still a skeleton crew, but finally had about ten people in the dev, database and operations group(s). Through this all and well beyond we never had more than one DBA, and one operations/infrastructure guy.

I can't say this growth was without pain, though. We did have a few times when traffic spikes would unexpectedly hit us, or bottlenecks would expose themselves. But throughout this time we were able to optimize our services, making them more efficient, more able to grow and handle load, and even able to handle more calls per server, driving costs (on a per-call basis) down considerably. And, yes, we benefited greatly from Amazon's non-stop price reductions. I regularly reported to our CEO and others about how our traffic was growing exponentially but our costs weren't. Win, win, win!

I'm a bit of a data junkie and I generate and keep detailed information on the number of calls/hits to our infrastructure, the amount of data returned per call, and ultimately the cost per call. This has enabled me to keep a close eye on performance and costs, and I've been able to document numerous wins and fails. I've identified when particular deployments have begun making more calls or returning more data, usually causing slower performance and always costing more money. I've also been able to identify when we've had big wins by improving performance and saving money.

The main way I've done this is to leverage available CPU capacity when servers have been underutilized on evenings and weekends. Currently on a daily basis I analyze close to 1 billion log lines, effectively for free. This is a high-level analysis looking at things like numbers of particular calls, bandwidth, HTTP responses, browser types, etc.

Starting in 2009 we really started to focus on making our systems more efficient and making them faster, more resilient and more scale-able. And I've been able to measure the results of those efforts and we recorded several wins, each time making our products faster, better and less expensive to deliver.

Chart 5 - More Growth in 2011
2011 was another banner year for us and we crossed the 20 million and 30 million page views per day thresholds. When our video products load on a given page as many as 20 calls are made to both static and dynamic content, roughly half of each type. All the static files (HTML, CSS, JS, images, video, etc.) are served through CDN's. But all the dynamic calls (embed, player services and analytics) are served through EC2 servers behind Elastic Load Balancers. And these are where I think we really shine. These are the services where we've really fine tuned their performance mentioned above.

Chart 6 - Continued Growth in 2012 and 2013
In 2012 and 2013 we saw more growth, hitting as many as 78 million page views in a single day, and at present on an average day our products load on 60 million pages across the web. This translates to about 500 million calls to static content served through CDNs, and another 500 million daily calls to our web services (chart 7 shows four of our busiest web services, but not all of them) powered by web and database servers running in EC2 behind Elastic Load Balancers. Half a billion dynamic service calls per day. Rather impressive!

Chart 7 - AWS CloudWatch Stats Showing Over 400,000,000 Calls Per Day
Not only have we been able to leverage the zero CapEx, low OpEx, high availability and scalability of AWS, but we were able to build all this with a very small team. In the fall of 2012 we had a couple of nearly 80 million page view days & at that time we had less than 10 people in the dev, database and operations groups (Note: to that point we never had more than 1 DBA and 1 network operations guy). Since I was the operations “group” up until that time I am blown away that we could build a world-class infrastructure serving at the scale we do with such a small crew. I believe it’s unheard of to build and run a system like ours with only 1 operations guy, and I know that wouldn't have been possible without AWS.

AWS Certified Solutions Architect

Although I've been working with Amazon Web Services for a few years now, I only recently learned about AWS certifications. So today I went to an authorized testing center (for the first time in several years) and took a certification exam. And I didn't do too badly.

Back in the day I took dozens of exams for Novell's CNE, Microsoft's MCSE, Citrix, and Cisco, so I wasn't too nervous about taking another one. All in all the AWS certification was fairly straightforward and comprehensive. The exam guide indicates that it covers topics from EC2, to S3, to RDS, ELB, CloudFront, CloudFormation, etc. And that's true. Study up and get certified.

Now I can officially use the AWS Certified Solutions Architect - Associate logo (above).

Tuesday, February 4, 2014

My EC2 Server That Ran For Over 5 Years

Today I observed a moment of silence as I shutdown our longest running Amazon Web Services EC2 instance. We started this instance on the evening of October 29, 2008 when we were first getting acquainted with AWS, and stopped it today, February 4, 2014. That's 5 years, 3 months, 6 days.

Tuesday, November 19, 2013

Crunching Billions of Log Lines With LogParser

Yesterday three different people came to me asking which browsers, and in what numbers, our "users" use to access our content. Since our products load on about 50-80 million pages a day we have a pretty good sample size. And a hell of a lot of log files to collect and analyze. Since I already summarize this info on each server daily, it was rather simple to gather these summaries and tally them for the month.

These calls are from one of our services that handles an average of about 50,000,000 calls per day. Of course these calls are spread across several web servers, so I took the daily summary from each server & combined them per day, then took these daily summaries and combined them for last month (October 2013). In total this represents only about 5-8% of our total daily calls, but this particular service is the first called by many of our products so it is the best place from which to gather info like user agent distribution.

UserAgent UAHits UAPercent
Internet Explorer 516,408,427 34.05%
Chrome 318,859,924 21.02%
Firefox 262,120,296 17.28%
Apple IOS 165,269,836 10.90%
Safari 136,577,103 9.01%
Android 77,221,373 5.09%
Other Mobile 10,372,620 0.68%
Other User Agents 9,694,239 0.64%
Search Bot 5,097,159 0.34%
Opera 3,938,622 0.26%
Monitoring 5,527,439 0.36%
IEMobile 2,698,171 0.18%
BlackBerry 1,161,637 0.08%
No User Agent 1,119,479 0.07%
Monitoring2 307,882 0.02%
Gaming Device 152,633 0.01%
CMS 44,012 0.00%
wget 6,034 0.00%
curl 1,784 0.00%

Total Hits 1,516,579,920

For our users Internet Explorer reigns supreme, but the percentage of hits for IE is down quite a bit from my sample last spring where it represented just over 38% of the total. Since then IOS slipped a little from 11.3% to 10.9%, and Android rose from 3.12% to 5.09%. In total nearly 17% of our users access our content with "mobile" devices (includes phones, tablets, etc.). I suspect this is a little lower than the average of some corners of the Internet, but since the majority of our users access our content during the day on weekdays (makes me question their productivity while at work....) it's no surprise it's fairly low & desktop browsers are higher.

I've written much about my love affair with Microsoft's LogParser, and that love continues. All 1.5+ billion of these log lines were crunched with LogParser using only the CPUs of the servers that served the content, which are running 24/7 anyway. The bottom line is this info was gathered (along with a lot of other info) for free! That's right, instead of spending thousands or even tens of thousands of dollars on fancy third-party log analyzing tools, I've leveraged available CPU time, free software (LogParser itself) and a little ingenuity to collect a great deal of useful information from a rather large amount of log files.

Friday, May 17, 2013

Improving Web Services (Part II)

Last year I wrote about how we improved web server performance with some fairly small changes, and about how I keep an eye on these metrics with reports I create analyzing my web server logs with Microsoft's LogParser. This is a follow-up to those.

Recently we did an upgrade to our platform. One of the "improvements" was that our amazing DBA (he is truly amazing!) tightened up some of the SQL stored procedures used for returning dynamic data to our video players (playlists, video metadata, etc.). These "player services" get hit around 300,000,000 to 400,000,000 times per day, so even a small improvement can have far-reaching impact.

As I'm sure is common across much of the web, our traffic is lower at certain times, so I leverage the lower CPU load in the middle of the night to crunch my web server logs across my fleet of web servers. As this RRD Tool graph shows, CPU load is considerably lower overnight, except when the server is processing its own log file analysis, which takes about an hour or so on each server. It's also worth noting that average response times are not negatively affected during this time - I know, because I keep a close eye on that!

Among the various pieces of data gleaned by this log processing is the time (in milliseconds) each response takes, as recorded by the server. This is very valuable information because it lets me definitively measure the impact of various factors: systems deployments (such as the one that spurred this post...), performance under various load conditions (peak times vs. slow times), and performance during operations or maintenance windows (crunching logs, system updates, patches, etc.). Last but not least, when people come to me saying anecdotally "customers are saying our system is slow...", I can show them with absolute certainty, both historically and at any point in time (I have some really good methods for running ad hoc reports to get up-to-the-minute stats), how our system is performing or has performed.
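To give a feel for what this kind of response-time crunching looks like, here is a minimal sketch in plain Python (not my actual LogParser report) that summarizes the W3C "time-taken" field from IIS-style log lines. The field layout and file contents are illustrative assumptions.

```python
# Hypothetical sketch: summarize the IIS "time-taken" field (milliseconds)
# from W3C-format log lines. Assumes time-taken is the last field, as in
# the illustrative #Fields directive below.
def summarize_time_taken(log_lines):
    times = []
    for line in log_lines:
        if line.startswith("#"):       # skip W3C header directives
            continue
        fields = line.split()
        times.append(int(fields[-1]))  # time-taken is the final field here
    times.sort()
    avg = sum(times) / len(times)
    p95 = times[int(len(times) * 0.95)]  # simple nearest-rank 95th percentile
    return avg, p95

sample = [
    "#Fields: date time cs-uri-stem sc-status time-taken",
    "2013-05-17 01:00:00 /player.aspx 200 120",
    "2013-05-17 01:00:01 /player.aspx 200 80",
    "2013-05-17 01:00:02 /playlist.aspx 200 400",
]
print(summarize_time_taken(sample))  # (200.0, 400)
```

Tracking both the average and a high percentile matters: averages hide the slow tail that users actually complain about.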

So any time we roll out a change of any kind I look at the data to understand the impact(s), if any. After this deployment of the new and improved SQL stored procedures I'm seeing approximately a 30% decrease in response times. That's a huge improvement!

Besides pages loading faster at the client side, this is also producing noticeably lower load on both the front-end web servers and the database servers. That means we have more available capacity, or headroom, with the same number of servers, or I could potentially shut down some of our AWS EC2 instances and save money. We have now set the bar even higher for the performance of our systems, and any future "improvements" or modifications can be accurately measured against this baseline.

I love the fact that I have such good insight into these systems and can measure any impact of changes or varying load with great accuracy!

Sunday, March 31, 2013

Using Log Parser to Extract User Agents From Web Log Files

I've been meaning to write a follow-up (or two...) to my I LOVE LogParser post from a few months ago. The time has finally arrived.

Every day I collect and analyze (at least at a high level) somewhere around 1/2 a billion (yes, billion) web server log lines, and about that many more from a couple of CDNs. Needless to say, that's quite a bit of information. Besides the mundane stuff like number of hits, bandwidth served, and hits per file type, I've recently buckled down and written a few pretty good scripts with LogParser to extract and count User Agents.

I know even this isn't all that sexy or sophisticated, and that numerous companies selling analytics have already solved this, but since I have the data right at my fingertips, why should I pay someone else tens of thousands of dollars to analyze my logs and give me a pretty web interface? Yeah, I'll admit that would be nice, but for what I'm after I'm able to get what I need with just a little elbow grease.

This pursuit actually began several months ago when my boss came to me and asked how many, and what types of, Android devices were hitting our services. Since our product runs on numerous sites around the web, we get all kinds of traffic. And, of course, many people (our partners, etc.) insist our product has to run on this Android device or that Android device. But with so many devices out there, all running so many different OS versions, it's crazy. This image (from this phandroid.com article) shows it quite well.

Figure 1 - Android Fragmentation.
At this point I must give credit where credit is due. The LogParser queries below are by no means unique, nor are they original. The best one I found was a little dated, so I took it, made it a little prettier, and adapted it for three distinct outputs. First is my daily User Agents Summary report (below). This is a high-level report showing the type and distribution of the browsers hitting our sites. While others publish similar information regularly, this applies directly to us and to our products, which gives us good, reliable information we can use both to inform our partners and to decide where to focus development and QA resources.

The numbers in this summary report and the others in this post come from a single web server (I have lots more) for one day (March 28, 2013 UTC), so this is current as of this posting. (See below for the exact LogParser query I'm using for this summary report.)

UserAgent UAHits UAPercent
Internet Explorer 2,611,065 38.03%
Chrome 1,322,187 19.26%
Firefox 1,241,372 18.08%
Apple IOS 775,693 11.30%
Safari 620,618 9.04%
Android 214,479 3.12%
Other Mobile 25,171 0.37%
Opera 19,262 0.28%
Other User Agents 15,342 0.22%
IEMobile 6,188 0.09%
No User Agent 5,061 0.07%
BlackBerry 4,481 0.07%
Search Bot 3,407 0.05%
Gaming Device 590 0.01%

This summary is great and very useful, but we certainly need some detail. Since the detail report is over 100 lines long I'm only going to show about a dozen lines here. Again, this detail is great for us so we know which versions of which browsers are being used to access our content at any given point.

UserAgent UAHits UAPercent
IE 9 1,170,034 16.99
Firefox 19 968,039 14.06
Chrome 25 846,618 12.30
IE 8 815,316 11.84
Safari 5 622,317 9.04
iPad 553,625 8.04
IE 10 331,290 4.81
Chrome 26 289,195 4.20
IE 7 286,986 4.17
iPhone 234,653 3.41
Android 4.1 72,227 1.05
Android 2 64,688 0.94
Android 4.0 54,877 0.80

Finally, the thing I was really after - what type of Android devices are being used to access our content? Just like the detail report this is only a partial list.

UserAgent UAHits PercentOfAndroid
Other Android Devices 52,470 24.11
Samsung Galaxy S III 25,903 11.90
Motorola Droid RAZR 16,558 7.61
Samsung Galaxy Tab 12,285 5.64
Samsung Galaxy S II 9,589 4.41
Samsung Galaxy Nexus 7 6,772 3.11
Samsung Galaxy Nexus 6,458 2.97
Samsung Galaxy Note II 6,270 2.88
HTC EVO 6,177 2.84
B&N Nook 4,271 1.96
Motorola Droid Bionic 3,736 1.72
Asus Transformer Pad 3,279 1.51
Motorola Droid RAZR M 3,140 1.44
Motorola Droid X 2,479 1.14
Motorola Droid 4 2,308 1.06
Motorola Xoom 2,278 1.05
Kindle Fire 1,498 0.69
Kindle Fire HD 728 0.33

It's no surprise that there are dozens and dozens of Android devices all browsing the web and hitting sites like ours. One little surprise is that Barnes & Noble's Nook registered higher than the Kindle Fire. So many devices, so little time.
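The device names in the table above come from matching model substrings in the user-agent string. A rough Python sketch of that bucketing follows; the model-number patterns here are illustrative examples I've chosen, not the exact list behind my report.

```python
# Hypothetical sketch: bucket Android user-agent strings into friendly
# device names by substring match, first match wins. The patterns below
# are illustrative, not the exact ones used for the report above.
ANDROID_DEVICES = [
    ("SGH-T999", "Samsung Galaxy S III"),
    ("GT-I9100", "Samsung Galaxy S II"),
    ("DROID RAZR", "Motorola Droid RAZR"),
    ("Kindle Fire", "Kindle Fire"),
    ("NOOK", "B&N Nook"),
]

def classify_android(user_agent):
    for pattern, device in ANDROID_DEVICES:
        if pattern in user_agent:
            return device
    return "Other Android Devices"

ua = "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; SGH-T999 Build/JZO54K)"
print(classify_android(ua))  # Samsung Galaxy S III
```

The catch-all "Other Android Devices" bucket is why it tops the table: any model string not explicitly listed lands there.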

Here's the Log Parser query I'm using for the User Agent summary (above).

select case strcnt(cs(user-agent),'Android') when 1 THEN 'Android'
else case strcnt(cs(user-agent),'BlackBerry') when 1 THEN 'BlackBerry'
else case strcnt(cs(user-agent),'iPad') when 1 THEN 'Apple IOS' when 2 THEN 'Apple IOS'
else case strcnt(cs(user-agent),'iPhone') when 1 THEN 'Apple IOS' when 2 THEN 'Apple IOS'
else case strcnt(cs(user-agent),'iPod') when 1 THEN 'Apple IOS' when 2 THEN 'Apple IOS'
else case strcnt(cs(user-agent),'Opera') when 1 THEN 'Opera'
else case strcnt(cs(user-agent),'Chrome') when 1 THEN 'Chrome'
else case strcnt(cs(user-agent),'Safari') when 1 THEN 'Safari'
else case strcnt(cs(user-agent),'IEMobile') when 1 THEN 'IEMobile'
else case strcnt(cs(user-agent),'MSIE') when 1 THEN 'Internet Explorer'
else case strcnt(cs(user-agent),'Firefox') when 1 THEN 'Firefox'
else case strcnt(cs(user-agent),'Googlebot') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Yahoo!+Slurp') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'bingbot') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Yandex') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Baiduspider') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'loc.gov') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'crawler@alexa.com') when 1 THEN 'Search Bot' when 2 THEN 'Search Bot'
else case strcnt(cs(user-agent),'Mobile') when 1 THEN 'Other Mobile'
else case strcnt(cs(user-agent),'PlayStation') when 1 THEN 'Gaming Device'
else case strcnt(cs(user-agent),'Nintendo') when 1 THEN 'Gaming Device'
else case strcnt(cs(user-agent),'curl') when 1 THEN 'curl'
else case strcnt(cs(user-agent),'wget') when 1 THEN 'wget'
else case strcnt(cs(user-agent),'-') when 1 THEN 'No User Agent'
ELSE 'Other User Agents' End End End End End End End End End End End End End End End End End End End End End End End End
AS UserAgent, count(cs(User-Agent)) AS UAHits, MUL(PROPCOUNT(*),100) AS UAPercent
INTO D:\Reports\UserAgent_Summary.csv
FROM D:\Logs\.log
GROUP BY UserAgent
ORDER BY UAHits DESC
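If the nested CASE is hard to follow, it's really just a first-match classifier where test order matters: Chrome has to be tested before Safari because Chrome's user-agent string also contains "Safari", IEMobile before MSIE, and so on. Here is a rough, abbreviated Python equivalent of the same logic (my own sketch, with a shortened pattern list), including the percentage math that MUL(PROPCOUNT(*),100) performs.

```python
from collections import Counter

# Abbreviated first-match classifier mirroring the nested CASE chain above.
# Order matters: Chrome before Safari, IEMobile before MSIE, etc.
UA_BUCKETS = [
    ("Android", "Android"), ("BlackBerry", "BlackBerry"),
    ("iPad", "Apple IOS"), ("iPhone", "Apple IOS"), ("iPod", "Apple IOS"),
    ("Opera", "Opera"), ("Chrome", "Chrome"), ("Safari", "Safari"),
    ("IEMobile", "IEMobile"), ("MSIE", "Internet Explorer"),
    ("Firefox", "Firefox"), ("Googlebot", "Search Bot"),
    ("bingbot", "Search Bot"), ("Mobile", "Other Mobile"),
]

def classify(user_agent):
    if user_agent == "-":            # IIS logs "-" when no UA was sent
        return "No User Agent"
    for needle, bucket in UA_BUCKETS:
        if needle in user_agent:
            return bucket
    return "Other User Agents"

def summarize(user_agents):
    # Count each bucket and compute its share of total hits.
    counts = Counter(classify(ua) for ua in user_agents)
    total = len(user_agents)
    return {bucket: (n, 100.0 * n / total) for bucket, n in counts.items()}

uas = [
    "Mozilla/5.0 (Windows NT 6.1) Chrome/25.0.1364.172 Safari/537.22",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)",
    "Mozilla/5.0 (Linux; U; Android 4.0.4) Version/4.0 Mobile Safari/534.30",
    "-",
]
print(summarize(uas))
```

Note that the Android UA above lands in the Android bucket, not Mobile or Safari, purely because Android is tested first; reordering the list silently changes the report.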