One Mom’s Way of Getting Things Done
For work study (or something like that), I’ve been working on my school’s help desk. I find a lot of humor in the way people behave when they feel that they “need” something to get done.
For example, I once had a father call who wanted to pay his son’s tuition online, and it “needed” to be done “that day.” Unfortunately, he didn’t know his son’s password and wanted me to reset it for him. I can’t do that per some federal law that governs the privacy of students’ educational records (so I’m told), so I offered to transfer him to the bursar, who probably could have taken payment information over the phone, but he decided instead to mail a check. This was funny to me because the payment could have been made instantly, but he instead chose a medium that would take 2-3 days to arrive.
Here’s another example of the way parents try to “get things done” through my school’s help desk. My first phone call of the day:
Me: ITS Help Desk, this is Charles, how may I help you?
Someone’s Mom: Look, I just really need to reset my son’s password. If I’m the one paying the fees, wouldn’t it make sense that I have access to his account?
Me: I’m sorry, but we are not able to reset a student’s password for anyone other than the student per federal law.
Someone’s Mom: Fine, I’ll just keep calling back until I get someone to reset it for me. *click*
My second phone call of the day:
Me: ITS Help Desk, this is Charles. How may I help you?
Someone’s Mom: Oops, I got you again. *click*
There are usually only 2-3 of us at any given time and phone calls come in round robin. We sit next to each other. This technique of “getting things done” doesn’t work in this environment.
NESIT Hackerspace Break-In
NESIT, a Connecticut hackerspace, was burglarized Sunday morning between 3:00am and 4:30am and is offering a $500 reward for information leading to the arrest of the burglar. They have released some of their security footage from the burglary. They ask that anyone who recognizes the burglar or has information call them at 1-203-51-HACKS (1-203-514-2257).
Within the first 10 seconds, you see what appears to be a camera in the hallway get covered up by something from behind. Afterwards, you can see someone calmly walking around the inside of the hackerspace for the next hour and a half with a headlamp on. The headlamp turns out to be pretty beneficial for NESIT because, as a Facebook commenter pointed out, you can see that the burglar is roughly as tall as one of the shelves in the space.
What I found most interesting, however, is that around 8 minutes and 30 seconds into the footage you get a pretty nice shot of… cleavage? No, I’m not being a jokester. It appeared to me that the burglar was wearing a low-cut shirt with some type of line up the middle. It looked like cleavage to me, and my girlfriend’s second opinion agreed, but NESIT believes that this was actually a shirt that was wrapped around the burglar’s head. UPDATE: After a second, closer look at the surveillance footage, I agree with NESIT.
Despite the security footage being black and white and fairly low in quality, it tells us quite a bit.
- The burglar was familiar enough with the location to know about the hallway surveillance camera.
- The burglar was not familiar enough to know about the other cameras inside the hackerspace.
- The burglar was comfortable enough to spend an hour and a half walking around and looking through boxes and drawers.
- The burglar is roughly as tall as one of the shelves in the hackerspace. UPDATE: I’m told that the height is between 6’0″ and 6’2″.
- The burglar has enough strength to pry off the lock on the door and part of the concrete wall.
I sure hope they catch the bad guy.
With gardening season right around the corner, I wanted to set something up that would take automated, regular snapshots of some of my plants and upload them to Flickr. After a few cumulative hours I finally cobbled together a solution.
Taking the Snapshots
The first thing I needed to do was to take snapshots from an installed USB webcam and save them to a directory. This needed to run from a cron script, so it had to work without a GUI and without user interaction. I read in a Webcam Howto that I could do this using streamer, so I installed it and wrote a short shell script that iterates through the video devices installed on my PC and runs the snapshot command for each one. You can view the source of this script here.
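For readers who want the general shape of that step without the shell script, here is a minimal sketch in Python. It is not the script linked above. It assumes the video devices show up as /dev/video*, that streamer is on the PATH, and that its -c (device) and -o (output file) options are enough for your camera. The function names and the output file naming scheme are made up for illustration.

```python
import glob
import os
import subprocess
import time


def snapshot_commands(devices, out_dir, stamp):
    """Build one streamer command per video device (nothing is executed here)."""
    commands = []
    for device in devices:
        name = os.path.basename(device)  # e.g. "video0"
        # Hypothetical naming scheme: <device>-<timestamp>.jpeg
        outfile = os.path.join(out_dir, "%s-%s.jpeg" % (name, stamp))
        commands.append(["streamer", "-c", device, "-o", outfile])
    return commands


def take_snapshots(out_dir):
    """Grab one frame from every /dev/video* device; no GUI or user input needed."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    devices = sorted(glob.glob("/dev/video*"))  # assumed device location
    for command in snapshot_commands(devices, out_dir, stamp):
        subprocess.call(command)


# Example call from a cron-driven script (the directory is a placeholder):
# take_snapshots("/path/to/snapshots")
```

Keeping the command building separate from the part that actually runs streamer makes it easy to check what would be executed before pointing cron at it.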
Uploading the Photos
Next I wanted to automatically upload the files to Flickr. At first, I tried using a script I found called uploadr.py, which worked OK, but I also wanted to add my photos to a specific set, which this script didn’t do. I probably could have extended its functionality, but this script didn’t use or implement the full Flickr API, which made this task seem unnecessary.
Instead, I downloaded the Python Flickr API from Stuvel and in less than 90 lines I had working code to upload a directory of images to Flickr and add them to a given set. You can view the source to my flickr uploader script here, which I’m calling simpleuploadr.py for now.
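The overall flow of an uploader like this is short enough to sketch. What follows is not simpleuploadr.py and makes no real Flickr calls: upload and add_to_set are hypothetical callables that you would back with whatever Flickr API wrapper you use, and the list of image extensions is an assumption.

```python
import os

# Assumed set of extensions worth uploading.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def upload_directory(directory, upload, add_to_set, set_id):
    """Upload every image in `directory` and add each one to the set `set_id`.

    `upload(path)` must return the new photo's id and
    `add_to_set(photo_id, set_id)` must attach it to the set.
    Both are placeholders for real API calls.
    """
    uploaded = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip anything that does not look like an image
        photo_id = upload(os.path.join(directory, name))
        add_to_set(photo_id, set_id)
        uploaded.append(photo_id)
    return uploaded
```

Passing the two API operations in as functions keeps the directory walk testable without network access or credentials.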
Results
Here are my pretty pictures :) My apologies for the quality; I’m using a really cheap webcam.
Yesterday, I wrote a blog post detailing how I crawled an entire MMORPG’s player database via their search page. Since then, I have been analyzing that data in Minitab and trying to gain some insight into the state of affairs of that game. Today, I’m going to attempt to explain some of that data using statistics and common sense. In particular, we’re going to find out if there’s a relationship between when players join the game and when they stop returning.
Preparation
I’m new to the statistics software package I’m using, Minitab, and I’m not aware of an easy way to take measurements based on dates. So, my first order of business was to convert dates in the database to an easier metric for analysis, “days since today,” which is simply today’s date minus date x. I did this in my database (MongoDB) prior to export by adding a “last_seen_days” attribute to all documents (records). This attribute is simply the difference between today’s date and the date that the player stopped logging in – measured in days. I then did the same for the signup date. This was quickly done in the MongoDB console in just a few lines:
> var today = new Date();
> var day = 60*60*24*1000;
> db.accounts.find().forEach(function (o) { o.last_seen_days = Math.ceil((today.getTime() - o.last_seen.getTime())/day); db.accounts.save(o); })
> db.accounts.find().forEach(function (o) { o.date_joined_days = Math.ceil((today.getTime() - o.date_joined.getTime())/day); db.accounts.save(o); })
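The same “days since today” conversion can be sketched outside the MongoDB console. This Python version mirrors the arithmetic above (difference in days, rounded up) and assumes each account is a plain dict holding datetime values under the same last_seen and date_joined keys. The sample account is made up.

```python
import math
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def days_since(when, today):
    """Whole days between `when` and `today`, rounded up like Math.ceil above."""
    return math.ceil((today - when) / DAY)


def add_day_fields(account, today):
    """Return a copy of the account with the two derived attributes added."""
    enriched = dict(account)
    enriched["last_seen_days"] = days_since(account["last_seen"], today)
    enriched["date_joined_days"] = days_since(account["date_joined"], today)
    return enriched


# Made-up example record, not real game data.
today = datetime(2012, 3, 1)
account = {"date_joined": datetime(2012, 1, 1), "last_seen": datetime(2012, 2, 20, 12, 0)}
print(add_day_fields(account, today))
```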
The Scatterplot
I then exported my data to CSV, loaded it in Minitab, and created a scatterplot between these two attributes. What I got was this:

For the uninitiated, a scatterplot is a quick and easy way to visually see if there’s any type of relationship (correlation) between two variables. In this case, I used the signup date as my independent variable (x) and the “last seen” date as my dependent variable (y). Overall, there is not any real relationship between the signup date and the last seen date. However, there are two significant items in this graph that deserve to have some attention brought to them.
Observations
The first and most obvious item is that there are not any points above the identity function. The identity function, or just f(x) = x, is the diagonal line directly across the center of the graph. This makes perfect sense since it’s impossible for a player to have their “last login” occur before they even sign up. I bring this up because this leads into my next observation:
There is a heavier concentration of data points plotted on or directly below the line of the identity function. Points exactly on the identity function are accounts that were registered but never logged into. Accounts directly below the identity function should be considered more significant to those who run the game. Why is that? Because, simply put, I believe that these accounts belong to players who went through the effort of joining: they signed up, validated their email address, logged in, and for whatever reason chose not to stick around. This is akin to the “bounce rate” so frequently mentioned in the context of web analytics.
It’s possible that these new players didn’t understand the interface and left, or maybe they thought the game play was too slow, or maybe… this list could go on. What’s important is that some attention is paid here. Some effort should be made to discover why these players are leaving and the number of these players (or almost-players) should be measured, monitored, and analyzed. Decreasing this metric (“bounce rate”) should be a regular goal as these players represent a potential revenue stream for the game’s owner as well as a potential contribution to the game for the rest of the players.
The Histogram
While, in this case, the scatterplot helped us see that a noticeable number of players quickly “bounce” after joining the game, this type of graph doesn’t make it particularly easy to measure the magnitude of this phenomenon. Having observed this behavior, we next want to know how many players are leaving, or what our “bounce rate” is. Instead of first trying to quantitatively define the bounce rate so that we can measure it, it’s probably best if we first take a look at the total distribution of how long players are active before leaving. For this, we’ll use the histogram of “Days Active”. Days active is simply days since signup minus days since last login. Here’s what we’ve got:

In this histogram, I excluded the lowest rank. I did this because I was more interested in how many potentially-active players were leaving, as opposed to junk accounts. As such, our definition of the bounce rate is already diverging from the bounce rate in web analytics.
Each bin (“bar”) in our histogram is 15 days wide. Knowing this, you can see from the histogram that the largest density of days active seems to fall between about 15 days and 2.5 months. This chunk, while significant, doesn’t have much to do with our bounce rate mentioned above. What we’re instead interested in is the near-5% of players who become inactive in less than a week.
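To make these measurements concrete, here is a small sketch of how days active, the 15-day bins, and a one-week bounce rate could be computed. The numbers are invented for illustration and are not the game’s data. The seven-day threshold is just one possible definition, and the rank filter described above is left out.

```python
def days_active(date_joined_days, last_seen_days):
    """Days since signup minus days since last login."""
    return date_joined_days - last_seen_days


def histogram(values, bin_width=15):
    """Count values per bin, keyed by each bin's lower edge."""
    counts = {}
    for value in values:
        start = (value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


def bounce_rate(values, threshold_days=7):
    """Share of accounts active for fewer than `threshold_days` days."""
    if not values:
        return 0.0
    return sum(1 for v in values if v < threshold_days) / len(values)


# Made-up (date_joined_days, last_seen_days) pairs.
accounts = [(400, 400), (400, 397), (300, 250), (200, 20), (120, 100), (90, 89)]
active = [days_active(joined, seen) for joined, seen in accounts]
print(histogram(active))
print(bounce_rate(active))
```

An account with zero days active sits exactly on the identity line in the scatterplot. Small positive values are the points just below it.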
What’s Next?
If this were my game (it’s not), I would work on defining what level of bounce rate is acceptable and set some goals based on that. I would then look into the large number of players leaving within the first 2.5 months and try to increase player retention. Finally, I would automate these measurements and have them displayed in a nice administrative dashboard (I’ve always wanted one of those) so that I have to see them all the time.
Recently I found myself in a situation where I needed to gather a large amount of data from a website but there did not exist any API, index, or otherwise publicly-accessible map of the data. In fact, the only mechanism for uncovering data to be collected was a very limited search engine.
In particular, I was trying to collect a list of (living, non-banned) usernames from a web-based RPG I play so I could then download, parse, and store their profiles for further analysis. I needed all of the data simply because there also was not any way in which I could get a truly random statistical sample.
The game’s search engine has these limitations and features:
- Search is performed on username only and implicitly places a wildcard after the search. For example, if you search for “bob” not only will “bob” be returned in the results, but also “bob123” and “bobafett,”
- If a given search returns more than 35 results then only the first 35 results are returned,
- Results are sorted by username (alphabetically),
- Usernames are case-insensitive and can only contain alphanumeric characters, i.e. {ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890},
- Search queries cannot start with the character zero (“0”), but I happily overlook this,
- The search engine does allow you to filter out players who have been killed or banned.
So, there I was, trying to crawl this game’s search feature using urllib and regular expressions. I first tried to search for “A”, then “B”, then “C”, and so on, but there were some obvious flaws with this method. In particular, because of the limit on the number of results that can be returned, this method would only yield 1,260 usernames. This isn’t good enough because I knew from the game’s statistics page that I should be expecting a little more than 21,000 names!
The logical extension of that search method is to tack on an extra letter. For example, try “AA”, then “AB”, then “AC”, all the way down to “ZZ” (or, erm, “99” in this case). This seems a lot better because, hypothetically, the keyspace is large enough to return more than twice as many usernames as I need – I believe the math is [36^2]*35 or 45,360 usernames.
Unfortunately, this method falls apart very quickly because there isn’t an even distribution of usernames across the keyspace. I could try to go one level deeper on the searches (e.g., “AAA” to “AAB”, and so forth) but now we’re looking at 36^3 or 46,656 search pages I have to crawl, so this method is out of the question.
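The figures in the last three paragraphs are easy to double-check. The short sketch below just redoes the arithmetic, treating the alphabet as 36 characters and the result cap as 35, as stated above. The variable names are mine.

```python
ALPHABET_SIZE = 36  # A-Z plus 0-9
PAGE_LIMIT = 35     # maximum results returned per search

# One-character searches: at most 35 names for each of 36 queries.
one_char_max = ALPHABET_SIZE * PAGE_LIMIT

# Two-character searches: 36^2 queries, each capped at 35 names.
two_char_max = ALPHABET_SIZE ** 2 * PAGE_LIMIT

# Three-character searches: number of search pages to fetch.
three_char_queries = ALPHABET_SIZE ** 3

print(one_char_max, two_char_max, three_char_queries)
```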
Making matters worse, I was completely naive as to what the distribution of usernames might actually look like. I know what it looks like now, but at the time I had absolutely no idea what to expect. (Just in case you’re curious, you can see the actual distribution – sans accounts that start with “0” – below.)

I decided, then, that I would start with “A” to “Z” and “1” to “9” and dynamically and recursively expand one level deeper whenever a search came back with the full 35 results. You can see this dynamic, search-unfolding code here on Bitbucket (Python, lines 46 through 65).
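The idea can be sketched independently of the Bitbucket code. This is not that code: it swaps the real HTTP search for a small in-memory stand-in (make_fake_search) that follows the rules listed earlier, with prefix matching, alphabetical order, and a cap of 35. All names here are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 characters
FIRST_CHARS = string.ascii_lowercase + "123456789"  # queries cannot start with "0"
LIMIT = 35


def make_fake_search(usernames, limit=LIMIT):
    """Stand-in for the game's search page: prefix match, sorted, capped."""
    names = sorted(u.lower() for u in usernames)

    def search(prefix):
        prefix = prefix.lower()
        return [n for n in names if n.startswith(prefix)][:limit]

    return search


def crawl(search, prefixes=FIRST_CHARS, limit=LIMIT):
    """Collect usernames, going one character deeper only where results hit the cap."""
    found = set()
    for prefix in prefixes:
        results = search(prefix)
        found.update(results)  # keep these: an exact match only appears at this level
        if len(results) >= limit:
            # The cap was hit, so there may be more names hiding under this prefix.
            found |= crawl(search, [prefix + c for c in ALPHABET], limit)
    return found


# Made-up usernames for a quick check.
names = ["a%03d" % i for i in range(200)] + ["bob", "bob123", "bobafett", "0day"]
print(len(crawl(make_fake_search(names))))
```

Sparse parts of the keyspace cost a single query, while crowded prefixes are the only ones that get expanded. As noted above, anything starting with “0” is never reached.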
The results were pretty positive. I crawled almost the entire set of alive, unbanned accounts in just over 2 hours (while I played video games and drank beer). I missed exactly 356 accounts, or about 1.6% of the population. While some of these may have been accounts that started with the character “0” (remember, I couldn’t crawl those), it seems more likely that many of these were aborted HTTP requests that failed and were handled by my ridiculous try/except:pass block.
Now that I have this data, it’s time for me to do something with it. You’ll hear more about that from me soon, I’m sure.
















