Charles Hooper

Thoughts and projects from an infrastructure engineer

What the Hell? Same-operand Comparisons on Intel Architecture

I was reading The Shellcoder’s Handbook and, in chapter one, the author discusses how being able to recognize C-like language constructs in assembly code is an important skill.

There was one example which threw me for a loop and I just couldn’t seem to understand what was going on. This was despite all the reading that I could do, including the references for the or, cmp, jxx, and even test instructions.

So I started writing this blog post and, by the time I finished, I think I understand it now! I decided to publish it in case it helps someone else out orrrr I got it all wrong and you can tell me :)

Okay, so, here’s the C code that the book wanted me to recognize:

int number;
if (number < 0)
{
// do something
}

The IA32 assembly that the book translated this to was:

number dw 0
mov eax, number
or eax, eax
jge label
; code for 'no' condition
label:
; code for 'yes' condition

And what I was really stuck on were the two lines:

or eax, eax
jge label

I knew that, following the or instruction, the value of eax was not going to change. I also knew that this instruction was manipulating the EFLAGS register and that the jge instruction was evaluating those flags. What I didn’t understand was how this register was being manipulated or evaluated.

I started by reading about the or instruction. According to a logical inclusive OR reference page, “the OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result.” But what did “according to the result” even mean?

I looked over the docs for conditional jumps and the cmp instruction. I knew a lot more than I did before reading these but it still didn’t answer my question of how the or instruction manipulates these flags or how jge evaluates them.

In reading these references, however, I saw that the cmp instruction worked by subtracting the operands and setting the affected flags while discarding the result. I felt like I was getting closer but I still didn’t know exactly how these flags were evaluated.

Finally, I noticed on the cmp reference page that it referred to an appendix to the Intel software developer’s manual. This appendix was titled EFLAGS Condition Codes. This sounded promising!

Googling it led me to a PDF of the chapter EFLAGS Cross-Reference and Condition Codes which finally cleared things up for me.

So, what the or instruction does to the flags is:

  • Sets the zero flag (ZF) high if the result is zero – since both operands are eax, that only happens when eax itself is zero, and jge doesn’t look at this flag, so we can ignore it
  • Sets the parity flag (PF) high if the low byte of the result has an even number of bits set – we can ignore this too (to be honest, I don’t know why this would be important. I guess that’s a future blog post)
  • Resets the overflow (OF) and carry (CF) flags so we can ignore these too in our case (sort of)
  • Sets the sign flag (SF) high if the result is negative (aka the most significant bit is high)

The other thing that reading about the EFLAGS cross-reference and condition codes taught me was that the jge instruction was evaluating the expression (SF xOR OF) = 0 and only jumps when that expression is true, that is, when SF and OF are equal.

This finally made sense, I think! Based on number:

  • the SF flag is set high (1) if number is negative or low (0) if number is zero or positive
  • the OF flag is always reset to zero

Given that, we can look at our code again and work through some examples:

Say we let number = -10 here. We wind up with:

mov eax, number  ; eax = -10
or eax, eax      ; eax = -10. SF=1, OF=0
jge label        ; (SF xOR OF) = 0? => (1 xOR 0) = 1 => don't jump

Or let number = 10 this time:

mov eax, number  ; eax = 10
or eax, eax      ; eax = 10. SF=0, OF=0
jge label        ; (SF xOR OF) = 0? => (0 xOR 0) = 0 => jump!

Finally, let’s test our boundary, zero:

mov eax, number  ; eax = 0
or eax, eax      ; eax = 0. SF=0, OF=0
jge label        ; (SF xOR OF) = 0? => (0 xOR 0) = 0 => jump!

So the jump to label is taken whenever number is zero or positive, and the code right after jge only runs when number is negative, which is exactly what the if statement asks for.

Does this make enough sense? Is it even correct? If not, please let me know!
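
If you’d like to check the flag logic without an assembler handy, here is a rough Python model of the two instructions. It is only a sketch: it assumes a 32-bit two’s-complement eax, and the function names or_flags and jge_taken are made up for this example rather than taken from any tool or manual.

```python
def or_flags(value: int) -> dict:
    """Model the flags left behind by `or eax, eax` on a 32-bit register."""
    result = value & 0xFFFFFFFF  # two's-complement view of eax; OR with itself changes nothing
    return {
        "SF": (result >> 31) & 1,  # sign flag mirrors the most significant bit
        "ZF": int(result == 0),    # zero flag: the result is zero
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even number of set bits in the low byte
        "OF": 0,                   # or always clears the overflow flag
        "CF": 0,                   # or always clears the carry flag
    }


def jge_taken(flags: dict) -> bool:
    """jge jumps when (SF xor OF) = 0, i.e. when SF equals OF."""
    return flags["SF"] == flags["OF"]


for number in (-10, 10, 0):
    flags = or_flags(number)
    outcome = "jump" if jge_taken(flags) else "fall through"
    print(number, flags, outcome)
```

In this model the jump is only skipped for negative values, so the code directly after jge behaves like the body of if (number < 0).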

Getting Started With CTFs

Something exciting that’s happened recently is that I transferred to a security team at work!

I’ve always been interested in security. In fact, my first “tech company” job was at a company that made security appliances and I felt really lucky to have been there.

Despite that job, having spent a lot of time in the past researching vulnerabilities in popular web applications, and doing a couple of bug bounties, I never really considered myself a “security person” so this is an exciting change for me!

Recently, my new team and I were discussing CTF (Capture the Flag) events. I have never done one before so I was really curious about how to get started.

My teammates pointed me to a few resources which I thought I’d share with you all because maybe you’d like to get involved in doing CTFs too?

Wait, what is a CTF?

Capture the Flag events are a kind of security competition. There are different popular formats and I think CTFtime.org explains the different types of CTF events rather well.

Buuuttttt, to summarize:

  • Jeopardy-style events are where you/your team are given a variety of tasks in different categories and you work to solve these tasks. The more tasks you solve, the more points you get

  • Attack-defense style events are essentially wargames where you are given infrastructure (your own network or your own host) running vulnerable services and you gain points both by attacking other teams and defending against them

  • And mixed events are, well, a combination of the two!

How can I get involved in CTFs?

If you’re in college, check out the National Collegiate Cyber Defense Competition which is a great way to practice your new skills by defending against real (volunteer) attackers in a safe environment.

Whether you’re in college or not, you can also get involved with the large number of CTF events published on the CTFtime events list. In this list, they advertise jeopardy, attack-defense, and mixed style events.

Okay, but how can I practice for these events ahead of time?

If you’d like to practice your skills on your own time and before doing a CTF (like me!), there are different resources for that as well!

For example, lots of CTFs publish their challenges after the competition. This means that you can find an old CTF event and work through the tasks on your own. Personally, I’ll be starting with the challenges on Square’s CTF page and picoCTF.

There are also resources out there for practicing the specific types of challenges you might encounter. For example, if you’re interested in getting started with reverse engineering, you can practice on “crack mes” where you’re given a binary and you have to crack it. Material for practicing the other types of challenges is out there too but you’ll have to find those resources on your own.

Wrap up

That’s all for this post. I hope this post is helpful for people wanting to get started competing in CTF events and hopefully I’ll see you out there!

It’s All Fun and Games Until Someone Loses a Life

Two weeks ago, I uninstalled every game I own from my computers and I’m SUPER glad I did.

For a long time, I thought computer games were really fun. I’ve been playing them ever since I was a single digit in age when my Dad bought a CD-ROM drive and SoundBlaster 16 sound card. The CD-ROM drive came with a number of CDs which included evaluation versions of Myst and the original Doom.

These games were like “wow!” They became even more interesting when we could dial into a friend’s game and play deathmatch or cooperatively. This was a lot of fun but it took some coordination since you needed to have a dedicated group of local people who could play at the same time as you. Also I was probably eight at the time so these were all of my Dad’s friends.

As I grew older, the Internet became more ubiquitous even if it was only dial-up at the time. We still had to play with folks who were local, due to latency considerations, but now we could play on dedicated servers (well, sort of) with complete strangers. Thanks to software like GameSpy, we could discover the servers closest to us. Despite the ease of discoverability, the latency restriction and relatively small number of dedicated servers meant we had stronger communities. We all knew each other and, in fact, some communities had entire websites, forums, and other BBS-like functionality built around their respective games.

This pattern would continue for probably another ten years until broadband became commonplace and games began imposing the “matchmaking” pattern. I think that maybe the people who produced and sold games wanted to make it easier for casual or new gamers to get started in multiplayer, which is totally valid, but I feel like this pattern completely killed the community aspect. At least for me.

Still though, I continued to play games regularly. Any time I got a free moment and felt like I had nothing else I needed to do, I’d jump into a game of some kind. If I had a short period of time available to me, I’d hop into a fast-paced FPS (or, eventually, Rocket League). If I had an entire day available, well hello Civilization and Eve Online!

Then, one day maybe two years ago now, Erica asked me to explain what I enjoyed about gaming. I had told her that I enjoyed strategy games like Civ and Eve Online because they were mentally stimulating and FPSes because they’re exciting and distracting. I could describe the precise sensations I got from each game I played but there was something off. The problem was that I didn’t think any of these things were fun! Not anymore, anyway.

Still, that didn’t stop me from playing.

Not until we bought a house and had a lot of things put into perspective. It turns out that, when you buy a house that needs work, literally everything is more important or higher priority than playing video games. I thought I could balance gaming with important things and even other hobbies but I had already failed at that for literally twenty years!

On top of that, for the seventh year in a row I caught myself wishing I wrote more, read more, did more archery, learned some engineering topics really well, started a business, and literally everything else besides playing games.

A friend of mine, Russell, has a side business he started despite having two kids and a demanding full time job. It was the day that I caught myself telling him how jealous I was that I realized my priorities were skewed.

I decided right then that, as an experiment, I would uninstall every single PC game I owned. I reasoned that, since I use Steam, I could recover all of the games if I ever needed to but, since my Internet connectivity is so bad right now, it would take me a solid day just to get a single game reinstalled. That sounded like the perfect barrier to entry! My hypothesis was that, with the barrier to gaming being so high now, I would spend my free time doing literally anything else more productive.

And, guess what? I was right.

It’s only been about two weeks but I’ve been writing about 500 words per day. This isn’t a lot, mind you, but it’s way better than the one thousand words I wrote in all of 2014! I’ve also been reading. I’m still reading Stranger in a Strange Land which I think is a beautiful book but I’ve been “reading it” for probably an entire year now! And, finally, I’ve been working around the house and the yard. I recently built Erica and me a compost bin and we’re getting our new garden ready for our first Spring in our new home.

I know that not everyone has the same prioritization problems as I do and that some people genuinely find gaming fun but I don’t have an ounce of regret about ditching gaming for good.

Network Imagineering

If I made a thousand dollars each time I dreamed of starting an ISP, I would have a couple thousand dollars and be a hundred thousand dollars in debt.

I go through this exercise, planning and designing an ISP, every few years and each time I find that it’s untenable for one reason or another. It turns out that starting an ISP is a CapEx-heavy venture, usually with shitty margins, and slow-starting to boot. For the jargon-shy, CapEx is short for “capital expenditure” and denotes spending a lot of money up front as opposed to paying some money e.g. each month as part of operations.

On top of the heavy up-front investment, your local “wholesale” provider of Internet access tends to hold a monopoly and is also in the retail business so, if you do decide to start your own regional ISP, you’re literally buying service from your competition.

After the military, I started my systems engineering career at a small local ISP as a network engineer. I did a lot of “sysadmin”-type work but I also spent half my time logged into Cisco equipment and even became a Cisco-certified network associate (CCNA).

It was so much fun! I actually really liked it and, for a few years at least, I thought I would pursue a career as a network engineer. I considered getting my CCNP (the next level past CCNA) and onward but eventually lost focus when my career took a turn down the Linux systems engineering path.

In any event, the company I worked for was headquartered across from AT&T which is who we bought all of our connectivity solutions from. You know, our provider as well as our competition. At the time, our alleged value-add was that our connectivity was “managed”, though to this day I’m still not really sure what that means.

But today I found myself having this dream again. You see, yesterday morning I woke up at 3AM and my Internet connectivity was totally shitty. I was experiencing over 40% packet loss and I was furious! I managed to find my ISP’s number but learned that their tech support, which consists of a single person, didn’t open until 8AM. I patiently waited until 8AM to call, setting up a Raspberry Pi with Smokeping as a monitoring solution and, right at 7:55AM, my Internet connectivity recovered. It was as if someone rolled into the office and rebooted a router.

Having been in the shoes of the early-morning office arriver, this was a totally plausible scenario. But I was mad! I recently made the decision to move to nearly the middle of nowhere, so shitty Internet was always in the cards, but as an engineer who works remotely, crappy Internet is totally unacceptable! So I found myself, again, wondering which obstacles stood in my way of starting my own ISP.

I found this amazing blog series called Tales from the tower and, while the author is a little ranty at times, it’s generally informative and engaging. My general impression from reading so far is that it might be somewhat straightforward (which is not the same as “easy”) to start up a small WISP but radio (RF) engineering has a lot more influence on technical success than network engineering. Additionally, equipment has come way down in price making the “heavy CapEx problem” much more manageable.

While I have no idea what’s next, I hope to continue exploring this idea, even if I never take it further. I hope that it will be interesting for you to read (and for me to document!) how I approach the problem of providing respectable Internet connectivity in the middle of nowhere.

Hsleep: `sleep` With a Countdown

Today I’m open sourcing hsleep. hsleep is a utility which behaves just like GNU sleep(1) in coreutils – and its BSD counterpart – with the addition of a countdown timer which is emitted to standard error.

hsleep counting down

I wrote hsleep because I sometimes find myself needing to delay commands for a few minutes and I couldn’t stand not knowing how much time is left!
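
To give a feel for the idea, here is a minimal sketch of a sleep with a countdown written to standard error. It is not how hsleep itself is implemented (that’s a Go program); the names format_remaining and countdown_sleep and the HH:MM:SS display are assumptions made up for this illustration.

```python
import sys
import time


def format_remaining(seconds: int) -> str:
    """Render a number of seconds as HH:MM:SS."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def countdown_sleep(seconds: int, out=sys.stderr, sleep=time.sleep) -> None:
    """Sleep for `seconds`, rewriting a countdown on one line of `out`."""
    for remaining in range(seconds, 0, -1):
        # A carriage return moves back to the start of the line so the
        # countdown overwrites itself instead of scrolling.
        out.write(f"\r{format_remaining(remaining)} remaining")
        out.flush()
        sleep(1)
    # Blank out the countdown so the next command starts on a clean line.
    out.write("\r" + " " * 20 + "\r")
    out.flush()
```

Calling countdown_sleep(90) would block for about a minute and a half. Because the countdown goes to standard error, standard output stays clean for whatever command comes next.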

hsleep is available on GitHub or – if you have Go installed – can be installed with:

go install github.com/chooper/hsleep

I Have a New Home and New Job!

Almost three years ago, I wrote a post about moving to San Francisco and was happier than a pig in shit. Well, today I’m even more excited to announce that I’ve moved to Oregon!

When I was between the ages of 12 and 14, my parents moved us to a very small farm and I absolutely loved it. I had lots of space to myself, fresh air, and animals to care for. Unfortunately, it wasn’t long before my parents lost the farm and we ended up moving back to the suburbs.

I went the rest of my life (so far :)) reminiscing about that farm and, late last year, Erica and I started looking at homes. We started in the Petaluma area but we found it a bit too expensive for what we were looking for. We gradually continued our search further and further North until we found the Grants Pass/Medford/Ashland, Oregon area.

We put an offer on our house in December and we finally moved in at the beginning of February. It’s been a month since then and we still wake up every day and look out across the valley to exclaim “wow, I still can’t believe we live here!”

View from home

Around the time we put an offer on this house, I also changed jobs. I ultimately wound up at Stripe and, if you’re interested in solving challenging problems, you should apply to come work for us :)

I’m currently assigned to the Systems team as a Site Reliability Engineer working on how Stripe’s engineering teams reliably run and consume services at scale.

Anyway, these last three months have been amazing! I’m looking forward to seeing what the next three bring.

Briefly: Operator Requirements

On any given day, there are a number of people discussing user requirements and prioritizing the work ahead of them accordingly. There’s an oft-underrepresented group of users, however, and those are your operators. Typically, the set of things needed by your operators is buried in your project’s list of “non-functional requirements”, if it appears at all.

In this brief, I would like to provide you with a de facto set of “operator requirements” for your project. This list is likely incomplete and I’m discovering more every day. I may update this post from time to time to add things or clarify them as I journey towards understanding.

An application that satisfies these requirements will be more scalable, easier to operate, and likely have a lower Mean Time To Recovery than an application that does not.

  1. In general you should strive to adhere to 12factor if you’re building a web application. 12factor creates a clean contract between your application and the operating system, enables simpler deployments, and results in applications that are mostly horizontally scalable by default. If you cannot adhere to 12factor, then I would challenge you to borrow as much of it as you can before discounting the whole 12factor methodology.

  2. Your application should have plenty of logging and follow logging best practices.

  3. Your application should also emit metrics that create some sense of understanding of what the system is doing.

  4. Your application’s services should have health checks. The health checks should return HTTP 2xx or 3xx when the service is healthy and HTTP 5xx when it is not. The response body should contain an explanation or identifier that will allow the operator to determine why the health check failed to aid in incident recovery.

  5. Your application should use unique request IDs and add them to its logging context (see logging).

  6. Your application should support credential rotation. Any given secret, whether it’s a password, API key, SSL private key, or otherwise, should be changeable with minimal disruption to the service. This should be exercised often to ensure it works as designed.

  7. Your application should provide operators with toggles or feature flags — parameters that allow the operators or the system itself to turn off bits of functionality when the system is degraded.

  8. Your application should put external resources behind circuit breakers. Circuit breakers allow your app to continue operating (albeit in a degraded state) when an external resource is unavailable instead of taking your application offline.

  9. Your application should be disposable and restartable; this means that it’s restartable on the same instance or a new instance after a crash and should crash in an automatically recoverable state. If your crash is not automatically recoverable, it should scream! In addition, your application should gracefully complete existing work such as HTTP requests or jobs it picked up from a task queue. In the case of long running jobs, your application should be able to abandon the work to have it picked up by another worker or node.
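
To make the circuit breaker item a little more concrete, here is a minimal sketch of one. It assumes a simple policy (open after a fixed number of consecutive failures, then allow one trial call after a cool-down), and the names CircuitBreaker, CircuitOpenError, max_failures, and reset_after are hypothetical rather than taken from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of waiting on a resource we believe is down.
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call, and re-open on a
            # single further failure.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

A caller would wrap each external request, something like breaker.call(fetch_profile, user_id) where fetch_profile is whatever function talks to the external resource, and treat CircuitOpenError as the cue to serve a degraded response.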

These are just a start but these requirements should be imported into your project’s requirements and prioritized with maintainability in mind. By doing so, your application will be more scalable, easier to operate, and have a lower Mean Time To Recovery than an application that doesn’t satisfy these requirements.

Do you feel like I missed anything? What else would you recommend?

Briefly: Health Checks

Health checks are specially defined endpoints or routes in your application that allow external monitors to determine the health of your web application. They are so important to production health that I consider them the “13th factor” in 12factor.

If an application is healthy it will return an HTTP 2xx or 3xx status code and when it is not it will return an HTTP 5xx status code.

This type of output allows load balancers to remove unhealthy instances from their rotation but can also be used to alert an operator or even automatically replace the instance.

In order to implement proper health checks, your application’s health checks should:

  1. Return an HTTP 2xx or 3xx status code when healthy

  2. Return an HTTP 5xx status code when not healthy

  3. Include the reason why the check failed in the response body

  4. Log the requests and their results along with Request IDs

  5. Not have any side effects

  6. Be lightweight and fast
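
As a rough illustration of the list above, here is a framework-agnostic sketch that turns a set of named checks into a status code and a response body. The function name run_health_checks, the JSON body shape, and the choice of 503 as the 5xx code are all assumptions for this example. Wiring it into a route and logging the result with a request ID is left to your framework.

```python
import json


def run_health_checks(checks):
    """Run named checks and return (status_code, body).

    `checks` maps a name to a zero-argument callable that returns normally
    when healthy and raises when not. Checks should be read-only and quick.
    """
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:
            # Keep the reason so an operator can see why the check failed.
            failures[name] = f"{type(exc).__name__}: {exc}"
    status = 200 if not failures else 503
    body = json.dumps({"healthy": not failures, "failures": failures}, sort_keys=True)
    return status, body
```

An endpoint handler would call this with something like {"database": ping_database} (ping_database being a hypothetical cheap, read-only probe) and send back the returned status and body unchanged.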

If you implement health checks in your application following this advice, you’ll have a more resilient, monitorable, and manageable application.

How about you all? Is there anything you would add?

Briefly: Logs

Recently I was asked by another engineer what information I expect to be able to find in logs. For this, I mostly agree with Splunk’s best practices but I have some additional advice I want to provide. I’ll end up regurgitating some of Splunk’s recommendations anyway.

  1. Your logs should be human readable. This means logging in text (no binary logging) and in a format that can be read by angry humans. Splunk recommends key-value pairs (e.g. at=response code=200 bytes=1024) since it makes Splunking easy, but I don’t have a strong enough opinion to evangelize that. Some folks advocate for logging in JSON but I don’t actually find JSON to be very readable.

    Edit: Someone pointed out to me that this isn’t ideal when you have a large amount of logs. They preferred sending JSON logs to a service like ElasticSearch but I think sending key-value pairs to Splunk is also reasonable at some scale.

  2. Every log line should include a timestamp. The timestamp should be human readable and in a standard format such as RFC 3339/ISO 8601. Finally, even though the above specs include a timezone offset, timestamps should be stated in UTC time whenever possible.

  3. Every log line should include a unique identifier for the work being performed. In web applications and APIs, for example, this would be a request ID. The combination of a unique ID and timestamp allows for developers and operators to trace the execution of a single work unit.

  4. More is more. While I don’t particularly enjoy reading logs, I have always been happier when an application logs more information than I need versus when an application doesn’t log enough information. Be verbose and log everything.

  5. Make understanding the code path of a work unit easy. This means logging file names, class names, function or method names, and so on. When sensible, include the arguments to these things as well.

  6. Use one line per event. Multi-line events are bad because they are difficult to grep or Splunk. Keep everything on one log line but feel free to log additional events. An exception to this rule might be tracebacks (see what I did there?)

  7. Log to stdout if you’re following 12factor; otherwise, log to syslog. Do not write your own log files! By writing your own log files, you are either taking log rotation off the table or signing yourself up to support exciting requirements like re-opening logs on SIGHUP (let’s not go there).

  8. Last but not least: Don’t write your own logging library! Chances are there already exists a well thought-out and standard library available in your application’s language or framework. Please use it!
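
Here is one way several of these recommendations could fit together, sketched with Python’s standard logging module: key-value pairs, one line per event, UTC timestamps in an RFC 3339-style format, a request ID on every line, and output to stdout. The helper kv, the logger name example_app, and the handle_request function are hypothetical names for this example, not part of any library.

```python
import logging
import sys
import time
import uuid


def kv(**fields) -> str:
    """Render fields as key=value pairs on a single line."""
    parts = []
    for key, value in fields.items():
        text = str(value).replace("\n", "\\n")  # keep one line per event
        if " " in text or text == "":
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


formatter = logging.Formatter(
    fmt="%(asctime)sZ level=%(levelname)s func=%(funcName)s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
)
formatter.converter = time.gmtime  # render timestamps in UTC

handler = logging.StreamHandler(sys.stdout)  # stdout, as 12factor suggests
handler.setFormatter(formatter)

logger = logging.getLogger("example_app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def handle_request(path: str) -> str:
    """Log the start and end of one unit of work under a shared request ID."""
    request_id = uuid.uuid4().hex
    logger.info(kv(at="request", request_id=request_id, path=path))
    logger.info(kv(at="response", request_id=request_id, code=200, bytes=1024))
    return request_id
```

Grepping for a single request_id value then pulls out every event for that unit of work in timestamp order.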

So those are my recommendations about logs. What else would you recommend?

I Have a New Job at Truss!

Two weeks ago I started a new job at Truss after leaving Heroku two months ago.

Working at Heroku was an amazing experience in many ways. I achieved the highest level of work-life balance so far in my life, I had great coaches, and I solved a lot of challenging and interesting problems.

But it’s time to move on so after a month and a half of downtime I’ve joined Truss as an operations engineer.

I joined Truss for a number of reasons:

  1. I wanted to consult again; consultants are given more ownership of the problems they are tasked with solving and there’s always something new to do

  2. I believe there is a ton of opportunity for infrastructure consulting and engineering, both in government and in private industry

  3. I wanted to work with the folks on this team in particular

Thanks to all the folks who made my time at Heroku awesome and the folks who have been most welcoming at Truss. I’m already enjoying working together!