
Software reVisions

In pursuit of reliable, fault-tolerant, fail-safe software and systems


By Extension

December 18, 2011 by Robert C. Watson

Seeing What We Expect to See

I'm a Unix/Linux programmer. The work capitalizes on my tendency to take what I see quite literally: by making few assumptions, I'm able to see a problem the way the computer does.

By contrast, "normal human thinking" depends heavily on our imagination filling in many blanks. We have to make lots of assumptions. Those assumptions cause us to see what we've seen before... what we expect to see.

With almost all of my attention on Unix/Linux over the years, I've mostly just used Microsoft Windows and Office as tools and not followed their inner workings much. Long ago, I attempted to decipher the raw format of a Microsoft Word document (and failed to do so reliably). It left me with a mental image of Word documents consisting of intermixed binary values and text that only Microsoft understood.

A few years later, someone sent me a document in Microsoft Word 2007's new .docx format that I needed to convert to HTML for the web. I only had Office 2003, and I was horrified by the mess Word made when I saved it "As a Web Page". So I opened the document in a text editor to see whether I could simply cut out the content and reformat it by hand. Knowing the format was supposed to be XML, that's what I expected to see. What I saw instead was gibberish: pure binary.

"Damn that Microsoft!"

Sliding back and forth through the sizable document and finding no blocks of text or other discernible patterns, I brought up Firefox and started many hours of Googling.

Now, one thing I've learned over the years is that, for me at least, there's a very consistent inverse relationship between the intractability of a problem and the complexity of its solution: the longer a problem takes to solve, the simpler its solution is likely to be. Assumptions and expectations lead me down an increasingly complex path of study, experimentation and failure as I exhaust the "obvious" solutions. (Is "Occam's Razor" misunderstood?)

Lots of Googling has also taught me that simple, fundamental facts and concepts about a piece of software are often documented only once and are thus rarely found in search results. Assumptions again.

The more intractable the problem, the more likely that the solution hinges on one of these obscure bits of information.

I finally came across somebody in a forum explaining the new format to a newbie (A Newbie! A Noob! How mortifying...) and discovered that in the world of Microsoft...

Though a .docx is named much like a .doc, looks like a .doc and is used like a .doc... it's really a .zip!
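For anyone staring at the same "gibberish", a quick check makes the point concrete. Below is a minimal sketch, assuming Python 3.9+ and its standard `zipfile` module; the helper names (`looks_like_zip`, `docx_parts`, `docx_body_xml`) are hypothetical. A ZIP archive normally begins with the local-file-header signature `PK\x03\x04`, whereas an older binary .doc begins with the OLE2 compound-file signature `D0 CF 11 E0 A1 B1 1A E1`. Once the archive is opened, the XML you expected to see is inside, with the main body of a Word document stored in `word/document.xml`.

```python
import io
import zipfile

# Local file header signature that normally starts a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Heuristic: True if the bytes begin with the ZIP local-file-header signature."""
    return data[:4] == ZIP_MAGIC


def docx_parts(data: bytes) -> list[str]:
    """List the member names stored inside a .docx (i.e. the ZIP archive's entries)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


def docx_body_xml(data: bytes) -> str:
    """Return the main document part, word/document.xml, decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read("word/document.xml").decode("utf-8")
```

Usage would look something like `raw = open("report.docx", "rb").read()` (the file name is only an example), followed by `looks_like_zip(raw)` and `docx_parts(raw)`. Or, skipping code entirely, renaming the file to .zip and opening it with any archive tool shows the same XML parts.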
