IT#19 The Most Expensive Storage in IT (and how to use it)

#IT Management #ITHiring #SoftwareEngineering

A story

Long ago, in 1998, I was interviewed by a guy from a Chicago company about C++ and MFC (Microsoft's UI library for C++). He asked me:

"How would you put 50000 items into a dropdown box in the UI?"
"I would not put 50000 items into a dropdown box."
"But what if the customer wants that?"
"Then I would talk to the customer and explain why this is a bad idea."
"But still..."

The guy wanted a hack. Specifically, he wanted to implement scrolling event handlers and dynamically update the list. He clearly was very proud of knowing about those interceptors and his goal in the interview was to show that. I know, odd for an interviewer.

The idea itself was horrible: it is expensive to implement and maintain, and the result would be... inferior. For example, the slider would jump around when scrolling.

But this is not why the idea was bad. As I wrote before, all of computer science, and things like GUIs, are designed to counteract human limitations and help us control von Neumann computers. So, the main problem with his idea was:

A computer can put 50000 items into a dropdown box, but the human user cannot use it. So a dropdown box is the wrong UI element for the task; you need a different navigation design.
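One possible "different navigation design" is a type-ahead search box that shows only a handful of matches for what the user has typed so far. The sketch below is only an illustration of that idea, not what anyone in the story built: the function name `suggest`, the limit of 10 and the generated item names are all hypothetical, and it assumes case-sensitive prefix matching over a pre-sorted list.

```python
from bisect import bisect_left

def suggest(sorted_items, prefix, limit=10):
    """Return up to `limit` items from a sorted list that start with `prefix`."""
    # In a sorted list, all items sharing a prefix sit next to each other,
    # and the first of them is at the position where `prefix` would be inserted.
    start = bisect_left(sorted_items, prefix)
    matches = []
    for item in sorted_items[start:start + limit]:
        if not item.startswith(prefix):
            break
        matches.append(item)
    return matches

# Hypothetical data: 50000 items, as in the interview question.
items = sorted(f"item-{n:05d}" for n in range(50000))

# The user sees at most 10 suggestions instead of a 50000-row dropdown.
print(suggest(items, "item-4999"))
```

The computer still holds all 50000 items; the human only ever looks at ten of them.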

Fast forward to a few years ago. This time I was interviewing a guy who was proud of knowing the small details of an obscure tool. The problem was that he could write a command line for an obscure operation of that tool off the top of his head, but he did not know how to use the tool properly. In fact, what he proposed to do would have become a multi-year maintenance problem for the company.

Have you already guessed what this article is about? I have touched on this topic before, from a different angle.

Hierarchy of Information Storage

Here are the ways to store information (let’s omit paper), from the cheapest to the most expensive:

1. $ Tapes. Amazingly, they are still occasionally used, for example for archival storage like Amazon Glacier Deep Archive (see "Amazon S3 Glacier storage classes" on the AWS site).

2. $$ Magnetic disks.

3. $$$ SSDs.

4. $$$$ RAM.

5. $$$$$ CPU cache.

In a good system design interview the candidate must properly explain which storage will be used where and why.

For example, CPU cache is managed automatically by the hardware and holds only the instructions and data the CPU is working on at the moment.

RAM is used for what is necessary to run programs: the program’s binary code and essential data. Other data may be stored on a disk and loaded and discarded as needed.

Any persistent information can be stored on disks, magnetic or SSD. If a large body of data will be read a lot and we need it fast, for example in machine learning, SSD is the best option; otherwise magnetic disks are the more economical option.

And finally, data that is rarely or probably never used, for example compliance data, can be stored in long-term archival storage like Amazon Glacier Deep Archive.
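The rules of thumb above can be written down as a tiny decision function. This is only a sketch of the reasoning in this article, not a sizing tool: the names `Tier` and `choose_tier` and the three yes/no questions are my simplification, and real decisions also depend on volume, budget and latency targets. CPU cache is left out because it is not something you allocate by hand.

```python
from enum import Enum

class Tier(Enum):
    ARCHIVE = "archive (tape-class cold storage)"
    HDD = "magnetic disk"
    SSD = "SSD"
    RAM = "RAM"

def choose_tier(needed_in_memory, read_heavy, rarely_accessed):
    """Pick the cheapest tier that still fits how the data is used."""
    if needed_in_memory:
        # Code and essential data of a running program.
        return Tier.RAM
    if rarely_accessed:
        # Compliance-style data that will probably never be read.
        return Tier.ARCHIVE
    if read_heavy:
        # Large data sets that are read a lot and must be fast.
        return Tier.SSD
    # Everything else that simply has to persist.
    return Tier.HDD

print(choose_tier(needed_in_memory=False, read_heavy=True, rarely_accessed=False).value)
print(choose_tier(needed_in_memory=False, read_heavy=False, rarely_accessed=True).value)
```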

Everything comes with a cost, and an engineer must be conscious of that cost and not use the most expensive storage for negligible results or, even worse, take that resource away from a more critical task.

Makes sense?

And now we come to one more class of storage, which is extremely expensive and currently often mismanaged, essentially due to malpractice and a lack of understanding of how the discovery economy works.

The most expensive information storage and how we use it

The most expensive information storage in IT is:

6. $$$$$$$$$$ Human brains. Especially engineers' and customers’ brains.

I already wrote about why you should not increase the cognitive load on your customers (“IT#17 Social ADD”), so this time I will consider engineers.

Generally, it’s very hard to always strike the right price/value balance, but here I will refer to a previous article, “IT#03 Discovery Economy”, which makes a couple of common pitfalls obvious.

Assume you are a hiring manager. What do you put into a job description? Usually you have three sections:

1. Some CEO-like nonsense (“exciting!”, “tough problems!”, “challenge yourself!”), which everyone will ignore except those you certainly don’t want to hire.

2. Tools and skills, where the skills are usually also tools, like computer languages.

3. Environment like great office, some remote time, work-life balance, perks, etc.

Let’s concentrate on #2. Generally, you are right: an applicant should know what to expect. Unfortunately, HR and recruiters treat it as a list of requirements, and they approach it in a robotic rather than human way, like the Google search engine: by matching keywords.

They literally throw away every resume that does not have at least 70% of the keywords. And that’s stupid.

To begin with, IT has a lot of synonyms and closely related technologies. So, an engineer who manually implemented TLS for a Microsoft product would not qualify for a job description that lists IPSec. For any knowledgeable software engineer that’s textbook stupid, but not for HR.
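To see how mechanical this is, here is a minimal sketch of such a keyword filter. I have no idea how any real HR system is implemented; the function names, the tokenizing rule, the sample resume and the keyword list are all made up for illustration, and only the 70% threshold comes from the text above.

```python
import re

def keyword_score(resume_text, keywords):
    """Fraction of job keywords that appear as whole words in the resume."""
    if not keywords:
        return 1.0
    # Keep '+' and '#' so that tokens like "c++" and "c#" survive.
    words = set(re.findall(r"[a-z0-9+#]+", resume_text.lower()))
    hits = [k for k in keywords if k.lower() in words]
    return len(hits) / len(keywords)

def passes_filter(resume_text, keywords, threshold=0.7):
    """Hypothetical robotic screen: reject anything below the threshold."""
    return keyword_score(resume_text, keywords) >= threshold

job_keywords = ["C++", "IPSec", "Perforce"]
strong_resume = "Implemented TLS by hand for a shipping product in C++; network security"
stuffed_resume = "C++ IPSec Perforce"

print(passes_filter(strong_resume, job_keywords))   # the TLS engineer
print(passes_filter(stuffed_resume, job_keywords))  # the keyword list
```

The filter has no notion that someone who implemented TLS by hand can pick up IPSec; it only counts matching strings.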

However, there is a more serious problem. Imagine you wanted an engineer who knows Java and Perforce (a version control system). OK, you found one. Now your company decides to move from Perforce to Git. You have two options:

1. Fire the Java/Perforce person, then find and hire a Java/Git person.

2. Have the engineer you already have learn Git.

What would you decide?

You see, knowledge of Perforce was not essential from the beginning.

You would have been better off hiring a better engineer, who would have learned Perforce in their first couple of days.

How is it related to storage? Directly. Where do you think the syntax of a computer language should be:

1. Engineer’s brain.

2. Disks and editing tools?

If you chose the first option, you are wrong. Computer science has made huge steps in helping engineers control von Neumann computers (that’s what we use), but humans are still not very good at it. Of course, we are no longer writing in assembly language or FORTRAN, where one period instead of a comma may damage a space mission (real case), but there is still a reason we have specialized code editors with embedded syntax checks. Engineers should understand the philosophy of a language’s syntax, like no semicolons in Python and Go, but for the rest there are syntax-aware code editors, validation tools and compilers.

I have more than 20 active computer languages in which I have recently written production code. “Active” means not counting atavisms like Algol-60, Algol-68, FORTRAN, PL/1, Pascal, Modula-2... Do I know the syntax of all of them in the smallest detail? No. I know the principles. And with such simple code editing tools as VS Code or Google’s internal Cider, this is quite enough.

By the way, human brains are not good at storing information; they are essentially “rules engines”. They are good at storing principles, connections and logic, but not dead information. That’s why remembering an interesting story is so much easier than remembering the multiplication table you had to learn in elementary school.

But there is more to it. You did not just hire a person with knowledge of a tool that became obsolete very soon. You hired a person who filled the space in their head with that soon-to-be-obsolete tool instead of what? The storage in a human brain is not infinite. We discussed that in the “Knowledge Economy” and “Discovery Economy” articles.

And now we come to the question: so, what skills really should be in the head of a good software engineer?

What skills really should be in the head of a good software engineer?

The #1 requirement comes from the example above, where you’ve got a person with nearly obsolete knowledge and had to retrain them.

In a quickly changing environment, with a lot of tools becoming obsolete and a lot of tools appearing from nowhere, a good software engineer must be able to learn fast.

Other than that:

• Communications: The main language for a software engineer is not Java, C++ or even Go, it’s English!

• Philosophy and principles of procedural, functional and declarative languages.

• Philosophy and principles of current languages and their libraries.

• System design and design patterns (including high-level ones like message queues).

• Overview of currently available tools and ability to incorporate them if needed, even if they don’t know them yet.

• Some project management, like Scrum. By the way, if a person has an official Scrum certification, they most likely do not really know what Scrum is.

• There is more.

This is what you need in the heads of your engineers.

Back to the most expensive storage

What I listed above is a lot. Sure, it’s better to hire someone with experience with your tool. But they should already understand everything above. Experience with, say, Java is a bonus, not a requirement. After all, basic knowledge of a computer language can be acquired in a couple of evenings. A library may require a bit more, plus a test project. Version control, containers and VPS are not really harder.

By putting tools in a job description, we often rob ourselves of better engineers we could have hired otherwise, because we have filled the most expensive information storage with something that was not worth it.

Eldar University - IT and Economy