Two clueless consultants

How I hire a DBA or Database Developer

Livedrive is currently looking to hire a rock star DBA and developers with a strong understanding of databases. We have 100TB of MySQL data online and a SQL Server mirror running at 15K tx/sec non-stop (peaking at 200K) with a nice little 3TB OLTP system.

And my goodness – rock star DBAs are hard to find. For those of you looking for one – or thinking you are one – I wanted to write up my advice.

Here is how I hire a DBA:

Initial Screening and Recruiters

We rely on internal HR and personal connections to find CVs. Every day – I get 3-4 invitations on LinkedIn from recruiters. Half of these just invite me to connect with the default LinkedIn message. Those get rejected right away. The other half write some MBA-style empty talk about partnering with us and how they can help our business do blah blah blah (with the word “Value” sprinkled generously into the paragraph). Those get laughed at, then discarded. Only once have I gotten a nice letter from a recruiter with details of the candidates he has available, showing some understanding of what we are actually looking for. That one got forwarded to our internal HR. Most recruiters give recruiters a bad name.

Only twice have I received a CV directly from a real candidate. Why do job seeking candidates bother with recruiters?

Phone interview

After perusing the CVs we receive, we discard about 75% of them outright. For the ones we like, we arrange a short phone call with the candidate. This call typically lasts 15 to 20 minutes.

The interview is conducted by the COO and myself.

The purpose of the initial call is to understand what motivates the candidate to move to a new job. We also try to understand what the person actually did in her/his previous positions – especially what they are proud of or feel they learned from. If a CV has been “pimped” we will nearly always discover obvious flaws in the person’s skills at this point. Candidates who are obnoxious (no matter how skilled) also get discarded during this interview. We don’t hire jerks – it’s just not worth the time.

What positive signs do I look for during this phase?

First of all, passion. People who are passionate about their job will nearly always learn what they need to learn. We are not your run-of-the-mill installation – and we don’t expect people to hit the ground running. We have some really smart people on board that the candidate can learn from, if they have the passion to do so. It doesn’t matter if you previously failed miserably – as long as you have learned from it.

Second, courage and intelligence. A candidate who has learned a great many skills is highly rated with us. It doesn’t really matter if you don’t know MySQL, if you have already acquired MSSQL, Oracle and MongoDB (to take an example) skills instead. Similarly – we are a mixed tool shop (OSS and MS) – so we try to filter out obvious Linux bigots and MS worshippers.

We once interviewed a candidate who knew none of the skills we required – yet spoke 8 languages fluently and had changed career twice. We did not hire this person (there was too large a skill gap) – but this is someone who will do well and get past our initial filters.

We also answer any questions the candidate has. Even if we don’t like the candidate – we still sell ourselves to them. They will tell their friends – and we may hire those friends one day (if you must – think of this like good karma)

During this initial interview – we discard another 2/3 of the candidates. The last ones – we invite on site.

On site interview

The final stage of our process is the on site interview. There are a lot of things you can’t tell about a candidate until you meet them in person. Additionally – because of the way I do technical testing – I need them in front of a whiteboard. We tell the candidate to reserve half a day for this interview.

As the candidate arrives – we offer them a cup of coffee or tea and walk them to a meeting room set up for the purpose. This room is equipped with a table, a few chairs and the most important furniture: a whiteboard.

The first phase of the interview is conducted by the same two people who did the phone interview – the COO and me. We want the candidate to feel comfortable at this point and will start the meeting with some small talk, asking a few questions about them and who they are. Anyone who lied during the phone interview – we catch at this point (as sad as this is – this happened twice).

Once the candidate has gotten to know us – the grilling begins!

I first tell the candidate that we like to understand how they think about problem solving. I say – and this is true – that I am not looking for specific answers – just a discussion. I learned this intro from reading Joel Spolsky.

Depending on my initial impression of the candidate’s skill level – I pick a “toy problem” that is not too hard and put it on the whiteboard. If the candidate appears nervous – we start slow. If the candidate is full of confidence – I hit them with a harder problem.

The purpose of the discussion that follows is:

  1. To find out how the person thinks about problem solving
  2. To show the person some real problems they will be dealing with (good candidates will love the challenges)
  3. To bring the person out of their technical comfort zone and see how they react

The last may seem cruel. But I can assure you that until you see people admit they don’t know things, you don’t really know them. Nobody is perfect and reality will sometimes hit you with unexpected situations – I need to know that I can rely on them to think straight when faced with a problem they don’t know the solution to.

After presenting the toy problem, I ramp up the difficulty of questions – typically using very open ended problems that allow the discussion to flow.

Good candidates exhibit some very specific traits during this phase.

First, they will eagerly engage in the discussion. Some people are shy and will remain seated at the table, but others will jump to the whiteboard and start asking clarifying questions. The good candidate will seize the chance to prove him/herself.

Second, the candidate will look for paths to solutions. It is not uncommon to find candidates asking a lot of clarifying questions (which we encourage), but not really moving towards a solution while asking those questions. While it is admirable to seek understanding of a problem before jumping to conclusions, there is a point where you have to take a stand and put forth a straw man solution – even if it may be wrong. As an example, we once interviewed a really promising candidate who asked a lot of intelligent questions – but never really had the courage to propose a solution based on the answers. Such people can spend a lot of time in meetings, but they won’t produce much code.

Finally, we are looking for people who rule out bad solutions when they see them. It has been said that experts are not better at solving things, they are just better at filtering obviously bad ideas. You will too frequently find candidates who will strongly defend the most obscure solutions to a problem by inventing non-existent (or unimportant) issues that are addressed by those solutions. We need programmers, not politicians.

A Sample Toy Problem

The toy problems we give candidates are designed to move them a bit out of their “best practice and canned solutions” comfort zone. Any fool can read blog entries and whitepapers and follow the cooking recipes there; it takes skill to go beyond that. We are looking for rock stars after all.

To give you an idea of the flavour of problem I pick, here is one example (that mimics what we have in production):

“In our systems, we store metadata about a user’s files. The data is stored in a relational database for quick retrieval.

There are two ways to retrieve a file: by its name and by its unique id. In both cases, the account-id of the user is also passed to the database.

To make the example concrete, consider this schema:

CREATE TABLE FileMeta
(
Id [Some Data Type] NOT NULL
, ParentId [Some Data Type] NOT NULL
, Type CHAR(1) NOT NULL /* F = File or D = Directory */
, AccountId [Some Data Type] NOT NULL
, Name VARCHAR(255)
)

These two queries are the most frequently run (90% of the workload):

SELECT …
FROM FileMeta
WHERE AccountId = [Foo]
AND Id = [Bar]

and:

SELECT …
FROM FileMeta
WHERE AccountId = [Foo]
AND Name = '[Some Name]'

My question is: How would you finish the design of this table and its indexes?”

This apparently simple question hides some quite complex considerations. Depending on how the candidate fares, I can take the questions in different directions and up the game. Examples:

  • What data type will you pick for Id?
  • Should you use clustered indexes or regular indexes?
  • What about the hierarchy implied by ParentId? Are there better solutions?
  • How will you store the Name to support multiple languages?
  • What about index sizes, if this table is large (we have 60TB of this stuff) – could you be smart about the indexing structures to locate the Name column?
  • Is the Type column the right way to model this problem?
  • If you were to implement file versioning, how would you extend this model?
  • With the indexes you proposed, could you draw me the expected query plan of this statement?
  • If I wanted to introduce a query that retrieves ALL the files from a user, would your indexing strategy then change? If yes, how? If no, what would the impact be?
  • Could you estimate the storage cost/row with your current solution?
  • What is the expected runtime of this query? What will it depend on and how will you size a server for it?
  • If this table grew very large, how would you partition it, and why would you do it that way?
  • If you were to shard the workload, how would you approach this? Will your indexing strategy change based on that?
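To make a couple of these follow-ups concrete, here is a minimal sketch of one possible answer to the index size question. It is not the design we run in production, and it is certainly not the only right answer. It uses Python's built-in sqlite3 module purely so the example is runnable. The integer data types, the NameHash column, the index names, the helper functions and the choice of CRC32 are all assumptions made for illustration. The idea is to index a small fixed-width hash of Name instead of the 255 character string itself, recheck Name to rule out hash collisions, and then ask the engine which index it actually picks.

```python
import sqlite3
import zlib


def name_hash(name: str) -> int:
    """Hypothetical 32-bit hash of the file name (CRC32 is an assumption)."""
    return zlib.crc32(name.encode("utf-8"))


def build_db() -> sqlite3.Connection:
    """Create an in-memory FileMeta table with two composite indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE FileMeta
        (
          Id        INTEGER NOT NULL   -- assumed data type
        , ParentId  INTEGER NOT NULL   -- assumed data type
        , Type      CHAR(1) NOT NULL   /* F = File or D = Directory */
        , AccountId INTEGER NOT NULL   -- assumed data type
        , Name      VARCHAR(255)
        , NameHash  INTEGER NOT NULL   -- assumed extra column, not in the original schema
        );
        CREATE UNIQUE INDEX IX_FileMeta_Account_Id
            ON FileMeta (AccountId, Id);
        CREATE INDEX IX_FileMeta_Account_NameHash
            ON FileMeta (AccountId, NameHash);
        """
    )
    return conn


def add_file(conn, file_id, parent_id, kind, account_id, name):
    """Insert one row, computing the hash of the name on the way in."""
    conn.execute(
        "INSERT INTO FileMeta VALUES (?, ?, ?, ?, ?, ?)",
        (file_id, parent_id, kind, account_id, name, name_hash(name)),
    )


def find_by_name(conn, account_id, name):
    """Seek on the narrow hash index, then recheck Name to rule out collisions."""
    return conn.execute(
        "SELECT Id FROM FileMeta"
        " WHERE AccountId = ? AND NameHash = ? AND Name = ?",
        (account_id, name_hash(name), name),
    ).fetchall()


def query_plan(conn, sql, params):
    """Return the plan description SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human readable detail text.
    return " | ".join(row[-1] for row in rows)
```

Calling query_plan with either of the two frequent queries shows which index the engine chose. The tradeoff is that a hash index only serves equality lookups: a prefix search or an ORDER BY on Name would need a different structure, which is exactly the kind of consequence I want a candidate to spot.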

There are multiple right answers to these questions. The good candidate will be able to think of tradeoffs and list them out. I will then ask them to pick between the alternatives – allowing them to ask me clarifying questions to reach a conclusion (again, we try to filter out the empty “it depends” statements; you must take a stand). Sometimes, I will throw a spanner in the works and propose a really silly solution and see if the candidate has the chops to shoot it down.

Sometimes, I suspect a candidate is spewing best practices at me or they have read this blog and come up with pre-canned answers. If this is the case, I dig in to test their understanding. For example, if the person says that the Name should be Unicode to support international characters, I will ask them how they will determine the encoding to use for it if we were to support all of Mac, Linux and Windows. If the candidate offers platform specific solutions (for example the hierarchyid data type in SQL Server) – I will ask them why that solution is better than a self join (or some of the OTHER alternatives).
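For reference, the self join alternative mentioned above can be sketched as a recursive query that walks ParentId upwards. This is a minimal illustration using Python's built-in sqlite3 module, not our production code. The integer data types, the convention that a top-level entry has ParentId = 0, and the full_path and demo_db helpers are assumptions made for the example.

```python
import sqlite3

# Walk from one file up to the root by repeatedly joining FileMeta to itself.
# Assumption: a top-level entry has ParentId = 0 and no row has Id = 0, so the
# recursion stops there. A cycle in the data would make this loop forever.
PATH_SQL = """
WITH RECURSIVE Ancestors (Id, ParentId, Name, Depth) AS
(
    SELECT Id, ParentId, Name, 0
    FROM FileMeta
    WHERE AccountId = :account AND Id = :id
    UNION ALL
    SELECT p.Id, p.ParentId, p.Name, a.Depth + 1
    FROM FileMeta AS p
    JOIN Ancestors AS a ON p.Id = a.ParentId
    WHERE p.AccountId = :account
)
SELECT Name FROM Ancestors ORDER BY Depth DESC
"""


def full_path(conn, account_id, file_id):
    """Return the path of a file as a string, or None if it does not exist."""
    rows = conn.execute(
        PATH_SQL, {"account": account_id, "id": file_id}
    ).fetchall()
    if not rows:
        return None
    return "/" + "/".join(name for (name,) in rows)


def demo_db():
    """Build a tiny in-memory FileMeta table with a three level hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FileMeta ("
        " Id INTEGER NOT NULL, ParentId INTEGER NOT NULL,"
        " Type CHAR(1) NOT NULL, AccountId INTEGER NOT NULL,"
        " Name VARCHAR(255))"
    )
    conn.executemany(
        "INSERT INTO FileMeta VALUES (?, ?, ?, ?, ?)",
        [
            (1, 0, "D", 42, "docs"),
            (2, 1, "D", 42, "reports"),
            (3, 2, "F", 42, "q1.txt"),
        ],
    )
    return conn
```

Each level of the hierarchy costs one more lookup, so the depth of the tree drives the cost of this query. That is the kind of tradeoff I expect a candidate to weigh against a platform specific type or a stored path.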

Team interview

If the candidate does well during the grilling session, we invite their future team members to speak with them and introduce the candidate to the people they might end up working with. Typically, we will have 2-3 interview sessions where neither the COO nor I are present (to avoid influencing the decisions).

After each session, we debrief with the interviewer. If more than one person rejects the candidate – we do not proceed.

Final Phase

Once a candidate has been through the grilling session and team interview, we have enough information to make a choice. At this point, the candidate is most likely very tired. We thank them for their time and inform them of our decision.

After this, salary negotiations begin and we hopefully reach a mutually agreeable solution.

  11 Comments

  1. Donald   •  

    I enjoyed reading this article and found it interesting and informative. However the consistent misspelling of the possessive its with an apostrophe which shouldn’t be there really winds me up. Guess I would fail the interview!

    • Thomas Kejser   •     Author

      That’s a good one Donald. I will correct the blog entry. Thanks for noticing

  2. Stacy Gray   •  

    Why do job seeking candidates bother with recruiters?
    Because they bombard your inbox and they make all the arrangements for you…when I was laid off in 2009, it was the first time I had ever been unemployed. I had no idea how the system worked. I still had a single page resume. All these people were calling me, I thought they were with the hiring company, or had some special arrangement or something. It was really quite a learning experience.

    Your interviewing technique is strong. You will be able to tell the difference between people that sound smart, but don’t pay attention to detail, and the people that are smart because they always put 100% of themselves into everything they do. It is like my grandpa used to say, you can’t teach ‘um to give a crap.

    There aren’t that many rock stars out there, and I’m not one of them…not of THAT caliber. I’m just thrilled to be able to follow along well enough to be able to whistle and shake my head as I imagine myself standing there, a vast span of intimidating white board beside me whilst I clutch the dry erase marker, trying to scratch out a solution. The mental exercise alone is exhausting.

    It has been four months. I’m curious, who survived the interview?

    • Thomas Kejser   •     Author

      Hi Stacy

      I suppose getting spammed with job offers does have its advantages. Having seen the other side of the coin, I could worry that the jobs that get thrown out to the wolves are not the really good ones. The way recruiters behave to prospective employers – it does not take long to get tired of working with them.

      Two people passed through the gauntlet during my time there – but as expected, they were already looking at other offers too and we could not close the deal with them. As far as I am aware, they are still looking and currently relying on consultants (who go through similar testing).

  3. Chris Adkin   •  

    I might be wrong here and I need to look into how exactly sequence caching works in SQL Server, but in other popular database engines, the sequence cache size effectively causes each session to grab a range of sequence values, thus partitioning the sequence between sessions. Does this go against your understanding of sequences in SQL Server?

    Regarding a DBA versus developer, if you are talking production DBA versus developer (as opposed to development DBA), in my humble opinion I think that the difference is stark. I would expect and hope a production DBA to be very process, procedure and “do things by the book” oriented, someone who is well versed in servicing change tickets, making sure that everything is backed up, DR plans are in place, scripted, dress rehearsed, water tight and air tight. I would also expect the production DBA, particularly in a corporate environment, to be service delivery focussed and have knowledge of working within the ITIL framework. On the other hand I would expect a developer to have a knowledge of design patterns, modelling, continuous integration and different methodologies / frameworks in which software is delivered, agile, RAD etc. that the production DBA does not have. To muddy the waters further, I would expect a development DBA to bridge both worlds and have skills in both areas, but not necessarily have the same skill set as an out and out developer or an out and out production DBA. If I was to sum up a developer in one word it would be creative, if I was to sum up a prod DBA in as few words as possible, they would be “Safe pair of hands”, “Keep the ship afloat” and “Risk averse”.

    As you know my home rig has two distinct types of storage: 10K SATA drives to simulate DAS type storage and fusion IO cards. Once I’ve got more mundane tasks out of the way, such as completing the books for my company year end, I’ll look into the scalability of using GUIDs, reverse key indexes and IDs. My hypothesis is that scalability and performance on spinning rust crumbles when using GUIDs and flies on low latency storage.

    For a variety of reasons a permanent role would only appeal to me if it was a genuine unique opportunity or if my personal circumstances changed radically, however it would be interesting to turn up to your interview and see how it pans out 😉

    • Thomas Kejser   •     Author

      The sequence cache is indeed generated in advance (or “chunky” if you wish). But that is not the issue. The issue is that you still compete on the right side of the tree to acquire latches. Not just at the leaf, but at the index pages too.

      Oracle actually has a neat solution to this (in addition to reverse indexes) by allowing the hot page at the end to “overflow” the tree and cleaning it up later (which is feasible, because if you have hot page contention, you also leave a lot of cold data behind you).

    • Thomas Kejser   •     Author

      On the production DBA vs. developer: I think the line is much more blurry than you make it out to be. But perhaps this is because I am mostly biased towards fast and agile organisations. In a big corporate environment, there is obviously room for both a methodical DBA and a more developer oriented programming DBA. However, if you are a cloud provider, there is just no escaping that you need to be a bit of both.

      I look forward to seeing your tests with the different insert types. From my previous testing, GUIDs completely destroy IDENTITY/SEQUENCE. But the reverse indexes are even better than GUIDs. If you configured a SATA system with 10 or so drives, I would actually expect that you could show that GUIDs are superior even on spinning rust. For that, you would need to dedicate 2 spindles to the log and put a reasonable write aggregating cache in front of the RAID1. It’s a system that you can construct today for around 2K USD – depending on who you shop with.

  4. Chris Adkin   •  

    Wow, where do you start with this, I would imagine that most people could write an entire blog post in response to this.

    Agents

    The market is saturated with them, irrespective as to whether they come across as good, bad or indifferent most of them are chasing the same talent, worse still a lot of them simply surf LinkedIn and the job boards for people. The standard pattern of “A challenging role at a fast paced organisation” makes me think for the love of god can you for once come up with something original, on the same line every time a position is touted to me it’s always without exception “An incredible unique opportunity”. You will find that the true rock stars already have good positions and enticing them out of them is going to be no mean feat.

    Interview Process

    This should (within reason and certain constraints) be a two way street, if the opportunity is that unique I would want to know what the big draw of it is, I would want a bit more than words that include unique environment, work atmosphere and challenge, what is your value proposition here? If I was on the interviewing end of things I would want to know what the candidate does to develop themselves, I would hope and expect (although I’ve been sadly disappointed in the past when it comes to this) that they do a bit more to develop themselves than rely on what they pick up between 9 to 5.

    Technical Questions

    I would need to know what the load profile is on the table, SELECT versus DML, what the storage and balance of CPU to storage is, and what the performance and scalability goal of the application is. Given low latency storage, as you know, GUIDs scale well, and “Rick’s reverse key index” scales better. I need to look into SQL Server. Here is another interesting take on identifiers: SQL 2012 introduces sequences, these can be retrieved in ‘Blocks’ via the cache keyword, you might want to play around with this in order to minimize remote memory access on servers that use NUMA, this is a well practiced technique in the Oracle world when using RAC. For the question about the hierarchy I would want to know how deep this is and how the query workload profile falls across it. In order to avoid issues of locking, and with little background information to go on, my natural choice would be to split this across tables. This is just scratching the surface, again you could write a whole blog post on this.

    Dev Versus DBA

    You would need to qualify what sort of DBA you are talking about, for a production DBA I would look for someone who is a stickler for being methodical and doing things “By the book” this is diametrically opposed to the type of personality I would want if I had unique problems which wanted solving.

    Startups

    I’ve worked for a couple of startups whilst they can be great, I have experience of one going the way that a lot sadly do, I worked for another in which I was promised “The world is your oyster” only for this to evaporate when it got bought out, in fact it was like being assimilated by the borg . . . . They do tend to have a habit of either falling by the way side or in accordance with the founders “Exit plan” being bought out and assimilated. The other thing I would be cognizant of is; if the CTO was arguably one of the top people in the world at what he does and thrives on challenges, I would be mindful of how long he would be around for before seeking his next big mountain to climb and conquer . . . .

    Food for thought . . .

    I’d be interested to get other people’s take on this.

    • Thomas Kejser   •     Author

      That’s a lot of good discussion points Chris.

      To add my thoughts to this (And I hope others will add theirs).

      Agents: Not much to add here. They take a cut for doing a very dirty job – poorly. Clearly, there is an opportunity to automate this with a machine matching system (LinkedIn with some Legal framework integrated).

      Interview process: It absolutely is a two-way street. Once a candidate has passed the initial phase, I find myself spending as much time selling us to them as they spend selling themselves. The good candidate will ask a lot of the questions you pose and I better have good answers if I want them on board. My experience has been that if you detect passion, you don’t really need to worry about people learning outside work. They simply won’t be able to help themselves.

      Technical Questions: These are exactly the type of considerations I hope a candidate will put forth. You might be surprised at how few people even start the discussions, and at the number of times I have heard best practices regurgitated. On a side note: Sequences don’t help with the PAGELATCH problem – they just make getting the value more scalable (and useful for things like Rick’s Reverse Trick(tm))

      Dev vs DBA: I am curious to hear your thoughts about what you believe the difference is.

      Startups: Not much to add here either. It’s the nature of startups to be fickle and fast moving. Some people like that sort of tempo (I do). Thanks for the implied compliment by the way.

      I too am curious to hear what other people think about this process.

  5. Marco Shae   •  

    “will cease the change”? Perhaps instead “will seize the chance”?

    Interesting read. Thanks for sharing.

    • Thomas Kejser   •     Author

      Thanks 🙂 Corrected. I really should pay someone to spell check my blog
