Hiring a Data Scientist: Decoding the Ambiguities
Updated: Feb 12
Data science is an emerging field that has reshaped many economic and industrial settings. It can be applied in any industrial sector, enterprise or organization to solve real-world problems using data. Since the field is relatively nascent, hiring data scientists is not yet a conventional recruitment procedure. While positions such as Java programmer or UI designer demand very specific skill sets and expertise, data science is a broad term to test anyone on.
The role of a data scientist varies according to the industry, domain, and purpose in question. This implies that an organization should first decide why it needs a data science department at all. It must define the roles and responsibilities of the data scientist clearly before the search for a candidate begins.
What would an ideal candidate look like?
Enterprises that have an in-house data analytics department often need data analysts and scientists who know industry and market norms, competitor behavior, the business domain and brand-specific information. This knowledge is necessary to derive meaningful insights from the analyses and the corpus of data at one’s disposal. Providing valuable and effective recommendations also requires an understanding of the dynamics of the industry and the corresponding customer set.
On the other hand, pure-play data analytics firms and consultancy companies have dedicated data scientists and decision scientists who are recruited to use data as their only tool and guide for solving real-world business problems. Such data scientists rely on technology, data and some basic industry research to come up with viable solutions to high-cost problems. While the requirements may differ across organizations, the basic skill set of a data scientist stays more or less the same.
The following three broad subjects play a significant role in determining the ability of a potential data scientist:
Math and Statistics
Programming and Databases
Business and Industry domain knowledge
Testing Proficiency in Math and Programming
Experience in mathematics and statistics normally includes the ability to understand statistical measures and tests such as R-squared values, t-tests, p-values and correlations, along with standard arithmetic operations. Furthermore, data scientists should be familiar with statistical modeling techniques like regression, classification, time-series analysis or even advanced machine learning. A person with a degree or specialization in statistics or data science should meet this criterion easily, but an individual with a related degree (such as math, engineering, or economics) may or may not have any knowledge of machine learning or of developing algorithms for artificial intelligence. In the absence of clear machine learning experience or knowledge, recruiters should look for a related educational background that suggests the individual can learn the skill on the job.
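To make two of these measures concrete, here is a minimal sketch in plain Python that computes a Pearson correlation and the R-squared of a straight-line fit. The function names and the small advertising-versus-sales dataset are hypothetical and exist only for illustration; in practice a candidate would more likely use a statistics library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: advertising spend versus sales, in arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(ad_spend, sales)
r2 = r_squared(ad_spend, sales)
print(f"correlation = {r:.3f}, R-squared = {r2:.3f}")
```

For a straight-line fit with a single predictor, R-squared equals the square of the correlation. Asking a candidate why that holds is a quick way to check understanding instead of recall.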
As the name suggests, data science is the process of taking in raw data, processing it and then using it to derive actionable insights for the given problem. To process, analyze or visualize data, we need technological platforms and tools that can handle a large amount of data and run multiple mathematical algorithms on it. SQL (Structured Query Language) is used across industries to query (pull), store and aggregate data in multiple forms. Every candidate should have some basic knowledge of relational database systems and of common SQL platforms and tools (such as Oracle SQL Developer, MySQL and Teradata); familiarity with NoSQL data stores is a plus. Some knowledge of big data platforms and frameworks such as SAP HANA, Cloudera, AWS or Hadoop is beneficial.
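As a small illustration of querying and aggregating with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `orders` table, its columns and its rows are hypothetical; the same GROUP BY pattern applies on the platforms named above, although SQL dialects differ in their details.

```python
import sqlite3

# An in-memory SQLite database, so the example needs no server or file.
conn = sqlite3.connect(":memory:")

# Hypothetical table of orders by sales region.
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("North", 120.0),
        ("North", 80.0),
        ("South", 200.0),
        ("South", 50.0),
        ("East", 75.0),
    ],
)

# Pull and aggregate: order count and total amount per region.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
conn.close()

for region, n_orders, total in rows:
    print(region, n_orders, total)
```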
Many organizations use R, MATLAB, Python and SAS as their main programming platforms to run statistical tests, build machine learning models, and visualize trends and data distributions. Knowledge of any of these, or of a general-purpose programming language like Java, is a good sign that the candidate can grow into the role of a data scientist.
A Little Business Understanding Goes a Long Way
As stated above, business understanding, creative problem-solving skills, and knowledge of the company or industry are basic desirable skills for a data scientist candidate. However, a person may not be proficient in all of these areas. It is also important to note that an individual with a good understanding of even a few mathematical concepts, tools or industries can learn the others just as well, given the intent and interest to learn and grow.
What matters is that the candidate is curious, resilient and inherently smart. The interest and willingness to learn and improve make an individual a strong contender. No candidate will offer every skill mentioned above, so recruiters should design the screening process flexibly to gauge the true ability of each candidate.
How to put a potential data scientist to the test?
While questions like ‘When do you use logistic regression?’ or ‘When is DBSCAN clustering better than k-means clustering?’ sound jargon-heavy and fancy, they do not necessarily test a candidate’s potential. It is easy to look these questions up on the internet and prepare answers for them. They do not test the candidate’s understanding of the underlying concepts. A better way to test mathematical skills is to give the candidate a sample dataset and then ask about the tests and mathematical measures they would apply, based on their own assessment of the data.
For example, given data on insurance claims with their descriptions, claim amounts, dates and other numerical metrics, it would be wise to ask what algorithm the candidate would choose to cluster these claims into different buckets of severity, and why. This helps the recruiter assess how well the candidate understands data, math and statistical algorithms. Tools and platforms can be a gray area when testing a candidate. Someone whose command of syntax is not a strong suit might still be great at designing the logic for an operation needed to analyze data. To test this, the interviewer can ask for pseudocode or ask about the candidate’s algorithmic approach to a problem statement.
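One possible answer to the insurance-claims question is k-means clustering on the claim amount alone. The sketch below is a deliberately simple one-dimensional version in plain Python; the claim amounts, the function name and the choice of three buckets are all hypothetical. A strong candidate would also discuss its limits, for instance that real claim amounts are often heavily skewed and that descriptions and dates would need to be turned into features first.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster numbers into k groups with Lloyd's k-means algorithm.

    Starting centers are taken at evenly spaced positions in the sorted
    data so that the result is deterministic (an illustrative choice).
    """
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters


# Hypothetical claim amounts in an arbitrary currency.
claim_amounts = [500, 700, 650, 800, 5200, 4800, 5500, 21000, 19500, 23000]

centers, clusters = kmeans_1d(claim_amounts, k=3)
for label, center, members in zip(["low", "medium", "high"], centers, clusters):
    print(f"{label}: center={center:.1f}, claims={sorted(members)}")
```

What matters in an interview is less the code than the reasoning behind it: why k-means and not a density-based method, how to choose the number of buckets, and how to check that the buckets reflect severity in a way the business recognizes.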
Case studies are a good way to test a candidate thoroughly. A real-world business scenario, for instance a problem that the company has already solved, can be given to the candidate with an appropriate amount of time to see how they would solve it. This covers the candidate’s understanding of data, math, technology and industry know-how in a single exercise.
Several firms use platforms like Kaggle to find well-suited candidates for the role of a data scientist. Interested people have to solve a data science problem and compete with others in the community to get through. While there are many innovative ways to screen for the most deserving candidate, it is just as necessary that firms instill a data-driven culture in their daily operations. Data scientists can then help organizations identify and solve interconnected latent problems that might not be visible on the surface.