A data scientist is often described as someone who is better at statistics than any software engineer and better at software engineering than any statistician. Data scientists are responsible for discovering insights from massive amounts of structured and unstructured data to help shape or meet specific business needs and goals. The data scientist's role in data analysis is becoming increasingly important as businesses rely more heavily on big data and data analytics to drive decision-making.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from both structured and unstructured data, much as data mining does.
Data science is a “concept to unify statistics, data analysis, machine learning and their related methods” in order to “understand and analyze actual phenomena” with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science.
Turing Award winner Jim Gray imagined data science as a “fourth paradigm” of science (empirical, theoretical, computational, and now data-driven) and asserted that “everything about science is changing because of the impact of information technology” and the data deluge.
In 2012, when Harvard Business Review called it “The Sexiest Job of the 21st Century”, the term “data science” became a buzzword. It is now often used interchangeably with earlier concepts like business analytics, business intelligence, predictive modeling, and statistics. Even the suggestion that data science is sexy was a paraphrase of Hans Rosling, who was featured in a 2011 BBC documentary with the quote, “Statistics is now the sexiest subject around.” Nate Silver referred to data science as a sexed-up term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as “data science” to be more attractive, which can cause the term to become “dilute beyond usefulness.” Although many university programs now offer a data science degree, there is no consensus on a definition or on suitable curriculum contents. To the field's discredit, many data-science and big-data projects fail to deliver useful results, often because of poor management and poor use of resources.