Six common data roles in an organization

Organizations use many different terms for data roles. This can be intimidating and confusing, and can lead to false expectations for all parties involved. At PeaceTech Lab NL, we aim to build strong partnerships between NGOs and data professionals. For this, we find it important to have a clear understanding of which data roles can be distinguished in an organization. This post sheds light on the six roles that are most commonly found and that all contribute to making optimal use of data.

Data broker (or Data steward): Assembles data and makes it available for others to use. There are commercial data brokers who charge a fee for the use of data they curate (which does not necessarily mean they own that data). A broker within an organization is usually a domain expert with data skills (e.g. from the finance department) who prepares data for others to use, interpret and analyse. Within organizations, data brokers can also be called Data stewards.

Data engineer: A computer programmer. Focuses on deployment and maintenance of data products (e.g. machine learning pipelines, recurring reports or data views). Does not create prototypes or ad hoc analyses.

Data architect: Maintainer of the data infrastructure. Models data to make it accessible to analysts or scientists. Designs the infrastructure where the data “lives”. Focuses on the reliability of how data is stored, warehoused and preserved.

Business intelligence specialist: Business intelligence tooling helps organizations make better data-driven decisions, often by presenting data in clear reports and interactive dashboards. Business intelligence specialists are the experts who design and build these reports and dashboards, but they often leave more thorough analyses to the data analysts or scientists.

Data analyst or Data scientist?

The most confusion exists about the difference between a data scientist and a data analyst. The distinction made below is not as strict as it seems. In practice these roles overlap. Also, in Dutch usage it is more common to consider the data scientist role to be that of a data analyst.

Data analyst: Tells stories with data, but uses the data more or less “as is”. Creates the story from the existing numbers by using charts and graphs. This may include analysing trends and some basic extrapolations, but the operations remain fairly basic. Many data analysts can create their product by sticking to spreadsheet manipulations (e.g. in Excel).
A subset of data analysts are business analysts. They are more involved in the business aspect of the organization.
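
To give a feel for what "analysing trends and some basic extrapolations" can look like, here is a minimal sketch in Python. It is only an illustration: the monthly figures are invented, the names fit_line and extrapolate are hypothetical, and a straight-line fit is just one simple way to extrapolate. The same calculation can be done with a trendline in a spreadsheet.

```python
# Invented example data: number of people reached per month.
months = [1, 2, 3, 4, 5, 6]
people_reached = [100, 110, 120, 130, 140, 150]


def fit_line(xs, ys):
    """Fit a straight line y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(slope, intercept, x):
    """Extend the fitted trend to a value of x outside the observed data."""
    return slope * x + intercept


slope, intercept = fit_line(months, people_reached)
forecast = extrapolate(slope, intercept, 7)  # projected value for month 7
print(f"Trend: +{slope:.1f} per month, projection for month 7: {forecast:.0f}")
```

A projection like this assumes the trend simply continues, which is exactly the kind of assumption an analyst should state alongside the chart.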

Data scientist: Some crude one-liners float around that try to define a data scientist:

A data scientist is a better programmer than a statistician is … is a better statistician than a programmer is.

A data scientist is a data analyst that can write code.

However, a more careful way to distinguish data scientists from data analysts might be to look at how they use data: a data scientist processes the data more extensively before interpreting it and presenting the findings. They use advanced mathematical techniques and advanced programming, in addition to the work a data analyst does. Typically, data scientists are comfortable with data modelling, machine learning and other advanced algorithms, and often hold an advanced degree. Their toolbelt should include a programming language (e.g. Python or R) in addition to spreadsheet skills.

Some caution!

One distinction between a traditional scientist and a data scientist is that the former actively collects data, usually through carefully crafted experiments. A data scientist, on the other hand, usually works with data that already exists in some form, albeit not yet fully suitable for analysis; it may require further data mining. This means the traditional scientist uses data that was collected for the intended analysis (primary analysis), whereas a data scientist uses data that was collected for another purpose (secondary analysis). Using data calls for caution when the context in which it was collected is unknown, because the data may hold invisible biases.

Want to dive deeper?

This post gives a brief description of the six roles most commonly found in an organization. Are you interested in hearing more about these roles, do you want to explore which others exist, or do you want to know how to build a data team that fits your organization? Please reach out to us. We are happy to help!