Here are data modelling interview questions for fresher as well as experienced candidates.
1) What is data modelling?
Data modelling is the process of creating a model for the data to store in a database. It is a conceptual representation of data objects, the association between different data objects, and the rules.
2) Explain various types of data models
There are mainly three different types of data models:
Conceptual: Conceptual data model defines what should the system contain. This model is typically created by business stakeholders and data architects. The purpose is to organize, scope, and define business concepts and rules.
Logical: Defines how the system should be implemented regardless of the DBMS. This model is typically created by data architects and business analysts. The purpose is to develop a technical map of rules and data structures.
Physical: This data model describes how the system will be implemented using a specific DBMS system. This model is typically created by DBA and developers. The purpose is the actual implementation of the database.
3) Explain the fact and fact table
The fact represents quantitative data. For example, the net amount which is due. A fact table contains numerical data as well as foreign keys from dimensional tables.
4) List out various design schema in data modelling
There are two different types of data modelling schemes schemas: 1) Star Schema, and 2) Snowflake Schema
5) When should you consider denormalization?
Denormalization is used when there is a lot of involvement of the table while retrieving data. It is used to construct a data warehouse.
6) Explain dimension and attribute
Dimensions represent qualitative data. For example, product, class, plan, etc. A dimension table has textual or descriptive attributes. For example, the product category and product name are two attributes of the product dimension table.
7) What is the fact less fact?
Fact less fact is a table having no fact measurement. It contains only the dimension keys.
8) What is in-memory analytics?
In-memory analytics is a process of caching the database in RAM.
9) What is the difference between OLTP and OLAP?
|OLTP is an online transactional system.||OLAP is an online analysis and data retrieving process.|
|It is characterized by a large number of short online transactions.||It is characterized by a large volume of data.|
|OLTP uses traditional DBMS.||OLAP uses a data warehouse.|
|Tables in OLTP database are normalized.||The tables in OLAP are not normalized.|
|Its response time is in a millisecond.||Its response time is in second to minutes.|
|OLTP is designed for real time business operations.||OLAP is designed for the analysis of business measures by category and attributes.|
10) What is table?
The collection of rows and columns is called as table. Each and every column has a datatype. Table contains related data in a tabular format.
11) What is column?
Column or field is a vertical arrangement of data that contain related information.
12) Define data sparsity
Data sparsity is a term used for how much data you have for entity/ dimension of the model.
13) What is composite primary key?
Composite primary key is referred to the case where more than one table column is used as a part of primary key.
14) What is primary key?
Primary key is a column or group of columns that unequally identify each and every row in the table. The value of primary key must not be null. Every table must contain one primary key.
15) Explain foreign key
Foreign key is a group of attributes which is used to link parent and child table. The value of the foreign key column, which is available in the child table, is referred to the value of the primary key in the parent table.
16) What is metadata?
Metadata describes the data about data. It shows what type of data is actually stored in the database system.
17) What is data mart?
A data mart is a condensed version of a data warehouse and is designed for use by a specific department, unit, or set of users in an organization. E.g., marketing sales, HR, or finance.
18) What is OLTP?
Online transaction processing, shortly known as OLTP, supports transaction-oriented application in 3-tier architecture. OLTP administers the day to day transaction of company or organization.
19) What are the examples of the OLTP system?
Example of OLTP system are:
- Sending a text message
- Add a book to shopping cart
- Online airline ticket booking
- Online banking
- Order entry
20) What is check constraint?
Check constraint is used to verify a range of values in a column.
21) List out the types of normalization?
Types of normalizations are: 1) first normal form, 2) second normal form, 3) third normal forms, 4) boyce-codd fourth, and 5) fifth normal forms.
22) What is forward data engineering?
Forward engineering is a technical term used to describe the process of translating a logical model into a physical implement automatically.
23) What is PDAP?
It is a data cube that stores data as a summary. It helps the user to analyse data quickly. The data in PDAP is stored in a way that reporting can be done with ease.
24) Explain snow flake schema database design
A snowflake schema is an arrangement of a dimension table and fact table. Generally, both tables are further broken down into more dimension tables.
25) Explain analysis service
Analysis service gives a combined view of the data that is used in data mining or OLAP.
26) What is sequence clustering algorithm?
Sequence clustering algorithm collects paths which are similar or related to each other and sequences of data having events.
27) What is discrete and continuous data?
Discreet data is a finite data or defined data. E.g., gender, telephone numbers. Continuous data is data that changes in a continuous and ordered manner. E.g., age.
28) What is the time series algorithm?
Time series algorithm is a method to predict continuous values of data in table. E.g., Performance one employee can forecast the profit or influence.
29) What is Business Intelligence?
BI (Business Intelligence) is a set of processes, architectures, and technologies that convert raw data into meaningful information that drives profitable business actions. It is a suite of software and services to transform data into actionable intelligence and knowledge.
30) What is bit mapped index?
Bitmap indexes are a special type of database index that uses bitmaps (bit arrays) to answer queries by executing bitwise operations.
31) Explain data warehousing in detail
Data warehousing is a process for collecting and managing data from varied sources. It provides meaningful business enterprise insights. Data warehousing is typically used to connect and analyse data from heterogeneous sources. It is the core of the BI system, which is built for data analysis and reporting.
32) What is junk dimension?
Junk dimension combines two or more related cardinality into one dimension. It is usually Boolean or flag values.
33) Explain data scheme
Data Scheme is a diagrammatic representation that illustrates data relationships and structures.
34) Explain data collection frequency
Data collection frequency is the rate to collect the data. It also passes through various stages. These stages are: 1) extracting from various sources, 3) transforming, 4) cleansing, and 5) storing.
35) What is database cardinality?
Cardinality is a numerical attribute of the relationship between two entities or entity sets.
36) What are the different types of cardinal relationships?
Different types of key cardinal relationships are:
- One-to-One Relationships
- One-to-Many Relationships
- Many-to-One Relationships
- Many-to-Many Relationships
37) Define Critical Success Factor and list its four types
Critical Success Factor is a favorable result of any activity needed for organization to reach its goal.
Four types of critical success factor are:
- Industry CSFs
- Strategy CSFs
- Environmental CSFs
- Temporal CSFs
38) What is data mining?
Data mining is a multi-disciplinary skill that uses machine learning, statistics, AI, and database technology. It is all about discovering unsuspected / previously unknown relationships amongst the data.
39) What is the difference between Star schema and Snowflake schema?
|Star Schema||Snowflake Schema|
|Hierarchies for the dimensions are stored in the dimensional table.||Hierarchies are divided into separate tables.|
|It contains a fact table surrounded by dimension tables.||One fact table surrounded by dimension table which is in turn surrounded by dimension table|
|In a star schema, only a single join creates the relationship between the fact table and any dimension tables.||A snowflake schema requires many joins to fetch the data.|
|It has a simple database design||It has a complex database design|
|Denormalized data structure and query also run faster.||Normalized data Structure.|
|High level of data redundancy||Very low-level data redundancy|
|Offers higher-performing queries using Star Join Query Optimization. Tables may be connected with multiple dimensions.||The Snow Flake Schema is represented by a centralized fact table which unlikely connected with multiple dimensions.|
40) What is identifying relationship?
Identifying entity relationships in DBMS is used to identify a relationship between two entities: 1) strong entity, and 2) weak entity.
41) What is a self-recursive relationship?
Recursive relationship is a standalone column in a table which is connected to the primary key of the same table.
42) Explain relational data modelling
Relational data modelling is representation of objects in a relational database, which is usually normalized.
43) What is predictive modelling analytics?
The process of validating or testing a model which would used to predict testing and validating outcomes. It can be used for machine learning, artificial intelligence, as well as statistics.
44) What is the difference between logical data model and physical data model?
|Logical data model||Physical data model|
|A logical data model can design the requirement of business logically.||A physical data model provides information about the target database source and its properties.|
|It is responsible for the actual implementation of data which is stored in the database.||A physical data model helps you to create a new database model from existing and apply the referential integrity constraint.|
|It contains an entity, primary key attributes, Inversion keys, alternate key, rule, business relation, definition, etc.||A physical data model contains a table, key constraints, unique key, columns, foreign key, indexes, default values, etc.|
45) What are the different types of constraints?
A different type of constraint could be unique, null values, foreign keys, composite key or check constraint, etc.
46) What is a data-modelling tool?
Data modelling tool is a software that helps in constructing data flow and the relation between data. Examples of such tools are Borland Together, Altova Database Spy, casewise, Case Studio 2, etc.
47) What is hierarchical DBMS?
In the hierarchical database, model data is organized in a tree-like structure. Data is stored in a hierarchical format. Data is represented using a parent-child relationship. In hierarchical DBMS parent may have many children, children have only one parent.
48) What are the drawbacks of the hierarchical data model?
The drawbacks of the hierarchical data model are:
- It is not flexible as it takes time to adapt to the changing needs of the business.
- The structure poses the issue in, inter-departmental communication, vertical communication, as well as inter-agency communication.
- Hierarchical data model can create problems of disunity.
49) Explain the process-driven approach of data modelling
Process-driven approach used in data modelling follows a step by step method on the relationship between the entity-relationship model and organizational process.
50) What are the advantages of using data modelling?
The advantages of using data modelling in data warehousing are:
- It helps you to manage business data by normalizing it and defining its attributes.
- Data modelling integrates the data of various systems to reduce data redundancy.
- It enables to create efficient database design.
- Data modelling helps the organization department to function as a team.
- It facilitates to access data with ease.
51) What are the disadvantages of using data modelling?
The disadvantages of using data modelling are:
- It has less structural independency
- It can make the system complex.
52) What is index?
Index is used for a column or group of columns to retrieve data fast.
53) What are the characteristics of a logical data model?
Characteristics of logical data model are:
- Describes data needs for a single project but could integrate with other logical data models based on the scope of the project.
- Designed and developed independently from the DBMS.
- Data attributes will have datatypes with exact precisions and length.
- Normalization processes to the model, which is generally are applied typically till 3NF.
54) What are the characteristics of physical data model?
Characteristics of physical data model are:
- The physical data model describes data need for a single project or application. It may be integrated with other physical data models based on project scope.
- Data model contains relationships between tables that address cardinality and nullability of the relationships.
- Developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
- Columns should have exact datatypes, lengths assigned, and default values.
- Primary and foreign keys, views, indexes, access profiles, and authorizations, etc. are defined.
55) What are the two types of data modelling techniques?
Two types of data modelling techniques are: 1) entity-relationship (E-R) Model, and 2) UML (Unified Modelling Language).
56) What is UML?
UML (Unified Modelling Language) is a general-purpose, database development, modelling language in the field of software engineering. The main intention is to provide a generalized way to visualize system design.
57) Explain object-oriented database model
The object-oriented database model is a collection of objects. These objects can have associated features as well as methods.
58) What is a network model?
It is a model which is built on hierarchical model. It allows more than one relationship to link records, which indicates that it has multiple records. It is possible to construct a set of parent records and child records. Each record can belong to multiple sets that enable you to perform complex table relationships.
59) What is hashing?
Hashing is a technique which is used to search all the index value and retrieve desired data. It helps to calculate the direct location of data, which are recorded on disk without using the structure of the index.
60) What is business or natural keys?
business or natural keys is a field that uniquely identifies an entity. For example, client ID, employee number, email etc.
61) What is compound key?
When more than one field is used to represent a key, it is referred to as a compound key.
62) What is first normal form?
First normal form or 1NF is a property of a relation available in a relational database management system. Any relation is called first normal form if the domain of every attribute contains values which are atomic. It contains one value from that domain.
63) What is the difference between primary key and foreign key?
|Primary key||Foreign key|
|Primary key helps you to uniquely identify a record in the table.||Foreign key is a field in the table that is the primary key of another table.|
|Primary Key never accepts null values.||A foreign key may accept multiple null values.|
|Primary key is a clustered index, and data in the DBMS table are physically organized in the sequence of the clustered index.||A foreign key cannot automatically create an index, clustered, or non-clustered. However, you can manually create an index on the foreign key.|
|You can have the single Primary key in a table.||You can have multiple foreign keys in a table.|
64) What are the requirements of the second normal form?
The requirements of second normal form are:
- It should be in first normal form.
- It does not contain any non-prime attribute, which is functionally dependent on any subset of candidate key of the table relation.
65) What are the rules for third normal form?
Rules for third normal forms are:
- It should be in second normal form
- It has no transitive functional dependencies.
66) What the importance of using keys?
- Keys help you to identify any row of data in a table. In a real-world application, a table could contain thousands of records.
- Keys ensure that you can uniquely identify a table record despite these challenges.
- Allows you to establish a relationship between and identify the relation between tables
- Help you to enforce identity and integrity in the relationship.
67) What is a Surrogate Key?
An artificial key which aims to uniquely identify each record is called a surrogate key. These kinds of key are unique because they are created when you don’t have any natural primary key. They do not lend any meaning to the data in the table. Surrogate key is usually an integer.
68) Explain alternate key in detail
Alternate key is a column or group of columns in a table that uniquely identifies every row in that table. A table can have multiple choices for a primary key, but only one can be set as the primary key. All the keys which are not primary key are called an Alternate Key.
69) What is fourth normal form in DBMS?
Fourth normal form is a level of database normalization where there must not have non trivial dependency other than candidate key.
70) What is a database management system?
Database management system or DBMS is a software for storing and retrieving user data. It consists of a group of programs which manipulate the database.
71) What is the rule of fifth normal form?
A table is in 5th normal form only if it is in 4th normal form, and it cannot be decomposed into any number of smaller tables without loss of data.
72) What is normalization?
Normalization is a database design technique that organizes tables in a manner that reduces redundancy and dependency of data. It divides larger tables into smaller tables and links them using relationships.
73) Explain the characteristics of a database management system
- Provides security and removes redundancy
- Self-describing nature of database system
- Insulation between programs and data abstraction
- Support of multiple views of data.
- Sharing of data and multiuser transaction processing
- DBMS allows entities and relations among them to form tables.
- It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).
- DBMS supports a multi-user environment that allows users to access and access and manipulate data in parallel.
74) List out popular DBMS software
Popular DBMS software is:
- Microsoft Access
- IBM DB2
- Microsoft SQL Server.
75) Explain the concept of RDBMS
Relational Database Management System is a software which is used to store data in the form of tables. In this kind of system, data is managed and stored in rows and columns, which is known as tuples and attributes. RDBMS is a powerful data management system and is widely used across the world.
76) What are the advantages of data model?
Advantages of the data model are:
- The main goal of a designing data model is to make sure that data objects offered by the functional team are represented accurately.
- The data model should be detailed enough to be used for building the physical database.
- The information in the data model can be used for defining the relationship between tables, primary and foreign keys, and stored procedures.
- Data Model helps businesses to communicate within and across organizations.
- Data model helps to documents data mappings in the ETL process
- Help to recognize correct sources of data to populate the model
77) What are the disadvantages of Data Model?
Disadvantages of Data model are:
- To develop Data model, one should know physical data stored characteristics.
- This is a navigational system that produces complex application development, management. Thus, it requires knowledge of the biographical truth.
- Even smaller changes made in structure require modification in the entire application.
- There is no set of data manipulation language in DBMS.
78) Explain various types of fact tables
There are three types of fact tables:
- Additive: It is a measure that is added to any dimension.
- Non-additive: It is a measure that can’t be added to any dimension.
- Semi-additive: It is a measure that can be added to a few dimensions.
79) What is aggregate table?
The aggregate table contains aggregated data that can be calculated using functions such as: 1) Average 2) MAX, 3) Count, 4) SUM, 5) SUM, and 6) MIN.
80) What is Confirmed dimension?
A conformed dimension is a dimension which is designed in a way that can be used across many fact tables in various areas of a data warehouse.
81) List types of Hierarchies in data modelling
There are two types of Hierarchies: 1) Level based hierarchies and 2) Parent-child hierarchies.
82) What is the difference between a data mart and data warehouse?
|Data mart||Data warehouse|
|Data mart focuses on a single subject area of business.||Data warehouse focuses on multiple areas of business.|
|It is used to make tactical decisions for business growth.||It helps business owners to take a strategic decision|
|Data mart follows the bottom-up model||Data warehouse follows a top-down model|
|Data source comes from one data source||Data source comes from more than one heterogeneous data sources.|
83) What is XMLA?
XMLA is an XML analysis that is considered as standard for accessing data in Online Analytical Processing (OLAP).
84) Explain junk dimension
Junk dimension helps to store data. It is used when data is not proper to store in schema.
85) Explain chained data replication
The situation when a secondary node selects target using ping time or when the closest node is a secondary, it is called as chained data replication.
86) Explain Virtual Data Warehousing
A virtual data warehouse gives a collective view of the completed data. A virtual data warehouse does not have historical data. It is considered as a logical data model having metadata.
87) Explain snapshot of data warehouse
Snapshot is a complete visualization of data at the time when data extraction process begins.
88) What is a bi-directional extract?
The ability of system to extract, cleanse, and transfer data in two directions is called as a directional extract.