The GATE 2025 Data Science (DA) question paper with solution PDF is available to download here. GATE 2025 was conducted by IIT Roorkee. As per the updated exam pattern, the exam consisted of 65 questions totaling 100 marks, with 10 questions from General Aptitude and 55 questions covering Data Science and AI topics.
The GATE 2025 DA paper was moderately difficult.
GATE 2025 DA Question Paper with Solutions PDF

General Aptitude
Courage : Bravery :: Yearning :
Select the most appropriate option to complete the analogy.
We _______ tennis in the lawn when it suddenly started to rain.
Select the most appropriate option to complete the above sentence.
A 4 × 4 digital image has pixel intensities (U) as shown in the figure. The number of pixels with \( U \leq 4 \) is:
In the given figure, the numbers associated with the rectangle, triangle, and ellipse are 1, 2, and 3, respectively. Which one among the given options is the most appropriate combination of \( P \), \( Q \), and \( R \)?
A rectangle has a length \(L\) and a width \(W\), where \(L > W\). If the width, \(W\), is increased by 10%, which one of the following statements is correct for all values of \(L\) and \(W\)?
Select the most appropriate option to complete the above sentence.
Column-I has statements made by Shanthala; and, Column-II has responses given by Kanishk.
Weight of a person can be expressed as a function of their age. The function usually varies from person to person. Suppose this function is identical for two brothers, and it monotonically increases till the age of 50 years and then it monotonically decreases. Let \( a_1 \) and \( a_2 \) (in years) denote the ages of the brothers and \( a_1 < a_2 \).
Which one of the following statements is correct about their age on the day when they attain the same weight?
A regular dodecagon (12-sided regular polygon) is inscribed in a circle of radius \( r \) cm as shown in the figure. The side of the dodecagon is \( d \) cm. All the triangles (numbered 1 to 12 in the figure) are used to form squares of side \( r \) cm, and each numbered triangle is used only once to form a square.
The number of squares that can be formed and the number of triangles required to form each square, respectively, are:
If a real variable \(x\) satisfies \(3^{x^2} = 27 \times 9^x\), then the value of \(\frac{2^{x^2}}{(2^x)^2}\) is:
The number of patients per shift (X) consulting Dr. Gita in her past 100 shifts is shown in the figure. If the amount she earns is \(Rs. 1000(X - 0.2)\), what is the average amount (in Rs.) she has earned per shift in the past 100 shifts?
Solution:
Step 1: Understanding the problem.
The number of shifts corresponding to different numbers of patients per shift is given in the bar graph. The amount Dr. Gita earns is \(1000(X - 0.2)\), where \(X\) is the number of patients per shift.
The data from the graph is as follows:
For \(X = 5\), the number of shifts is 20.
For \(X = 6\), the number of shifts is 40.
For \(X = 7\), the number of shifts is 30.
For \(X = 8\), the number of shifts is 10.
Step 2: Calculating the total earnings.
For \(X = 5\):
\[ \text{Earnings} = 1000 \times (5 - 0.2) \times 20 = 1000 \times 4.8 \times 20 = 96,000. \]
For \(X = 6\):
\[ \text{Earnings} = 1000 \times (6 - 0.2) \times 40 = 1000 \times 5.8 \times 40 = 232,000. \]
For \(X = 7\):
\[ \text{Earnings} = 1000 \times (7 - 0.2) \times 30 = 1000 \times 6.8 \times 30 = 204,000. \]
For \(X = 8\):
\[ \text{Earnings} = 1000 \times (8 - 0.2) \times 10 = 1000 \times 7.8 \times 10 = 78,000. \]
Step 3: Calculating the total earnings and average earnings.
Total earnings for all 100 shifts:
\[ \text{Total Earnings} = 96,000 + 232,000 + 204,000 + 78,000 = 610,000. \]
The average earnings per shift: \[ \text{Average Earnings} = \frac{610,000}{100} = 6,100. \]
Thus, the average earnings per shift are Rs. 6,100, which corresponds to Option (A).
Quick Tip: When calculating averages involving frequency distributions, first calculate the total earnings, then divide by the total number of shifts to find the average.
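The weighted average above can be checked with a short Python sketch. The dictionary name `shifts_by_patients` and the helper `earnings_per_shift` are illustrative names; the frequencies are the bar-graph readings quoted in Step 1.

```python
# Frequencies quoted in Step 1: patients per shift -> number of shifts.
shifts_by_patients = {5: 20, 6: 40, 7: 30, 8: 10}

def earnings_per_shift(x):
    """Amount in Rs. earned in a shift with x patients: 1000 * (x - 0.2)."""
    return 1000 * (x - 0.2)

total_shifts = sum(shifts_by_patients.values())
total_earnings = sum(earnings_per_shift(x) * n for x, n in shifts_by_patients.items())
average_earnings = total_earnings / total_shifts
print(round(average_earnings))
```

Equivalently, the mean number of patients is 6.3, so the average is 1000 × (6.3 − 0.2).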
Suppose \( X \) and \( Y \) are random variables. The conditional expectation of \( X \) given \( Y \) is denoted by \( E[X | Y] \). Then \( E[E[X | Y]] \) equals:
The number of additions and multiplications involved in performing Gaussian elimination on any \( n \times n \) upper triangular matrix is of the order:
The sum of the elements in each row of \( A \in \mathbb{R}^{n \times n} \) is 1. If \( B = A^3 - 2A^2 + A \), which one of the following statements is correct (for \( x \in \mathbb{R}^n \))?
Let \( f(x) = \frac{e^x - e^{-x}}{2}, \, x \in \mathbb{R} \). Let \( f^{(k)}(a) \) denote the \( k^{th} \) derivative of \( f \) evaluated at \( a \). What is the value of \( f^{(10)}(0)? \) (Note: \( ! \) denotes factorial)
Let \( p \) and \( q \) be any two propositions. Consider the following propositional statements.
S1: \( p \rightarrow q \), S2: \( \neg p \land q \), S3: \( \neg p \lor q \), S4: \( \neg p \lor \neg q \)
where \( \land \) denotes conjunction (AND operation), \( \lor \) denotes disjunction (OR operation), and \( \neg \) denotes negation (NOT operation).
(Note: \( \equiv \) denotes logical equivalence)
Which one of the following options is correct?
If a relational decomposition is not dependency-preserving, which one of the following relational operators will be executed more frequently in order to maintain the dependencies?
Consider the following three relations:
Car (model, year, serial, color)
Make (maker, model)
Own (owner, serial)
A tuple in Car represents a specific car of a given model, made in a given year, with a serial number and a color. A tuple in Make specifies that a maker company makes cars of a certain model. A tuple in Own specifies that an owner owns the car with a given serial number. Keys are underlined; (owner, serial) together form key for Own. (\(\bowtie\) denotes natural join)
\[ \pi_{owner} \left( Own \bowtie \sigma_{color="red"} \left( Car \bowtie \sigma_{maker="ABC"} Make \right) \right) \]
Which one of the following options describes what the above expression computes?
Consider a hash table of size 10 with indices \( \{0, 1, \dots, 9\} \), with the hash function \[ h(x) = 3x \, (mod \, 10), \]
where linear probing is used to handle collisions. The hash table is initially empty and then the following sequence of keys is inserted into the hash table: 1, 4, 5, 6, 14, 15. The indices where the keys 14 and 15 are stored are, respectively:
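The insertion sequence can be traced with a small simulation. This is a minimal sketch, assuming the usual form of linear probing (step forward by one slot, wrapping around); the function name `insert_with_linear_probing` is illustrative.

```python
def insert_with_linear_probing(keys, size=10):
    """Insert keys using h(x) = 3x mod size, probing forward on collision."""
    table = [None] * size
    slot_of = {}
    for key in keys:
        index = (3 * key) % size
        # Step to the next slot (wrapping around) until a free one is found.
        while table[index] is not None:
            index = (index + 1) % size
        table[index] = key
        slot_of[key] = index
    return slot_of

slots = insert_with_linear_probing([1, 4, 5, 6, 14, 15])
print(slots[14], slots[15])
```

Key 14 hashes to index 2, which is taken by 4, and index 3 is taken by 1, so it settles at index 4. Key 15 hashes to index 5, which is taken by 5, so it settles at index 6.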
Let \( X \) be a continuous random variable whose cumulative distribution function (CDF) \( F_X(x) \), for some \( t \), is given as follows:
\[ F_X(x) = \begin{cases} 0 & \text{if } x \leq t \\
\frac{x - t}{4 - t} & \text{if } t \leq x \leq 4 \\
1 & \text{if } x \geq 4 \end{cases} \]
If the median of \( X \) is 3, then what is the value of \( t \)?
Let \( X = aZ + b \), where \( Z \) is a standard normal random variable, and \( a, b \) are two unknown constants. It is given that \[ E[X] = 1, \quad E[(X - E[X]) | Z] = -2, \quad E[(X - E[X])^2] = 4, \]
where \( E[X] \) denotes the expectation of random variable \( X \). The values of \( a, b \) are:
It is given that \( P(X \geq 2) = 0.25 \) for an exponentially distributed random variable \( X \) with \( E[X] = \frac{1}{\lambda} \), where \( E[X] \) denotes the expectation of \( X \). What is the value of \( \lambda \)?
(\(\ln\) denotes natural logarithm)
Consider designing a linear classifier \[ y = sign(f(x; w, b)), \quad f(x; w, b) = w^T x + b \]
on a dataset \( D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}, x_i \in \mathbb{R}^d, y_i \in \{+1, -1\}, i = 1, 2, \dots, N \). Recall that the sign function outputs \( +1 \) if the argument is positive, and \( -1 \) if the argument is non-positive. The parameters \( w \) and \( b \) are updated as per the following training algorithm: \[ w_{new} = w_{old} + y_n x_n, \quad b_{new} = b_{old} + y_n \]
whenever sign\( (f(x_n; w_{old}, b_{old})) \neq y_n \). In other words, whenever the classifier wrongly predicts a sample \( (x_n, y_n) \) from the dataset, \( w_{old} \) gets updated to \( w_{new} \), and likewise \( b_{old} \) gets updated to \( b_{new} \). Consider the case \( (x_n, +1), f(x_n; w_{old}, b_{old}) < 0 \). Then:
Consider the following Python declarations of two lists.
\[ A = [1, 2, 3] \quad and \quad B = [4, 5, 6]. \]
Which one of the following statements results in \( A = [1, 2, 3, 4, 5, 6] \)?
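The answer options are not reproduced above, so the following is only a sketch of how some common list operations behave on these two lists. The variable names `extended`, `appended`, and `concatenated` are illustrative and work on copies so that each case starts from the original `A`.

```python
A = [1, 2, 3]
B = [4, 5, 6]

extended = list(A)
extended.extend(B)      # appends each element of B in place

appended = list(A)
appended.append(B)      # adds B itself as a single nested element

concatenated = A + B    # builds a new list; A changes only if it is reassigned

print(extended)
print(appended)
print(concatenated)
```

Of these, `extend` (or the equivalent `A += B`) modifies the list in place to give the six-element result, while `append` produces a nested list.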
Consider two functions \( f: \mathbb{R} \to \mathbb{R} \) and \( g: \mathbb{R} \to (1, \infty) \). Both functions are differentiable at a point \( c \). Which of the following functions is/are ALWAYS differentiable at \( c \)? The symbol \( \cdot \) denotes product and the symbol \( \circ \) denotes composition of functions.
Which of the following statements is/are correct?
Which of the following statements is/are correct in a Bayesian network?
For which of the following inputs does binary search take time \( O(\log n) \) in the worst case?
Let \( A = I_n + xx^T \), where \( I_n \) is the \( n \times n \) identity matrix and \( x \in \mathbb{R}^n, x^T x = 1 \). Which of the following options is/are correct?
Suppose that insertion sort is applied to the array \( [1, 3, 5, 7, 9, 11, x, 15, 13] \) and it takes exactly two swaps to sort the array. Select all possible values of \( x \).
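The number of adjacent swaps made by insertion sort equals the number of inversions in the array, and the pair (15, 13) already contributes one. A minimal sketch for testing values of \( x \) follows; the candidate list is an assumption, since the original options are not listed above.

```python
def insertion_sort_swaps(values):
    """Sort a copy with insertion sort and return the number of adjacent swaps."""
    arr = list(values)
    swaps = 0
    for i in range(1, len(arr)):
        j = i
        while j > 0 and arr[j - 1] > arr[j]:
            arr[j - 1], arr[j] = arr[j], arr[j - 1]
            swaps += 1
            j -= 1
    return swaps

# Illustrative candidates only; the exam's options are not shown in the text.
candidates = [8, 9, 10, 11, 12, 13, 14, 15, 16]
two_swap_values = [x for x in candidates
                   if insertion_sort_swaps([1, 3, 5, 7, 9, 11, x, 15, 13]) == 2]
print(two_swap_values)
```

Among these candidates, exactly two swaps occur when \( x \) forms one extra inversion, either with 11 (for example \( x = 10 \)) or with 13 (for example \( x = 14 \)).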
Let \( C_1 \) and \( C_2 \) be two sets of objects. Let \( D(x, y) \) be a measure of dissimilarity between two objects \( x \) and \( y \). Consider the following definitions of dissimilarity between \( C_1 \) and \( C_2 \):
\[ \text{DIS-1}(C_1, C_2) = \max_{x \in C_1, y \in C_2} D(x, y) \]
\[ \text{DIS-2}(C_1, C_2) = \min_{x \in C_1, y \in C_2} D(x, y) \]
Which of the following statements is/are correct?
There are three boxes containing white balls and black balls.
Box-1 contains 2 black and 1 white ball.
Box-2 contains 1 black and 2 white balls.
Box-3 contains 3 black and 3 white balls.
In a random experiment, one of these boxes is selected, where the probability of choosing Box-1 is \( \frac{1}{2} \), Box-2 is \( \frac{1}{3} \), and Box-3 is \( \frac{1}{6} \). A ball is drawn at random from the selected box. Given that the ball drawn is white, the probability that it is drawn from Box-2 is:
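This is a direct application of Bayes' theorem. The sketch below uses exact fractions; the dictionary `boxes` is an illustrative structure pairing each box's selection probability with its probability of yielding a white ball.

```python
from fractions import Fraction

# (probability of choosing the box, probability of drawing white from it)
boxes = {
    "Box-1": (Fraction(1, 2), Fraction(1, 3)),
    "Box-2": (Fraction(1, 3), Fraction(2, 3)),
    "Box-3": (Fraction(1, 6), Fraction(3, 6)),
}

# Total probability of drawing a white ball.
p_white = sum(prior * likelihood for prior, likelihood in boxes.values())

prior_2, likelihood_2 = boxes["Box-2"]
posterior_box2 = prior_2 * likelihood_2 / p_white
print(p_white, posterior_box2)
```

The total probability of white is 17/36, which gives a posterior of 8/17 for Box-2.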
Evaluate the following limit: \[ \lim_{t \to \infty} \left( \sqrt{t^2 + t} - t \right) \]
On a relation named Loan of a bank:
The following SQL query is executed:
SELECT L1.loan_number FROM Loan L1 WHERE L1.amount > (SELECT MAX(L2.amount) FROM Loan L2 WHERE L2.branch_name = 'SR Nagar');
Given data \( \{(-1, 1), (2, -5), (3, 5)\} \) of the form \( (x, y) \), we fit a model \( y = wx \) using linear least-squares regression. The optimal value of \( w \) is:
(Round off to three decimal places)
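For a model with no intercept, setting the derivative of \( \sum_i (y_i - w x_i)^2 \) to zero gives \( w = \sum_i x_i y_i / \sum_i x_i^2 \). A minimal Python check of that closed form on the given data:

```python
data = [(-1, 1), (2, -5), (3, 5)]

# Closed-form least-squares slope for y = w * x (no intercept term).
sum_xy = sum(x * y for x, y in data)
sum_xx = sum(x * x for x, _ in data)
w = sum_xy / sum_xx
print(round(w, 3))
```

Here \( \sum x_i y_i = 4 \) and \( \sum x_i^2 = 14 \), so \( w = 4/14 \approx 0.286 \).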
The naive Bayes classifier is used to solve a two-class classification problem with class-labels \( y_1, y_2 \). Suppose the prior probabilities are \( P(y_1) = \frac{1}{3} \) and \( P(y_2) = \frac{2}{3} \). Assuming a discrete feature space with \[ P(x | y_1) = \frac{3}{4} \quad and \quad P(x | y_2) = \frac{1}{4} \]
for a specific feature vector \( x \). The probability of misclassifying \( x \) is: (Round off to two decimal places)
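One way to read this question is that the classifier picks the class with the larger posterior, so the probability of misclassifying \( x \) is the posterior of the other class. A sketch under that assumption, with illustrative variable names:

```python
from fractions import Fraction

prior = {"y1": Fraction(1, 3), "y2": Fraction(2, 3)}
likelihood = {"y1": Fraction(3, 4), "y2": Fraction(1, 4)}

# Joint probabilities P(x, y) and the posteriors P(y | x).
joint = {label: prior[label] * likelihood[label] for label in prior}
evidence = sum(joint.values())
posterior = {label: joint[label] / evidence for label in joint}

predicted = max(posterior, key=posterior.get)
p_misclassify = 1 - posterior[predicted]
print(predicted, float(p_misclassify))
```

The posteriors are 3/5 for \( y_1 \) and 2/5 for \( y_2 \), so \( x \) is assigned to \( y_1 \) and the misclassification probability is 0.40.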
Let \( Y = Z^2 \), \( Z = \frac{X - \mu}{\sigma} \), where \( X \) is a normal random variable with mean \( \mu \) and variance \( \sigma^2 \). The variance of \( Y \) is
Let \( A \in \mathbb{R}^{n \times n} \) be such that \( A^3 = A \). Which one of the following statements is ALWAYS correct?
Let \( \{ x_1, x_2, \dots, x_n \} \) be a set of linearly independent vectors in \( \mathbb{R}^n \). Let the \( (i,j) \)-th element of matrix \( A \in \mathbb{R}^{n \times n} \) be given by \( A_{ij} = x_i^T x_j \), where \( 1 \leq i, j \leq n \). Which one of the following statements is correct?
Consider the cumulative distribution function (CDF) of a random variable \( X \):
\[ F_X(x) = \begin{cases} 0 & \text{if } x \leq -1 \\
\frac{1}{4}(x + 1)^2 & \text{if } -1 \leq x \leq 1 \\
1 & \text{if } x \geq 1 \end{cases} \]
The value of \( P(X^2 \leq 0.25) \) is:
A random variable \( X \) is said to be distributed as \( Bernoulli(\theta) \), denoted by \( X \sim Bernoulli(\theta) \), if
\[ P(X = 1) = \theta, \quad P(X = 0) = 1 - \theta \]
for \( 0 < \theta < 1 \). Let \( Y = \sum_{i=1}^{300} X_i \), where \( X_i \sim Bernoulli(\theta) \), \( i = 1, 2, \dots, 300 \) be independent and identically distributed random variables with \( \theta = 0.25 \). The value of \( P(60 \leq Y \leq 90) \), after approximation through the Central Limit Theorem, is given by
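Under the Central Limit Theorem, \( Y \) is approximated by a normal variable with mean \( n\theta \) and variance \( n\theta(1-\theta) \). The sketch below standardizes the two endpoints and evaluates the normal CDF with `math.erf`; it assumes no continuity correction is applied, and the helper name `standard_normal_cdf` is illustrative.

```python
import math

n, theta = 300, 0.25
mean = n * theta                            # 75
std = math.sqrt(n * theta * (1 - theta))    # sqrt(56.25) = 7.5

def standard_normal_cdf(z):
    """CDF of the standard normal distribution via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_low = (60 - mean) / std
z_high = (90 - mean) / std
probability = standard_normal_cdf(z_high) - standard_normal_cdf(z_low)
print(z_low, z_high, round(probability, 4))
```

The endpoints map to \( z = -2 \) and \( z = 2 \), so the approximation is \( \Phi(2) - \Phi(-2) \), roughly 0.9545.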
For \( x \in \mathbb{R} \), the floor function is denoted by \( f(x) = \lfloor x \rfloor \) and defined as follows
\[ \lfloor x \rfloor = k, \quad k \leq x < k + 1, \]
where \( k \) is an integer. Let \( Y = \lfloor X \rfloor \), where \( X \) is an exponentially distributed random variable with mean \( \frac{1}{\ln 10} \), where \( \ln \) denotes natural logarithm. For any positive integer \( \ell \), one can write the probability of the event \( Y = \ell \) as follows:
\[ P(Y = \ell) = q^\ell (1 - q) \]
The value of \( q \) is:
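Taking \( Y \) to be the floor of \( X \), as the floor-function definition suggests, the event \( Y = \ell \) is \( \ell \leq X < \ell + 1 \), whose probability is \( e^{-\lambda \ell} - e^{-\lambda(\ell+1)} = (e^{-\lambda})^{\ell}(1 - e^{-\lambda}) \) with \( \lambda = \ln 10 \). A short numerical check of this identity, using the illustrative helper `prob_floor_equals`:

```python
import math

rate = math.log(10)   # exponential rate; the mean is 1 / ln 10

def prob_floor_equals(ell):
    """P(ell <= X < ell + 1) for X ~ Exponential(rate), from the CDF 1 - exp(-rate * x)."""
    return math.exp(-rate * ell) - math.exp(-rate * (ell + 1))

q = math.exp(-rate)
matches = all(
    math.isclose(prob_floor_equals(ell), q**ell * (1 - q))
    for ell in range(1, 6)
)
print(round(q, 6), matches)
```

The comparison gives \( q = e^{-\ln 10} = 0.1 \).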
Consider the neural network shown in the figure with
\[ inputs: u = 2, \, v = 3 \] \[ weights: a = 1, b = 1, c = 1, d = -1, e = 4, f = -1 \] \[ output: y \]
R denotes the ReLU function, \( R(x) = \max(0, x) \).
Given \( u = 2, v = 3, a = 1, b = 1, c = 1, d = -1, e = 4, f = -1 \), which one of the following is correct?
Consider game trees Tree-1 and Tree-2 as shown. The first level is a MAX agent and the second level is a MIN agent. The value in the square node is the output of the utility function.
For what ranges of \( x \) and \( y \), the right child of node B and the right child of node E will be pruned by the alpha-beta pruning algorithm?
The state graph shows the action cost along the edges and the heuristic function \( h \) associated with each state. Suppose the A* algorithm is applied on this state graph using a priority queue to store the frontier. In what sequence are the nodes expanded?
A random experiment consists of throwing 100 fair dice, each die having six faces numbered 1 to 6. An event \( A \) represents the set of all outcomes where at least one of the dice shows a 1. Then, \( P(A) = \)
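The complement of \( A \) is the event that none of the 100 dice shows a 1, which has probability \( (5/6)^{100} \) because the dice are independent. A minimal exact computation:

```python
from fractions import Fraction

# Probability that a single fair die does not show a 1, raised to 100 independent dice.
p_no_one = Fraction(5, 6) ** 100
p_at_least_one = 1 - p_no_one
print(float(p_no_one), float(p_at_least_one))
```

So \( P(A) = 1 - (5/6)^{100} \), which is extremely close to 1.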
Consider a fact table in an OLAP application: Facts(D1, D2, val), where D1 and D2 are its dimension attributes and val is a dependent attribute. Suppose attribute D1 takes 3 values and D2 takes 2 values, and all combinations of these values are present in the table Facts. How many tuples are there in the result of the following query?
SELECT D1, D2, sum(val) FROM Facts GROUP BY CUBE (D1, D2);
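GROUP BY CUBE (D1, D2) aggregates over every subset of the listed attributes: (D1, D2), (D1), (D2), and the empty grouping. The sketch below counts the resulting groups in Python rather than running SQL; the value labels are placeholders, and `None` stands in for a rolled-up column.

```python
from itertools import product

d1_values = ["a1", "a2", "a3"]   # placeholder labels: D1 takes 3 values
d2_values = ["b1", "b2"]         # placeholder labels: D2 takes 2 values
facts = list(product(d1_values, d2_values))  # all combinations are present

groups = set()
for keep_d1, keep_d2 in product([True, False], repeat=2):
    for d1, d2 in facts:
        # A column that is not kept is rolled up, shown here as None.
        groups.add((d1 if keep_d1 else None, d2 if keep_d2 else None))

print(len(groups))
```

The count is 6 + 3 + 2 + 1 = 12, which matches (3 + 1) × (2 + 1).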
Consider the following Python code snippet.
A = {"this", "that"}
B = {"that", "other"}
C = {"other", "this"}
while "other" in C:
    if "this" in A:
        A, B, C = A - B, B - C, C - A
    if "that" in B:
        A, B, C = C | A, A | B, B | C
When the above program is executed, at the end, which of the following sets contains "this"?
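The program can simply be run. The version below restates the snippet with each set on its own line and adds a final check; the list name `holders` is illustrative. Note that each tuple assignment evaluates its whole right-hand side with the old values before rebinding.

```python
A = {"this", "that"}
B = {"that", "other"}
C = {"other", "this"}

while "other" in C:
    if "this" in A:
        A, B, C = A - B, B - C, C - A
    if "that" in B:
        A, B, C = C | A, A | B, B | C

# Names of the sets that contain "this" once the loop ends.
holders = [name for name, s in (("A", A), ("B", B), ("C", C)) if "this" in s]
print(A, B, C, holders)
```

The loop runs twice and ends with A = {"other"}, B = {"this"}, C = {"that"}, so only B contains "this".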
Which of the following statements is/are correct about the rectified linear unit (ReLU) activation function defined as ReLU(x) = max(x, 0), where \( x \in \mathbb{R} \)?
Consider the function \( f(x) = \frac{x^3}{3} + \frac{7}{2}x^2 + 10x + \frac{133}{2} \), \( x \in [-8, 0] \). Which of the following statements is/are correct?
Let \( x_1, x_2, x_3, x_4, x_5 \) be a system of orthonormal vectors in \( \mathbb{R}^{10} \). Consider the matrix \[ A = x_1 x_1^T + x_2 x_2^T + x_3 x_3^T + x_4 x_4^T + x_5 x_5^T. \]
Which of the following statements is/are correct?
Let \( f : \mathbb{R} \to \mathbb{R} \) be a twice-differentiable function, and suppose its second derivative satisfies \( f''(x) > 0 \) for all \( x \in \mathbb{R} \). Which of the following statements is/are ALWAYS correct?
An \( n \times n \) matrix \( A \) with real entries satisfies the property: \[ \|Ax\|^2 = \|x\|^2, \quad for all x \in \mathbb{R}^n, \]
where \( \| \cdot \| \) denotes the Euclidean norm. Which of the following statements is/are ALWAYS correct?
Consider designing a linear binary classifier \( f(x) = sign(w^T x + b), x \in \mathbb{R}^2 \) on the following training data:
Class-1: \( \left\{ \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 2 \end{pmatrix} \right\} \), Class-2: \( \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\} \)
Hard-margin support vector machine (SVM) formulation is solved to obtain \( w \) and \( b \). Which of the following options is/are correct?
Consider a coin-toss experiment where the probability of head showing up is \( p \). In the \( i \)-th coin toss, let \( X_i = 1 \) if head appears, and \( X_i = 0 \) if tail appears. Consider \[ \hat{p} = \frac{1}{n} \sum_{i=1}^n X_i, \]
where \( n \) is the total number of independent coin tosses. Which of the following statements is/are correct?
Consider a two-class problem in \( \mathbb{R}^d \) with class labels red and green. Let \( \mu_{red} \) and \( \mu_{green} \) be the means of the two classes. Given test sample \( x \in \mathbb{R}^d \), a classifier calculates the squared Euclidean distance (denoted by \( \| \cdot \|^2 \)) between \( x \) and the means of the two classes and assigns the class label that the sample \( x \) is closest to. That is, the classifier computes \[ f(x) = \| \mu_{red} - x \|^2 - \| \mu_{green} - x \|^2 \]
and assigns the label red to \( x \) if \( f(x) < 0 \), and green otherwise. Which of the following statements is/are correct?
Consider the following two relations, named Customer and Person, in a database:
Which of the following statements is/are correct?
Consider a database relation \( R \) with attributes \( A, B, C, D, E, F, G \), and having the following functional dependencies:
\[ A \rightarrow BCEF \quad E \rightarrow DG \quad BC \rightarrow A \]
Which of the following statements is/are correct?
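Questions of this kind usually reduce to attribute closures. The following sketch implements the standard closure algorithm; the function name `closure` and the list of attribute sets tested are illustrative choices, since the answer options are not shown above.

```python
def closure(attributes, dependencies):
    """Return the closure of a set of attributes under functional dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # Apply lhs -> rhs whenever lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "BCEF"), ("E", "DG"), ("BC", "A")]
all_attributes = set("ABCDEFG")

is_superkey = {
    candidate: closure(candidate, fds) == all_attributes
    for candidate in ["A", "BC", "B", "C", "E"]
}
print(is_superkey)
```

Both A and BC determine every attribute, while B, C, and E alone do not, so A and BC are candidate keys of \( R \).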
Let \( G \) be a simple, unweighted, and undirected graph. A subset of the vertices and edges of \( G \) are shown below.
It is given that \( a - b - c - d \) is a shortest path between \( a \) and \( d \); \( e - f - g - h \) is a shortest path between \( e \) and \( h \); \( a - f - c - h \) is a shortest path between \( a \) and \( h \). Which of the following is/are NOT the edges of \( G \)?
Let \( f : \mathbb{R} \to \mathbb{R} \) be such that \( |f(x) - f(y)| \leq (x - y)^2 \) for all \( x, y \in \mathbb{R} \). Then \[ f(1) - f(0) = \underline{\hspace{2cm}} \quad \text{(Answer in integer)} \]
Let \( D = \{ x^{(1)}, x^{(2)}, \dots, x^{(n)} \} \) be a dataset of \( n \) observations where each \( x^{(i)} \in \mathbb{R}^{100} \). It is given that \[ \sum_{i=1}^{n} x^{(i)} = 0. \]
The covariance matrix computed from \( D \) has eigenvalues \( \lambda_i = 100^2 - i \), for \( 1 \leq i \leq 100 \). Let \( u \in \mathbb{R}^{100} \) be the direction of maximum variance with \( u^T u = 1 \). The value of \[ \frac{1}{n} \sum_{i=1}^{n} \left( u^T x^{(i)} \right)^2 = \underline{\hspace{2cm}} \quad \text{(Answer in integer)} \]
A bag contains 5 white balls and 10 black balls. In a random experiment, \( n \) balls are drawn from the bag one at a time with replacement. Let \( S_n \) denote the total number of black balls drawn in the experiment. The expectation of \( S_{100} \), denoted by \( E[S_{100}] \), is ________ (Round off to one decimal place).
Consider the following tables, Loan and Borrower, of a bank.
Query: \[ \pi_{branch\_name, customer\_name} (Loan \bowtie Borrower) \div \pi_{branch\_name}(Loan) \]
where \( \bowtie \) denotes natural join.
The number of tuples returned by the above relational algebra query is ________ (Answer in integer).
Consider the following Python code snippet.
The value printed by the code snippet is ________ (Answer in integer).
Consider the following pseudocode.
The value of \( sum \) output by a program executing the above pseudocode is:
Consider a directed graph \( G = (V,E) \), where \( V = \{0,1,2,\dots,100\} \) and \[ E = \{(i,j) : 0 < j - i \leq 2, for all i,j \in V \}. \]
Suppose the adjacency list of each vertex is in decreasing order of vertex number, and depth-first search (DFS) is performed at vertex 0. The number of vertices that will be discovered after vertex 50 is:
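The traversal can be simulated directly. In this sketch each vertex \( i \) has out-neighbors \( i+2 \) and \( i+1 \) (when they exist), visited in that order because the adjacency list is in decreasing order; the function name `dfs_discovery_order` is illustrative.

```python
def dfs_discovery_order(last_vertex=100):
    """Return the order in which DFS from vertex 0 discovers the vertices."""
    order = []
    visited = set()

    def visit(u):
        visited.add(u)
        order.append(u)
        # Adjacency list in decreasing order: u + 2 first, then u + 1.
        for v in (u + 2, u + 1):
            if v <= last_vertex and v not in visited:
                visit(v)

    visit(0)
    return order

order = dfs_discovery_order()
discovered_after_50 = len(order) - order.index(50) - 1
print(discovered_after_50)
```

DFS first walks the even vertices 0, 2, ..., 100, then discovers the odd vertices 99, 97, ..., 1 while backtracking. After vertex 50 it therefore discovers 25 even and 50 odd vertices, 75 in total.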