On algorithms: BUCI057H7 questions

BUCI057H7 page 2 of 9

Question 1 (10 marks)
Provost & Fawcett have defined Data Science in terms of 9 computational problems.
Define the Similarity problem in general and propose examples on multi-dimensional data.
Your answer:

Question 2 (10 marks)
Spectral analysis can be used to reduce data dimensionality. Explain why dimensionality reduction is
desirable and how Spectral analysis can achieve it.
Your answer:

Question 3 (10 marks)
Over D = {a, b, c, d, e}, frequency of observations gives us the following distribution:
P = Pr[X=xi] = [3/8, 3/16, 1/8, 1/8, 3/16].
To simplify calculations, however, we decide to adopt the "simpler" distribution
Q = Pr[X=xi] = [1/2, 1/8, 1/8, 1/8, 1/8].
Compute the Kullback-Leibler divergence between P and Q, defined as

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(x_i) \log_2 \frac{P(x_i)}{Q(x_i)}$$

To simplify calculations, assume that log_2 3 (the base-2 logarithm of 3) equals 1.585, and show the
process by which you calculated the divergence.
Your answer:
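
A minimal sketch for checking the arithmetic numerically, assuming the standard definition D_KL(P || Q) = sum_i P(x_i) * log2(P(x_i) / Q(x_i)) with base-2 logarithms, so that the result is in bits. The function name `kl_divergence` is chosen for illustration only.

```python
from fractions import Fraction
from math import log2

# Distributions over D = {a, b, c, d, e}, as given in the question.
P = [Fraction(3, 8), Fraction(3, 16), Fraction(1, 8), Fraction(1, 8), Fraction(3, 16)]
Q = [Fraction(1, 2), Fraction(1, 8), Fraction(1, 8), Fraction(1, 8), Fraction(1, 8)]


def kl_divergence(p, q):
    """Return D_KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(float(pi) * log2(float(pi / qi)) for pi, qi in zip(p, q) if pi > 0)


divergence = kl_divergence(P, Q)
print(f"D_KL(P || Q) = {divergence:.5f} bits")
```

By hand, the terms for c and d vanish because P and Q agree there, and the remaining terms collapse to (3/4) * log_2 3 - 9/8, which gives 0.06375 bits when log_2 3 is taken as 1.585.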

Question 4 (10 marks)
Define the decision trees employed in the Supervised Segmentation task and describe in words how
the CART algorithm can recursively build a decision tree for a given dataset of labeled Yes/No
examples.
Your answer:

Question 5 (10 marks)
Sports Rating & Ranking: if a function S(i) measures the strength of a team/player i attending a
tournament, how could we predict the outcome of a match between, say, team i and team j?
What method would you use, among those seen in class, to extract function S(i) from a dataset of past
results?
Your answer:

Question 6 (10 marks)
Define the Kernel method for creating a feature space and discuss why it is used in combination with
Support Vector Machines to classify data.
Your answer:

Question 7 (10 marks)
Define the Degree sequence of networks, explain why the sum of degrees is always even and discuss
its usage in network analysis.
Your answer:
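
A minimal sketch of the idea, assuming an undirected graph given as an edge list. The example graph and the function name `degree_sequence` are hypothetical. Each edge adds one to the degree of each of its two endpoints, so the degrees sum to twice the number of edges, which is why the sum is always even.

```python
from collections import Counter

# Hypothetical undirected graph, one tuple per edge.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def degree_sequence(edge_list):
    """Return the node degrees sorted in non-increasing order.

    Only nodes that appear in at least one edge are counted; isolated
    nodes (degree 0) would have to be added separately.
    """
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1  # the edge contributes one to u ...
        degree[v] += 1  # ... and one to v, hence two in total
    return sorted(degree.values(), reverse=True)


sequence = degree_sequence(edges)
print(sequence, sum(sequence), 2 * len(edges))
```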

Question 8 (10 marks)
Ranking in Networks: what is the model of i) importance and ii) human navigation of Web pages that
underpins PageRank?
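
A minimal sketch of the random-surfer view of PageRank, assuming the commonly used damping factor 0.85 and a small hypothetical link graph. With probability `damping` the surfer follows a uniformly chosen outgoing link; otherwise the surfer jumps to a page chosen uniformly at random. A page's score is the long-run probability of finding the surfer there, so links from important pages confer importance. The function name `pagerank` and the graph are illustrative, not taken from the paper.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on a dict mapping each page to the pages it links to.

    Every link target must also appear as a key of `links`.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        # Random jump: every page receives the same baseline share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A page with no outgoing links spreads its rank over all pages.
            targets = links[v] if links[v] else nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```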
