The philanthropic foundation of a large pharmaceutical company wishes to fund a study to examine whether diet and weight loss can prevent heart attacks and strokes in overweight and obese people with Type 2 diabetes. The general expectation is that these strategies would lower blood sugar, blood pressure and cholesterol levels.

Nearly 25 million Americans have Type 2 diabetes. Many are overweight or obese. On average, the disease increases heart disease risk by 2 to 2½ times. It seems logical that diet and exercise would help reduce that risk.

A previous study found that an intense diet and exercise program helped prevent overweight or obese people with elevated sugar levels from crossing the line into diabetes. The hope is that a similar protocol could also protect diabetics from heart attacks, strokes and cardiovascular disease.

The researchers were able to identify 5,145 overweight or obese people with Type 2 diabetes and randomly assign each to one of two treatment groups:

(1) Rigorous diet and exercise regimen, or

(2) Sessions providing general health information (control group).
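As a rough illustration of the random assignment described above, the following Python sketch shuffles participant IDs and splits them into the two groups. The group labels, the fixed seed, and the near-equal split are assumptions made for illustration only; the case does not state how the randomization was carried out or how large each group was.

```python
import random

# Hypothetical labels for the two treatment groups described above.
ARMS = ("diet_exercise", "control")


def assign_groups(participant_ids, seed=42):
    """Shuffle the participants and split them into two groups.

    Assumption: near-equal allocation. The case only says that each
    participant was randomly assigned to one of the two groups.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    # First half of the shuffled list goes to the first group, the rest to control.
    return {pid: (ARMS[0] if i < half else ARMS[1]) for i, pid in enumerate(ids)}


# 5,145 participants, numbered 1..5145 purely for illustration.
assignment = assign_groups(range(1, 5146))
counts = {arm: sum(1 for a in assignment.values() if a == arm) for arm in ARMS}
print(counts)  # {'diet_exercise': 2572, 'control': 2573}
```

Because assignment depends only on the shuffle, known and unknown risk factors should be balanced across the two groups on average, which is what allows differences in outcomes to be attributed to the regimen.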

For comparability with the prior study, the same diet and exercise regimen will be used. The diet involves 1,200 to 1,500 calories a day for those weighing less than 250 pounds and 1,500 to 1,800 calories a day for those weighing more. The exercise program is at least 175 minutes of moderate exercise per week. Confounding the study is the possibility that existing treatments (e.g., smoking cessation, statins to reduce cholesterol, blood pressure medications) are so powerful that they could overwhelm the modest effects of weight loss or exercise on cardiovascular risk.

You’re the data mining expert on the project team. Before proceeding to accept the project, the team leader wants answers to the following questions 1 through 5:

1. List the possible independent variables (predictors) and dependent variables (predicted).

2. Formulate key hypotheses to examine the data. List each one with a rationale for why it is relevant.

3. Recommend data mining technique(s) to examine the hypotheses from question 2. Why these techniques?

4. What specific assumptions would you make to scope and limit your study?

5. Briefly discuss the managerial and policy implications of the findings from your study.