# Data Processing and Data Analyzing Techniques Assignment

9 Pages2047 Words399 Views
Part 1 Obtaining Compensation DatasetsLogin in WRDSDownload “Compustat/ExecuComp/ Annual Compensation data. oDownload the entire database by CUSIP for the (fiscal) years 2006-2015. You need the variables CUSIP and Issue Number; Executive IDnumber; CEOANN -- Annual CEO Flag; SALARY -- Salary (\$); TDC1 --Total Compensation (Salary + Bonus + Other An; OPTION_AWARDS –Value of Option Awards – FAS 123R (\$); YEAR -- Fiscal Year; CEOgender—Gender; Date became CEOoUse None Compression Type and the Output and Date Format moreappropriate to you. You may save this query if you prefer.o[NOTE: Make sure that you handle cusip identifiers as a stringvariable. Excel (and other non-advanced software) will have anobsession in converting the all number identifiers to numbers insteadof text. This would imply that zeros on the left would be deleting,creating possible problems in your matching.i.e. make sure that cusip “0000360” is not converted to cusip “360”)Keep only CEOs data: variable ceoann has to be equal to “CEO”.keep if ceoann=="CEO" First drop all observations generated by firms with more than one CEO in agiven year, and by CEOs work in more than one firm in a given year.sort cusip year execidby cusip year: gen count_obs = [_N]drop if count_obs > 1 // Firm has more than 1 CEO in a year.drop count_obssort execid year cusipby execid year: gen count_obs = [_N]drop if count_obs > 1 // CEO works in more than 1 firm in ayear.drop count_obsCalculate the variables ln_salary, ln_totalcomp, ln_option. ln_salary is thelogarithm of salary, ln_totalcomp is the logarithm of tdc1, ln_option Is thelogarithm of option_awards.
gen ln_salary = ln(salary) gen ln_totalcomp = ln(tdc1)gen ln_option = ln(option_awards) Generate 8-digit CUSIP identifier from the first 8 digits in the CUSIPgen cusip_8d = cusipWinsorize the three variables at the 1 %, 99% to prevent extreme values.That means that you need to change the values above 99% into 99% andthe value below 1% to 1%. Importantly this the winsorization processshould take once every year in case there is any time trend. There is anavailable package for this process which is winsor2. You could install itaccording to the following code.ssc install winsor2 //not every package is directlyavailable, you might want to search it first, just type findit***(so easy!) and then install it. Now you have the package, let’s kick out the outliers.winsor2 ln_salary, suffix(_w) cuts(1 99) by(year)//here yougenerate a new variable, ln_salary_wFor such kind of repeating work, we could use some loop. Try to google“foreach stata” and see whether you could make it more efficient.Let’s call this database, “CEO”.Hint for Q1 &2 (google “summary statistics Stata”)Q1: [0.5 point] For “CEO”, report the following and [0.5 point] explain howwinsorize process changes your data.VariablesObs.MeanStandardDeviationMinimumMaximumSkewnessKurtosisln_salary???????ln_salary_w???????ln_totalcomp???????ln_totalcomp_w???????ln_option???????

## End of preview

Want to access all the pages? Upload your documents or become a member.

### Support

#### +1 306 205-2269

Chat with our experts. we are online and ready to help.