Analysis of Computer Science Career Salaries

The data comes from a user-submitted survey on the subreddit /r/CSCareerQuestions, a community about careers in Computer Science, Computer Engineering, Software Engineering, and related fields. Specifically, it is the 2016 Salary Sharing Survey, which can be found below.

Survey Responses

Some of the questions asked in the survey include:

  • What is your job title?
  • Company name?
  • What was the major or concentration of your highest level of completed education?
  • What is your gross annualized base pay?
  • Please list any benefits you receive that you feel make an important part of your compensation. (Include monetary benefits such as 401k matching.)
  • How old are you?
  • Gender
For a full list of the questions, you can see the Documentation section at the bottom.

My goal with this project was to:

  1. Gather and collect data
  2. Clean data for inconsistencies and inaccuracies
  3. Analyze and visualize statistics of data
  4. Discover patterns between inputs in data
  5. Predict trends using correlations and regression
In [1]:
from IPython.display import HTML
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')
Out[1]:
The raw code for this IPython notebook is by default hidden for easier reading. To toggle on/off the raw code, click here.

I used Pandas to analyze this dataset.

In [2]:
from pandas import Series, DataFrame
import pandas as pd
%pylab inline
Populating the interactive namespace from numpy and matplotlib

1) Gathering and Collecting the Data

I downloaded the dataset from Google Spreadsheet here as a CSV (Comma Separated Values) and opened it using pandas.

In [3]:
data = pd.read_csv('CSCQ.csv')

Let's take a look at the survey data and the responses.

In [4]:
data.head()
Out[4]:
Timestamp Choose the option that best fits your job type. What is your job title? Choose the option that bests fits your job title. Is this a remote position? Company name? Company purpose/Industry? How much cumulative work experience do you have across all jobs held (even those not relevant to Computer Science)? How much cumulative work experience do you have that is relevant to Computer Science? (use your best judgment for whether certain positions were relevant) How long have you worked for your current employer? ... How old are you? Gender Ethnicity (check all that apply) Current primary country of citizenship? Current primary country of residence? Current primary country of employment? Cost of living plus rent index number (For example, San Francisco is 110.58) In what city do you live and work? (Only provide this if you are comfortable doing so!) Username: Please use commas to separate the tokens.
0 12/15/2016 11:26:09 Intern Software Development Engineering Intern Developer/Engineering Generalist Fully on-site position (never or almost-never ... Zillow NaN 0yr 6mo 0yr 6mo 0yr 0mo ... 19.0 Male South Asian (India, Pakistan, Nepal, etc) NaN NaN NaN 77.95 NaN NaN NaN
1 12/15/2016 11:28:21 Full-time Software Engineer 3 Developer/Engineering Generalist Mostly in office (occasional remote allowed) NaN ecommerce 6yr 0mo 5yr 0mo 0yr 6mo ... 29.0 Female South Asian (India, Pakistan, Nepal, etc) India United States United States NaN NaN NaN Machine learning, Python, ecommerce,
2 12/15/2016 11:36:51 Full-time Software Development Engineer II Developer/Engineering Generalist Fully on-site position (never or almost-never ... Amazon Everything 7yr 0mo 2yr 6mo 2yr 6mo ... 35.0 Male East Asian (China, Japan, Korea, etc) United States United States United States 77.95 Seattle /u/termd NaN
3 12/15/2016 11:45:15 Contractor (primarily one employer) Win7 Break/fix technician Systems Administrator (SysAdmin) Fully on-site position (never or almost-never ... Department of Defense Defense 2yr 0mo 2yr 0mo 1yr 0mo ... 19.0 Male White (Europe, European descent) United States United States United States 87.06 Ft. Meade, MD /u/__rocks NaN
4 12/15/2016 11:46:54 Full-time Mobile Application Developer Mobile app developer/Mobile app engineer Mostly in office (occasional remote allowed) NaN Retail inventory management 4yr 6mo 3yr 8mo 0yr 10mo ... 22.0 Male White (Europe, European descent) United States United States United States 56.62 NaN NaN NaN

5 rows × 65 columns

In [5]:
print 'Rows (Responses): ' + str(len(data))
print 'Columns (Index, Questions): ' + str(len(data.iloc[0]))
Rows (Responses): 1224
Columns (Index, Questions): 65

There were 64 total questions asked, and 1,224 people responded.

2) Cleaning Data

You can read a step-by-step process of how I chose to clean and restructure the data here.

Goals:

  • Find and delete empty entries
  • Look for any unreasonable user submitted responses

Results:

  • Deleted 3 empty rows
  • Dropped 2 responses that seemed highly unlikely
    • A 10-year-old who makes $600 million a year and identifies as an Apache helicopter
    • A person who makes $555 million a year at the company "Sufferer" and has a total of 1 year of working experience

You can download the cleaned csv here.
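
The linked write-up has the actual steps. As a rough sketch of the two kinds of cleanup listed above, assuming a toy frame with made-up values, a hypothetical `age` column, and an arbitrary plausibility cap that is not taken from the original cleaning:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey; the values and the `age` column name are made up.
raw = pd.DataFrame({
    'age': [25.0, np.nan, 10.0, 31.0],
    'total_compensation_in_usd': [90000.0, np.nan, 600000000.0, 120000.0],
})

# 1) Drop rows in which every field is empty.
cleaned = raw.dropna(how='all')

# 2) Drop responses above a plausibility cap (the cap is an assumption).
#    Written as "not greater than" so rows with a missing value are kept.
MAX_PLAUSIBLE_COMP = 5000000
cleaned = cleaned[~(cleaned['total_compensation_in_usd'] > MAX_PLAUSIBLE_COMP)]

print('kept %d of %d rows' % (len(cleaned), len(raw)))
```

A fixed cap is only one way to flag implausible answers; reading the flagged rows by hand, as was done here, is safer for a survey of this size.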

3) Analyzing and Visualizing Data

What does our data look like? Who are the respondents, and how are they distributed?

In [6]:
data = pd.read_csv('cleaned_data.csv')

What's the age and gender distribution?

In [7]:
fig, (ax1, ax2) = subplots(nrows=1,
                           ncols=2,
                           figsize=(16, 8)
                          )

# Age
data['How old are you?'].hist(ax=ax1, bins=20)
ax1.set_xlabel('Age', size=15)
ax1.set_ylabel('Frequency', size=15)
ax1.set_title('Histogram of Age', size=20)

# Gender
data['Gender'].value_counts().plot(ax=ax2, kind='pie', fontsize=15)
ax2.set_title('Gender Distribution', size=20)

legend(loc='best')
Out[7]:
<matplotlib.legend.Legend at 0x10a187f50>
In [8]:
city = data['In what city do you live and work? (Only provide this if you are comfortable doing so!)']
city.value_counts()[:20].sort_values(ascending=True).plot(kind='barh', figsize=(12, 8), fontsize=15)
xlabel('Number of Users', size=15)
ylabel('City', size=15)
title('Most Popular Cities to Live In', size=20)
Out[8]:
Text(0.5,1,u'Most Popular Cities to Live In')

What kind of jobs do people have?

In [9]:
job_types = data['Choose the option that best fits your job type.'].value_counts()[:4].sort_values(ascending=True)
job_types.plot(kind='barh', figsize=(10, 6), fontsize=15)
xlabel('Number of People', size=15)
ylabel('Job Type', size=15)
title('Job Types of Users', size=20)
Out[9]:
Text(0.5,1,u'Job Types of Users')
In [10]:
company = data['Company name?']
company.value_counts()[:20].sort_values(ascending=True).plot(kind="barh", figsize=(14, 8), fontsize=15)
xlabel('Number of Users Working at Company', size=15)
ylabel('Company', size=15)
title('Most Popular Companies to Work For', size=20)
Out[10]:
Text(0.5,1,u'Most Popular Companies to Work For')

What percentage of computer scientists make over 100k?

In [11]:
over_100k = data[data['total_compensation_in_usd'] >= 100000]
under_100k = data[data['total_compensation_in_usd'] < 100000]
s = pd.Series([len(over_100k), len(under_100k)], index=['Total compensation over 100k', 'Total compensation under 100k'], name='total compensation')
s.plot.pie(autopct= '%2.2f' + '%%', fontsize=15, figsize=(8, 8))
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x112f8e510>
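
Both filters in the cell above silently skip rows whose compensation is missing, because a comparison against NaN is False either way. A minimal sketch of getting the same share as a single number, using made-up values in place of the survey column:

```python
import numpy as np
import pandas as pd

# Made-up compensation values, including one missing answer.
comp = pd.Series([120000.0, 80000.0, np.nan, 100000.0, 95000.0])

# Work only with respondents who reported a value.
reported = comp.dropna()

# The mean of a boolean mask is the fraction of True entries.
pct_over = 100.0 * (reported >= 100000).mean()
print('%.2f%% of reported values are at or above 100k' % pct_over)
```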

What are the most common salaries?

In [12]:
annual_salary = data['annual_salary_in_usd']
total_comp = data['total_compensation_in_usd']
In [13]:
fig, (ax1, ax2) = subplots(nrows=1,
                           ncols=2,
                           figsize=(20, 8),
                          )

annual_salary.hist(ax=ax1, bins=np.linspace(0, 250000, 25))
ax1.set_xlabel('Annual Salary (USD)', size=15)
ax1.set_ylabel('Number of People', size=15)
ax1.set_title('Histogram of Annual Salary ($)', size=20)

total_comp.hist(ax=ax2, bins=np.linspace(0, 250000, 25))
ax2.set_xlabel('Total Compensation (USD)',size=15)
ax2.set_ylabel('Number of People',size=15)
ax2.set_title('Histogram of Total Compensation ($)', size=20)
Out[13]:
Text(0.5,1,u'Histogram of Total Compensation ($)')

How much more do people make on top of their annual salary?

In [14]:
len(data[data['annual_salary_in_usd'] > 0]), len(data[data['total_compensation_in_usd'] > 0])

print 'Median amount earned on top of annual salary: $' + str((data['total_compensation_in_usd'] - data['annual_salary_in_usd']).median())
Median amount earned on top of annual salary: $7500.0

How do intern salaries compare?

In [15]:
fig, (ax1) = subplots(nrows=1,
                           ncols=1,
                           figsize=(10, 10)
                          )

interns = data[data['Choose the option that best fits your job type.'] == 'Intern']
interns['annual_salary_in_usd'].hist(ax=ax1, bins=10)
ax1.set_xlabel('Salary', size =15)
ax1.set_ylabel('Frequency', size=15)
ax1.set_title('Interns - Annual Salary ($)', size=20)
Out[15]:
Text(0.5,1,u'Interns - Annual Salary ($)')

How much do interns make hourly?

In [16]:
data['hourly'] = data['What is your base hourly rate? (If the number changes regularly, pick a value that best represents the average rate.)']

interns['What is your base hourly rate? (If the number changes regularly, pick a value that best represents the average rate.)'].hist(bins=15, figsize=(10,10))
xlabel('Hourly Rate', size=15)
ylabel('Frequency', size=15)
title('Interns - Base Hourly Rate',size=20)
Out[16]:
Text(0.5,1,u'Interns - Base Hourly Rate')

Is there a pay difference between males and females?

In [17]:
female_data = data[data['Gender'] == 'Female']
male_data = data[data['Gender'] == 'Male']
In [18]:
f_pay = female_data['annual_salary_in_usd']
print 'Average female salary:', f_pay.mean()
print 'Median female salary:', f_pay.median()
Average female salary: 82426.6710847
Median female salary: 77500.0
In [19]:
m_pay = male_data['annual_salary_in_usd']
print 'Average male salary:', m_pay.mean()
print 'Median male salary:', m_pay.median()
Average male salary: 88545.9885882
Median male salary: 80000.0
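
The same comparison can be written as one `groupby`, which also reports how many responses each figure rests on. This is a sketch on made-up rows, not the survey data:

```python
import pandas as pd

# Made-up rows using the same column names as the cleaned survey.
toy = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'annual_salary_in_usd': [80000.0, 90000.0, 70000.0, 100000.0, 110000.0],
})

# One row per gender with the group size, mean, and median salary.
summary = toy.groupby('Gender')['annual_salary_in_usd'].agg(['count', 'mean', 'median'])
print(summary)
```

Seeing the count next to the mean and median helps judge how much weight a gap between groups deserves.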

Which majors in college end up making the most?

In [20]:
print "Some of the most common majors:"
data['What was the major or concentration of your highest level of completed education?'].value_counts()[:15]
Some of the most common majors:
Out[20]:
Computer Science                       300
Computer Engineering                    25
Computer science                        21
Software Engineering                    17
Mathematics                             11
Information Technology                  10
Computer Information Systems            10
CS                                      10
Electrical Engineering                   9
Computer Science                         8
Physics                                  7
Information Systems                      6
Electrical and Computer Engineering      5
Computer Science and Engineering         5
computer science                         5
Name: What was the major or concentration of your highest level of completed education?, dtype: int64
In [21]:
# Shorten labels of questions
data['major'] = data['What was the major or concentration of your highest level of completed education?']
major = data['What was the major or concentration of your highest level of completed education?']
In [22]:
# Get CS majors
def f(s):
    if type(s) == float:
        return False
    return 'computer science' in s.lower() or 'cs' in s.lower()

cs_mask = major.map(f)
cs_majors = data[cs_mask]
In [23]:
# Get information majors (MIS, information security)
def f(s):
    if type(s) == float:
        return False
    return 'information' in s.lower()

i_mask = major.map(f)
i_majors = data[i_mask]
In [24]:
# Get engineering majors
def f(s):
    if type(s) == float:
        return False
    return 'engineer' in s.lower()

e_mask = major.map(f)
e_majors = data[e_mask]
In [25]:
# Get other majors (without engineer, information, computer, or cs in major)
def f(s):
    if type(s) == float:
        return False
    
    # Both checks must hold; with `or`, 'Computer Science' would count as "other"
    cs = 'computer' not in s.lower() and 'cs' not in s.lower()
    information = 'information' not in s.lower()
    engineer = 'engineer' not in s.lower()
    
    # computer engineer -> False and True and False
    return cs and information and engineer

other_mask = major.map(f)
other_majors = data[other_mask]
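
One thing to watch with substring checks such as `'cs' in s.lower()` is that "cs" also occurs inside words like "physics" and "mathematics". A possible alternative, sketched here with a hypothetical helper `is_cs_major` and made-up answers, is a regular expression with word boundaries:

```python
import re
import pandas as pd

# Match 'computer science' or a standalone 'cs', ignoring case.
CS_PATTERN = re.compile(r'\b(computer science|cs)\b', re.IGNORECASE)

def is_cs_major(value):
    # Missing answers arrive as NaN (a float), so non-strings never match.
    if not isinstance(value, str):
        return False
    return CS_PATTERN.search(value) is not None

# Made-up answers covering the spellings seen in the survey plus two traps.
majors = pd.Series(['Computer Science', 'CS', 'Physics', 'Mathematics',
                    float('nan'), 'computer science '])
mask = majors.map(is_cs_major)
print(mask.tolist())
```

Switching to this check would change which rows land in each histogram, so the plots would need to be regenerated.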
In [26]:
fig, ((ax1, ax2,), (ax3, ax4)) = subplots(nrows=2,
                                     ncols=2,
                                     sharey=False,
                                     figsize=(20, 20))

cs_majors['annual_salary_in_usd'].hist(ax=ax1, bins=np.linspace(0, 200000, 25))
ax1.set_title('Computer Science Majors Annual Salary', size=20)
ax1.set_xlabel('Annual Salary USD', size=15)
ax1.set_ylabel('Frequency', size=15)

i_majors['annual_salary_in_usd'].hist(ax=ax2, bins=np.linspace(0, 200000, 25))
ax2.set_title('Information Majors Annual Salary', size=20)
ax2.set_xlabel('Annual Salary USD', size=15)
ax2.set_ylabel('Frequency', size=15)

e_majors['annual_salary_in_usd'].hist(ax=ax3, bins=np.linspace(0, 200000, 25))
ax3.set_title('Engineering Majors Annual Salary', size=20)
ax3.set_xlabel('Annual Salary USD', size=15)
ax3.set_ylabel('Frequency', size=15)

other_majors['annual_salary_in_usd'].hist(ax=ax4, bins=np.linspace(0, 200000, 25))
ax4.set_title('Other Majors Annual Salary', size=20)
ax4.set_xlabel('Annual Salary USD', size=15)
ax4.set_ylabel('Frequency', size=15)
Out[26]:
Text(0,0.5,u'Frequency')

Does negotiating make a difference?

In [27]:
data['negotiate'] = data['Did you negotiate your compensation?']
negotiate = data['Did you negotiate your compensation?']

yes = data[negotiate == 'Yes']
no = data[negotiate == 'No']
print 'Did Negotiate: mean total comp = $' + str(yes['total_compensation_in_usd'].mean()) + ' median = $' + str(yes['total_compensation_in_usd'].median())
print 'Did Not Negotiate: mean total comp = $' + str(no['total_compensation_in_usd'].mean()) + ' median = $' + str(no['total_compensation_in_usd'].median())
Did Negotiate: mean total comp = $112954.620579 median = $97000.0
Did Not Negotiate: mean total comp = $93992.542208 median = $80000.0

What are the highest paid positions?

In [28]:
#top_salaries = data['annual_salary_in_usd'].sort_values(ascending=False)[:20]
#top_positions = data.iloc[top_salaries.index.values]['What is your job title?']
#pd.concat([top_positions, top_salaries], axis=1)

top_comp = data['total_compensation_in_usd'].sort_values(ascending=False)[:20]
top_positions = data.loc[top_comp.index, 'What is your job title?']
df = pd.concat([top_positions, top_comp], axis=1)
df
Out[28]:
What is your job title? total_compensation_in_usd
143 Software engineer 800000.0
1058 Software Engineering Intern 600000.0
454 Software Manager 550000.0
356 Software Developer 550000.0
429 Staff Software Engineer 460000.0
405 Software Architect 445000.0
144 Developer 425000.0
174 Software Engineer 420000.0
685 Sr. UIE Developer 400000.0
426 Principal 400000.0
646 Senior Software Engineer 397000.0
373 Senior Software Engineer 360000.0
925 SDE 350000.0
1017 Senior Software Engineer 350000.0
504 Software Engineer 3 350000.0
147 Senior Software Engineer 345000.0
1215 Software Engineer 325000.0
521 Staff Engineer 308000.0
448 Product Managet 305000.0
491 NaN 300000.0
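
A shorter route to the same kind of table is `DataFrame.nlargest`, which keeps the title and compensation columns aligned by row without a separate index lookup. A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; the non-default index shows that row labels are preserved.
toy = pd.DataFrame({
    'What is your job title?': ['Intern', 'Staff Engineer', 'Developer', 'SDE'],
    'total_compensation_in_usd': [40000.0, 300000.0, 120000.0, 150000.0],
}, index=[10, 11, 12, 13])

# Take the two highest-paid rows, then keep only the columns of interest.
top = toy.nlargest(2, 'total_compensation_in_usd')[
    ['What is your job title?', 'total_compensation_in_usd']]
print(top)
```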

5) Predict and forecast

What trends will continue into the future?

Does experience correlate with higher salary?

In [29]:
# Special packages
import statsmodels.api as sm
from patsy import dmatrices
In [30]:
experience = data['How much cumulative work experience do you have that is relevant to Computer Science? (use your best judgment for whether certain positions were relevant)'] 

def get_months(s):
    # 2yr 10mo
    lst = s.split()
    year = int(lst[0].split('y')[0])
    month = int(lst[1].split('m')[0])
    return (year * 12 + month)

experience_in_months = experience.map(get_months) 

y, X = dmatrices('annual_salary_in_usd ~ experience_in_months', data=data, return_type='dataframe')

model = sm.OLS(y, X)
result = model.fit()
print result.summary()
                             OLS Regression Results                             
================================================================================
Dep. Variable:     annual_salary_in_usd   R-squared:                       0.066
Model:                              OLS   Adj. R-squared:                  0.065
Method:                   Least Squares   F-statistic:                     74.80
Date:                  Sun, 01 Apr 2018   Prob (F-statistic):           1.89e-17
Time:                          08:26:56   Log-Likelihood:                -13212.
No. Observations:                  1067   AIC:                         2.643e+04
Df Residuals:                      1065   BIC:                         2.644e+04
Df Model:                             1                                         
Covariance Type:              nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept             7.363e+04   2479.140     29.700      0.000    6.88e+04    7.85e+04
experience_in_months   311.6902     36.038      8.649      0.000     240.976     382.405
==============================================================================
Omnibus:                     2061.520   Durbin-Watson:                   2.045
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          4976681.024
Skew:                          13.924   Prob(JB):                         0.00
Kurtosis:                     336.414   Cond. No.                         96.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The R-squared value is extremely low (0.066), and after trying different variables, the data seems too narrow to make useful predictions. Most of the salaries fall within the same range, and most respondents are around the same age.
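
For reference, R-squared is the share of the variance in salary that the fitted line accounts for, so 0.066 corresponds to roughly 6.6%. The sketch below computes it by hand as 1 - SS_res / SS_tot for a straight-line fit. The numbers are made up and `np.polyfit` stands in for the statsmodels fit used above:

```python
import numpy as np

# Made-up data: salary loosely rising with months of experience.
months = np.array([0, 12, 24, 36, 48, 60], dtype=float)
salary = np.array([60000, 90000, 70000, 110000, 80000, 120000], dtype=float)

# Least-squares straight line: salary ~ intercept + slope * months.
slope, intercept = np.polyfit(months, salary, 1)
predicted = intercept + slope * months

# R-squared = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((salary - predicted) ** 2)
ss_tot = np.sum((salary - salary.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print('R-squared: %.3f' % r_squared)
```

With a single predictor and an intercept, this equals the squared correlation between the two variables.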

In [31]:
scatter(x=experience_in_months, y=data['annual_salary_in_usd'])
ylim(0, 200000)
Out[31]:
(0, 200000)
In [32]:
data['How old are you?'].describe()
Out[32]:
count    1122.000000
mean       26.423351
std         5.095686
min        15.000000
25%        23.000000
50%        25.000000
75%        29.000000
max        51.000000
Name: How old are you?, dtype: float64
In [33]:
data['annual_salary_in_usd'].describe()
Out[33]:
count    1.067000e+03
mean     8.865442e+04
std      5.974716e+04
min      0.000000e+00
25%      6.172200e+04
50%      8.000000e+04
75%      1.100000e+05
max      1.500000e+06
Name: annual_salary_in_usd, dtype: float64
In [34]:
data['total_compensation_in_usd'].describe()
Out[34]:
count      1219.000000
mean     100218.604941
std       69592.767303
min           0.000000
25%       60000.000000
50%       85000.000000
75%      130000.000000
max      800000.000000
Name: total_compensation_in_usd, dtype: float64

Instead of making predictions with this rather small dataset, I looked for more interesting observations.

Which companies pay the most?

In [35]:
company_compensation = data.groupby(['Company name?'])[['total_compensation_in_usd']].mean()
company_compensation['total_compensation_in_usd'].sort_values(ascending=False)[:50]
Out[35]:
Company name?
Verite Group                             600000.000000
Small consultancy that I run             400000.000000
Two Sigma                                350000.000000
Twitch/Amazon                            350000.000000
Rubrik                                   300000.000000
X                                        270000.000000
Apex Systems, SNG                        230000.000000
Large Construction Company               230000.000000
Airbnb                                   227666.666667
Palantir                                 225000.000000
Apple, Inc.                              219150.000000
Slack                                    210000.000000
Shopify                                  205375.000000
Facebook                                 205321.428571
Mozilla                                  200000.000000
palantir                                 200000.000000
microsoft                                200000.000000
PayScale                                 195000.000000
Google                                   191569.113030
FireEye                                  185000.000000
Stitch Fix                               185000.000000
LinkedIn                                 183800.000000
Amazon.com                               181250.000000
SONY Playstation                         180000.000000
AWS                                      180000.000000
Apple                                    178714.285714
Morgan Stanley                           175125.000000
Uber                                     173600.000000
Salesforce                               172400.000000
Affirm                                   170000.000000
Facebook Inc                             170000.000000
Deseret Digital Media                    168480.000000
Schoology                                165000.000000
Amazon Lab126                            165000.000000
Toyota Connected                         160000.000000
Ancestry                                 160000.000000
Amazon.com Inc                           160000.000000
Datadog                                  160000.000000
VMware                                   160000.000000
Snapchat                                 160000.000000
Bloomberg                                159166.666667
US Digital Service                       158600.000000
Bank of America Merrill Lynch            158000.000000
Arete Associates                         154000.000000
Amazon Web Services                      150958.000000
OpenMarket                               150000.000000
Dropbox                                  150000.000000
Cleary, Gottlieb, Steen, and Hamilton    150000.000000
Raytheon Centers of Innovation           150000.000000
n/a                                      150000.000000
Name: total_compensation_in_usd, dtype: float64
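
In the output above, free-text answers such as "Palantir" and "palantir" or "Apple, Inc." and "Apple" are averaged as separate companies, and some of the top rows may rest on a single response. A possible refinement, sketched with made-up company names, a hypothetical `company_key` column, and an arbitrary minimum count:

```python
import pandas as pd

# Made-up responses with inconsistent spelling of the same company.
toy = pd.DataFrame({
    'Company name?': ['Acme', 'acme ', 'Solo Shop', 'Globex', 'globex'],
    'total_compensation_in_usd': [225000.0, 200000.0, 600000.0, 190000.0, 210000.0],
})

# Normalise case and surrounding whitespace so spelling variants group together.
toy['company_key'] = toy['Company name?'].str.strip().str.lower()

# Mean compensation plus the number of responses behind it.
stats = toy.groupby('company_key')['total_compensation_in_usd'].agg(['mean', 'count'])

# Keep only companies with enough responses (the threshold is an assumption).
MIN_RESPONSES = 2
ranked = stats[stats['count'] >= MIN_RESPONSES].sort_values('mean', ascending=False)
print(ranked)
```

Lowercasing only merges case and whitespace variants; pairs like "Facebook" and "Facebook Inc" would still need a hand-written mapping.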

Which industries pay the most?

In [36]:
industry_comp = data.groupby(['Company purpose/Industry?'])[['total_compensation_in_usd']].mean()
# type(industry_comp)
industry_comp['total_compensation_in_usd'].sort_values(ascending=False)
Out[36]:
Company purpose/Industry?
Wev                                                                                                                                800000.000000
Literally everything                                                                                                               397000.000000
Consumer Electronics                                                                                                               362500.000000
Video streaming                                                                                                                    350000.000000
World domination                                                                                                                   350000.000000
Common knowledge                                                                                                                   345000.000000
International Organisation                                                                                                         308000.000000
Cloud storage and backup                                                                                                           300000.000000
Moonshots                                                                                                                          270000.000000
Web                                                                                                                                238000.000000
Google                                                                                                                             230750.000000
Search and Ads                                                                                                                     230000.000000
Consulting companies that act as the middleman/lend me out to configure/customize the ServiceNow platform for various clients.     230000.000000
Construction                                                                                                                       230000.000000
Social Network                                                                                                                     230000.000000
Data Integration                                                                                                                   225000.000000
Digital Advertising                                                                                                                221328.000000
Insurance/Healthcare                                                                                                               215000.000000
Mobile app                                                                                                                         215000.000000
Enterprise productivity software                                                                                                   210000.000000
Hedge Fund                                                                                                                         210000.000000
E-commerce and cloud computing                                                                                                     209000.000000
Security                                                                                                                           207491.666667
SAAS Software                                                                                                                      205000.000000
Storage                                                                                                                            202500.000000
Financial Services / News / Analytics                                                                                              200000.000000
Education, STEM                                                                                                                    200000.000000
Data Processing                                                                                                                    200000.000000
B2B Software                                                                                                                       195000.000000
Marketplace                                                                                                                        195000.000000
                                                                                                                                       ...      
Software development                                                                                                                15523.200000
Develop document management software                                                                                                15000.000000
IT consultancy                                                                                                                      14784.000000
Online Education Systems                                                                                                            14500.000000
Time management                                                                                                                     13552.000000
Banking software                                                                                                                    12000.000000
Bank                                                                                                                                11000.000000
Computer Security                                                                                                                   10500.000000
Online retailing / CRM                                                                                                               9860.928000
Local, N/A                                                                                                                           9600.000000
Data Aggregation                                                                                                                     9000.000000
University Residences                                                                                                                8580.000000
Telco + Mobile Banking                                                                                                               8241.970000
Software architecture                                                                                                                7800.000000
Game Publishing                                                                                                                      7750.000000
Software Development                                                                                                                 7392.000000
making security great again                                                                                                          6652.800000
Manufacturing Electronics                                                                                                            5600.000000
Audio Video Domain                                                                                                                   5250.000000
Parallel Programming and High-performance computing                                                                                  4062.960000
Toolbar / marketing                                                                                                                  4000.000000
Software consultancy (outsourcing)                                                                                                   3820.500000
Automotive industry                                                                                                                  3696.000000
Enterprise software (ERP)                                                                                                            3500.000000
City IT Department                                                                                                                   1600.000000
Data collection and processing                                                                                                         86.020000
Accounting/ Audit/ Tax                                                                                                                  0.000000
Traffic Data                                                                                                                            0.000000
Medical Technology                                                                                                                      0.000000
Web Hosting                                                                                                                             0.000000
Name: total_compensation_in_usd, Length: 592, dtype: float64

Which finance company has the most respondents?

In [37]:
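# Exact match: keeps only rows whose industry answer is literally 'Finance'
finance_df = data[data['Company purpose/Industry?'] == 'Finance']
# Number of respondents per company within that subset
finance_df['Company name?'].value_counts()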
Out[37]:
Capital One             5
Fidelity Investments    2
Blackrock               1
Suncorp                 1
Two Sigma               1
KPIT Technologies       1
Vanguard                1
Intuit                  1
CME                     1
Bank of America         1
One main financial      1
JP Morgan Chase         1
Credit Kraken           1
Name: Company name?, dtype: int64
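
Note that the exact-match filter above only catches respondents who typed exactly `Finance`. Free-text answers such as `Banking/Finance`, `Fintech`, or `Bank` (all of which appear in the industry column elsewhere in this notebook) are left out. Below is a minimal sketch of a broader filter, assuming a case-insensitive keyword match is an acceptable definition of "finance". The keyword pattern and the toy frame are made up for illustration and are not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the survey frame; these rows are invented for illustration.
toy = pd.DataFrame({
    'Company purpose/Industry?': ['Finance', 'Banking/Finance', 'Fintech',
                                  'Bank', 'Music streaming', None],
    'Company name?': ['A', 'B', 'C', 'D', 'E', 'F'],
})

# Hypothetical keyword pattern; widen or narrow it as needed.
pattern = r'financ|fintech|bank'

industry = toy['Company purpose/Industry?']
# na=False treats missing industry answers as non-matches.
mask = industry.str.contains(pattern, case=False, na=False, regex=True)

finance_like = toy[mask]
company_counts = finance_like['Company name?'].value_counts()
print(company_counts)
```

A keyword match can also pull in false positives (for example, an industry answer that merely mentions a bank as a customer), so the matched labels are worth a quick manual look before drawing conclusions.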

Which industries have the youngest and oldest workers?

In [38]:
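# Mean age and number of respondents for each (free-text) industry answer
industry_age = data.groupby(['Company purpose/Industry?'])[['How old are you?']].agg(['mean', 'count'])
# Drop industries where no respondent gave an age
industry_age.dropna(inplace=True)
# Industries ordered from youngest to oldest mean age
industry_age['How old are you?']['mean'].sort_values()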
Out[38]:
Company purpose/Industry?
City IT Department                                             15.0
Data Aggregation                                               18.0
Data Science, Research, Web Development, Computation Sales.    18.0
Enterprise software (ERP)                                      18.0
Online retailing / CRM                                         19.0
Consumer Authentication                                        19.0
Fintech                                                        19.0
Oil and gas                                                    19.0
Military                                                       19.0
Oil                                                            20.0
OS etc.                                                        20.0
Music streaming                                                20.0
"We are a camera company."                                     20.0
MOOC                                                           20.0
Aviation / Consumer Electronics                                20.0
Lithography systems supplier                                   20.0
Bank                                                           20.0
Database                                                       20.0
MarktePlace                                                    20.0
Logistics Software                                             20.0
Music Industry                                                 20.0
Catalog of ideas                                               20.0
Phamecutical                                                   20.0
Social                                                         20.0
Software Development                                           20.0
Software architecture                                          20.0
Software/Hardware                                              20.0
Tech/Retail                                                    20.0
Web Analytics                                                  20.0
University Residences                                          20.0
                                                               ... 
Consumer Credit                                                36.0
We make cybers                                                 37.0
Getting data about anything                                    37.0
Mobile messaging cloud SaaS provider                           37.0
Financial Services, Media                                      37.0
Common knowledge                                               37.0
Education, STEM                                                37.0
Waste Handling                                                 37.0
Marketing/Retail/Point-Of-Sale                                 37.0
Energy Grid Management                                         38.0
Govt. Contractor                                               38.0
CRM                                                            38.5
Toolbar / marketing                                            39.0
Version Control                                                39.0
Job Board                                                      39.0
healthcare devices                                             39.0
oil and gas                                                    40.0
Storage                                                        40.0
government                                                     42.0
Wev                                                            44.0
MSP                                                            44.0
Medical Billing                                                44.0
Software Development / IT                                      45.0
All                                                            45.0
SaaS Database Software                                         45.0
Bulk mailing                                                   46.0
Saas website analytics                                         47.0
Banking/Finance                                                48.0
Energy Analytics/Customer Engagement                           49.0
Medical Software                                               51.0
Name: mean, Length: 564, dtype: float64
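
Because the industry field is free text, the ranking above treats labels such as `Oil and gas` and `oil and gas` as different industries, and a group with a single respondent simply reports that person's age as its mean. The sketch below shows one way to tighten this, assuming it is acceptable to normalize case and whitespace and to keep only industries with a minimum number of respondents. The threshold `min_count` and the toy rows are hypothetical and not taken from the survey.

```python
import pandas as pd

# Toy stand-in for the survey frame; these rows are invented for illustration.
toy = pd.DataFrame({
    'Company purpose/Industry?': ['Finance', 'finance ', 'Finance',
                                  'Oil and gas', 'oil and gas',
                                  'Medical Software'],
    'How old are you?': [25, 31, 28, 19, 40, 51],
})

min_count = 2  # hypothetical threshold for a "reliable" group

# Normalize labels so case and stray whitespace do not split a group.
industry = toy['Company purpose/Industry?'].str.strip().str.lower()

age_stats = toy.groupby(industry)['How old are you?'].agg(['mean', 'count'])

# Keep only groups with enough respondents, youngest first.
reliable = age_stats[age_stats['count'] >= min_count].sort_values('mean')
print(reliable)
```

The `count` column is already computed in the cell above, so the same filter could be applied to `industry_age` directly. Normalizing case will not merge near-synonyms such as `Oil` and `Oil and gas`; that would need a manual mapping.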

Conclusion

This dataset of around 1,000 entries best represents Reddit users who also have a career in computer science, not the industry as a whole. Because the sample is relatively small, and because many answers are free text that splits respondents into very small groups, some of the statistics above may be skewed and should not be read as precise.

Documentation

The questions asked were:

  1. Choose the option that best fits your job type.
  2. What is your job title?
  3. Choose the option that best fits your job title.
  4. Is this a remote position?
  5. Company name?
  6. Company purpose/Industry?
  7. How much cumulative work experience do you have across all jobs held (even those not relevant to Computer Science)?
  8. How much cumulative work experience do you have that is relevant to Computer Science? (use your best judgment for whether certain positions were relevant)
  9. How long have you worked for your current employer?
  10. How long have you held your current position?
  11. Please provide any context about your employment track. Did you work in a different field, before? Have you worked your way up in one company or job hopped a bunch? Any and all details are appreciated.
  12. Are you currently enrolled in some form of school? Please select the option that best matches your stage of current enrollment. (If you are between terms select the option that best matches the term you will begin in the near future.)
  13. What is the major or concentration of your current enrollment?
  14. Choose the option that best matches the major or concentration of your current enrollment.
  15. Have you graduated or completed any form of education already? Select the option that best matches your highest level of completed education.
  16. What was the major or concentration of your highest level of completed education?
  17. Choose the option that best matches the major or concentration of your highest level of completed education.
  18. Please use this field to explain your education track and provide as much context as you are able. Please include any information you feel will help someone better understand your path such as the institutions you attended.
  19. Which currency will your compensation be represented in?
  20. In which way are you compensated?
  21. What is your base hourly rate? (If the number changes regularly, pick a value that best represents the average rate.)
  22. In the average week how many hours do you work for your base hourly rate?
  23. What is your gross annualized base pay? (Multiply your monthly pay by 12.)
  24. Non-compensated overtime? In the average week how many overtime hours (after 40 hrs/wk) do you work WITHOUT any additional compensation?
  25. Compensated overtime? Do you receive ADDITIONAL hourly compensation for working overtime (after 40 hrs/wk)?
  26. What is your overtime (after 40 hrs/wk) hourly rate? (If the number changes regularly, pick a value that best represents the average rate.)
  27. In the average week how many hours do you work for your overtime (after 40 hrs/wk) rate?
  28. Did you receive a signing bonus when you joined your current employer? Please provide the amount you received.
  29. Do you receive a yearly bonus? If so, please provide your best estimate of it for an average year.
  30. Is your yearly bonus performance-based?
  31. If applicable, please provide any context you can about your signing and/or yearly bonus amounts.
  32. Do you receive a monthly housing stipend or free corporate housing? Please provide the monthly value of the free housing or the monthly stipend amount. (For lump sum housing stipends, divide the total by the number of months covered.)
  33. Do you receive any company equity? What is the approximate value of the equity you receive?
  34. Is the equity in your compensation publicly traded?
  35. Please provide any further context about the company equity. RSUs or Options? Employer market capitalization? To what percentage of the entire company does your equity equate? What is the vesting schedule? What series of funding did the startup recently complete? If the equity is private, how likely is the company to IPO or sell?
  36. Do you receive equity refreshers? (more equity grants/options for staying with the company in addition to the initial grant/options)
  37. If applicable, please explain the refreshers and how they work at your particular employer. How much will be granted? When? How does performance impact refreshers? How do refreshers vest?
  38. Please list any benefits you receive that you feel make an important part of your compensation. (Include monetary benefits such as 401k matching.)
  39. Did you negotiate your compensation?
  40. Please provide any context about the negotiation process. What was your initial offer? How were you able to negotiate it? If you had any offers from other companies, what were they and did they help you in the negotiation process?
  41. By your own calculations, what is your total annual income when you include all forms of compensation? (Include all stock value, overtime pay, 401k match and any other form of "cash equivalent" compensation in the sum.)
  42. In minutes, how long is the average commute one way? (If you live and work in the same building, 0 is a sufficient answer.)
  43. How many hours do you work in a normal week?
  44. How many sick days are you allowed per year? (Use -1 if your employer allows for "infinite" sick days.)
  45. How many discretionary vacation days are you allowed per year? (Use -1 if your employer allows for "infinite" discretionary vacation days.)
  46. For how many days in an average year does the office "close" due to a holiday?
  47. How many days out of the year do you expect to be in "crunch time"?
  48. What does "crunch time" look like at your office? Details here are much appreciated!
  49. How many days out of the year do you expect to be on-call?
  50. What does on-call look like at your office? The more information, the better.
  51. Do you receive incentives (additional compensation, time off, etc) for a successful "crunch time" or on-call? Even if this is rare, please provide any details you have.
  52. In the average year, how many nights, weekends or vacation/holiday days do you anticipate needing to do any work at all?
  53. Do you have a noncompete or other legally binding contract preventing employment outside of work that you are required to adhere to?
  54. If you do have a noncompete, can you provide any context? What does it cover? What does it not cover? Where is it legally enforceable? What are the repercussions for breaking it?
  55. How old are you?
  56. Gender
  57. Ethnicity (check all that apply)
  58. Current primary country of citizenship?
  59. Current primary country of residence?
  60. Current primary country of employment?
  61. Cost of living plus rent index number (For example, San Francisco is 110.58)
  62. In what city do you live and work? (Only provide this if you are comfortable doing so!)
  63. Username:
  64. Please use commas to separate the tokens.

Data was downloaded from Google Sheets on 03/30/2018.

I love hearing feedback! If you didn't like something, let me know. Feel free to reach out to me at murraylee10 at gmail .com