1) Set up libraries
2) Set up for Kaggle API
3) Download data from Kaggle
4) Create a process to find all the PDF files on the drive, write them to a list, and count the number of files.
5) Run the extraction and tokenization process (read all the files and tokenize the words).
6) Specify words to exclude from level of difficulty tagging
7) Create a process to extract keywords by exclusion and ensure the words are unique
8) Run the keyword extraction process
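Steps 4 through 8 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `find_pdf_files`, `tokenize`, and `extract_keywords` are hypothetical, the tokenizer is a simple regular expression rather than any particular NLP library, and the text extraction from each PDF is left out because it depends on the PDF library in use.

```python
import os
import re


def find_pdf_files(root_dir):
    """Walk root_dir and return a sorted list of paths ending in .pdf."""
    pdf_paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                pdf_paths.append(os.path.join(dirpath, name))
    return sorted(pdf_paths)


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(tokens, excluded_words):
    """Return unique tokens not in excluded_words, in first-seen order."""
    excluded = {word.lower() for word in excluded_words}
    seen = set()
    keywords = []
    for token in tokens:
        if token not in excluded and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords


if __name__ == "__main__":
    # Hypothetical sample text standing in for text extracted from a PDF.
    sample_tokens = tokenize("The analyst will analyze the data.")
    print(extract_keywords(sample_tokens, ["the", "will"]))
    # prints ['analyst', 'analyze', 'data']
```

The number of files in step 4 is then just `len(find_pdf_files(root_dir))`. Keeping first-seen order while removing duplicates makes the keyword list reproducible from run to run, which a plain `set` would not guarantee.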
Look up each word's level of difficulty (Twinword API) and write the words and difficulty levels to a file
9) Prompt for unirest key
10) Iterate through the words, look up the level of difficulty for each, and write the results to a list. (Example set to 100 for now)
11) Create a process to sort and output the data to a .csv file with the most difficult words at the top.
12) Run the process to create the .csv
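A rough sketch of steps 10 through 12 is shown below. It makes several assumptions: the actual Twinword request is hidden behind a hypothetical `lookup` callable that returns a numeric level or `None`, a higher number is taken to mean a more difficult word, and the names `lookup_levels` and `write_difficulty_csv` and the CSV column headers are invented for illustration.

```python
import csv


def lookup_levels(words, lookup, limit=100):
    """Look up a difficulty level for up to `limit` words.

    `lookup` is a hypothetical callable (word -> number or None) that
    stands in for the real API request.
    """
    results = []
    for word in words[:limit]:
        level = lookup(word)
        if level is not None:  # skip words the service does not know
            results.append((word, level))
    return results


def write_difficulty_csv(word_levels, csv_path):
    """Write (word, level) pairs to csv_path, most difficult first."""
    # Assumption: a higher level means a more difficult word.
    # Ties are broken alphabetically so the output is deterministic.
    ranked = sorted(word_levels, key=lambda pair: (-pair[1], pair[0]))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["word", "difficulty"])
        writer.writerows(ranked)
    return ranked
```

Passing the lookup in as a function keeps the sorting and file output testable without an API key; in the notebook the callable would wrap the real request made with the key from step 9.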
Referencing this research article and acknowledging this tool: http://gender-decoder.katmatfield.com/about#masculine, check whether the job description contains male or female bias words.
13) Import gender bias words from .csv
.csv test data
14) Transform to lists for matching
Test Data for Bias
15) Process for Male bias words and write matches to file
16) Process for Female bias words and write matches to file
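Steps 13 through 16 might look something like the sketch below. Everything specific here is an assumption made for illustration: the CSV is assumed to have `male` and `female` columns, the entries are treated as word stems matched by prefix, and the function names are hypothetical. The stems in the usage example are placeholders, not the actual word lists.

```python
import csv


def load_bias_words(csv_path):
    """Read a CSV with 'male' and 'female' columns into two lists."""
    male, female = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            male_word = (row.get("male") or "").strip().lower()
            female_word = (row.get("female") or "").strip().lower()
            if male_word:
                male.append(male_word)
            if female_word:
                female.append(female_word)
    return male, female


def find_bias_matches(tokens, bias_stems):
    """Return the sorted unique tokens that start with any bias stem."""
    matches = set()
    for token in tokens:
        if any(token.startswith(stem) for stem in bias_stems):
            matches.add(token)
    return sorted(matches)


def write_matches(matches, out_path):
    """Write one matched word per line to out_path."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for word in matches:
            handle.write(word + "\n")


if __name__ == "__main__":
    # Placeholder tokens and stems, for illustration only.
    tokens = ["competitive", "team", "supportive", "competitive"]
    print(find_bias_matches(tokens, ["compet"]))
    # prints ['competitive']
```

The same `find_bias_matches` function serves both step 15 and step 16; only the stem list and the output file differ.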