Data scientist to help analyze large-scale web URL data to support a high-impact research project
Upwork

Remote
Description

We are professors and scholars at Carnegie Mellon University (https://www.cmu.edu) seeking a data science freelancer for a high-impact academic research project. The freelancer is expected to start work immediately.

Responsibilities

The freelancer will be involved in the core stage of the research project. The main task is to use advanced techniques (rule-based, modern machine learning, natural language processing, and large language model techniques) to analyze consumer behavior in very large web Uniform Resource Locator (URL) datasets (hundreds of TBs) that record the URLs visited by online users. Specific tasks include:

1. Using information in the URL itself, identify which URLs indicate purchase activity and which indicate ad-click activity.
2. For each user in the dataset, estimate that individual's total number of purchase activities and ad-click activities over a given period, based on the results of Step 1.

The freelancer will meet regularly with the research team (scholar Mr. An and Professors Acquisti and Sen) to discuss their work, exchange ideas, and receive feedback.

Qualifications

Successful candidates will have demonstrated relevant expertise and experience, including:
a. Expertise in web URL analysis
b. Strong programming skills in Python, Spark, and SQL
c. Experience working with large (TB-sized), complex datasets
d. Expertise in modern machine learning, natural language processing, and large language model techniques
e. Experience working with Linux/Unix-based systems
f. Strong written and oral communication skills in English
g. Independent judgement, organization, attention to detail, and patience

Application Instructions

Applications will be evaluated on a rolling basis, but priority will be given to applications received by December 6th, 2025.
Short-listed candidates may be invited for an interview. To apply, please submit a single PDF document containing:

1) A brief cover letter describing:
a. Your interest in the position
b. Your available start and end dates
c. Your programming experience (especially with Python, Spark, and SQL)
d. Any relevant data analysis experience, especially with (1) web URL analysis, (2) large, complex datasets, and (3) modern machine learning, natural language processing, or large language model techniques
e. Contact information for 2-3 professional references (name, email address, and phone number)
2) Your resume
3) Evidence of coding ability (if available), for example:
a. A link to a GitHub repository with sample projects
b. A code sample from a project
c. A description of a data analysis project with technical details