PySpark SME Needed for Content Creation & Validation
Upwork

Remote
Project Overview
We are looking for an experienced PySpark Subject Matter Expert (SME) to support our content team in creating, reviewing, and validating high-quality technical assessment content. This includes PySpark coding problems, data transformation exercises, conceptual multiple-choice questions (MCQs), and scenario-based tasks that measure real-world data engineering and big data processing skills.

Responsibilities
• Create clear, accurate, and role-relevant PySpark questions covering:
  - RDD and DataFrame operations
  - Transformations and actions
  - Joins, aggregations, and window functions
  - UDFs and their performance considerations
  - Optimization techniques (caching, partitioning, bucketing)
  - Handling semi-structured data (JSON) and columnar file formats (Parquet, ORC)
• Develop PySpark coding challenges based on realistic data engineering scenarios.
• Review existing content for:
  - Technical correctness and completeness
  - Difficulty alignment (Beginner / Intermediate / Advanced)
  - Clarity, grammar, and logical flow
  - Real-world applicability and relevance
• Provide reference solutions, expected outputs, dataset setups, and explanations.

Required Skills & Expertise
• Strong hands-on experience with PySpark and Apache Spark.
• Deep knowledge of:
  - Spark DataFrame, SQL, and RDD APIs
  - Data processing pipelines (batch and streaming preferred)
  - The Catalyst optimizer, Tungsten, and query planning basics
  - Performance tuning (shuffle optimization, broadcast joins, caching strategies)
  - File formats (Parquet, Delta, JSON)
  - Cluster execution basics (executors, partitions, stages, tasks)
• Ability to create high-quality assessment content that tests practical skills rather than purely theoretical knowledge.
• Prior experience with content development, technical reviewing, or data engineering training is a plus.

Ideal Candidate
• 3–6+ years of experience working with PySpark in production environments.
• Comfortable designing both simple and complex transformation problems.
• Able to explain solutions clearly and concisely.
• Detail-oriented and able to critically evaluate technical content.