Build a lightweight AI tool for semantic column mapping and data harmonization

Build a lightweight AI tool for semantic column mapping and data harmonization

Build a lightweight AI tool for semantic column mapping and data harmonization

Upwork

Upwork

Remoto

3 hours ago

No application

About

We need a small, focused tool that uses AI to detect matching columns between two CSV files and outputs a merged CSV and JSON schema. Example: File 1 has Cost_Center and Expense_Total File 2 has Org_Code and Actual_Spend The tool should recognize that Cost_Center ≈ Org_Code and Expense_Total ≈ Actual_Spend, then merge them automatically. No fancy add on's or analytics — just a simple tool that works reliably. Core Requirements: File Input: Upload or read in two CSV files. Detect columns and sample a few values. AI Matching Logic: Use embeddings (OpenAI or Sentence Transformers) to compare column names and short data samples. Output top matches with confidence scores (e.g., 0.92 similarity). Merge & Output: Automatically align matched columns. Export: merged.csv schema.json (describing unified columns and their sources). Basic UI: Simple Streamlit screen to upload two CSVs and download outputs. Text-based output showing matches and confidence scores. Tech Stack: Python 3.x Streamlit (lightweight UI) Sentence Transformers or OpenAI embeddings pandas Estimated Effort Task Hours Setup and environment 4–6 hrs CSV ingestion and parsing 8–10 hrs Embedding & matching logic 14–16 hrs Merge and export (CSV + JSON) 6–8 hrs Simple Streamlit UI 8–10 hrs Testing and cleanup 4–6 hrs Documentation 2–4 hrs Total Estimated Hours: 46–60 hours (Target: 1 week full-time or 2 weeks part-time.) What We’re Looking For Strong Python developer comfortable with embeddings and data similarity. Proven ability to deliver lightweight prototypes quickly. Clear, well-commented code. To Apply Please include: Your total hours estimate and hourly rate. A short explanation of how you’d detect and align semantically similar columns. Example of similar past work (if available). Contract Type Hourly with a ceiling. IP fully assigned to Viderity upon completion.