At Fujitsu, our purpose is to make the world more sustainable by building trust in society through innovation. Founded in Japan in 1935, Fujitsu has been a pioneer in technology and innovation for decades. Today, as a world-leading digital transformation partner, we are committed to transforming business and society in the digital age.
With approximately 130,000 employees across over 50 countries, Fujitsu offers a broad range of products, services, and solutions. We collaborate with our customers to co-create solutions that drive enterprise-wide digitalization while actively working to address social issues and contribute to the United Nations Sustainable Development Goals (SDGs).
Job Title: Azure Data Engineer – Azure Data Factory, Azure Data Lake, Azure Databricks
Experience: 3+ years
Location: Pune, India (Hybrid/Remote as per project need)
Shifts: 6:30 AM to 3:30 PM IST (client shift may apply)
Role Summary
You will build and support Azure-based data platforms.
You will create pipelines for ingestion, transformation, and analytics.
You will manage data lake and warehouse layers with strong data modeling.
You will enable AI/ML workloads by preparing quality datasets and supporting Azure ML.
Primary Skills (Must Have)
- Azure Data Factory (ADF) – pipeline design, triggers, monitoring, error handling
- Azure Databricks (Spark / PySpark) – transformations, performance tuning, Delta (if used)
- Azure Data Lake Storage (ADLS Gen2) – lake design, folder structure, partitioning
- Azure Synapse Analytics – analytics/warehouse concepts and data serving
- SQL (Advanced) – complex queries, validation, tuning
- Python – data processing + scripting (ML exposure is a plus)
- Data Modeling & ETL – strong warehouse and dimensional modeling understanding
- Integration of multiple Azure services end-to-end
Key Responsibilities
1) Data Ingestion & Orchestration (Azure Data Factory)
- Design and build scalable ADF pipelines for batch and incremental loads.
- Configure linked services, datasets, triggers, and integration runtime.
- Implement retry logic, alerts, and failure handling.
- Maintain pipeline standards, parameters, and reusable templates.
- Monitor daily runs and fix failures with proper root-cause analysis (RCA).
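As an illustration of the failure-handling expectations above: in ADF itself, retries and alerts are configured declaratively on activities, but custom scripts invoked from a pipeline often need the same pattern in code. The sketch below is illustrative only (function names and settings are not from any specific project):

```python
import time

def run_with_retry(task, max_attempts=3, base_delay=1.0):
    """Run a callable, retrying with exponential backoff on failure.

    Mirrors the retry count/interval you would normally configure on an
    ADF activity, for custom scripts invoked from a pipeline.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                # Final failure: re-raise so alerting/RCA can pick it up.
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```

Transient faults (throttling, brief network errors) are absorbed by the backoff; only persistent failures surface for investigation.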
2) Data Lake Design & Storage Management (ADLS + Azure SQL)
- Design data lake layers: raw, staged, curated, consumption.
- Select the appropriate file format (Parquet, Delta, or CSV) for each dataset based on need.
- Apply partitioning and naming standards for performance and clarity.
- Manage curated datasets in Azure SQL Database when required.
- Ensure data availability, retention, and lifecycle policies.
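To make the layering and partitioning conventions above concrete, a path-building helper of this kind is common. The layer names and the `year=/month=/day=` scheme here are illustrative; the actual convention is agreed per project:

```python
from datetime import date

LAYERS = ("raw", "staged", "curated", "consumption")

def lake_path(layer: str, source: str, dataset: str, run_date: date) -> str:
    """Build a conventional ADLS Gen2 folder path with date partitioning.

    Enforcing layer names and partition keys in one place keeps the lake
    navigable and lets Spark prune partitions on read.
    """
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return (
        f"{layer}/{source}/{dataset}/"
        f"year={run_date:%Y}/month={run_date:%m}/day={run_date:%d}"
    )
```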
3) Data Transformation & Big Data Processing (Databricks)
- Develop transformations using PySpark / Spark SQL in Databricks.
- Implement data quality checks and reconciliation rules.
- Optimize cluster usage, caching, and job performance to reduce cost.
- Implement incremental processing and upsert patterns (MERGE) if needed.
- Schedule and run Databricks jobs through ADF or job workflows.
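The incremental upsert pattern mentioned above is, on Databricks, a `MERGE INTO` against a Delta table. The pure-Python sketch below only illustrates the semantics (matched key updates the row, unmatched key inserts it); it is not the Databricks API:

```python
def upsert(target: dict, updates: list[dict], key: str = "id") -> dict:
    """MERGE-style upsert: rows whose business key matches are updated,
    rows with a new key are inserted. `target` is keyed by the business key.
    """
    merged = dict(target)
    for row in updates:
        merged[row[key]] = row  # matched -> update, unmatched -> insert
    return merged
```

The same idea scales to Delta tables, where the engine resolves matches per key and rewrites only the affected files.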
4) Data Warehousing & Analytics (Synapse)
- Build and support analytics solutions using Azure Synapse.
- Design warehouse objects and implement loading strategies.
- Support query tuning and performance improvement.
- Publish curated, trusted datasets for BI and downstream apps.
5) Data Modeling & ETL Design
- Create logical and physical data models for reporting and analytics.
- Apply star schema / dimensional modeling where needed.
- Maintain source-to-target mapping and transformation rules.
- Ensure data consistency across lake, warehouse, and BI layers.
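A minimal sketch of the dimensional-modeling step above: assign surrogate keys to a dimension and re-key fact rows against it. Column and function names are illustrative, and a real pipeline would also handle slowly changing dimensions and late-arriving rows:

```python
def build_dimension(rows, natural_key):
    """Assign surrogate keys to distinct natural-key values.

    Returns the dimension rows plus a natural-key -> surrogate-key lookup
    for re-keying facts.
    """
    lookup, dim = {}, []
    for row in rows:
        nk = row[natural_key]
        if nk not in lookup:
            lookup[nk] = len(lookup) + 1  # surrogate key
            dim.append({"sk": lookup[nk], **row})
    return dim, lookup

def build_fact(rows, natural_key, lookup, fk_name):
    """Replace the natural key on fact rows with the dimension's surrogate key."""
    return [
        {**{k: v for k, v in row.items() if k != natural_key},
         fk_name: lookup[row[natural_key]]}
        for row in rows
    ]
```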
6) AI/ML Enablement (Azure Machine Learning)
- Support ML pipelines through feature preparation and dataset readiness.
- Work with Data Scientists for training and deployment support.
- Build Python scripts for model experiments when required.
- Use libraries such as scikit-learn (preferred) and TensorFlow/PyTorch (good to have).
- Track model inputs, outputs, and repeatable pipeline execution.
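As a taste of the feature-preparation work above: in practice this is usually done with scikit-learn's `StandardScaler`, but the underlying computation is simple enough to show in stdlib Python:

```python
from statistics import mean, pstdev

def standardize(values):
    """Zero-mean, unit-variance scaling of one numeric feature column
    (the computation behind scikit-learn's StandardScaler)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Constant column: no information, map everything to zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```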
7) SQL, Python & Engineering Practices
- Write optimized SQL for validation, reconciliation, and transformations.
- Write clean Python code for automation and data processing.
- Use Git with good branching and PR review practices.
- Support CI/CD practices for data pipelines where the project uses them.
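The reconciliation work above often starts with a row-count comparison after each load. The sketch below uses SQLite so it is self-contained; on an actual engagement the same query would run against Synapse or Azure SQL, and the table names are purely illustrative:

```python
import sqlite3

def reconcile_counts(conn, source_table, target_table):
    """Compare row counts between a source and a target table --
    the simplest post-load reconciliation check."""
    cur = conn.cursor()
    src = cur.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = cur.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return {"source": src, "target": tgt, "match": src == tgt}

# Example: in-memory database with a deliberate mismatch
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src(id INTEGER); CREATE TABLE tgt(id INTEGER);
    INSERT INTO src VALUES (1),(2),(3);
    INSERT INTO tgt VALUES (1),(2);
""")
result = reconcile_counts(conn, "src", "tgt")
```

Fuller checks extend the same idea to checksums or column-level aggregates per partition.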
8) Security, Compliance & Governance
- Follow best practices for secure data handling and access control.
- Work with RBAC, managed identity, and Key Vault where applicable.
- Ensure compliance with client policies and audit needs.
- Implement encryption, access boundaries, and safe data sharing.
9) Agile Delivery & Production Support
- Work in Agile/Scrum mode and deliver stories on time.
- Provide estimates and daily updates to stakeholders.
- Support production issues and perform RCA with prevention steps.
- Maintain runbooks and operational documents.
Secondary Skills (Good to Have)
- Power BI – dataset modeling, dashboards, refresh, performance basics
- Azure Functions / Logic Apps – automation and integration support
- Azure Cognitive Services – awareness for AI use cases (optional)
- Big data background: Hadoop basics, strong Spark understanding
- Monitoring tools: Log Analytics / Azure Monitor (as used in project)
- DevOps exposure: Azure DevOps pipelines for data workloads
Tools / Technologies (Typical)
- Azure: ADF, ADLS Gen2, Databricks, Synapse, Azure SQL, Azure ML
- Languages: Python, SQL, PySpark
- Dev Tools: Git, Azure DevOps / Jira (as applicable)
- Monitoring: ADF monitor, Databricks job runs, Azure Monitor (if enabled)
Qualification
- BE/BTech/BCA/MCA or equivalent practical experience
Soft Skills
- Clear communication and strong ownership.
- Good problem solving and troubleshooting mindset.
- Good documentation habit and disciplined delivery.
- Works well with business, platform, and security teams.
At Fujitsu, we are committed to an inclusive recruitment process that values the diverse backgrounds and experiences of all applicants. We believe that hiring people from a wide variety of backgrounds makes us stronger, not only because it's the right thing to do, but also because it allows us to draw on a wider range of perspectives and life experiences.