
Data Engineer
Cleber excels in designing, implementing, and deploying reliable data products. He has a strong ability to collaborate with stakeholders to define business requirements and ensure effective data ingestion from external sources.
Additionally, he thrives in agile environments, promoting data-driven quality and quick iterations. Cleber is dedicated to staying updated with the latest technologies and trends in the field, showcasing his commitment to continuous learning and professional growth.
Related Sciences is a biotechnology firm building an AI-driven platform to map global scientific innovation by analyzing scholarly publications, patents, and research networks.
Developed and maintained Python-based data pipelines integrating patent and publication data from OpenAlex, USPTO, and PubMed to power drug discovery analytics.
Designed and optimized network graph models and ontologies using Neo4j and NetworkX, improving entity linking and semantic search accuracy.
Built visualization tools to represent relationships across scientific disciplines, leveraging Plotnine and interactive visualization libraries.
Implemented continuous integration workflows with GitHub Actions, mypy type checking, and pre-commit hooks to ensure code reliability.
Collaborated closely with AI researchers and data scientists to embed LLM-driven data enrichment and embedding models into production workflows.
Contributed to internal documentation and open-source components supporting the “All of Science” knowledge graph initiative.

Implemented a scalable real-time data pipeline using Kafka and ClickHouse to parse events and enrich customer data across hundreds of tenants, supporting cross-provider ad tracking and personalized analytics.
Prototyped and evaluated candidate technologies for a new identity resolution system, exploring graph-database and Python-based solutions to improve user matching and deduplication while keeping downstream data accurate and consistent.
Improved query performance in ClickHouse by implementing key-hashed dictionaries, aggregation tables, and optimizing partitioning and indexing. These enhancements significantly reduced query latency and improved overall system efficiency, enabling faster access to large volumes of analytical data.
Introduced dbt to build and maintain incremental data pipelines for preprocessing analytical data, enhancing data reliability and enabling scalable analytics.
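The key-hashed dictionary optimization mentioned above replaces expensive joins with point lookups against preloaded dimension data. As a rough analogy (in plain Python, with hypothetical tenant and event fields, not the actual schema):

```python
# Illustrative analogue of replacing a join with a key-hashed dictionary lookup.
# Field names and data are hypothetical, not from the production schema.

events = [
    {"tenant_id": 1, "campaign_id": "a", "clicks": 10},
    {"tenant_id": 2, "campaign_id": "b", "clicks": 5},
]

# Dimension data preloaded into an in-memory dict keyed by tenant_id,
# mirroring how a ClickHouse dictionary serves point lookups instead of a JOIN.
tenants = {1: "acme", 2: "globex"}

def enrich(events, tenants):
    """Attach tenant names to events via O(1) dictionary lookups."""
    return [
        {**e, "tenant_name": tenants.get(e["tenant_id"], "unknown")}
        for e in events
    ]

enriched = enrich(events, tenants)
```

The same shape applies in ClickHouse proper: a dictionary with a hashed layout turns a per-row join into a constant-time lookup.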
One More DMCC is a data engineering and analytics company based in Germany, providing advanced tracking and marketing optimization solutions for digital advertisers and agencies.
Designed and optimized high-performance databases in PostgreSQL and ClickHouse to support large-scale marketing analytics workloads.
Improved query performance and reduced storage costs by implementing table partitioning, materialized views, and advanced indexing strategies.
Automated ETL workflows to ingest multi-source tracking data, ensuring data consistency and low-latency availability for reporting.
Collaborated with founders and engineers on schema design, query tuning, and architecture decisions to support rapid feature iteration.
Actively participated in code reviews and architecture discussions, promoting transparency, accountability, and technical excellence within the distributed team.
TVA2 LLC is a U.S.-based data analytics consultancy that builds scalable data pipelines and automates reporting systems for clients across medical, financial, retail, and eCommerce sectors.
Built and automated ETL pipelines in Python and AWS Lambda to clean, transform, and deliver analytical data across multiple client environments.
Developed SQL data models and optimized database architecture to ensure efficient storage and querying across large datasets.
Designed cloud-based automation processes for error logging, tracking, and fault-tolerant execution of data workflows.
Supported data visualization initiatives by preparing datasets for Tableau and Power BI, enabling clients to derive actionable insights.
Collaborated with cross-functional teams and clients to define project requirements, ensuring data accuracy and timely delivery.
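The fault-tolerant workflow execution described above can be sketched as a Lambda-style handler that cleans a batch of records and logs failures instead of crashing. The field names and normalization rules are illustrative assumptions:

```python
# Hypothetical sketch of a Lambda-style ETL handler: clean and transform a
# batch of tracking records, collecting per-record errors for logging.

def clean_record(raw):
    """Normalize one raw tracking record (field names are assumptions)."""
    return {
        "order_id": str(raw.get("order_id", "")).strip(),
        "amount": round(float(raw.get("amount", 0) or 0), 2),
        "source": (raw.get("source") or "unknown").lower(),
    }

def handler(event, context=None):
    """Lambda-style entry point: transform a batch and report failures."""
    cleaned, errors = [], []
    for raw in event.get("records", []):
        try:
            cleaned.append(clean_record(raw))
        except (TypeError, ValueError) as exc:
            # Fault tolerance: record the failure and keep processing the batch.
            errors.append({"record": raw, "error": str(exc)})
    return {"cleaned": cleaned, "errors": errors}

result = handler({"records": [
    {"order_id": 7, "amount": "19.999", "source": "Shopify"},
    {"order_id": 8, "amount": "not-a-number"},
]})
```

Separating the per-record transform from the batch loop keeps the error boundary at the record level, so one malformed row never poisons the whole delivery.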

Successfully led a team of five data engineers, providing mentorship, task delegation, and code reviews to ensure high-quality deliverables across all data team projects;
Took charge of refactoring a critical legacy pipeline, significantly improving its performance and scalability. The new version gave customers more autonomy when building campaigns on the campaign creation platform, reducing the data engineering team's average turnaround on client requests by one hour;
Developed new data routines used by Oto CRM's 50+ Brazilian retail clients, enabling the processing of data for millions of customers. These routines powered optional features on the Oto CRM user interface, such as campaign priorities, preferential customer store, best customer-seller matching, best customer-store matching, random redistribution, weighted redistribution, and multiple ready-to-use campaigns;
Designed and implemented a Python command line interface to streamline the data engineering team's preparation of new Oto CRM client infrastructures. This interface significantly reduced human error and improved the onboarding experience for new clients by expediting the data provisioning process;
Strategized and executed the development of a Python API integration to ingest external data sources and enrich them with internal data from various sources;
Demonstrated strong skills in data warehousing, data quality, Git, data modeling, data ingestion, Apache Airflow, communication, problem-solving, data pipelines, NoSQL, ETL (Extract, Transform, Load), data governance, Python programming language, SQL, big data, team leadership, machine learning, and agile methodologies.
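A provisioning CLI like the one described above typically wraps subcommands and flags around the onboarding steps. A minimal sketch with `argparse` (the command and flag names here are illustrative, not Oto CRM's actual interface):

```python
# Hypothetical sketch of a client-provisioning CLI; names are illustrative.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        prog="provision-client",
        description="Prepare data infrastructure for a new client.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="Provision datasets and pipelines")
    create.add_argument("--client", required=True, help="Client slug")
    create.add_argument("--region", default="us-east-1", help="Target region")
    create.add_argument("--dry-run", action="store_true",
                        help="Print the plan without touching infrastructure")
    return parser

args = build_parser().parse_args(["create", "--client", "acme", "--dry-run"])
```

Encoding the provisioning steps behind validated flags is what removes the copy-paste errors of a manual runbook.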
Developed a customer identifier engine capable of identifying and assigning internal user identifiers to behavioral website, CRM, and omnichannel transactional data. This engine facilitated scalable and customized data integration from diverse external data sources;
Restructured the RFM Ledger, a critical data pipeline that generates historical Recency, Frequency, and Monetary Value groupings for each customer based on custom rules. By implementing the SQL template technique to parameterize the query according to Oto CRM clients' settings, the average processing speed of this pipeline increased by over 75%;
Refactored MySQL queries and built new ones in ClickHouse to align with specific business requirements. These queries were parameterized and made available as metrics for Oto CRM users, significantly reducing latency by up to 95%. The metrics included Omnichannel media attribution, CRM analytical view of store performance, CRM analytical view of product sales, analytical view of personas and clusters by store, Digital channel performance, and Ranking of sales reps by different KPIs;
Created a Python reporting automation tool that automated data extraction from analytical tools and generated ready-to-use reporting files. This automation significantly reduced the time spent on repetitive tasks within the team;
Demonstrated expertise in data warehousing, Git, data modeling, data ingestion, Apache Airflow, communication, problem-solving, data pipelines, NoSQL, ETL (Extract, Transform, Load), Python programming language, SQL, big data, machine learning, data analysis, and agile methodologies.
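The SQL-template technique behind the RFM Ledger speedup keeps one query skeleton and fills in each client's thresholds at render time. A minimal sketch using the standard library's `string.Template` (column names and settings are assumptions, not the real schema):

```python
# Minimal sketch of the SQL-template technique: one query skeleton,
# parameterized per client's custom RFM settings. Names are assumptions.
from string import Template

RFM_TEMPLATE = Template("""
SELECT customer_id,
       CASE WHEN days_since_last_order <= $recency_days
            THEN 'recent' ELSE 'lapsed' END AS recency_group,
       CASE WHEN order_count >= $frequency_min
            THEN 'frequent' ELSE 'occasional' END AS frequency_group
FROM orders_summary
WHERE client_id = '$client_id'
""")

def render_rfm_query(settings):
    """Fill the template with one client's custom RFM thresholds."""
    return RFM_TEMPLATE.substitute(settings)

query = render_rfm_query(
    {"client_id": "acme", "recency_days": 30, "frequency_min": 5}
)
```

Because every client shares the same skeleton, the engine can plan and cache one query shape while still honoring per-client rules, which is where the bulk of the processing-speed gain comes from.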
Conducted e-commerce data analysis and web analytics for Grupo Herval, leveraging external tools and internal databases to gather relevant data;
Identified opportunities for improving marketing results by reducing costs and optimizing campaigns based on data analysis;
Gathered requirements from stakeholders to understand their critical business needs for reporting purposes;
Developed interactive dashboards to provide stakeholders with insightful and actionable information for informed decision-making;
Collaborated closely with stakeholders to ensure the dashboards met their specific requirements and provided valuable insights;
Played a key role in improving marketing strategies and overall business performance by utilizing data-driven insights and analytics.

Managed marketing analytics responsibilities, specifically handling Facebook and Instagram promotions for new products;
Oversaw the reporting of customer feedback and analyzed the data to derive actionable insights for marketing strategies;
Conducted website analytics to monitor and optimize performance, ensuring effective online presence and customer engagement;
Collaborated with the IT sector to implement automation processes in various websites, including Beira Rio Conforto, Moleca, Vizzano, Molekinha, and Modare Ultraconforto;
Utilized effective communication skills to collaborate with cross-functional teams and stakeholders, ensuring alignment and understanding of marketing goals and strategies;
Demonstrated proficiency in data analysis techniques to drive informed decision-making in marketing initiatives;
Successfully leveraged marketing analytics and automation to improve campaign performance, customer satisfaction, and overall online presence.
Engineering excellence
Cleber’s overall performance in a 90-minute live technical assessment ranks in the top 25% of vetted Data Engineers at Proxify.

Issued Aug 2025
