Offer summary

Qualifications:

0–2 years of experience in software development, data engineering, or related fields.

Degree in Computer Science, Computer Engineering, Information Systems, or equivalent technical background.

Proficiency in Python for data manipulation and automation.

Understanding of HTML, CSS selectors, and web page structures.

Key responsibilities:

Design and maintain robust data collection pipelines from various sources.

Extract and structure information from unstructured or semi-structured formats.

Clean, filter, and validate raw data to ensure high quality and usability.

Collaborate with engineering and product teams to optimize data storage and access patterns.

Job description

Data Science at TRACTIAN

The Data Science team at TRACTIAN focuses on extracting valuable insights from vast amounts of industrial data. Using advanced statistical methods, algorithms, and data visualization techniques, this team transforms raw data into actionable intelligence that drives decision-making across engineering, product development, and operational strategies. The team constantly works on optimizing prediction models, identifying trends, and providing data-driven solutions that directly enhance the company’s operational efficiency and the quality of its products.

What you'll do

We’re looking for software and data engineers to join our newly established Data Gathering and Labeling (DGL) team. In this role, you'll be critical to building TRACTIAN's comprehensive and diverse datasets, from industrial equipment documentation to sensor data like vibration and temperature. Your work will directly power new features in our platform and enhance our competitive advantage through richer and more reliable data resources.

Responsibilities

Design and maintain robust data collection pipelines from a wide range of sources, including websites, documents, APIs, and raw sensor data

Extract and structure information from unstructured or semi-structured formats into clean, standardized schemas

Handle real-world data challenges like pagination, rate limits, CAPTCHAs, noise, missing values, and inconsistent formatting

Clean, filter, and validate raw data to ensure high quality, consistency, and usability across our systems

Develop small tools and utilities to support and automate data collection workflows

Support the creation and maintenance of labeling pipelines for ML applications

Collaborate with engineering and product teams to optimize data storage and access patterns

Document data sources, collection methodologies, and processing procedures for reproducibility
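
To give a feel for the pagination and rate-limit handling mentioned in the responsibilities above, here is a minimal sketch in Python. Everything named in it is a hypothetical stand-in chosen for illustration: the `fetch_page` callable, the `RateLimited` exception, and the assumed page shape (`items` plus a `next` cursor) do not describe any TRACTIAN system or real API.

```python
import time
from typing import Callable, Iterator, Optional


class RateLimited(Exception):
    """Hypothetical signal that the source asked us to slow down (e.g. HTTP 429)."""


def collect_all(
    fetch_page: Callable[[Optional[str]], dict],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every item from a cursor-paginated source, backing off on rate limits.

    Assumed page shape (hypothetical): {"items": [...], "next": cursor-or-None}.
    """
    cursor = None
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(cursor)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the last allowed retry
                # Exponential backoff: base_delay, then 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
        yield from page["items"]
        cursor = page.get("next")
        if cursor is None:
            return
```

Passing `fetch_page` and `sleep` in as arguments keeps the loop independent of any particular HTTP client and lets it be exercised without network access or real waiting.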

Requirements

0–2 years of experience in software development, data engineering, or related fields

Degree in Computer Science, Computer Engineering, Information Systems, or equivalent technical background

Understanding of HTML, CSS selectors, and how web pages are structured

Strong problem-solving skills and an eye for detail

Ability to work in a fast-paced environment and manage shifting priorities

Technical Skills

Proficiency in Python, especially for data manipulation and automation

Experience (academic or professional) with data extraction using tools like `requests`, `BeautifulSoup`, or similar

Familiarity with REST APIs and the HTTP protocol

Experience with data cleaning techniques such as:

Handling missing or inconsistent values

Removing duplicates and outliers

Standardizing formats (e.g., dates, units, text normalization)

Validating data against schemas or expected ranges

(Optional) Exposure to browser automation tools like Selenium or Playwright
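
As a rough illustration of the data cleaning techniques listed above (missing values, duplicates, format standardization, range validation), the sketch below cleans a small batch of temperature readings using only the Python standard library. The field names (`sensor_id`, `date`, `temp_c`), the accepted date formats, and the temperature range are assumptions made up for this example, not a description of real TRACTIAN data.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input formats and plausible range; adjust to the real data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")
TEMP_RANGE_C = (-50.0, 200.0)


def parse_date(raw: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_readings(rows: list[dict]) -> list[dict]:
    """Drop incomplete, out-of-range, and duplicate sensor readings."""
    seen = set()
    cleaned = []
    for row in rows:
        date = parse_date(row.get("date") or "")
        temp = row.get("temp_c")
        # Missing or unparseable values: skip the row.
        if date is None or temp is None:
            continue
        try:
            temp = float(temp)
        except (TypeError, ValueError):
            continue
        # Validate against the expected range.
        if not (TEMP_RANGE_C[0] <= temp <= TEMP_RANGE_C[1]):
            continue
        # Deduplicate on the normalized values, not the raw strings.
        key = (row.get("sensor_id"), date, temp)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"sensor_id": row.get("sensor_id"), "date": date, "temp_c": temp})
    return cleaned
```

Normalizing before deduplicating matters here: two rows that differ only in date format or in string versus numeric temperature are treated as the same reading. Whether to drop or impute incomplete rows is a judgment call that depends on the dataset; this sketch simply drops them.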

Nice to Have

Experience with web scraping libraries/frameworks like Scrapy, Playwright, or Selenium

Familiarity with proxy usage, headless browsers, or CAPTCHA bypass techniques

Understanding of database systems (SQL or NoSQL)

Exposure to rapid prototyping tools like Streamlit

Previous experience working with or around industrial equipment or maintenance systems

Machine Learning Engineer - Data Scrapping
