Python Web Crawler
Company Description
Our client is a leading Norwegian-headquartered software company specializing in managing and maintaining safety data sheets (SDS) for a vast array of industries. Their database holds over 14 million safety data sheets, which require constant updates from various manufacturers. To streamline this process, they rely on cutting-edge crawler technology to ensure that their database is always up to date with the latest information.
About The Role
Our client is seeking a highly skilled Python Web Crawler with advanced experience in building and optimizing web crawlers for complex data collection needs. If you have experience with Scrapy, that would be an advantage. The ideal candidate will create robust and efficient spiders capable of bypassing advanced website blocking mechanisms, such as IP restrictions, CAPTCHA, and JavaScript-rendered pages. Additionally, the candidate should be able to write crawlers that intelligently analyze and differentiate the content they crawl, ensuring only the required files or data are collected. You will also deploy and manage spiders using the ZYTE platform and build a basic dashboard to monitor crawler operations.
Key Responsibilities
- Design and develop advanced Python web crawlers/spiders (experience with Scrapy would be an advantage) to automate the collection of safety data sheets from websites.
- Implement strategies to bypass blocking mechanisms, including dynamic IP rotation and proxy management, CAPTCHA solving via libraries or external services, and crawling of JavaScript-rendered content using tools like Splash or Puppeteer.
- Write spiders with intelligent filtering logic to identify and download only relevant files (e.g., PDFs matching specific criteria such as file name patterns or content).
- Monitor, debug, and maintain existing spiders, ensuring they remain accurate and efficient.
- Deploy and optimize crawlers using the ZYTE platform for scalability and performance.
- Create a basic dashboard to monitor crawler operations, including metrics like success rate, error logs, and crawl status.
- Collaborate with the team to improve crawler strategies and tackle complex web scraping challenges.
Skills & Qualifications
- Minimum of 3 years of relevant work experience.
- Bachelor's degree in Software Engineering or a related field, or equivalent relevant work experience.
- Extensive experience developing advanced crawlers in Python and writing efficient, modular code.
- Experience with the Scrapy framework is highly desirable.
- Expertise in handling web scraping challenges, such as IP rotation, CAPTCHA solving, and rendering JavaScript-based content.
- Proven ability to write intelligent spiders capable of differentiating content based on criteria such as file types, naming conventions, or metadata.
- Experience deploying and managing spiders on the ZYTE platform.
- Solid understanding of web technologies, including HTML, CSS, JavaScript, and HTTP protocols.
- Basic experience in developing dashboards or monitoring tools using frameworks like Flask, Django, or front-end libraries.
- Experience with machine learning or rule-based algorithms to classify and filter crawled data.
- Familiarity with managing and scaling large-scale scraping operations.
Employment Structure
- Hybrid in Dhaka | Full-time
- Salary: BDT 80,000 - 120,000
- Benefits: 2 annual bonuses after confirmation as a permanent employee (probation period is 3-6 months)
- Work Week: Monday - Friday, 10 am to 6 pm BST (Bangladesh Standard Time)
Hiring Process
1. Conversation with Talvette
2. Take-home technical assignment
3. Interview with the client's management team
4. Receive an offer
5. Join their team full-time