Web Scraping and Senior Data Acquisition Engineer
Job Description:
Our client is an AI-powered research platform. They organize unstructured information in crypto, making it accessible to investors and researchers.
Responsibilities
- Work closely with the co-founding team to define priorities and develop information sourcing roadmaps
- Lead the effort to design and implement the architecture of a large-scale crawling system (100+ crawlers)
- Design, implement, and maintain various components of data acquisition infrastructure (building new crawlers, maintaining existing crawlers, data cleaners & loaders)
- Build pragmatic, scalable, and statistically rigorous solutions to large-scale web and data infrastructure problems by leveraging or developing statistical and machine learning methodologies
- Advocate technical solutions effectively to research teams, engineering teams, and business audiences
Requirements
- Bachelor's degree in a quantitative field (e.g. Computer Science, Engineering, Mathematics, Statistics, Operations Research, or another related field)
- 3+ years of experience with Python for data wrangling and cleaning
- Expertise in running, monitoring, and maintaining all aspects of a scraping pipeline end to end (building and maintaining 100+ spiders, working around bot-prevention techniques, data cleaning and pipelining); familiarity with scraping libraries and tools highly recommended (BeautifulSoup, XPath, Selenium, Puppeteer, Splash)
- Experience in extracting data from multiple disparate sources including HTML, XML, REST, GraphQL, PDF, and spreadsheets
- Experience in using techniques to protect web scrapers against bot detection, site bans, IP leaks, browser crashes, CAPTCHAs, and proxy failures
- Working knowledge of OOP, SQL, and Django ORM basics
What's next?
If you're interested in this role, click Apply To Position or drop an email to don@adstifysearch.com
Don Chan
Senior Consultant
EA Personnel Number: R1763146
EA License Number: 20C0292
Required Skills:
REST, Prevention, OOP, Django, Spreadsheets, Selenium, Sourcing, Machine Learning, Components, Statistics, Architecture, Infrastructure, XML, Mathematics, Computer Science, Python, Email, Research, HTML, SQL, Engineering, Design, Business, Science
Salary Package:
$75,000.00 - $150,000.00 (US Dollar)