Image Data Mining & Process Automation
For an Artificial Intelligence & Machine Learning solution provider in the UK
- The customer wanted 250 million images mined, using a list of scientific and generic names for 2.5 million species.
- The customer wanted APISCRAPY to collect 300 images for each of the 2.5 million species: 10 HD (high-definition) images per species from Flickr, Instagram, or Google, and the remaining 290 from Google Images.
- The customer was looking for an optimized solution for image data collection at minimum cost.
- The APISCRAPY team developed a custom web crawler that used a paid Google API to mine image data.
- The custom Google API crawler mined the required image data, with some manual intervention.
- The team validated the images manually to ensure they met the expected quality.
- Wherever fewer than 300 valid images were available for a species, the team used Python code to expand the available images to 300.
- 90% cost saving.
- Above 95% quality assurance.
- Quick turnaround: delivered 250 million images in 70 business days.
90% cost saving
250 million images mined
Quick Turnaround: Delivered in 70 days
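The case study says only that Python code was used to bring species with fewer than 300 valid images up to 300. A minimal sketch of one way to plan that top-up, assuming the shortfall is filled with augmented copies (flips, rotations, crops) of the valid images. The function `plan_top_up` and the transform names are hypothetical, and actually applying the transforms would need an imaging library such as Pillow.

```python
from itertools import cycle

# Hypothetical transform names; applying them needs an imaging library.
AUGMENTATIONS = ["hflip", "rotate_10", "rotate_-10", "crop_90", "brightness_up"]


def plan_top_up(images: list[str], target: int = 300) -> list[tuple[str, str]]:
    """Return (source_image, transform) pairs that bring `images` up to `target`.

    Valid originals are kept as-is; every extra slot reuses a source image with
    an augmentation, walking the sources round-robin so copies spread evenly.
    """
    if not images:
        raise ValueError("need at least one valid image to augment")
    plan = [(img, "original") for img in images[:target]]
    sources = cycle(images)
    slot = 0
    while len(plan) < target:
        src = next(sources)
        # Each full pass over the sources switches to the next augmentation.
        aug = AUGMENTATIONS[(slot // len(images)) % len(AUGMENTATIONS)]
        plan.append((src, aug))
        slot += 1
    return plan


if __name__ == "__main__":
    print(plan_top_up(["img_001.jpg", "img_002.jpg"], target=5))
```

With very few valid images, (image, transform) pairs start repeating once the transform list is exhausted; a real pipeline would more likely randomize transform parameters per copy.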
MRO Product Data Scraping & Process Automation
For an e-commerce company in the US
- The customer wanted APISCRAPY to mine data on 1.3 million products across 35 categories, which they planned to add to their e-commerce website.
- Each product category spanned 400 pages of data, and more than 30% of the listed products were duplicates.
- The target website had 1,000+ columns, and the customer wanted the images downloaded and zipped into a single file, with the image links included in the output .csv file.
- The customer was looking for a cost-effective, optimized solution for data collection and cleansing.
- The APISCRAPY team developed a custom web crawler.
- The team analyzed the website structure and set multiple rules to ensure all desired data points were captured.
- De-duplication and file compilation were automated using Python code.
- The custom crawler and automation processed the data with minimal manual intervention.
- Above 99% quality assurance.
- Approximately 75% cost saving.
- Quick turnaround: 25 business days.
75% cost saving
1.2 million records processed
Quick Turnaround: Delivered in 25 days
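The case study does not say which fields identified a duplicate product. Assuming a product counts as a duplicate when its brand and manufacturer part number match after normalization, a minimal standard-library sketch of the de-duplication step might look like this. The column names `brand` and `mpn` and all function names are hypothetical.

```python
import csv
import io


def normalize(value: str) -> str:
    """Case-fold and collapse whitespace so trivially different spellings match."""
    return " ".join(value.split()).casefold()


def dedupe_rows(rows, key_fields=("brand", "mpn")):
    """Yield each row whose normalized key was not seen before (first one wins)."""
    seen = set()
    for row in rows:
        # Missing fields come back as None from DictReader; treat them as empty.
        key = tuple(normalize(row.get(field) or "") for field in key_fields)
        if key in seen:
            continue
        seen.add(key)
        yield row


def dedupe_csv(src, dst, key_fields=("brand", "mpn")) -> int:
    """Copy CSV text from file-like `src` to `dst` without duplicates; return rows kept."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in dedupe_rows(reader, key_fields):
        writer.writerow(row)
        kept += 1
    return kept


if __name__ == "__main__":
    raw = "brand,mpn,title\nAcme,X-100,Valve\nACME , x-100,Valve (listed twice)\nAcme,X-200,Pump\n"
    out = io.StringIO()
    print(dedupe_csv(io.StringIO(raw), out))
    print(out.getvalue())
```

Rows are streamed rather than loaded at once, so only the set of seen keys grows with the file, which matters for wide files with 1,000+ columns.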
Retail Product Data Processing
For a management consulting company in the US
- The client wanted ASIN and EAN data for monitors, televisions, and audio-video players.
- Products had to be classified into the correct category and assigned an ASIN to avoid duplicate listings on Amazon.
- Product descriptions, specifications, and images were to be sourced from the manufacturer's site.
- The team established a process for mining ASINs, EANs, and product specifications.
- Experienced staff ran multi-level quality checks to avoid product duplication.
- Approximately 50% cost saving.
- 40% increase in sales conversion.
- Correct classification, descriptions, and specification data, along with images.
- Increased online store traffic.
- Increased customer satisfaction.
50% cost saving
10K records processed daily
40% increase in sales conversion
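The case study does not describe how EAN data was checked during the duplication-avoidance step. One common safeguard, shown here purely as an assumption, is to validate each EAN-13 check digit (the standard GS1 weighting of 1 and 3) and group products that share a valid EAN so likely duplicates reach a human reviewer. The function names are hypothetical.

```python
def ean13_is_valid(code: str) -> bool:
    """Check length, ASCII digits, and the GS1 check digit of an EAN-13 code."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits, left to right.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def find_duplicate_eans(products):
    """Map each valid EAN listed more than once to the product IDs that share it.

    `products` is an iterable of (product_id, ean) pairs; invalid EANs are
    skipped here and would need separate review.
    """
    by_ean: dict[str, list[str]] = {}
    for product_id, ean in products:
        if ean13_is_valid(ean):
            by_ean.setdefault(ean, []).append(product_id)
    return {ean: ids for ean, ids in by_ean.items() if len(ids) > 1}


if __name__ == "__main__":
    listings = [
        ("tv-001", "5901234123457"),
        ("tv-002", "5901234123457"),
        ("monitor-001", "4006381333931"),
        ("typo-001", "5901234123458"),
    ]
    print(find_duplicate_eans(listings))  # {'5901234123457': ['tv-001', 'tv-002']}
```

A failed check digit usually points to a transcription error, so invalid codes are worth routing to the same manual quality check rather than silently dropping.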
Medical Equipment Product Data Mining
For a medical product e-commerce marketplace in the US
- The customer required a database of 1.2 million laboratory equipment products and shared the website to be mined.
- The target website lacked a uniform structure and was not professionally built.
- Each product used a different format, across 1.2 million rows and 900+ columns.
- The customer wanted all product images downloaded and zipped into a single file, with each image's link included in the output .csv file.
- The customer was looking for an optimized solution for data collection.
- The APISCRAPY team developed a custom web crawler to mine the data.
- Because the website was unstructured, the team set multiple business and logic rules to ensure all desired data was captured.
- Instead of dumping all the data, the custom crawler mined only the desired data points into the template designed by the customer, which substantially reduced manual intervention.
- Customer cost saving of around 80%.
- Above 99% quality assurance.
- Quick turnaround: delivered in 3 weeks.
80% cost saving
1.2 million equipment records processed
Quick Turnaround: Delivered in 3 weeks
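The deliverable described above, one zip file of images plus a .csv that keeps each image's link, can be sketched with Python's standard `zipfile` and `csv` modules. This is a minimal illustration, not the team's actual code: it assumes the crawler has already downloaded the images to a local folder, and the column names in `FIELDS` and the function name are hypothetical. The full 900+ column template is out of scope here.

```python
import csv
import tempfile
import zipfile
from pathlib import Path

FIELDS = ["sku", "image_url", "image_file"]  # hypothetical output columns


def package_images(rows, image_dir, zip_path, csv_path) -> None:
    """Zip every downloaded image into one archive and write a CSV of image links.

    `rows` must be a list (it is read twice) of dicts with the hypothetical keys
    in FIELDS; "image_file" is the local filename the download was saved under.
    """
    image_dir = Path(image_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for row in rows:
            # arcname keeps the archive flat: just the filename, no local folders.
            zf.write(image_dir / row["image_file"], arcname=row["image_file"])
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops any other scraped columns not in FIELDS.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        (tmp_path / "lab-001.jpg").write_bytes(b"placeholder image bytes")
        rows = [{"sku": "LAB-001", "image_url": "https://example.com/lab-001.jpg",
                 "image_file": "lab-001.jpg"}]
        package_images(rows, tmp_path, tmp_path / "images.zip", tmp_path / "products.csv")
        with zipfile.ZipFile(tmp_path / "images.zip") as zf:
            print(zf.namelist())
```

`ZIP_DEFLATED` is used for generality; already-compressed formats such as JPEG gain little from it, so `ZIP_STORED` would be a reasonable, faster alternative.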
US Election Debate Winner Predictive Analytics Engine Using Twitter Data
For a US-based consulting firm
- The customer wanted a real-time dashboard to track public reaction and predict the winner of a US presidential election debate.
- Twitter data had to be collected in real time to predict the winner for the day.
- The customer wanted Twitter data collected from specified geographies.
- The team used the Twitter API to collect real-time streaming data matching a set of tags.
- Each tweet was analyzed for relevance.
- The team developed a curated text analysis that classified reactions to each presidential candidate as favorable, unfavorable, or neutral.
- A custom web-based portal hosted the Twitter dashboards and reports.
- High accuracy in predicting the winning candidate: ± 25 compared with leading pollsters.
- Quick turnaround of the dashboard.
- 60% cost saving.
60% cost saving
Quick Turnaround
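The case study does not describe how reactions were classified. As a hedged illustration only, a simple lexicon-based scorer could label each relevant tweet as favorable, unfavorable, or neutral toward each candidate it mentions, then tally the day's results. The word lists, the placeholder candidate aliases (`smith`, `jones`), and the function names are all hypothetical, and collecting and geo-filtering tweets from the Twitter API is out of scope here.

```python
import re
from collections import Counter

# Hypothetical lexicons and aliases for illustration only; a production system
# would use a much larger curated lexicon or a trained classifier.
POSITIVE = {"strong", "great", "convincing", "best", "winning"}
NEGATIVE = {"weak", "bad", "dodged", "worst", "lost", "losing"}
CANDIDATES = {"candidate_a": {"smith"}, "candidate_b": {"jones"}}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(tweet: str) -> dict[str, str]:
    """Label a tweet favorable/unfavorable/neutral for each candidate it mentions."""
    tokens = tokenize(tweet)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    label = "favorable" if score > 0 else "unfavorable" if score < 0 else "neutral"
    mentioned = set(tokens)
    return {name: label for name, aliases in CANDIDATES.items() if aliases & mentioned}


def daily_tally(tweets):
    """Count labels per candidate and return (tallies, leader by net favorable)."""
    tally = {name: Counter() for name in CANDIDATES}
    for tweet in tweets:
        for name, label in classify(tweet).items():
            tally[name][label] += 1
    net = {name: c["favorable"] - c["unfavorable"] for name, c in tally.items()}
    return tally, max(net, key=net.get)


if __name__ == "__main__":
    tweets = ["Smith was strong tonight", "Jones dodged the question", "Jones was great"]
    print(daily_tally(tweets))
```

Because the whole tweet receives one score, a tweet that praises one candidate and criticizes the other gets the same label for both; separating per-candidate stance would need finer-grained analysis.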