Case Studies

Image Data Mining & Process Automation

For an Artificial Intelligence & Machine Learning solution provider in the UK

Client Situation

  • The customer wanted 250 million images mined, based on a list of 2.5 million scientific and generic species names.
  • The customer wanted APISCRAPY to collect 300 images for each of the 2.5 million species: 10 HD (high-definition) images per species from Flickr, Instagram, or Google, and the remaining 290 from Google Images.
  • The customer was looking for an optimized image data collection solution at minimum cost.

APISCRAPY Solution

  • The APISCRAPY team developed a custom web crawler that used the paid Google API to mine image data.
  • The custom Google API crawler was able to mine the required image data with manual intervention.
  • The team performed manual validation and ensured that all images were of the expected quality.
  • Wherever fewer than 300 valid images were available for a species, the team expanded the available set to 300 images using Python code.
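
The case study does not say how the available images were expanded to 300, so the following is only a minimal sketch under an assumed approach: repeating the valid images in order until the target count is reached. The helper name `pad_to_target` and the numbered output file names are hypothetical; a real pipeline might apply flips or crops instead of plain repetition.

```python
from itertools import cycle, islice
from pathlib import Path


def pad_to_target(image_paths, target=300):
    """Plan (source, output_name) pairs so that exactly `target` entries exist.

    Hypothetical helper: cycles through the valid images in order until the
    target count is reached, and truncates if there are more than `target`.
    """
    if not image_paths:
        raise ValueError("need at least one valid image to pad from")
    plan = []
    for index, source in enumerate(islice(cycle(image_paths), target), start=1):
        # Keep the source file extension and number the outputs 001, 002, ...
        plan.append((source, f"{index:03d}{Path(source).suffix}"))
    return plan


# Three valid images are expanded into a 300-entry copy plan.
plan = pad_to_target(["a.jpg", "b.jpg", "c.png"], target=300)
```

Each pair in `plan` can then be handed to a copy or transform step, which keeps the counting logic separate from the file handling.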

Client Benefits

  • 90% cost saving.
  • Above 95% quality assurance.
  • Quick turnaround: delivered 250 million images in 70 business days.

MRO Product Data Scraping & Process Automation

For an e-commerce company in the US

Client Situation

  • The customer wanted APISCRAPY to mine a database of 1.3 million products in 35 categories, which they planned to add to their e-commerce website.
  • Each product category had 400 pages of data, and more than 30% of the listed products were duplicates.
  • The target website had 1,000+ data columns, and the customer wanted the images downloaded and zipped into a single file, with the image links included in the output .csv file.
  • The customer was looking for a cost-effective, optimized solution for data collection and cleansing.

APISCRAPY Solution

  • The APISCRAPY team developed a custom web crawler.
  • The team analyzed the website structure and set multiple rules to ensure that all desired data points were captured.
  • The team automated the de-duplication and file compilation process using Python code.
  • The custom crawler and automation processed the data with minimal manual intervention.
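
As an illustration of the automated de-duplication and file compilation step, here is a minimal sketch using only the Python standard library. The field names (`sku`, `name`, `category`), the sample rows, and the choice of matching on a normalized key are assumptions made for the example; the case study does not describe the actual matching rules.

```python
import csv
import io


def normalize(value):
    """Lowercase and collapse whitespace so trivially different rows match."""
    return " ".join(value.lower().split())


def deduplicate(rows, key_fields):
    """Keep the first row for each normalized key and drop later repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(normalize(row.get(field, "")) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def to_csv(rows, fieldnames):
    """Compile the cleaned rows into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample rows: the second is a duplicate of the first.
rows = [
    {"sku": "AB-100", "name": "Hex Bolt  M8", "category": "Fasteners"},
    {"sku": "ab-100", "name": "hex bolt m8", "category": "Fasteners"},
    {"sku": "CD-200", "name": "Ball Bearing", "category": "Bearings"},
]
unique_rows = deduplicate(rows, key_fields=("sku", "name"))
csv_text = to_csv(unique_rows, fieldnames=["sku", "name", "category"])
```

Keeping the first occurrence preserves the crawl order, which makes it easier to trace a surviving row back to the page it came from.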

Client Benefits

  • Above 99% quality assurance.
  • Approximately 75% cost saving.
  • Quick turnaround: 25 business days.
  • 1.2 million records processed.

Retail Product Data Processing

For a management consulting company in the US

Client Situation

  • The client wanted ASIN and EAN data for monitors, televisions, and audio-video players.
  • The client wanted products classified into the right category and assigned ASINs to avoid duplication on Amazon.
  • The client wanted product descriptions, specifications, and images sourced from the manufacturer sites.

APISCRAPY Solution

  • Established a process for mining ASIN, EAN, and product specification data.
  • Avoided product duplication through multi-level quality checks by experienced staff.
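
The quality checks described above were performed by people. As a hypothetical automated first pass that could precede such a review, the sketch below validates EAN-13 check digits and flags EANs that appear under more than one ASIN. The record layout and the ASIN values are invented for illustration; only the EAN-13 check digit rule (weights 1 and 3 alternating over the first 12 digits) is standard.

```python
from collections import defaultdict


def is_valid_ean13(code):
    """Return True if `code` is 13 ASCII digits with a correct check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def find_duplicate_eans(records):
    """Map each EAN that occurs more than once to the ASINs that share it."""
    by_ean = defaultdict(list)
    for record in records:
        by_ean[record["ean"]].append(record["asin"])
    return {ean: asins for ean, asins in by_ean.items() if len(asins) > 1}


# Hypothetical records: two listings share one EAN, one EAN is malformed.
records = [
    {"asin": "B000000001", "ean": "4006381333931"},
    {"asin": "B000000002", "ean": "4006381333931"},
    {"asin": "B000000003", "ean": "4006381333932"},
]
invalid = [r["asin"] for r in records if not is_valid_ean13(r["ean"])]
duplicates = find_duplicate_eans(records)
```

Records flagged in `invalid` or `duplicates` would then go to the manual review stage rather than being dropped automatically.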

Client Benefits

  • Approximately 50% cost saving.
  • 40% increase in sales conversion.
  • Correct classification, description, and specification data, along with images.
  • Increased online store traffic.
  • Increased customer satisfaction.
  • 10K records processed daily.

Medical Equipment Product Data Mining

For a medical product e-commerce marketplace in the US

Client Situation

  • The customer required a database of 1.2 million laboratory equipment products and shared the website to be mined.
  • The target website did not have a uniform structure and was not professionally built.
  • Every product had a different format, and the data spanned 1.2 million rows and 900+ columns.
  • The customer wanted all product-related images downloaded and zipped into a single file, with the image links included in the output .csv file.
  • The customer was looking for an optimized data collection solution.

APISCRAPY Solution

  • The APISCRAPY team developed a custom web crawler to mine the data.
  • Because the website was unstructured, the team had to set multiple business and logical rules to ensure that all desired data was captured.
  • The custom crawler mined only the desired data points, rather than dumping all the data, into the template designed by the customer. This reduced manual intervention substantially.
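
The requested deliverable (all images in a single zip file, plus a .csv file carrying the image links) can be sketched as follows. This is an assumption-laden illustration, not the team's implementation: the product layout, the `build_outputs` helper, the column names, and the example URLs are hypothetical, and the images are assumed to be downloaded already and held as bytes.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath
from urllib.parse import urlparse


def build_outputs(products):
    """Return (zip_bytes, csv_text) for products with downloaded images."""
    zip_buffer = io.BytesIO()
    csv_buffer = io.StringIO()
    writer = csv.writer(csv_buffer, lineterminator="\n")
    writer.writerow(["product_id", "image_url", "image_file"])
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for product in products:
            for position, (url, data) in enumerate(product["images"], start=1):
                # Reuse the extension from the URL path, or fall back to .bin.
                suffix = PurePosixPath(urlparse(url).path).suffix or ".bin"
                file_name = f"{product['id']}_{position}{suffix}"
                archive.writestr(file_name, data)
                writer.writerow([product["id"], url, file_name])
    return zip_buffer.getvalue(), csv_buffer.getvalue()


# Hypothetical products with placeholder image bytes.
products = [
    {"id": "P1", "images": [("https://example.com/img/p1-front.jpg", b"jpeg-1"),
                            ("https://example.com/img/p1-side.png", b"png-1")]},
    {"id": "P2", "images": [("https://example.com/img/p2", b"raw-2")]},
]
zip_bytes, csv_text = build_outputs(products)
```

Writing the archive name into the .csv row alongside the original link lets the customer match each zipped file back to its source URL.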

Client Benefits

  • Around 80% cost saving.
  • Above 99% quality assurance.
  • Quick turnaround: delivered in 3 weeks.

US Election Debate Winner Predictive Analytics Engine Using Twitter Data

For a US-based consulting firm

Client Situation

  • The customer wanted a real-time dashboard to understand public reaction and predict the winner of the US presidential election debate.
  • The customer wanted Twitter data collected in real time to predict the winner for the day.
  • The customer wanted Twitter data collected from specified geographies.

APISCRAPY Solution

  • Used the Twitter API to collect real-time streaming data using a collection of tags.
  • Analyzed each tweet for relevance.
  • Developed curated text analysis to classify reactions to each presidential candidate as favorable, unfavorable, or neutral.
  • Built a custom web-based portal for Twitter dashboards and reports.
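
The relevance filtering and favorable/unfavorable/neutral classification described above could look roughly like the sketch below. It is a simplified stand-in based on assumptions: the word lists, the candidate tags, and the net-score rule for picking a winner are all hypothetical, and the tweets are assumed to be collected already, so no Twitter API calls are shown.

```python
import re
from collections import Counter

# Hypothetical lexicons and tags, for illustration only.
FAVORABLE = {"won", "strong", "great", "impressive"}
UNFAVORABLE = {"lost", "weak", "terrible", "dodged"}
CANDIDATE_TAGS = {"candidate_a": {"#candidatea"}, "candidate_b": {"#candidateb"}}


def classify_tweet(text):
    """Return (mentioned candidates, reaction label) for one tweet."""
    tokens = set(re.findall(r"#?\w+", text.lower()))
    positive = len(tokens & FAVORABLE)
    negative = len(tokens & UNFAVORABLE)
    if positive > negative:
        label = "favorable"
    elif negative > positive:
        label = "unfavorable"
    else:
        label = "neutral"
    mentioned = [name for name, tags in CANDIDATE_TAGS.items() if tokens & tags]
    return mentioned, label


def tally(tweets):
    """Count reactions per candidate; tweets naming no candidate are skipped."""
    counts = {name: Counter() for name in CANDIDATE_TAGS}
    for text in tweets:
        mentioned, label = classify_tweet(text)
        for name in mentioned:
            counts[name][label] += 1
    return counts


def predict_winner(counts):
    """Pick the candidate with the highest favorable minus unfavorable count."""
    return max(counts, key=lambda n: counts[n]["favorable"] - counts[n]["unfavorable"])


tweets = [
    "#CandidateA was strong tonight",
    "#CandidateB dodged every question, weak",
    "Watching the debate with #CandidateA and #CandidateB",
    "#CandidateB had a great answer on taxes",
    "What is for dinner?",
]
counts = tally(tweets)
winner = predict_winner(counts)
```

A keyword lexicon like this is easy to audit but misses sarcasm and negation, which is one reason a curated analysis step is needed on top of it.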

Client Benefits

  • High accuracy in predicting the winning candidate, ± 25 compared to leading pollsters.
  • Quick turnaround of the dashboard.
  • 60% cost saving.