Exploring the Diffbot Automatic API

Diffbot is a machine learning-powered web scraping service that lets developers extract structured data from web pages at scale. The Diffbot Automatic API is a RESTful API for integrating that extraction technology into your own applications. In this blog post, we'll walk through the documentation for the Diffbot Automatic API and provide some example code in JavaScript.

Getting Started

Before you can start using the Diffbot Automatic API, you'll need to sign up for a developer account and obtain an API key. Once you have an API key, you can start making requests to the API endpoints.
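One way to keep the API key out of your source code is to read it from an environment variable. The sketch below assumes a variable named DIFFBOT_TOKEN and a helper named getDiffbotToken; both names are hypothetical and not part of the Diffbot documentation.

```javascript
// Sketch: load the API key from the environment instead of hard-coding it.
// DIFFBOT_TOKEN is a hypothetical variable name chosen for this example.
function getDiffbotToken(env = process.env) {
  const token = env.DIFFBOT_TOKEN;
  if (!token) {
    throw new Error('Set the DIFFBOT_TOKEN environment variable to your API key');
  }
  return token;
}

// Usage (after running: export DIFFBOT_TOKEN=your-api-key-here):
// const apiKey = getDiffbotToken();
```

Passing the environment in as a parameter keeps the helper easy to test without touching the real process environment.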

API Endpoints

The Diffbot Automatic API documentation lists several endpoints for different types of scraping tasks. Here are some of the most common endpoints:

  • Article API: extracts content from articles, blogs, news sites, and other text-heavy pages.
  • Product API: extracts product data, such as names, descriptions, prices, and images, from online stores.
  • Image API: extracts image data and metadata from web pages.
  • Analyze API: identifies the type of a page (article, product, image, and so on) and runs the matching extraction API on it automatically.
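Since the endpoints above differ mainly in their path, a small helper can build the request URL for any of them. This is a minimal sketch, assuming each endpoint lives under https://api.diffbot.com/v3/ and accepts the API key and page URL as the query parameters token and url; buildDiffbotUrl is a hypothetical helper name, so check the paths against the official documentation before relying on them.

```javascript
// Sketch: build a request URL for one of the Automatic API endpoints.
// The endpoint list and base URL are assumptions based on the v3 API.
const DIFFBOT_BASE = 'https://api.diffbot.com/v3';
const ENDPOINTS = ['article', 'product', 'image', 'analyze'];

function buildDiffbotUrl(api, token, pageUrl) {
  if (!ENDPOINTS.includes(api)) {
    throw new Error(`Unknown API: ${api}`);
  }
  // URLSearchParams percent-encodes the page URL for us.
  const query = new URLSearchParams({ token, url: pageUrl });
  return `${DIFFBOT_BASE}/${api}?${query}`;
}

console.log(buildDiffbotUrl('product', 'your-api-key-here', 'https://www.example.com/item?id=1'));
// prints "https://api.diffbot.com/v3/product?token=your-api-key-here&url=https%3A%2F%2Fwww.example.com%2Fitem%3Fid%3D1"
```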

Request Format

To send a request to the API, you pass your API key (token) and the URL of the web page you want to process (url) as query string parameters on a GET request to the endpoint. The page URL must be URL-encoded. The response comes back as a JSON object.

Here's an example of a request to the Article API. The title, author, and text of the page are among the fields returned for each extracted article:

// node-fetch v2 (CommonJS). On Node.js 18+, you can drop this line and use the built-in fetch.
const fetch = require('node-fetch');

const apiKey = 'your-api-key-here';
const apiUrl = 'https://api.diffbot.com/v3/article';

const url = 'https://www.example.com/article';

// token and url are sent as query string parameters; URLSearchParams URL-encodes the values.
const params = new URLSearchParams({
  token: apiKey,
  url: url
});

fetch(`${apiUrl}?${params}`)
  .then(response => {
    // Surface HTTP-level failures instead of trying to parse an error page.
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}`);
    }
    return response.json();
  })
  .then(data => console.log(data))
  .catch(error => console.error(error));

In this example, we're using the node-fetch library to send a GET request to the API endpoint with the token and page URL in the query string. We check the HTTP status, parse the JSON response, and log it to the console. The extracted article appears in the response's objects array.
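To pull just the title, author, and text out of a response, a small helper can be sketched as follows. It assumes the v3 response shape, in which extracted items arrive in an objects array and failures carry an error string. The helper name pickArticleFields and the sample response are hypothetical and made up for illustration.

```javascript
// Sketch: reduce an Article API response to the three fields we care about.
// Assumes the response looks like { objects: [{ type, title, author, text, ... }] }
// on success and { error: '...' } on failure.
function pickArticleFields(data) {
  if (data.error) {
    throw new Error(`Diffbot error: ${data.error}`);
  }
  const article = (data.objects || []).find(obj => obj.type === 'article');
  if (!article) {
    return null; // nothing was extracted from the page
  }
  // Missing fields fall back to null so callers get a stable shape.
  const { title = null, author = null, text = null } = article;
  return { title, author, text };
}

// Hypothetical sample response, trimmed for illustration.
const sample = {
  objects: [
    {
      type: 'article',
      title: 'Hello',
      author: 'Jane Doe',
      text: 'Body text',
      pageUrl: 'https://www.example.com/article'
    }
  ]
};

console.log(pickArticleFields(sample));
```

In the request code, this would replace the plain console.log(data) step, for example .then(data => console.log(pickArticleFields(data))).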

Conclusion

The Diffbot Automatic API provides a powerful and simple way to extract structured data and insights from web pages. Whether you're building a web scraping tool or an intelligent web application, the Diffbot Automatic API is a valuable resource to have in your toolkit.
