How to Use the Shodan API at Scale

This is a quick post mostly for refreshing my memory in the future. I recently wanted to download the data Shodan had on a large corporate IP space with disparate ranges and several hundred thousand IP addresses for post processing.

As far as I can tell the Shodan help docs are scattered across too many pages and domains and subdomains. There are a few guides out there on the basics of Shodan CLI and API but I didn’t see anything that documented things at a slightly larger scale so here are a few quick notes on gathering this data. Shodan needs no introduction, and the basics are well covered so I’ll dive in.

API Plans and Credits and Scanning Credits and Download Credits and Oh My

API plans? Credits? What do you need to know to make sure you have the right one? In my case I just wanted to download the data Shodan had and, for the moment, didn’t need any scanning or real-time data feed features. The Freelancer plan would work in this case, but fortunately for me, my employer helps fund Shodan with a corporate license. This also gives me access to bulk lookups, which speeds up time to results.

Pricing

| Data | Freelancer | Small Business | Corporate | Enterprise |
| --- | --- | --- | --- | --- |
| Price | $59/month | $299/month | $899/month | $You don’t want to ask |
| Results/month (query credits) | 1MM (100 credits) | 20MM (2,000 credits) | Unlimited (100,000 credits) | Unlimited |
| IP scans/month (scan credits) | 5,000 | 65,000 | 300,000 | Unlimited |
| Results Downloads (export credits) | 200,000 (20 credits) | Unsure | 1MM (100 credits) | Bulk Data Feed (all of it) |
| Filters | Most (undefined) | Most (undefined) | All | All |
| Vulnerability Search Filter | | Yes | Yes | Yes |
| Bulk IP Lookups | | | Yes | Yes |
| Tag Search Filter | | | Yes | Yes |

If you’re not going past the first page and not using filters, no credits are being used and no account is required.

Credit Types

Obtaining data costs you “credits”. Depending on how you access the data, you use different credit types. The following info comes from the Shodan Credits Explained page, which mostly just confuses me.

| Credit Type | Purpose | When they’re used |
| --- | --- | --- |
| Query | Searching via the API with filters; searching via the website past the first page; 100 results per query credit; renewed monthly | Used by default with the website and API |
| Scan | Request network scans; 1 IP per scan credit; renewed monthly | When you want results faster than Shodan’s monthly internet scan (On Demand Scanning). Used with the scan() API call or the scan submit CLI command. |
| Export | Download search results from the website; 10,000 results per credit; single use, pricing varies from $2.50 to $5 per credit | If you need to download from the web. Note that every download request uses a credit, even if your search has only 100 results. The CSV format loses 90% of the data, and you can’t change the format once selected without using another credit, so download the JSON. |
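
To make the ratios above concrete, here is a small back-of-the-envelope sketch for estimating how many credits a job might cost. The function names are mine, and the rounding rule for export credits (whole credits, minimum of one per download) is an assumption based on the note that even a 100-result download uses a credit.

```python
import math

# Ratios taken from the credit table above.
RESULTS_PER_QUERY_CREDIT = 100
RESULTS_PER_EXPORT_CREDIT = 10_000


def query_credits_needed(results: int) -> int:
    """Estimate query credits for paging through `results` filtered results."""
    return math.ceil(results / RESULTS_PER_QUERY_CREDIT)


def export_credits_needed(results: int) -> int:
    """Estimate export credits for a single website download.

    Assumption: downloads are billed in whole credits with a minimum of one.
    """
    return max(1, math.ceil(results / RESULTS_PER_EXPORT_CREDIT))


print(query_credits_needed(250_000))
print(export_credits_needed(100))
```

Treat the numbers as an estimate for planning, not as billing truth; the dashboard is the authority on what was actually consumed.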

You can check your current usage at https://developer.shodan.io/dashboard
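
If you would rather check credits from a script before kicking off a big job, something like the sketch below may help. It assumes you already have the account-info dictionary (the official Python client exposes this through its `info()` method, and the CLI through `shodan info`); the helper names and the sample values are hypothetical.

```python
def summarize_credits(info: dict) -> str:
    """Format the remaining credits from an account-info dictionary."""
    plan = info.get("plan", "unknown")
    query = info.get("query_credits", "n/a")
    scan = info.get("scan_credits", "n/a")
    return f"plan={plan} query_credits={query} scan_credits={scan}"


def has_enough_credits(info: dict, query_needed: int = 0, scan_needed: int = 0) -> bool:
    """Return True when the account can cover the planned work."""
    return (
        info.get("query_credits", 0) >= query_needed
        and info.get("scan_credits", 0) >= scan_needed
    )


# Stand-in for the dictionary returned by the API; these values are made up.
sample = {"plan": "example", "query_credits": 5000, "scan_credits": 200}
print(summarize_credits(sample))
print(has_enough_credits(sample, query_needed=2500, scan_needed=500))
```

With the real client this would be called as `summarize_credits(api.info())`, where `api` is the `shodan.Shodan` object.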

CLI Usage

This is straightforward and the docs are at https://cli.shodan.io/. The CLI can also be used to search and parse data.

$ shodan search -h
shodan search [OPTIONS] <search query>
$ shodan search --fields ip_str,port,org,hostnames microsoft iis 6.0

$ shodan download -h
shodan download [OPTIONS] <filename> <search query>
$ shodan download --limit 100 file_name filter:query

$ shodan parse -h
shodan parse [OPTIONS] <filenames>
$ shodan parse --fields ip_str,port,org --separator , microsoft-data.json.gz

$ shodan convert -h
shodan convert [OPTIONS] <input file> <output format>

$ shodan convert file_name.json.gz csv

Keep in mind, this is all in Python so the CLI tools can be trivially modified - find where your shodan.py files are installed and modify as you please. For example, shodan scan list only returns the last 10 results, but if you go look at the source, you’ll see you can quickly make changes to the REST calls (at ~/.local/lib/python3.6/site-packages/shodan/cli/scan.py on my system).

Downloading data with the API

There’s a pretty basic API example in the Shodan API Guide to get you started. In my case I had thousands of IPs and ranges to look at, and the api.host() bulk lookup function was useful because it can take an array of up to 100 IPs per request. Note that this feature requires a Corporate API plan.

import shodan
api = shodan.Shodan('YOUR CORPORATE API KEY')

hosts = api.host([
    '8.8.8.8',
    '8.8.4.4',
])

for info in hosts:
    print(info['ip_str'])

Once the data has been downloaded, you can use the CLI to parse it, but additional processing with Python can be useful. There are a few built-in helper functions outlined on the Working with Shodan Data Files page. I ended up just writing my own, dumping the JSON in a format that still allowed the CLI to parse it.

The following scripts should help get you started with downloading data from Shodan, starting scans, and parsing CVEs out of the downloaded data files.

Download Shodan Data from a list of CIDRs or IPs
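
A minimal sketch of this step, not a definitive implementation. It assumes `api` behaves like a `shodan.Shodan` object whose `host()` accepts a list of IPs (the bulk lookup described earlier), expands CIDRs with the standard library, and writes one banner per line to a gzip file, which is the layout I assume `shodan parse` expects. The function names and the one-second pause between requests are my own choices.

```python
import gzip
import ipaddress
import json
import time
from itertools import islice


def expand_targets(lines):
    """Yield individual IP strings from lines holding IPs or CIDR ranges."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # strict=False tolerates ranges written with host bits set.
        for ip in ipaddress.ip_network(line, strict=False):
            yield str(ip)


def batched(items, size):
    """Yield lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def download_hosts(api, targets, out_path, batch_size=100, pause=1.0):
    """Bulk-look-up `targets` and write one banner per line to a gzip file.

    Returns the number of banners written.
    """
    written = 0
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for batch in batched(expand_targets(targets), batch_size):
            hosts = api.host(batch)
            if isinstance(hosts, dict):  # defensive: accept a bare record too
                hosts = [hosts]
            for host in hosts:
                for banner in host.get("data", []):
                    out.write(json.dumps(banner) + "\n")
                    written += 1
            time.sleep(pause)  # assumed pacing between API requests
    return written
```

Usage would look roughly like `download_hosts(shodan.Shodan(key), open("targets.txt"), "corp.json.gz")`, where `targets.txt` is a hypothetical file with one IP or CIDR per line. A real run would likely also want to catch `shodan.APIError` for batches Shodan has no data on.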

Scan IPs from a file
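
A rough sketch of this step under a few assumptions: `api` behaves like a `shodan.Shodan` object whose `scan()` accepts a list of IPs or netblocks and returns a dictionary containing the scan `id`. The helper names and the chunk size of 1000 targets per request are hypothetical; remember each IP costs a scan credit.

```python
def read_targets(path):
    """Return non-empty, non-comment lines (IPs or CIDR ranges) from a file."""
    with open(path, encoding="utf-8") as fh:
        lines = (line.strip() for line in fh)
        return [line for line in lines if line and not line.startswith("#")]


def submit_scans(api, targets, chunk_size=1000):
    """Submit targets in chunks and return the list of scan IDs."""
    scan_ids = []
    for start in range(0, len(targets), chunk_size):
        response = api.scan(targets[start:start + chunk_size])
        scan_ids.append(response["id"])
    return scan_ids
```

Keep the returned IDs around; you need them to check on scan status later.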

Parse out CVEs and save to CSV
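
A minimal sketch of this step. It assumes the data file is gzip-compressed with one JSON banner per line, and that vulnerabilities sit in a `vulns` property keyed by CVE ID with an optional `cvss` score; check the banner specification before relying on either. The function names and CSV columns are my own.

```python
import csv
import gzip
import json


def iter_banners(path):
    """Yield one banner dict per non-empty line of a .json.gz data file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def cve_rows(banners):
    """Yield (ip, port, cve, cvss) for every vulnerability on every banner."""
    for banner in banners:
        vulns = banner.get("vulns") or {}
        for cve in sorted(vulns):
            # Assumption: vulns is a dict of CVE -> details; tolerate a list.
            details = vulns[cve] if isinstance(vulns, dict) else {}
            yield banner.get("ip_str"), banner.get("port"), cve, details.get("cvss")


def write_cve_csv(in_path, out_path):
    """Write the CVE rows to a CSV file and return how many were written."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["ip", "port", "cve", "cvss"])
        for row in cve_rows(iter_banners(in_path)):
            writer.writerow(row)
            count += 1
    return count
```

Banners without a `vulns` property are simply skipped, so the CSV only lists hosts Shodan has flagged.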

Of note: if you’re parsing this data, the banner specification is useful for checking field types and which fields are optional - https://developer.shodan.io/api/banner-specification.

Also useful is the list of query filters (below), most of which can be used with shodan parse --fields <filter>.

Using the Scanning API

Just a quick blurb on this. Again, this requires a paid API plan. From the docs:

Shodan crawls the entire Internet at least once a month, but if you want to request Shodan to scan a network immediately you can do so using the on-demand scanning capabilities of the API. A few common reasons to launch a scan are:

  • Validate firewall rules
  • Confirm issue was patched/fixed
  • Check custom ports

And a note about scan status. Due to the way banner grabbing and service enumeration are done, a scan status might say DONE before the results are actually ready. DONE in this case means the scan has been picked up and has started to process. It’s a known caveat, and if you need results ASAP you can pick them up with Shodan network monitor or simply wait some period of time before downloading the results. Check out the REST API docs on scan statuses.
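
One way to work around the DONE caveat is to poll the status and then wait an extra settle period before downloading. The sketch below assumes `api` behaves like a `shodan.Shodan` object whose `scan_status()` returns a dictionary with a `status` field; the function name, the 30-second poll interval, and the 10-minute settle time are arbitrary guesses on my part, not documented values.

```python
import time


def wait_for_scan(api, scan_id, poll_seconds=30, settle_seconds=600, sleep=time.sleep):
    """Poll until the scan reports DONE, then wait a settle period.

    DONE only means the scan has started processing, so the extra wait
    gives the results time to land before they are downloaded.
    """
    while True:
        status = api.scan_status(scan_id)["status"]
        if status == "DONE":
            break
        sleep(poll_seconds)
    sleep(settle_seconds)
    return status
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use leave it at the default.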

Query Filters

From the API docs, a list of search query filters:

| Filter | Description |
| --- | --- |
| after | Only show results that were collected after the given date (dd/mm/yyyy). |
| asn | The Autonomous System Number that identifies the network the device is on. |
| before | Only show results that were collected before the given date (dd/mm/yyyy). |
| city | Show results that are located in the given city. |
| country | Show results that are located within the given country. |
| geo | There are 2 modes to the geo filter: radius and bounding box. To limit results based on a radius around a pair of latitude/longitude, provide 3 parameters; ex: geo:50,50,100. If you want to find all results within a bounding box, supply the top left and bottom right coordinates for the region; ex: geo:10,10,50,50. |
| hash | Hash of the “data” property |
| has_ipv6 | If “true” only show results that were discovered on IPv6. |
| has_screenshot | If “true” only show results that have a screenshot available. |
| hostname | Search for hosts that contain the given value in their hostname. |
| isp | Find devices based on the upstream owner of the IP netblock. |
| link | Find devices depending on their connection to the Internet. |
| net | Search by netblock using CIDR notation; ex: net:69.84.207.0/24 |
| org | Find devices based on the owner of the IP netblock. |
| os | Filter results based on the operating system of the device. |
| port | Find devices based on the services/ports that are publicly exposed on the Internet. |
| postal | Search by postal code. |
| product | Filter using the name of the software/product; ex: product:Apache |
| state | Search for devices based on the state/region they are located in. |
| version | Filter the results to include only products of the given version; ex: product:apache version:1.3.37 |
| bitcoin.ip | Find Bitcoin servers that had the given IP in their list of peers. |
| bitcoin.ip_count | Find Bitcoin servers that return the given number of IPs in the list of peers. |
| bitcoin.port | Find Bitcoin servers that had IPs with the given port in their list of peers. |
| bitcoin.version | Filter results based on the Bitcoin protocol version. |
| http.component | Name of web technology used on the website |
| http.component_category | Category of web components used on the website |
| http.html | Search the HTML of the website for the given value. |
| http.html_hash | Hash of the website HTML |
| http.status | Response status code |
| http.title | Search the title of the website |
| ntp.ip | Find NTP servers that had the given IP in their monlist. |
| ntp.ip_count | Find NTP servers that return the given number of IPs in the initial monlist response. |
| ntp.more | Whether or not more IPs were available for the given NTP server. |
| ntp.port | Find NTP servers that had IPs with the given port in their monlist. |
| ssl | Search all SSL data |
| ssl.alpn | Application layer protocols such as HTTP/2 (“h2”) |
| ssl.chain_count | Number of certificates in the chain |
| ssl.version | Possible values: SSLv2, SSLv3, TLSv1, TLSv1.1, TLSv1.2 |
| ssl.cert.alg | Certificate algorithm |
| ssl.cert.expired | Whether the SSL certificate is expired or not; True/False |
| ssl.cert.extension | Names of extensions in the certificate |
| ssl.cert.serial | Serial number as an integer or hexadecimal string |
| ssl.cert.pubkey.bits | Number of bits in the public key |
| ssl.cert.pubkey.type | Public key type |
| ssl.cipher.version | SSL version of the preferred cipher |
| ssl.cipher.bits | Number of bits in the preferred cipher |
| ssl.cipher.name | Name of the preferred cipher |
| telnet.option | Search all the options |
| telnet.do | The server requests the client to support these options |
| telnet.dont | The server requests the client to not support these options |
| telnet.will | The server supports these options |
| telnet.wont | The server doesn’t support these options |
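
When generating many queries from a script (one per netblock, say), a tiny helper for joining filters into a query string keeps things tidy. This is a sketch with a hypothetical function name; it joins `name:value` pairs with spaces and wraps values containing spaces in double quotes, which is how I understand Shodan expects multi-word values.

```python
def build_query(filters: dict, terms: str = "") -> str:
    """Join free-text terms and filter pairs into a single search query."""
    parts = [terms] if terms else []
    for name, value in filters.items():
        value = str(value)
        if " " in value:
            value = f'"{value}"'  # quote multi-word values
        parts.append(f"{name}:{value}")
    return " ".join(parts)


print(build_query({"net": "69.84.207.0/24", "product": "Apache"}))
print(build_query({"org": "Example Corp", "port": 443}, terms="nginx"))
```

The resulting string can be passed to `shodan search` or `shodan download` as the search query.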
