DownloadBackLinks

Command status: Active
Supported by OpenApps API: Yes
Supported by Internal/Reseller API: Yes
Possibly queued processing: Yes (Always)

Calls to this function will result in usage of the following subscription resources:

Resource Description

AnalysisResUnits

This resource will be decreased by a variable amount depending on how large the analysed item is. The precise amount should be queried by calling GetIndexItemInfo and reading the data column DownloadBacklinksAnalysisResUnitsCost.

RetrievalResUnits

This resource will be decreased by actual number of rows of data retrieved (returned) by this function.

This function is designed to allow large-scale backlink retrieval beyond the maximum of 50000 links supported by GetBackLinkData. Because this command is typically expected to process a high volume of data, it always works asynchronously, with results made available as a downloadable .GZ compressed CSV file. This command generates data equivalent to that obtained through the Download interface for Advanced Reports in the Majestic web front end.

Important: a parameter called SkipIfAnalysisCostGreaterThan is set by default to 1,000,000 to prevent accidental requests to retrieve data from very large domains (such as google.com). If you intend to get backlinks for items with a higher cost, you will need to supply SkipIfAnalysisCostGreaterThan set to the maximum acceptable cost.
Always use the GetIndexItemInfo function to get the analysis cost before actually calling this function. Read more under Common problems to avoid disappointment.

Parameter Description

cmd

Required: must be set to: DownloadBackLinks

datasource

Optional - defaults to historic
Either: "fresh" - to query against fresh index, or "historic" - to query against historic index.

Query

Required: index item that is being queried using same convention as for GetIndexItemInfo, examples:
http://www.example.com - URL
www.example.com - subdomain
example.com - root domain

SkipIfAnalysisCostGreaterThan

Analysis will be automatically aborted for any index item that will have greater analysis cost than specified. This is useful to prevent human errors in trying to download backlinks for items that are too large.

Default: 1000000
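
The cost check described for this parameter can be sketched on the client side as follows. This is a minimal illustration, not part of the API: the function name plan_download is hypothetical, and it assumes you have already read the DownloadBacklinksAnalysisResUnitsCost value from a GetIndexItemInfo response.

```python
DEFAULT_SKIP_LIMIT = 1_000_000  # documented default for SkipIfAnalysisCostGreaterThan


def plan_download(analysis_cost, max_acceptable_cost=DEFAULT_SKIP_LIMIT):
    """Decide whether to call DownloadBackLinks and with which cost limit.

    analysis_cost is assumed to be the DownloadBacklinksAnalysisResUnitsCost
    value already obtained from GetIndexItemInfo (a caller-side step that is
    not shown here). Returns a dict of extra request parameters, or None
    when the request should be skipped.
    """
    if analysis_cost > max_acceptable_cost:
        return None  # the analysis would be aborted for this item anyway
    params = {}
    if max_acceptable_cost != DEFAULT_SKIP_LIMIT:
        # Only send the parameter when it differs from the documented default.
        params["SkipIfAnalysisCostGreaterThan"] = str(max_acceptable_cost)
    return params
```

Returning None before any request is made means an oversized item is caught locally rather than by the service.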

NotifyURL

Important: the notification URL must be accessible from outside your intranet. Do not specify internal servers that can't be accessed from our servers; if you do, you will never get the notification you expect!

Optional: if specified, this URL (HTTP/HTTPS protocols only) will be requested to notify you that the download has been fully prepared. The URL you provide can contain query string parameters to help you identify the download request that was made. We will also substitute the following macro variables if they are present (case sensitive):
%%DOWNLOAD_FILE%% - will be changed to the download filename
%%DOWNLOAD_FILE_LOCATION%% - will be changed to the download filename location using the PublicDownloadLocation variable below

Your notify URL should respond with a single piece of data: OK (2 characters, no HTML) to indicate that you have successfully received this notification. Any other response (including server failures on your side) will be treated as an error. In the case of an error, the notification URL will be called again a number of times using exponential backoff before failing.

Example of notification url: http://www.example.com/mjseonotify.php (this URL doesn't actually exist!)
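
A notification endpoint along these lines could be sketched with Python's standard library. Several things here are assumptions and not taken from this page: that the notification arrives as a GET request, the query parameter names job, file and location, the port, and all function and class names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote, urlsplit


def build_notify_url(base_url, job_tag):
    """Append a caller-chosen tag plus the documented macros to a notify URL.

    The parameter names (job, file, location) are hypothetical; only the
    two %%...%% macro names come from the documentation.
    """
    return (f"{base_url}?job={quote(job_tag, safe='')}"
            "&file=%%DOWNLOAD_FILE%%&location=%%DOWNLOAD_FILE_LOCATION%%")


def parse_notification(request_path):
    """Return the query string values of an incoming notification request."""
    query = parse_qs(urlsplit(request_path).query)
    return {name: values[0] for name, values in query.items()}


class NotifyHandler(BaseHTTPRequestHandler):
    def do_GET(self):  # assumption: the notification is an HTTP GET
        details = parse_notification(self.path)
        print("download ready:", details)  # replace with real bookkeeping
        body = b"OK"  # exactly two characters, no HTML
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the handler until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), NotifyHandler).serve_forever()
```

When the resulting URL is passed as the NotifyURL parameter it needs to be URL-encoded like any other parameter value.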

UploadToFTP

Important: this FTP URL must be accessible from outside your intranet. Do not specify internal servers that can't be accessed from our side; if you do, you will never get the upload!

Optional: if not specified prepared file with backlinks will be uploaded to majestic.com, this behaviour can be overridden by specifying your own FTP server using appropriate URI format such as:

ftp://username:password@yourftpserver.com/public_html/uploads/ with the relative path of the upload directory or ftp://username:password@yourftpserver.com//home/user/public_html/uploads/ with the absolute path of the upload directory.

Make sure that the user specified is allowed to write files in the directory that you designated for these uploads. If a trailing filename is specified, it will be prepended to the output filename when the file is uploaded to your server.

If not specified the upload will be made to www.majestic.com from where you will be able to download the file.
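
The two URI forms above differ only in the number of slashes after the host. A small helper makes that explicit; the function name is hypothetical, and percent-encoding the credentials is ordinary URI practice rather than something this page specifies.

```python
from urllib.parse import quote


def ftp_upload_uri(user, password, host, directory, absolute=False):
    """Build an UploadToFTP value for a relative or absolute upload directory.

    A relative directory follows the host after one slash and an absolute
    directory after two, matching the two documented examples.
    """
    path = directory.strip("/")
    separator = "//" if absolute else "/"
    # Encode credentials so characters such as "@" or ":" cannot break the
    # URI (an assumption about what the service accepts).
    return (f"ftp://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}{separator}{path}/")
```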

PublicDownloadLocation

Optional: by default prepared backlinks file will be available from URL: http://downloads.majestic.com (note: directory listing is denied by design there) - IF you use alternative FTP location and know which public URL would correspond to it then you are advised to supply it here unless you are planning to analyse the data locally.

Additionally filtering rules described below can be supplied in order to make analysis only deal with the desired data.

Filtering rules

Filtering rules enable targeting processing of backlinks in order to retrieve data that matches specified criteria. For example a particular analysis request may wish to focus on backlinks marked as "nofollow" or exclude such links from any analysis. Filtering rules can be combined, ie: analyse only backlinks found in a particular time period that exclude those of them that were marked as "nofollow".

Rule name Description

Target URLs filtering rules

URLs

One or more Target URLs delimited by CR LF - only meaningful for domain level analysis, this will force analysis to run only on backlinks pointing to these URLs. Exact matching of URLs will be performed.

IncludeMatchedURLs

Comma delimited list of strings that will be required to be matched in Target URLs in order for them to be analysed.

ExcludeMatchedURLs

Same as IncludeMatchedURLs but matched Target URLs will be excluded.

Source URL filtering rules

FlagIncludeDeleted (also works old flag: FlagIncludeOldCrawl)

FlagIncludeNoFollow
FlagIncludeRedirect
FlagIncludeImage
FlagIncludeFrame
FlagIncludeAltText
FlagIncludeMention

Includes into analysis source URLs that had at least one of the following flags set:

  • Deleted (also works OldCrawl) - links that were present on a page but after recrawl found to be deleted (removed)
  • NoFollow - links marked with "nofollow"
  • Redirect - links that were redirecting
  • Image - links that were image based rather than text
  • Frame - source URL was used in a frameset (useful to find if someone frames content)
  • AltText - text used in title attribute of A tag
  • Mentions - text mentions of a domain or link

See more about Source Flags: http://www.majestic.com/glossary.php#SourceFlags

These are set by setting the parameter to 1
eg 'FlagIncludeNoFollow=1'

FlagExcludeDeleted (also works old flag: FlagExcludeOldCrawl)

FlagExcludeNoFollow
FlagExcludeRedirect
FlagExcludeImage
FlagExcludeFrame
FlagExcludeAltText
FlagExcludeMention

Same as FlagInclude* filtering rules, only setting them will result in exclusion of source URLs with specified flags set.

Recommended use: exclude non-rank passing backlinks such as those marked with Deleted, NoFollow, Mention, AltText, Frame, Redirect flags.

These are set by setting the parameter to 1
eg 'FlagExcludeNoFollow=1'
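
Because every flag rule follows the same FlagInclude*/FlagExclude* naming pattern, the parameters can be generated from the flag names. The helper below is a sketch only: flag_params is a hypothetical name, and the legacy OldCrawl spelling is left out.

```python
SOURCE_FLAGS = {"Deleted", "NoFollow", "Redirect", "Image",
                "Frame", "AltText", "Mention"}


def flag_params(include=(), exclude=()):
    """Turn flag names into FlagInclude*/FlagExclude* request parameters."""
    params = {}
    for mode, names in (("Include", include), ("Exclude", exclude)):
        for name in names:
            if name not in SOURCE_FLAGS:
                raise ValueError(f"unknown source flag: {name}")
            params[f"Flag{mode}{name}"] = "1"  # rules are enabled by sending 1
    return params


# The recommended exclusion of non-rank-passing backlinks:
RECOMMENDED = flag_params(
    exclude=["Deleted", "NoFollow", "Mention", "AltText", "Frame", "Redirect"])
```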

IncludeMatchedRefDomains

If set then only backlinks for matching referring domains will be retrieved. Referring domains can be comma delimited to provide multiple referring domains of interest.

Example of valid referring domains: example.com,example.net

It's possible that there will be no backlinks for specified referring domains in which case empty file (with headers) will be produced.

ExcludeMatchedRefDomains

If set then backlinks for matching referring domains will be excluded. Referring domains can be comma delimited to provide multiple referring domains to be excluded.

Example of valid referring domains: example.com,example.net

It's possible that there will be no backlinks returned when these referring domains are excluded.

IncludeMatchedRefURLs

If set then only backlinks containing the specified text in the source URL will be returned. Multiple text values can be specified in a comma separated list.

Example of valid text: page,folder

It's possible that there will be no backlinks for specified text in which case empty file (with headers) will be produced.

ExcludeMatchedRefURLs

Same as IncludeMatchedRefURLs but matched URLs will be excluded from the analysis.

IncludeExactAnchorText

If set then only backlinks with exactly matching (lower cased) anchor text will be selected.

Example: "yahoo" (without quotes) - this will only match anchor texts that are exactly "yahoo", it won't match "yahoo!" or anything different.

Use | (pipe) to separate multiple anchor texts, though this is not recommended.

ExcludeExactAnchorText

Same as IncludeExactAnchorText but matched anchor text items will be excluded from analysis.

IncludeContainsAnchorText

If set then only backlinks with anchor text containing the specified (lower cased) anchor texts will be selected.

Example: yahoo

This will match anchor texts that contain that word, ie: "yahoo!", "click here to go yahoo" etc

Use | (pipe) to separate multiple anchor texts, though this is not recommended.

ExcludeContainsAnchorText

Same as IncludeContainsAnchorText but matched anchor text items will be excluded from analysis.
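
The anchor text rules take pipe-separated values and match against lower-cased anchor text. A value for any of the four parameters could be prepared as below. The function name is hypothetical, and lower-casing the input before sending it is an assumption based on the "(lower cased)" wording above.

```python
def anchor_text_filter(texts):
    """Join anchor texts into one pipe-separated, lower-cased value."""
    cleaned = [text.strip().lower() for text in texts if text.strip()]
    for text in cleaned:
        if "|" in text:
            # The pipe is the separator, so it cannot appear inside a value.
            raise ValueError("anchor text cannot contain '|'")
    return "|".join(cleaned)
```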

IncludeMatchedRefTLDs

Comma delimited list of TLDs (Top Level Domains) that will be included in analysis, ie: edu,gov - note there is no . in front of TLD.

ExcludeMatchedRefTLDs

Same as IncludeMatchedRefTLDs but matched referring TLDs will be excluded from analysis.
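
Since the TLD list must not carry a leading dot, a small normalising helper can guard against that mistake. This is a sketch with a hypothetical function name; lower-casing the values is an extra assumption.

```python
def tld_list(tlds):
    """Build a comma delimited TLD list without leading dots."""
    cleaned = []
    for tld in tlds:
        tld = tld.strip().lstrip(".").lower()
        if not tld or "," in tld:
            raise ValueError("each TLD must be a single non-empty label")
        cleaned.append(tld)
    return ",".join(cleaned)
```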

EnableBackLinkDateRange

If set to 1 then Last Crawled date range analysis will be enabled.

IncludeBackLinksDateFrom

Date in format of DD/MM/YYYY (ie: 05/02/2008 for 5 February 2008) - links from Source URLs with Last Crawled date starting at that moment will be included in analysis.

Requires EnableBackLinkDateRange to be set to 1.

IncludeBackLinksDateTo

Same as IncludeBackLinksDateFrom only this is the cut off Last Crawled date for backlinks analysis by date.

Requires EnableBackLinkDateRange to be set to 1.

EnableBackLinkFFDateRange

If set to 1 then First Found date range analysis will be enabled.

IncludeBackLinksFFDateFrom

Date in format of DD/MM/YYYY (ie: 05/02/2008 for 5 February 2008) - links from Source URLs with First Found date starting at that moment will be included in analysis.

Note: fresh index only includes data going back 30 days and it can only see when link was first found during that date range effectively making it impossible to say if the same link wasn't found 3 years ago. It is recommended to use historical index for first found dates.

Requires EnableBackLinkFFDateRange to be set to 1.

IncludeBackLinksFFDateTo

Same as IncludeBackLinksFFDateFrom only this is the cut off First Found date for backlinks analysis by date.

Requires EnableBackLinkFFDateRange to be set to 1.
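
Both date range filters need three parameters that must agree with each other: the enable switch and the two DD/MM/YYYY dates. The sketch below builds them together; the function name is hypothetical.

```python
from datetime import date


def date_range_params(date_from, date_to, first_found=False):
    """Build the Last Crawled (default) or First Found date range parameters."""
    if date_from > date_to:
        raise ValueError("date_from must not be later than date_to")
    infix = "FF" if first_found else ""
    return {
        f"EnableBackLink{infix}DateRange": "1",
        f"IncludeBackLinks{infix}DateFrom": date_from.strftime("%d/%m/%Y"),
        f"IncludeBackLinks{infix}DateTo": date_to.strftime("%d/%m/%Y"),
    }
```

Building the switch and the dates in one place avoids sending dates without the required Enable parameter.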

IncludeMatchedIPs

Includes links from Source URLs whose referring domains resolve to IP addresses matching a comma delimited list of IPs. Prefix matching is used, so you can match subnets by leaving out the end of the IP, ie: 212.34.4.

ExcludeMatchedIPs

Same as IncludeMatchedIPs but links from matched IPs will be excluded from analysis.
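
The effect of prefix matching can be illustrated locally. This sketch assumes the match is a plain string prefix comparison, which this page implies but does not spell out. It is not the service's implementation and the function name is hypothetical.

```python
def ip_matches_prefix(ip, prefixes):
    """Return True when the IP starts with any of the given prefixes."""
    return any(ip.startswith(prefix) for prefix in prefixes)


# With the trailing dot, only the 212.34.4.x subnet matches.
in_subnet = ip_matches_prefix("212.34.4.17", ["212.34.4."])
# Without it, 212.34.40.x (and 212.34.41.x, ...) would match as well.
too_broad = ip_matches_prefix("212.34.40.1", ["212.34.4"])
```

Under that assumption, keeping the trailing dot in a subnet prefix matters.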

IncludeMatchedGeoCountry

Includes backlinks whose referring domains were geo-located in the specified countries.

Use two letter country codes or NA for not geo-located IPs or unresolved domains.

List of country codes: http://www.maxmind.com/app/iso3166

ExcludeMatchedGeoCountry

Same as IncludeMatchedGeoCountry but backlinks whose referring domains were geo-located in the specified countries will be excluded from analysis.

MinRefACRank

If specified only source URLs with ACRank higher than specified will be considered.
Acceptable values: 0-15

MinCitationFlow

If specified, only source URLs with a citation flow greater than the value provided will be considered

0 - 100 (Default 0)

MaxCitationFlow

If specified, only source URLs with a citation flow less than the value provided will be considered

0 - 100 (Default 100)

MinTrustFlow

If specified, only source URLs with a trust flow greater than the value provided will be considered

0 - 100 (Default 0)

MaxTrustFlow

If specified, only source URLs with a trust flow less than the value provided will be considered

0 - 100 (Default 100)
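
The four flow limits share the same 0 - 100 range, so they can be validated before a request is sent. This is a sketch under stated assumptions: the function name is hypothetical, and omitting a parameter that equals its documented default is a design choice of this example.

```python
def flow_range_params(metric, minimum=0, maximum=100):
    """Build Min/Max parameters for "Citation" or "Trust" flow."""
    if metric not in ("Citation", "Trust"):
        raise ValueError("metric must be 'Citation' or 'Trust'")
    if not 0 <= minimum <= maximum <= 100:
        raise ValueError("limits must satisfy 0 <= minimum <= maximum <= 100")
    params = {}
    if minimum != 0:  # 0 is the documented default for Min*Flow
        params[f"Min{metric}Flow"] = str(minimum)
    if maximum != 100:  # 100 is the documented default for Max*Flow
        params[f"Max{metric}Flow"] = str(maximum)
    return params
```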

CitationTrustDeltaChoice

Enable/Disable the Citation / Trust flow delta filtering

-1 : Off
0 : Minimum
1 : Maximum
(Default 0)

CitationTrustDeltaValue

Apply filtering on trust/citation flows based on relative ratios to one another.
For example, only consider URLs with a trust flow 50% higher than their citation flow.

0 - 100 (Default 0)

UsePrefixScan

This parameter indicates that you would like links to any page under the specified prefix

eg setting
item=http://www.site.com/folder/
and
UsePrefixScan=1
will return any links to any page on www.site.com under /folder/

Default: 0

CreateBackLinkCountsOnly

Adding this parameter with any value will create a list of known URLs within the query scope, summarised by the number of backlinks to them.

eg CreateBackLinkCountsOnly=1 or CreateBackLinkCountsOnly=0

Default: parameter not sent

MinBackLinks

Used in conjunction with CreateBackLinkCountsOnly to specify the minimum number of external backlinks a URL should have to be included in the report.
0 will include all known URLs within the scope of the report, 1 will include all URLs with 1 or more external backlinks, etc.

Default: 0

Sample query and response

This is a protocol-level example query that uses a special URL that was overridden to have zero cost of analysis (you will need to use your own API_KEY to analyse other urls):

https://api.majestic.com/api/xml?app_api_key=API_KEY&cmd=DownloadBackLinks&Query=http://www.majestic.com/comparedomainbacklinkhistory.php
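
A request URL of this shape can be assembled with standard URL encoding, which also takes care of filter values containing commas, pipes or slashes. The sketch below only builds the URL and sends nothing. The function name is hypothetical, and API_KEY remains a placeholder for your own key.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.majestic.com/api"


def download_backlinks_url(api_key, query, response_format="xml", **extra):
    """Build a DownloadBackLinks request URL.

    extra holds any optional parameters or filtering rules described above,
    for example datasource="fresh" or FlagExcludeNoFollow="1".
    """
    if response_format not in ("xml", "json"):
        raise ValueError("response_format must be 'xml' or 'json'")
    params = {"app_api_key": api_key, "cmd": "DownloadBackLinks", "Query": query}
    params.update(extra)
    return f"{API_ROOT}/{response_format}?{urlencode(params)}"


url = download_backlinks_url(
    "API_KEY",
    "http://www.majestic.com/comparedomainbacklinkhistory.php",
    datasource="fresh",
)
```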

Sample XML response:



https://api.majestic.com/api/json?app_api_key=API_KEY&cmd=DownloadBackLinks&Query=http://www.majestic.com/comparedomainbacklinkhistory.php

Sample JSON response:



This response indicates that the request was queued for asynchronous processing. Note: a JobID value is returned in this case. A response may instead indicate that the same request was made recently and that data files were already prepared for it; see the GetDownloadsList command for how to check for those data files. Sample check query for this particular request's data files (note: the DownloadJobID value will be different, so take it from the XML returned by the previous call):
https://api.majestic.com/api/xml?app_api_key=API_KEY&cmd=GetDownloadsList&DownloadJobID=23C65F56E42833BE94305BCA6B10320F



https://api.majestic.com/api/json?app_api_key=API_KEY&cmd=GetDownloadsList&DownloadJobID=BE82255B70F2CD1DCC06762FA68BE9B9

Sample JSON response:

The response indicates where data files are located. Please note that it is possible to supply your own FTP location for such uploads and also it is highly recommended to use NotifyURL functionality that will call your web application to confirm that processing of such long running request is over.
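
Where NotifyURL cannot be used, polling GetDownloadsList is a possible fallback. The sketch below covers only the waiting logic. The check callable is hypothetical: it is expected to call GetDownloadsList for your DownloadJobID and return the file location once it is available, or None otherwise. The attempt count and delays are arbitrary choices.

```python
import time


def wait_for_download(check, attempts=6, first_delay=30.0, sleep=time.sleep):
    """Call check() until it returns a location, doubling the delay each time.

    Returns the location, or None when every attempt came back empty.
    """
    delay = first_delay
    for attempt in range(attempts):
        location = check()
        if location is not None:
            return location
        if attempt < attempts - 1:  # no need to sleep after the last attempt
            sleep(delay)
            delay *= 2
    return None
```

Passing sleep as an argument keeps the function testable without real waiting.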


The data file can now be downloaded from the PublicDownloadLocation shown in the XML above. The file format is CSV (Comma Separated Values) with UTF-8 encoding. The first line of the file is always the header. The columns are:
Target URL - URL where backlink is pointing to
Target ACRank - ACRank of Target URL
Source URL - Source URL (backlink)
Source ACRank - ACRank of Source URL
Anchor Text - anchor text used in linking, for images it will be text used in ALT="" part of the tag and for Mentions it will be text used (ie: example.com)
Source Crawl Date - the date when backlink was last (most recently) crawled
Source First Found Date - the date when backlink was first found on source page (historical data is more meaningful in this context because fresh index only covers last 30 days of crawl)
FlagNoFollow - if set to + then source URL was marked as nofollow
FlagImageLink - if set to + then source URL was image
FlagRedirect - if set to + then source URL was redirect
FlagFrame - if set to + then source URL was used in FRAME or IFRAME
FlagOldCrawl - if set to + then source URL was found to be deleted (removed)
FlagAltText - if set to + then source URL was taken from TITLE part of an A tag
FlagMention - if set to + then source URL was actually text mention
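
Reading such a file needs nothing beyond the standard library. The sketch below assumes that the header names in the file match the column names listed above exactly, which you should verify against a real download. The function names are hypothetical.

```python
import csv
import gzip


def read_backlinks(gz_file):
    """Yield one dict per data row of a .GZ compressed CSV download.

    gz_file may be a path or a binary file object. The header line
    supplies the dictionary keys.
    """
    with gzip.open(gz_file, mode="rt", encoding="utf-8", newline="") as text:
        yield from csv.DictReader(text)


def flag_is_set(row, flag_column):
    """Flags are reported as "+" when set; anything else counts as not set."""
    return row.get(flag_column, "") == "+"
```

Because rows are yielded one at a time, large downloads can be processed without loading the whole file into memory.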


Sample API request to delete this job:
https://api.majestic.com/api/xml?app_api_key=API_KEY&cmd=DeleteDownloads&DownloadJobIDs=23C65F56E42833BE94305BCA6B10320F

Sample XML response indicating that the job was deleted successfully:



Sample API request to delete this job:
https://api.majestic.com/api/json?app_api_key=API_KEY&cmd=DeleteDownloads&DownloadJobIDs=BE82255B70F2CD1DCC06762FA68BE9B9

Sample JSON response indicating that the job was deleted successfully:

Common problems

Below you can see a number of common problems that were experienced by our customers.

Problem: calling to analyse very large domains (like google.com) can quickly use up available resources.
Solution: do not analyse a large domain unless you really mean it. Each call to this function results in the server processing data, and this reduces your resource allowance. Use the SkipIfAnalysisCostGreaterThan parameter to avoid analysing domains that are too large and/or call the GetIndexItemInfo function to get the analysis cost first.
Problem: calling analysis of the same domain many times over with slightly different parameters can quickly use up available resources.
Solution: call this function once to get all the data you need, and then run all sorts of separate queries against that data on your end.
Problem: calling the analysis function with a set of parameters yields few or no results.
Solution: review the parameters. It is likely that your filtering is so narrow that few or no backlinks can satisfy it. This may be particularly true in the case of date range filtering: you need to take into account that our main index is not updated every day (daily updates on the website are done separately).
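
The second solution above (download once, then query the data on your end) might look like the sketch below. It works on rows represented as dictionaries keyed by the column names of the data file. The function name and the two filter options are hypothetical and cover only a small part of what the server-side rules offer.

```python
def filter_rows(rows, exclude_flags=(), anchor_contains=None):
    """Filter already downloaded backlink rows locally.

    exclude_flags lists flag columns (such as "FlagNoFollow") whose "+"
    value drops a row. anchor_contains keeps only rows whose anchor text
    contains the given text, compared in lower case.
    """
    for row in rows:
        if any(row.get(flag) == "+" for flag in exclude_flags):
            continue
        if anchor_contains is not None:
            anchor = row.get("Anchor Text", "").lower()
            if anchor_contains.lower() not in anchor:
                continue
        yield row
```

Running several such filters over one downloaded file uses no further AnalysisResUnits, whereas repeating the request with different parameters does.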

For more information about access to the Majestic API suite, visit our Plans & Pricing page.