Cognitive Services Pricing
The Face API uses state-of-the-art, cloud-based face algorithms to detect and recognize human faces in images. Capabilities include face detection, face verification, and face grouping, which organizes faces into groups based on their visual similarity.
|Face API – free||Up to 20 transactions/minute||30,000 transactions free per month|
|Face API – standard||Up to 10 transactions per second|
|0-1,000,000 transactions||￥ 6.36 per 1,000 transactions|
|1,000,001-5,000,000 transactions||￥ 5.09 per 1,000 transactions|
|5,000,001-100,000,000 transactions||￥ 3.82 per 1,000 transactions|
|Over 100,000,000 transactions||￥ 2.54 per 1,000 transactions|
|Face Storage||Stores images sized up to 4 MB each||￥ 1.59 per month per 1,000 images|
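To illustrate how the Standard-tier rates above could combine into a monthly bill, here is a minimal Python sketch. It assumes the tiers are graduated, meaning each block of transactions is charged at its own tier's rate; the page does not state this explicitly. The names `FACE_TIERS` and `face_api_cost` are hypothetical.

```python
from decimal import Decimal

# Standard-tier rates copied from the table above:
# (upper bound of the tier, CNY per 1,000 transactions).
# None marks the open-ended top tier.
FACE_TIERS = [
    (1_000_000, Decimal("6.36")),
    (5_000_000, Decimal("5.09")),
    (100_000_000, Decimal("3.82")),
    (None, Decimal("2.54")),
]


def face_api_cost(transactions: int) -> Decimal:
    """Estimate the monthly cost in CNY.

    ASSUMPTION: tiers are graduated, so only the transactions that fall
    inside a tier are charged at that tier's rate.
    """
    cost = Decimal("0")
    lower = 0
    for upper, rate in FACE_TIERS:
        if transactions <= lower:
            break
        capped = transactions if upper is None else min(transactions, upper)
        cost += Decimal(capped - lower) / 1000 * rate
        if upper is None:
            break
        lower = upper
    return cost.quantize(Decimal("0.01"))


print(face_api_cost(2_000_000))  # first million at 6.36, second million at 5.09
```

If the tiers were instead applied to the whole volume at once, the total would differ, so treat the result as an estimate only.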
This state-of-the-art, cloud-based API gives developers access to advanced algorithms that extract rich information from images to categorize and process visual data. Capabilities include image analysis, tagging, celebrity recognition, text extraction, and smart thumbnail generation.
|Computer Vision API – free||5,000 transactions free per month|
|S1||Tag, Face, GetThumbnail, Color, and Image Type||0-1 million transactions - ￥ 6.36 per 1,000 transactions|
|1 million-5 million transactions - ￥ 5.088 per 1,000 transactions|
|5 million+ transactions - ￥ 4.134 per 1,000 transactions|
|S2||OCR (printed), Adult, Celebrity, and Landmark||0-1 million transactions - ￥ 9.54 per 1,000 transactions|
|1 million-5 million transactions - ￥ 6.36 per 1,000 transactions|
|5 million+ transactions - ￥ 4.13 per 1,000 transactions|
|S3||Describe and OCR (handwriting)||￥ 15.90 per 1,000 transactions|
Content Moderator enhances your ability to detect potentially offensive or unwanted images through machine-learning-based classifiers, custom blacklists, and optical character recognition (OCR). It helps you detect potential profanity in more than 100 languages and match text against your custom lists automatically. Content Moderator also checks for possible personally identifiable information (PII). Each Text API call can contain up to 1,024 characters. The Image API scans images (minimum 128 pixels, maximum 4 MB) for adult and racy content, and supports optical character recognition (OCR) and face detection. You can also match images against custom image lists. Each API call is a transaction.
|INSTANCE||TRANSACTIONS PER SECOND (TPS)||FEATURES||PRICE|
|Free||1 TPS||Moderate||5,000 transactions free per month|
0-1M transactions - ￥10.18 per 1,000 transactions
1M-5M transactions - ￥7.63 per 1,000 transactions
5M-10M transactions - ￥6.11 per 1,000 transactions
10M+ transactions - ￥4.07 per 1,000 transactions
Text Analytics API is a cloud-based service that provides advanced natural language processing over raw text, and includes three main functions—sentiment analysis, key phrase extraction, and language detection.
|Free - Web/Container||Sentiment Analysis
Key Phrase Extraction
|5,000 transactions free per month|
up to 100 requests per second and 1,000 requests per minute
Key Phrase Extraction
0-500,000 text records — ￥20.35 per 1,000 text records
0.5M-2.5M text records — ￥10.18 per 1,000 text records
2.5M-10.0M text records — ￥5.09 per 1,000 text records
10M+ text records — ￥2.54 per 1,000 text records
Translator Text API is a cloud-based machine translation service supporting multiple languages, reaching more than 95% of the world's gross domestic product (GDP). Use Translator to build applications, websites, tools, or any solution requiring multi-language support.
|2M chars free per month|
|￥102 per million chars|
|￥20,912 / month / Up to 250M chars per month, Overage : ￥84 per million chars|
|￥61,056 / month / Up to 1B chars per month, Overage : ￥61 per million chars|
|￥457,920 / month / Up to 10B chars per month, Overage : ￥46 per million chars|
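The commitment plans above combine a fixed monthly fee with a per-million overage rate. The following Python sketch shows one way to estimate a month's charge. The function name `translator_plan_cost` is hypothetical, and the sketch assumes overage is charged proportionally per character rather than rounded up to whole millions, which the table does not specify.

```python
from decimal import Decimal


def translator_plan_cost(chars: int, monthly_fee: str,
                         included_chars: int,
                         overage_per_million: str) -> Decimal:
    """Estimate one month's charge in CNY for a commitment plan.

    ASSUMPTION: characters beyond the included volume are charged
    proportionally at the overage rate.
    """
    overage_chars = max(0, chars - included_chars)
    overage = Decimal(overage_chars) / 1_000_000 * Decimal(overage_per_million)
    return (Decimal(monthly_fee) + overage).quantize(Decimal("0.01"))


# 300M characters on the "up to 250M chars" plan from the table above.
print(translator_plan_cost(300_000_000, "20912", 250_000_000, "84"))
```

Comparing the result with the pay-as-you-go rate of ￥102 per million characters for the same volume can help in choosing between plans.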
Language Understanding (LUIS) offers a fast and effective way of adding language understanding to applications. With LUIS, you can use pre-existing, world-class, pre-built models whenever they suit your purposes. When you need specialized models, LUIS guides you through the process of quickly building them.
|INSTANCE||TRANSACTIONS PER SECOND (TPS)||FEATURES||PRICE|
|5 TPS||Text Requests||10,000 transactions free per month|
|50 TPS||Text Requests||￥15.26 per 1,000 transactions per month|
|Free - Web/Container, 1 concurrent request||Speech to Text||Standard||5 audio hours free per month|
|Custom Speech||5 audio hours free per month|
|Custom Speech endpoint hosting||1 model free per month|
|Conversation Transcription Multichannel Audio||5 audio hours free per month|
|Standard - Web/Container, 20 concurrent requests||Speech to Text||Standard||¥3 per audio hour|
|Custom Speech||¥14.5 per audio hour|
|Custom Speech endpoint hosting||¥407.65 per model per month|
How are the Cognitive Services APIs billed?
The Face API and Computer Vision API are billed per 1,000 API transaction calls when a production API call is being actively executed.
What will happen if I exceed the transaction limit at the Standard tier?
If the usage on a standard tier is exceeded, the account starts to accrue overages. These overages are billed on a monthly basis, and are calculated at the rate specified for each tier.
Can I change the service tier I subscribed to?
You can upgrade to a higher tier at any time. The billing rate corresponding to the higher tier and the amounts included will take effect immediately.
What is Face Storage? What can it be used for?
Face Storage allows a subscription to store additional persisted faces when using person objects and face lists for identification, or for similarity matching with the Face API.
How is the number of stored faces calculated for billing if it varies throughout the month?
Stored images are charged at ￥1.59 per 1,000 faces. This rate is prorated on a daily basis. For example, if your account used 10,000 persisted faces each day for the first half of the month and none during the second half, you would only be billed for the number of days the 10,000 faces were stored for. This would be calculated as (￥1.59/1,000) x (10,000 x 15 + 0 x 16)/31 = ￥7.69. Alternatively, if you retained 1,000 faces for a few hours each day during a month and then deleted them every night, you would still be billed for the 1,000 persisted faces each day.
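The proration described above can be reproduced with a short Python sketch. The function name `face_storage_cost` is hypothetical. Each list entry stands for the number of faces persisted at any point during that day, since faces kept for only a few hours are still billed for the day.

```python
from decimal import Decimal

RATE_PER_1000_FACES = Decimal("1.59")  # CNY per 1,000 faces per month (from the text)


def face_storage_cost(daily_face_counts: list[int]) -> Decimal:
    """Prorate the monthly storage rate over the days of the month.

    daily_face_counts holds one entry per day of the billing month.
    """
    days_in_month = len(daily_face_counts)
    face_days = sum(daily_face_counts)
    cost = RATE_PER_1000_FACES / 1000 * Decimal(face_days) / days_in_month
    return cost.quantize(Decimal("0.01"))


# 10,000 faces for the first 15 days of a 31-day month, none afterwards.
month = [10_000] * 15 + [0] * 16
print(face_storage_cost(month))  # 7.69
```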
What is the capacity of Face Storage?
The quota for stored person groups is currently 1,000. Each person group or FaceList can have a maximum of 1,000 people.
What operations can be completed with Computer Vision API?
Tag - The Computer Vision API returns tags based on more than 2,000 recognizable objects, living beings, types of scenery, and actions. If tags are ambiguous or unusual, the API response will provide 'hints' to clarify the meaning of the tag.
Face - Detects human faces within a picture.
GetThumbnail - GetThumbnail generates high quality thumbnails after images are uploaded. The Computer Vision API algorithm analyzes objects within images, then crops images according to the requirements for the region of interest (ROI).
Color - The Computer Vision algorithm extracts colors from an image. The colors are analyzed in three different contexts (foreground, background, and whole). They are grouped into 12 dominant accent colors.
Image Type - The Computer Vision API can set a Boolean flag to indicate whether an image is black and white or color. It can use the same method to indicate whether an image is a line drawing. It can also indicate whether an image is clip art, along with its quality.
OCR - Optical Character Recognition (OCR) technology detects text content in an image and extracts the identified text into a machine-readable character stream. You can use the results for searches and numerous other purposes, from medical records to security and banking. It automatically detects the language. OCR saves time and provides convenience for users by allowing them to simply take photos of text instead of transcribing it. Please refer to the Computer Vision Documentation page for supported languages.
Adult - Apply adult/racy settings to automatically restrict adult content in images.
Celebrities - Azure’s celebrity recognition model can recognize 200,000 celebrities from business, politics, sports, and entertainment around the world.
What are the limits/restrictions of the content that can be moderated by using the API?
When using the API, images need to have a minimum of 128 pixels and a maximum file size of 4MB. Text can be at most 1024 characters long.
What happens if the content passed to the text API or the image API exceeds the size limits?
The text API returns an error code indicating that the text is longer than permitted. The image API likewise returns an error code indicating that the image does not meet the size requirements.
Is there an extra cost for using the human review tool?
No, the human review tool is included in your subscription.
How does billing for the Text Analytics API work?
The Text Analytics API can be purchased in units of the S0-S4 tier at a fixed price. Each unit of a tier comes with included quantities of API transactions. If the user exceeds the included quantities, overages are charged at the rate specified in the pricing table above. These overages are prorated and the service is billed on a monthly basis. The included quantities in a tier are reset each month. In the S tier, the service is billed for only the amount of Text Records submitted to the service.
What happens if I exceed the transaction limit on my free tier for Text Analytics?
Usage is throttled if the transaction limit is reached on the Free tier. Customers cannot accrue overages on the free tier.
What constitutes a transaction in the S0-S4 tiers of the Text Analytics API?
Any annotation to a document counts as a transaction. Batch scoring calls also take into account the number of documents to be scored in the call. For instance, if 1,000 documents are sent for sentiment analysis in a single API call, that counts as 1,000 transactions. If an API call performs more than one annotation operation, each operation is counted as well. For example, an API call that performs both sentiment analysis and key phrase extraction on 1,000 documents counts as 2,000 transactions (2 annotations × 1,000 documents).
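The counting rule above is a simple product, shown here as a minimal Python sketch. The function name `text_analytics_transactions` is hypothetical.

```python
def text_analytics_transactions(num_documents: int, num_annotations: int = 1) -> int:
    """Transactions for one API call: one per annotation per document."""
    return num_documents * num_annotations


# Sentiment analysis only, on 1,000 documents.
print(text_analytics_transactions(1_000))     # 1000
# Sentiment analysis plus key phrase extraction on the same batch.
print(text_analytics_transactions(1_000, 2))  # 2000
```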
What happens if I exceed the transaction limit on the S0-S4 tier?
If the usage on the S0-S4 tier is exceeded, the account starts to accrue overages. These overages are billed on a monthly basis and are calculated at the rate specified for each tier.
Can I change the tier of service I subscribed to?
You may upgrade to a higher tier at any time. Billing rate and included quantities corresponding to the higher tier will begin immediately.
What constitutes a Text Record in the S Tier?
A text record in the S tier contains up to 1,000 characters as measured by String.Length. If an input document into the text analytics API is more than 1,000 characters, it counts as one text record for each unit of 1,000 characters. For instance, if an input document sent to the API contains 7,500 characters, it would count as 8 text records. If an input document sent to the API contains 500 characters, it would count as 1 text record. If two documents are submitted, one document of 500 characters and one document of 1,200 characters, then the service would be billed for three text records in total: one record for the 500 character document and two text records for the 1,200 character document.
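The text record rule can be sketched in Python as follows. The function name `count_text_records` is hypothetical. Because .NET's String.Length counts UTF-16 code units while Python's `len` counts code points, the sketch measures the UTF-16 encoding so that characters outside the Basic Multilingual Plane count as two units. Treating an empty document as zero records is an assumption; the text above does not cover that case.

```python
import math


def count_text_records(document: str, record_size: int = 1000) -> int:
    """Number of billable text records for a single document."""
    # UTF-16 code units, matching how String.Length measures a string.
    utf16_units = len(document.encode("utf-16-le")) // 2
    # ASSUMPTION: an empty document yields zero records.
    return math.ceil(utf16_units / record_size)


print(count_text_records("a" * 7_500))  # 8
print(count_text_records("a" * 500))    # 1
# Two documents in one request: 500 and 1,200 characters.
print(count_text_records("a" * 500) + count_text_records("a" * 1_200))  # 3
```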
How do I calculate monthly volume?
For the Microsoft Translator Text API, the volume you are billed for is the number of characters in the input. Every Unicode code point counts as a character. Every character of the input counts. Each translation of a text to a new language counts as a separate translation. The number of queries, words, bytes, or sentences is irrelevant.
To estimate your monthly volume, take the total characters to translate, multiply it by the number of languages you want to have it translated into, then take the number and spread it over the maximum number of hours or days you are able to wait for completion.
More information on how we count characters for the Translator Text API can be found in our documentation.
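The volume estimate described above can be sketched in Python. Python's `len` on a string counts Unicode code points, which matches the counting rule stated here. The function names `billed_characters` and `average_daily_volume` are hypothetical.

```python
def billed_characters(texts: list[str], target_languages: int) -> int:
    """Total billed characters: every code point, once per target language."""
    return sum(len(text) for text in texts) * target_languages


def average_daily_volume(total_billed: int, days_to_complete: int) -> float:
    """Spread the billed volume over the days you can wait for completion."""
    return total_billed / days_to_complete


docs = ["hello world", "\U0001F600"]  # 11 code points + 1 code point
total = billed_characters(docs, target_languages=3)
print(total)                           # 36
print(average_daily_volume(total, 4))  # 9.0
```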
What happens if I reach the limit of the free subscription plan?
If you subscribe to the free subscription plan, the Microsoft Translator service will stop if you reach 2 million characters during a subscription month for the Text Translation API. The Microsoft Translator service will start again at the beginning of your next subscription month or when you change your subscription to a paid plan.
Can I customize my translations?
Customization currently is not available with subscriptions on Azure.cn.
What is a transaction?
For text requests, a transaction is an API call with query length up to 500 characters.
For speech requests, a transaction is an utterance with query length up to 15 seconds long.
Are speech requests included in the free tier?
No, the free tier only includes text requests with a maximum length of 500 characters.
What is a Dispatch?
Dispatch is a feature that enables processing two models/applications with one API call.
How does billing work?
For Speech Translation, Speech to Text, and Speech to Text with Custom Speech Model: usage is billed in one-second increments.
For Text to Speech and Text to Speech with Custom Voice Font: usage is billed per character.
For Custom Speech Model Hosting: usage is billed hourly.
For Custom Voice Font Hosting: usage is billed daily.
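As an illustration of per-second billing against an hourly rate, here is a minimal Python sketch using the Standard tier rates from the pricing table (¥3 per audio hour for standard Speech to Text, ¥14.5 for Custom Speech). The function name `speech_to_text_cost` is hypothetical, and rounding a partial second up to the next whole second is an assumption that this page does not confirm.

```python
import math
from decimal import Decimal


def speech_to_text_cost(duration_seconds: float, rate_per_hour: str = "3") -> Decimal:
    """Estimate the charge in CNY for one piece of audio.

    ASSUMPTION: a partial second is rounded up to a full billed second.
    """
    billed_seconds = math.ceil(duration_seconds)
    cost = Decimal(billed_seconds) / 3600 * Decimal(rate_per_hour)
    return cost.quantize(Decimal("0.0001"))


print(speech_to_text_cost(89.2))          # billed as 90 seconds at the standard rate
print(speech_to_text_cost(1800, "14.5"))  # half an hour at the Custom Speech rate
```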
What is a "Custom Speech Model"?
The Speech service enables users to adapt baseline models based on their own acoustic and language data, leading to custom speech models that can be used against both Speech to Text and Speech Translation.
What is a language model and why customize it?
The language model is a probability distribution over sequences of words. The language model helps the system decide among sequences of words that sound similar, based on the likelihood of the word sequences themselves. For example, "recognize speech" and "wreck a nice beach" sound alike, but the first hypothesis is far more likely to occur and is therefore assigned a higher score by the language model. If you expect voice queries to your application to contain particular vocabulary items, such as product names or jargon that rarely occur in typical speech, you can likely improve performance by customizing the language model. For example, if you were building an app to search MSDN by voice, it's likely that terms like "object-oriented" or "namespace" or "dot net" will appear more frequently than in typical voice applications. Customizing the language model enables the system to learn this.
What is an acoustic model and why customize it?
The acoustic model is a classifier that labels short fragments of audio as one of several phonemes, or sound units, in each language. These phonemes can then be stitched together to form words. For example, the word "speech" consists of four phonemes, "s p iy ch". These classifications are made on the order of 100 times per second. Customizing the acoustic model can enable the system to do a better job of recognizing speech in atypical environments. For example, if you have an app designed to be used by workers in a warehouse or factory, a customized acoustic model can more accurately recognize speech in the presence of the noises found in these environments.
Support & SLA
If you have any questions or need help, please visit Azure Support and select self-help service or any other method to contact us for support.
We guarantee that Cognitive Services running at the Standard tier will be available at least 99.9% of the time. No SLA is provided for the Free tier. To learn more about the details of our service level agreement, please visit the Service Level Agreement page.