Speech Services

Add smart API capabilities to enable contextual interactions

Speech Services pricing

Speech Services

*The following prices are tax-inclusive.
| Instance | Category | Features | Price |
| --- | --- | --- | --- |
| Free - Web, 1 concurrent request¹ | Speech to Text | Standard | 5 audio hours free per month |
| | | Custom | 5 audio hours free per month; endpoint hosting: 1 model free per month² |
| | Text to Speech | Standard | 5M characters free per month |
| | | Neural | 0.5M characters free per month |
| | Speech Translation | Standard | 5 audio hours free per month |
| Standard - Web, 20 concurrent requests¹ | Speech to Text | Standard | ¥3 per audio hour |
| | | Custom | ¥4.452 per audio hour; endpoint hosting: ¥0.547 per model per hour |
| | Text to Speech | Standard | ¥9.9 per 1M characters |
| | | Neural | ¥101.76 per 1M characters |
| | Speech Translation | Standard | ¥25.5 per audio hour |
| Instance | Category | Features | Price |
| --- | --- | --- | --- |
| Free - Web, 1 concurrent request¹ | Speech to Text | Standard | 5 audio hours free per month |
| | | Custom | 5 audio hours free per month; endpoint hosting: 1 model free per month² |
| | Text to Speech | Standard | 5M characters free per month |
| | | Neural | 0.5M characters free per month |
| | Speech Translation | Standard | 5 audio hours free per month |
| Standard - Web, 20 concurrent requests¹ | Speech to Text | Standard | ¥3 per audio hour |
| | | Custom | ¥4.452 per audio hour; endpoint hosting: ¥0.547 per model per hour; Custom Neural Training: ¥529.152 per hour; Custom Neural Long Audio: ¥1017.6 per 1M characters |
| | Text to Speech | Standard | ¥9.9 per 1M characters |
| | | Neural | ¥101.76 per 1M characters |
| | Speech Translation | Standard | ¥25.5 per audio hour |
¹ The concurrent request limit applies to web endpoints only.

² Unused models will be automatically decommissioned after 7 days.

Commitment Tiers

| Instance | Category | Features | Price (per month) | Overage |
| --- | --- | --- | --- | --- |
| Azure - Standard | Text to Speech | Neural¹ | ¥6512.64 for 80M characters | ¥81.41 per 1M characters |
| | | | ¥26457.6 for 400M characters | ¥66.14 per 1M characters |
| | | | ¥101760 for 2,000M characters | ¥50.88 per 1M characters |
| Connected container - Standard | Text to Speech | Neural¹ | ¥6187.01 for 80M characters | ¥77.34 per 1M characters |
| | | | ¥25134.72 for 400M characters | ¥62.84 per 1M characters |
| | | | ¥96672 for 2,000M characters | ¥48.34 per 1M characters |

¹ Real-time synthesis only; this does not include long audio creation.
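
As a rough illustration of how a commitment tier bill might be estimated, the Python sketch below adds the fixed monthly price to an overage charge for characters beyond the included amount. The class and function names are hypothetical, and the assumption that overage is charged linearly per 1M characters above the allowance is ours; the page itself only lists the prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitmentTier:
    monthly_fee: float          # CNY, fixed price per month
    included_millions: float    # characters included, in millions
    overage_per_million: float  # CNY per 1M characters beyond the allowance


# Azure - Standard, Text to Speech Neural tiers as listed in the table.
AZURE_STANDARD_TIERS = [
    CommitmentTier(6512.64, 80, 81.41),
    CommitmentTier(26457.6, 400, 66.14),
    CommitmentTier(101760.0, 2000, 50.88),
]


def monthly_cost(tier: CommitmentTier, used_millions: float) -> float:
    """Fixed fee plus (assumed linear) overage beyond the included characters."""
    overage = max(0.0, used_millions - tier.included_millions)
    return round(tier.monthly_fee + overage * tier.overage_per_million, 2)


# 100M characters on the 80M tier: 6512.64 + 20 * 81.41
print(monthly_cost(AZURE_STANDARD_TIERS[0], 100))  # 8140.84
```

Usage at or below the included amount costs only the fixed fee under this model.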

Face

The Face API uses state-of-the-art cloud-based face algorithms to detect and recognize human faces in images. Capabilities include face detection, face verification, and face grouping, which organizes faces into groups based on their visual similarity.

*The following prices are tax-inclusive.
| Tier | Feature | Price |
| --- | --- | --- |
| Face API – free | Up to 20 transactions per minute | 30,000 transactions free per month |
| Face API – standard | Up to 10 transactions per second | 0-1,000,000 transactions: ¥6.36 per 1,000 transactions |
| | | 1,000,001-5,000,000 transactions: ¥5.09 per 1,000 transactions |
| | | 5,000,001-100,000,000 transactions: ¥3.82 per 1,000 transactions |
| | | Over 100,000,000 transactions: ¥2.54 per 1,000 transactions |
| Face Storage | Stores images sized up to 4 MB each | ¥1.59 per month per 1,000 images |
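
The volume bands for the Face API standard tier can be turned into a monthly estimate. The Python sketch below assumes graduated pricing, meaning each band of transactions is billed at that band's own rate; the page does not state whether bands are graduated or applied to the whole volume, so treat this as one possible reading. The function name `graduated_cost` is hypothetical.

```python
# (upper bound of the band in transactions, CNY per 1,000 transactions)
FACE_STANDARD_BANDS = [
    (1_000_000, 6.36),
    (5_000_000, 5.09),
    (100_000_000, 3.82),
    (float("inf"), 2.54),
]


def graduated_cost(transactions: int, bands=FACE_STANDARD_BANDS) -> float:
    """Bill each band of usage at that band's rate (assumed graduated pricing)."""
    cost = 0.0
    lower = 0
    for upper, rate_per_thousand in bands:
        if transactions <= lower:
            break
        # Transactions that fall inside the current band.
        in_band = min(transactions, upper) - lower
        cost += in_band / 1000 * rate_per_thousand
        lower = upper
    return round(cost, 2)


# 2,000,000 transactions: 1,000,000 at ¥6.36 plus 1,000,000 at ¥5.09 per 1,000
print(graduated_cost(2_000_000))  # 11450.0
```

The same function works for the other banded tables on this page if a different `bands` list is passed in.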

Computer Vision

This state-of-the-art, cloud-based API gives developers access to advanced algorithms that extract rich information from images to categorize and process visual data. Capabilities include image analysis, tagging, celebrity recognition, text extraction, and smart thumbnail generation.

*The following prices are tax-inclusive.
| Tier | Features | Pricing |
| --- | --- | --- |
| Computer Vision API - free | | 5,000 transactions free per month |
| S1 - Web/Container | Tag, Face, Get Thumbnail, Color, and Image Types | 0-1 million transactions: ¥6.36 per 1,000 transactions |
| | | 1 million-5 million transactions: ¥5.088 per 1,000 transactions |
| | | 5 million+ transactions: ¥4.134 per 1,000 transactions |
| S2 - Web/Container | OCR (printed), Adult, Celebrity, and Landmark | 0-1 million transactions: ¥9.54 per 1,000 transactions |
| | | 1 million-5 million transactions: ¥6.36 per 1,000 transactions |
| | | 5 million+ transactions: ¥4.13 per 1,000 transactions |
| S3 - Web/Container | Describe and OCR (handwriting) | ¥15.90 per 1,000 transactions |
| Spatial analysis (preview) | | Free: 750 hours for one camera per month |
| | | Additional: ¥0.07314 per 1 Video Stream Edge Hour |

Customers are charged per transaction, not per API call. Learn more about what transactions are below.

* Products in Preview

+ Non-English languages are in Preview

Content Moderator

Content Moderator enhances your ability to detect potentially offensive or unwanted images through machine-learning-based classifiers, custom blacklists, and optical character recognition (OCR). It helps you detect potential profanity in more than 100 languages and match text against your custom lists automatically. Content Moderator also checks for possible personally identifiable information (PII). Each Text API call can contain up to 1,024 characters. You can scan images (minimum 128 pixels, maximum 4 MB) for adult and racy content, run OCR, and detect faces. You can also match against custom image lists. Each API call is a transaction.

*The following prices are tax-inclusive.
| Instance | Transactions per second (TPS) | Features | Price |
| --- | --- | --- | --- |
| Free | 1 TPS | Moderate | 5,000 transactions free per month |
| | 1 TPS | Review | N/A |
| Standard | 10 TPS | Moderate | 0-1M transactions: ¥10.18 per 1,000 transactions |
| | | | 1M-5M transactions: ¥7.63 per 1,000 transactions |
| | | | 5M-10M transactions: ¥6.11 per 1,000 transactions |
| | | | 10M+ transactions: ¥4.07 per 1,000 transactions |

Language Service

Language Service API is a cloud-based service that provides advanced natural language processing over raw text and includes three main functions: sentiment analysis, key phrase extraction, and language detection.

*The following prices are tax-inclusive.
| Instance | Features | Inferencing (per 1,000 text records) |
| --- | --- | --- |
| Free - Web | Sentiment Analysis; Key Phrase Extraction; Language Detection; Entity Extraction; Document summarization (Extractive) – Preview; Conversational language understanding | 5,000 transactions free per month |
| Standard (up to 100 requests per second and 1,000 requests per minute) | Sentiment Analysis; Key Phrase Extraction; Language Detection; Entity Extraction; Document summarization (Extractive) – Preview | 0-500,000 text records: ¥10.176 per 1,000 text records |
| | | 0.5M-2.5M text records: ¥7.632 per 1,000 text records |
| | | 2.5M-10.0M text records: ¥3.053 per 1,000 text records |
| | | 10M+ text records: ¥2.54 per 1,000 text records |
| | | ¥20.352 per 1,000 text records |
| | Conversational language understanding | ¥21.56 |

Translator Text

Translator Text API is a cloud-based machine translation service supporting multiple languages, reaching more than 95% of the world's gross domestic product (GDP). Use Translator to build applications, websites, tools, or any solution requiring multi-language support.

*The following prices are tax-inclusive.
| Instance | Features | Price |
| --- | --- | --- |
| Free | Text Translation; Language Detection; Bilingual Dictionary; Transliteration | 2M chars free per month |
| S1 | Text Translation; Language Detection; Bilingual Dictionary; Transliteration | ¥102 per million chars |
| S2 | Text Translation; Language Detection; Bilingual Dictionary; Transliteration | ¥20,925 per month for up to 250M chars per month; overage: ¥84 per million chars |
| S3 | Text Translation; Language Detection; Bilingual Dictionary; Transliteration | ¥61,070 per month for up to 1B chars per month; overage: ¥61 per million chars |
| S4 | Text Translation; Language Detection; Bilingual Dictionary; Transliteration | ¥457,932 per month for up to 10B chars per month; overage: ¥46 per million chars |
| D3 (variable cost plus fixed plus overage) | Document Translation | ¥61,817 per month; 675M chars per month included; overage: ¥10.1124 per million chars |
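
To compare the paid text translation plans for an expected monthly volume, something like the following Python sketch could be used. It treats S1 as pure pay-as-you-go with no fixed fee and assumes overage is charged only on characters above each plan's included amount; both are our reading of the table, and the helper names are hypothetical.

```python
# (plan, fixed monthly fee in CNY, included chars in millions, CNY per extra 1M chars)
TRANSLATOR_PLANS = [
    ("S1", 0.0, 0, 102.0),
    ("S2", 20_925.0, 250, 84.0),
    ("S3", 61_070.0, 1_000, 61.0),
    ("S4", 457_932.0, 10_000, 46.0),
]


def plan_cost(fee: float, included_m: float, overage_per_m: float, used_m: float) -> float:
    """Fixed fee plus overage on the volume above the included characters."""
    return fee + max(0.0, used_m - included_m) * overage_per_m


def cheapest_plan(used_millions: float) -> tuple[str, float]:
    """Return the plan name and cost that minimize the monthly bill."""
    costs = {
        name: plan_cost(fee, included, overage, used_millions)
        for name, fee, included, overage in TRANSLATOR_PLANS
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


print(cheapest_plan(100))  # ('S1', 10200.0)
print(cheapest_plan(300))  # ('S2', 25125.0)
```

Under these assumptions S2 becomes cheaper than S1 once monthly volume passes roughly 205M characters (20,925 / 102).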

Language Understanding

Language Understanding (LUIS) offers a fast and effective way of adding language understanding to applications. With LUIS, you can use pre-existing, world-class, pre-built models whenever they suit your purposes. When you need specialized models, LUIS guides you through the process of quickly building them.

*The following prices are tax-inclusive.
| Instance | Transactions per second (TPS)¹ | Features | Price |
| --- | --- | --- | --- |
| Free² - Web | 5 TPS | Text Requests | 10,000 transactions* free per month |
| Standard - Web | 50 TPS | Text Requests | ¥15.26 per 1,000 transactions per month* |

¹ TPS applies to the web endpoint only.

² The Free tier includes only text as an input.

* Dispatch will do two text transactions per request.

Training

| Instance | Feature | Training |
| --- | --- | --- |
| Free - Web | Conversational language understanding | Standard training: free; advanced training: up to 1 hour free |
| Standard (S) - Web | Conversational language understanding | Standard training: free; advanced training: ¥32.3 per hour |

FAQ

Face

  • What is Face Storage? What can it be used for?

    Face Storage allows a subscription to store additional persisted faces when using person objects and face lists for identification, or for similarity matching with the Face API.

  • How is the number of stored faces calculated for billing if it varies throughout the month?

    Stored images are charged at ¥1.59 per 1,000 faces per month. This rate is prorated on a daily basis. For example, if your account used 10,000 persisted faces each day for the first half of a 31-day month and none during the second half, you would be billed only for the days on which the 10,000 faces were stored. This would be calculated as (¥1.59/1,000) × (10,000 × 15 + 0 × 16)/31 = ¥7.69. Alternatively, if you retained 1,000 faces for a few hours each day during a month and then deleted them every night, you would still be billed for the 1,000 persisted faces each day.

  • What is the capacity of Face Storage?

    The quota for stored person groups is currently 1,000. Each person group or FaceList can have a maximum of 1,000 people.
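
The daily proration described in the Face Storage billing answer above can be reproduced with a few lines of Python. This is a minimal sketch: the function name and the idea of passing one face count per day are illustrative choices, not part of the service.

```python
RATE_PER_THOUSAND_FACES = 1.59  # CNY per 1,000 stored faces per month


def face_storage_charge(daily_face_counts: list[int]) -> float:
    """Prorate the monthly rate over the days of the billing month.

    daily_face_counts holds one entry per day of the month: the number
    of persisted faces counted for that day.
    """
    days_in_month = len(daily_face_counts)
    face_days = sum(daily_face_counts)
    return round(RATE_PER_THOUSAND_FACES / 1000 * face_days / days_in_month, 2)


# 10,000 faces for the first 15 days of a 31-day month, none afterwards.
example = [10_000] * 15 + [0] * 16
print(face_storage_charge(example))  # 7.69
```

Holding 1,000 faces on every day of the month yields the full ¥1.59, which matches the listed monthly rate.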

Computer Vision

  • What operations can be completed with Computer Vision API?

    Tag - The Computer Vision API returns tags based on more than 2,000 recognizable objects, living beings, types of scenery, and actions. If tags are ambiguous or unusual, the API response will provide 'hints' to clarify the meaning of the tag.

    Face - Detects human faces within a picture.

    GetThumbnail - GetThumbnail generates high quality thumbnails after images are uploaded. The Computer Vision API algorithm analyzes objects within images, then crops images according to the requirements for the region of interest (ROI).

    Color - The Computer Vision algorithm extracts colors from an image. The colors are analyzed in three different contexts (foreground, background, and whole). They are grouped into 12 dominant accent colors.

    Image Type - The Computer Vision API can set a Boolean flag to indicate whether an image is black and white or color. It can use the same method to indicate whether an image is a line drawing. It can also indicate whether an image is clip art, along with its quality.

    OCR - Optical Character Recognition (OCR) technology detects text content in an image and extracts the identified text into a machine-readable character stream. You can use the results for searches and numerous other purposes, from medical records to security and banking. It automatically detects the language. OCR saves time and provides convenience for users by allowing them to simply take photos of text instead of transcribing it. Please refer to the Computer Vision Documentation page for supported languages.

    Adult - Apply adult/racy settings to automatically restrict adult content in images.

    Celebrities - Azure’s celebrity recognition model can recognize 200,000 celebrities from business, politics, sports, and entertainment around the world.

Text Analytics

  • How does billing for the Text Analytics API work?

    The Text Analytics API can be purchased in units of the S0-S4 tier at a fixed price. Each unit of a tier comes with included quantities of API transactions. If the user exceeds the included quantities, overages are charged at the rate specified in the pricing table above. These overages are prorated and the service is billed on a monthly basis. The included quantities in a tier are reset each month. In the S tier, the service is billed for only the amount of Text Records submitted to the service.

  • What happens if I exceed the transaction limit on my free tier for Text Analytics?

    Usage is throttled if the transaction limit is reached on the Free tier. Customers cannot accrue overages on the free tier.

  • What constitutes a transaction in the S0-S4 tiers of the Text Analytics API?

    Any annotation to a document counts as a transaction. Batch scoring calls also take into account the number of documents scored in that call. For instance, if 1,000 documents are sent for sentiment analysis in a single API call, that counts as 1,000 transactions. If an API call performs more than one annotation operation, each operation is counted as well. For example, an API call that performs both sentiment analysis and key-phrase extraction on 1,000 documents counts as 2,000 transactions (2 annotations × 1,000 documents).

  • What happens if I exceed the transaction limit on the S0-S4 tier?

    If the usage on the S0-S4 tier is exceeded, the account starts to accrue overages. These overages are billed on a monthly basis and are calculated at the rate specified for each tier.

  • Can I change the tier of service I subscribed to?

    You may upgrade to a higher tier at any time. Billing rate and included quantities corresponding to the higher tier will begin immediately.

  • What constitutes a Text Record in the S Tier?

    A text record in the S tier contains up to 1,000 characters as measured by String.Length. If an input document into the text analytics API is more than 1,000 characters, it counts as one text record for each unit of 1,000 characters. For instance, if an input document sent to the API contains 7,500 characters, it would count as 8 text records. If an input document sent to the API contains 500 characters, it would count as 1 text record. If two documents are submitted, one document of 500 characters and one document of 1,200 characters, then the service would be billed for three text records in total: one record for the 500 character document and two text records for the 1,200 character document.
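
The counting rules in the Text Analytics answers above reduce to two small formulas, sketched here in Python. The function names are hypothetical. Because the page measures length by .NET `String.Length`, which counts UTF-16 code units, the sketch counts UTF-16 code units rather than using Python's `len`, which counts code points.

```python
import math

RECORD_SIZE = 1000  # characters per text record in the S tier


def utf16_length(document: str) -> int:
    """Length in UTF-16 code units, mirroring .NET String.Length."""
    return len(document.encode("utf-16-le")) // 2


def text_records(document: str) -> int:
    """Billable text records: one per started unit of 1,000 characters."""
    return math.ceil(utf16_length(document) / RECORD_SIZE)


def transactions(num_documents: int, num_annotations: int) -> int:
    """S0-S4 tiers: every annotation on every document is one transaction."""
    return num_documents * num_annotations


print(text_records("a" * 7500))                            # 8
print(text_records("a" * 500) + text_records("a" * 1200))  # 3
print(transactions(1000, 2))                               # 2000
```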

Translator Text

  • How do I calculate monthly volume?

    For the Microsoft Translator Text API, the volume you are billed for is the number of characters in the input. Every Unicode code point counts as a character. Every character of the input counts. Each translation of a text to a new language counts as a separate translation. The number of queries, words, bytes, or sentences is irrelevant.

    To estimate your monthly volume, take the total characters to translate, multiply it by the number of languages you want to have it translated into, then take the number and spread it over the maximum number of hours or days you are able to wait for completion.

    More information on how we count characters for the Translator Text API can be found in our documentation.

  • What happens if I reach the limit of the free subscription plan?

    If you subscribe to the free subscription plan, the Microsoft Translator service will stop if you reach 2 million characters during a subscription month for the Text Translation API. The Microsoft Translator service will start again at the beginning of your next subscription month or when you change your subscription to a paid plan.

  • What languages does Microsoft Translator support?

    See the language list for text translation using the Microsoft Translator Text API.

    Developer-oriented language lists, including language codes, can be found in our documentation.

  • Can I customize my translations?

    Customization currently is not available with subscriptions on Azure.cn.
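
The volume rule from the first Translator Text answer above (every Unicode code point of the input counts, once per target language) can be sketched in Python as follows. The function names are hypothetical, and spreading the total evenly across days is just one way to read "spread it over" the time available.

```python
def billable_characters(text: str, target_languages: int) -> int:
    """Characters billed: each code point of the input, once per target language.

    Python's len() on a str counts Unicode code points, which matches
    the counting rule stated on this page.
    """
    return len(text) * target_languages


def volume_per_day(total_characters: int, target_languages: int, days: int) -> float:
    """Total billable volume spread evenly over the days available."""
    return total_characters * target_languages / days


print(billable_characters("Hello, world", 3))  # 36
print(volume_per_day(60_000_000, 2, 30))       # 4000000.0
```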

Language Understanding

  • What is a transaction?

    For text requests, a transaction is an API call with query length up to 500 characters.

    For speech requests, a transaction is an utterance up to 15 seconds long.

  • Are speech requests included in the free tier?

    No, the free tier only includes text requests with a maximum length of 500 characters.

  • What is a Dispatch?

    Dispatch is a feature that enables processing two models/applications with one API call.

Speech Services

  • How does billing work?

    For Speech Translation and Speech to Text: usage is billed in one-second increments.

    For Text to Speech: usage is billed per character.

    Please refer to the pricing note for details on how SSML tags and Chinese, Japanese, and Korean (CJK) characters are charged.
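
To make the Speech Services billing units above concrete, the Python sketch below converts audio duration and character counts into charges using the Standard tier rates from the pricing table. Rounding a partial second up to the next whole second is an assumption; the page only says usage is billed in one-second increments. The function names are hypothetical, and SSML or CJK character rules are not modeled.

```python
import math

STT_STANDARD_PER_HOUR = 3.0     # CNY per audio hour, Speech to Text Standard
TTS_NEURAL_PER_MILLION = 101.76  # CNY per 1M characters, Text to Speech Neural


def speech_cost(duration_seconds: float, rate_per_hour: float = STT_STANDARD_PER_HOUR) -> float:
    """Bill audio in one-second increments at an hourly rate.

    Partial seconds are rounded up here, which is an assumption.
    """
    billed_seconds = math.ceil(duration_seconds)
    return billed_seconds * rate_per_hour / 3600


def tts_cost(characters: int, rate_per_million: float = TTS_NEURAL_PER_MILLION) -> float:
    """Bill synthesized text per character at a per-million rate."""
    return characters / 1_000_000 * rate_per_million


print(speech_cost(3600))    # 3.0
print(tts_cost(1_000_000))  # 101.76
```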

Support & SLA

If you have any questions or need help, please visit Azure Support and select self-help service or any other method to contact us for support.

We guarantee that Cognitive Services running at the Standard tier will be available at least 99.9% of the time. No SLA is provided for the Free tier. If you want to learn more about the details of our service level agreement, please visit the Service Level Agreement page.