Inference API down?

While accessing this model (speechbrain/lang-id-voxlingua107-ecapa · Hugging Face) via the Inference API, I am getting the following error:

(MaxRetryError(‘HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/speechbrain/lang-id-voxlingua107-ecapa (Caused by NameResolutionError(“<urllib3.connection.HTTPSConnection object at 0x7f5306290cd0>: Failed to resolve 'huggingface.co' ([Errno -3] Temporary failure in name resolution)”))’), ‘(Request ID: d978f641-257c-45c4-b95b-c51865344dfe)’)

Can someone provide more insight into this error? And how do we solve it?
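
For context on what the message says: a `NameResolutionError` with `[Errno -3] Temporary failure in name resolution` means the DNS lookup for `huggingface.co` failed on whichever machine made that request, before any HTTP exchange took place. One way to rule out a local DNS problem is to check whether your own machine can resolve the host. A minimal sketch using only the Python standard library; the helper name `can_resolve` and its `resolver` parameter are illustrative assumptions, not part of any Hugging Face client:

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address.

    `resolver` is a hypothetical injection point so the check can be
    exercised without network access; by default it is the standard
    library's socket.getaddrinfo.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Raised for lookup failures such as
        # "[Errno -3] Temporary failure in name resolution".
        return False
```

If `can_resolve("huggingface.co")` returns True locally while the API still reports the error, the failed lookup most likely happened somewhere other than your machine.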

I’m also facing issues with Inference API all of a sudden

same here.

We are also facing issues with the Inference Endpoints

Same here, for hours. :sweat:

Thanks for reporting, issue should be fixed now.

Fixed. Thank you. :blush:

@nielsr Does "fixed" mean that it now returns a 500 Internal Server Error? I am currently getting this error with all 3 providers and multiple models.

@nielsr I have debugged the problem further. The endpoints work as long as they are public, both with and without scale-to-zero. If I secure the endpoint and call it without a token, a 401 is returned. So far so good. But if I pass a valid token, I get a 500. Do your integration tests work?
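
The cases described above (no token versus a valid token) can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `probe`, the `opener` parameter, and the endpoint URL are hypothetical, and it assumes the endpoint accepts a bearer token in the `Authorization` header. The HTTP method and payload your endpoint expects may differ.

```python
import urllib.error
import urllib.request


def probe(url, token=None, data=None, opener=urllib.request.urlopen):
    """Send one request to `url` and return the HTTP status code.

    `token` (optional) is sent as a bearer token. `data` (optional bytes)
    makes the request a POST; without it the request is a GET.
    `opener` is a hypothetical injection point for testing offline.
    """
    headers = {}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, data=data, headers=headers)
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status is still useful.
        return err.code
```

Calling `probe(url)` and then `probe(url, token=...)` against the same protected endpoint shows both status codes side by side, which is the 401 versus 500 comparison reported here.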

@nielsr OK, this is really weird now. For 2 hours I got 401s from the UI when creating new endpoints, deleting existing ones (which cost me $12), or even just viewing existing instances. Now the instance is visible again, and the endpoint is working with a token. So I have one last question: are you fixing things in production without customer feedback, and what kind of availability and stability can I expect from dedicated endpoints? Are they ready for production (>99.9% availability)?

Hi,

Yes, they should be ready for production (they aim to make putting ML models in production easier with a few clicks). I appreciate your feedback. I’m not part of the Inference Endpoints team, but I will forward it to them.

The APIs for the evaluate library have also been down for five days.

Just go here and see the runtime errors: evaluate-metric (Evaluate Metric)