Arkestro

Arkestro Product Status

TinyMCE Message Disruption

Tuesday, November 28, 2023

Impact

From 02:42 UTC to 03:29 UTC on November 28, 2023, Arkestro experienced disruptions within the main Arkestro platform due to a failure in a third-party plugin (TinyMCE). The disruption caused a warning message to appear in various locations throughout the platform that read “This domain is not registered with Tiny Cloud. To continue using TinyMCE, a registered domain is required, starting 2024.” Users could not continue working until they closed the alert message.

Duration (47 minutes)

All times in UTC

02:42: Warning message appears in various locations on the Arkestro platform
02:52: Arkestro team identifies potential cause of the disruption
03:10: Fix is implemented
03:29: Fix is confirmed and disruptive alert messages are no longer present in production

Workarounds

During the period of disruption, users could close the TinyMCE alert messages and continue using the platform.

What are we changing?

We are expanding and improving our monitoring of third-party tooling to prevent future disruptions.

Closing

Arkestro identified an issue and implemented a fix within an hour. We are adding more monitoring of our third party tooling to prevent similar issues in the future.

We understand that when incidents happen, proactivity is critical to helping all of our customers make the best buying decisions faster. We strive to continuously improve our processes to help our customers achieve their best procurement cycles.

Elevated Sigma Load Times for AWS US

Wednesday, November 8, 2023

Impact

Sigma, a third-party visualization tool, experienced an AWS outage from 20:39 UTC to 21:03 UTC on November 8, 2023. During these 24 minutes, the Arkestro Insights product experienced elevated Sigma load times for AWS US due to the failure in Sigma. This outage was limited to the Insights product; other products, including the Procurement Execution Platform (PEP), were unaffected.

Duration (24 minutes)

All times in UTC

20:39: Sigma reported an outage https://status.sigmacomputing.com/incidents/db5ydck3zcxy
20:46: Outage reported
21:03: Confirmed impacts resolved (we are back in service here)
21:57: Sigma resolved incident
22:00: Incident resolved

Workarounds

None

What are we changing?

We are providing more routine training and improving how we monitor application status and communicate outages to our customers, so that we can provide faster updates.

Closing

We understand that when incidents happen, proactivity is critical to helping all of our customers make the best buying decisions faster. We strive to continuously improve our processes to help our customers achieve their best procurement cycles.

Connectivity (DNS) Outage

Wednesday, November 8, 2023

Impact

Our third-party hosting provider, Heroku, experienced a Domain Name System (DNS) outage on November 8, 2023. From 15:44 UTC to 15:55 UTC, this caused a full 11-minute loss of connectivity to all Arkestro application products, including Arkestro Predictive Procurement Orchestration and Arkestro Insights. www.arkestro.com was not impacted.

Duration (11 minutes)

All times in UTC

14:19: Heroku reported an outage https://status.heroku.com/incidents/2603
15:44: Outage reported from 1 location
15:47: Confirmed multiple locations
15:55: Confirmed impacts resolved (we are back in service here)
16:00: Heroku DNS issue identified (see above)
16:27: Heroku update: still waiting on their third-party DNS provider
16:58: Heroku confirms the 3rd party applied a fix – monitoring
17:04: Heroku resolves incident
17:05: Incident resolved

Workarounds

None

What are we changing?

We will be migrating our third-party hosting, including DNS, from Heroku to AWS in the first half of 2024. We will also configure much of our DNS to be cached for longer than 30 minutes, both to improve performance and to reduce the impact of incidents like this one.

Closing

We understand that when incidents happen, proactivity is critical to helping all of our customers make the best buying decisions faster. We strive to continuously improve our processes to help our customers achieve their best procurement cycles.

Bid Request Submission Outage

Wednesday, November 1, 2023

Impact

From 02:04 UTC to 14:42 UTC on Wednesday, November 1, 2023, supplier users were unable to submit bid responses in Arkestro. This outage was limited to Arkestro for Suppliers. Sourcing and other products, such as Arkestro Insights and Arkestro Sourcing for Buyers, were not impacted.

Duration (12 hours 38 minutes)

All times in UTC

02:04: Feature Enhanced Request Scheduler enabled in Arkestro Production
13:06: First customer reported outage
14:42: Feature Enhanced Request Scheduler disabled in Arkestro Production
15:28: Root cause analysis completed, remediation steps taken
17:00: Remediation tested and passed
21:13: Messaging posted internally
22:00: Messaging posted externally

Workarounds

None

What are we changing?

We are improving how we use our release management software, providing more routine training on it, and embedding safeguards into our quality assurance processes to avoid future outages of this significance.

Closing

We understand that when incidents happen, proactivity is critical to helping all of our customers make the best buying decisions faster. We strive to continuously improve our processes to help our customers achieve their best procurement cycles.

Sigma/Snowflake SaaS Outage

Tuesday, October 3, 2023

Impact

From 15:02 UTC to 18:54 UTC on Tuesday, October 3, 2023, Arkestro experienced elevated error rates in our Arkestro Insights product due to failures at two third-party vendors (Sigma and Snowflake). This outage was limited to the Insights product; other products, including the Procurement Execution Platform (PEP), were unaffected, and no sourcing events were affected.

Duration (3 hours 52 minutes)

All times in UTC

15:02: Elevated error rates reported for Arkestro Insights
16:41: Sigma (a SaaS vendor used by Arkestro Insights) publicly acknowledges an outage on its status page
17:43: Sigma posts update 1 (Researching a fix)
17:45: Our Sigma representative confirms that a patch is live but requires a Snowflake rollback in our account to take effect
17:49: Contacted Snowflake to initiate a rollback of their changes in the Arkestro account
18:28: Snowflake rollback complete and Arkestro Insights is functioning normally
18:45: Sigma posts update 2 (Waiting on Snowflake to rollback a change)
18:54: Sigma confirms that the recovery is complete and stable

Workarounds

None

What are we changing?

We are providing more routine training and improving on how we monitor application status and communicate outages to our customers.

Closing

We understand that when incidents happen, proactivity is critical to helping all of our customers make the best buying decisions faster. We strive to continuously improve our processes to help our customers achieve their best procurement cycles.