
Monday, 8 April 2013

SureChem and SciBite in BioInform

BioInform, a newsletter focused on life sciences software and content solutions, has a write-up on our recent partnership with SciBite. We're looking into getting a reprint, but for any subscribers out there, you can read the article here.

Friday, 5 April 2013

SureChemDirect API Outage - 03/04/2013

"Anything that can go wrong, will go wrong" - Murphy’s Law (Edward Aloysius Murphy, Jr.)

No matter how well-built software is, at some point something will go wrong. First, we would like to apologise to any customers whose work was affected. Second, we would like to explain what happened. We believe that being transparent with our clients is the best way to show them that when we do encounter problems, we learn from them and work to make sure they don't happen again.

Around 14:00 GMT on April 2, SureChem experienced a system outage, which completely shut down our API infrastructure.

Our first step was to try to boot the API back up, but all of our API instances were unresponsive. After careful diagnosis, we found that the API instances had run out of disk space because of an unexpected increase in the size of our (already large) search index.

Because we were synchronising new data onto the API instances when they ran out of disk space, the search index became corrupted. At this point our priority shifted to restoring limited but stable service, so that our clients could use our services again.

As a result, SureChemOpen and SureChemDirect experienced a service interruption of about 3 hours.

During and after the event, we assessed what had taken place. The main problem was our failure to cope with unexpected growth in the data from one of our third-party vendors. As a result, we have reviewed our architecture and put some changes in place to ensure this doesn't happen again:

  • Data synchronisation was improved so that if it fails to copy the entire search index to one of the API instances, it won't try to copy it to any of the others
  • We’ve reviewed our storage architecture and are working to improve it and make it more resilient
  • We contacted our SureChemDirect customers directly about this outage, and thankfully no one was too adversely affected. We hope this was the case for you as well; if not, please once again accept our apologies if your work was interrupted by this event
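To make the first change above a little more concrete, here is a minimal sketch of a "stop at the first failure" sync loop. It is an illustration only, not our actual synchronisation code: the instance names, the `copy_index` callable and the `SyncAborted` exception are all hypothetical, and we assume `copy_index` returns `True` only when the entire index has been copied to that instance.

```python
from typing import Callable, Iterable, List


class SyncAborted(Exception):
    """Hypothetical error raised when copying the index to one instance fails."""

    def __init__(self, failed: str, completed: List[str]):
        super().__init__(f"sync to {failed} failed after {len(completed)} instance(s)")
        self.failed = failed        # the instance where the copy failed
        self.completed = completed  # instances that already received the full index


def sync_index(instances: Iterable[str], copy_index: Callable[[str], bool]) -> List[str]:
    """Copy the search index to each instance in turn.

    copy_index(instance) is an assumed callable that returns True only if the
    *entire* index was copied to that instance. On the first failure we stop,
    so the remaining instances are never touched and keep their existing index.
    """
    completed: List[str] = []
    for instance in instances:
        if not copy_index(instance):
            raise SyncAborted(instance, completed)
        completed.append(instance)
    return completed


if __name__ == "__main__":
    # Simulate a full disk on the second of three hypothetical instances.
    def fake_copy(instance: str) -> bool:
        return instance != "api-2"

    try:
        sync_index(["api-1", "api-2", "api-3"], fake_copy)
    except SyncAborted as err:
        print(err.failed, err.completed)  # prints: api-2 ['api-1']
```

The point of the design is that a failure on one instance leaves every later instance untouched, so at most one instance ends up with a partial copy rather than all of them.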

Comments are welcome!

Tuesday, 2 April 2013

Update to our PubChem deposition - another 0.5M structures unique to SureChem

We recently updated our structure data deposited with PubChem, our first update since the initial deposition last December. Overall, the number of SureChem structures increased by just under 1 million, to 9.3 million. This is a fairly large increase for a three-month period, and is due in part to our adding structures extracted from images for the years 2007-2011; the original deposition only included image-extracted structures for 2012.

What's most interesting, and gratifying, about this deposition is the relatively high proportion of structures unique to the SureChem corpus: out of 979,214 newly added structures, more than 50% (508,155) were solely from SureChem. This should be good news for open drug discovery advocates.
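For anyone who wants to check the "more than 50%" figure, here is the arithmetic using only the two counts quoted above (the variable names are just for illustration):

```python
# Counts taken from this post's PubChem deposition update.
new_structures = 979_214      # structures newly added in this update
unique_to_surechem = 508_155  # of those, structures found only in SureChem

share = unique_to_surechem / new_structures
print(f"{share:.1%}")  # prints: 51.9%
```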

The update also led to modest dips in the number of structures unique to each of the other sources, which of course makes sense, as there will always be overlap. The dip is most noticeable for SCRIPDB, since it consists mainly of structure data from the USPTO complex work units. We are also gradually processing the complex work unit data, though we are filtering out many of the erroneous structures that unfortunately characterise that particular data source.

With the latest SureChem update, the number of structures from patents in PubChem now stands at more than 15 million.

For the truly curious, here is a comparison of the various patent chemistry sources in PubChem before and after our latest update, courtesy of the ever-industrious Chris Southan:

Wednesday, 27 March 2013

Digital Science collaboration with SciBite

Digital Science recently entered into a strategic partnership with SciBite. This opens up some interesting possibilities for SureChem in terms of expanding beyond patents and chemistry to other scientific full text sources and into biology. For more info, check out the post on the Digital Science website.

Monday, 4 March 2013

II-SDV Conference, 14-16 April 2013, Nice

Nicko and I will be exhibiting at the International Information Conference on Search, Data Mining and Visualization (II-SDV) from 14th to 16th April in Nice, France. 

Come by to learn about the newly released SureChemDirect: an API with a Pipeline Pilot collection and a Data Feed product and check out our free SureChemOpen patent chemistry search tool.

I will be giving a short product review on Monday 15th after lunch, so you can find out what's new at SureChem.

Click here for details of the program.

We look forward to seeing you there!


Wednesday, 16 January 2013

ReadCube partners with Open Access publisher Frontiers

Our friends over at ReadCube are expanding their reach. A new partnership with Open Access publisher Frontiers will enable readers to access Frontiers content via ReadCube's web reader. For more info, check out the Digital Science blog post.

Thursday, 13 December 2012

PubChem deposition 'slice and dice'

Patent chemistry maven Chris Southan has expanded his analysis of the SureChem deposition into PubChem. Anyone interested in how the SureChem data in PubChem compares with other patent chemistry sources, as well as with non-patent sources, should head over to his latest blog post.