Green Solvent Metric on Solvent Predictor

In the spirit of contributing to Peter Murray-Rust’s initiative to collect Green Chemistry information, Andrew Lang and I have added a green solvent metric for 28 of the 72 solvents we include in our Solvent Selector service. The scale represents the combined contributions for Safety, Health and Environment (SHE) as calculated by ETH Zurich.

For example consider the following Ugi reaction solvent selection. Using the default thresholds, 6 solvents are proposed and 5 have SHE values. Assuming there are no additional selection factors, a chemist might start with ethyl acetate with a SHE value of 2.9 rather than acetonitrile with a value of 4.6.

Individual values of Safety, Health and Environment for each solvent are available from the ETH tool. We are just including the sum of the three out of convenience.

Note that the license for using the data from this tool requires citing this reference:
Koller, G., U. Fischer, and K. Hungerbühler, (2000). Assessing Safety, Health and Environmental Impact during Early Process Development. Industrial & Engineering Chemistry Research 39: 960-972.
Posted in Chemistry. Comments Off

The Reaction Attempts Solvent Selector

The ONS Solubility Challenge and the Reaction Attempts project have now been integrated with code written by Andrew Lang to the point that recommendations for solvents are just a click away.

First use the Reaction Attempts Explorer, either with the drop-down menus or with the substructure search, as described previously. When a reaction of interest is identified, just click on the link for “Optimal Solvent Prediction”.
The service will then provide a summary of solubility measurements and predictions, organized by the default criteria of minimum 0.3 M solubility of reactants, maximum 0.03 M solubility of the product and maximum solvent boiling point of 100 C. Liquid reactants (or reactants with melting points within 15 C of room temperature) are excluded since these generally have a high enough solubility in most solvents.
In the case of the Ugi reaction in this example, only the solubility of boc-glycine and the product are considered.

The results are color-coded. In this case 14 solvents are coded green, indicating that all criteria were met. The fifteenth solvent is coded yellow, indicating that one of the criteria was not met – in this case the boiling point of 205 C is outside of the limit of 100 C. High boiling point solvents are not optimal for quickly obtaining the product as a dry solid after filtering. This criterion can be changed in the input fields at the top of the page. It is also possible to change the number of times the product is washed there. This will only change the estimated yield, which is based on carrying out the reaction at the concentration of the least soluble reactant, up to 1 M.
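The color coding described above can be sketched roughly as follows. This is only an illustration of the default criteria quoted in the post, not the code behind the service: the `SolventRow` record, its field names, and the treatment of solvents that miss more than one criterion are all assumptions, and the example values other than the 205 C boiling point are made up.

```python
from dataclasses import dataclass
from typing import List

# Default thresholds quoted in the post.
MIN_REACTANT_SOLUBILITY_M = 0.3
MAX_PRODUCT_SOLUBILITY_M = 0.03
MAX_BOILING_POINT_C = 100.0


@dataclass
class SolventRow:
    """Hypothetical record for one solvent; field names are illustrative."""
    name: str
    reactant_solubilities_m: List[float]  # solid reactants only
    product_solubility_m: float
    boiling_point_c: float


def missed_criteria(row: SolventRow) -> List[str]:
    """List the default criteria that this solvent fails."""
    missed = []
    if any(s < MIN_REACTANT_SOLUBILITY_M for s in row.reactant_solubilities_m):
        missed.append("reactant solubility")
    if row.product_solubility_m > MAX_PRODUCT_SOLUBILITY_M:
        missed.append("product solubility")
    if row.boiling_point_c > MAX_BOILING_POINT_C:
        missed.append("boiling point")
    return missed


def colour_code(row: SolventRow) -> str:
    """Green: all criteria met. Yellow: exactly one missed (as in the post)."""
    count = len(missed_criteria(row))
    if count == 0:
        return "green"
    if count == 1:
        return "yellow"
    return "unspecified"  # the post does not say how these are coloured


# Made-up solubilities; only the 205 C boiling point comes from the post.
example = SolventRow("high-boiling solvent", [0.5], 0.01, 205.0)
print(colour_code(example), missed_criteria(example))
```

Liquid reactants would simply be left out of `reactant_solubilities_m`, mirroring the exclusion described above.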

Three columns are generated for the product and each reactant. The column on the right is the average of all measurements, as recorded in the SolubilitiesSum Spreadsheet. The middle column is a solubility prediction based on Abraham descriptors derived from experimental values, as described and used in the ONS Solubility Challenge book. The column on the left contains predictions from the Abraham001 model, which is based on calculated molecular descriptors only.
The numbers in bold represent the best solubility value available for each solvent. If a measurement is known, that will be the number used. If no measurement is available, the experimental Abraham descriptor model is used. If neither of these is available, the predictions from the Abraham001 model are used by default.
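The order of preference for the bold value can be written as a short fallback chain. This is a minimal sketch of the rule as stated above; the function name and the source labels are hypothetical, and a missing value is assumed to be represented by `None`.

```python
from typing import Optional, Tuple


def best_available_solubility(
    measured: Optional[float],
    abraham_experimental: Optional[float],
    abraham001: Optional[float],
) -> Optional[Tuple[str, float]]:
    """Return (source, value) for the most trusted value that exists.

    Preference order, as described in the post:
    measurement, then the experimental Abraham descriptor model,
    then the Abraham001 model.
    """
    candidates = (
        ("measured", measured),
        ("abraham_experimental", abraham_experimental),
        ("abraham001", abraham001),
    )
    for source, value in candidates:
        if value is not None:
            return source, value
    return None  # nothing available for this solute/solvent pair


print(best_available_solubility(None, 0.2, 0.05))
```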
From the list of solvents in the green section we find ethanol and acetonitrile. Both of these solvents were tried (as mixtures with methanol) in the optimization of this reaction (Bradley et al JoVE 2008) and provided good to intermediate results. THF was found to give low yields for this reaction and it scores at #51 in the yellow section, with a high solubility of the product accounting for the missed criterion.
One should keep in mind that this is just a tool to flag potentially interesting solvents. Common chemical sense needs to be used as well. For example, acetone and butanone are listed in the green section but these are incompatible with the Ugi reaction since they would compete with the aldehyde.
Note that the predictive models are way off in some cases. For example the Abraham001 model dramatically underestimates the solubilities of boc-glycine in the green section, while the measured Abraham descriptor model does much better for these cases. We will prioritize our next solubility measurements to try to improve the models – or at least understand what types of compounds are most likely to yield useful solubility estimates from these models.
In addition to being called from the Reaction Attempts Explorer, the Solvent Selector can be used for any compounds that have ChemSpider IDs. Simply separate the CSIDs with the pipe character:
After modifying the criteria and hitting update, the new criteria are conveniently represented in the URL in this format, making sharing a specific search with anyone easy:
It is even possible to use the service listing just one compound’s CSID – this is useful for quickly comparing the measured solubilities with predictions from both models:

Berkeley Open Science Summit 2010 Notes

I just returned from the Open Science Summit held at Berkeley July 29-31, 2010.

There certainly was an impressive list of presenters as well as attendees. Many of the talks were quite good, although several on the last day were more about closed collaborations than Open Science. During these presentations the assumption that patents are required to exploit discoveries in health care was repeated. This was in sharp contrast to the second day’s session on gene patents, where IP protection was shown to stifle innovation and the exploitation of discoveries.
A refreshing exception to this pattern on the last day was Andrew Hessel’s presentation on the Pink Army Cooperative. Andrew’s strategy to cure cancer is based on the idea of customizing drugs for each individual affected by the disease. Since each drug is only applicable to one individual, the approach of expensive clinical trials doesn’t apply. Since he is not interested in generating a profit from selling the drugs, IP protection also doesn’t apply and allows him to make every part of the drug design process, including genetic analysis, publicly available. It wasn’t clear if such an approach would be legal in the US but he did mention going to another country if necessary. Although he didn’t currently have cancer, he did indicate that he might have need of this technology one day by pulling out a pack of cigarettes in the middle of his talk.
Unfortunately my panel on Open Data was canceled at the last minute due to time management problems (see FF discussion on how it happened). However, I did have a chance to generally catch up with old friends (Carmen Drahl, Joanna Scott, Cameron Neylon, Jack Park).
I also discussed some promising collaborations with several people:
1) CoLab. I spoke at length with DJ Sprouse and Casey Stark about their system for scientific collaboration. We will try to represent one solubility experiment from the ONS Challenge notebook and one organic synthesis experiment from the UsefulChem notebook to see how the information can be represented within CoLab. There may be some opportunities to visualize raw data in new ways – perhaps using non-Java tools to interact with JCAMP-DX spectra.
2) IPzero Principles. I continued a conversation with Lisa Green started with John Wilbanks and Thinh Nguyen at Creative Commons about coming up with a series of simple recommendations for ensuring that an Open Notebook can effectively prevent the patenting of inventions within an area of interest to the Open Science community.
3) Open Chemistry Reactions. I had the chance to discuss our Reaction Attempts database with Peter Murray-Rust over breakfast on Saturday. He also showed me how he is using Oscar to extract chemical reaction information from various documents. Peter suggested that we pool together our data for a demonstration in September at the London Science Online Conference. Reaction Attempts will cover the reactions done in the UsefulChem and the Todd group’s Open Notebooks. Peter will extract information from both patents and Acta Crystallographica.
4) ChemTaverna. I was pleased to learn from Carole Goble that Taverna is extending its coverage to cheminformatics applications with the ChemTaverna project. I had just mentioned that we would be interested in revisiting Taverna for creating virtual libraries of organic compounds and filtering them based on predicted solubilities in various solvents. This would allow us to contribute cheminformatics workflows to MyExperiment. Carole put me in touch with the project leader Peter Li at the University of Manchester.

General Transparent Solubility Prediction using Abraham Descriptors

Making solubility estimations for most organic compounds in a wide range of solvents freely available has always been a main long term objective for the Open Notebook Science Solubility Challenge. With current expertise and technology, it should be as easy to obtain a solubility estimate as it is now to get driving directions off the web.

Obviously this won’t be attained purely by exhaustive measurements, although we have been focused on strategic measurements over the past two years. In parallel, we have been constantly evaluating the various solubility models out there for suitability.
Although there are several solubility models available for non-aqueous solvents, our additional requirement for transparent model building has proved surprisingly difficult to satisfy.
From this search, the Abraham solubility model [Abraham2009] floated to the top, with an important factor being that Abraham has made available extensive compilations of descriptors for solutes and solvents. In addition, the algorithms used to convert solubility measurements to Abraham descriptors (a minimum of 5 different solvents per solute) have allowed us to generate our own Abraham descriptors automatically, simply by recording new measurements into our SolSum Google Spreadsheet. These can be obtained in real time as well.
This approach permitted us to provide predictions for a limited number of solutes in a wide range of solvents and we have included these predictions in the past two editions (2nd and 3rd) of the ONS Challenge Solubility Book.
Coming at the problem from a different approach, Andrew Lang has also been trying to predict solubility using only open molecular descriptors, mainly relying on the CDK. Since our most commonly used solvent has been methanol, Andy recently generated a web service to predict solubility in that solvent.
By combining these two approaches, Andy has now created a modeling system that can not only generally predict solubility in a wide range (70+) of solvents – but it can also provide related data that can be used for modeling other phenomena such as intestinal absorption of a drug or crossing the blood-brain barrier.[Stovall 2007]
The idea is to use a Random Forest approach using freely available descriptors to predict the Abraham descriptors for any solute. A separate service then generates predicted solubilities for a wide range of solvents based on these Abraham descriptors. I’m using the term “freely available” because – although the CDK descriptors and VCCLab services are open – the model requires 2 descriptors only available from ChemSpider (ultimately from ACD/Labs).
Here is an example with benzoic acid. As long as the common name resolves to a single entry on ChemSpider, it is enough to enter it and it automatically populates the rest of the fields, which are then used by the service to generate the Abraham descriptors.
Hitting the prediction link above will automatically populate the second service and generate predicted solubilities for over 70 solvents.
This approach of allowing people to access these components separately can be useful. It can be instructive to manually play with the Abraham descriptors directly to see how predicted solubilities are affected. There are also situations where one has experimentally determined Abraham descriptors for a solute and bypassing the descriptor prediction step is required.
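To play with the descriptors by hand, it helps to see the arithmetic. The sketch below uses the commonly published form of the Abraham equation, log10(S_solvent / S_water) = c + e·E + s·S + a·A + b·B + v·V, where E, S, A, B and V are the solute descriptors and c, e, s, a, b and v are solvent coefficients. The post does not spell out the equation, so treat this form as an assumption about what the second service does; the class names are hypothetical and the numbers are placeholders, not real descriptors or coefficients.

```python
import math
from dataclasses import dataclass


@dataclass
class SoluteDescriptors:
    """Abraham solute descriptors (placeholder container)."""
    E: float
    S: float
    A: float
    B: float
    V: float


@dataclass
class SolventCoefficients:
    """Abraham coefficients for one solvent (placeholder container)."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_solubility_ratio(solute: SoluteDescriptors, solvent: SolventCoefficients) -> float:
    """log10 of (solubility in solvent / solubility in water)."""
    return (
        solvent.c
        + solvent.e * solute.E
        + solvent.s * solute.S
        + solvent.a * solute.A
        + solvent.b * solute.B
        + solvent.v * solute.V
    )


def predicted_solubility_m(
    solute: SoluteDescriptors,
    solvent: SolventCoefficients,
    water_solubility_m: float,
) -> float:
    """Predicted molar solubility in the solvent, given the aqueous solubility."""
    return water_solubility_m * 10 ** log_solubility_ratio(solute, solvent)


# Placeholder numbers chosen only to make the arithmetic easy to follow.
toy_solute = SoluteDescriptors(E=0.0, S=0.0, A=0.0, B=0.0, V=2.0)
toy_solvent = SolventCoefficients(c=0.1, e=0.0, s=0.0, a=0.0, b=0.0, v=1.0)
print(predicted_solubility_m(toy_solute, toy_solvent, water_solubility_m=0.1))
```

Changing one descriptor at a time and watching the output is the kind of manual exploration described above.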
However, for those who prefer to cut to the chase, a convenient web service is available where the common name (or SMILES) of the solute is entered and the list of available solvents appears as a drop down menu.
Now here is where I think the real payoff comes for accelerating science with openness. Andy has also created a web service that returns the predicted solubility in molar as a number from common names (or SMILES) for solute and solvent via the URL. For example click this for benzoic acid in methanol. The advantage here is that solubility prediction can be easily integrated as a web service call from intuitive interfaces such as a Google Spreadsheet to enable even non-programmers to make use of the data. Notice that the web service provided in the fourth column for the average of measured solubility values enables an easy way to explore the accuracy of specific predictions.
Such web services could also be integrated with data from ChemSpider or custom systems. If those who use these services feed back their processed data to the open web, it could take us a step closer to automated reaction design. For example consider the custom application to select solvents for the Ugi reaction. Model builders could also use the web services for predicted and measured solubility directly.
A while back we explored using Taverna for MyExperiment to create virtual libraries of SMILES. Unfortunately we ran into issues with getting the applications developed on Macs to run on our PCs. This might be worth revisiting as a means of filtering virtual libraries through different thresholds of predicted solubility.
Andy has described his model in detail in a fully transparent way – the model itself, how it was generated and the entire dataset can be found here. We would welcome improvements of the model as well as completely new models based on our dataset using only freely available tools.
It should be noted that when I use the term “general” it refers to the ability of the model to generate a number for most compounds listed in ChemSpider. Obviously compounds that most closely resemble the training set are more likely to generate better estimates. Because of our synthetic objectives using the Ugi reaction we have mainly focused on collecting solubility data for carboxylic acids, aldehydes and amides either from new measurements or from the literature.
Another important point concerns the main intended application of the model: organic synthesis. Generally the range of interest for such applications is about 0.01 – 3M. This might be very different for other applications – such as the aqueous solubility of a drug, where distinctions between much lower solubilities may be important.
For a typical organic synthesis, a solubility of 0.001M or 0.005M will probably translate as effectively insoluble. This might be a desired property for a product intended to be isolated by filtration. On the other end of the scale knowing that a solubility is either 4M or 6M will not usually have an impact on reaction design. It is enough to know that a reactant will have good solubility in a particular solvent.
Given the above considerations for intended applications and the likelihood that the current model is far from optimized, the predictions should be used cautiously. We suggest that the model is best used as a “flagging device”. For example, if a reaction is to be carried out at 0.5M, one may place a threshold at 0.4M for the predicted values of reactants during solvent selection, with the recognition that a predicted 0.4M may be an actual 0.55M. A similar threshold approach can be used for the product, where in this case the lowest solubility is desired. A practical example of this is the shortlisting of solvent candidates for the Ugi reaction.
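A threshold-based flag of this kind takes only a few lines. The sketch below is illustrative: the function name and the solvent labels are hypothetical, the predicted values are made up, and only the 0.4M threshold comes from the example above.

```python
from typing import Dict, List


def flag_solvents(
    predicted_m: Dict[str, float],
    threshold_m: float,
    want_soluble: bool = True,
) -> List[str]:
    """Return solvents whose predicted solubility passes the threshold.

    want_soluble=True  keeps values at or above the threshold (reactants).
    want_soluble=False keeps values at or below the threshold (products
    meant to precipitate).
    """
    if want_soluble:
        kept = [name for name, value in predicted_m.items() if value >= threshold_m]
    else:
        kept = [name for name, value in predicted_m.items() if value <= threshold_m]
    return sorted(kept)


# Made-up predictions for a reactant in three unnamed solvents.
reactant_predictions = {"solvent A": 0.45, "solvent B": 0.39, "solvent C": 1.2}
print(flag_solvents(reactant_predictions, threshold_m=0.4))
```

Setting the threshold somewhat below the planned reaction concentration, as in the 0.5M and 0.4M example, is what leaves room for the prediction error.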
Another example of flagging involves identifying the outliers in the model. These can be inspected for experimental errors and possibly remeasured. Alternatively outliers may shed light on the limitations of the model. For example we have found that the solubility of solutes with melting points near room temperature can be greatly underestimated by the current model. This may be an opportunity to develop other models which incorporate melting point or enthalpy of fusion.[Rohani 2008]
Although it is possible that better models and more data will improve the accuracy of the predictions, this can be true only if the training set is accurate enough. Based on conversations I’ve had with researchers who deal with solubility, reading modeling papers and our own experience with the ONS Challenge I am starting to suspect that much of the available data just isn’t accurate enough for high precision modeling. Models using data from the literature are especially vulnerable I think. Take a look at this unsettling comparison between new measurements and literature values (not to mention the model) for common compounds.[Loftsson 2006] Here is a subset:
I have also made the point in detail for the aqueous solubility of EGCG. Could this be the reason that so many different solubility models using different physical chemistry principles have evolved and continue to co-exist?
The situation reminds me a lot of the discussions taking place in the molecular docking community.[Bissantz 2010] The differences in calculated binding energies are often small in comparison with the uncertainties involved. But docking can still be used as one tool among others to find drug candidates by flagging a collection of compounds above a certain threshold binding energy.

Resveratrol Thesis on Reaction Attempts

A few days ago Andrew Lang suggested to Dustin Sprouse that he submit his thesis to the Reaction Attempts database. Like many undergraduates Dustin put in a lot of time and effort in doing experiments and writing up his results but didn’t have quite enough time to obtain all that would have been required for a traditional publication.

A thesis is an unusual document within the context of scientific communication. Unlike a peer reviewed paper, it may contain a large number of “failed experiments” and a substantial amount of speculation. Although it is not quite as detailed as a lab notebook, there is often plenty of raw data and details about how failed or ambiguous experiments proceeded.
In Dustin’s case we felt that there was enough information provided to include his thesis in Reaction Attempts. In addition, his thesis was accepted by Nature Precedings, thus providing a convenient means of citation.
The first component of the Reaction Attempts project is to quickly abstract the most basic information from synthetic organic chemistry reactions. This includes the ChemSpiderIDs and SMILES from the reactants and target products and brief notes about conditions and outcomes. We are especially interested in failed or ambiguous experiments because these have almost no chance of being communicated and indexed in the traditional systems. When attempting to carry out a reaction, it can be just as useful to know what doesn’t work – and more specifically how it doesn’t work.
The second component of the project is dissemination. Because the information is encoded semantically, it can be automatically converted to both human and machine readable formats.
One human interface consists of a PDF book (also as a hard copy), with the option of selected reactions specified by listing CSIDs of reactants in the URL. For example Dustin’s reactions can be presented selectively here. We also have a Reaction Explorer, where reactants or products can be selected from a dropdown menu or via a substructure search.
We also provide live XML feeds so that others can create applications easily from machine readable data. For example one could create reaction chains automatically, which will occur whenever we enter reactions from multi-step syntheses like Dustin’s – based on the synthesis of resveratrol.
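As a rough idea of how reaction chains could be assembled from such records, the sketch below links two attempts whenever the product of one appears among the reactants of another. The `ReactionAttempt` record and its field names are hypothetical and do not reflect the actual feed schema; the ChemSpider IDs and URLs are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReactionAttempt:
    """Hypothetical minimal record for one abstracted reaction attempt."""
    reactant_csids: Tuple[int, ...]
    product_csid: int
    notebook_url: str


def find_chain_links(
    attempts: List[ReactionAttempt],
) -> List[Tuple[ReactionAttempt, ReactionAttempt]]:
    """Pairs (first, second) where first's product is a reactant of second."""
    links = []
    for first in attempts:
        for second in attempts:
            if first is not second and first.product_csid in second.reactant_csids:
                links.append((first, second))
    return links


# Placeholder IDs and URLs, not real ChemSpider entries or notebook pages.
step1 = ReactionAttempt((1, 2), 3, "https://example.org/step1")
step2 = ReactionAttempt((3, 4), 5, "https://example.org/step2")
unrelated = ReactionAttempt((6, 7), 8, "https://example.org/other")

for first, second in find_chain_links([step1, step2, unrelated]):
    print(first.notebook_url, "->", second.notebook_url)
```

For a multi-step synthesis, following these links from the first step onward would recover the full route.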
I know that Peter Murray-Rust has been very active in automatically abstracting information from chemistry theses. It would be interesting to see how that approach would work for this thesis, especially with the failed experiments. Reducing a page or two of text into only the most salient bits of information manually required a level of judgement that I imagine would be tricky to do automatically.

Secrecy in Astronomy and the Open Science Ratchet

Probably because of the visibility of the GalaxyZoo project, I think several of my colleagues and I have been under the impression that astronomy is a somewhat more open field than chemistry or molecular biology. It was easy to rationalize such a position because patents are not an issue, as they clearly are in fields which rely more on invention than discovery. However, after reading “The Case for Pluto” by Alan Boyle, I am left with a much different impression.

This book does an excellent job of covering the recent debate over Pluto’s designation as a true planet. A key trigger for this debate has been the discovery of dwarf planets with sizes very close to that of Pluto. However, these discoveries did not occur without controversy.

The story of the controversy regarding the discovery of Haumea is a particularly good example (starts on p. 108 of the book – a good summary also on Wikipedia). Starting in December 2004 Michael Brown at Caltech discovered a series of new dwarf planets. Instead of immediately reporting his team’s discoveries, he worked in secrecy until July 20, 2005 when he posted an online abstract indicating the discoveries would be announced at a conference that September. However, on July 27, 2005 a Spanish team led by José Luis Ortiz Moreno filed a claim with the Minor Planet Center for priority in discovering one of these dwarf planets. This forced Brown’s hand in disclosing his team’s other discoveries within days – much sooner than he had anticipated.

Apparently this stirred up a great controversy in the community and officially no name was associated with the discovery, although the Spanish team’s telescope at Sierra Nevada Observatory was recognized as the location of the discovery. However, Brown was allowed to select the name Haumea for the dwarf planet.

Even though the Minor Planet Center accepted Moreno’s submission, most reports seem to side with Brown. The main argument is no less than academic fraud on Moreno’s part because he accessed public telescope logs and found some of Brown’s data. It was as simple as Googling the identifier that Brown inserted in his public abstract.
If Moreno had hacked into a private computer belonging to Brown’s team, I could understand the charge of fraud. But is it fraud to access public databases? We chemists do that all the time, reading abstracts from upcoming conferences to try to glean what our competitors are up to. That hasn’t stopped anyone from submitting a paper or patent.
Secrecy only works if everyone competing follows the same rules. If there were a rule that planet discoveries must be announced at conferences or by formal publication, then this could not have happened. Moreno’s submission to the Minor Planet Center should have been rejected if such a rule existed. If there is a rule that telescope logs should not be accessed, then why make them public and indexed on Google?
Now there may exist field specific conventions. I don’t know what they are in the case of discoveries such as these but here is an interesting quote from Michael Brown’s Wikipedia page:

When asked about this online activity, Ortiz responded with an email to Brown that suggested Brown was at fault for “hiding objects,” and said that “the only reason why we are now exchanging e-mail is because you did not report your object.”[3] Brown says that this statement by Ortiz contradicts the accepted scientific practice of analyzing one’s research until one is satisfied that it is accurate, then submitting it to peer review prior to any public announcement. However, the MPC only needs precise enough orbit determination on the object in order to provide discovery credit, and Ortiz et al. not only provided the orbit, but “precovery” images of the body in 1957 plates.

It seems to me that there is a clash over what the conventions in the field actually are. Certainly the Minor Planet Center did not recognize the convention of peer review before public disclosure. They only required sufficient proof for the discovery.

One way to look at this story is that Moreno acted more openly than Brown by disclosing information before peer review. This action forced Brown to disclose scientific results much more quickly than he had anticipated.

In a sense this is a type of Open Science Ratchet. The actions of scientists that are most open set the pace for everyone else working on that particular project, regardless of their views on how secretive science should be.

Imagine how the scenario would have played out if one of the groups had used an Open Notebook. On December 28, 2004 everyone with a stake in the search for planets would have had the opportunity to know that a very significant find had been made. There were still details to work out – and the Brown group might not be the first to do all the calculations to completely characterize the discovery. Certainly it would affect what other researchers did – even if they were completely opposed to the concept of Open Science.

Essentially secrecy in this context is an all-or-nothing gamble. Everyone is free to not disclose their work until after peer reviewed publication. In some cases the discoverer will get full credit for the discovery and the complete analysis. But in other cases another group working in parallel will publish first and leave nothing to claim.
As scientists become more open, it is likely that their ability to claim sole priority for all aspects of a discovery will be reduced. However, they will retain priority for the observations and calculations that they made first.
The more open the science, the faster it happens. And because of the Open Science Ratchet, a few Open Scientists scattered across various fields could have a larger hand than expected in speeding up science.

Methanol Solubility Prediction Model 4 for Ugi reactions in the literature

Since non-aqueous solubility measurements have not become part of the standard characterization of organic compounds, it is not surprising that all the data we have for Ugi products originate from measurements that we made on our own compounds.

Since methanol is our most common solvent, Andrew Lang has collected the measurements that we have with values from the literature for a range of compounds, including our Ugi products, to generate a web service returning a predicted solubility based on a submitted SMILES string. The model (Model 4) was derived from a Random Forest algorithm, using molecular descriptors supplied by the CDK and VCC.

It would be nice to be able to test the model’s ability to predict what will happen if a Ugi reaction is carried out in methanol. Although the actual solubility of Ugi products in the literature is typically not reported, reading the experimental sections in papers can still provide some validation of the model in some cases.

For example, consider the following Ugi products synthesized recently by Lezinska (Tetrahedron 2010)


Note that these images represent the azide group in a form that does not follow the octet rule. The structures must be written as SMILES without charges because the CDK and VCC web services used by the model do not process charges correctly. Stereochemistry also cannot be used; it can be removed from the SMILES simply by deleting the slashes. Thus for the two molecules above the SMILES to be submitted to the prediction web service are:

O=C(NC1CCCCC1)C(Cc2ccc(C)cc2)N(c4ccccc4C(=O)c3ccccc3)C(=O)C(Cc5ccccc5)N=N#N
AND
O=C(NC1CCCCC1)C(C(=O)c2ccccc2)N(Cc3ccc(C)cc3)C(=O)C(C)CCN=N#N

The predicted methanol solubilities are respectively 0.004 M and 0.03 M.
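Deleting the slashes is easy to automate before submitting a SMILES string. The helper below is a minimal sketch with a hypothetical name; it removes both the forward slash and the backslash used for double-bond stereochemistry and does nothing else. It does not touch charges or other stereo markers, which would need to be handled separately.

```python
def strip_bond_stereo(smiles: str) -> str:
    """Remove '/' and '\\' double-bond stereo markers from a SMILES string."""
    return smiles.replace("/", "").replace("\\", "")


# A simple illustrative input, not one of the Ugi products.
print(strip_bond_stereo("F/C=C/F"))

# The second SMILES from the post has no slashes, so it passes through unchanged.
ugi_smiles = "O=C(NC1CCCCC1)C(C(=O)c2ccccc2)N(Cc3ccc(C)cc3)C(=O)C(C)CCN=N#N"
print(strip_bond_stereo(ugi_smiles) == ugi_smiles)
```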

Now if we look at the details in the experimental section, both of these Ugi products were synthesized in methanol at a limiting reactant concentration of about 0.1 M. Even though this is much more dilute than the usual 0.5-2.0 M generally recommended for Ugi reactions (Domling 2000), the products still precipitate and can be filtered off. This is consistent with the predicted solubilities above and the model would have suggested ahead of time that methanol might be a good solvent for isolation of the products by precipitation.
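A quick back-of-the-envelope check makes the consistency argument concrete. The sketch below assumes complete conversion of the limiting reactant, a product concentration equal to the 0.1 M limiting reactant concentration, and that the predicted solubility is the true equilibrium value. All three are simplifications, so the result is an upper bound on the precipitated fraction rather than a yield prediction.

```python
def max_precipitated_fraction(product_conc_m: float, solubility_m: float) -> float:
    """Upper bound on the fraction of product that can precipitate.

    Whatever stays dissolved (up to the solubility limit) is lost to the
    filtrate; the rest can come out of solution.
    """
    if product_conc_m <= 0:
        raise ValueError("product concentration must be positive")
    return max(0.0, (product_conc_m - solubility_m) / product_conc_m)


# Predicted methanol solubilities from the post, at about 0.1 M.
for solubility in (0.004, 0.03):
    fraction = max_precipitated_fraction(0.1, solubility)
    print(f"solubility {solubility} M -> up to {fraction:.0%} precipitates")
```

Under these assumptions, up to roughly 96% and 70% of the two products could precipitate, which fits the report that both could be filtered off.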

So far these are just anecdotal results but it does illustrate that solubility models can be evaluated without explicit determination of solubility in the literature.


Reaction Attempts Explorer

Two months ago I reported on the Reaction Attempts project and the availability of the summary as a physical or electronic (PDF) book. The basic idea behind the project is to collect organic chemistry reaction attempts reported in Open Notebooks. This would include not only successful experiments but also those which could be categorized as failed, ambiguous, in progress, etc.

The book was organized with reactants listed alphabetically. In this way one could browse through summaries of the types of reactions being attempted by different researchers on a reactant of interest. There might be information there (what to do or what to avoid) of some use for a planned reaction. At the very least one could contact the researcher to initiate a discussion about work that had not yet been published in the traditional system.

Andrew Lang has just created a web-based tool to explore the Reaction Attempts database in much more sophisticated ways.

Here are some scenarios of how one could use it. On the left hand side of the page is a dropdown menu containing an alphabetically sorted list of all the reactants and products in the database. Let’s select furfurylamine.


This immediately informs us that there are 230 reactions involving furfurylamine and it lists the schemes for all these reactions upon scrolling down. That’s still a bit hard to process so a second dropdown menu appears populated with a list of other reactants or products involved with furfurylamine.

We now select boc-glycine and that narrows our search to 145 reactions.

Selecting benzaldehyde from the third dropdown menu narrows the search further to 61 reactions.

The final dropdown menu contains a short list of only isocyanides and thus all represent attempted Ugi reactions. Selecting t-butyl isocyanide gives us 56 reactions.

That means that these same 4 components were reacted together 56 times. Looking at the various reaction summaries will show that some of these are duplicates run to check reproducibility, while others vary the concentration and solvent and report the effect on yield. This particular reaction was in fact the subject of a paper on the optimization of a Ugi reaction using an automated liquid handler.
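The successive narrowing done by the dropdown menus amounts to a set-containment filter. The sketch below is not the Explorer's code: it uses a toy list of three made-up reactions, hypothetical function names, and compound names in place of ChemSpider IDs.

```python
from typing import Dict, List


def filter_reactions(reactions: List[Dict], selected: List[str]) -> List[Dict]:
    """Keep reactions that involve every selected compound."""
    wanted = set(selected)
    return [r for r in reactions if wanted <= r["compounds"]]


def next_menu_options(reactions: List[Dict], selected: List[str]) -> List[str]:
    """Compounds that co-occur with the current selection, sorted by name."""
    options = set()
    for reaction in filter_reactions(reactions, selected):
        options |= reaction["compounds"]
    return sorted(options - set(selected))


# Toy data only; the real database holds far more attempts.
toy_reactions = [
    {"id": "toy-1", "compounds": {"furfurylamine", "boc-glycine",
                                  "benzaldehyde", "t-butyl isocyanide"}},
    {"id": "toy-2", "compounds": {"furfurylamine", "boc-glycine",
                                  "benzaldehyde", "benzyl isocyanide"}},
    {"id": "toy-3", "compounds": {"furfurylamine", "benzaldehyde"}},
]

selection = ["furfurylamine", "boc-glycine"]
print(len(filter_reactions(toy_reactions, selection)))
print(next_menu_options(toy_reactions, selection))
```

Each additional selection can only shrink the result, which is why the counts drop from 230 to 145 to 61 to 56 in the walkthrough above.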

Now here is where the design of the Explorer comes in handy. We might want to ask if the reaction proceeds as well with the other isocyanides. All we have to do is switch the final dropdown menu to ask what happens when we go from t-butyl to n-butyl isonitrile. There is a single attempt of this reaction and it is “failed” in the sense that no precipitate was obtained from the reaction mixture. This doesn’t mean that the reaction didn’t take place – it might be that the Ugi product was too soluble. We can quickly inspect that the concentration and solvent are in line with conditions that allowed precipitation of the t-butyl derivative.

OK, let’s see what happens with n-pentyl isocyanide.

It looks like it behaves just like n-butyl isocyanide: another single non-precipitation event. What about benzyl isocyanide?

This time we do get the Ugi product from a single attempt. Note the lower yield compared to the t-butyl isocyanide under similar conditions.

What about with cyclohexyl isocyanide?

This time we hit an experiment in progress. A precipitate was obtained but it was not characterized. We can click on the link to the lab notebook page (EXP232) to learn more about how long it took for the precipitate to appear but there are not enough data to draw a definite conclusion about the success of the reaction. However, based on the results from the other precipitates in this series it is probably encouraging enough to repeat and characterize the product.

There are other sources of information here. Clicking on the image of the Ugi product takes us to its ChemSpider entry. In this case the only associated data relates to this reaction attempt.

Let’s look at another scenario: reactions involving aminoacetaldehyde dimethyl acetal.

In this case we find the intersection of two Open Notebooks. The first reaction comes from Michael Wolfle from the Todd group.

The second comes from Khalid Mirza from the Bradley group.

In order to learn more about the nature of the overlap we can use the substructure search capabilities of the Reaction Explorer. Simply click on the image of the acetal and the ChemSpider entry pops up. Now click on the copy button next to the SMILES for the compound.

Paste the SMILES into the SMARTS box of the Reaction Explorer.

We get 13 reaction attempts for this query – the two we found earlier and the rest corresponding to attempts by Michael Wolfle to synthesize praziquanamine.

We learn that one connection between these two notebooks involves different attempts at synthesizing praziquantel.

Hopefully this demonstrates the value of abstracting organic chemistry reaction attempts from Open Notebooks into a machine readable format. Contributions to the database require only the ChemSpider IDs of the reactants and product and a link to the relevant lab notebook page. Reaction schemes are automatically generated by the system. More on the Reaction Attempts project here.

IGERT NSF panel on Digital Science

On May 24, 2010 I was part of a panel in Washington for the NSF IGERT annual meeting. As I mentioned previously, it is encouraging to find that funding agencies are paying more attention to the role of new forms of scholarship and dissemination of scientific information.

My co-panelists included Janet Stemwedel, who talked about the role of blogging in an academic career, Moshe Pritzker, who made a case for using video to communicate protocols in life sciences and Chris Impey, who demonstrated applications of clickers and Second Life in the classroom.

We only had 10 minutes each to speak so the presentations were basically highlights of what is possible. Still, it was enough to stimulate a vigorous discussion with the audience. There was a bit of controversy about the examples I used to demonstrate the limitations of peer review in chemistry. People can misinterpret what we are trying to do with ONS – it certainly doesn’t include bringing down the peer review system (not that we could anyway). But we have to face the situation that peer review does not validate all the data and statements in a paper. It operates at a much higher level of abstraction. Providing transparency to the raw data should work in a synergistic way with the existing system.

My favorite part of the conference was easily Seth Shulman’s talk on the “Telephone Gambit”. Ever since reading his book, I have been using the story of how carefully reading Bell’s lab notebook has forced us to revise the generally accepted notion of how the telephone was invented. Seth’s presentation was truly captivating because he explained not only what was done but also what motives were at work to deceive and obfuscate. This cautionary tale is still very much relevant to science and invention today – and highlights how transparency can guard against this type of outcome.

Use of ONS to protect Open Research: the case of the Ugi approach to Praziquantel

As we were collecting reactions from The Synaptic Leap for the Reaction Attempts project, Andrew Lang noticed that there might be a quick synthetic route to praziquantel via a Ugi reaction. I researched it further and found a paper (Kim et al 1998) where Ugi product 1 was indeed converted to racemic praziquantel via the Pictet-Spengler cyclization.


Using Beilstein Crossfire the only synthesis of 1 I found involves a multi-step amidation strategy. But this compound should be accessible in one step from commercially available starting materials via a Ugi reaction (shown above). Since all the starting materials are liquids we have some flexibility with solvent choice. Khalid first tried it in methanol (EXP258) a few weeks ago but did not get a precipitate. He was going to monitor it by NMR next to see if the problem was high solubility of the Ugi product or with the reaction itself.

It was therefore with great interest that I read Mat Todd’s report this morning on The Synaptic Leap that a German patent had been issued on this Ugi strategy to praziquantel. (TSL didn’t provide a means of leaving a comment so I edited the page – which made me the author of that post, although Mat actually wrote it.)

I have often mentioned during my talks that Open Notebook Science could be used not only in a defensive manner to claim academic priority – but also as an offensive tactic to block patent applications. A company attempting to prevent the commercial exploitation of rival inventions has a few options. Where applicable, it can buy up an existing patent pool with the intention of sitting on it. For new inventions, it can do research and try to file patents before its competitors. But this is a costly process and it may make more sense to simply publish the inventions to create disclosed prior art, thereby blocking the patent applications of its competitors.

But – as I and many others have discussed – the current publication system is not optimally suited for the purpose of simply disclosing and communicating science. Not only is it generally slow but the traditional article format requires a narrative of some sort – rarely can single experiments be published. This means that much (if not most) of research done by an individual or group will never be disclosed.

For these reasons I think that keeping an easily discoverable Open Notebook for projects designed to block patent submission by competitors makes a lot of sense – both economically and from a workflow perspective. Since researchers already have to keep a lab notebook, making it public doesn’t impose the added time that writing an article or patent will require.

In this specific example of praziquantel we were too late. But if we had recorded this experiment a few years ago it might have worked to block Domling’s patent. Now, it isn’t clear to me that EXP258 would have been enough to do that. The strategy to make praziquantel via a Ugi reaction was clearly stated but the experiment was not conclusive. However, since Domling reported that methanol worked I am sure that we would have had the “reduced to practice” evidence in the notebook shortly.

Above I used a company as an example of a party motivated to disclose inventions to protect their interests. In our case it would not be a company but rather the entire Open Science community. It is in our best interest to keep our scientific territory as unencumbered by patents as possible. Keeping Open Notebooks might be one of the simplest means of ensuring that.

Consider a humanitarian organization that might want to manufacture praziquantel. I haven’t researched it but presumably the Domling patent was filed in a number of countries besides Germany. In order to consider using the Ugi strategy, the organization would now have to deal with the patent holder. This might be the factor that makes this route untenable. Patents have proven to be problematic for humanitarian aid – even in the simple case of providing food.

But all is not lost. In addition to offering a simple 2-step synthesis of praziquantel, the Ugi route offers an easy way to make large libraries of analogs. Optimally we would like to work with someone who has experience with docking praziquantel. It might be interesting to screen not only the praziquantel analogs but also the uncyclized Ugi products themselves. When we did this for malarial enoyl reductase inhibitors (D-EXP005) we found that we did not need to cyclize to obtain compounds predicted to bind. This ultimately led to active compounds.

This week on Chemistry World…

1 June 2010: Have something to say about an article you’ve read on Chemistry World this week? Leave your comments below…

This week’s stories…

Basic research bill backed in US
US bill that boosts science funding passes on third attempt after Democrats employ unusual procedural tactic

Universities face hard years ahead
Funding cuts to universities across Europe as a result of the economic crisis will impact teaching and research quality for years to come, says report

Structural order gained over conducting polymer
Researchers have used copper as both catalyst and template to gain structural control over an important conducting polymer

Liquid marbles detect gases
Scientists use porous properties of liquid marbles to develop gas sensors

Instant insight: Cosmic dust as chemical factories
Daren Caruana and Katherine Holt discuss how electrochemistry could be the missing link to understanding chemistry in space

Nobelium

This week’s Chemistry in its element podcast: science writer Brian Clegg on an unfortunate element that is neither use nor ornament, but is gifted with an interesting naming story

Gadolinium – a hero of the periodic table

In this week’s chemistry in its element podcast, Simon Cotton, from Uppingham School in the UK, tells us how gadolinium could save the environment, and even your life!

Chemistry World’s round-up of money and molecules

Big chemistry news this week was the announcement of the first synthetic cell, which could provide a basis for designing organisms from scratch.

Understandably the news has caused some controversy in the media, with sceptics concerned for the future of humanity and even research rivals worried that if the technology is patentable, other research groups will lose out on a piece of the pie.

The research could have enormous commercial value in the future for applications in biofuels and chemical synthesis through chemical biology and should be viewed as another step towards a greater understanding of science.

PHARMACEUTICALS

Cheap cancer drugs say Asda

So from creating synthetic cells to destroying cancerous ones…
In a world first, supermarket Asda has announced that it will permanently sell privately prescribed cancer treatment drugs on a ‘not for profit’ basis in the UK, which could save patients thousands of pounds.

With a post code lottery on cancer funding dictating how much money is allocated to the treatment of each cancer patient, and the variation in cancer drugs available on the NHS depending on where you live, sufferers also have to deal with pharmacy mark-ups that can cripple patients’ finances.

Cancer affects nearly 300,000 people every year in the UK and the cost of treatment is too much for many sufferers. According to Asda, some privately prescribed cancer drugs are being sold with a 76 per cent mark-up in some high street stores.

This move will see prices of drugs like Iressa (gefitinib) – licensed to treat lung cancer – fall in Asda stores to £2167.71 compared with other high street stores such as Superdrug that sell it for £3253.56.
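To put the two quoted Iressa prices side by side, here is a small worked calculation. It is only a sketch using the two figures above; the variable names are made up, and the percentages depend on which price is taken as the baseline.

```python
# Prices quoted above, held in pence to avoid floating-point rounding on currency.
ASDA_PENCE = 216771        # £2167.71
SUPERDRUG_PENCE = 325356   # £3253.56

saving_pence = SUPERDRUG_PENCE - ASDA_PENCE

# The same gap expressed against each baseline.
saving_vs_superdrug = saving_pence / SUPERDRUG_PENCE * 100
premium_over_asda = saving_pence / ASDA_PENCE * 100

print(f"Saving per purchase: £{saving_pence / 100:.2f}")
print(f"Asda price is {saving_vs_superdrug:.1f}% below the Superdrug price")
print(f"Superdrug price is {premium_over_asda:.1f}% above the Asda price")
```

On these figures the gap is £1085.85, which is roughly a third off the Superdrug price or, seen the other way, about a 50 per cent premium over the Asda price.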

Asda is urging patients to shop around when buying privately prescribed cancer drugs, claiming that 63 per cent of people were unaware that prices vary between pharmacies.

Asda has called for industry to follow its lead and end the high price mark-ups on cancer drugs and is working with suppliers to negotiate further discounts on trade prices of privately prescribed cancer drugs that it can then pass onto the customer.

Aspen to acquire Sigma Pharmaceuticals

African drug giant Aspen Pharmacare Holdings Ltd. has offered to buy leading Australian pharma firm Sigma Pharmaceuticals for A$1.49 billion (£850 million) in order to expand into Australia. The offer works out at A$0.60 per share and net debt of A$785 million.

The proposal is subject to conditions such as regulatory approval, and unanimous recommendation by the Sigma board – the company has confirmed that the approach has been made and is currently considering the offer.

Genzyme pays up

US firm Genzyme, the largest maker of genetic disease medicines, has agreed to pay $175 million (£121.5 million) in unlawful profits from the sale of products made at its Allston, Massachusetts, plant to the US federal government.

During an inspection in 2009, manufacturing quality at the Allston plant was found to be inadequate resulting in production delays, critical shortages of medically necessary products to consumers and drugs contaminated with metal, fibre, rubber and glass particles. These findings violated US Food and Drug Administration (FDA) regulations. Genzyme also suspended manufacturing of some of its products due to viral contamination in one of its bioreactors.

Genzyme has agreed to make improvements to its manufacturing processes at Allston, starting with an independent inspection of the plant that will recommend changes and result in an improvement plan subject to FDA approval. If the approved plan is not met, Genzyme will have to pay a substantial fine. In addition, Genzyme will have to move its vial filling operations to another plant or risk paying further disgorgement fines in the future.

INDUSTRY

Shin-Etsu’s new leadership

Japan’s most profitable chemical company, Shin-Etsu Chemical, has announced a change of leadership. Chihiro Kanagawa, former president, will become chairman, a position that has been vacant for over 15 years, and the former vice president, Shunzo Mori, will become president.

Kanagawa joined the firm in 1962, becoming president in 1990 and steering the company through some bold moves that have resulted in the company’s expansion over the years. Shunzo Mori is 74 and has been at the company since 1963. He trained as a mechanical engineer and has worked his way up the company.

Shin-Etsu has extended profits and developed new areas of business, expanding its semiconductor silicon business by building on the strength of products such as silicone resins, synthetic quartz, rare earth magnets, cellulose derivatives and photoresists. PVC output has also increased and record earnings have been reported year on year.

The plans for the future include increasing sales in developing markets such as China and investing in improvements to accommodate environmental challenges.

Borouge and Linde Group get cracking

The Linde Group, a world-leading gases and engineering company, and Borouge, a leading provider of innovative plastic solutions, have signed a $1.1 billion contract confirming that Linde will build a 1.5 million tonnes per year (t/y) ethane cracker at Borouge’s production site in Ruwais, Abu Dhabi, in the UAE.

This deal comes hot on the heels of the inauguration of the world’s largest ethane cracker, operated by Ras Laffan Olefins Co., which took place earlier this month.

The new cracker is the third of its kind to be built by Linde for Borouge in the last decade and will complement the existing crackers at the plant. Once the construction is complete, the Borouge site will be the largest ethane cracking complex in the world.

It signals a milestone in the growth of the company and is hoped to have a great impact on the automotive and advanced packaging markets in the Middle East and Asia.

SMEs pay less for chemicals

Small and medium enterprises (SMEs) in the chemicals sector are set to pay less in administrative charges following a decision by the European Commission. Small firms will pay less in fees to the European Chemicals Agency (ECHA) in connection with Classification, Labelling and Packaging Regulations (CLP) due to a reduction in levies.

The fees apply when a company asks for an alternative name for a substance or requests harmonised classification or labelling for substances.

Microenterprises will have a 90 per cent reduction in fees, small businesses a 60 per cent reduction and medium-sized businesses a 30 per cent reduction. All companies that comply with CLP regulations will also be able to work in their own language, as the ECHA has now translated its guidance documents.

In addition to reduced fees SMEs will also be able to gain assistance with Registration, evaluation, authorisation and restriction of chemicals (Reach) regulations and CLP regulations.

And finally….

It seems that if you are a Chartered Chemical Engineer in the UK and Ireland, you can sit back and smile smugly. Results from the IChemE 2010 UK and Ireland Salary Survey reveal that the median salary for a Chartered Chemical Engineer is now £60,400 per year compared to £57,500 in 2008 even in this economic climate. Indeed a Chartered Chemical Engineer aged 30-39 will typically earn £8500 a year more than a non-chartered chemical engineer.

Is it time for a change in career we ask ourselves….

Mike Brown

ASMS: Anthrax attacks

Ever since the infamous US anthrax attacks of 2001, where envelopes containing anthrax spores were mailed to a number of media outlets and two US Senators, there has been a push to develop new ways of determining the severity of anthrax infections.

John Barr, of the US Centers for Disease Control and Prevention (CDC), has developed a new, more sensitive way of monitoring the level of infection in a victim. This is keenly important as the symptoms for anthrax infection start off looking much like a cold or the flu, but can then lead to a subject deteriorating rapidly – often leading to death, even after treatment. According to Barr some 40 per cent of the victims of the 2001 anthrax letters died.

The Bacillus anthracis bacterium produces two different toxins, the oedema factor and the lethal factor. Barr has developed a way of detecting both of these using a liquid chromatography – mass spectrometry (LC-MS) approach that can provide earlier diagnosis than any other technique. This is particularly important as providing antibiotics at an early stage in the infection can increase the odds of survival.

His method, which uses an antibody purification step to extract the toxins, can detect the toxins at concentrations as low as 25 pg/ml in about two hours. If the antibody extraction step is left for around 16 hours, that detection limit can fall as low as 5 pg/ml.

The progression of the infection tends to go through a brief remission, and the changes in lethal factor levels correlate with the clinical symptoms. Other methods that rely on detecting the bacteria themselves often fail during this remission stage.

Barr believes his results should enable clinicians to predict the clinical outcome of an infection, which could prove immensely important as there have recently been a number of anthrax poisoning cases in Scotland, after heroin addicts injected themselves with anthrax-contaminated heroin.

Matt Wilkinson

Smoking could be good for you – if you get the message

Fancy a smoke? No, it’s my last one and I need to get an urgent message to HQ…

Sadly, this line is yet to appear in a spy film, but thanks to George Whitesides and his group at Harvard University, US, it might one day. The group has had another stab at ‘infochemistry’ – using chemical means to convey a message or information without the need for an electrical power supply.

Avid readers of this blog will remember that in June of last year the group first mooted the idea of using ‘infofuses’ soaked in alkali metal solutions to transmit coloured light messages as they burned, and then the follow-up using a microfluidic device with a series of droplets passing by windows in the device to let light through – using intensity, colour and polarisation to encode more information than standard on-off digital signals.

This time, the team have developed their ‘infofuse’ idea. One of the major drawbacks of the original system was the fact that the fuses tended to go out if they were in contact with a surface, and also burned really fast – to keep a message like an SOS call or suchlike repeating for 24 hours would need 2.5km of fuse.
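The 2.5 km figure gives a feel for just how fast the original fuses burn. As a rough back-of-the-envelope sketch, assuming the message repeats back to back and the fuse burns at a constant rate for the full 24 hours (an assumption made here, not a figure from the paper):

```python
# Figures quoted above: 2.5 km of fuse to keep a message repeating for 24 hours.
FUSE_LENGTH_M = 2.5 * 1000
DURATION_S = 24 * 60 * 60

# Assumption: continuous burning at a constant rate over the whole period.
burn_rate_cm_per_s = FUSE_LENGTH_M / DURATION_S * 100

print(f"Implied average burn rate: {burn_rate_cm_per_s:.1f} cm/s")
```

That works out to roughly 3 cm of fuse consumed every second.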

The answers sound simple and almost obvious – use a slower burning fuse and keep most of it lifted off the surface. But it’s never quite as easy as all that. Keeping the fuses off the surface was quite simple – crimping them into a tent-like shape held enough of the nitrocellulose far enough away from whatever surface the fuse was resting on to stop it sinking all the heat and putting out the flame.

But the timing problem required a more considered approach – simply using a slow burning fuse was no good – it would take hours to transmit the message, and most slow burning materials don’t burn hot enough to stimulate thermal emission of the alkali metal ions. What was needed was a combination – a slow burning ‘master’ fuse, with a series of fast ‘slave’ fuses sticking out of it. As the master fuse smoulders up to each slave fuse, it ignites and rapidly transmits its message.

This gives a compact system that can repeat a single, fast message over a long period, or transmit several different messages one after the other. The slow fuse is made from cotton soaked in sodium nitrate – similar to the ‘slow match’ used to ignite gunpowder charges in early matchlock firearms. However, the team showed that one could equally use a cigarette as the slow match – much less conspicuous if you’re an undercover agent…

Phillip Broadwith

Reference: C Kim, S W Thomas III and G M Whitesides, Angew. Chem. Int. Ed., 2010, DOI: 10.1002/anie.201001582

ASMS: Forget Vioxx, eat chocolate?

After sitting through a number of incredibly technical presentations today at ASMS I came across a fantastic poster presented by Shunyan Mo of the University of Illinois College of Pharmacy, US. Using an ultrafiltration LC-MS (liquid chromatography – mass spectrometry) assay, Mo and co-workers have shown that certain flavonoids found in cocoa selectively inhibit the cyclooxygenase-2 (Cox-2) enzyme and therefore could have anti-inflammatory effects.

As discussed in this Chemistry World article, Cox inhibitors such as naproxen play a vital role in the treatment of pain and inflammation, but they do have some side effects. To reduce these side effects, a number of pharmaceutical companies developed selective Cox-2 inhibitors, but unfortunately many of these were linked to an increased risk of blood clotting, heart attack and stroke. In 2004, those risks caused a huge embarrassment for Merck & Co., after it was forced to withdraw its blockbuster Cox-2 inhibitor Vioxx (rofecoxib) costing the company in the region of $4.75 billion (£3.3 billion) in legal settlements on top of the billions of dollars of lost sales.

But now, Mo has shown that eating chocolate might help reduce inflammation. Using MS-MS experiments, Mo has identified two oxidation products of linoleic acid, an abundant cocoa fatty acid, that strongly and selectively inhibit Cox-2: 9-hydroxy-10,12-octadecadienoic acid (9-HODE) and 13-hydroxy-9,11-octadecadienoic acid (13-HODE).

Perhaps unsurprisingly, the research was funded by US confectionery company Hershey and the US National Institutes of Health (NIH).

So next time you need to reach for an anti-inflammatory, it might be worth reaching for a bar of chocolate instead – just don’t blame me if you put on a few pounds!

Matt Wilkinson

Setac Europe 2010: ‘It’ll all come out in the wash’

Anyone else recognise this saying? My parents used it a lot while I was growing up when I’d taken a course of action that, while not ideal, wasn’t going to cause any lasting damage.

In the case of silver nanoparticles in textiles, however, it seems it probably will come out in the wash – disappear down our domestic waste pipes and into our environment, with no guarantee that lasting damage won’t be done.

It has been predicted that 12-49 per cent of the silver nanoparticles produced globally end up in textiles, as antimicrobials in socks for example. And in a first step towards figuring out whether this practice poses an environmental risk, Bernd Nowack and his team at EMPA in Switzerland have assessed whether or not these particles remain embedded in the textiles when they are washed in a washing machine.

Their key finding was that different textiles behave very differently: some release 20 per cent of their silver particles in the first wash after purchase whereas others release hardly anything. The conclusion the team has drawn from this is that how the manufacturers have embedded the particles is very important. ‘Companies have possibilities to design safe nanotextiles that release only small amounts of silver,’ said Nowack.

Other, more predictable, findings include that fewer particles are released the second time the item of clothing is washed and that the mechanical stress of the washing machine aids their release.

As well as trying to get textile companies to change their ways, the team also plan to consider both the environmental fate and toxicology of the released particles.

Until they do, maybe I should be thinking about more than my nose before buying these sweet-smelling socks next time.

To learn more: the work was published in the journal Environmental Science and Technology in September last year, and was well covered by the press at the time (see here, here and here).

Nina Notman

OpenSciNY Open Notebook Science Talk

On May 14, 2010 I presented on Open Notebook Science at the OpenSciNY conference at the New York University Bobst Library. I introduced the topic by telling a few stories about how new forms of communication are affecting how we think about concepts like “scientific precedent”, “peer review”, “scientific publishing” and “scientific scholarship”. At the end I spoke about archiving Open Notebook Science projects and showed the physical copies of both the Reaction Attempts and ONS Solubility Challenge books.

Margaret Smith did a wonderful job of organizing the conference with a very interesting line-up of speakers: Heather Joseph, Antony Williams, Elizabeth Brown and David Hogg. We formed break-out sessions at the end to discuss with the attendees concepts around Open Science. I was part of the session on Promoting Open Science.

The tone at this and other similar conferences I have attended recently is probably best described as cautiously optimistic and focused on what is possible. The Open Science movement – at least as far as it is reflected by the people I know – does not seem to be driven by zealots or idealists trying to get everyone to drink the Kool-Aid. It is just a bunch of people who see opportunities to do things in better ways as new tools become available – and they can’t find a credible reason not to do them.

Check here on FriendFeed for updates about links to recordings, slides, etc.

My presentation below:

The Scientist Article on Electronic Lab Notebooks

Amber Dance has written an article in The Scientist (2010-05-01), “Digital Upgrade: How to choose your lab’s next electronic lab notebook”. This is basically a quick overview of different Electronic Lab Notebooks (ELNs) that should be helpful for people researching what is currently available in that space.

There was some coverage of Open Notebook Science and Steve Koch and I were quoted. Ironically my contribution appeared in the “Cons” section :)

Pros

  • The format is unconstrained—you can set up any categories, and as many users and pages, as you want—and fast to set up.
  • Open notebooking attracts collaborators. Koch counts three collaborations that wouldn’t have happened if he weren’t on OpenWetWare. And his students build professional networks well before they author a paper.

Cons

  • Wikis were not designed with scientific data in mind. For example, it’s hard to make a table, Koch says.
  • Open notebook science “does limit where you can send your work,” says Jean-Claude Bradley, a chemist at Drexel University in Philadelphia, who also uses an open wiki notebook. His lab sticks to journals that accept preprints.
  • Posting online voids international patent rights, although US patents are still possible.

In my opinion, one of the biggest “Pros” wasn’t listed in that section: the free cost. (That was mentioned elsewhere, though.) When you see the costs of some of these other commercial systems, that has to be a factor for many people trying to make a decision.

If privacy is an issue, wikis can certainly be made private, although I’m not sure if that is possible on OpenWetWare. It can be done for $5/month on Wikispaces, the wiki we use for lab notebooks – although then it wouldn’t be Open Notebook Science.

Concerning Steve’s Con of wikis being difficult to use to store data, that is true. However, combining the use of a wiki with Google Spreadsheets has completely resolved that issue for us. With our ability to automatically export an archive of the notebook (as HTML) and spreadsheets (as XLS) into an integrated archive, the two platforms operate essentially as if they were a single system.

Visualizing Chemistry in Second Life ACS Talk Recording

The American Chemical Society has processed the recording of the talk that Andrew Lang and I gave at the Spring 2010 ACS meeting in San Francisco on March 23, 2010:
Visualizing Chemistry in Second Life

ChemSpider SyntheticPages

I recently mentioned the Reaction Attempts project, which aims to collect organic chemistry experiments – especially those that are “failed”, in progress or somehow incomplete.

For reactions where the desired product has been obtained and fully characterized, ChemSpider SyntheticPages also offers a very convenient publication vehicle. As I mentioned previously there is a need for enabling the publication of single experiments, especially when these are unlikely to become part of a traditional article.

We are in the process of submitting suitable reactions from the UsefulChem project to CS|SP. This will require some re-formatting of procedures and characterization data as they currently appear in the lab notebook.

Here is an example of one of our Ugi reactions: SyntheticPage 406 (UCEXP176C)


A nice feature of these pages is the automatic rendering of 2D structures upon hovering on top of chemical names.


Here are a few more reasons to use ChemSpider SyntheticPages:

* ChemSpider SyntheticPages takes you directly to a procedure. When you get a hit – you get a procedure.
* ChemSpider SyntheticPages provides information that may not generally be found elsewhere, such as frequently encountered problems, trouble-shooting tips, the number of times the reaction has been carried out, scale-variation etc.
* ChemSpider SyntheticPages is the only interactive chemistry database. Information is constantly updated and validated by comments from the user community (Peer Review in the Public Domain™).
* ChemSpider SyntheticPages can provide you with the most up-to-date method; we aim for 95% of submissions to be processed within 48 hours of submission.
* ChemSpider SyntheticPages is free of charge.

[Disclaimer: I am a member of the editorial group at CS|SP]


The Synaptic Leap Experiments on Reaction Attempts

Andrew Lang and I recently reported on the first edition of the Reaction Attempts book and database. Part of the motivation for this was to structure the experiments from the UsefulChem project in both a machine readable format and one that could be browsed as a physical copy. However, we also had in mind the easy integration of other open experiments, especially those labeled as “failed”, since these are unlikely to be found by searching conventional reaction archives.

As a demonstration, we have added a series of experiments from The Synaptic Leap, which Michael Wolfle (working as a post-doc with Mat Todd) has posted. All of these reactions involve intermediates in the synthesis of praziquantel, which is a major focus of the Todd group. One group of these reactions involved the attempted synthesis of praziquanamine via a Pictet-Spengler cyclization. Most of these attempts failed; one was successful.

Adding these experiments to Reaction Attempts was very simple, since the minimum information required is the ChemSpiderIDs (CSIDs) of all the reactants and the product, with a hyperlink to more details. We also added a few more details provided by Michael, such as the solvent, reaction conditions and outcome.

Andy has provided a simple mechanism to pull up all Reaction Attempts for a given reactant with the following url structure:

http://showme.physics.drexel.edu/onsc/databook/ucdatabook.php?reactants=9099925

The number at the end is the CSID for the reactant. Multiple reactants can be pulled from the database by adding more CSIDs separated by commas.
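
A URL of this form can also be assembled programmatically. The following is only a minimal sketch in Python: the helper name reaction_attempts_url is made up for illustration, and the second CSID in the usage line is a placeholder rather than a real reactant from the database.

```python
from urllib.parse import urlencode

# Base URL taken from the example above.
BASE_URL = "http://showme.physics.drexel.edu/onsc/databook/ucdatabook.php"


def reaction_attempts_url(csids):
    """Build the query URL for one or more reactant CSIDs (hypothetical helper)."""
    ids = [str(int(c)) for c in csids]  # int() rejects anything that is not a number
    if not ids:
        raise ValueError("at least one CSID is required")
    # safe="," keeps the commas literal, matching the comma-separated format
    # described in the post.
    return BASE_URL + "?" + urlencode({"reactants": ",".join(ids)}, safe=",")


# Single reactant, as in the example URL above.
print(reaction_attempts_url([9099925]))
# Two reactants; 123 is a placeholder CSID used only for illustration.
print(reaction_attempts_url([9099925, 123]))
```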

Successful runs in Reaction Attempts are identified with a green check mark:


Again, the main idea here is not to exhaustively abstract all pertinent information for an experiment. Rather, it is to connect researchers who are working on similar reactions. Since it requires so little effort to supply the minimum required information, we are hoping to get contributions from other sources.

We will focus next on coming up with more sophisticated ways to retrieve information – such as substructure searching or by reaction type, solvent, etc. We will also periodically publish hard copies of future Reaction Attempts editions.


NMR integration web service expanded

The ONS Challenge has extensively used a web service created by Andrew Lang to automatically calculate solubility from NMR spectra. One of the constraints of the service was that the JCAMP-DX file had to be deposited in a special folder on a server at Drexel.

Andy has now modified the script so that the JCAMP-DX file can be located anywhere on the internet. I have prepared a modified Google Spreadsheet to serve as a template for SAMS calculations (Semi-Automated Measurement of Solubility). Simply enter the URL of the JCAMP-DX file in the appropriate column, then fill in the ppm ranges and corresponding hydrogen numbers for the solvent and solute, along with the molecular weight and density data. (The predicted density of solids can be found on ChemSpider.) The concentration of the solute will then be automatically calculated based on an assumption of volume additivity.
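
The exact formula behind the spreadsheet is not spelled out here, so the following Python sketch is only one plausible reading of a volume-additivity calculation. The function and parameter names are hypothetical, and it assumes molecular weights in g/mol and densities in g/mL.

```python
def solute_concentration(integral_solute, h_solute,
                         integral_solvent, h_solvent,
                         mw_solute, density_solute,
                         mw_solvent, density_solvent):
    """Return the solute concentration in mol/L, assuming volumes are additive."""
    # Moles of solute per mole of solvent, from the per-hydrogen NMR integrals.
    mole_ratio = (integral_solute / h_solute) / (integral_solvent / h_solvent)
    # Volumes in mL for 1 mol of solvent and mole_ratio mol of solute.
    volume_solvent = mw_solvent / density_solvent
    volume_solute = mole_ratio * mw_solute / density_solute
    # Volume additivity: the solution volume is the sum of the two volumes.
    total_volume_litres = (volume_solvent + volume_solute) / 1000.0
    return mole_ratio / total_volume_litres


# Made-up numbers chosen only to keep the arithmetic easy to follow:
# a 1:1 mole ratio of two components, each 100 g/mol and 1 g/mL.
print(solute_concentration(1.0, 1, 3.0, 3, 100.0, 1.0, 100.0, 1.0))
```

In that made-up case, one mole of each component occupies 100 mL, so one mole of solute sits in 0.2 L of solution.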

The web service (which handles baseline correction) could be used for any other purpose involving the integration of spectra. Just make a copy of the Google Spreadsheet and modify.
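
To reproduce the basic integration step locally, something like the following Python sketch could be used. It is not the web service's code and performs no baseline correction; the function name and the plain-list input format are assumptions made for illustration.

```python
def integrate_range(ppm, intensity, low, high):
    """Trapezoidal integral of intensity over the window low <= ppm <= high."""
    # NMR axes are often stored from high to low ppm, so sort the selected
    # points by ppm before integrating.
    points = sorted((x, y) for x, y in zip(ppm, intensity) if low <= x <= high)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# A flat toy "spectrum" with a descending ppm axis.
print(integrate_range([3.0, 2.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0], 1.0, 3.0))
```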

Note that the JCAMP-DX files must be in XY format. If your instrument saves spectra in a compressed format, they must be converted to XY. The desktop version of Robert Lancashire’s JSpecView can be used to carry out the conversion.

This template spreadsheet also features a service in a cell to display the NMR spectrum by simply clicking on the link inside the cell. This is very handy because it obviates the need to create an HTML file which must normally accompany the JCAMP-DX file for viewing. Being able to quickly view a spectrum from a particular row within the Google Spreadsheet makes tracking data provenance very intuitive and errors easy to spot.


Reaction Attempts Book Edition 1 and UsefulChem Archive

I am pleased to report that Andrew Lang and I have published the first edition of the Reaction Attempts book. It currently contains most of the Ugi reactions from the UsefulChem project and is associated with an April 27, 2010 snapshot archive of the entire UsefulChem project, including NMR spectra, spreadsheets, images and the entire lab notebook from Wikispaces.


The book runs to 582 pages, and the printing cost from Lulu amounts to $26.28. It is not meant to replace electronic searches, but it should prove to be a handy reference book for the lab to quickly browse through what was attempted for a given reactant, what the outcome was and the researcher involved.

We are hoping to include reaction attempts from other groups in future editions. More details can be found in the preface, reproduced below:

Reaction Attempts First Edition

Data Source: the UsefulChem project

Introduction

Open Notebook Science (ONS) refers to the practice of making the full contents of a laboratory notebook and all associated raw data files available in near real time.[1] This represents an opportunity for everyone to benefit from work in progress in an open research group. However, in order to make use of the information, it must be easily discoverable. A simple strategy to increase discoverability is redundancy over multiple communication platforms.

In another project – the Open Notebook Science Solubility Challenge[2] – we published non-aqueous solubility data in the form of physical and downloadable (PDF) books.[3] Although it is possible to search the solubility database using web query interfaces, exploration of a Google Spreadsheet, an XML feed, etc.[4], having a physical copy in the laboratory has proved to be very convenient in several instances. A similar format for reactions will also be useful.

The UsefulChem Project

UsefulChem started in 2005 as an organic chemistry Open Notebook Science project with a main goal of discovering new anti-malarial agents that can be prepared by simple and cheap syntheses.[5] Most of the reactions on UsefulChem are Ugi reactions, which involve the mixing of an amine, aldehyde, carboxylic acid and isonitrile in a solvent at room temperature, generally for a few hours to days.[6] The multicomponent design of the Ugi reaction and the simple reaction conditions make it ideal for exploring large virtual libraries and selecting compounds of interest to make.[7]

Isolation of the Ugi products can be much simpler, cheaper and more readily scalable if they precipitate in pure form from the reaction mixture. To this end, much of the research in the UsefulChem project focuses on reaction conditions that lead to this outcome.[8] This is in fact the origin of the ONS Solubility Challenge discussed above.[9]

The Reaction Attempts Database

In order to look for patterns in the reaction conditions which led to Ugi product precipitation, the CombiUgiResults Google Spreadsheet was set up.[10] Reactions indexed there can be sorted by precipitation outcome, solvent, reactant, concentration, etc. and links to the laboratory notebook pages can be followed for full details. However, this sheet is designed specifically for Ugi reactions and contains dedicated columns for the aldehyde, amine, carboxylic acid and isonitrile.

In order to enable the tracking of other types of reactions, the information in the CombiUgiResults sheet was reformatted into two other sheets: ReactionAttempts[11] (containing reagents and reactants) and RXIDsReactionAttempts[12] (containing reaction conditions and results, such as solvent, concentration of limiting reactant, appearance of a precipitate, yield, etc.). The two sheets are connected via the use of a common ReactionID. This format permits the representation of any type of reaction, with an unlimited number of reactants and products.[13]
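
The two-sheet layout can be pictured as two tables joined on ReactionID. The following Python sketch is illustrative only: the column names (Role, CSID, Solvent, Precipitate), the CSIDs and the rows are invented for the example and are not copied from the actual spreadsheets.

```python
from collections import defaultdict

# Hypothetical rows standing in for the ReactionAttempts sheet:
# one row per compound taking part in a reaction.
reaction_attempts = [
    {"ReactionID": 1, "Role": "reactant", "CSID": 1001},
    {"ReactionID": 1, "Role": "reactant", "CSID": 1002},
    {"ReactionID": 1, "Role": "product", "CSID": 2001},
    {"ReactionID": 2, "Role": "reactant", "CSID": 1001},
    {"ReactionID": 2, "Role": "product", "CSID": 2002},
]

# Hypothetical rows standing in for the RXIDsReactionAttempts sheet:
# one row of conditions and results per reaction.
rxids_reaction_attempts = [
    {"ReactionID": 1, "Solvent": "methanol", "Precipitate": True},
    {"ReactionID": 2, "Solvent": "ethanol", "Precipitate": False},
]


def join_on_reaction_id(compounds, conditions):
    """Attach every compound row to the conditions row sharing its ReactionID."""
    by_id = defaultdict(list)
    for row in compounds:
        by_id[row["ReactionID"]].append(row)
    return {
        cond["ReactionID"]: {"conditions": cond,
                             "compounds": by_id[cond["ReactionID"]]}
        for cond in conditions
    }


def reactions_using(csid, joined):
    """Return the sorted ReactionIDs in which the given CSID is a reactant."""
    return sorted(
        rid for rid, record in joined.items()
        if any(c["CSID"] == csid and c["Role"] == "reactant"
               for c in record["compounds"])
    )


joined = join_on_reaction_id(reaction_attempts, rxids_reaction_attempts)
for rid in reactions_using(1001, joined):
    print(rid, joined[rid]["conditions"]["Solvent"])
```

Because each compound is its own row, a reaction can carry any number of reactants and products without changing the column layout.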

By definition, any Open Notebook Science project is a work in progress. The listing of a reaction in this database only means that the researcher attempted or is in the process of attempting it. Whatever the situation, a link to the laboratory notebook page is provided, where the most recent information is available. The philosophy used here is that partial information is always better than no information at all. Thus a researcher investigating the prior use of a particular reactant in a Ugi reaction might find the report that a precipitate was obtained in methanol helpful for designing their own reactions, even if the characterization of the precipitate is still pending. At the very least, knowing that a certain researcher has attempted a similar reaction is enough information for initiating a discussion, which may lead to valuable insights.

Reaction Attempts on ChemSpider

Although SMILES[14] are provided in the spreadsheets, the primary key to identify compounds is the ChemSpider ID (CSID)[15]. This allows us to render molecule images in the book automatically. In the case of the ONS Solubility Challenge book[3], use of the CSID provides a convenient way to calculate various descriptors for displaying values in the book.

In addition, the compounds in the Reaction Attempts database are indexed on ChemSpider as two Data Sources: ReactantsAttemptedReactions and ProductsAttemptedReactions[13]. In this way a substructure search for either reactants or products will identify indexed molecules. Clicking on the Syntheses tab in the ChemSpider record for a selected molecule will then reveal a list of hyperlinks to the relevant laboratory notebook pages.

Organization of the Book

In keeping with the layout of the ONS Solubility Challenge Book, the reactants are listed in alphabetical order. Each entry displays the list of reactions where the reactant was used. This includes a scheme with all reactants and product as well as key metadata: the researcher, reaction type, solvent, limiting reactant concentration, observation of a precipitate, comments and a reference (links to the laboratory notebook page).

In this edition, only Ugi reactions are included. The reaction schemes are laid out in the following order: carboxylic acid, amine, aldehyde and isonitrile. This should allow for easy comparison between schemes within a given record. Reactions where the Ugi product was isolated and characterized are marked with a green check and the percent yield is noted. Since the Ugi products do not have simple common names, they are not included as separate entries. However, all reactions where the synthesis of a specific Ugi product was attempted can be found by looking up the entries for any of the four reactants.

Although this compilation is not exhaustive, it does cover the vast majority of reactions in the UsefulChem project to date. Future editions will include other reactions from UsefulChem and other sources.

Archive

This edition is linked to snapshots taken on 2010-04-27: the UsefulChem data archive in ZIP[16], DVD[17] and interactive hosted[18] formats, and the ReactionAttempts (XLS)[19] and RXIDsReactionAttempts (XLS)[20] spreadsheets.

References

1. Open Notebook Science Wikipedia Entry http://en.wikipedia.org/wiki/Open_Notebook_Science
2. Open Notebook Science Solubility Challenge Wiki http://onschallenge.wikispaces.com
3. Bradley, J.-C. First Edition of ONS Solubility Challenge Book UsefulChem Blog (2009) http://usefulchem.blogspot.com/2009/12/first-edition-of-ons-solubility.html
4. Open Notebook Science Solubility Challenge List of Experiments page http://onschallenge.wikispaces.com/list+of+experiments
5. UsefulChem Wiki http://usefulchem.wikispaces.com
6. Ugi Reaction Wikipedia Entry http://en.wikipedia.org/wiki/Ugi_reaction
7. Dömling, A., & Ugi, I. (2000). Multicomponent Reactions with Isocyanides. Angewandte Chemie International Edition, 39(18), 3168-3210. http://www3.interscience.wiley.com/journal/73500473/abstract
8. UsefulChem List of Experiments http://usefulchem.wikispaces.com/All+Reactions
9. Bradley, J.-C. Open Notebook Science Challenge UsefulChem Blog (2008) http://usefulchem.blogspot.com/2008/09/open-notebook-science-challenge.html
10. CombiUgiResults Google Spreadsheet http://spreadsheets.google.com/ccc?key=plwwufp30hfpUERhse9y5Kw
11. ReactionAttempts Google Spreadsheet http://spreadsheets.google.com/ccc?key=0Ak1R8T6wt4YQdG9NejNLcDNUMkVBVURGM01TR0NxdXc
12. RXIDsReactionAttempts Google Spreadsheet http://spreadsheets.google.com/ccc?key=0Ak1R8T6wt4YQdGVENVFMWjdzaGd2REJTTnA4RG5vblE
13. Bradley, J.-C. Reaction Attempts on ChemSpider UsefulChem Blog (2010) http://usefulchem.blogspot.com/2010/03/reaction-attempts-on-chemspider.html
14. SMILES Wikipedia Entry http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification
15. ChemSpider Web Site http://www.chemspider.com/
16. UC archive Drexel server (ZIP) http://showme.physics.drexel.edu/usefulchem/archives/usefulchem2010-04-27.zip
17. UC archive on lulu.com (DVD) http://www.lulu.com/product/dvd/usefulchem-archive/10791847
18. UC interactive hosted format http://showme.physics.drexel.edu/usefulchem/archives/usefulchem2010-04-27/All%20Reactions.html
19. Bradley, J.-C.; Lang, A. Reaction Attempts Reactants and Products. UsefulChem. 2010-04-27. (Archived by WebCite® at http://www.webcitation.org/5pIsFEbT9)
20. Bradley, J.-C.; Lang, A. Reaction Attempts RXIDs. UsefulChem. 2010-04-27. (Archived by WebCite® at http://www.webcitation.org/5pIs2eh62)