

Category Archives: Machine Learning

AI meets green: The future of environmental protection with ChatGPT – EurekAlert

Image: Graphical abstract. Credit: Eco-Environment & Health

A recent study introduces a novel paradigm combining ChatGPT with machine learning (ML) to significantly ease the application of ML in environmental science. This approach promises to bridge knowledge gaps and democratize the use of complex ML models for environmental sustainability.

The rapid growth of environmental data presents a significant challenge in analyzing complex pollution networks. While ML has been a pivotal tool, its widespread adoption has been hindered by a steep learning curve and a significant knowledge gap among environmental scientists.

A new study (DOI: https://doi.org/10.1016/j.eehl.2024.01.006), published in Eco-Environment & Health on February 3, 2024, has developed a groundbreaking approach that merges ChatGPT with machine learning to streamline its use in environmental science.

This research introduces a user-friendly framework, aptly named "ChatGPT + ML + Environment," designed to democratize the application of machine learning in environmental studies. By simplifying the complex processes of data handling, model selection, and algorithm training, this paradigm empowers environmental scientists, regardless of their computational expertise, to leverage machine learning's full potential. The method involves using ChatGPT's intuitive conversational interface to guide users through the intricate steps of machine learning, from initial data analysis to the interpretation of results.

Highlights:

- A new paradigm of ChatGPT + Machine Learning (ML) + Environment is presented.
- The novelty and knowledge gaps of ML for decoupling the complexity of environmental big data are discussed.
- The new paradigm, guided by GPT, reduces the threshold of using ML in environmental research.
- The importance of secondary training for using ChatGPT + ML + Environment in the future is highlighted.

Lead researcher Haoyuan An states, "This new paradigm not only simplifies the application of ML in our field but also opens up untapped potential for environmental research, making it accessible to a broader range of scientists without the need for deep technical knowledge."

The integration of ChatGPT with ML can dramatically lower the barriers to employing advanced data analysis in environmental science, allowing for more efficient pollution monitoring, policy-making, and sustainability research. It marks a significant step toward more informed environmental decision-making and the potential for groundbreaking discoveries in the field.


References

DOI: 10.1016/j.eehl.2024.01.006

Original source URL: https://doi.org/10.1016/j.eehl.2024.01.006

Funding information

This work was financially supported by the National Key R&D Program of China (No. 2023YFF0614200), National Natural Science Foundation of China (Nos. 22222610, 22376202, 22193051), and the Chinese Academy of Sciences (Nos. ZDBS-LY-DQC030, YSBR-086). D. L. acknowledges the support from the Youth Innovation Promotion Association of CAS.

About Eco-Environment & Health

Eco-Environment & Health (EEH) is an international and multidisciplinary peer-reviewed journal designed for publications on the frontiers of the ecology, environment and health as well as their related disciplines. EEH focuses on the concept of "One Health" to promote green and sustainable development, dealing with the interactions among ecology, environment and health, and the underlying mechanisms and interventions. Our mission is to be one of the most important flagship journals in the field of environmental health.

Journal: Eco-Environment & Health

Article title: A new ChatGPT-empowered, easy-to-use machine learning paradigm for environmental science

Article publication date: 3-Feb-2024

COI statement: The authors declare that they have no competing interests.

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.

Link:
AI meets green: The future of environmental protection with ChatGPT - EurekAlert


Untangling Truths and Myths of Machine Learning – TechiExpert.com

In the tech world, people often use words like AI and Machine Learning as if they mean the same thing, but they don't. This mix-up causes problems, especially when businesses try to use Machine Learning in their operations.

Sure, AI sounds impressive. It conjures images of futuristic robots and advanced intelligence. But here is the truth: most of what we call AI today is not really that intelligent. It is mostly about doing math and guessing what might happen, rather than thinking like a person.

The problem arises when businesses buy into the hype without understanding what they are getting into. They hear AI and think it is a magic bullet that will solve all their problems. But the reality is far from it. Many Machine Learning projects never make it past the modeling phase, let alone into actual deployment where the value lies.

Take self-driving cars, for example. A few years ago, they were touted as the future of transportation, but now? They are more like this decade's jetpack: cool in theory, but far from reality. Why? Because we did not realize how hard it would be to put these things into action.

And it is not just self-driving cars. Across industries, businesses struggle to deploy Machine Learning models because they lack the proper infrastructure or simply don't understand the value.

But here is the deal: things can be different. If businesses plan well and know what Machine Learning can and can't do, they can use these models successfully and get good results. It is not simple, but it can happen.

So let us stop making AI sound fancy and start thinking about what is important: using Machine Learning to actually help businesses in a practical way.

Go here to read the rest:
Untangling Truths and Myths of Machine Learning - TechiExpert.com


Putting AI into the hands of people with problems to solve – MIT News

As Media Lab students in 2010, Karthik Dinakar SM '12, PhD '17 and Birago Jones SM '12 teamed up for a class project to build a tool that would help content moderation teams at companies like Twitter (now X) and YouTube. The project generated a huge amount of excitement, and the researchers were invited to give a demonstration at a cyberbullying summit at the White House; they just had to get the thing working.

The day before the White House event, Dinakar spent hours trying to put together a working demo that could identify concerning posts on Twitter. Around 11 p.m., he called Jones to say he was giving up.

Then Jones decided to look at the data. It turned out Dinakar's model was flagging the right types of posts, but the posters were using teenage slang terms and other indirect language that Dinakar didn't pick up on. The problem wasn't the model; it was the disconnect between Dinakar and the teens he was trying to help.

"We realized then, right before we got to the White House, that the people building these models should not be folks who are just machine-learning engineers," Dinakar says. "They should be people who best understand their data."

The insight led the researchers to develop point-and-click tools that allow nonexperts to build machine-learning models. Those tools became the basis for Pienso, which today is helping people build large language models for detecting misinformation, human trafficking, weapons sales, and more, without writing any code.

"These kinds of applications are important to us because our roots are in cyberbullying and understanding how to use AI for things that really help humanity," says Jones.

As for the early version of the system shown at the White House, the founders ended up collaborating with students at nearby schools in Cambridge, Massachusetts, to let them train the models.

"The models those kids trained were so much better and nuanced than anything I could've ever come up with," Dinakar says. "Birago and I had this big 'Aha!' moment where we realized empowering domain experts, which is different from democratizing AI, was the best path forward."

A project with purpose

Jones and Dinakar met as graduate students in the Software Agents research group of the MIT Media Lab. Their work on what became Pienso started in Course 6.864 (Natural Language Processing) and continued until they earned their master's degrees in 2012.

It turned out 2010 wasn't the last time the founders were invited to the White House to demo their project. The work generated a lot of enthusiasm, but the founders worked on Pienso part time until 2016, when Dinakar finished his PhD at MIT and deep learning began to explode in popularity.

"We're still connected to many people around campus," Dinakar says. "The exposure we had at MIT, the melding of human and computer interfaces, widened our understanding. Our philosophy at Pienso couldn't be possible without the vibrancy of MIT's campus."

The founders also credit MITs Industrial Liaison Program (ILP) and Startup Accelerator (STEX) for connecting them to early partners.

One early partner was Sky UK. The company's customer success team used Pienso to build models to understand their customers' most common problems. Today those models are helping to process half a million customer calls a day, and the founders say they have saved the company over 7 million pounds to date by shortening the length of calls into the company's call center.

"The difference between democratizing AI and empowering people with AI comes down to who understands the data best: you or a doctor or a journalist or someone who works with customers every day?" Jones says. "Those are the people who should be creating the models. That's how you get insights out of your data."

In 2020, just as Covid-19 outbreaks began in the U.S., government officials contacted the founders to use their tool to better understand the emerging disease. Pienso helped experts in virology and infectious disease set up machine-learning models to mine thousands of research articles about coronaviruses. Dinakar says they later learned the work helped the government identify and strengthen critical supply chains for drugs, including the popular antiviral remdesivir.

"Those compounds were surfaced by a team that did not know deep learning but was able to use our platform," Dinakar says.

Building a better AI future

Because Pienso can run on internal servers and cloud infrastructure, the founders say it offers an alternative for businesses being forced to donate their data by using services offered by other AI companies.

"The Pienso interface is a series of web apps stitched together," Dinakar explains. "You can think of it like an Adobe Photoshop for large language models, but in the web. You can point and import data without writing a line of code. You can refine the data, prepare it for deep learning, analyze it, give it structure if it's not labeled or annotated, and you can walk away with a fine-tuned large language model in a matter of 25 minutes."

Earlier this year, Pienso announced a partnership with GraphCore, which provides a faster, more efficient computing platform for machine learning. The founders say the partnership will further lower barriers to leveraging AI by dramatically reducing latency.

"If you're building an interactive AI platform, users aren't going to have a cup of coffee every time they click a button," Dinakar says. "It needs to be fast and responsive."

The founders believe their solution is enabling a future where more effective AI models are developed for specific use cases by the people who are most familiar with the problems they are trying to solve.

"No one model can do everything," Dinakar says. "Everyone's application is different, their needs are different, their data is different. It's highly unlikely that one model will do everything for you. It's about bringing a garden of models together and allowing them to collaborate with each other and orchestrating them in a way that makes sense, and the people doing that orchestration should be the people who understand the data best."

See the article here:
Putting AI into the hands of people with problems to solve - MIT News


AI shows promise but remains limited for heart/stroke care – Idaho Business Review

Artificial intelligence has the potential to change many aspects of cardiovascular care, but not right away, a new report says.

Existing AI and machine-learning digital tools are promising, according to the scientific statement from the American Heart Association. Such tools already have shown they can help screen patients and guide researchers in developing new treatments. The report was published Wednesday in the journal Circulation.

But, the authors said, research hasn't shown that AI-based tools improve care enough to justify their widespread use.

"There is an urgent need to develop programs that will accelerate the education of the science behind AI/machine learning tools, thus accelerating the adoption and creation of manageable, cost-effective, automated processes," Dr. Antonis Armoundas, who led the statement writing committee, said in a news release. He is a principal investigator at the Cardiovascular Research Center at Boston's Massachusetts General Hospital.

"We need more AI/machine learning-based precision medicine tools to help address core unmet needs in medicine that can subsequently be tested in robust clinical trials," said Armoundas, who also is an associate professor of medicine at Harvard Medical School.

The report is the AHA's first scientific statement on artificial intelligence. It looks at the state of research into AI and machine learning in cardiovascular medicine and suggests what may be needed for safe, effective widescale use.

"Here, we present the state of the art, including the latest science regarding specific AI uses from imaging and wearables to electrocardiography and genetics," Armoundas said.

AI can analyze data and make predictions, typically for narrowly defined tasks. Machine learning uses mathematical models and algorithms to detect patterns in large datasets that may not be evident to human observers alone. Deep learning, a subfield of machine learning, is used in image recognition and interpretation.

Researchers have used such technologies to analyze electronic health records to compare the effectiveness of tests and treatments, and, more recently, to create models that inform care decisions.

The report notes several ways digital tools might help cardiovascular patients.

Imaging, for example, is important for diagnosing heart attacks and strokes. AI and machine-learning tools could address inconsistencies in human interpretation and relieve overburdened experts.

AI has helped automate analysis of electrocardiograms, which measure the heart's electrical activity, by identifying subtle results that human experts might not see.

And with implantable and wearable technologies providing steady streams of health information, AI could help remotely monitor patients and spot when something is amiss.

But the report also spells out many challenges and limits.

With imaging, for example, broad use of AI and machine learning for interpreting tests is challenging because the data available to study is limited. Researchers also need to prove AI technology works in each area where it will be used.

With implantable and wearable tech, the research gaps include ways to identify which patients and conditions may be best for AI- and machine learning-enabled remote monitoring. Other issues include how to address cost-effectiveness, privacy, safety and equitable access.

More broadly, protocols on how information is organized and shared are critical, the report says, and potential ethical, legal and regulatory issues need to be addressed.

And while AI algorithms have enhanced the ability to interpret genetic variants and abnormalities, the writing committee warned of limits. Such algorithms, the committee wrote, still require training on human-derived data that can be error-prone and inaccurate.

Excerpt from:
AI shows promise but remains limited for heart/stroke care - Idaho Business Review


Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting – Nature.com

We start with a brief review of transfer learning and a formal description of our problem setting. This is followed by a section covering the preliminaries of graph neural networks (GNNs), including standard and adaptive readouts, as well as our supervised variational graph autoencoder architecture. Next, we formally introduce the considered transfer learning strategies, while also providing a brief overview of the approach most frequently used for transfer learning in deep learning: a two-stage learning mechanism consisting of pre-training and fine-tuning of part or all of a (typically non-geometric) neural network [14]. In the Results section, we perform an empirical study validating the effectiveness of the proposed approaches relative to the latter and to state-of-the-art baselines for learning with multi-fidelity data.

Let $\mathcal{X}$ be an instance space and $X=\{x_1,\ldots,x_n\}\subset\mathcal{X}$ a sample from some marginal distribution $\rho_{\mathcal{X}}$. A tuple $\mathcal{D}=(\mathcal{X},\rho_{\mathcal{X}})$ is called a domain. Given a specific domain $\mathcal{D}$, a task $\mathcal{T}$ consists of a label space $\mathcal{Y}$ and an objective predictive function $f:\mathcal{X}\to\mathcal{Y}$ that is unknown and needs to be learnt from training data given by examples $(x_i,y_i)\in\mathcal{X}\times\mathcal{Y}$ with $i=1,\ldots,n$. To simplify the presentation, we restrict ourselves to the setting where there is a single source domain $\mathcal{D}_S$ and a single target domain $\mathcal{D}_T$. We also assume that $\mathcal{X}_T\subseteq\mathcal{X}_S$, and denote with $\mathcal{D}_S=\{(x_{S_1},y_{S_1}),\ldots,(x_{S_n},y_{S_n})\}$ and $\mathcal{D}_T=\{(x_{T_1},y_{T_1}),\ldots,(x_{T_m},y_{T_m})\}$ the observed examples from the source and target domains. While the source domain task is associated with low-fidelity data, the target domain task is considered to be sparse and high-fidelity, i.e., it holds that $m\ll n$.

Transfer learning [54,55]. Given a source domain $\mathcal{D}_S$ and a learning task $\mathcal{T}_S$, a target domain $\mathcal{D}_T$ and a learning task $\mathcal{T}_T$, transfer learning aims to help improve the learning of the target predictive function $f_T$ in $\mathcal{D}_T$ using the knowledge in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S\ne\mathcal{D}_T$ or $\mathcal{T}_S\ne\mathcal{T}_T$.

The goal in our problem setting is, thus, to learn the objective function $f_T$ in the target domain $\mathcal{D}_T$ by leveraging the knowledge from the low-fidelity domain $\mathcal{D}_S$. The main focus is on devising a transfer learning approach for graph neural networks based on feature representation transfer. We propose extensions for two different learning settings: transductive and inductive learning. In the transductive transfer learning setup considered here, the target domain is constrained to the set of instances observed in the source dataset, i.e., $\mathcal{X}_T\subseteq\mathcal{X}_S$. Thus, the task in the target domain requires us to make predictions only at points observed in the source task/domain. In the inductive setting, we assume that source and target domains could differ in the marginal distribution of instances, i.e., $\rho_{\mathcal{X}_S}\ne\rho_{\mathcal{X}_T}$. For both learning settings, we assume that the source domain dataset is significantly larger, as it is associated with low-fidelity simulations/approximations.

Here, we follow the brief description of GNNs from [8]. A graph $G$ is represented by a tuple $G=(\mathcal{V},\mathcal{E})$, where $\mathcal{V}$ is the set of nodes (or vertices) and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges. Here, we assume that the nodes are associated with feature vectors $\mathbf{x}_u$ of dimension $d$ for all $u\in\mathcal{V}$. The graph structure is represented by $\mathbf{A}$, the adjacency matrix of a graph $G$, such that $A_{uv}=1$ if $(u,v)\in\mathcal{E}$ and $A_{uv}=0$ otherwise. For a node $u\in\mathcal{V}$, the set of neighbouring nodes is denoted by $\mathcal{N}_u=\{v\mid(u,v)\in\mathcal{E}\vee(v,u)\in\mathcal{E}\}$. Assume also that a collection of graphs with corresponding labels $\{(G_i,y_i)\}_{i=1}^{n}$ has been sampled independently from a target probability measure defined over $\mathcal{G}\times\mathcal{Y}$, where $\mathcal{G}$ is the space of graphs and $\mathcal{Y}\subset\mathbb{R}$ is the set of labels. From now on, we consider that a graph $G$ is represented by a tuple $(\mathbf{X}_G,\mathbf{A}_G)$, with $\mathbf{X}_G$ denoting the matrix with node features as rows and $\mathbf{A}_G$ the adjacency matrix. The inputs of graph neural networks consist of such tuples, and the outputs are predictions over the label space. In general, GNNs learn permutation-invariant hypotheses that give consistent predictions for the same graph when presented with permuted nodes. This property is achieved through neighbourhood aggregation schemes and readouts that give rise to permutation-invariant hypotheses. Formally, a function $f$ defined over a graph $G$ is called permutation invariant if $f(\mathbf{P}\mathbf{X}_G,\mathbf{P}\mathbf{A}_G\mathbf{P}^{\top})=f(\mathbf{X}_G,\mathbf{A}_G)$ for every permutation matrix $\mathbf{P}$. The node features $\mathbf{X}_G$ and the graph structure (adjacency matrix) $\mathbf{A}_G$ are used to first learn representations of nodes $\mathbf{h}_v$, for all $v\in\mathcal{V}$. Permutation invariance in the neighbourhood aggregation schemes is enforced by employing standard pooling functions: sum, mean, or maximum. As succinctly described in [56], typical neighbourhood aggregation schemes characteristic of GNNs can be described by two steps:

$$\mathbf{a}_v^{(k)}=\operatorname{AGGREGATE}\left(\left\{\mathbf{h}_u^{(k-1)}\,\middle|\,u\in\mathcal{N}_v\right\}\right)\quad\text{and}\quad\mathbf{h}_v^{(k)}=\operatorname{COMBINE}\left(\mathbf{h}_v^{(k-1)},\,\mathbf{a}_v^{(k)}\right)$$

(1)

where $\mathbf{h}_u^{(k)}$ is a representation of node $u\in\mathcal{V}$ at the output of the $k$-th iteration.

After $k$ iterations, the representation of a node captures the information contained in its $k$-hop neighbourhood. For graph-level tasks such as molecular prediction, the last iteration is followed by a readout (also called pooling) function that aggregates the node features $\mathbf{h}_v$ into a graph representation $\mathbf{h}_G$. To enforce a permutation-invariant hypothesis, it is again common to employ the standard pooling functions as readouts, namely sum, mean, or maximum.
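
The aggregation and readout steps above can be sketched in a few lines of PyTorch. The snippet below is only a minimal illustration, not the architecture used in the paper: it performs a single sum-based AGGREGATE/COMBINE step on a dense adjacency matrix followed by a sum readout, and all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SimpleMPLayer(nn.Module):
    """One neighbourhood-aggregation step: AGGREGATE (sum over neighbours)
    followed by COMBINE (a linear layer applied to [h_v, a_v])."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.combine = nn.Linear(2 * dim_in, dim_out)

    def forward(self, H, A):
        # H: (num_nodes, dim_in) node features, A: (num_nodes, num_nodes) adjacency
        agg = A @ H  # sum of neighbour features
        return torch.relu(self.combine(torch.cat([H, agg], dim=-1)))

def sum_readout(H):
    """Permutation-invariant graph embedding h_G obtained by summing node embeddings."""
    return H.sum(dim=0)

# Toy usage: a 4-node graph with 8-dimensional node features.
A = torch.tensor([[0, 1, 0, 0],
                  [1, 0, 1, 1],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=torch.float32)
H = torch.randn(4, 8)
layer = SimpleMPLayer(8, 16)
h_G = sum_readout(layer(H, A))  # graph embedding of size 16
```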

Standard readout functions (i.e., sum, mean, and maximum) in graph neural networks do not have any parameters and are, thus, not amenable to transfer learning between domains. Motivated by this, we build on our recent work [8] that proposes a neural network architecture to aggregate learnt node representations into graph embeddings. This allows for freezing the part of a GNN architecture responsible for learning effective node representations and fine-tuning the readout layer in small-sample downstream tasks. In the remainder of the section, we present a Set Transformer readout that retains the permutation invariance property characteristic of standard pooling functions. Henceforth, suppose that after completing a pre-specified number of neighbourhood aggregation iterations, the resulting node features are collected into a matrix $\mathbf{H}\in\mathbb{R}^{M\times D}$, where $M$ is the maximal number of nodes that a graph can have in the dataset and $D$ is the dimension of the output node embedding. For graphs with fewer than $M$ vertices, $\mathbf{H}$ is padded with zeros.

Recently, an attention-based neural architecture for learning on sets has been proposed by Lee et al. [57]. The main difference compared to the classical attention model proposed by Vaswani et al. [9] is the absence of positional encodings and dropout layers. As graphs can be seen as sets of nodes, we leverage this architecture as a readout function in graph neural networks. For the sake of brevity, we omit the details of classical attention models [9] and summarise only the adaptation to sets (and thus graphs). The Set Transformer (ST) takes as input matrices with set items (in our case, graph nodes) as rows and generates graph representations by composing encoder and decoder modules implemented using attention:

$$\operatorname{ST}(\mathbf{H})=\frac{1}{K}\sum_{k=1}^{K}\left[\operatorname{Decoder}\left(\operatorname{Encoder}\left(\mathbf{H}\right)\right)\right]_{k}$$

(2)

where $[\cdot]_{k}$ refers to a computation specific to head $k$ of a multi-head attention module. The encoder-decoder modules follow the definition of Lee et al. [57]:

$$\operatorname{Encoder}(\mathbf{H})=\operatorname{MAB}^{n}\left(\mathbf{H},\,\mathbf{H}\right)$$

(3)

$$\operatorname{Decoder}(\mathbf{Z})=\operatorname{FF}\left(\operatorname{MAB}^{m}\left(\operatorname{PMA}(\mathbf{Z}),\,\operatorname{PMA}(\mathbf{Z})\right)\right)$$

(4)

$$\operatorname{PMA}(\mathbf{Z})=\operatorname{MAB}\left(\mathbf{s},\,\operatorname{FF}(\mathbf{Z})\right)$$

(5)

$$\operatorname{MAB}(\mathbf{X},\mathbf{Y})=\mathbf{A}+\operatorname{FF}(\mathbf{A})$$

(6)

$$\mathbf{A}=\mathbf{X}+\operatorname{MultiHead}(\mathbf{X},\,\mathbf{Y},\,\mathbf{Y}).$$

(7)

Here, $\mathbf{H}$ denotes the node features after neighbourhood aggregation and $\mathbf{Z}$ is the encoder output. The encoder is a chain of $n$ classical multi-head attention blocks (MAB) without positional encodings. The decoder component consists of a pooling by multi-head attention block (PMA), which uses a learnable seed vector $\mathbf{s}$ within a multi-head attention block to create an initial readout vector, that is further processed via a chain of $m$ self-attention modules and a linear projection block (also called feedforward, FF). In contrast to typical set-based neural architectures that process individual items in isolation (most notably deep sets [58]), the presented adaptive readouts account for interactions between all the node representations generated by the neighbourhood aggregation scheme. A particularity of this architecture is that the dimension of the graph representation can be disentangled from the node output dimension and the aggregation scheme.
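
A simplified version of such an adaptive readout can be written with PyTorch's built-in multi-head attention. The sketch below is only illustrative and makes several assumptions: it uses a single MAB-style encoder block and a single learnable seed vector for the PMA step (the ST readout above stacks $n$ and $m$ such blocks and averages over heads), and the dimensions are placeholders.

```python
import torch
import torch.nn as nn

class TinySetTransformerReadout(nn.Module):
    """Attention-based readout: encode node embeddings with self-attention,
    then pool them with a learnable seed vector (PMA-style)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.encoder_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.seed = nn.Parameter(torch.randn(1, 1, dim))        # learnable seed s
        self.pool_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, H):
        # H: (batch, max_nodes, dim) zero-padded node embeddings
        Z, _ = self.encoder_attn(H, H, H)        # MAB(H, H) without positional encodings
        Z = Z + self.ff(Z)
        s = self.seed.expand(H.size(0), -1, -1)  # one seed per graph
        hG, _ = self.pool_attn(s, Z, Z)          # PMA: attend from the seed to the set
        return hG.squeeze(1)                     # (batch, dim) graph embeddings

# Toy usage: a batch of 2 graphs, up to 10 nodes, 32-dimensional node embeddings.
readout = TinySetTransformerReadout(dim=32)
graph_embeddings = readout(torch.randn(2, 10, 32))
```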

We start with a review of variational graph autoencoders (VGAEs), originally proposed by Kipf and Welling [59], and then introduce a variation that allows for learning of a predictive model operating in the latent space of the encoder. More specifically, we propose to jointly train the autoencoder together with a small predictive model (multi-layer perceptron) operating in its latent space by including an additional loss term that accounts for the target labels. Below, we follow the brief description of [6].

A variational graph autoencoder consists of a probabilistic encoder and decoder, with several important differences compared to standard architectures operating on vector-valued inputs. The encoder component is obtained by stacking graph convolutional layers to learn the parameter matrices $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ that specify the Gaussian distribution of a latent space encoding. More formally, we have that

$$q(\mathbf{Z}\mid\mathbf{X},\mathbf{A})=\prod_{i=1}^{N}q(\mathbf{z}_i\mid\mathbf{X},\mathbf{A})\quad\text{and}\quad q(\mathbf{z}_i\mid\mathbf{X},\mathbf{A})=\mathcal{N}\left(\mathbf{z}_i\mid\boldsymbol{\mu}_i,\operatorname{diag}(\boldsymbol{\sigma}_i^{2})\right),$$

(8)

with $\boldsymbol{\mu}=\operatorname{GCN}_{\mu,n}(\mathbf{X},\mathbf{A})$ and $\log\boldsymbol{\sigma}=\operatorname{GCN}_{\sigma,n}(\mathbf{X},\mathbf{A})$. Here, $\operatorname{GCN}_{\mu,n}$ is a graph convolutional neural network with $n$ layers, $\mathbf{X}$ is a node feature matrix, $\mathbf{A}$ is the adjacency matrix of the graph, and $\mathcal{N}$ denotes the Gaussian distribution. Moreover, the model typically assumes the existence of self-loops, i.e., the diagonal of the adjacency matrix consists of ones.

The decoder reconstructs the entries in the adjacency matrix by passing the inner product between latent variables through the logistic sigmoid. More formally, we have that

$$p(\mathbf{A}\mid\mathbf{Z})=\prod_{i=1}^{N}\prod_{j=1}^{N}p(A_{ij}\mid\mathbf{z}_i,\mathbf{z}_j)\quad\text{and}\quad p(A_{ij}=1\mid\mathbf{z}_i,\mathbf{z}_j)=\tau\left(\mathbf{z}_i^{\top}\mathbf{z}_j\right),$$

(9)

where $A_{ij}$ are entries in the adjacency matrix $\mathbf{A}$ and $\tau(\cdot)$ is the logistic sigmoid function. A variational graph autoencoder is trained by optimising the evidence lower-bound loss function that can be seen as the combination of a reconstruction and a regularisation term:

$$\tilde{\mathcal{L}}(\mathbf{X},\mathbf{A})=\underbrace{\mathbb{E}_{q(\mathbf{Z}\mid\mathbf{X},\mathbf{A})}\left[\log p(\mathbf{A}\mid\mathbf{Z})\right]}_{\mathcal{L}_{\mathrm{RECON}}}-\underbrace{\operatorname{KL}\left[q(\mathbf{Z}\mid\mathbf{X},\mathbf{A})\,\|\,p(\mathbf{Z})\right]}_{\mathcal{L}_{\mathrm{REG}}}$$

(10)

where $\operatorname{KL}[q(\cdot)\,\|\,p(\cdot)]$ is the Kullback-Leibler divergence between the variational distribution $q(\cdot)$ and the prior $p(\cdot)$. The prior is assumed to be a Gaussian distribution given by $p(\mathbf{Z})=\prod_i p(\mathbf{z}_i)=\prod_i\mathcal{N}(\mathbf{z}_i\mid 0,\,\mathbf{I})$. As the adjacency matrices of graphs are typically sparse, instead of taking all the negative entries when training, one typically performs sub-sampling of entries with $A_{ij}=0$.

We extend this neural architecture by adding a feedforward component operating on the latent space and account for its effectiveness via the mean squared error loss term that is added to the optimisation objective. More specifically, we optimise the following loss function:

$$\mathcal{L}(\mathbf{X},\mathbf{A},\mathbf{y})=\tilde{\mathcal{L}}(\mathbf{X},\mathbf{A})+\frac{1}{N}\sum_{i=1}^{N}\left\lVert\nu(\mathbf{Z}_i)-\mathbf{y}_i\right\rVert^{2},$$

(11)

where $\nu(\mathbf{Z})$ is the predictive model operating on the latent space embedding $\mathbf{Z}$ associated with graph $(\mathbf{X},\mathbf{A})$, $\mathbf{y}$ is the vector with target labels, and $N$ is the number of labelled instances. Figure 2 illustrates the setting and our approach to transfer learning using supervised variational graph autoencoders.
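
In PyTorch terms, the loss in Eq. (11) adds a label-supervision term to the usual VGAE objective. The function below is a schematic sketch under simplifying assumptions (dense adjacency matrix, no negative-edge sub-sampling, and a generic predictor standing in for $\nu$); it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def supervised_vgae_loss(z, mu, logstd, adj, y_pred, y_true):
    """ELBO of a variational graph autoencoder plus an MSE term on the target labels.

    z:      (N, d) latent node embeddings sampled via the reparameterisation trick
    mu:     (N, d) posterior means; logstd: (N, d) posterior log standard deviations
    adj:    (N, N) binary adjacency matrix as a float tensor (dense, for illustration)
    y_pred: predictions of the latent-space MLP (nu); y_true: target labels
    """
    # Reconstruction term: sigmoid(z_i^T z_j) compared against observed edges.
    logits = z @ z.t()
    recon = F.binary_cross_entropy_with_logits(logits, adj)

    # KL divergence between q(Z | X, A) and the standard normal prior.
    kl = -0.5 * torch.mean(1 + 2 * logstd - mu.pow(2) - (2 * logstd).exp())

    # Supervision term on the labels, as in Eq. (11).
    supervised = F.mse_loss(y_pred, y_true)

    return recon + kl + supervised
```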

We note that our supervised variational graph autoencoder resembles the joint property prediction variational autoencoder (JPP-VAE) proposed by Gómez-Bombarelli et al. [39]. Their approach has been devised for generative purposes, which we do not consider here. The main difference to our approach, however, is the fact that JPP-VAE is a sequence model trained directly on the SMILES [60] string representation of molecules using recurrent neural networks, a common approach in generative models [61,62]. The transition from traditional VAEs to geometric deep learning (graph data) in the first place, and then to molecular structures, is not a trivial process for at least two reasons. Firstly, a variational graph autoencoder only reconstructs the graph connectivity information (i.e., the equivalent of the adjacency matrix) and not the node (atom) features, according to the original definition by Kipf and Welling. This is in contrast to traditional VAEs, where the latent representation is directly optimised against the actual input data. The balance between reconstruction functions (for the connectivity and node features, respectively) is thus an open question in geometric deep learning. Secondly, for molecule-level tasks such as prediction and latent space representation, the readout function of the variational graph autoencoders is crucial. As we have previously explored in [8] and further validate in the Results section, standard readout functions such as sum, mean, or maximum lead to uninformative representations that are similar to completely unsupervised training (i.e., not performing well in transfer learning tasks). Thus, the supervised or guided variational graph autoencoders presented here are also an advancement in terms of graph representation learning for modelling challenging molecular tasks at the multi-million scale.

In the context of quantum chemistry and the design of molecular materials, the most computationally demanding task corresponds to the calculation of an energy contribution that constitutes only a minor fraction of the total energy, while the majority of the remaining calculations can be accounted for via efficient proxies [28]. Motivated by this, Ramakrishnan et al. [28] have proposed an approach known as Δ-machine learning, where the desired molecular property is approximated by learning an additive correction term for a low-fidelity proxy. For linear models, an approach along these lines can be seen as feature augmentation where, instead of the constant bias term, one appends the low-fidelity approximation as a component to the original representation of an instance. More specifically, if we represent a molecule in the low-fidelity domain via $\mathbf{x}\in\mathcal{X}_S$, then the representation transfer for $\mathcal{D}_T$ can be achieved via the feature mapping

$$\Psi_{\mathrm{Label}}(\mathbf{x})=\parallel\!\left(f_S(\mathbf{x}),\,\mathbf{x}\right)$$

(12)

where $\parallel(\cdot\,,\cdot)$ denotes concatenation in the last tensor dimension and $f_S$ is the objective prediction function associated with the source (low-fidelity) domain $\mathcal{D}_S$ defined in the Overview of transfer learning and problem setting section. We consider this approach in the context of transfer learning for general methods (including GNNs) and standard baselines that operate on molecular fingerprints (e.g., support vector machines, random forests, etc.). A limitation of this approach is that it constrains the high-fidelity domain to the transductive setting and instances that have been observed in the low-fidelity domain. A related set of methods in the drug discovery literature, called high-throughput fingerprints [34,35,36,37], function in effectively the same manner, using a vector of hundreds of experimental single-dose (low-fidelity) measurements and optionally a standard molecular fingerprint as a general molecular representation (i.e., not formulated specifically for transductive or multi-fidelity tasks). In these cases, the burden of collecting the low-fidelity representation is substantial, involving potentially hundreds of experiments (assays) that are often disjoint, resulting in sparse fingerprints and no practical way to make predictions about compounds that have not been part of the original assays. In drug discovery in particular, it is desirable to extend beyond this setting and enable predictions for arbitrary molecules, i.e., outside of the low-fidelity domain. Such a model would enable property prediction for compounds before they are physically synthesised, a paradigm shift compared to existing HTS approaches. To overcome the transductive limitation, we consider a feature augmentation approach that leverages low-fidelity data to learn an approximation of the objective function in that domain. Then, transfer learning to the high-fidelity domain happens via the augmented feature map

$$\Psi_{(\mathrm{Hybrid\ label})}(\mathbf{x})=\begin{cases}\parallel\!\left(f_S(\mathbf{x}),\,\mathbf{x}\right) & \text{if } \mathbf{x}\in\mathcal{X}_S,\\[2pt] \parallel\!\left(\tilde{f}_S(\mathbf{x}),\,\mathbf{x}\right) & \text{otherwise,}\end{cases}$$

(13)

where $\tilde{f}_S$ is an approximation of the low-fidelity objective function $f_S$. This is a hybrid approach that allows extending to the inductive setting, with a different treatment between instances observed in the low-fidelity domain and the ones associated with the high-fidelity task exclusively. Another possible extension that treats all instances in the high-fidelity domain equally is via the map $\Psi_{(\mathrm{Predicted\ label})}$, which augments the input feature representation using an approximate low-fidelity objective $\tilde{f}_S$, i.e.,

$$\Psi_{(\mathrm{Predicted\ label})}(\mathbf{x})=\parallel\!\left(\tilde{f}_S(\mathbf{x}),\,\mathbf{x}\right)$$

(14)

Our final feature augmentation amounts to learning a latent representation of molecules in the low-fidelity domain using a supervised autoencoder (see Supervised variational graph autoencoders section), then jointly training alongside the latent representation of a model that is being fitted to the high-fidelity data. This approach also lends itself to the inductive setting. More formally, transfer learning in this case can be achieved via the feature mapping

$$\Psi_{\mathrm{Embeddings}}(\mathbf{x})=\parallel\!\left(\psi_S(\mathbf{x}),\,\psi_T(\mathbf{x})\right)$$

(15)

where $\psi_S(\mathbf{x})$ is the latent embedding obtained by training a supervised autoencoder on the low-fidelity data $\mathcal{D}_S$, and $\psi_T(\mathbf{x})$ represents the latent representation of a model trained on the sparse high-fidelity task. Note that $\psi_S(\mathbf{x})$ is fixed (the output of the low-fidelity model, which is trained separately), while $\psi_T(\mathbf{x})$ is the current embedding of the high-fidelity model that is being learnt alongside $\psi_S(\mathbf{x})$ and can be updated.
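
All of the label-based augmentations (Eqs. 12 to 14) reduce to concatenating a low-fidelity signal onto the molecular representation. The helper below is a hedged sketch with NumPy arrays standing in for fingerprints or embeddings; `low_fidelity_lookup` and `low_fidelity_model` are hypothetical placeholders for the measured low-fidelity value $f_S(\mathbf{x})$ and its learned approximation $\tilde{f}_S$, respectively.

```python
import numpy as np

def augment_with_label(x, molecule_id, low_fidelity_lookup, low_fidelity_model=None):
    """Psi_Label / Psi_(Hybrid label) / Psi_(Predicted label) as concatenation.

    x:                   1-D feature vector for the molecule (e.g. a fingerprint)
    molecule_id:         identifier (e.g. SMILES) used to look up measured low-fidelity data
    low_fidelity_lookup: dict mapping identifiers to measured low-fidelity labels f_S(x)
    low_fidelity_model:  optional callable approximating f_S (enables the inductive setting)
    """
    if molecule_id in low_fidelity_lookup:        # transductive case: measured label
        lf = low_fidelity_lookup[molecule_id]
    elif low_fidelity_model is not None:          # inductive case: predicted label
        lf = low_fidelity_model(x)
    else:
        raise KeyError("No low-fidelity label or model available for this molecule.")
    return np.concatenate([[lf], x])              # prepend the low-fidelity signal

def augment_with_embeddings(psi_S, psi_T):
    """Psi_Embeddings: concatenate the fixed low-fidelity embedding with the
    trainable high-fidelity embedding (shown here as a plain concatenation)."""
    return np.concatenate([psi_S, psi_T])
```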

Supervised pre-training and fine-tuning is a transfer learning strategy that has previously proven successful for non-graph neural networks in the context of energy prediction for small organic molecules. In its simplest form, and as previously used by Smith et al. [14], the strategy consists of first training a model on the low-fidelity data $\mathcal{D}_S$ (the pre-training step). Afterwards, the model is retrained on the high-fidelity data $\mathcal{D}_T$, such that it now outputs predictions at the desired fidelity level (the fine-tuning step). For the fine-tuning step, certain layers of the neural network are typically frozen, which means that gradient computation is disabled for them. In other words, their weights are fixed to the values learnt during the pre-training step and are not updated. This technique reduces the number of learnable parameters, thus helping to avoid over-fitting to a smaller high-fidelity dataset and reducing training times. Formally, we assume that we have a low-fidelity predictor $\tilde{f}_S$ (corresponding to pre-training) and define the steps required to re-train or fine-tune a blank model $\tilde{f}_{T_0}$ (in the target domain) into a high-fidelity predictor $\tilde{f}_T$:

$$\mathbf{W}_S=\operatorname{Weights}\big(\tilde{f}_S\big)\qquad(\text{Extract weights of pre-trained model } \tilde{f}_S)$$

(16)

$$\mathbf{W}_S=\operatorname{Freeze}\big(\mathbf{W}_{S_{\mathrm{GCN}}},\ldots\big)\qquad(\text{Freeze components, e.g. GCN layers})$$

(17)

$$\tilde{f}_{T_0}=\mathbf{W}_S\qquad(\text{Assign weights of } \tilde{f}_S \text{ to a blank model } \tilde{f}_{T_0})$$

(18)

where $\tilde{f}_{T_0}$ is fine-tuned into $\tilde{f}_T$. As a baseline, we define a simple equivalent to the neural network in Smith et al., where we pre-train and fine-tune a supervised VGAE model with the sum readout and without any frozen layers. This is justified by GNNs having a small number of layers to avoid well-known problems such as oversmoothing. As such, the entire VGAE is fine-tuned and the strategy is termed $\Psi_{(\mathrm{Tune\ VGAE})}$:

$$\mathbf{W}_S=\operatorname{Freeze}(\varnothing)\qquad(\text{No component is frozen})$$

(19)

$$\tilde{f}_{T_0}=\mathbf{W}_S\qquad(\text{Assign initial weights})$$

(20)

$$\Psi_{(\mathrm{Tune\ VGAE})}(\mathbf{x})=\tilde{f}_T(\mathbf{x})\qquad(\text{The final model is the fine-tuned } \tilde{f}_T)$$

(21)

Standard GNN readouts such as the sum operator are fixed functions with no learnable parameters. In contrast, adaptive readouts are implemented as neural networks, and the overall GNN becomes a modular architecture composed of (1) the supervised VGAE layers and (2) an adaptive readout. Consequently, there are three possible ways to freeze components at this level: (i) frozen graph convolutional layers and trainable readout, (ii) trainable graph layers and frozen readout, and (iii) trainable graph layers and trainable readout (no freezing). After a preliminary study on a representative collection of datasets, we decided to follow strategy (i) due to empirically strong results and overall originality for transfer learning with graph neural networks. More formally, we have that

$$\mathbf{W}_S=\operatorname{Freeze}\big(\mathbf{W}_{S_{\mathrm{GCN}}}\big)\qquad(\text{Freeze all GCN layers})$$

(22)

$$\tilde{f}_{T_0}=\mathbf{W}_S\qquad(\text{Assign initial weights})$$

(23)

$$\Psi_{(\mathrm{Tune\ readout})}(\mathbf{x})=\tilde{f}_T(\mathbf{x})\qquad(\text{The final model is the fine-tuned } \tilde{f}_T)$$

(24)
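
As a sketch of strategy (i), the snippet below freezes the graph convolutional part of a pre-trained model and fine-tunes only the readout on the high-fidelity data. The module names (`model.gcn_layers`, the commented loader and loss) are illustrative assumptions, not the paper's actual class structure.

```python
import torch

def prepare_for_finetuning(model, lr=1e-4):
    """Freeze the pre-trained GCN layers; only the remaining parameters (e.g. the
    adaptive readout) stay trainable and are handed to the optimiser."""
    for param in model.gcn_layers.parameters():   # hypothetical submodule with the GCN layers
        param.requires_grad = False               # weights stay at their pre-trained values
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)

# Fine-tuning loop sketch: only the unfrozen weights receive gradient updates.
# optimizer = prepare_for_finetuning(pretrained_model)
# for X, A, y in high_fidelity_loader:
#     optimizer.zero_grad()
#     loss = loss_fn(pretrained_model(X, A), y)
#     loss.backward()
#     optimizer.step()
```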

For drug discovery tasks, low-fidelity (LF) data consists of single-dose measurements (SD, performed at a single concentration) for a large collection of compounds. The high-fidelity (HF) data consists of dose-response (DR) measurements corresponding to multiple different concentrations that are available for a small collection of compounds (see Fig. 1, top). In the quantum mechanics experiments, we have opted for the recently released QMugs dataset with 657K unique drug-like molecules and 12 quantum properties. The data originating from semi-empirical GFN2-xTB simulations act as the low-fidelity task, and the high-fidelity component is obtained via density functional theory (DFT) calculations (ωB97X-D/def2-SVP). The resulting multi-fidelity datasets are defined as datasets where SMILES-encoded molecules are associated with two different measurements of different fidelity levels.

As modelling large-scale high-throughput screening data and transfer learning in this context are understudied applications, a significant effort was made to carefully select and filter suitable data from public (PubChem) and proprietary (AstraZeneca) sources, covering a multitude of different settings. To this end, we have assembled several multi-fidelity drug discovery datasets (Fig. 1, top) from PubChem, aiming to capture the heterogeneity intrinsic to large-scale screening campaigns, particularly in terms of assay types, screening technologies, concentrations, scoring metrics, protein targets, and scope. This has resulted in 23 multi-fidelity datasets (Supplementary Table 1) that are now part of the concurrently published MF-PCBA collection [29]. We have also curated 16 multi-fidelity datasets based on historical AstraZeneca (AZ) HTS data (Supplementary Table 2), the emphasis now being put on expanding the number of compounds in the primary (1 million+) and confirmatory screens (1,000 to 10,000). The search, selection, and filtering steps, along with the naming convention, are detailed in Supplementary Notes 5 and in ref. [29]. As the QMugs dataset contains a few erroneous calculations, we apply a filtering protocol similar to the drug discovery data and remove the values that diverge by more than 5 standard deviations, which removes just over 1% of the molecules present. The QMugs properties are listed in Supplementary Table 3. For the transductive setting, we selected a diverse and challenging set of 10K QMugs molecules (Supplementary Notes 5.1), which resembles the drug discovery setting.
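
The 5-standard-deviation filter described above can be expressed as a short per-column rule. The pandas sketch below assumes a hypothetical DataFrame with one column per QMugs property and is only meant to illustrate the filtering step, not to reproduce the authors' pipeline.

```python
import pandas as pd

def drop_outliers(df: pd.DataFrame, columns, n_std: float = 5.0) -> pd.DataFrame:
    """Remove rows whose value in any of the given columns deviates from the
    column mean by more than n_std standard deviations."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        deviation = (df[col] - df[col].mean()).abs()
        mask &= deviation <= n_std * df[col].std()
    return df[mask]

# Hypothetical usage: filtered = drop_outliers(qmugs_df, columns=["DFT_TOTAL_ENERGY"])
```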

While methods to artificially generate multi-fidelity data with desired fidelity correlations have recently been proposed [63], we did not pursue this direction, as remarkably large collections of real-world multi-fidelity data are available, covering a large range of fidelity correlations and diverse chemical spaces. Furthermore, the successful application of such techniques to molecular data is yet to be demonstrated.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Read more:
Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting - Nature.com


AI that designs and runs networks might not be far off – Light Reading

Artificial intelligence (AI) can already be unleashed to write sonnets in the style of Shakespeare or music that evokes Beethoven. But what if an AI could bypass the Open Systems Interconnection (OSI) model, the decades-old system for conceptualizing network design, and come up with a better air interface than anything a human has produced? If it sounds like pure science fiction, think again.

Deep within the darkest laboratories in Sweden, where Ericsson pioneers wireless research, work has already started on potentially taking AI to this next level in network design. "You could let the algorithm figure out a better way," said Erik Ekudden, Ericsson's chief technology officer, at a recent press event in London. Companies such as Qualcomm and Picocom, a small developer of silicon for small cells, are also thought to be exploring the possibilities. It all raises the prospect of network technologies designed entirely by machines, beyond the comprehension of the world's smartest scientists.

As it exists today, the OSI model imagines the network in seven layers, starting with the physical device links and moving all the way up to customer-facing applications. None of this is especially scientific, but it allows even the cleverest specialists to make sense of the whole shebang rather than just understanding their own contributions. "We've done that to make it intelligible to us," said Gabriel Brown, a principal analyst with Heavy Reading (a Light Reading sister company) on a recent Telecoms.com podcast. "But an AI-native thing doesn't have to have those limitations on it."

The technology could feasibly work by combining the advanced pattern-recognition principles of generative AI and large language models with the time series data found in a radio link. "You can start making relations between that," Brown told the Telecoms.com podcast. "You use the same technologies, the same computing ideas, to develop a much more efficient system."

Trusting your AI

Scrapping the OSI model, though, would inevitably conjure alarming thoughts of AI-created technologies that no person can understand, and subsequent Armageddon if the AI goes haywire. "If you apply new AI technologies to rebuild or build a new system, of course it would have to be not a black box but a very open box so that we can check what really goes on," said Ekudden, emphasizing the need for what he calls "trustworthy AI."

Keen to demonstrate a commitment to AI transparency, Ericsson this month added an "Explainable AI" feature to its latest software products. The basic idea is to show a telco how the AI-powered technology reached the conclusions it did. Ekudden, though, sounds unimpressed with broader government efforts in this trustworthiness area. "Current regulation, even after the UK summit, is not very helpful," he said.

Held in November, that summit featured Rishi Sunak, the UK's prime minister, in conversation with Elon Musk, naturally spotlighting generative AI and social media. But non-generative AI has already been used heavily to optimize networks, Ekudden points out. "We cannot go back."

Network designs that go far beyond what people have accomplished would be revolutionary, akin to an AI that fooled literary critics into thinking it were a human novelist with a fresh and unique style, or one that made other scientific breakthroughs. But Ericsson's CTO plays down any likelihood generative AI can produce something of major value.

"It depends on how good or bad a job we have done as humans, because the beauty of generative AI is that it really mimics humans very well," he said. With today's networks now optimized to a high level, humans are not even the best reference point for newer forms of AI. "Machines are already doing that better," explained Ekudden. "The kind of data-driven machine-learning capabilities that we have employed to build the best coding scheme, the best OSI stack, are pretty good. If you want some level of generative AI to outperform that, you really need to do a good job at generative AI."

He is not the only human doubting AI will have much impact anytime soon. "Broadly, these AI systems work by pattern recognition," said William Webb, the chief technology officer of Access Partnership, a consulting company, and a former director at UK regulatory body Ofcom. "They get trained on thousands to millions of examples which are already labelled as 'good' or 'bad.' They learn what patterns lead to 'good' and to 'bad' and can then influence future operation. But there isn't much labelled data so it's hard to understand what the AI would be trained on," he told Light Reading by email.

Webb is also dubious because the sheer quantity of network variables would require that a huge data set be used for training purposes. "There are good uses of AI in telecom networks, but it's not clear this is one of them," he said.

When machines give the orders

A far more realistic scenario in the next few years is that networks designed and installed by humans will be manageable without them. Much like carmakers, telecom players now refer to five levels of automation. Under definitions established by the TM Forum, a telecom standards group, Level 1 denotes "assisted operations and maintenance," while Level 5 is a "fully autonomous network."

Those may be technically possible in just a few years' time, according to Ekudden. But he sounds unconvinced they should be widely deployed, likening them to "robots on the streets" and self-driving cars outside controlled areas. "Unless you do that in a responsible way, so you are actually creating risks, I don't think it is a good idea to do it, and the same is true for networks," he said.

Nevertheless, Ericsson has already applied AI tools to automate parts of its managed services unit. Back in 2019, before that had been merged with other units to form the current cloud software and services business group, Peter Laurin, then Ericsson's managed services head, held AI responsible for some of the 8,000 job cuts at his unit in the previous year, more than a fifth of the former total.

Many big telcos have also been moving quickly to automate operations and technical activities. Shankar Arumugavelu, the chief information officer of Verizon, is already eyeing the transition to Level 4 capability described by the TM Forum as a "highly autonomous network" and he evidently believes technology is not the main barrier. "Today, some of the key decisions that are being made by humans are we comfortable letting that go and having the machine make that decision?" he said at a recent press briefing organized by the TM Forum. "I think that is the bridge we have to cross."

The transfer of decision-making responsibilities to AI would stoke obvious ethical concerns and threaten to make humans entirely redundant in this part of the telco business. But Arumugavelu envisages a set-up in which engineers act on the insights and recommendations of the AI. "Work goes to the people rather than people going to the work," he said. "This is the machine I am talking about that is sending and directing work to groups."

Headcount has fallen dramatically at Verizon and other large telcos in the last decade, as data-gathering by Light Reading has illustrated, although job cuts can be attributed in many cases to merger activity, the sale of assets and other, more mundane, efficiency measures. Yet Verizon has been able to grow annual sales by 2% since 2018, despite cutting more than 39,000 jobs or 27% of the total over that period.

A big question, though, is whether job cuts on the technology side will do much to boost profits. Scott Petty, the chief technology officer of Vodafone, thinks not. "That's not a massive driver of opex or costs in the organization," he said at the same TM Forum event, citing energy, leases and maintenance of software and equipment as much bigger expenses. "People is an important cost, but it is not the most important in the cost of a network."

Read more here:
AI that designs and runs networks might not be far off - Light Reading
