

Category Archives: Machine Learning

Automated Machine Learning is the Future of Data Science – Analytics Insight

As the fuel that powers their ongoing digital transformation efforts, organizations everywhere are searching for ways to derive as much insight as possible from their data. The accompanying increased demand for advanced predictive and prescriptive analytics has, in turn, prompted a call for more data scientists proficient with the latest artificial intelligence (AI) and machine learning (ML) tools.

However, such highly skilled data scientists are costly and hard to find. In fact, they're such a valuable asset that the phenomenon of the citizen data scientist has recently emerged to help close the skills gap. A complementary role rather than a direct replacement, citizen data scientists lack explicit advanced data science expertise, yet they are capable of producing models using best-in-class diagnostic and predictive analytics. This capability is due in no small part to the advent of accessible new technologies, such as automated machine learning (AutoML), that now automate many of the tasks once performed by data scientists.

The objective of AutoML is to shorten the cycle of trial and error and experimentation. It cycles through a large number of models, and the hyperparameters used to configure those models, to determine the best model for the data presented. This is a dull and tedious activity for any human data scientist, no matter how talented. AutoML platforms can perform this repetitive task more quickly and thoroughly, arriving at a solution faster and more effectively.
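To make that concrete, here is a minimal sketch, in Java to match the listing later on this page, of the kind of search loop an AutoML platform automates. The trainAndScore() method is a hypothetical stand-in for fitting one candidate model and scoring it on held-out data; real platforms search far larger spaces across many model families.

import java.util.Random;

public class AutoMLSearchSketch {

    // Hypothetical stand-in: train one candidate model with the given
    // hyperparameters and return its validation score (higher is better).
    static double trainAndScore(double learningRate, int hiddenUnits) {
        return -Math.abs(learningRate - 0.01) - Math.abs(hiddenUnits - 100) / 1000.0;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double bestScore = Double.NEGATIVE_INFINITY, bestLr = 0;
        int bestUnits = 0;
        for (int trial = 0; trial < 100; trial++) {
            // Sample one candidate configuration from the search space
            double lr = Math.pow(10, -4 + 3 * rng.nextDouble()); // 1e-4 to 1e-1
            int units = 16 + rng.nextInt(241);                   // 16 to 256
            double score = trainAndScore(lr, units);
            if (score > bestScore) {
                bestScore = score;
                bestLr = lr;
                bestUnits = units;
            }
        }
        System.out.printf("best lr=%.5f, hidden units=%d, score=%.4f%n", bestLr, bestUnits, bestScore);
    }
}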

The ultimate value of AutoML tools is not to supplant data scientists but to offload their routine work and streamline their process, freeing them and their teams to concentrate their energy and attention on the parts of the process that require a higher level of reasoning and creativity. As their role changes, it is important for data scientists to understand the full life cycle so they can shift their energy to higher-value tasks and sharpen the skills that further raise their value to their companies.

At Airbnb, the team continually looks for ways to improve its data science workflow. A good number of its data science projects involve machine learning, and many pieces of this workflow are tedious. Airbnb uses machine learning to build customer lifetime value (LTV) models for guests and hosts. These models allow the company to improve its decision making and its interactions with the community.

The team has likewise found AutoML tools most valuable for regression and classification problems involving tabular datasets, though the state of this area is rapidly advancing. In summary, it is believed that in certain cases AutoML can immensely increase a data scientist's productivity, often by an order of magnitude. Airbnb has used AutoML in several ways:

- Unbiased presentation of challenger models: AutoML can rapidly produce a plethora of challenger models using the same training set as your incumbent model, helping the data scientist pick the best model family.
- Identifying target leakage: because AutoML builds candidate models extremely fast in an automated way, data leakage can be spotted earlier in the modeling lifecycle.
- Diagnostics: as mentioned earlier, canonical diagnostics such as learning curves, partial dependence plots, and feature importances can be generated automatically.
- Task automation: exploratory data analysis, data pre-processing, hyper-parameter tuning, model selection, and putting models into production can all be automated to some degree with an automated machine learning system.

Companies have moved toward enhancing predictive power by coupling big data with complex automated machine learning. AutoML, which uses machine learning to create better ML, is advertised as offering opportunities to democratise machine learning by allowing firms with limited data science expertise to build analytical pipelines capable of solving sophisticated business problems.

Comprising a set of algorithms that automate the writing of other ML algorithms, AutoML automates the end-to-end process of applying ML to real-world problems. By way of illustration, a standard ML pipeline consists of the following: data pre-processing, feature extraction, feature selection, feature engineering, algorithm selection, and hyper-parameter tuning. The significant skill and time required to execute these steps mean there is a high barrier to entry.
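As a rough illustration of those stages, the sketch below chains two toy stages (mean-centering as pre-processing and a product term as feature engineering) into one pipeline function. The stage implementations are illustrative assumptions only; an AutoML system automates the choice and configuration of each stage, plus the algorithm selection and tuning on top.

import java.util.Arrays;
import java.util.function.Function;

public class PipelineSketch {
    public static void main(String[] args) {
        // Pre-processing: center each column to zero mean (simplified)
        Function<double[][], double[][]> preprocess = X -> {
            double[][] out = new double[X.length][];
            for (int i = 0; i < X.length; i++) out[i] = X[i].clone();
            for (int j = 0; j < X[0].length; j++) {
                double mean = 0;
                for (double[] row : X) mean += row[j] / X.length;
                for (double[] row : out) row[j] -= mean;
            }
            return out;
        };
        // Feature engineering: append a pairwise product term (illustrative)
        Function<double[][], double[][]> engineer = X -> {
            double[][] out = new double[X.length][X[0].length + 1];
            for (int i = 0; i < X.length; i++) {
                System.arraycopy(X[i], 0, out[i], 0, X[i].length);
                out[i][X[i].length] = X[i][0] * X[i][1];
            }
            return out;
        };
        Function<double[][], double[][]> pipeline = preprocess.andThen(engineer);
        double[][] features = pipeline.apply(new double[][]{{1, 2}, {3, 4}});
        System.out.println(Arrays.deepToString(features));
        // Algorithm selection and hyper-parameter tuning would then search over
        // candidate models trained on these features (see the sketch above)
    }
}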

In an article published on Forbes, Ryohei Fujimaki, the founder and CEO of dotData, argues that the discussion is misplaced if the emphasis of AutoML systems is on supplanting or diminishing the role of the data scientist. After all, the longest and most challenging part of a typical data science workflow revolves around feature engineering: connecting data sources against a list of desired features that are evaluated against different machine learning algorithms.

Success with feature engineering requires a high level of domain expertise to identify the ideal features through a tedious, iterative process. Automation on this front lets even citizen data scientists build streamlined use cases by applying their domain expertise. In short, this democratization of the data science process opens the door to new classes of developers, offering organizations a competitive advantage with minimal investment.

Here is the original post:
Automated Machine Learning is the Future of Data Science - Analytics Insight


Google's AutoML Zero lets the machines create algorithms to avoid human bias – The Next Web

It looks like Google's working on some major upgrades to its autonomous machine learning development language AutoML. According to a pre-print research paper authored by several of the big G's AI researchers, AutoML Zero is coming, and it's bringing evolutionary algorithms with it.

AutoML is a tool from Google that automates the process of developing machine learning algorithms for various tasks. It's user-friendly, fairly simple to use, and completely open-source. Best of all, Google's always updating it.

In its current iteration, AutoML has a few drawbacks. You still have to manually create and tune several algorithms to act as building blocks for the machine to get started. This allows it to take your work and experiment with new parameters in an effort to optimize what you've done. Novices can get around this problem by using pre-made algorithm packages, but Google's working to automate this part too.

Per the Google team's pre-print paper:

It is possible today to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.

Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging.

Moreover, evolution adapts algorithms to different task types: e.g., dropout-like techniques appear when little data is available.

In other words: Google's figured out how to tap evolutionary algorithms for AutoML using nothing but basic math concepts. The developers created a learning paradigm in which the machine will spit out 100 randomly generated algorithms and then work to see which ones perform the best.

After several generations, the algorithms become better and better until the machine finds one that performs well enough to evolve. In order to generate novel algorithms that can solve new problems, the ones that survive the evolutionary process are tested against various standard AI problems, such as computer vision.
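The loop below is a toy sketch of that evolutionary process: generate 100 random candidates, score them, keep the fittest half, and refill the population with mutated copies. The "genome" here is just a vector of numbers with a stand-in fitness function; AutoML Zero evolves actual programs built from basic mathematical operations and evaluates them on real tasks such as CIFAR-10 variants.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class EvolutionSketch {
    static final Random RNG = new Random(7);

    // Stand-in evaluation: fitness peaks when all coefficients reach 1.0
    static double fitness(double[] genome) {
        double f = 0;
        for (double g : genome) f -= (g - 1.0) * (g - 1.0);
        return f;
    }

    // Mutation: nudge one randomly chosen coefficient
    static double[] mutate(double[] parent) {
        double[] child = parent.clone();
        child[RNG.nextInt(child.length)] += RNG.nextGaussian() * 0.1;
        return child;
    }

    public static void main(String[] args) {
        // Start with 100 randomly generated candidates, as in the article
        List<double[]> population = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            double[] genome = new double[8];
            for (int j = 0; j < genome.length; j++) genome[j] = RNG.nextGaussian();
            population.add(genome);
        }
        Comparator<double[]> byFitness =
                Comparator.comparingDouble(EvolutionSketch::fitness).reversed();
        for (int gen = 0; gen < 200; gen++) {
            population.sort(byFitness);
            // Keep the best 50, refill the rest with mutated copies of survivors
            for (int i = 50; i < 100; i++) {
                population.set(i, mutate(population.get(RNG.nextInt(50))));
            }
        }
        population.sort(byFitness);
        System.out.println("best fitness after evolution: " + fitness(population.get(0)));
    }
}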


Perhaps the most interesting byproduct of Google's quest to completely automate the act of generating algorithms and neural networks is the removal of human bias from our AI systems. Without us there to determine what the best starting point for development is, the machines are free to find things we'd never think of.

According to the researchers, AutoML Zero already outperforms its predecessor and similar state-of-the-art machine learning-generation tools. Future research will involve setting a more narrow scope for the AI and seeing how well it performs in more specific situations using a hybrid approach that creates algorithms with a combination of Zero's self-discovery techniques and human-curated starter libraries.


See more here:
Google's AutoML Zero lets the machines create algorithms to avoid human bias - The Next Web


Nothing to hide? Then add these to your ML repo, Papers with Code says DEVCLASS – DevClass

In a bid to make advancements in machine learning more reproducible, ML resource and Facebook AI Research (FAIR) appendage Papers With Code has introduced a code completeness checklist for machine learning papers.

It is based on the best practices the Papers with Code team has seen in popular research repositories and on the Machine Learning Reproducibility Checklist, which Joelle Pineau, FAIR Managing Director, introduced in 2019, as well as some additional work Pineau and other researchers have done since then.

Papers with Code was started in 2018 as a hub for newly published machine learning papers that come with source code, offering researchers an easy-to-monitor platform to keep up with the current state of the art. In late 2019 it became part of FAIR "to further accelerate our growth", as founders Robert Stojnic and Ross Taylor put it back then.

As part of FAIR, the project will get a bit of a visibility push since the new checklist will also be used in the submission process for the 2020 edition of the popular NeurIPS conference on neural information processing systems.

The ML code completeness checklist is used to assess code repositories based on the scripts and artefacts provided within them, with the aim of enhancing reproducibility and enabling others to build more easily upon published work. It includes checks for dependencies (so that those looking to replicate a paper's results have some idea of what is needed to succeed), training and evaluation scripts, pre-trained models, and results.
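In the same spirit, a crude checker for those five items might look like the sketch below. The file and directory names are illustrative guesses, not the checklist's actual criteria; the real checklist assesses a repository's content rather than fixed paths.

import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public class RepoChecklist {
    public static void main(String[] args) {
        File repo = new File(args.length > 0 ? args[0] : ".");
        // Map each checklist item to file names that would plausibly satisfy it
        Map<String, String[]> checks = new LinkedHashMap<>();
        checks.put("dependencies", new String[]{"requirements.txt", "environment.yml", "Dockerfile"});
        checks.put("training script", new String[]{"train.py", "train.sh"});
        checks.put("evaluation script", new String[]{"eval.py", "evaluate.py"});
        checks.put("pre-trained models", new String[]{"checkpoints", "models"});
        checks.put("results", new String[]{"README.md", "results.md"});

        int score = 0;
        for (Map.Entry<String, String[]> check : checks.entrySet()) {
            boolean found = false;
            for (String name : check.getValue()) {
                if (new File(repo, name).exists()) { found = true; break; }
            }
            System.out.println((found ? "[x] " : "[ ] ") + check.getKey());
            if (found) score++;
        }
        System.out.println("Checklist score: " + score + "/5");
    }
}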

While all of these seem like useful things to have, Papers with Code also took a somewhat scientific approach to verify that they really are indicators of a useful repository: the team looked for correlations between the number of fulfilled checklist items and the star rating of a repository.

Their analysis showed that repositories hitting all the marks got higher ratings, implying that the checklist score is indicative of higher-quality submissions, which should encourage researchers to comply in order to produce useful resources. However, they simultaneously admitted that marketing and the state of documentation might also play into a repo's popularity.

They nevertheless went on to recommend laying out the five elements mentioned and linking to external resources, which is always a good idea. Additional tips for publishing research code can be found in the project's GitHub repository or the report on the NeurIPS reproducibility program.

More:
Nothing to hide? Then add these to your ML repo, Papers with Code says DEVCLASS - DevClass


Covid-19 Detection With Images Analysis And Machine Learning – Elemental

// Reconstructed from the article's flattened listing. This assumes the DataVec,
// Deeplearning4j, ND4J, and SLF4J imports (org.datavec.*, org.deeplearning4j.*,
// org.nd4j.*, org.slf4j.*, java.io.File, java.util.Random) are in place. The
// height, width, channels, batchSize, and seed values were not part of the
// excerpt and are illustrative assumptions.
public class Covid19Classification {

    private static final Logger log = LoggerFactory.getLogger(Covid19Classification.class);

    public static void main(String[] args) throws Exception {
        // Assumed setup (the article's earlier lines were not included)
        String DATA_PATH = System.getProperty("java.io.tmpdir");
        int height = 64, width = 64, channels = 1, batchSize = 16, rngseed = 123;
        Random randNumGen = new Random(rngseed);

        // We have just two outputs, positive and negative, matching our directories
        int outputNum = 2;
        int numEpochs = 1;

        /* downloadData() downloads the data and stores it in Java's tmpdir:
           a 15MB compressed download taking 158MB of space when uncompressed.
           The data can also be downloaded manually. */

        // Define the file paths
        File trainData = new File(DATA_PATH + "/covid-19/training");
        File testData = new File(DATA_PATH + "/covid-19/testing");

        // Define the FileSplit(PATH, ALLOWED FORMATS, random)
        FileSplit train = new FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
        FileSplit test = new FileSplit(testData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);

        // Extract the parent path as the image label
        ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
        ImageRecordReader recordReader = new ImageRecordReader(height, width, channels, labelMaker);

        // Initialize the record reader; a listener can be added to extract the name
        recordReader.initialize(train);
        // recordReader.setListeners(new LogRecordListener());

        // DataSet iterator
        DataSetIterator dataIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, outputNum);

        // Scale pixel values to 0-1
        DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
        scaler.fit(dataIter);
        dataIter.setPreProcessor(scaler);

        // Build our neural network
        log.info("BUILD MODEL");
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(rngseed)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Nesterovs(0.006, 0.9))
                .l2(1e-4)
                .list()
                .layer(0, new DenseLayer.Builder().nIn(height * width).nOut(100)
                        .activation(Activation.RELU).weightInit(WeightInit.XAVIER).build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(100).nOut(outputNum)
                        .activation(Activation.SOFTMAX).weightInit(WeightInit.XAVIER).build())
                .setInputType(InputType.convolutional(height, width, channels))
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();

        // The ScoreIterationListener logs output every 10 iterations to show
        // how well the network is training
        model.setListeners(new ScoreIterationListener(10));

        log.info("TRAIN MODEL");
        for (int i = 0; i < numEpochs; i++) {
            model.fit(dataIter);
        }

        log.info("EVALUATE MODEL");
        recordReader.reset();

        // The model trained on the training split; now evaluate it against
        // test images the network has not seen
        recordReader.initialize(test);
        DataSetIterator testIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, outputNum);
        scaler.fit(testIter);
        testIter.setPreProcessor(scaler);

        /* Log the order of the labels for later use. In previous versions the
           label order was consistent but random; in current versions it is
           lexicographic, so preserving the RecordReader label order is no
           longer needed and is left in for demonstration purposes. */
        log.info(recordReader.getLabels().toString());

        // Create an Evaluation object with 2 possible classes
        Evaluation eval = new Evaluation(outputNum);

        // Evaluate the network
        while (testIter.hasNext()) {
            DataSet next = testIter.next();
            INDArray output = model.output(next.getFeatures());
            // Compare the model's output with the labels from the RecordReader
            eval.eval(next.getLabels(), output);
        }
        // Show the evaluation
        log.info(eval.stats());
    }
}

More here:
Covid-19 Detection With Images Analysis And Machine Learning - Elemental


Department Of Energy Announces $30 Million For Advanced AI & ML-Based Researches – Analytics India Magazine

The Department of Energy in the US has recently announced its initiative to provide up to $30 million for advanced research in artificial intelligence and machine learning. This fund can be used for both scientific investigation and the management of complex systems.

This initiative comprises a two-fold strategy.

Firstly, it focuses on the development of artificial intelligence and machine learning for predictive modelling and simulation across the physical sciences; ML and AI are considered to offer promising new alternatives to conventional programming methods for computer modelling and simulation. Secondly, the fund will be used for essential ML and AI research for decision support in addressing complex systems.

Eventually, potential applications could include cybersecurity, power grid resilience, and other complex processes where these emerging technologies can make, or aid in making, business decisions in real time.

When asked, Under Secretary for Science Paul Dabbar stated that both of these technologies, artificial intelligence and machine learning, are "among the most powerful tools we have today for both advancing scientific knowledge and managing our increasingly complex technological environment."

He further said, "This foundational research will help keep the United States in the forefront as applications for ML and AI rapidly expand, and as we utilise this evolving technology to solve the world's toughest challenges such as COVID-19."

Applications for this initiative will be open to DOE national laboratories, universities, nonprofits, and industry, and funding will be awarded on the basis of peer review.

According to DOE, the planned funding for the scientific machine learning for modelling and simulations topic will be up to $10 million in FY 2020 dollars for projects of two years in duration. On the other hand, the planned funding for the artificial intelligence and decision support for complex systems topic will be up to $20 million, with up to $7 million in FY 2020 dollars and out-year funding contingent on congressional appropriations.


More here:
Department Of Energy Announces $30 Million For Advanced AI & ML-Based Researches - Analytics India Magazine


How Microsoft Teams will use AI to filter out typing, barking, and other noise from video calls – VentureBeat

Last month, Microsoft announced that Teams, its competitor to Slack, Facebook's Workplace, and Google's Hangouts Chat, had passed 44 million daily active users. The milestone overshadowed its unveiling of a few new features coming later this year. Most were straightforward: a hand-raising feature to indicate you have something to say, offline and low-bandwidth support to read chat messages and write responses even if you have poor or no internet connection, and an option to pop chats out into a separate window. But one feature, real-time noise suppression, stood out: Microsoft demoed how the AI minimized distracting background noise during a call.

We've all been there. How many times have you asked someone to mute themselves or to relocate from a noisy area? Real-time noise suppression will filter out someone typing on their keyboard while in a meeting, the rustling of a bag of chips, and a vacuum cleaner running in the background. AI will remove the background noise in real time so you can hear only speech on the call. But how exactly does it work? We talked to Robert Aichner, Microsoft Teams group program manager, to find out.

The use of collaboration and video conferencing tools is exploding as the coronavirus crisis forces millions to learn and work from home. Microsoft is pushing Teams as the solution for businesses and consumers as part of its Microsoft 365 subscription suite. The company is leaning on its machine learning expertise to ensure AI features are one of its big differentiators. When it finally arrives, real-time background noise suppression will be a boon for businesses and households full of distracting noises. Additionally, how Microsoft built the feature is also instructive to other companies tapping machine learning.

Of course, noise suppression has existed in the Microsoft Teams, Skype, and Skype for Business apps for years. Other communication tools and video conferencing apps have some form of noise suppression as well. But that noise suppression covers stationary noise, such as a computer fan or air conditioner running in the background. The traditional noise suppression method is to look for speech pauses, estimate the baseline of noise, assume that the continuous background noise doesn't change over time, and filter it out.
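A bare-bones version of that traditional method might look like the following sketch: average the magnitude spectra of frames assumed to be speech pauses to estimate a noise floor, then subtract that floor from every frame. Real implementations work on FFT frames with smoothing and voice-activity detection; this toy version only illustrates the key assumption, a noise floor that never changes, which is exactly what fails for non-stationary noise.

public class SpectralSubtractionSketch {
    // magnitudes: per-frame magnitude spectra; noiseFrames: leading frames
    // assumed to contain only background noise (a detected speech pause)
    static double[][] suppress(double[][] magnitudes, int noiseFrames) {
        int bins = magnitudes[0].length;
        double[] noiseFloor = new double[bins];
        // Estimate the noise baseline by averaging the noise-only frames,
        // assuming the background noise does not change over time
        for (int f = 0; f < noiseFrames; f++)
            for (int b = 0; b < bins; b++)
                noiseFloor[b] += magnitudes[f][b] / noiseFrames;
        double[][] clean = new double[magnitudes.length][bins];
        for (int f = 0; f < magnitudes.length; f++)
            for (int b = 0; b < bins; b++)
                clean[f][b] = Math.max(0, magnitudes[f][b] - noiseFloor[b]);
        return clean;
    }

    public static void main(String[] args) {
        double[][] frames = {
            {0.2, 0.2}, {0.2, 0.2}, // noise-only frames (the "speech pause")
            {1.0, 0.5}, {0.9, 0.6}  // frames containing speech plus noise
        };
        System.out.println(java.util.Arrays.deepToString(suppress(frames, 2)));
    }
}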

Going forward, Microsoft Teams will suppress non-stationary noises like a dog barking or somebody shutting a door. "That is not stationary," Aichner explained. "You cannot estimate that in speech pauses. What machine learning now allows you to do is to create this big training set, with a lot of representative noises."

In fact, Microsoft open-sourced its training set earlier this year on GitHub to advance the research community in that field. While the first version is publicly available, Microsoft is actively working on extending the data sets. A company spokesperson confirmed that as part of the real-time noise suppression feature, certain categories of noises in the data sets will not be filtered out on calls, including musical instruments, laughter, and singing. (More on that here: ProBeat: Microsoft Teams video calls and the ethics of invisible AI.)

Microsoft can't simply isolate the sound of human voices because other noises also happen at the same frequencies. On a spectrogram of a speech signal, unwanted noise appears in the gaps between speech and overlapping with the speech. It's thus next to impossible to filter out the noise: if your speech and noise overlap, you can't distinguish the two. Instead, you need to train a neural network beforehand on what noise looks like and what speech looks like.

To get his point across, Aichner compared machine learning models for noise suppression to machine learning models for speech recognition. For speech recognition, you need to record a large corpus of users talking into the microphone and then have humans label that speech data by writing down what was said. Instead of mapping microphone input to written words, in noise suppression you're trying to get from noisy speech to clean speech.

"We train a model to understand the difference between noise and speech, and then the model is trying to just keep the speech," Aichner said. "We have training data sets. We took thousands of diverse speakers and more than 100 noise types. And then what we do is we mix the clean speech without noise with the noise. So we simulate a microphone signal. And then you also give the model the clean speech as the ground truth. So you're asking the model, 'From this noisy data, please extract this clean signal, and this is how it should look like.' That's how you train neural networks [in] supervised learning, where you basically have some ground truth."

For speech recognition, the ground truth is what was said into the microphone. For real-time noise suppression, the ground truth is the speech without noise. By feeding a large enough data set (in this case, hundreds of hours of data) Microsoft can effectively train its model. "It's able to generalize and reduce the noise with my voice even though my voice wasn't part of the training data," Aichner said. "In real time, when I speak, there is noise that the model would be able to extract the clean speech [from] and just send that to the remote person."

Comparing the functionality to speech recognition makes noise suppression sound much more achievable, even though it's happening in real time. So why has it not been done before? Can Microsoft's competitors quickly recreate it? Aichner listed challenges for building real-time noise suppression, including finding representative data sets, building and shrinking the model, and leveraging machine learning expertise.

We already touched on the first challenge: representative data sets. The team spent a lot of time figuring out how to produce sound files that exemplify what happens on a typical call.

They used audiobooks to represent male and female voices, since speech characteristics differ between the two. They used YouTube data sets with labeled data specifying that a recording includes, say, typing or music. Aichner's team then combined the speech data and noise data using a synthesizer script at different signal-to-noise ratios. By amplifying the noise, they could imitate different realistic situations that can happen on a call.
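The mixing step itself is standard signal processing. The sketch below is an assumption about the approach, not Microsoft's actual synthesizer script: it scales a noise recording so that mixing it with clean speech yields a chosen signal-to-noise ratio, producing a noisy input paired with its clean ground truth.

public class MixAtSnrSketch {
    static double[] mix(double[] speech, double[] noise, double snrDb) {
        double speechPow = 0, noisePow = 0;
        for (double s : speech) speechPow += s * s;
        for (double n : noise) noisePow += n * n;
        // Scale the noise so that 10*log10(speechPow / scaledNoisePow) == snrDb
        double scale = Math.sqrt(speechPow / (noisePow * Math.pow(10, snrDb / 10)));
        double[] noisy = new double[speech.length];
        for (int i = 0; i < speech.length; i++)
            noisy[i] = speech[i] + scale * noise[i % noise.length];
        return noisy; // model input; the untouched speech array is the ground truth
    }

    public static void main(String[] args) {
        double[] speech = {0.5, -0.4, 0.3, -0.2};
        double[] noise = {0.1, -0.1, 0.1, -0.1};
        System.out.println(java.util.Arrays.toString(mix(speech, noise, 10.0)));
    }
}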

But audiobooks are drastically different from conference calls. Wouldn't that affect the model, and thus the noise suppression?

"That is a good point," Aichner conceded. "Our team did make some recordings as well to make sure that we are not just training on synthetic data we generate ourselves, but that it also works on actual data. But it's definitely harder to get those real recordings."

Aichner's team is not allowed to look at any customer data. Additionally, Microsoft has strict privacy guidelines internally. "I can't just simply say, 'Now I record every meeting.'"

So the team couldn't use Microsoft Teams calls. Even if they could (say, if some Microsoft employees opted in to have their meetings recorded), someone would still have to mark down when exactly distracting noises occurred.

"And so that's why we right now have some smaller-scale effort of making sure that we collect some of these real recordings with a variety of devices and speakers and so on," said Aichner. "What we then do is we make that part of the test set. So we have a test set which we believe is even more representative of real meetings. And then, we see if we use a certain training set, how well does that do on the test set? So ideally yes, I would love to have a training set which is all Teams recordings and has all types of noises people are listening to. It's just that I can't easily get the same number, the same volume of data that I can by grabbing some other open source data set."

I pushed the point once more: How would an opt-in program to record Microsoft employees using Teams impact the feature?

"You could argue that it gets better," Aichner said. "If you have more representative data, it could get even better. So I think that's a good idea to potentially in the future see if we can improve even further. But I think what we are seeing so far is even with just taking public data, it works really well."

The next challenge is to figure out how to build the neural network, what the model architecture should be, and iterate. The machine learning model went through a lot of tuning. That required a lot of compute. Aichner's team was of course relying on Azure, using many GPUs. Even with all that compute, however, training a large model with a large data set could take multiple days.

"A lot of the machine learning happens in the cloud," Aichner said. "So, for speech recognition for example, you speak into the microphone, that's sent to the cloud. The cloud has huge compute, and then you run these large models to recognize your speech. For us, since it's real-time communication, I need to process every frame. Let's say it's 10 or 20 millisecond frames. I need to now process that within that time, so that I can send that immediately to you. I can't send it to the cloud, wait for some noise suppression, and send it back."

For speech recognition, leveraging the cloud may make sense. For real-time noise suppression, it's a nonstarter. Once you have the machine learning model, you then have to shrink it to fit on the client. You need to be able to run it on a typical phone or computer. A machine learning model only for people with high-end machines is useless.
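The arithmetic behind that constraint is worth making explicit. In the toy sketch below, a 20 ms frame at 16 kHz is 320 samples, and whatever denoiseFrame() does (here a placeholder for on-device model inference) must finish well inside those 20 ms on an ordinary CPU or the call falls behind; the sample rate and frame size are illustrative assumptions.

public class FrameBudgetSketch {
    static final int SAMPLE_RATE = 16_000;             // 16 kHz mono audio
    static final int FRAME_SAMPLES = SAMPLE_RATE / 50; // 20 ms => 320 samples

    // Placeholder for running the shrunken on-device noise suppression model
    static float[] denoiseFrame(float[] frame) {
        return frame;
    }

    public static void main(String[] args) {
        float[] frame = new float[FRAME_SAMPLES];
        long start = System.nanoTime();
        denoiseFrame(frame);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        // The whole per-frame pipeline must stay well under the 20,000
        // microsecond frame duration to avoid adding latency to the call
        System.out.println("frame processed in " + elapsedMicros
                + " microseconds (budget: 20000)");
    }
}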

There's another reason why the machine learning model should live on the edge rather than the cloud. Microsoft wants to limit server use. Sometimes, there isn't even a server in the equation to begin with. For one-to-one calls in Microsoft Teams, the call setup goes through a server, but the actual audio and video signal packets are sent directly between the two participants. For group calls or scheduled meetings, there is a server in the picture, but Microsoft minimizes the load on that server. Doing a lot of server processing for each call increases costs, and every additional network hop adds latency. It's more efficient from a cost and latency perspective to do the processing on the edge.

"You want to make sure that you push as much of the compute to the endpoint of the user, because there isn't really any cost involved in that. You already have your laptop or your PC or your mobile phone, so now let's do some additional processing. As long as you're not overloading the CPU, that should be fine," Aichner said.

I pointed out there is a cost, especially on devices that aren't plugged in: battery life. "Yeah, battery life, we are obviously paying attention to that too," he said. "We don't want you now to have much lower battery life just because we added some noise suppression. That's definitely another requirement we have when we are shipping. We need to make sure that we are not regressing there."

It's not just regression that the team has to consider, but progression in the future as well. Because we're talking about a machine learning model, the work never ends.

"We are trying to build something which is flexible in the future, because we are not going to stop investing in noise suppression after we release the first feature," Aichner said. "We want to make it better and better. Maybe for some noise tests we are not doing as good as we should. We definitely want to have the ability to improve that. The Teams client will be able to download new models and improve the quality over time whenever we think we have something better."

The model itself will clock in at a few megabytes, but it won't affect the size of the client itself. He said, "That's also another requirement we have. When users download the app on the phone or on the desktop or laptop, you want to minimize the download size. You want to help the people get going as fast as possible."

"Adding megabytes to that download just for some model isn't going to fly," Aichner said. "After you install Microsoft Teams, later in the background it will download that model. That's what also allows us to be flexible in the future, so that we could do even more, have different models."

All the above requires one final component: talent.

"You also need to have the machine learning expertise to know what you want to do with that data," Aichner said. "That's why we created this machine learning team in this intelligent communications group. You need experts to know what they should do with that data. What are the right models? Deep learning has a very broad meaning. There are many different types of models you can create. We have several centers around the world in Microsoft Research, and we have a lot of audio experts there too. We are working very closely with them because they have a lot of expertise in this deep learning space."

The data is open source and can be improved upon. A lot of compute is required, but any company can simply leverage a public cloud, including the leaders Amazon Web Services, Microsoft Azure, and Google Cloud. So if another company with a video chat tool had the right machine learners, could they pull this off?

"The answer is probably yes, similar to how several companies are getting speech recognition," Aichner said. "They have a speech recognizer where there's also lots of data involved. There's also lots of expertise needed to build a model. So the large companies are doing that."

Aichner believes Microsoft still has a heavy advantage because of its scale. "I think that the value is the data," he said. "What we want to do in the future is, like what you said, have a program where Microsoft employees can give us more than enough real Teams calls so that we have an even better analysis of what our customers are really doing, what problems they are facing, and customize it more towards that."

Read the original post:
How Microsoft Teams will use AI to filter out typing, barking, and other noise from video calls - VentureBeat
