3.1.5 Introduction to Optimization

Part 1: Introduction to Optimization

Optimization is about finding the “best” solution for a given problem under a set of constraints. Formally, an optimization problem consists of decision variables (quantities we can adjust), an objective function (a mathematical measure of performance to maximize or minimize), and possibly constraints (restrictions on allowable solutions). The goal is to choose the decision variables so that the objective function is optimized (maximized or minimized) while satisfying all constraints. In other words, one seeks the optimal solution from all feasible solutions that meet the problem’s requirements. For example, training a deep neural network to minimize a loss function is an optimization problem (often non-convex), whereas a linear programming model to maximize profit under resource limits is another example of an optimization problem.

Optimal solutions and convexity: In optimization, a solution that yields the most extreme objective value (minimum for a minimization problem or maximum for a maximization problem) is called a global optimum. However, many complex problems have multiple local optima – solutions that are better than their immediate neighbors but not the best overall. A central concept in optimization theory is convexity. If the objective function and feasible region are convex, the problem is convex; this property guarantees that any local optimum is also a global optimum. Convex optimization problems (e.g. least-squares regression, linear programming) have nice properties that allow efficient algorithms to find the true optimum and avoid spurious local minima. On the other hand, non-convex problems (common in machine learning and combinatorial optimization) can have many local minima and are generally much harder – finding the global optimum in a non-convex problem is NP-hard in the worst case. This means algorithms may get “trapped” in a suboptimal solution.
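
To make the convexity condition precise: a function $f$ is convex if the line segment between any two points on its graph never dips below the graph, i.e.

$$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) \quad \text{for all } x, y \text{ and all } \lambda \in [0, 1].$$

For example, $f(x) = x^2$ satisfies this everywhere, so its single stationary point at $x = 0$ is guaranteed to be the global minimum, whereas a function like $f(x) = x^4 - x^2$ violates the inequality and indeed has two separate local minima with a local maximum between them.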



Example of a convex function (left) vs. a non-convex function (right). The convex function has a single global minimum (red point), whereas the non-convex function exhibits two local minima (red) and one local maximum (orange).

Parts of an optimization formulation: To formulate a problem for optimization, one defines the decision variables, the objective function, and any constraints. For instance, consider a resource allocation task: the decision variables could be amounts of resources to allocate to each task, the objective function could be the total cost (to minimize) or total profit (to maximize), and constraints might include resource capacity limits or task requirements. The feasible region is the set of all variable assignments that satisfy the constraints. Optimization algorithms then search this feasible region for the optimal solution. In practice, if an exact optimum is too costly to find, one may seek a “good enough” solution within limited time. This trade-off leads to a rich variety of optimization methods, from exact solvers for simple structured problems to heuristic or approximate algorithms for complex ones.
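
As a tiny worked formulation (with made-up numbers), suppose a factory makes two products in quantities $x_1$ and $x_2$, earning profits of 3 and 5 per unit, subject to three capacity limits:

$$\begin{aligned} \text{maximize} \quad & 3x_1 + 5x_2 \\ \text{subject to} \quad & x_1 \le 4, \quad 2x_2 \le 12, \quad 3x_1 + 2x_2 \le 18, \\ & x_1, x_2 \ge 0. \end{aligned}$$

Here $x_1$ and $x_2$ are the decision variables, $3x_1 + 5x_2$ is the objective (total profit), and the inequalities are the constraints (capacity limits). The feasible region is the set of $(x_1, x_2)$ pairs satisfying all of them; Part 2 shows how problems of exactly this form are solved.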

Part 2: Core Optimization Techniques

Optimization techniques can be broadly categorized based on the problem structure (e.g. linear vs non-linear) and the method of search (e.g. gradient-based vs evolutionary). Here we introduce a few core techniques: linear programming as a fundamental optimization approach, gradient-based methods which are the workhorses of continuous optimization, and evolutionary algorithms which are powerful for global and heuristic search.

Linear Programming (LP)

Linear programming is a classical optimization technique where the objective function and all constraints are linear (affine) functions of the decision variables. A generic linear program can be written as: minimize (or maximize) c·x subject to A·x ≤ b (linear constraints) and possibly x ≥ 0. Despite the simplicity of linear expressions, LPs are extremely useful in practice for problems in logistics, manufacturing, finance, and more. One key property of LPs is that the feasible region is a convex polyhedron and a linear objective is both convex and concave, so the problem is convex – which guarantees that any local optimum on that polyhedron is also a global optimum.

Linear programs can be solved efficiently using well-developed algorithms. The Simplex algorithm, invented by George Dantzig, traverses the vertices of the feasible polyhedron to find the optimum. In practice Simplex is very fast for typical problems, though its worst-case complexity is exponential. Alternatively, interior-point methods move through the interior of the feasible region rather than along its boundary and have polynomial worst-case complexity. The bottom line is that even very large LP instances (with millions of variables and constraints) can often be solved to optimality with modern solvers, which is why linear programming is a cornerstone of operations research. In fact, if an optimization problem can be formulated as a linear program (or transformed into one), we can usually solve it exactly. For example, finding the optimal production mix in a factory subject to resource constraints is a classic LP: if such a program has an optimal solution (i.e., it is neither infeasible nor unbounded), these algorithms will find the profit-maximizing plan efficiently.
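
In code, such a model can be handed directly to an off-the-shelf solver. A minimal sketch using SciPy's linprog, solving the small production-mix example formulated in Part 1 (the numbers are illustrative, not real data):

```python
# Minimal sketch: solve the Part 1 production-mix LP with SciPy's linprog.
from scipy.optimize import linprog

# linprog minimizes, so negate the profit coefficients to maximize 3*x1 + 5*x2.
c = [-3, -5]

# Inequality constraints in the form A_ub @ x <= b_ub.
A_ub = [[1, 0],   # x1 <= 4          (capacity limit 1)
        [0, 2],   # 2*x2 <= 12       (capacity limit 2)
        [3, 2]]   # 3*x1 + 2*x2 <= 18 (shared capacity)
b_ub = [4, 12, 18]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")

print(res.x)     # optimal production quantities, approximately [2. 6.]
print(-res.fun)  # maximal profit, approximately 36.0
```

The solver returns the exact optimum of the LP in milliseconds; real production-planning models differ only in having far more variables and constraints.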

Key point: Linear programming is powerful but limited to problems with linear structure. Many real-world problems are nonlinear or involve discrete choices (integers), which require more general techniques. However, understanding LP is a foundation – even nonlinear solvers often linearize or approximate problems as a series of linear programs.

Gradient-Based Optimization (First-Order Methods)

For continuous optimization problems with differentiable objective functions, gradient-based methods are the prevalent choice. These algorithms use information from the gradient (first derivative) of the objective to iteratively improve the solution. The prototypical example is gradient descent (for minimization problems). Gradient descent starts from an initial guess and repeatedly updates the variables in the direction of steepest descent (the negative gradient) to reduce the objective. Formally, one update step is:

$$x_{new} = x_{current} - \eta \nabla f(x_{current}),$$

where $\nabla f(x)$ is the gradient vector and $\eta$ is a small positive step size (the learning rate). By moving opposite to the gradient, the algorithm decreases the function value, and over successive iterations $f(x)$ approaches a (local) minimum. The process stops when the changes become negligibly small or a maximum number of iterations is reached.
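
The update rule is simple enough to implement in a few lines. A minimal sketch on a two-variable quadratic bowl (the function, starting point, and learning rate are arbitrary choices for illustration):

```python
# Minimal gradient descent on f(x, y) = (x - 3)^2 + 2*(y + 1)^2,
# whose unique (global) minimum is at (3, -1).
import numpy as np

def grad_f(v):
    """Gradient of f at v = (x, y)."""
    x, y = v
    return np.array([2 * (x - 3), 4 * (y + 1)])

eta = 0.1                        # learning rate (step size)
v = np.array([0.0, 0.0])         # initial guess

for step in range(200):
    g = grad_f(v)
    v = v - eta * g              # move opposite to the gradient
    if np.linalg.norm(g) < 1e-8: # stop when the gradient is (near) zero
        break

print(v)   # converges to approximately [3.0, -1.0]
```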

Gradient-based methods are efficient for high-dimensional problems and form the backbone of training algorithms in machine learning. However, they require the objective to be differentiable (or at least sub-differentiable) and are local search methods – they find a local minimum which may or may not be global, especially if the problem is non-convex. Variants like Stochastic Gradient Descent (SGD) estimate the gradient from a subset of the data at each step (crucial for large-scale machine learning), while momentum methods and the Adam optimizer add momentum terms and per-parameter adaptive step sizes to improve convergence. There are also second-order methods (like Newton’s method) that use second derivatives (the Hessian) to obtain more precise search directions and converge in fewer iterations on well-behaved problems; however, they are more computationally expensive per iteration and are rarely used when the dimension is very large.
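
Concretely, the Newton update mentioned above replaces the fixed step size with curvature information from the Hessian $\nabla^2 f$:

$$x_{new} = x_{current} - \left[\nabla^2 f(x_{current})\right]^{-1} \nabla f(x_{current}).$$

Storing the Hessian takes roughly $O(n^2)$ memory and solving the linear system roughly $O(n^3)$ time per step, which is why first-order methods dominate when $n$ runs into the millions.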

In summary, gradient-based optimization turns calculus into a practical tool for finding minima. It is widely used whenever we can compute gradients – for example, to adjust thousands or millions of parameters in a neural network to minimize prediction error. The method’s simplicity and efficiency make it a core technique, but one must be mindful of local minima and choose appropriate settings (learning rate, etc.) for it to work well.

Evolutionary Algorithms (Global Heuristic Methods)

Not all optimization problems have nice gradients or convex structure. For complex landscapes (particularly with many local optima, non-differentiable objectives, or combinatorial search spaces), evolutionary algorithms offer a powerful alternative. These are a class of metaheuristic optimization methods inspired by natural evolution. The idea is to maintain a population of candidate solutions that evolves over iterations (generations) to produce ever-better solutions. At each iteration, individuals in the population are evaluated for “fitness” (quality according to the objective), and the algorithm uses randomized processes analogous to biological evolution – selection, crossover, and mutation – to create a new generation of candidates.

A classic example is the Genetic Algorithm (GA). A genetic algorithm is a search heuristic that mimics the process of natural selection. It’s used to find optimal or near-optimal solutions by iteratively improving a set of candidate solutions according to the rules of evolution and natural genetics. In a GA, each candidate solution is encoded (for instance, as a binary string, analogous to a chromosome). The algorithm starts with an initial population (often random). Then it repeats the following steps:

  • Selection: Preferentially pick the fitter solutions in the current population to be parents.

  • Crossover (Recombination): Combine parts of two parent solutions to create offspring solutions (mixing their “genes”). This explores new points in the search space by blending existing good solutions.

  • Mutation: Randomly tweak some aspects of solutions (e.g. flip some bits in the string) to maintain genetic diversity and occasionally try entirely new solutions.

  • Replacement: Form a new population from some of the offspring (and sometimes some best parents) and discard others, thereby moving to the next generation.

Over many generations, good traits propagate and the population “evolves” towards better regions of the solution space. Evolutionary algorithms are stochastic (they incorporate randomness) and typically population-based, which helps them escape local optima – even if some candidates get stuck, others might find a path to a better region. This makes them useful for highly non-convex or discontinuous problems where gradient methods falter.
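
To make the loop above concrete, here is a minimal genetic algorithm for the classic “OneMax” toy problem (maximize the number of 1-bits in a binary string); the population size, mutation rate, and tournament selection scheme are arbitrary choices for this sketch:

```python
# Minimal genetic algorithm for OneMax: maximize the number of 1-bits.
import random

LENGTH, POP_SIZE, GENERATIONS = 30, 40, 100
MUTATION_RATE = 1.0 / LENGTH

def fitness(bits):
    return sum(bits)                       # objective: count of 1-bits

def tournament(pop):
    """Selection: keep the fitter of two randomly chosen individuals."""
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    """One-point crossover: splice two parents into one offspring."""
    cut = random.randint(1, LENGTH - 1)
    return p1[:cut] + p2[cut:]

def mutate(bits):
    """Flip each bit with small probability to maintain diversity."""
    return [1 - b if random.random() < MUTATION_RATE else b for b in bits]

population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP_SIZE)]    # initial random population

for gen in range(GENERATIONS):
    # Replacement: build an entirely new generation from selected parents.
    population = [mutate(crossover(tournament(population), tournament(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best), best)   # typically reaches (or nearly reaches) the optimum of 30
```

Real applications differ mainly in the encoding and the fitness function – schedules, routes, or design parameters can be encoded and evolved in the same way.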

Beyond genetic algorithms, there are related evolutionary techniques like Evolution Strategies, Genetic Programming, and other bio-inspired heuristics like Particle Swarm Optimization and Ant Colony Optimization. All share the theme of iteratively improving a pool of solutions via some guided random search. They do not guarantee a true global optimum, but they often find very good solutions given enough time.

It’s worth noting that evolutionary algorithms can be computationally intensive – they might evaluate thousands or millions of candidate solutions. In practice, they are used when no faster, problem-specific method is available, or as a component of hybrid approaches. For example, an evolutionary algorithm might provide a good initial solution which is then fine-tuned by a gradient-based method (in a continuous problem), combining global search with local refinement.

Summary of techniques: In an optimization toolkit, linear programming solvers handle structured linear problems efficiently, gradient-based algorithms handle large smooth optimization problems (especially in machine learning), and evolutionary or other heuristics handle messy, multi-modal problems. Expert practitioners often try to exploit problem structure to choose the most effective method – or even reformulate problems to fit into solvable categories.

Part 3: Applications in Machine Learning and AI

Modern machine learning (ML) and artificial intelligence (AI) are fundamentally built on optimization techniques. Many ML problems are formulated as optimization of some cost or reward function. We highlight three focus areas where optimization is crucial: training deep neural networks, tuning hyperparameters, and reinforcement learning.

Optimizing Deep Learning (Training Neural Networks)

Deep learning involves neural networks with many layers of parameters (weights and biases) that need to be learned from data. Training a neural network is essentially an optimization problem: we define a loss function (also called a cost function) that measures the prediction error on training data, and then we minimize this loss with respect to the network’s parameters. This is typically done by gradient-based optimization (backpropagation computes gradients of the loss w.r.t. every parameter, and then an optimizer like Stochastic Gradient Descent or Adam updates the parameters iteratively).
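
In code, this training loop is remarkably compact. A minimal sketch using PyTorch, one popular framework (the tiny network and the random stand-in data are placeholders for a real model and dataset):

```python
# Minimal PyTorch training loop: minimize cross-entropy loss with SGD.
import torch
import torch.nn as nn

# Stand-in data: 256 samples, 20 features, 3 classes (random, for illustration only).
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for epoch in range(50):
    optimizer.zero_grad()          # clear old gradients
    loss = loss_fn(model(X), y)    # forward pass: compute the loss
    loss.backward()                # backpropagation: gradients w.r.t. every parameter
    optimizer.step()               # gradient step: update the parameters

print(loss.item())                 # the training loss decreases over the epochs
```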

This optimization is challenging because neural network loss surfaces are generally non-convex – they can have many local minima and saddle points. Surprisingly, in practice, gradient-based methods often find solutions that are sufficiently good (sometimes even global-like minima in terms of performance) despite the non-convexity. The choice of optimization algorithm (SGD vs Adam, etc.), learning rate schedule, and techniques like momentum or batch normalization greatly affect training efficiency and outcomes, making optimization a central research area in deep learning.

For example, consider training a convolutional neural network on images: the objective might be the cross-entropy loss between the network’s predictions and the true labels. The optimizer will adjust millions of weight parameters to minimize this loss. Each iteration, it nudges the weights in the direction that most reduces error on the current batch of images. Over many iterations (epochs), the network converges to a parameter configuration that hopefully generalizes to new data. The entire field of deep learning owes much of its success to effective optimization algorithms that can handle high-dimensional, non-convex problems at scale. Indeed, training a deep model can be seen as traversing a complex error surface – essentially a high-dimensional optimization landscape – to find a set of weights that yields low error. Techniques like learning rate decay, adaptive optimizers, and regularization are all about steering this optimization process toward better minima (that also generalize well).

From a theoretical perspective, since the loss is non-convex, we only know we’ve reached a local minimum when training converges. But empirically, local minima in large neural networks can be very good, and the difference between local and global minima may not matter as much as once feared. There is ongoing research into why first-order methods find good minima in deep learning, relating to properties of high-dimensional spaces and connectivity of minima. Regardless, the practical recipe remains: define a suitable loss function and use a robust optimization routine to minimize it on your training data.

Hyperparameter Tuning as Optimization

Hyperparameters are the settings of the learning process itself (not the model’s internal parameters, but external knobs). Examples of hyperparameters include the learning rate in training, network architecture choices (number of layers or units), regularization strength, or parameters of a model like the depth of a decision tree. Selecting good hyperparameters can make a huge difference in model performance, but these hyperparameters are often not amenable to direct gradient-based tuning because the relationship between hyperparameters and final model performance is complex and noisy. Thus, hyperparameter tuning is often posed as a black-box optimization problem: we have some function (the model’s validation accuracy or loss) that we want to maximize (accuracy) or minimize (loss) by varying the hyperparameters.

Simple methods for hyperparameter optimization include grid search (trying combinations in a fixed grid) and random search. Random search is surprisingly effective and often better than grid search when only a few hyperparameters really matter, because for the same budget it tries more distinct values of each individual hyperparameter instead of revisiting the same few grid values. However, both can be inefficient if each model training is expensive. More advanced methods treat tuning as a Bayesian optimization problem: we build a surrogate model (such as a Gaussian Process or a Tree-structured Parzen Estimator) that predicts performance from hyperparameters, and use it to decide which hyperparameter setting to try next. This approach attempts to find good hyperparameters in fewer trials by balancing exploration of new regions against exploitation of known good regions.
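
Even plain random search is just a short optimization loop. A minimal sketch (the hyperparameter ranges are typical choices, and train_and_evaluate is a placeholder you would replace with a real training-plus-validation run):

```python
# Minimal random search over two hyperparameters.
import math
import random

def train_and_evaluate(learning_rate, dropout):
    # Placeholder for a real training run that returns validation accuracy.
    # Here we fake a smooth response surface so the example is runnable.
    return 1.0 - 0.1 * (math.log10(learning_rate) + 2.5) ** 2 - (dropout - 0.3) ** 2

best_score, best_config = float("-inf"), None
for trial in range(50):
    config = {
        "learning_rate": 10 ** random.uniform(-5, 0),  # log-uniform in [1e-5, 1]
        "dropout": random.uniform(0.0, 0.7),
    }
    score = train_and_evaluate(**config)
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)
```

Bayesian optimization libraries follow the same loop but replace the random sampling with suggestions from a surrogate model fitted to the trials seen so far.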

At industry scale, hyperparameter tuning is massively important. For instance, Google has built an internal service called Vizier for black-box hyperparameter optimization. Vizier is used across Google to tune the hyperparameters of machine learning models, both for research and for production, and according to its authors the implementation scales to serve the entire hyperparameter tuning workload across Alphabet. Such a service can execute and coordinate millions of training trials, intelligently searching the hyperparameter space. Google’s hyperparameter tuning infrastructure has enabled engineers to automatically find better model configurations that improve products used by billions of people. In other words, rather than hand-tuning hyperparameters, engineers can rely on optimization algorithms to do it. This is essentially applying optimization at a higher level: the algorithm might use techniques like Gaussian process bandits, evolutionary search, or even reinforcement learning to decide which set of hyperparameters to try next, observe the result, and iterate.

Real-world example: In the development of a machine translation model, one might need to tune hyperparameters like the embedding dimension, learning rate, dropout rate, etc. Using a service like Vizier or open-source alternatives, you define a range or distribution for each hyperparameter and let the service suggest new trials. Over dozens or hundreds of trials, it might discover a combination that yields significantly higher translation accuracy. This automated optimization not only saves manual effort but can sometimes find non-intuitive configurations that a human would overlook.

In summary, hyperparameter tuning turns the trial-and-error of model selection into a principled optimization problem. As models and datasets grow, this becomes indispensable – companies like Google, Facebook, and Amazon have invested heavily in such AutoML and tuning services to squeeze the best performance out of their AI models.

Optimization in Reinforcement Learning (RL)

Reinforcement learning is a branch of AI where an agent learns to make decisions by interacting with an environment, aiming to maximize cumulative reward. At its core, reinforcement learning is also an optimization problem – albeit a somewhat different one. The objective is to find a policy (a mapping from states to actions) that maximizes the expected total reward the agent receives over time. This can be framed as solving a Markov Decision Process (MDP) for an optimal policy.

There are several approaches in RL, but many revolve around optimizing some objective: for example, policy gradient methods directly treat the policy’s performance as an objective function and use gradient ascent to improve the policy parameters. In such cases, one defines $J(\theta)$ as the expected reward of the policy with parameters $\theta$, and then computes $\nabla_\theta J(\theta)$ (often using the REINFORCE algorithm or variants) to update $\theta$ in the direction that increases expected reward. This is effectively doing stochastic gradient ascent on a very noisy objective (because reward outcomes are stochastic). Other methods like Q-learning or value iteration implicitly perform optimization by iteratively improving value function estimates toward the optimal Bellman equation solution.
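
A toy illustration of the policy-gradient idea, stripped down to a two-action bandit rather than a full MDP (all names and numbers here are made up for the sketch):

```python
# Minimal REINFORCE-style sketch: a softmax policy over two actions,
# updated by stochastic gradient ascent on expected reward.
# The "environment" is a toy bandit where action 1 pays more on average.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)      # policy parameters (one logit per action)
alpha = 0.1              # learning rate

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for episode in range(2000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    # Toy stochastic reward: action 0 has mean 0.2, action 1 has mean 0.8.
    reward = rng.normal(loc=0.2 if action == 0 else 0.8, scale=0.1)

    # REINFORCE gradient estimate: reward * grad_theta log pi(action),
    # where grad_theta log softmax(a) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += alpha * reward * grad_log_pi   # gradient *ascent* on expected reward

print(softmax(theta))   # probability mass shifts toward the better action
```

Practical policy-gradient methods (REINFORCE with baselines, actor-critic, PPO, and so on) follow the same pattern over sequences of states and actions, with variance-reduction tricks added on top.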

A landmark example of optimization in RL is DeepMind’s AlphaGo system. AlphaGo was trained to play the game of Go at a superhuman level by combining deep learning and reinforcement learning. Specifically, AlphaGo employed two deep neural networks: a policy network that outputs move probabilities and a value network that predicts the winner from a given board state. Training AlphaGo was a multi-stage optimization process. First, the policy network was initialized via supervised learning on recorded expert human games (this is an optimization of a supervised loss – a cross-entropy to mimic expert moves). Then, that network was refined through reinforcement learning by playing games against older versions of itself and applying policy-gradient updates (to maximize win rate). Meanwhile, the value network was trained to minimize error in predicting game outcomes (another supervised regression optimization). These networks were then embedded in a Monte Carlo Tree Search procedure to effectively explore moves during play. As the Nature paper describes, the networks were trained “by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play,” and all of those training phases required solving tough optimization problems (non-convex neural network training, essentially). The end result was a policy and value function that, when optimized together with tree search, produced a winning strategy. AlphaGo’s success illustrated how far optimization techniques (gradient descent, etc.) could be pushed – they optimized millions of parameters using millions of self-play games as training data, which was computationally intensive but feasible with large-scale infrastructure.

More generally, many RL breakthroughs (AlphaGo, AlphaZero, OpenAI Five for Dota, etc.) are enabled by powerful optimization: they turn gameplay experience into improvements in policy via stochastic gradient descent or evolutionary strategies. Some recent approaches even use evolutionary algorithms instead of gradient-based updates for RL policies (treating policy weights as individuals in a population). There is also the concept of reward shaping – crafting the reward function (objective) to make the optimization easier, which again ties back to understanding how the optimization landscape influences the learning process.

In summary, reinforcement learning solves an optimization problem – maximize reward – but does so in an interactive, data-driven way. It combines ideas from dynamic programming and numerical optimization. Without optimization algorithms under the hood (like policy gradients or Q-value iteration), an RL agent would not improve. With optimization, an agent can start from essentially random behavior and, through iterative improvement, end up with a highly tuned strategy that appears intelligent.

Part 4: Industrial Case Studies of Optimization

To cement these concepts, let’s look at a few high-impact use cases in industry from recent years, where optimization techniques played a pivotal role. We will revisit: Google’s hyperparameter tuning at scale, DeepMind’s AlphaGo training, and Tesla’s energy optimization software. Each showcases how theoretical ideas turn into practical systems solving real problems.

Google – Hyperparameter Tuning at Scale

As mentioned earlier, Google uses a system called Vizier for hyperparameter optimization. This system has been deployed extensively to improve models across Google’s products. One published example noted that Vizier was used to tune a machine translation model with millions of trials – something impossible to do manually. By leveraging Bayesian optimization and parallel experimentation, Google’s researchers and engineers can discover better model configurations faster. This has led to improvements in things like Google’s language translation quality, recommendation systems, and computer vision models, directly translating to a better experience for users.

Google has also integrated this optimization capability into cloud services (Google Cloud’s HyperTune), allowing external developers to use similar hyperparameter tuning technology. The impact of treating hyperparameter tuning as an optimization problem is clear: it automates a part of machine learning that used to rely on intuition or luck. According to the Vizier paper, automating hyperparameter search not only saved time but also uncovered novel combinations that yielded state-of-the-art results in research settings. In production, it meant services like search ranking or ad placement could be optimized more finely by tuning the underlying model parameters via large-scale experiments.

In essence, Google turned the art of model tuning into a science of optimization. This is emblematic of a broader trend in AutoML (automated machine learning), where methods like Vizier, Google’s AutoML, or open-source Optuna and Hyperopt aim to remove the human bottleneck in model improvement by systematically optimizing over choices.

DeepMind AlphaGo – Optimization in Training AI Agents

DeepMind’s AlphaGo project provides a vivid case study of multiple optimization layers working together. As described, the training of AlphaGo’s neural networks was one huge optimization endeavor: millions of gradient descent updates to minimize loss or maximize reward. But even beyond the training phase, consider how AlphaGo plays Go. The Monte Carlo Tree Search (MCTS) it uses to decide moves is itself an optimization process – it effectively performs approximate optimal planning over the game’s state space, guided by the value and policy networks. MCTS optimizes the selection of moves by balancing exploration of new moves against exploitation of known good moves (through a formula such as the Upper Confidence Bound applied to trees). This can be seen as optimizing the action selection at each turn of the game, albeit via search rather than gradient descent.
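
For reference, the classic UCB-applied-to-trees (UCT) rule selects, at each tree node $s$, the move $a$ that maximizes

$$\bar{Q}(s, a) + c \sqrt{\frac{\ln N(s)}{N(s, a)}},$$

where $\bar{Q}(s,a)$ is the average outcome of simulations that took move $a$ from $s$, $N(s)$ and $N(s,a)$ are visit counts, and the constant $c$ trades off exploitation against exploration. (AlphaGo uses a variant of this rule in which the exploration term is additionally weighted by the policy network’s prior probability for each move.)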

AlphaGo’s success in 2016 (defeating the world champion Lee Sedol) was a breakthrough made possible by computation and optimization power. It demonstrated that given enough training data (self-play games) and efficient optimization algorithms, a machine could master a very complex domain. Subsequently, AlphaGo Zero went even further by removing the human game data initialization – it learned entirely from scratch via self-play, which posed an even harder optimization problem (starting from random behavior). Yet, by the power of reinforcement learning optimization, it reached superhuman play, which suggests that the optimization landscape of games like Go, while enormous, could be navigated by clever algorithms given enough iterations.

In practical terms, AlphaGo’s methods have been transferred to other domains: DeepMind applied similar reinforcement learning optimization to problems like protein folding (AlphaFold uses optimization to fit models to biochemical data) and to controlling plasma in nuclear fusion reactors. Each time, the core idea is the same – use optimization algorithms to gradually improve a solution, whether it’s the weights of a network or the sequence of actions in a task, based on a defined objective.

AlphaGo also highlighted the importance of simulation optimization – using simulated experience to optimize behavior. This is now common in robotics and autonomous driving: companies train AI agents in simulated environments (which is essentially running an optimization loop in a safe virtual setting) before deploying them in the real world.

Tesla – Energy Optimization with AI

Tesla is known not just for electric vehicles but also for energy storage and management solutions. One interesting application of optimization is in Tesla’s energy products like the Powerwall (home battery) and large battery installations. Tesla developed an AI-driven software called Opticaster for optimizing the operation of these distributed energy resources. Opticaster’s goal is to automatically manage when to charge or discharge batteries, when to draw power from the grid or solar panels, etc., to achieve objectives like minimizing electricity bills, maximizing use of solar power, and maintaining backup power. This is a complex optimization problem involving forecasts (of energy usage, solar generation, electricity prices) and constraints (battery capacity, customer needs).

According to a Tesla Energy product manager, Opticaster... has been shipped to thousands of homes and businesses across the globe, minimizing customers’ utility bill expenses, increasing renewable consumption, and keeping the lights on when off-grid. In effect, Tesla is solving a real-time optimization problem for each customer: at any given time, decide the optimal power flow (from solar, battery, or grid) to meet demand at minimal cost and maximal reliability. This involves algorithms that likely use predictive optimization (like model predictive control, a form of repeated optimization that looks ahead to future conditions) and possibly machine learning forecasts plugged into optimization models.

For example, if electricity rates are lower at midday and higher in the evening, the software might charge the battery when rates are low and discharge when rates are high, thereby cutting costs. If a storm is expected (risking an outage), it might prioritize keeping the battery charged (optimizing for reliability). These decisions require weighing multiple objectives and constraints – a natural fit for optimization algorithms. Tesla’s system probably uses a mix of linear programming (or convex optimization for energy scheduling) and heuristic rules, all informed by AI predictions of usage patterns. The “intelligence” here is essentially optimization: the software continuously solves for the best action given the current state and predictions.
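
To make the scheduling idea concrete, here is a minimal sketch of one day of battery arbitrage posed as a linear program – a textbook-style formulation with made-up prices and limits, not Tesla’s actual Opticaster logic:

```python
# Minimal battery-arbitrage LP: choose hourly charge/discharge power to
# minimize the cost of meeting a fixed household load, given hourly prices.
# All numbers are illustrative; losses, degradation, and solar are ignored.
import numpy as np
from scipy.optimize import linprog

T = 24
price = np.array([0.10]*7 + [0.20]*4 + [0.12]*6 + [0.35]*4 + [0.15]*3)  # $/kWh
load = np.full(T, 1.0)        # household demand, kWh per hour
p_max, capacity = 3.0, 10.0   # max charge/discharge power (kW), battery size (kWh)

# Decision variables per hour: grid import g_t, charge c_t, discharge d_t.
# Layout: x = [g_0..g_23, c_0..c_23, d_0..d_23]
cost = np.concatenate([price, np.zeros(T), np.zeros(T)])   # pay only for grid energy

# Power balance each hour: g_t - c_t + d_t = load_t.
A_eq = np.hstack([np.eye(T), -np.eye(T), np.eye(T)])
b_eq = load

# State of charge limits: 0 <= sum_{k<=t}(c_k - d_k) <= capacity (battery starts empty).
lower_tri = np.tril(np.ones((T, T)))
A_ub = np.vstack([
    np.hstack([np.zeros((T, T)),  lower_tri, -lower_tri]),   # SoC <= capacity
    np.hstack([np.zeros((T, T)), -lower_tri,  lower_tri]),   # SoC >= 0
])
b_ub = np.concatenate([np.full(T, capacity), np.zeros(T)])

bounds = [(0, None)]*T + [(0, p_max)]*T + [(0, p_max)]*T
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")

print(res.fun)               # total electricity cost for the day
print(res.x[2*T:].round(2))  # discharge schedule: nonzero mostly in expensive hours
```

A model-predictive controller would re-solve a problem like this every few minutes with updated forecasts, always executing only the first step of the plan.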

This kind of AI-powered optimization is increasingly common in energy systems (often called smart grid optimization or energy management systems). Utilities and companies use optimization to balance supply and demand, integrate renewables, and operate more efficiently. Tesla’s use case is just one high-profile example where operational decisions are automated through algorithms. The results include cost savings for consumers and better utilization of renewable energy – outcomes that are environmentally and economically beneficial.


These case studies underline how optimization is not just a theoretical exercise but a driving force behind cutting-edge technology. Google’s hyperparameter tuning shows optimization improving AI development itself; DeepMind’s AlphaGo shows optimization enabling AI to surpass human abilities in a complex task; Tesla’s Opticaster shows optimization making an impact on energy sustainability and cost savings. In each case, formulating the right objective and applying suitable algorithms led to transformative results.

Conclusion

Optimization is a unifying theme across science and engineering – whenever we have a goal and choices in how to achieve it, we can cast it as an optimization problem. In this introduction, we covered the foundations of optimization theory (objective functions, constraints, convexity), surveyed core techniques (linear programming for structured problems, gradient descent for continuous optimization, evolutionary algorithms for global search), and explored how these ideas are applied in practice (from deep learning to industrial AI systems).

For beginners, the key takeaway is that optimization provides a framework and tools to make the best possible decisions given a mathematical model. For semi-experts, it’s clear that choosing the right optimization method for the problem at hand is critical – and often the art of modeling is making your problem amenable to powerful solvers. As technology advances, optimization problems grow in scale and complexity, but so do the algorithms and computing power available to tackle them.

In future explorations, one can delve deeper into topics like duality and Karush–Kuhn–Tucker (KKT) conditions in optimization theory (fundamental for understanding constrained optimization optimality), advanced algorithms like interior-point methods and sequential quadratic programming, or emerging areas like optimization for quantum computing. The field of optimization continues to evolve, but its core promise remains: by mathematically defining what “best” means and leveraging computational methods, we can systematically find better solutions to the hardest problems.

References

  1. W. Shi (2020). “Optimization Stories: KKT Conditions.” Towards Data Science, Medium, Mar. 8, 2020. Explains fundamental optimization concepts and optimality conditions, with examples.

  2. C. K. Williams (2014). “Convex Optimization Problems (Lecture Slides).” University of Edinburgh, 2014. Highlights properties of convex optimization, notably that any local optimum is global.

  3. H. Phillips (2023). “A Simple Introduction to Gradient Descent.” Medium, May 17, 2023. Introduction to the gradient descent algorithm and its role in machine learning optimization.

  4. GeeksforGeeks (2023). “Introduction to Optimization with Genetic Algorithm.” GeeksforGeeks, 2023. Tutorial on evolutionary and genetic algorithms, describing their mechanics inspired by natural selection.

  5. D. Golovin et al. (2017). “Google Vizier: A Service for Black-Box Optimization.” Proceedings of KDD 2017. Describes Google’s hyperparameter tuning service used to optimize ML models at scale across Alphabet.

  6. D. Silver et al. (2016). “Mastering the game of Go with deep neural networks and tree search.” Nature, vol. 529, pp. 484–489, Jan. 2016. The AlphaGo paper detailing the combination of supervised and reinforcement learning used to optimize Go-playing neural networks.

  7. S. Zhang (2021). “Lessons Learned as an AI Product Manager at Tesla Energy.” BatteryBits (Volta Foundation) on Medium, Mar. 6, 2021. Discusses Tesla’s Opticaster software for energy optimization in power grids and batteries, aiming at cost reduction and reliability.
