AI Writing Tools

Explore the best AI Writing Tools — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • Smart speaker industry in South Korea

    Smart speaker industry in South Korea

    Smart speakers, or AI speakers, have been developed by multiple domestic electronics and telecommunications firms in South Korea. Since their introduction to the local market in 2016, they have been used by millions of people in the country. == Brands == === Google === In September 2018, Google Home (including the Google Home Mini) launched in South Korea. Running Google Assistant, it featured simultaneous recognition of two languages among a total of seven, including Korean. At launch, it could play music from Bugs!, in addition to YouTube. === Kakao === In November 2017, Kakao launched the Kakao Mini, featuring integrated KakaoTalk functionality. === KT === KT launched the GiGA Genie smart speaker in January 2017, using a Harman Kardon speaker. In November 2017, KT announced GiGA Genie LTE, a portable AI speaker with LTE support. They also released a mini speaker called GiGA Genie Buddy. In 2018, KT created a special version of GiGa Genie with a screen for use in hotels. On 29 April 2019, KT announced the GiGA Genie Table TV, a consumer-oriented smart speaker with a display. It featured paid TV access through Wi-Fi. Based on usage data from the hotel model, KT decided not to add a touchscreen. The Table TV also featured a limited-access "personalized-text-to-speech technology" which could use parents' voice recording inputs to read children books. In February 2022, KT began rolling out Amazon Alexa integration into its speakers for English support. === Naver === In August 2017, Naver announced the Wave smart speaker, operating on Clova. In October 2017, Naver launched the Friends smart speaker, which were designed based on Line characters. ==== LG Uplus ==== In December 2017, LG Uplus launched the Friends+ speaker with Naver, operating on U+ Home AI. === Samsung === In August 2018, Samsung announced the Samsung Galaxy Home in partnership with Spotify. The original size was delayed, while the Galaxy Home Mini appeared briefly as a bonus for Samsung Galaxy S20 preorders in South Korea in February 2020. === SK Telecom === SK Telecom launched the Nugu smart speaker in September 2016, using an Astell & Kern audio system. In August 2017, SKT released a portable speaker named Nugu mini. In July 2018, SKT launched the Nugu Candle, featuring expanded mood lighting. The first-generation Nugu was subsequently discontinued. On 18 April 2019, SKT released the NUGU Nemo AI, which featured a display and JBL stereo speaker. In August 2019, SKT collaborated with SM Entertainment, incorporating functions related to the agency's artists into Nugu. In January 2022, SKT showcased the NUGU Candle SE, introducing Alexa support. == Usage == In 2018, approximately 3 million people in South Korea used smart speakers. According to data from KT in 2018, the most common commands to its speakers were for controlling televisions. Based on a broader survey in 2017, music was selected as the most frequent use case. By 2018, smart speaker companies were partnering with reading and other education services, adding potential use-cases for children. By 2022, smart speakers were being utilized by the South Korean government. SKT, in partnership with 70 regional governments, distributed smart speakers to 12,000 senior citizens living alone. The government paid for monthly subscriptions to help seniors stay mentally engaged. Naver made an agreement with the Seoul Metropolitan Government to provide Clova CareCall, an automated health checkup program to hundreds of senior citizens living alone. KT's AI care service included an emergency dispatch call function and medication notifications. == Criticism == === Communication === In a survey of 300 users in 2017, approximately half reported having some type of communication issue with their smart speakers. === Privacy === South Korean smart speakers sparked privacy concerns when they were found to be collecting and documenting user audio data in 2019. The speaker companies responded that only a minority of data was collected and that it was anonymized. They stated that such recordings were collected for performance improvements.

    Read more →
  • Neural operators

    Neural operators

    Neural operators are a class of deep learning architectures designed to learn maps between infinite-dimensional function spaces. Neural operators represent an extension of traditional artificial neural networks, marking a departure from the typical focus on learning mappings between finite-dimensional Euclidean spaces or finite sets. Neural operators directly learn operators between function spaces; they can receive input functions, and the output function can be evaluated at any discretization. The primary application of neural operators is in learning surrogate maps for the solution operators of partial differential equations (PDEs), which are critical tools in modeling the natural environment. Standard PDE solvers can be time-consuming and computationally intensive, especially for complex systems. Neural operators have demonstrated improved performance in solving PDEs compared to existing machine learning methodologies while being significantly faster than numerical solvers. Neural operators have also been applied to various scientific and engineering disciplines such as turbulent flow modeling, computational mechanics, graph-structured data, and the geosciences. In particular, they have been applied to learning stress-strain fields in materials, classifying complex data like spatial transcriptomics, predicting multiphase flow in porous media, and carbon dioxide migration simulations. Finally, the operator learning paradigm allows learning maps between function spaces, and is different from parallel ideas of learning maps from finite-dimensional spaces to function spaces, and subsumes these settings as special cases when limited to a fixed input resolution. == Operator learning == Understanding and mapping relationships between function spaces has many applications in engineering and the sciences. In particular, one can cast the problem of solving partial differential equations as identifying a map between function spaces, such as from an initial condition to a time-evolved state. In other PDEs this map takes an input coefficient function and outputs a solution function. Operator learning is a machine learning paradigm to learn solution operators mapping the input function to the output function . Using traditional machine learning methods, addressing this problem would involve discretizing the infinite-dimensional input and output function spaces into finite-dimensional grids and applying standard learning models, such as neural networks. This approach reduces the operator learning to finite-dimensional function learning and has some limitations, such as generalizing to discretizations beyond the grid used in training. The primary properties of neural operators that differentiate them from traditional neural networks is discretization invariance and discretization convergence. Unlike conventional neural networks, which are fixed on the discretization of training data, neural operators can adapt to various discretizations without re-training. This property improves the robustness and applicability of neural operators in different scenarios, providing consistent performance across different resolutions and grids. == Definition and formulation == Architecturally, neural operators are similar to feed-forward neural networks in the sense that they are composed of alternating linear maps and non-linearities. Since neural operators act on and output functions, neural operators have been instead formulated as a sequence of alternating linear integral operators on function spaces and point-wise non-linearities. Using an analogous architecture to finite-dimensional neural networks, similar universal approximation theorems have been proven for neural operators. In particular, it has been shown that neural operators can approximate any continuous operator on a compact set. Neural operators seek to approximate some operator G : A → U {\displaystyle {\mathcal {G}}:{\mathcal {A}}\to {\mathcal {U}}} between function spaces A {\displaystyle {\mathcal {A}}} and U {\displaystyle {\mathcal {U}}} by building a parametric map G ϕ : A → U {\displaystyle {\mathcal {G}}_{\phi }:{\mathcal {A}}\to {\mathcal {U}}} . Such parametric maps G ϕ {\displaystyle {\mathcal {G}}_{\phi }} can generally be defined in the form G ϕ := Q ∘ σ ( W T + K T + b T ) ∘ ⋯ ∘ σ ( W 1 + K 1 + b 1 ) ∘ P , {\displaystyle {\mathcal {G}}_{\phi }:={\mathcal {Q}}\circ \sigma (W_{T}+{\mathcal {K}}_{T}+b_{T})\circ \cdots \circ \sigma (W_{1}+{\mathcal {K}}_{1}+b_{1})\circ {\mathcal {P}},} where P , Q {\displaystyle {\mathcal {P}},{\mathcal {Q}}} are the lifting (lifting the codomain of the input function to a higher dimensional space) and projection (projecting the codomain of the intermediate function to the output dimension) operators, respectively. These operators act pointwise on functions and are typically parametrized as multilayer perceptrons. σ {\displaystyle \sigma } is a pointwise nonlinearity, such as a rectified linear unit (ReLU), or a Gaussian error linear unit (GeLU). Each layer t = 1 , … , T {\displaystyle t=1,\dots ,T} has a respective local operator W t {\displaystyle W_{t}} (usually parameterized by a pointwise neural network), a kernel integral operator K t {\displaystyle {\mathcal {K}}_{t}} , and a bias function b t {\displaystyle b_{t}} . Given some intermediate functional representation v t {\displaystyle v_{t}} with domain D {\displaystyle D} in the t {\displaystyle t} -th hidden layer, a kernel integral operator K ϕ {\displaystyle {\mathcal {K}}_{\phi }} is defined as ( K ϕ v t ) ( x ) := ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x):=\int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy,} where the kernel κ ϕ {\displaystyle \kappa _{\phi }} is a learnable implicit neural network, parametrized by ϕ {\displaystyle \phi } . In practice, one is often given the input function to the neural operator at a specific resolution. For instance, consider the setting where one is given the evaluation of v t {\displaystyle v_{t}} at n {\displaystyle n} points { y j } j n {\displaystyle \{y_{j}\}_{j}^{n}} . Borrowing from Nyström integral approximation methods such as Riemann sum integration and Gaussian quadrature, the above integral operation can be computed as follows: ∫ D κ ϕ ( x , y , v t ( x ) , v t ( y ) ) v t ( y ) d y ≈ ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j , {\displaystyle \int _{D}\kappa _{\phi }(x,y,v_{t}(x),v_{t}(y))v_{t}(y)dy\approx \sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}},} where Δ y j {\displaystyle \Delta _{y_{j}}} is the sub-area volume or quadrature weight associated to the point y j {\displaystyle y_{j}} . Thus, a simplified layer can be computed as v t + 1 ( x ) ≈ σ ( ∑ j n κ ϕ ( x , y j , v t ( x ) , v t ( y j ) ) v t ( y j ) Δ y j + W t ( v t ( y j ) ) + b t ( x ) ) . {\displaystyle v_{t+1}(x)\approx \sigma \left(\sum _{j}^{n}\kappa _{\phi }(x,y_{j},v_{t}(x),v_{t}(y_{j}))v_{t}(y_{j})\Delta _{y_{j}}+W_{t}(v_{t}(y_{j}))+b_{t}(x)\right).} The above approximation, along with parametrizing κ ϕ {\displaystyle \kappa _{\phi }} as an implicit neural network, results in the graph neural operator (GNO). There have been various parameterizations of neural operators for different applications. These typically differ in their parameterization of κ {\displaystyle \kappa } . The most popular instantiation is the Fourier neural operator (FNO). FNO takes κ ϕ ( x , y , v t ( x ) , v t ( y ) ) := κ ϕ ( x − y ) {\displaystyle \kappa _{\phi }(x,y,v_{t}(x),v_{t}(y)):=\kappa _{\phi }(x-y)} and by applying the convolution theorem, arrives at the following parameterization of the kernel integral operator: ( K ϕ v t ) ( x ) = F − 1 ( R ϕ ⋅ ( F v t ) ) ( x ) , {\displaystyle ({\mathcal {K}}_{\phi }v_{t})(x)={\mathcal {F}}^{-1}(R_{\phi }\cdot ({\mathcal {F}}v_{t}))(x),} where F {\displaystyle {\mathcal {F}}} represents the Fourier transform and R ϕ {\displaystyle R_{\phi }} represents the Fourier transform of some periodic function κ ϕ {\displaystyle \kappa _{\phi }} . That is, FNO parameterizes the kernel integration directly in Fourier space, using a prescribed number of Fourier modes. When the grid at which the input function is presented is uniform, the Fourier transform can be approximated using the discrete Fourier transform (DFT) with frequencies below some specified threshold. The discrete Fourier transform can be computed using a fast Fourier transform (FFT) implementation. == Training == Training neural operators is similar to the training process for a traditional neural network. Neural operators are typically trained in some Lp norm or Sobolev norm. In particular, for a dataset { ( a i , u i ) } i = 1 N {\displaystyle \{(a_{i},u_{i})\}_{i=1}^{N}} of size N {\displaystyle N} , neural operators minimize (a discretization of) L U ( { ( a i , u i ) } i = 1 N ) := ∑ i = 1 N ‖ u i − G θ ( a i ) ‖ U 2 {\displaystyle {\mathcal {L}}_{\mathca

    Read more →
  • Corpus of Linguistic Acceptability

    Corpus of Linguistic Acceptability

    Corpus of Linguistic Acceptability (CoLA) is a dataset the primary purpose of which is to serve as a benchmark for evaluating the ability of artificial neural networks, including large language models, to judge the grammatical correctness of sentences. It consists of 10,657 English sentences from published linguistics literature that were manually labeled either as grammatical or ungrammatical. == Public version == The publicly available version of CoLA contains 9,594 sentences that belong to training and development sets. It excludes 1,063 sentences reserved for a held-out test set.

    Read more →
  • Artificial intelligence content detection

    Artificial intelligence content detection

    Artificial intelligence detection software aims to determine whether some content (text, image, video, or audio) was generated using artificial intelligence (AI). This software is often unreliable. == Accuracy issues == Many AI detection tools have been shown to be unreliable in detecting AI-generated text. In a 2023 study conducted by Weber-Wulff et al., researchers evaluated 14 detection tools including Turnitin and GPTZero and found that "all scored below 80% of accuracy and only 5 over 70%." They also found that these tools tend to have a bias for classifying texts more as human than as AI, and that accuracy of these tools worsens upon paraphrasing. === False positives === In AI content detection, a false positive is when human-written work is incorrectly flagged as AI-written. Many AI detection platforms claim to have a minimal level of false positives, with Turnitin claiming a less than 1% false positive rate. However, later research by The Washington Post produced much higher rates of 50%, though they used a smaller sample size. False positives in an academic setting frequently lead to accusations of academic misconduct, which can have serious consequences for a student's academic record. Additionally, studies have shown evidence that many AI detection models are prone to give false positives to work written by people whose first language is not English, and also to neurodivergent people. In June 2023, Janelle Shane wrote that portions of her book You Look Like a Thing and I Love You were flagged as AI-generated. === False negatives === A false negative is a failure to identify documents with AI-written text. False negatives often happen as a result of a detection software's sensitivity level or because evasive techniques were used when generating the work to make it sound more human. False negatives are less of a concern academically, since they aren't likely to lead to accusations and ramifications. Notably, Turnitin stated they have a 15% false negative rate. == Text detection == For text, this is usually done to prevent alleged plagiarism, often by detecting repetition of words as telltale signs that a text was AI-generated (including hallucinations). Detection systems may also rely on stylistic and structural regularities associated with LLM output, such as unusually consistent grammar, formulaic transitions, repeated discourse markers, and recurring rhetorical templates. Some tools are designed less to establish authorship provenance than to flag prose that resembles common LLM-generated style patterns. They are often used by teachers marking their students, usually on an ad hoc basis. Following the release of ChatGPT and similar AI text generative software, many educational establishments have issued policies against the use of AI by students. AI text detection software is also used by those assessing job applicants, as well as online search engines, hiring, online moderation and publishing. Current detectors may sometimes be unreliable and have incorrectly marked work by humans as originating from AI while failing to detect AI-generated work in other instances. MIT Technology Review said that the technology "struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans and obfuscated by a paraphrasing tool". AI text detection software has also been shown to discriminate against non-native speakers of English. Two students from the University of California, Davis, were referred to the university's Office of Student Success and Judicial Affairs (OSSJA) after their professors scanned their essays with positive results; the first with an AI detector called GPTZero, and the second with an AI detector integration in Turnitin. However, following media coverage, and a thorough investigation, the students were cleared of any wrongdoing. In April 2023, Cambridge University and other members of the Russell Group of universities in the United Kingdom opted out of Turnitin's AI text detection tool, after expressing concerns it was unreliable. The University of Texas at Austin opted out of the system six months later. In May 2023, a professor at Texas A&M University–Commerce used ChatGPT to detect whether his students' content was written by it, which ChatGPT said was the case. As such, he threatened to fail the class despite ChatGPT not being able to detect AI-generated writing. No students were prevented from graduating because of the issue, and all but one student (who admitted to using the software) were exonerated from accusations of having used ChatGPT in their content. In July 2023, a paper titled "GPT detectors are biased against non-native English writers" was released, reporting that GPTs discriminate against non-native English authors. The paper compared seven GPT detectors against essays from both non-native English speakers and essays from United States students. The essays from non-native English speakers had an average false positive rate of 61.3%. An article by Thomas Germain, published on Gizmodo in June 2024, reported job losses among freelance writers and journalists due to AI text detection software mistakenly classifying their work as AI-generated. In September 2024, Common Sense Media reported that generative AI detectors had a 20% false positive rate for Black students, compared to 10% of Latino students and 7% of White students. To improve the reliability of AI text detection, researchers have explored digital watermarking techniques. A 2023 paper titled "A Watermark for Large Language Models" presents a method to embed imperceptible watermarks into text generated by large language models (LLMs). This watermarking approach allows content to be flagged as AI-generated with a high level of accuracy, even when text is slightly paraphrased or modified. The technique is designed to be subtle and hard to detect for casual readers, thereby preserving readability, while providing a detectable signal for those employing specialized tools. However, while promising, watermarking faces challenges in remaining robust under adversarial transformations and ensuring compatibility across different LLMs. == Anti text detection == There is software available designed to bypass AI text detection. In practice, evasion may not require specialized bypass tools. Paraphrasing, style editing, and removal of repeated discourse markers can substantially reduce the effectiveness of detectors that rely on recognizable surface patterns. A study published in August 2023 analyzed 20 abstracts from papers published in the Eye Journal, which were then paraphrased using GPT-4.0. The AI-paraphrased abstracts were examined for plagiarism using QueText and for AI-generated content using Originality.AI. The texts were then re-processed through an adversarial software called Undetectable.ai in order to reduce the AI-detection scores. The study found that the AI detection tool, Originality.AI, identified text generated by GPT-4 with a mean accuracy of 91.3%. However, after reprocessing by Undetectable.ai, the detection accuracy of Originality.ai dropped to a mean accuracy of 27.8%. Some experts also believe that techniques like digital watermarking are ineffective because they can be removed or added to trigger false positives. "A Watermark for Large Language Models" paper by Kirchenbauer et al. (2023) also addresses potential vulnerabilities of watermarking techniques. The authors outline a range of adversarial tactics, including text insertion, deletion, and substitution attacks, that could be used to bypass watermark detection. These attacks vary in complexity, from simple paraphrasing to more sophisticated approaches involving tokenization and homoglyph alterations. The study highlights the challenge of maintaining watermark robustness against attackers who may employ automated paraphrasing tools or even specific language model replacements to alter text spans iteratively while retaining semantic similarity. Experimental results show that although such attacks can degrade watermark strength, they also come at the cost of text quality and increased computational resources. == Image, video, and audio detection == Several purported AI image detection software exist, to detect AI-generated images (for example, those originating from Midjourney or DALL-E). They are not completely reliable. Industry analyses have also noted that AI-driven image recognition systems often struggle in real-world environments, where inconsistent lighting, noise and variable visual inputs reduce detection reliability, a challenge highlighted in modern agricultural quality-control research. Others claim to identify video and audio deepfakes, but this technology is also not fully reliable yet either. Despite debate around the efficacy of watermarking, Google DeepMind is actively developing a detection software called SynthID, which works by inserting a digital watermark that is invisible to the human eye into the pixels of an image.

    Read more →
  • Condensation algorithm

    Condensation algorithm

    The condensation algorithm (Conditional Density Propagation) is a computer vision algorithm. The principal application is to detect and track the contour of objects moving in a cluttered environment. Object tracking is one of the more basic and difficult aspects of computer vision and is generally a prerequisite to object recognition. Being able to identify which pixels in an image make up the contour of an object is a non-trivial problem. Condensation is a probabilistic algorithm that attempts to solve this problem. The algorithm itself is described in detail by Isard and Blake in a publication in the International Journal of Computer Vision in 1998. One of the most interesting facets of the algorithm is that it does not compute on every pixel of the image. Rather, pixels to process are chosen at random, and only a subset of the pixels end up being processed. Multiple hypotheses about what is moving are supported naturally by the probabilistic nature of the approach. The evaluation functions come largely from previous work in the area and include many standard statistical approaches. The original part of this work is the application of particle filter estimation techniques. The algorithm's creation was inspired by the inability of Kalman filtering to perform object tracking well in the presence of significant background clutter. The presence of clutter tends to produce probability distributions for the object state which are multi-modal and therefore poorly modeled by the Kalman filter. The condensation algorithm in its most general form requires no assumptions about the probability distributions of the object or measurements. == Algorithm overview == The condensation algorithm seeks to solve the problem of estimating the conformation of an object described by a vector x t {\displaystyle \mathbf {x_{t}} } at time t {\displaystyle t} , given observations z 1 , . . . , z t {\displaystyle \mathbf {z_{1},...,z_{t}} } of the detected features in the images up to and including the current time. The algorithm outputs an estimate to the state conditional probability density p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} by applying a nonlinear filter based on factored sampling and can be thought of as a development of a Monte-Carlo method. p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} is a representation of the probability of possible conformations for the objects based on previous conformations and measurements. The condensation algorithm is a generative model since it models the joint distribution of the object and the observer. The conditional density of the object at the current time p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} is estimated as a weighted, time-indexed sample set { s t ( n ) , n = 1 , . . . , N } {\displaystyle \{s_{t}^{(n)},n=1,...,N\}} with weights π t ( n ) {\displaystyle \pi _{t}^{(n)}} . N is a parameter determining the number of sample sets chosen. A realization of p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} is obtained by sampling with replacement from the set s t {\displaystyle s_{t}} with probability equal to the corresponding element of π t {\displaystyle \pi _{t}} . The assumptions that object dynamics form a temporal Markov chain and that observations are independent of each other and the dynamics facilitate the implementation of the condensation algorithm. The first assumption allows the dynamics of the object to be entirely determined by the conditional density p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} . The model of the system dynamics determined by p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} must also be selected for the algorithm, and generally includes both deterministic and stochastic dynamics. The algorithm can be summarized by initialization at time t = 0 {\displaystyle t=0} and three steps at each time t: === Initialization === Form the initial sample set and weights by sampling according to the prior distribution. For example, specify as Gaussian and set the weights equal to each other. === Iterative procedure === Sample with replacement N {\displaystyle N} times from the set { s 0 ( n ) , n = 1 , . . . , N } {\displaystyle \{s_{0}^{(n)},n=1,...,N\}} with probability { π 0 ( n ) , n = 1 , . . . , N } {\displaystyle \{\pi _{0}^{(n)},n=1,...,N\}} to generate a realization of p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} . Apply the learned dynamics p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} to each element of this new set, to generate a new set { s t ( n ) } {\displaystyle \{s_{t}^{(n)}\}} . To take into account the current observation z t {\displaystyle \mathbf {z_{t}} } , set π t ( n ) = p ( z t | s ( n ) ) ∑ j = 1 N p ( z t | s ( j ) ) {\displaystyle \pi _{t}^{(n)}={\frac {p(\mathbf {z_{t}} |s^{(n)})}{\sum _{j=1}^{N}p(\mathbf {z_{t}} |s^{(j)})}}} for each element { s t ( n ) } {\displaystyle \{s_{t}^{(n)}\}} . This algorithm outputs the probability distribution p ( x t | z 1 , . . . , z t ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {z_{1},...,z_{t}} )} which can be directly used to calculate the mean position of the tracked object, as well as the other moments of the tracked object. Cumulative weights can instead be used to achieve a more efficient sampling. == Implementation considerations == Since object-tracking can be a real-time objective, consideration of algorithm efficiency becomes important. The condensation algorithm is relatively simple when compared to the computational intensity of the Ricatti equation required for Kalman filtering. The parameter N {\displaystyle N} , which determines the number of samples in the sample set, will clearly hold a trade-off in efficiency versus performance. One way to increase efficiency of the algorithm is by selecting a low degree of freedom model for representing the shape of the object. The model used by Isard 1998 is a linear parameterization of B-splines in which the splines are limited to certain configurations. Suitable configurations were found by analytically determining combinations of contours from multiple views, of the object in different poses, and through principal component analysis (PCA) on the deforming object. Isard and Blake model the object dynamics p ( x t | x t − 1 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )} as a second order difference equation with deterministic and stochastic components: p ( x t | x t − 1 ) ∝ e − 1 2 | | B − 1 ( ( x t − x ¯ ) − A ( x t − 1 − x ¯ ) ) | | 2 ) {\displaystyle p(\mathbf {x_{t}} |\mathbf {x_{t-1}} )\propto e^{-{\frac {1}{2}}||B^{-1}((\mathbf {x_{t}} -\mathbf {\bar {x}} )-A(\mathbf {x_{t-1}} -\mathbf {\bar {x}} ))||^{2})}} where x ¯ {\displaystyle \mathbf {\bar {x}} } is the mean value of the state, and A {\displaystyle A} , B {\displaystyle B} are matrices representing the deterministic and stochastic components of the dynamical model respectively. A {\displaystyle A} , B {\displaystyle B} , and x ¯ {\displaystyle \mathbf {\bar {x}} } are estimated via Maximum Likelihood Estimation while the object performs typical movements. The observation model p ( z | x ) {\displaystyle p(\mathbf {z} |\mathbf {x} )} cannot be directly estimated from the data, requiring assumptions to be made in order to estimate it. Isard 1998 assumes that the clutter which may make the object not visible is a Poisson random process with spatial density λ {\displaystyle \lambda } and that any true target measurement is unbiased and normally distributed with standard deviation σ {\displaystyle \sigma } . The basic condensation algorithm is used to track a single object in time. It is possible to extend the condensation algorithm using a single probability distribution to describe the likely states of multiple objects to track multiple objects in a scene at the same time. Since clutter can cause the object probability distribution to split into multiple peaks, each peak represents a hypothesis about the object configuration. Smoothing is a statistical technique of conditioning the distribution based on both past and future measurements once the tracking is complete in order to reduce the effects of multiple peaks. Smoothing cannot be directly done in real-time since it requires information of future measurements. == Applications == The algorithm can be used for vision-based robot localization of mobile robots. Instead of tracking the position of an object in the scene, however, the position of the camera platform is tracked. This allows the camera platform to be globally localized given a visual map of the environment. Extensions of the condensation algorithm have also been used to recognize human gestures in image sequences. This application of the condensation algorithm impacts the ran

    Read more →
  • SmarterChild

    SmarterChild

    SmarterChild was a chatbot available on AOL Instant Messenger and Windows Live Messenger (previously MSN Messenger) networks. == History == SmarterChild was an apparently intelligent agent or "bot" developed by ActiveBuddy, Inc., with offices in New York and Sunnyvale. It was widely distributed across global instant messaging networks. SmarterChild became very popular, attracting over 30 million Instant Messenger "buddies" on AIM (AOL), MSN and Yahoo Messenger over the course of its lifetime. Founded in 2000, ActiveBuddy was the brainchild of Robert Hoffer and Timothy Kay, who later brought seasoned advertising executive Peter Levitan on board as CEO. The concept for conversational instant messaging bots came from the founder's vision to add natural language comprehension functionality to the increasingly popular AIM instant messaging application. The original implementation took shape as a demo that Kay programmed in Perl in his Los Altos garage to connect a single buddy name, "ActiveBuddy", to look up stock symbols, and later allow AIM users to play Colossal Cave Adventure, a word-based adventure game, and MIT's Boris Katz Start Question Answering System but quickly grew to include a wide range of database applications the company called 'knowledge domains' including instant access to news, weather, stock information, movie times, yellow pages listings, and detailed sports data, as well as a variety of tools (personal assistant, calculators, translator, etc.). None of the individual domains which the company had named “stocksBuddy”, “sportsBuddy”, etc. ever launched publicly. When Stephen Klein came on board as COO — and eventually CEO — he insisted that all of the disparate test “buddies” be launched together with the company’s highly-developed colloquial chat domain. He suggested using “SmarterChild”, a username coined by Tim Kay which Tim was using to test various things. The bundled domains were launched publicly as SmarterChild (on AIM initially) in June 2001. SmarterChild provided information wrapped in fun and quirky conversation. The company generated no revenue from SmarterChild, but used it as a demonstration of the power of what Klein called “conversational computing”. The company subsequently marketed Automated Service Agents—delivering immediate answers to customer service inquiries—-to large corporations, like Comcast, Cingular, TimeWarner Cable, etc. SmarterChild's popularity spawned targeted marketing-oriented bots for Radiohead, Austin Powers, Intel, Keebler, The Sporting News and others. ActiveBuddy co-founders, Kay and Hoffer, as co-inventors, were issued two controversial U.S. patents in 2002. ActiveBuddy changed its name to Colloquis (briefly Conversagent) and targeted development of consumer-facing enterprise customer service agents, which the company marketed as Automated Service Agents. Microsoft acquired Colloquis in October 2006 and proceeded to de-commission SmarterChild and kill off the Automated Service Agent business as well. Robert Hoffer, ActiveBuddy co-founder, licensed the technology from Microsoft after Microsoft abandoned the Colloquis technology.

    Read more →
  • Fred (chatbot)

    Fred (chatbot)

    Fred, or FRED, was an early chatbot written by Robby Garner. == History == The name Fred was initially suggested by Karen Lindsey, and then Robby jokingly came up with an acronym, "Functional Response Emulation Device." Fred has also been implemented as a Java application by Paco Nathan called JFRED Archived 2008-08-24 at the Wayback Machine. Fred Chatterbot is designed to explore Natural Language communications between people and computer programs. In particular, this is a study of conversation between people and ways that a computer program can learn from other people's conversations to make its own conversations. Fred used a minimalistic "stimulus-response" approach. It worked by storing a database of statements and their responses, and made its own reply by looking up the input statements made by a user and then rendering the corresponding response from the database. This approach simplified the complexity of the rule base, but required expert coding and editing for modifications. Fred was a predecessor to Albert One, which Garner used in 1998 and 1999 to win the Loebner Prize.

    Read more →
  • Graph cuts in computer vision and artificial intelligence

    Graph cuts in computer vision and artificial intelligence

    As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision), such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, numerous military applications (eg Automatic target recognition) and many other problems that can be formulated in terms of energy minimization (eg Climate Science and Environmental modelling). Graph cut techniques are now increasingly being used in combination with more general spatial Artificial intelligence techniques (eg to enforce structure in Large language model output to sharpen tumour boundaries and similarly for various Augmented reality, Self-driving car, Robotics, Google Maps applications etc). Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph (e.g. normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as graph partitioning algorithms). "Binary" problems (such as denoising a binary image) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a grayscale image) cannot be solved exactly, but solutions produced are usually near the global optimum. == History == The foundational theory of graph cuts in computer vision was first developed by Margaret Greig, Bruce Porteous and Allan Seheult (GPS) of Durham University in a now legendary discussion contribution to Julian Besag's 1986 paper and a more detailed follow on paper in 1989. In the Bayesian statistical context of smoothing noisy images, using a Markov random field as the image prior distribution, they showed with a mathematically beautiful proof how the maximum a posteriori estimate of a binary image can be obtained exactly by maximizing the flow through an associated image network, or graph, involving the introduction of a source and sink and Log-likelihood ratios. The problem was shown to be efficiently solvable exactly, an unexpected result as the problem was believed to be computationally intractable (NP hard). GPS also addressed the computational cost of the max-flow algorithm on large graphs, a significant concern at the time. They proposed a partitioning algorithm (see Section 4 of GPS) involving the recursive amalgamation of non-overlapping blocks, or tiles, which gave a 12X increase in speed. This approach recursively solved and amalgamated independent sub-graphs until the whole graph was solved. While contemporaries like Geman and Geman had advocated Parallel computing in the context of Simulated annealing, the GPS blocking strategy offered a deterministic structure amenable to parallelisation and anticipated modern artificial intelligence design across multiple GPUs. However, until recently, this aspect of the paper was largely ignored and subsequent research focused on Serial computer global search trees, such as the Boykov-Kolmogorov algorithm. Although the general k {\displaystyle k} -colour problem is NP hard for k > 2 , {\displaystyle k>2,} the GPS approach has turned out to have very wide applicability in general computer vision problems. This was first demonstrated by Boykov, Veksler and Zabih who, in a seminal paper published more than 10 years after the original GPS paper, and in other important works, lit the blue touch paper for the general adoption of graph cut techniques in computer vision. They showed that, for general problems, the GPS approach can be applied iteratively to sequences of binary problems, using their now ubiquitous alpha-expansion algorithm, yielding near optimal solutions. Prior to these results, approximate local optimisation techniques such as simulated annealing (as proposed by the Geman brothers) or iterated conditional modes (a type of greedy algorithm suggested by Julian Besag) were used to solve such image smoothing problems. Building on these advancements, GPS graph cut optimization was subsequently adapted for interactive image segmentation, most notably through the "GrabCut" algorithm introduced by Carsten Rother, Vladimir Kolmogorov, and Andrew Blake of Microsoft Research, Cambridge. GrabCut extended earlier interactive graph cut methods by replacing monochrome image histograms with Gaussian mixture models to estimate colour distributions, and by employing an iterative GPS energy minimisation scheme. This approach significantly simplified user interaction, requiring only a rough bounding box around the target object rather than detailed user-drawn strokes, and it quickly became a standard tool in both academic research and commercial image editing software. The GPS paper connected and bridged profound ideas from Mathematical statistics (Bayes' theorem, Markov random field), Physics (Ising model), Optimisation (Energy function) and Computer science (Network flow problem) and led the move away from approximate local and slow optimisation approaches (eg simulated annealing) to more powerful exact, or near exact, faster global optimisation techniques. It is now recognised as seminal as it was well ahead of its time and, in particular, was published years before the computing power revolution of Moore's law and GPUs. Significantly, GPS was published in a mathematical statistics (rather than a computer vision) journal, and this led to it being overlooked by the computer vision community for many years. It is unofficially known as "The Velvet Underground" paper of computer vision (ie although very few computer vision people read the paper [bought the record], those that did, most importantly Boykov, Veksler and Zabih, started new and important research [formed a band]). This is confirmed by GPS' very large amplification ratio (2nd order citations/first order citations), estimated at well in excess of 100. Despite the foundational nature of the GPS work, formal recognition from the computer vision community has predominantly gone to the researchers who followed to extend and popularise the graph cut method. For example, Boykov, Veksler and Zabih deservedly received a Helmholtz Prize from the ICCV in 2011. This prize recognises ICCV papers from 10 or more years earlier that have had a significant impact on computer vision research. In 2011, Couprie et al. proposed a general image segmentation framework, called the "Power Watershed", that minimized a real-valued indicator function from [0,1] over a graph, constrained by user seeds (or unary terms) set to 0 or 1, in which the minimization of the indicator function over the graph is optimized with respect to an exponent p {\displaystyle p} . When p = 1 {\displaystyle p=1} , the Power Watershed is optimized by graph cuts, when p = 0 {\displaystyle p=0} the Power Watershed is optimized by shortest paths, p = 2 {\displaystyle p=2} is optimized by the random walker algorithm and p = ∞ {\displaystyle p=\infty } is optimized by the watershed algorithm. In this way, the Power Watershed may be viewed as a generalization of graph cuts that provides a straightforward connection with other energy optimization segmentation/clustering algorithms. == Binary segmentation of images == === Notation === Image: x ∈ { R , G , B } N {\displaystyle x\in \{R,G,B\}^{N}} Output: Segmentation (also called opacity) S ∈ R N {\displaystyle S\in R^{N}} (soft segmentation). For hard segmentation S ∈ { 0 for background , 1 for foreground/object to be detected } N {\displaystyle S\in \{0{\text{ for background}},1{\text{ for foreground/object to be detected}}\}^{N}} Energy function: E ( x , S , C , λ ) {\displaystyle E(x,S,C,\lambda )} where C is the color parameter and λ is the coherence parameter. E ( x , S , C , λ ) = E c o l o r + E c o h e r e n c e {\displaystyle E(x,S,C,\lambda )=E_{\rm {color}}+E_{\rm {coherence}}} Optimization: The segmentation can be estimated as a global minimum over S: arg ⁡ min S E ( x , S , C , λ ) {\displaystyle {\arg \min }_{S}E(x,S,C,\lambda )} === Existing methods === Standard Graph cuts: optimize energy function over the segmentation (unknown S value). Iterated Graph cuts: First step optimizes over the color parameters using K-means. Second step performs the usual graph cuts algorithm. These 2 steps are repeated recursively until convergence Dynamic graph cuts:Allows to re-run the algorithm much faster after modifying the problem (e.g. after new seeds have been added by a user). === Energy function === Pr ( x ∣ S ) = K − E {\displaystyle \Pr(x\mid S)=K^{-E}} where the energy E {\displaystyle E} is composed of two different mod

    Read more →
  • Syman

    Syman

    SYMAN is an artificial intelligence technology that uses data from social media profiles to identify trends in the job market. SYMAN is designed to organize actionable data for products and services including recruiting, human capital management, CRM, and marketing. SYMAN was developed with a $21 million series B financing round secured by Identified, which was led by VantagePoint Capital Partners and Capricorn Investment Group.

    Read more →
  • Physics-informed neural networks

    Physics-informed neural networks

    In machine learning, physics-informed neural networks (PINNs), also referred to as theory-trained neural networks (TTNs), are a type of universal function approximator that can embed the knowledge of any physical laws that govern a given data-set in the learning process, and can be described by partial differential equations (PDEs). Low data availability for some biological and engineering problems limit the robustness of conventional machine learning models used for these applications. The prior knowledge of general physical laws acts in the training of neural networks (NNs) as a regularization agent that limits the space of admissible solutions, increasing the generalizability of the function approximation. This way, embedding this prior information into a neural network results in enhancing the information content of the available data, facilitating the learning algorithm to capture the right solution and to generalize well even with a low amount of training examples. Because they process continuous spatial and time coordinates and output continuous PDE solutions, they can be categorized as neural fields. == Function approximation == Most of the physical laws that govern the dynamics of a system can be described by partial differential equations. For example, the Navier–Stokes equations are a set of partial differential equations derived from the conservation laws (i.e., conservation of mass, momentum, and energy) that govern fluid mechanics. The solution of the Navier–Stokes equations with appropriate initial and boundary conditions allows the quantification of flow dynamics in a precisely defined geometry. However, these equations cannot be solved exactly and therefore numerical methods must be used (such as finite differences, finite elements and finite volumes). In this setting, these governing equations must be solved while accounting for prior assumptions, linearization, and adequate time and space discretization. Recently, solving the governing partial differential equations of physical phenomena using deep learning has emerged as a new field of scientific machine learning (SciML), leveraging the universal approximation theorem and high expressivity of neural networks. In general, deep neural networks could approximate any high-dimensional function given that sufficient training data are supplied. However, such networks do not consider the physical characteristics underlying the problem, and the level of approximation accuracy provided by them is still heavily dependent on careful specifications of the problem geometry as well as the initial and boundary conditions. Without this preliminary information, the solution is not unique and may lose physical correctness. To remedy this, Physics-Informed Neural Networks (PINNs) leverage governing physical equations in neural network training. Namely, PINNs are designed to be trained to satisfy the given training data as well as the imposed governing equations. In this fashion, a neural network can be guided with training datasets that do not necessarily need to be large or complete. An accurate solution of partial differential equations can potentially be found without knowing the boundary conditions. Therefore, with some knowledge about the physical characteristics of the problem and some form of training data (even sparse and incomplete), PINNs may be used for finding an optimal solution with high fidelity. PINNs can be applied to a wide range of problems in computational science, and are a pioneering technology leading to the development of new classes of numerical solvers for PDEs. PINNs can be thought of as a mesh-free alternative to traditional approaches (e.g., CFD for fluid dynamics), and new data-driven approaches for model inversion and system identification. Notably, a trained PINN network can be used to predict values on simulation grids of different resolutions without needing to be retrained. Additionally, the derivatives used in the partial differential equations can be computed using automatic differentiation (AD), which is assessed to be superior to numerical or symbolic differentiation. == Modeling and computation == A general nonlinear partial differential equation can be written as: u t + N [ u ; λ ] = 0 , x ∈ Ω , t ∈ [ 0 , T ] {\displaystyle u_{t}+{\mathcal {N}}[u;\lambda ]=0,\quad x\in \Omega ,\quad t\in [0,T]} where u ( t , x ) {\displaystyle u(t,x)} denotes the solution, N [ ⋅ ; λ ] {\displaystyle {\mathcal {N}}[\cdot ;\lambda ]} is a nonlinear operator parameterized by λ {\displaystyle \lambda } , and Ω {\displaystyle \Omega } is a subset of R D {\displaystyle \mathbb {R} ^{D}} . This general form of governing equations summarizes a wide range of problems in mathematical physics, such as conservative laws, diffusion process, advection-diffusion systems, and kinetic equations. Given noisy measurements of a generic dynamic system described by the equation above, PINNs can be designed to solve two classes of problems: data-driven solutions of partial differential equations data-driven discovery of partial differential equations === Data-driven solution of partial differential equations === The data-driven solution of PDE computes the hidden state u ( t , x ) {\displaystyle u(t,x)} of the system given boundary data and/or measurements z {\displaystyle z} , and fixed model parameters λ {\displaystyle \lambda } . We solve: u t + N [ u ] = 0 , x ∈ Ω , t ∈ [ 0 , T ] {\displaystyle u_{t}+{\mathcal {N}}[u]=0,\quad x\in \Omega ,\quad t\in [0,T]} . by defining the residual f ( t , x ) {\displaystyle f(t,x)} as: f := u t + N [ u ] {\displaystyle f:=u_{t}+{\mathcal {N}}[u]} , and approximating u ( t , x ) {\displaystyle u(t,x)} by a deep neural network. This network can be differentiated using automatic differentiation. The parameters of u ( t , x ) {\displaystyle u(t,x)} and f ( t , x ) {\displaystyle f(t,x)} can be then learned by minimizing the following loss function L tot {\displaystyle L_{\text{tot}}} : L tot = L u + L f {\displaystyle L_{\text{tot}}=L_{u}+L_{f}} where: L u = ‖ u − z ‖ Γ {\displaystyle L_{u}=\Vert u-z\Vert _{\Gamma }} is the error between the PINN u ( t , x ) {\displaystyle u(t,x)} and the set of boundary conditions and measured data on the set of points Γ {\displaystyle \Gamma } where the boundary conditions and data are defined. L f = ‖ f ‖ Γ {\displaystyle L_{f}=\Vert f\Vert _{\Gamma }} is the mean-squared error of the residual function. This second term encourages the PINN to learn the structural information expressed by the PDE during the training process. This approach has been used to yield computationally efficient physics-informed surrogate models with applications in the forecasting of physical processes, model predictive control, multi-physics and multi-scale modeling, and simulation. It has been shown to converge to the solution of the PDE. === Data-driven discovery of partial differential equations === Given noisy and incomplete measurements z {\displaystyle z} of the state of the system, the data-driven discovery of PDEs results in computing the unknown state u ( t , x ) {\displaystyle u(t,x)} and learning model parameters λ {\displaystyle \lambda } that best describe the observed data: u t + N [ u ; λ ] = 0 , x ∈ Ω , t ∈ [ 0 , T ] {\displaystyle u_{t}+{\mathcal {N}}[u;\lambda ]=0,\quad x\in \Omega ,\quad t\in [0,T]} By defining f ( t , x ) {\displaystyle f(t,x)} as: f := u t + N [ u ; λ ] = 0 {\displaystyle f:=u_{t}+{\mathcal {N}}[u;\lambda ]=0} , and approximating u ( t , x ) {\displaystyle u(t,x)} by a deep neural network, f ( t , x ) {\displaystyle f(t,x)} results in a PINN. This network can be derived using automatic differentiation. The parameters of u ( t , x ) {\displaystyle u(t,x)} and f ( t , x ) {\displaystyle f(t,x)} , together with the parameter λ {\displaystyle \lambda } of the differential operator can be then learned by minimizing the following loss function L tot {\displaystyle L_{\text{tot}}} : L tot = L u + L f {\displaystyle L_{\text{tot}}=L_{u}+L_{f}} where: L u = ‖ u − z ‖ Γ {\displaystyle L_{u}=\Vert u-z\Vert _{\Gamma }} , with u {\displaystyle u} and z {\displaystyle z} state solutions and measurements at sparse location Γ {\displaystyle \Gamma } , respectively. L f = ‖ f ‖ Γ {\displaystyle L_{f}=\Vert f\Vert _{\Gamma }} is the residual function. This second term requires the structured information represented by the partial differential equations to be satisfied in the training process. This strategy allows for discovering dynamic models described by nonlinear PDEs assembling computationally efficient and fully differentiable surrogate models that may find application in predictive forecasting, control, and data assimilation. == Extensions and applications == === For piece-wise function approximation === PINNs are unable to approximate PDEs that have strong non-linearity or sharp gradients (such as those that commonly occur in practical fluid flow problems). Piecewise approximation has been an old practic

    Read more →
  • Ernie Bot

    Ernie Bot

    Ernie Bot (Chinese: 文心一言, Pinyin: wénxīn yīyán), full name Enhanced Representation through Knowledge Integration, is an artificial intelligence chatbot developed by the Chinese technology company Baidu. Ernie Bot rivals GPT models in Chinese NLP tasks. It is built on the company's ERNIE series of large language models, which have been in development since 2019. The service was first launched for invited testing on March 16, 2023, and was released to the general public on August 31, 2023, after receiving approval from Chinese regulators. Since its public launch, Ernie Bot has undergone several updates, with newer versions like ERNIE 4.0 and 4.5 released to improve its capabilities. The service has seen rapid user adoption, reportedly reaching over 200 million users by April 2024. It has been integrated into various products, notably powering AI features for the Chinese release of Samsung's Galaxy S24 smartphones. As a product operating in China, Ernie Bot is subject to the country's censorship regulations. It has been observed to refuse answers to politically sensitive questions, such as those regarding CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, and other topics deemed taboo by the government. == History == Ernie Bot was initially released for invited testing on March 16, 2023. The live release demo was reported to have been prerecorded, which caused Baidu's stock to drop 10 percent on the day of the launch. The company's stock gained 14 percent the following day after analysts from Citigroup and Bank of America tested Ernie Bot and gave it positive preliminary reviews. On August 31, 2023, Ernie Bot was released to the public after receiving approval from Chinese regulatory authorities. By December 2023, Baidu announced the service had surpassed 100 million users. In January 2024, Hong Kong newspaper South China Morning Post reported that a university research lab linked to the People's Liberation Army (PLA) had tested Ernie Bot for military response scenarios. Baidu denied the allegations, stating it had no connection with the academic paper. That same month, Ernie was integrated into Samsung's Galaxy S24 lineup for its launch in China. The user base reportedly grew to 200 million by April 2024 and 300 million by June 2024. In September 2024, Baidu changed the chatbot's Chinese name from "Wenxin Yiyan" (文心一言) to "Wenxiaoyan" (文小言) to position it as a search assistant. On March 16, 2025, Baidu announced version 4.5 and the reasoning model ERNIE X1. The following month, at the Create2025 Baidu AI Developer Conference, the company released the Wenxin 4.5 Turbo and Wenxin X1 Turbo models, designed to be faster and less expensive to operate. == Development == Ernie Bot is based on Baidu's ERNIE (Enhanced Representation through Knowledge Integration) series of foundation models. The general training process begins with pre-training on large datasets, followed by refinement using techniques like supervised fine-tuning, reinforcement learning with human feedback, and prompt engineering. === Foundation models === ==== Ernie 3.0 ==== The model powering the initial launch of Ernie Bot. It was trained with 10 billion parameters on a 4-terabyte corpus consisting of plain text and a large-scale knowledge graph. ==== Ernie 3.5 ==== Released in June 2023. At the time of release, its performance was reported as "slightly inferior" to OpenAI's GPT-4. ==== Ernie 4.0 ==== Unveiled in October 2023 and released to paying subscribers in November. According to Baidu, this version featured improved performance over its predecessor, with information updated to April 2023. ==== Ernie X1 ==== Announced in March 2025, with Ernie X1 positioned as a specialized reasoning model. Baidu stated that performance improvements were achieved through new technologies such as "FlashMask" dynamic attention masking and a heterogeneous multimodal mixture-of-experts architecture. === Turbo Models === In June 2024, Baidu announced Ernie 4.0 Turbo. In April 2025, Ernie 4.5 Turbo and X1 Turbo were released. These models are optimized for faster response times and lower operational costs. == Service == In its subscription options, the professional plan gives users access to Ernie 4.0 with a payment either for a month or with reduced payment for auto-renewal per month. Meanwhile, Ernie 3.5 is free of charge. Ernie 4.0, the language model for Ernie bot, has information updated to April 2023. == Censorship == Ernie Bot is subject to the Chinese government's censorship regime. In public tests with journalists, Ernie Bot refused to answer questions about CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, the persecution of Uyghurs in China in Xinjiang, and the 2019–2020 Hong Kong protests. When queried about the origin of SARS-CoV-2, Ernie Bot stated that it originated among American vape users.

    Read more →
  • Perplexity AI

    Perplexity AI

    Perplexity AI, Inc., or simply Perplexity, is an American privately held software company offering a web search engine that processes user queries and synthesizes responses. Perplexity products use large language models and incorporate real-time web search capabilities, providing responses based on current Internet content, citing sources used. Its real-time search engine is called Sonar and is based on Meta's Llama model. A free public version is available, while a paid Pro subscription offers access to more advanced language models and additional features. Perplexity AI, Inc., was founded in August 2022 by Aravind Srinivas, Denis Yarats, Johnny Ho, and Andy Konwinski. As of September 2025, the company was valued at US$20 billion. Perplexity AI has attracted legal scrutiny over allegations of copyright infringement, unauthorized content use, and trademark issues from several major media organizations, including the BBC, Dow Jones, and The New York Times. According to separate analyses by Wired and later Cloudflare, Perplexity uses undisclosed web crawlers with spoofed user-agent strings to scrape the content of websites which prohibit, or explicitly block, web scraping. == History == In August 2022, Perplexity AI, Inc., was founded by Aravind Srinivas, Denis Yarats, Johnny Ho, and Andy Konwinski, engineers with backgrounds in back-end systems, artificial intelligence (AI) and machine learning. It launched its main search engine on December 7, 2022, and has since released a Google Chrome extension and apps for iOS and Android. In February 2023, Perplexity reported two million unique visitors. By April 2024, Perplexity had raised $165 million in funding, valuing the company at over $1 billion. As of June 2025, Perplexity closed a $500 million round of funding that elevated its valuation to $14 billion. Investors in Perplexity AI have included Jeff Bezos, Tobias Lütke, Nat Friedman, Nvidia, and Databricks. Perplexity has also received funding from 1789 Capital, a venture capital firm notable for its association with Donald Trump Jr. During Bloomberg’s Tech Summit 2025, Srinivas shared that the company processed 780 million queries in May 2025, experiencing more than 20% month-over-month growth, processing around 30 million queries daily. In July 2024, Perplexity announced the launch of a new publishers' program to share advertising revenue with partners. On January 18, 2025, the day before the impending U.S. ban on the social media app TikTok, Perplexity submitted a proposal for a merger with TikTok US. On August 12, 2025, Perplexity made a bid to buy Chrome from Google for $34.5 billion. Perplexity stated that the sale could remedy anti-trust litigation against Google, in which a judge was considering compelling the sale of Chrome. In December 2025, Cristiano Ronaldo took an undisclosed stake in Perplexity AI and entered a global brand partnership with the company. === Business Strategy and Finance (2026) === As of early 2026, Perplexity AI reached a valuation of $21.21 billion following its Series E-6 funding round. The company's Annual Recurring Revenue (ARR) grew from $80 million in late 2024 to an estimated $200 million by February 2026. In January 2026, the company entered into a three-year, $750 million commitment with Microsoft Azure to secure the GPU capacity required for its advanced "Deep Research" and "Model Council" features. In February 2026, Perplexity transitioned to a subscription-first model by discontinuing its AI-integrated advertising strategy. Leadership stated the move was intended to preserve user trust in the "answer engine," prioritizing objective results over ad revenue. The company also introduced the "Model Council" feature on February 5, 2026, which allows users to compare outputs from multiple large language models, such as GPT-5.2 and Claude 4.6, simultaneously. To expand its user base, Perplexity began offering a free year of Pro access to students, U.S. Military Veterans, and government employees. == Products and services == === Search engine web portal === Perplexity’s primary offering is an online information retrieval system (search engine) that uses large language models to generate responses to user queries by searching and summarizing web-based content. Perplexity offers a feature known as Perplexity Pages that generates structured summaries and report-like content from user queries by aggregating cited sources. Perplexity is available without charge or registration to Web users, a freemium model. === Perplexity Pro === Perplexity Pro is a subscription tier, a more capable paid "enterprise" service, including stronger security and data protection and additional tools, including the ability to search uploaded documents alongside web content and access to a programmatic application programming interface (API). It allows the user to select between backend models such as GPT-5.4, Claude 4.6 and Gemini 3.1 Pro. The company has also developed its own models, Sonar (based on Llama 3.3) and R1 1776 (based on DeepSeek R1). === Internal Knowledge Search === Internal Knowledge Search enables Pro and Enterprise Pro users to simultaneously search across web content and internal documents. Users can upload and search through Excel, Word, PDF, and other common file formats. Enterprise Pro users can upload and index up to 500 files. === Search API === Perplexity's Search API provides AI developers with programmatic access to the company's search infrastructure. The September 2025 release includes a software development kit, an open-source evaluation framework called search_evals, and documentation detailing the API's design and optimization. === Shopping hub === Perplexity's Shopping Hub is an online shopping platform that provides AI-generated product recommendations, and enables users to purchase products directly through Perplexity's interface. It was launched in November 2024 with backing by Amazon and Nvidia. === Finance === In October 2024, Perplexity AI introduced new finance-related features, including looking up stock prices and company earnings data. The tool provides real-time stock quotes and price tracking, industry peer comparisons and basic financial analysis tools. The platform sources its financial data from Financial Modeling Prep. === Assistant === In January 2025, Perplexity launched the Perplexity Assistant, an AI-powered tool designed to enhance the functionality of its search engine. It can perform tasks across multiple apps, such as hailing a ride or searching for a song, and can maintain context across actions. The assistant is also multi-modal, meaning it can use a phone's camera to provide answers about the user's surroundings or on-screen content. Perplexity has acknowledged that the assistant is still in development and may not always function as expected. For instance, certain features, such as summarizing unread emails or upcoming calendar events, require users to enable a workaround based on notifications. === Comet === In July 2025, Perplexity launched Comet, an AI browser based on Chromium. Initially, access to the browser was limited to users subscribed to the most expensive subscription tier. The browser was later released for free download in October 2025. A key feature is integration of the Perplexity search engine, which can perform a variety of tasks such as generating article summaries, describing an image, conducting research about a topic and composing emails. === Truth Social chatbot === Perplexity has been contracted to produce a chatbot for Donald Trump's social media platform Truth Social. == Leadership == Aravind Srinivas is the CEO and co-founder of Perplexity AI. He previously held research positions at OpenAI, Google DeepMind, and other AI research institutions focusing on machine learning and artificial intelligence. In a March 2026 All-In episode, Srinivas said the incoming AI-related layoffs were "glorious future" to "look forward", as it freed people from jobs they didn't like and gave them opportunities to pursue entrepreneurship. == Controversies == === Copyright and trademark infringement allegations === In June 2024, Forbes publicly criticized Perplexity for using their content. According to Forbes, Perplexity published a story largely copied from a proprietary Forbes article without mentioning or prominently citing Forbes. In response, Srinivas said that the feature had some "rough edges" and accepted feedback but maintained that Perplexity only "aggregates" rather than plagiarizes information. In October 2024, The New York Times sent a cease-and-desist notice to Perplexity to stop accessing and using NYT content, claiming that Perplexity is violating its copyright by scraping data from its website. In June 2024, Dow Jones and New York Post filed a lawsuit against Perplexity, alleging copyright infringement. The lawsuit also alleged that Perplexity harmed their brand by attributing hallucinated quotes, for example on F-16 jets for Ukraine, to artic

    Read more →
  • List of C software and tools

    List of C software and tools

    This is a list of software and programming tools for the C programming language, including libraries, debuggers, compilers, integrated development environments (IDEs), and other related development tools and utilities. == Libraries and tools == Adns — asynchronous DNS resolver library Advanced Linux Sound Architecture — API for sound card device drivers Allegro — cross-platform software library for video game development Apache Portable Runtime — Apache web server tool set of APIs that map to the underlying operating system Argon2 — memory-hard password hashing library Berkeley DB — embedded database software library for key/value data Binary File Descriptor library — binary file manipulation library in the GNU toolchain Boehm garbage collector – conservative garbage collector Borland Graphics Interface — graphics library for Borland compilers BSAFE — FIPS 140-2 validated cryptography library Chipmunk — 2D real-time rigid body physics engine C POSIX library — specification of a C standard library for POSIX systems C standard library – standard library for the C programming language Cairo – vector graphics library API for software developers CFD General Notation System (CGNS) — data format and library for computational fluid dynamics cJSON — lightweight JSON parser CLIPS — public-domain software tool for building expert systems Core Audio — low-level API for dealing with sound in Apple's macOS and iOS operating systems Core Foundation — API for macOS and iOS and other Apple operating systems Core Image — GPU accelerated image processing technology for Apple operating systems with Quartz graphics rendering layer. Core Text — text layout and font rendering API for macOS and iOS. Cryptlib — portable cryptography library cURL / libcurl — CLI app for uploading and downloading individual files, such as a URL from a web server over HTTP. DevIL — cross-platform image library for loading and converting file formats DirectFB — graphics acceleration and input device handling library Dld — dynamic loading library Expat — stream-oriented XML 1.0 parser library, written in C99. FFmpeg — multimedia framework for audio/video processing Fontconfig — font customization and configuration library FreeTDS — database library for Sybase and Microsoft SQL Server FreeType — render text onto bitmaps with a font rasterization engine GD Graphics Library — image creation and manipulation library GDK — graphics abstraction layer for GTK GEGL — graph-based image processing framework GIO — I/O and virtual file system library in GLib GLib — utility library providing data structures, event loops, and portability functions. glibc — GNU implementation of the C standard library GLFW — library for OpenGL contexts, windows, and input device handling GNet — networking library for GLib GNU Libtool — Library management tool GNU portability library — collection of portability routines for GNU software GNU Portable Threads — POSIX/ANSI-C based user space thread library for UNIX for scheduling multithreading GNU Readline — command-line editing library GnuTLS — secure communications (TLS/SSL) library GObject — object system library for GNOME GTK — widget toolkit for creating graphical user interfaces GTK Scene Graph Kit (GSK) — scene graph and rendering toolkit for GTK HDF — file format and library for managing large datasets Integrated Performance Primitives — Intel library of optimized multimedia and data processing routines IUP — portable GUI toolkit J2K-Codec — JPEG 2000 image codec JasPer — reference implementation of the codec specified in the JPEG-2000 Part-1 standard LDAP API — API for interacting with Lightweight Directory Access Protocol LZO — lossless compression library Liba52 — decoder for A/52 (AC-3) audio streams libarchive — reading and writing various archive and compression formats Libart — 2D graphics library Libavcodec — codec library from FFmpeg Libavdevice — library for handling multimedia devices Libavfilter — audio and video filter library Libavformat — library for muxing and demuxing multimedia Libpcap — packet capture library Libdca — decoder for DTS audio Libdvdcss — access to encrypted DVD-Video discs libevent — asynchronous event notification callbacks libffi — foreign function interface libfuse — userspace filesystem Libgegl — programming interface to GEGL image processing libgcrypt — cryptography Libgimp — plug-in development library for GIMP Libhybris — compatibility layer for running Android libraries on Linux Libinput — input device library for Wayland and X.Org libjpeg — JPEG image library libLAS — reading and writing geospatial data encoded in the ASPRS laser (LAS) file format libmicrohttpd — small C library for embedding HTTP server functionality Libmpcodecs — media player codec library from MPlayer Libmpdemux — demultiplexing library from MPlayer libpng — PNG image format Libpostproc — video post-processing library from FFmpeg libpq — PostgreSQL client LibreSSL — fork of OpenSSL for TLS Librsb — parallel library for sparse matrix computations Librsvg — SVG rendering library libsndfile — reading and writing audio files libsodium — easy-to-use cryptography library Libswscale — image scaling and colorspace conversion library LibTIFF — TIFF image handling library Libusb — USB device access library Libuv — asynchronous I/O and event loop library LibVLC — media player engine from VLC LibVNCServer — implementation of the VNC server protocol Libvpx — VP8 and VP9 video codec library Libwww — early World Wide Web protocol library from W3C libxml2 — XML parsing Libxslt — XSLT library for the GNOME Project libzip — ZIP archives Lightning Memory-Mapped Database — fast key–value database engine LittleCMS — open-source color management system LZ4 — fast lossless compression algorithm LZFSE — compression library developed by Apple MatrixSSL — lightweight TLS implementation Mbed TLS — portable cryptography and TLS library MediaLib — Sun Microsystems library for multimedia processing Mesa — OpenGL and Vulkan graphics library Microwindows — small windowing system for embedded devices Ming — library for generating SWF (Flash) files Mongoose — embedded web server and networking library Mpg123 — MP3 audio decoding library MPIR — multiple-precision arithmetic library MsQuic — Microsoft implementation of the QUIC transport protocol MuJoCo — physics engine for robotics and control Mustache — logic-less templating library Ncurses — terminal control library Nettle — low-level cryptography library Newt — text-based user interface library Netpbm — graphics conversion and processing library Nghttp2 — implementation of the HTTP/2 protocol Oniguruma — regular expression library Open Asset Import Library — library to import/export 3D model formats OpenCL — parallel computing API/library OpenCV — computer vision OpenGL — API for rendering 2D and 3D vector graphics OpenGL Utility Library — OpenGL utility functions OpenJPEG — JPEG 2000 image codec OpenSSL — SSL and TLS protocols and cryptography library Pango — layout engine library which works with the HarfBuzz shaping engine for displaying multi-language text perf (Linux) — performance analyzing tool PCRE — regular expression library PROJ — library for map projections and coordinate transforms Quartz 2D — 2D graphics rendering API for macOS and iOS platforms, part of the Core Graphics framework. Raylib — simple library for games and multimedia Redland RDF Application Framework — RDF data storage library S2n-tls — TLS implementation from AWS Setcontext — context switching library functions SDL — Simple DirectMedia Layer systemd — system and service manager libraries for Linux Tk — GUI widgets for building graphical user interfaces VDPAU — video decoding acceleration API Vorbis — audio compression codec library VTD-XML — high-performance XML parser Wimlib — library for handling Windows Imaging Format disk images Windows.h — base Windows API header file WolfSSH — lightweight SSH library WolfSSL — lightweight SSL/TLS library X Toolkit Intrinsics — toolkit library for the X Window System x264 — H.264 video codec library XCB — C binding for the X Window System protocol Xft — font rendering library using FreeType Xlib — low-level X Window System API XMDF — eXtensible Model Data Format for scientific data XMLStarlet — XML command-line toolkit zlib — data compression Zopfli — data compression library that performs deflate, gzip and zlib data encoding. Zstd — fast data compression library == Integrated development environments == Anjuta — GNOME IDE CLion — cross-platform commercial IDE from JetBrains Code::Blocks — cross-platform open-source IDE CodeLite — open-source IDE Dev-C++ Eclipse CDT Geany — text editor with IDE features KDevelop — KDE IDE NetBeans Qt Creator SlickEdit Visual Studio Xcode === Online IDEs === CodeSandbox — online IDE primarily for web development with some C support via containers GitHub Codespaces — cloud-based online IDE developed by GitHub Google Cloud Shell — browser-based shell and editor that can comp

    Read more →
  • Residual neural network

    Residual neural network

    A residual neural network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to the layer inputs. It was developed in 2015 for image recognition, and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of that year. As a point of terminology, "residual connection" refers to the specific architectural motif of x ↦ f ( x ) + x {\displaystyle x\mapsto f(x)+x} , where f {\displaystyle f} is an arbitrary neural network module. The motif had been used previously (see §History for details). However, the publication of ResNet made it widely popular for feedforward networks, appearing in neural networks that are seemingly unrelated to ResNet. The residual connection stabilizes the training and convergence of deep neural networks with hundreds of layers, and is a common motif in deep neural networks, such as transformer models (e.g., BERT, and GPT models such as ChatGPT), the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system. == Mathematics == === Residual connection === In a multilayer neural network model, consider a (non-residual) subnetwork with a certain number of stacked layers (e.g., 2 or 3). Let H ( x ; α ) {\displaystyle H(x;\alpha )} denote the subnetwork. Suppose H ∗ {\displaystyle H^{}} is the desired optimal output of this subnetwork. Residual learning simply adds x {\displaystyle x} directly to the output, such that the optimal learned output now becomes be H ∗ − x {\displaystyle H^{}-x} , which is interpreted as a "residual" with respect to x {\displaystyle x} . The operation of "adding x {\displaystyle x} " is implemented via a "skip connection" that performs an identity mapping to connect the input of the subnetwork with its output. This connection is referred to as a "residual connection" in later work. Let F ( x ; α ) = H ( x ; a ) + x {\displaystyle F(x;\alpha )=H(x;a)+x} . The function F {\displaystyle F} is often represented by matrix multiplication interlaced with activation functions and normalization operations (e.g., batch normalization or layer normalization). As a whole, one of these subnetworks is referred to as a "residual block". A deep residual network is constructed by simply stacking these blocks. Long short-term memory (LSTM) has a memory mechanism that serves as a residual connection. In an LSTM without a forget gate, an input x t {\displaystyle x_{t}} is processed by a function F {\displaystyle F} and added to a memory cell c t {\displaystyle c_{t}} , resulting in c t + 1 = c t + F ( x t ) {\displaystyle c_{t+1}=c_{t}+F(x_{t})} . An LSTM with a forget gate essentially functions as a highway network. To stabilize the variance of the layers' inputs, it is recommended to replace the residual connections x + f ( x ) {\displaystyle x+f(x)} with x / L + f ( x ) {\displaystyle x/L+f(x)} , where L {\displaystyle L} is the total number of residual layers. === Projection connection === If the function F {\displaystyle F} is of type F : R n → R m {\displaystyle F:\mathbb {R} ^{n}\to \mathbb {R} ^{m}} where n ≠ m {\displaystyle n\neq m} , then F ( x ) + x {\displaystyle F(x)+x} is undefined. To handle this special case, a projection connection is used: y = F ( x ) + P ( x ) {\displaystyle y=F(x)+P(x)} where P {\displaystyle P} is typically a linear projection, defined by P ( x ) = M x {\displaystyle P(x)=Mx} where M {\displaystyle M} is a m × n {\displaystyle m\times n} matrix. The matrix is trained via backpropagation, as is any other parameter of the model. === Signal propagation === The introduction of identity mappings facilitates signal propagation in both forward and backward paths. ==== Forward propagation ==== If the output of the ℓ {\displaystyle \ell } -th residual block is the input to the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th residual block (assuming no activation function between blocks), then the ( ℓ + 1 ) {\displaystyle (\ell +1)} -th input is: x ℓ + 1 = F ( x ℓ ) + x ℓ {\displaystyle x_{\ell +1}=F(x_{\ell })+x_{\ell }} Applying this formulation recursively, e.g.: x ℓ + 2 = F ( x ℓ + 1 ) + x ℓ + 1 = F ( x ℓ + 1 ) + F ( x ℓ ) + x ℓ {\displaystyle {\begin{aligned}x_{\ell +2}&=F(x_{\ell +1})+x_{\ell +1}\\&=F(x_{\ell +1})+F(x_{\ell })+x_{\ell }\end{aligned}}} yields the general relationship: x L = x ℓ + ∑ i = ℓ L − 1 F ( x i ) {\displaystyle x_{L}=x_{\ell }+\sum _{i=\ell }^{L-1}F(x_{i})} where L {\textstyle L} is the index of a residual block and ℓ {\textstyle \ell } is the index of some earlier block. This formulation suggests that there is always a signal that is directly sent from a shallower block ℓ {\textstyle \ell } to a deeper block L {\textstyle L} . ==== Backward propagation ==== The residual learning formulation provides the added benefit of mitigating the vanishing gradient problem to some extent. However, it is crucial to acknowledge that the vanishing gradient issue is not the root cause of the degradation problem, which is tackled through the use of normalization. To observe the effect of residual blocks on backpropagation, consider the partial derivative of a loss function E {\displaystyle {\mathcal {E}}} with respect to some residual block input x ℓ {\displaystyle x_{\ell }} . Using the equation above from forward propagation for a later residual block L > ℓ {\displaystyle L>\ell } : ∂ E ∂ x ℓ = ∂ E ∂ x L ∂ x L ∂ x ℓ = ∂ E ∂ x L ( 1 + ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) ) = ∂ E ∂ x L + ∂ E ∂ x L ∂ ∂ x ℓ ∑ i = ℓ L − 1 F ( x i ) {\displaystyle {\begin{aligned}{\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial x_{L}}{\partial x_{\ell }}}\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}\left(1+{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\right)\\&={\frac {\partial {\mathcal {E}}}{\partial x_{L}}}+{\frac {\partial {\mathcal {E}}}{\partial x_{L}}}{\frac {\partial }{\partial x_{\ell }}}\sum _{i=\ell }^{L-1}F(x_{i})\end{aligned}}} This formulation suggests that the gradient computation of a shallower layer, ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} , always has a later term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} that is directly added. Even if the gradients of the F ( x i ) {\displaystyle F(x_{i})} terms are small, the total gradient ∂ E ∂ x ℓ {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{\ell }}}} resists vanishing due to the added term ∂ E ∂ x L {\textstyle {\frac {\partial {\mathcal {E}}}{\partial x_{L}}}} . == Variants of residual blocks == === Basic block === A basic block is the simplest building block studied in the original ResNet. This block consists of two sequential 3x3 convolutional layers and a residual connection. The input and output dimensions of both layers are equal. === Bottleneck block === A bottleneck block consists of three sequential convolutional layers and a residual connection. The first layer in this block is a 1×1 convolution for dimension reduction (e.g., to 1/2 of the input dimension); the second layer performs a 3×3 convolution; the last layer is another 1×1 convolution for dimension restoration. The models of ResNet-50, ResNet-101, and ResNet-152 are all based on bottleneck blocks. === Pre-activation block === The pre-activation residual block applies activation functions before applying the residual function F {\displaystyle F} . Formally, the computation of a pre-activation residual block can be written as: x ℓ + 1 = F ( ϕ ( x ℓ ) ) + x ℓ {\displaystyle x_{\ell +1}=F(\phi (x_{\ell }))+x_{\ell }} where ϕ {\displaystyle \phi } can be any activation (e.g. ReLU) or normalization (e.g. LayerNorm) operation. This design reduces the number of non-identity mappings between residual blocks, and allows an identity mapping directly from the input to the output. This design was used to train models with 200 to over 1000 layers, and was found to consistently outperform variants where the residual path is not an identity function. The pre-activation ResNet with 200 layers took 3 weeks to train for ImageNet on 8 GPUs in 2016. Since GPT-2, transformer blocks have been mostly implemented as pre-activation blocks. This is often referred to as "pre-normalization" in the literature of transformer models. == Applications == Originally, ResNet was designed for computer vision. All transformer architectures include residual connections. Indeed, very deep transformers cannot be trained without them. The original ResNet paper made no claim on being inspired by biological systems. However, later research has related ResNet to biologically-plausible algorithms. A study published in Science in 2023 disclosed the complete connectome of an insect brain (specifically that of a fruit fly larva). This study discovered "multilayer shortcuts" that resemble the skip connections in artificial neural networks, including ResNets. == History == === Previous work === Residual connections were noticed in neu

    Read more →
  • Digital image processing

    Digital image processing

    Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (perhaps more), digital image processing may be modeled in the form of multidimensional systems. The generation and development of digital image processing are mainly affected by three factors: first, the development of computers; second, the development of mathematics (especially the creation and improvement of discrete mathematics theory); and third, the demand for a wide range of applications in environment, agriculture, military, industry and medical science has increased. == History == Many of the techniques of digital image processing, or digital picture processing as it often was called, were developed in the 1960s, at Bell Laboratories, the Jet Propulsion Laboratory, Massachusetts Institute of Technology, University of Maryland, and a few other research facilities, with application to satellite imagery, wire-photo standards conversion, medical imaging, videophone, character recognition, and photograph enhancement. The purpose of early image processing was to improve the quality of the image. In image processing, the input is a low-quality image, and the output is an image with improved quality. Common image processing includes image enhancement, restoration, encoding, and compression. The first successful application was the American Jet Propulsion Laboratory (JPL). They used image processing techniques such as geometric correction, gradation transformation, noise removal, etc. on the thousands of lunar photos sent back by the Space Detector Ranger 7 in 1964, taking into account the position of the Sun and the environment of the Moon. The impact of the successful mapping of the Moon's surface map by the computer has been a success. Later, more complex image processing was performed on the nearly 100,000 photos sent back by the spacecraft, so that the topographic map, color map and panoramic mosaic of the Moon were obtained, which achieved extraordinary results and laid a solid foundation for human landing on the Moon. The cost of processing was fairly high, however, with the computing equipment of that era. That changed in the 1970s, when digital image processing proliferated as cheaper computers and dedicated hardware became available. This led to images being processed in real-time, for some dedicated problems such as television standards conversion. As general-purpose computers became faster, they started to take over the role of dedicated hardware for all but the most specialized and computer-intensive operations. With the fast computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and is generally used because it is not only the most versatile method, but also the cheapest. === Image sensors === The basis for modern image sensors is metal–oxide–semiconductor (MOS) technology, invented at Bell Labs between 1955 and 1960, This led to the development of digital semiconductor image sensors, including the charge-coupled device (CCD) and later the CMOS sensor. The charge-coupled device was invented by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogy of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. MOS image sensors are widely used in optical mouse technology. The first optical mouse, invented by Richard F. Lyon at Xerox in 1980, used a 5 μm NMOS integrated circuit sensor chip. Since the first commercial optical mouse, the IntelliMouse introduced in 1999, most optical mouse devices use CMOS sensors. === Image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression became the basis for JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. Its highly efficient DCT compression algorithm was largely responsible for the wide proliferation of digital images and digital photos, with several billion JPEG images produced every day as of 2015. Medical imaging techniques produce very large amounts of data, especially from CT, MRI and PET modalities. As a result, storage and communications of electronic image data are prohibitive without the use of compression. JPEG 2000 image compression is used by the DICOM standard for storage and transmission of medical images. The cost and feasibility of accessing large image data sets over low or various bandwidths are further addressed by use of another DICOM standard, called JPIP, to enable efficient streaming of the JPEG 2000 compressed image data. === Digital signal processor (DSP) === Electronic signal processing was revolutionized by the wide adoption of MOS technology in the 1970s. MOS integrated circuit technology was the basis for the first single-chip microprocessors and microcontrollers in the early 1970s, and then the first single-chip digital signal processor (DSP) chips in the late 1970s. DSP chips have since been widely used in digital image processing. The discrete cosine transform (DCT) image compression algorithm has been widely implemented in DSP chips, with many companies developing DSP chips based on DCT technology. DCTs are widely used for encoding, decoding, video coding, audio coding, multiplexing, control signals, signaling, analog-to-digital conversion, formatting luminance and color differences, and color formats such as YUV444 and YUV411. DCTs are also used for encoding operations such as motion estimation, motion compensation, inter-frame prediction, quantization, perceptual weighting, entropy encoding, variable encoding, and motion vectors, and decoding operations such as the inverse operation between different color formats (YIQ, YUV and RGB) for display purposes. DCTs are also commonly used for high-definition television (HDTV) encoder/decoder chips. == Tasks == Digital image processing allows the use of much more complex algorithms, and hence, can offer both more sophisticated performance at simple tasks, and the implementation of methods which would be impossible by analogue means. In particular, digital image processing is a concrete application of, and a practical technology based on: Classification Feature extraction Multi-scale signal analysis Pattern recognition Projection Some techniques that are used in digital image processing include: Anisotropic diffusion Hidden Markov models Image editing Image restoration Independent component analysis Linear filtering Neural networks Partial differential equations Pixelation Point feature matching Principal components analysis Self-organizing maps Wavelets == Digital image transformations == === Filtering === Digital filters are used to blur and sharpen digital images. Filtering can be performed by: convolution with specifically designed kernels (filter array) in the spatial domain masking specific frequency regions in the frequency (Fourier) domain The following examples show both methods: ==== Image padding in Fourier domain filtering ==== Images are typically padded before being transformed to the Fourier space, the highpass filtered images below illustrate the consequences of different padding techniques: Notice that the highpass filter shows extra edges when zero padded compared to the repeated edge padding. ==== Filtering code examples ==== MATLAB example for spatial domain highpass filtering. === Affine transformations === Affine transformations enable basic image transformations including scale, rotate, translate, mirror and shear as is shown in the following examples: To apply the affine

    Read more →