
Oracle Vectors and Similarity Search for Christmas
Heli Helskyaho
9.12.2025
As the holiday season approaches, many of us find ourselves searching for the perfect playlist, gift recommendations, family photos, or recipes that capture the right feeling. But how can you run those kinds of “feeling searches”? The “feeling” can be captured by a vector embedding, and vector embeddings can be compared with each other using a similarity search to find similar feelings.
The Oracle AI Database supports vector embeddings and similarity search natively. These capabilities allow applications to move beyond exact matches and rigid rules, and instead find content based on semantic or perceptual similarity.
What is a vector embedding?
A vector embedding is a numerical representation of the essential characteristics of text, images, audio, or any other data in a form that machines can compare mathematically. Items that are similar will have embeddings that are numerically close to each other. For example, you can compare two classical Christmas songs, two red Nordic-style sweaters, or two warmly lit living-room photos.
How does it work in an Oracle Database?
Oracle Database 23ai introduced a native VECTOR data type, vector indexes for efficient approximate similarity search, a built-in SQL function for generating vector embeddings (VECTOR_EMBEDDING), and distance functions for comparing them (VECTOR_DISTANCE, plus shorthand functions for specific distance metrics, for example COSINE_DISTANCE and L1_DISTANCE). It also added several PL/SQL packages for advanced operations with vectors (DBMS_VECTOR, DBMS_VECTOR_CHAIN, DBMS_HYBRID_VECTOR).
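To make these building blocks concrete, here is a minimal sketch. The column list, the 384-dimension FLOAT32 format, and the tiny two-dimensional literal vectors are assumptions chosen for illustration and do not come from a real embedding model. Only the VECTOR type and the distance functions are the features named above.

```sql
-- Hypothetical table: the column list and the 384-dimension FLOAT32 format
-- are assumptions; use the dimension count your embedding model produces.
CREATE TABLE christmas_songs (
  song_id   NUMBER PRIMARY KEY,
  song_name VARCHAR2(200),
  artist    VARCHAR2(200),
  year      NUMBER(4),
  embedding VECTOR(384, FLOAT32)
);

-- Toy two-dimensional vectors, written as literals, to compare distances.
SELECT VECTOR_DISTANCE(TO_VECTOR('[1, 0]'), TO_VECTOR('[0.8, 0.6]'), COSINE) AS similar_pair,
       VECTOR_DISTANCE(TO_VECTOR('[1, 0]'), TO_VECTOR('[0, 1]'), COSINE)     AS dissimilar_pair,
       COSINE_DISTANCE(TO_VECTOR('[1, 0]'), TO_VECTOR('[0.8, 0.6]'))         AS shorthand_form
FROM dual;
```

By the cosine formula, the first pair lies at a distance of about 0.2 and the perpendicular pair at 1, so a smaller value means "more similar". COSINE_DISTANCE is the shorthand for VECTOR_DISTANCE with the COSINE metric.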
Let’s see a couple of examples of how they work.
Example 1: Building the Ideal Christmas Playlist
Suppose a table CHRISTMAS_SONGS has one row per track, with a column EMBEDDING of type VECTOR. To find the songs that are most similar to Michael Bublé’s “It’s Beginning to Look a Lot Like Christmas”:
SELECT song_name, artist, year
FROM christmas_songs
ORDER BY VECTOR_DISTANCE(
    embedding,
    -- The scalar subquery must return exactly one row, so pin the artist as well
    (SELECT embedding
     FROM christmas_songs
     WHERE song_name = 'It''s Beginning to Look a Lot Like Christmas'
       AND artist = 'Michael Bublé')
)
FETCH FIRST 20 ROWS ONLY;
The query returns the 20 songs closest to Michael Bublé’s Christmas classic. Without any explicit genre or year filters, the result naturally surfaces songs by Bing Crosby, Dean Martin, Frank Sinatra, and Ella Fitzgerald, all sharing the same warm, crooner-era holiday atmosphere.
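On a large table, an exact search compares the reference vector with every row. The vector indexes mentioned earlier trade a little accuracy for speed. The following is a sketch only: the index name and the 95 percent target accuracy are assumptions, :reference_embedding is a hypothetical bind variable holding the embedding of the reference song, and an in-memory neighbor graph (HNSW) index also needs vector pool memory configured on the instance.

```sql
-- Hypothetical index name and accuracy target.
CREATE VECTOR INDEX christmas_songs_hnsw_idx
  ON christmas_songs (embedding)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  DISTANCE COSINE
  WITH TARGET ACCURACY 95;

-- APPROX allows the optimizer to answer the top-20 query from the vector index.
SELECT song_name, artist, year
FROM christmas_songs
ORDER BY VECTOR_DISTANCE(embedding, :reference_embedding, COSINE)
FETCH APPROX FIRST 20 ROWS ONLY;
```

The distance metric in the query should match the metric the index was built with. Otherwise the index is not a candidate and the query falls back to an exact search.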
Example 2: Intelligent Christmas Gift Recommendations
An e-commerce table GIFT_CATALOG stores product data, including product images. The IMAGE_EMBEDDING column stores the vector embedding of each image.
After a customer purchases a hand-knitted Scandinavian reindeer sweater, the recommendation engine runs:
SELECT product_name, price, image_url
FROM gift_catalog
WHERE category = 'Christmas'
ORDER BY VECTOR_DISTANCE(image_embedding, :purchased_item_embedding)
FETCH FIRST 10 ROWS ONLY;
The system returns the 10 Christmas products that are visually and stylistically closest to the purchased sweater, ready to be shown as gift recommendations.
Example 3: Searching Family Christmas Photos by Mood
Let's say you have collected 10,000 family Christmas photos over the years. You want to find 50 pictures that “feel like” the one where everyone is wearing ugly sweaters around the tree in 2017. A personal photo archive table FAMILY_PHOTOS contains embeddings generated from each image. To retrieve 50 photos from the years 2000-2025 that feel like the iconic 2017 photo, referred to as :reference_photo_embedding, simply query:
SELECT photo_id, taken_date, thumbnail_url
FROM family_photos
WHERE EXTRACT(YEAR FROM taken_date) BETWEEN 2000 AND 2025
ORDER BY VECTOR_DISTANCE(photo_embedding, :reference_photo_embedding)
FETCH FIRST 50 ROWS ONLY;
The query instantly assembles a heartfelt montage of similar cozy, festive moments across decades.
Example 4: Finding Better Alternatives to Traditional Fruitcake
A recipe database stores text embeddings of ingredient lists and descriptions. A user searching for “something like Grandma’s rum-soaked fruitcake but less dense” can have their natural language query converted to an embedding and matched against the whole table to find the eight best-matching recipes:
SELECT recipe_name, rating
FROM holiday_recipes
ORDER BY VECTOR_DISTANCE(text_embedding, :query_embedding)
FETCH FIRST 8 ROWS ONLY;
Relevant, highly rated alternatives such as Jamaican black cake, German stollen, or panettone rise to the top.
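The conversion of the search text can also happen inside the database. A sketch, assuming an ONNX embedding model has already been loaded into the database under the hypothetical name recipe_embed_model (for example with DBMS_VECTOR.LOAD_ONNX_MODEL), that the same model produced the stored text_embedding values, and that :search_text is a bind variable holding the user's sentence:

```sql
-- recipe_embed_model is a hypothetical name for an already loaded ONNX model.
SELECT recipe_name, rating
FROM holiday_recipes
ORDER BY VECTOR_DISTANCE(
           text_embedding,
           VECTOR_EMBEDDING(recipe_embed_model USING :search_text AS data),
           COSINE)
FETCH FIRST 8 ROWS ONLY;
```

Query and stored embeddings must come from the same model. Vectors produced by different models are not comparable, even when their dimensions happen to match.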
Why This Matters
Traditional relational queries excel at precise conditions, for example, customer_no = 42, price < 50, color = 'red', rating ≥ 4. Vector similarity search excels at capturing nuance, mood, style, and intent.
Oracle has made the technology accessible and production-ready:
- Vectors live alongside regular columns in the same table
- Embeddings can be created either inside the database or outside of it
- Indexing and querying require only a few lines of SQL
- Performance scales to billions of vectors with reasonable response times
- Integration with existing security, backup, and high-availability features is seamless
- Integration with existing multimodal features of an Oracle Database is seamless
Conclusion
This holiday season, while we enjoy the lights, music, and traditions that feel just right, it’s worth noting that modern databases can now understand “feelings” in a surprisingly human way. Oracle’s vector capabilities bring semantic and perceptual search into mainstream enterprise applications, with no separate specialized engine required. Whether you’re curating the perfect Christmas playlist, recommending thoughtful gifts, rediscovering cherished memories, or rescuing dessert, vector similarity search delivers results that simply feel right.
To explore these features yourself, Oracle’s Always Free Autonomous AI Database includes full vector search support.
Happy holidays, and happy querying. 🎄
Leveraging Oracle APEX and Generative AI to Unify Project Reporting for Siemens Energy
Elmer Nickels
12.12.2025
Managing large-scale engineering projects is complex enough without the administrative burden of reporting on them. At Siemens Energy, a global leader in energy technology, Project Managers (PMs) were spending valuable hours every week manually compiling reports. This process was disconnected, labor-intensive, and inconsistent.
Miracle (miracleoy.fi) partnered with Siemens Energy to change that. By moving from scattered Word templates to a unified Oracle APEX application enhanced by Generative AI, we transformed reporting from a chore into a streamlined operation.
The Challenge: The "Template Trap"
Before the transformation, Siemens Energy faced a common enterprise hurdle: the "Template Trap."
While the project data existed in robust databases, the reporting mechanism was manual. PMs had to hunt down data from various systems and manually copy-paste it into individual Word documents. Because these documents lived on local drives or disjointed share points, there was no "single source of truth."
This created three distinct friction points:
- High Manual Effort: PMs were acting as data scribes rather than managers.
- Inconsistent Narratives: One PM might write a detailed essay on a technical hiccup, while another might write three vague bullet points.
- Lack of Portfolio Visibility: For leadership, aggregating these disparate Word docs to get a clear view of project health was nearly impossible.
The Solution: A Unified, Intelligent Hub
To solve this, we re-engineered the process. We built a custom reporting application using Oracle APEX, chosen for its rapid development capabilities and seamless integration with Siemens Energy’s existing Oracle database infrastructure.
The new system automates the heavy lifting. Instead of typing out fields, the APEX app pulls live data—financials, timelines, and milestones—directly from the source database. The report structure is now enforced programmatically, ensuring every project report looks and feels the same.
Still, the true game-changer was addressing the qualitative side of reporting. How do you standardize the written explanation of project status, technical issues, or product bulletins?
We integrated Large Language Models (LLMs) directly into the reporting workflow. Here is how it works:
- Data Ingestion: The system aggregates product bulletins and status entries associated with a project.
- AI Summarization: The LLM processes this data and generates concise, natural-language executive summaries. It translates complex technical data into clear business context.
- Human-in-the-Loop: The AI generates a draft, but the PM retains the controls. They review the generated summaries, make necessary edits, and approve the final text.
This approach combines the speed of automation with the accuracy of human oversight.
Comparison: Manual vs. Automated Reporting
To understand the business impact, let’s look at the shift in methodology:
| Feature | The Manual Approach | The New APEX + AI Solution |
| Data Source | Manual copy/paste from multiple systems | Automated real-time pull from the database |
| Report Structure | Varied Word templates per PM | Unified structure, enforced globally |
| Summary Creation | PM writes from scratch | LLM generates drafts from raw inputs |
| Consistency | Highly variable quality and depth | Standardized tone and format |
| Visibility | Siloed in documents | Aggregated and queryable data |
Business Value Delivered
The shift to an APEX-based solution with AI integration delivered immediate value to Siemens Energy:
- Reclaiming PM Time: By automating data entry and drafting narratives, PMs significantly reduced the time they spend on reporting, freeing them up to focus on project delivery and risk mitigation.
- Standardized "One Truth": Leadership now has a consistent view across diverse projects. Because the structure is unified, comparing project status across the portfolio is seamless.
- Data Integrity: By removing the manual copy-paste step, human error was virtually eliminated from the quantitative data.
Conclusion
At Siemens Energy, we proved that project reporting doesn't have to be a manual burden. By combining the data-handling power of Oracle APEX with the summarization capabilities of Generative AI, we turned a fragmented documentation process into a streamlined, intelligent system.
The result? Reports that write themselves (almost), and Project Managers who can get back to managing projects.
There are several public AI services into which you can load your own documents and ask questions about their contents. But what if the documents are confidential and should not be sent outside the company? And what if there are tens or hundreds of documents, making it impossible to ask questions about all of them at the same time using the public services?
Oracle tools for Machine Learning
Defining Data Model Quality Metrics for Data Vault 2.0 Model Evaluation
By Heli Helskyaho, Laura Ruotsalainen, Tomi Männistö
Keywords: data warehouse; Data Vault 2.0; data model; metrics
Towards Automating Database Designing
Keywords: data warehouse; technological innovation; computer languages; databases; soft sensors; decision making; chatbots; data models
Introduction to AI Services in the Oracle Cloud Infrastructure
LLMs, GPTs, and All That Jazz
introduced in 2017. Large Language Models (LLMs), such as the one behind ChatGPT, are based on this transformer architecture and have brought significant advances in natural language processing. The acronym GPT stands for Generative Pre-trained Transformer. We will discuss the technology in later issues of ORAWORLD. In this article, we will talk about how a GPT tool can be used and what risks and limitations you should be aware of. We will use ChatGPT as an example. (Page 12)
Machine Learning For Beginners
Developer Strategies: How to Use Free Cloud Services
Here fishy, fishy. To entice developers to their platforms, cloud providers all offer free versions of a selection of their cloud services. The goal, of course, is to hook them with tasty functionality and keep them as paying customers for the long haul.
Free services from Oracle, Amazon, Azure, Google and others usually break down something like this: New customers can get a few hundred dollars of free credits to use full versions of cloud services until they burn through those credits. Existing customers can also get free short-term access to a smaller number of services to test and train on before deciding whether to buy.
The story behind a COVID-19 exposure-tracking application in Finland
In September 2020, COVID-19 was spreading fast and was extremely dangerous, with people globally afraid of becoming infected. Before vaccinations became available, avoiding exposure was the only way to keep safe and minimize the spread.
In Finland, a group of passionate volunteers made it their mission to collect all available exposure data in a blog and report it on Twitter. Although the blog was a great asset to the public, maintaining it became very time-consuming. Data needed to be copied into Microsoft Excel spreadsheets for further analysis, and the volunteers needed to create new charts and reports continually.
Autonomous Databases Give You Time for Data Modeling
Anyone who’s worked with Heli Helskyaho—Oracle ACE director, EMEA Oracle User Group community ambassador, and author—on a database project or experienced one of her talks knows she likes to make things fun. That, she says, is one reason she’s excited about autonomous databases from Oracle.