TL;DR
- Clarify ROI before adopting AI. The question isn't "how can we use AI?" — it's "what problem are we solving with AI?"
- Prompt engineering does most of the heavy lifting in LLM work. Master Few-shot and Chain of Thought techniques.
- Use RAG to put internal data to work. Your chunking strategy and vector DB choice make or break it.
- In production, cost control, rate limit handling, and fallback logic are non-negotiable.
Introduction: The Reality of AI Adoption
"We want to implement AI."
If you're an engineer, you've probably heard this from leadership. But AI is a tool, not a goal. The real question is: what problem are you trying to solve?
Our team has worked on more than ten AI projects over the past two years. The gap between successful and failed projects was always clear.
What success looks like:
- Starting from a concrete business problem
- Starting small and validating impact early
- Integrating naturally into existing human workflows
What failure looks like:
- Vague goals like "do something with AI"
- Large upfront investment before proving value
- No plan for integrating with existing processes
This article walks through a practical approach to AI implementation, with a focus on LLMs (large language models).
Deciding Whether to Adopt AI
Identifying the Right Use Cases
Use cases where AI works well:
✅ Tasks AI is good at
- Processing large volumes of text (summarization, classification, extraction)
- Pattern recognition (anomaly detection, recommendations)
- Natural language interfaces (chatbots, search)
- Automating repetitive tasks that still require judgment
❌ Tasks AI is not good at
- Processes requiring 100% accuracy (e.g., final review of legal documents)
- Processes with extremely tight real-time requirements
- Domains with very little data
- High-stakes decisions with strict accountability requirements
An ROI Framework
interface AIProjectROI {
// Cost factors
developmentCost: number; // Development cost
apiCost: number; // API usage fee (monthly)
infrastructureCost: number; // Infrastructure cost
maintenanceCost: number; // Operations and maintenance cost
// Benefit factors
timeSavingHours: number; // Hours of work saved per month
hourlyRate: number; // Hourly rate equivalent
qualityImprovement: number; // Value from quality improvements
newRevenueOpportunity: number; // New revenue opportunities
}
function calculateROI(project: AIProjectROI): {
monthlyBenefit: number;
monthlyCost: number;
paybackMonths: number;
} {
const monthlyBenefit =
(project.timeSavingHours * project.hourlyRate) +
project.qualityImprovement +
project.newRevenueOpportunity;
const monthlyCost =
project.apiCost +
project.infrastructureCost +
project.maintenanceCost;
const paybackMonths = project.developmentCost / (monthlyBenefit - monthlyCost);
return { monthlyBenefit, monthlyCost, paybackMonths };
}
// Example: automated customer support response system
const supportBot = calculateROI({
developmentCost: 2000000, // ¥2,000,000
apiCost: 50000, // ¥50,000/month
infrastructureCost: 10000, // ¥10,000/month
maintenanceCost: 30000, // ¥30,000/month
timeSavingHours: 160, // 160 hours saved per month
hourlyRate: 3000, // ¥3,000/hour equivalent
qualityImprovement: 50000, // ¥50,000/month from quality gains
newRevenueOpportunity: 0
});
// Result: ¥530,000 monthly benefit, ¥90,000 monthly cost, ~4.5-month payback period
LLM Usage Patterns
1. Text Classification and Sentiment Analysis
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
interface ClassificationResult {
category: string;
confidence: number;
sentiment: 'positive' | 'neutral' | 'negative';
}
async function classifyCustomerInquiry(
inquiry: string
): Promise<ClassificationResult> {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a customer support inquiry classification system.
Classify the inquiry into one of the following categories:
- billing: billing and payment issues
- technical: technical problems
- account: account-related issues
- general: other general inquiries
Respond as a JSON object with the keys "category", "confidence" (a number from 0 to 1), and "sentiment" ("positive" | "neutral" | "negative").`
},
{
role: 'user',
content: inquiry
}
],
response_format: { type: 'json_object' },
temperature: 0.1 // Keep temperature low for stable classification results
});
return JSON.parse(response.choices[0].message.content!);
}
// Usage
const result = await classifyCustomerInquiry(
'It looks like my credit card was charged twice.'
);
// { category: 'billing', confidence: 0.95, sentiment: 'negative' }
2. Text Generation and Summarization
async function summarizeDocument(
document: string,
maxLength: number = 200
): Promise<string> {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are an expert document summarizer.
Summarize the given document in ${maxLength} characters or fewer.
Be concise while capturing all key points.`
},
{
role: 'user',
content: document
}
],
max_tokens: Math.ceil(maxLength * 1.5), // Japanese text is approximately 1.5 tokens per character
temperature: 0.3
});
return response.choices[0].message.content!;
}
3. Structured Data Extraction
import { z } from 'zod';
// Type definition for extracted data
const InvoiceSchema = z.object({
vendorName: z.string(),
invoiceNumber: z.string(),
invoiceDate: z.string(),
dueDate: z.string(),
totalAmount: z.number(),
taxAmount: z.number(),
lineItems: z.array(z.object({
description: z.string(),
quantity: z.number(),
unitPrice: z.number(),
amount: z.number()
}))
});
type Invoice = z.infer<typeof InvoiceSchema>;
async function extractInvoiceData(invoiceText: string): Promise<Invoice> {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are an invoice data extraction system.
Extract the following fields from the invoice text and return them as JSON:
- vendorName: issuing company name
- invoiceNumber: invoice number
- invoiceDate: invoice date (YYYY-MM-DD format)
- dueDate: payment due date (YYYY-MM-DD format)
- totalAmount: total amount (numeric)
- taxAmount: tax amount (numeric)
- lineItems: array of line item objects`
},
{
role: 'user',
content: invoiceText
}
],
response_format: { type: 'json_object' },
temperature: 0
});
const data = JSON.parse(response.choices[0].message.content!);
// Validate with Zod
return InvoiceSchema.parse(data);
}
Prompt Engineering
Core Principles
- Be explicit: Eliminate ambiguity and give specific instructions
- Provide context: Supply the necessary background information
- Specify output format: Clearly state the expected format
- Use examples (Few-shot): Demonstrate with concrete examples to improve accuracy
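These four principles can be folded into a small helper that assembles a chat request. This is an illustrative sketch, not part of any SDK: the `PromptSpec` shape and the `buildMessages` name are made up for this article.

```typescript
// Hypothetical helper: maps the four prompt-design principles onto message fields.
interface PromptSpec {
  instruction: string;                            // Be explicit
  context?: string;                               // Provide context
  outputFormat?: string;                          // Specify output format
  examples?: { input: string; output: string }[]; // Use examples (Few-shot)
}

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

function buildMessages(spec: PromptSpec, userInput: string): ChatMessage[] {
  const systemParts = [spec.instruction];
  if (spec.context) systemParts.push(`Context:\n${spec.context}`);
  if (spec.outputFormat) systemParts.push(`Output format:\n${spec.outputFormat}`);

  const messages: ChatMessage[] = [
    { role: 'system', content: systemParts.join('\n\n') }
  ];
  // Few-shot examples become alternating user/assistant turns
  for (const ex of spec.examples ?? []) {
    messages.push({ role: 'user', content: ex.input });
    messages.push({ role: 'assistant', content: ex.output });
  }
  messages.push({ role: 'user', content: userInput });
  return messages;
}

// Example: a product-categorization prompt assembled from the pieces above
const messages = buildMessages(
  {
    instruction: 'Determine the category for a given product name.',
    outputFormat: 'Reply with the category name only.',
    examples: [
      { input: 'iPhone 15 Pro Max 256GB', output: 'Smartphones' },
      { input: 'MacBook Air M3', output: 'Laptops' }
    ]
  },
  'SONY WH-1000XM5'
);
```

The returned array can be passed directly as the `messages` field of `openai.chat.completions.create`.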
Few-shot Learning
async function categorizeProduct(productName: string): Promise<string> {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: 'Determine the category for a given product name.'
},
// Few-shot examples
{ role: 'user', content: 'iPhone 15 Pro Max 256GB' },
{ role: 'assistant', content: 'Smartphones' },
{ role: 'user', content: 'SONY WH-1000XM5' },
{ role: 'assistant', content: 'Headphones & Earphones' },
{ role: 'user', content: 'MacBook Air M3' },
{ role: 'assistant', content: 'Laptops' },
// Actual input
{ role: 'user', content: productName }
],
temperature: 0.1
});
return response.choices[0].message.content!;
}
Chain of Thought
For tasks requiring complex reasoning, walking the model through its thinking step by step significantly improves accuracy.
async function analyzeBusinessProblem(problem: string): Promise<{
analysis: string;
recommendations: string[];
}> {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a business consultant.
When analyzing a problem, work through the following steps:
1. Identify the core issue
2. List all contributing factors
3. Assess the impact of each factor
4. Consider possible solutions
5. Recommend the most effective course of action
Show your reasoning at each step before presenting your final recommendations.`
},
{
role: 'user',
content: problem
}
],
temperature: 0.7
});
// Parse and structure the response
const content = response.choices[0].message.content!;
// ... parsing logic
return { analysis: content, recommendations: [] };
}
Implementing RAG (Retrieval-Augmented Generation)
RAG is an architecture where the LLM generates responses grounded in relevant information retrieved from an external knowledge base.
Architecture Overview
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ User │────▶│Query Embed- │────▶│ Vector DB │
│ Question │ │ ding │ │ (Pinecone) │
└─────────────┘ └─────────────┘ └──────┬──────┘
│
┌─────────────┐ │ Similar docs
│ LLM │◀───────────┘
│ (GPT-4) │
└──────┬──────┘
│
┌──────▼──────┐
│ Response │
│ Generation │
└─────────────┘
Document Preprocessing and Chunking
interface DocumentChunk {
id: string;
content: string;
metadata: {
source: string;
page?: number;
section?: string;
score?: number; // similarity score, populated on search results
};
embedding?: number[];
}
function chunkDocument(
document: string,
chunkSize: number = 1000,
overlap: number = 200
): string[] {
const chunks: string[] = [];
let start = 0;
while (start < document.length) {
let end = start + chunkSize;
// Avoid splitting in the middle of a sentence
if (end < document.length) {
const lastPeriod = Math.max(
document.lastIndexOf('。', end), // Japanese sentence endings
document.lastIndexOf('. ', end) // Western sentence endings
);
const lastNewline = document.lastIndexOf('\n', end);
const breakPoint = Math.max(lastPeriod, lastNewline);
if (breakPoint > start) {
end = breakPoint + 1;
}
}
chunks.push(document.slice(start, end).trim());
start = Math.max(end - overlap, start + 1); // guard: always advance so the loop terminates
}
return chunks;
}
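To see the sliding window in action, here is a self-contained sanity check. The function body is repeated (with a forward-progress guard so `start` always advances) so the snippet runs on its own; the toy alphabet string and window sizes are illustrative.

```typescript
// Repeated from above so this sketch is runnable standalone.
function chunkDocument(
  document: string,
  chunkSize: number = 1000,
  overlap: number = 200
): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < document.length) {
    let end = start + chunkSize;
    // Avoid splitting in the middle of a sentence
    if (end < document.length) {
      const lastPeriod = Math.max(
        document.lastIndexOf('。', end), // Japanese sentence endings
        document.lastIndexOf('. ', end) // Western sentence endings
      );
      const lastNewline = document.lastIndexOf('\n', end);
      const breakPoint = Math.max(lastPeriod, lastNewline);
      if (breakPoint > start) {
        end = breakPoint + 1;
      }
    }
    chunks.push(document.slice(start, end).trim());
    start = Math.max(end - overlap, start + 1); // guard: always advance
  }
  return chunks;
}

// 26 characters, 10-character windows, 3 characters of overlap:
const windows = chunkDocument('abcdefghijklmnopqrstuvwxyz', 10, 3);
// → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk after the first begins 3 characters before the previous one ends, which is what preserves context across chunk boundaries at retrieval time.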
// Semantic chunking (a more advanced approach)
async function semanticChunk(
document: string,
maxChunkSize: number = 1500
): Promise<string[]> {
// Split on headings and paragraphs
const sections = document.split(/\n#{1,3}\s/);
const chunks: string[] = [];
let currentChunk = '';
for (const section of sections) {
if (currentChunk.length + section.length > maxChunkSize) {
if (currentChunk) {
chunks.push(currentChunk.trim());
}
currentChunk = section;
} else {
currentChunk += '\n' + section;
}
}
if (currentChunk) {
chunks.push(currentChunk.trim());
}
return chunks;
}
Vector Embeddings and Search
import { Pinecone } from '@pinecone-database/pinecone';
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('knowledge-base');
// Embed and store documents
async function indexDocuments(chunks: DocumentChunk[]): Promise<void> {
// Generate embedding vectors using the OpenAI Embeddings API
const embeddings = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: chunks.map(c => c.content)
});
// Store in Pinecone
const vectors = chunks.map((chunk, i) => ({
id: chunk.id,
values: embeddings.data[i].embedding,
metadata: {
content: chunk.content,
...chunk.metadata
}
}));
await index.upsert(vectors);
}
// Search for similar documents
async function searchSimilarDocuments(
query: string,
topK: number = 5
): Promise<DocumentChunk[]> {
// Convert the query to an embedding vector
const queryEmbedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query
});
// Similarity search in Pinecone
const results = await index.query({
vector: queryEmbedding.data[0].embedding,
topK,
includeMetadata: true
});
return results.matches.map(match => ({
id: match.id,
content: match.metadata!.content as string,
metadata: {
source: match.metadata!.source as string,
score: match.score
}
}));
}
The Complete RAG Pipeline
async function ragQuery(userQuestion: string): Promise<string> {
// 1. Retrieve relevant documents
const relevantDocs = await searchSimilarDocuments(userQuestion, 5);
// 2. Build the context
const context = relevantDocs
.map(doc => `[Source: ${doc.metadata.source}]\n${doc.content}`)
.join('\n\n---\n\n');
// 3. Generate a response with the LLM
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are an internal knowledge base assistant.
Answer questions based solely on the reference materials below.
Rules:
- Only use information found in the reference materials
- If the information is not available, say "No relevant information was found"
- Always cite the source that supports your answer
Reference materials:
${context}`
},
{
role: 'user',
content: userQuestion
}
],
temperature: 0.3
});
return response.choices[0].message.content!;
}
API Cost Optimization
Model Selection Strategy
type TaskComplexity = 'simple' | 'moderate' | 'complex';
function selectModel(task: TaskComplexity): string {
const modelMap = {
simple: 'gpt-4o-mini', // Classification, simple extraction
moderate: 'gpt-4o-mini', // Summarization, Q&A
complex: 'gpt-4o' // Complex reasoning, code generation
};
return modelMap[task];
}
// Cost estimation utility
function estimateCost(
model: string,
inputTokens: number,
outputTokens: number
): number {
const pricing: Record<string, { input: number; output: number }> = {
'gpt-4o': { input: 0.0025, output: 0.01 }, // per 1K tokens
'gpt-4o-mini': { input: 0.00015, output: 0.0006 }
};
const rate = pricing[model];
return (inputTokens / 1000 * rate.input) + (outputTokens / 1000 * rate.output);
}
Caching Strategy
import { Redis } from 'ioredis';
import { createHash } from 'node:crypto';
const redis = new Redis(process.env.REDIS_URL!);
// Helpers used below: a stable cache-key hash and a thin wrapper around the API call
function hashString(s: string): string {
return createHash('sha256').update(s).digest('hex');
}
async function directCompletion(
prompt: string,
options: { model: string; temperature: number }
): Promise<string> {
const response = await openai.chat.completions.create({
model: options.model,
messages: [{ role: 'user', content: prompt }],
temperature: options.temperature
});
return response.choices[0].message.content!;
}
async function cachedCompletion(
prompt: string,
options: { model: string; temperature: number }
): Promise<string> {
// Only cache deterministic results (temperature === 0)
if (options.temperature > 0) {
return await directCompletion(prompt, options);
}
const cacheKey = `llm:${options.model}:${hashString(prompt)}`;
// Check cache
const cached = await redis.get(cacheKey);
if (cached) {
return cached;
}
// Make the API call
const result = await directCompletion(prompt, options);
// Cache for 24 hours
await redis.setex(cacheKey, 86400, result);
return result;
}
Batch Processing to Reduce Costs
async function batchClassify(
items: string[],
batchSize: number = 20
): Promise<string[]> {
const results: string[] = [];
for (let i = 0; i < items.length; i += batchSize) {
const batch = items.slice(i, i + batchSize);
// Process multiple items in a single request
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `Classify each of the following ${batch.length} items.
Return a JSON object of the form {"classifications": ["...", ...]}, one entry per item, in order.`
},
{
role: 'user',
content: batch.map((item, idx) => `${idx + 1}. ${item}`).join('\n')
}
],
response_format: { type: 'json_object' }
});
const batchResults = JSON.parse(response.choices[0].message.content!);
results.push(...batchResults.classifications);
}
return results;
}
Error Handling and Fallbacks
Handling Rate Limits
import pRetry from 'p-retry';
async function robustCompletion(
messages: OpenAI.ChatCompletionMessageParam[]
): Promise<string> {
return pRetry(
async () => {
try {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages
});
return response.choices[0].message.content!;
} catch (error: any) {
if (error.status === 429) {
// Retry on rate limit errors
throw error;
}
if (error.status === 500 || error.status === 503) {
// Retry on server errors
throw error;
}
// All other errors fail immediately
throw new pRetry.AbortError(error);
}
},
{
retries: 3,
minTimeout: 1000,
maxTimeout: 10000,
onFailedAttempt: (error) => {
console.log(`Attempt ${error.attemptNumber} failed. Retrying...`);
}
}
);
}
Fallback Strategy
async function completionWithFallback(
messages: OpenAI.ChatCompletionMessageParam[]
): Promise<string> {
const models = ['gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo'];
for (const model of models) {
try {
const response = await openai.chat.completions.create(
{ model, messages },
{ timeout: 30000 } // per-request timeout belongs in the SDK's request options, not the API body
);
return response.choices[0].message.content!;
} catch (error) {
console.error(`${model} failed:`, error);
continue;
}
}
// All models failed
throw new Error('All models failed');
}
Monitoring in Production
Key Metrics
interface LLMMetrics {
requestId: string;
model: string;
inputTokens: number;
outputTokens: number;
latencyMs: number;
cost: number;
success: boolean;
errorType?: string;
}
async function trackedCompletion(
messages: OpenAI.ChatCompletionMessageParam[],
options: { model: string }
): Promise<{ result: string; metrics: LLMMetrics }> {
const startTime = Date.now();
const requestId = crypto.randomUUID(); // global in Node 19+; otherwise import from 'node:crypto'
try {
const response = await openai.chat.completions.create({
model: options.model,
messages
});
const metrics: LLMMetrics = {
requestId,
model: options.model,
inputTokens: response.usage!.prompt_tokens,
outputTokens: response.usage!.completion_tokens,
latencyMs: Date.now() - startTime,
cost: estimateCost(
options.model,
response.usage!.prompt_tokens,
response.usage!.completion_tokens
),
success: true
};
// Record metrics (recordMetrics is an assumed sink, e.g. CloudWatch or Datadog)
await recordMetrics(metrics);
return {
result: response.choices[0].message.content!,
metrics
};
} catch (error: any) {
const metrics: LLMMetrics = {
requestId,
model: options.model,
inputTokens: 0,
outputTokens: 0,
latencyMs: Date.now() - startTime,
cost: 0,
success: false,
errorType: error.code || 'unknown'
};
await recordMetrics(metrics);
throw error;
}
}
Automated Quality Evaluation
// Automated response quality evaluation
async function evaluateResponseQuality(
question: string,
response: string,
expectedTopics: string[]
): Promise<{
relevance: number;
completeness: number;
accuracy: number;
}> {
const evaluation = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a response quality evaluator.
Score each dimension from 0 to 100:
- relevance: how well the answer addresses the question
- completeness: how thoroughly the answer covers the topic
- accuracy: how factually correct the information is
Respond in JSON format.`
},
{
role: 'user',
content: `Question: ${question}\n\nAnswer: ${response}\n\nExpected topics: ${expectedTopics.join(', ')}`
}
],
response_format: { type: 'json_object' }
});
return JSON.parse(evaluation.choices[0].message.content!);
}
Summary: Keys to Successful AI Implementation
Adoption Phase
- Clarify the problem: Identify the specific business problem AI will solve
- Evaluate ROI: Estimate costs and benefits quantitatively
- Start small: Validate impact with a proof of concept
Implementation Phase
- Prompt design: Leverage Few-shot and Chain of Thought techniques
- Build RAG: Create a knowledge base from your internal data
- Optimize costs: Choose the right model, use caching, and batch requests
Operations Phase
- Monitor: Continuously track latency, cost, and quality
- Feedback loop: Iterate based on user feedback
- Regular reviews: Periodically review accuracy and ROI
AI implementation is never a one-and-done effort. Continuous improvement and operational discipline are what separate projects that succeed from those that stall.
Resources
If you're struggling with your AI adoption journey, feel free to reach out.
