Skip to main content

34. Data Migration & Integration -- Moving Data Safely

Level: Advanced SQL Server for real projects

ℹ️ What You'll Learn
  • ETL (Extract, Transform, Load) concepts
  • BULK INSERT syntax
  • BCP (Bulk Copy Program) utility
  • Importing from CSV/Excel
  • Data validation before load
  • Handling duplicates
  • Rollback on error
  • Common migration mistakes
  • Best practices

Data migration risks data loss. Planning prevents it. This article teaches safe imports.

Why Migration Matters for Backend Developers

Bulk operations need care:

Import 50K student records from Excel
v
Without validation: Duplicates, invalid data, corrupted
WITH validation: Clean, consistent, auditable
v
Migration success: Data integrity maintained

Migration = ETL pipeline.

ETL Concept

Extract: Get data (CSV, API, database). Transform: Clean, validate, standardize. Load: Insert into target.

Example 1: BULK INSERT from CSV

Import student list:

-- Create staging table (temporary)
CREATE TABLE Student_Staging (
Name NVARCHAR(100),
RollNumber NVARCHAR(20),
ClassName NVARCHAR(10),
ParentName NVARCHAR(100),
ParentPhone NVARCHAR(20)
);

-- BULK INSERT from CSV
BULK INSERT Student_Staging
FROM 'C:\Data\students.csv'
WITH (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2, -- Skip header
TABLOCK
);

-- Check staged data
SELECT * FROM Student_Staging;

CSV imported to staging table. Review before moving to production.

Example 2: Data Validation

Validate before loading:

-- Identify invalid rows
SELECT * FROM Student_Staging
WHERE
Name IS NULL OR Name = ''
OR RollNumber IS NULL
OR ClassName NOT IN ('10-A', '10-B', '11-A', '11-B')
OR ParentPhone NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]';
-- Show rows with errors

-- Count valid rows
SELECT COUNT(*) AS ValidRows FROM Student_Staging
WHERE
Name IS NOT NULL AND Name != ''
AND RollNumber IS NOT NULL
AND ClassName IN ('10-A', '10-B', '11-A', '11-B')
AND ParentPhone LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]';

Validate: Empty fields, invalid values, format errors.

Example 3: Handle Duplicates

Identify and skip duplicates:

-- Find duplicate roll numbers
SELECT RollNumber, ClassName, COUNT(*) AS Count
FROM Student_Staging
GROUP BY RollNumber, ClassName
HAVING COUNT(*) > 1;

-- Remove duplicates (keep first)
DELETE FROM Student_Staging
WHERE StudentId NOT IN (
SELECT MIN(StudentId) FROM Student_Staging
GROUP BY RollNumber, ClassName
);

-- Check for existing students (in production)
SELECT * FROM Student_Staging st
WHERE EXISTS (
SELECT 1 FROM Student s
WHERE s.RollNumber = st.RollNumber
AND s.ClassName = st.ClassName
);
-- Shows students already in system

Remove staging duplicates, check for existing.

Example 4: Transaction-based Load

Insert with rollback on error:

BEGIN TRANSACTION;
BEGIN TRY
-- Validate count
DECLARE @ValidCount INT;
SELECT @ValidCount = COUNT(*) FROM Student_Staging
WHERE Name IS NOT NULL AND RollNumber IS NOT NULL;

IF @ValidCount = 0
BEGIN
THROW 50001, 'No valid rows to load', 1;
END;

-- Insert
INSERT INTO Student (Name, RollNumber, ClassName, ParentName, ParentPhone, Status)
SELECT Name, RollNumber, ClassName, ParentName, ParentPhone, 'Active'
FROM Student_Staging
WHERE Name IS NOT NULL AND RollNumber IS NOT NULL;

-- Log migration
INSERT INTO AuditLog (Action, EntityName, ChangedBy, ChangedAt)
VALUES ('BulkImport', 'Student', SYSTEM_USER, GETDATE());

COMMIT TRANSACTION;
SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' students loaded successfully' AS Result;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
SELECT ERROR_MESSAGE() AS ErrorMessage;
END CATCH

Transaction: All or nothing. Rollback on error.

Example 5: BCP Utility (Command Line)

Export/import using BCP:

REM Export to file
bcp SchoolDB.dbo.Student out C:\backup\students.bcp -S localhost -U sa -P password -T

REM Import from file
bcp SchoolDB.dbo.Student in C:\backup\students.bcp -S localhost -U sa -P password -T -h "TABLOCK"

BCP: Command-line tool. Fast for large volumes.

Example 6: Staging Table Pattern

Safe 3-step import:

-- Step 1: BULK INSERT to staging
BULK INSERT Student_Staging FROM 'C:\data.csv' WITH (FIELDTERMINATOR = ',');

-- Step 2: Validate and clean
DELETE FROM Student_Staging WHERE Name IS NULL;
DELETE FROM Student_Staging WHERE RollNumber IN (SELECT RollNumber FROM Student);

-- Step 3: Load to production
INSERT INTO Student (Name, RollNumber, ClassName, Status)
SELECT Name, RollNumber, ClassName, 'Active' FROM Student_Staging;

-- Cleanup
DROP TABLE Student_Staging;

Staging -> Validation -> Production.

Example 7: Handle Data Type Conversion

Convert during import:

CREATE TABLE ExamResult_Staging (
StudentId NVARCHAR(20),
ExamId NVARCHAR(20),
MarksObtained NVARCHAR(10),
ImportDate NVARCHAR(10)
);

BULK INSERT ExamResult_Staging FROM 'C:\exam_results.csv';

-- Convert and insert
INSERT INTO ExamResult (StudentId, ExamId, MarksObtained)
SELECT
CAST(StudentId AS INT),
CAST(ExamId AS INT),
CAST(MarksObtained AS DECIMAL(5,2))
FROM ExamResult_Staging
WHERE
ISNUMERIC(StudentId) = 1
AND ISNUMERIC(ExamId) = 1
AND ISNUMERIC(MarksObtained) = 1
AND TRY_CAST(MarksObtained AS DECIMAL(5,2)) IS NOT NULL;

Validate types before casting.


Common Migration Mistakes

Mistake 1: No staging table (direct import)

Wrong:

BULK INSERT Student FROM 'file.csv';
-- Bad data mixed with production immediately

Result: Corrupted production data.

Fix: Use staging:

BULK INSERT Student_Staging FROM 'file.csv';
-- Validate, clean
INSERT INTO Student SELECT * FROM Student_Staging WHERE valid;

Mistake 2: No validation (garbage in)

Wrong:

INSERT INTO Student SELECT * FROM Staging;
-- No checks: NULLs, invalid class names, duplicates

Result: Data integrity violations.

Fix: Validate before insert:

INSERT INTO Student
SELECT * FROM Staging
WHERE Name IS NOT NULL AND ClassName IN (...)
AND RollNumber NOT IN (SELECT RollNumber FROM Student);

Mistake 3: No rollback on error

Wrong:

INSERT INTO Student SELECT * FROM Staging;
INSERT INTO Fee SELECT * FROM FeeStagin; -- Typo, fails
-- Student inserted, Fee not (inconsistent)

Fix: Transaction:

BEGIN TRANSACTION;
INSERT INTO Student ...;
INSERT INTO Fee ...;
COMMIT;
-- All or nothing

Mistake 4: Large batch (out of memory)

Wrong:

INSERT INTO Student SELECT * FROM StagingWith1M rows;
-- Memory error, partial insert

Fix: Batch process:

WHILE EXISTS (SELECT 1 FROM Staging WHERE Processed = 0)
BEGIN
INSERT INTO Student
SELECT TOP 1000 * FROM Staging WHERE Processed = 0;
UPDATE Staging SET Processed = 1 WHERE StudentId IN (SELECT TOP 1000 StudentId ...);
END

Mistake 5: Not testing migration

Wrong:

Import 50K rows to production (first time)
-- Unexpected errors, data corrupted

Fix: Test first:

-- Test on copy of production
BULK INSERT Student_Test FROM 'file.csv';
-- Validate, measure time, check errors
-- Then run on production

Best Practices for Migration

  1. Use staging tables -- Separate validation from production
  2. Validate before load -- Check data quality
  3. Transaction-based -- All or nothing
  4. Batch large loads -- Avoid memory errors
  5. Handle duplicates -- Check existing data
  6. Test first -- On test database
  7. Log migration -- Track what loaded
  8. Rollback plan -- Know how to undo

🎯 Q1: What's ETL?

Extract: Get source data (CSV, API, database). Transform: Clean, validate, standardize. Load: Insert into target.

Migration pipeline: Extract -> Transform -> Load.

🎯 Q2: Should I import directly to production?

No, use staging:

File -> Staging (validate, clean) -> Production

Staging: Safe testing ground. Production: Validated only.

🎯 Q3: How do I handle duplicates?
-- Find duplicates
SELECT RollNumber, COUNT(*) FROM Staging
GROUP BY RollNumber HAVING COUNT(*) > 1;

-- Keep first, delete others
DELETE FROM Staging WHERE StudentId NOT IN (
SELECT MIN(StudentId) FROM Staging
GROUP BY RollNumber
);

Remove duplicates in staging before production insert.

🎯 Q4: What's BCP?

Bulk Copy Program (command-line):

bcp Table in file.bcp -S server -U user -P password

Fast import/export. Use for large volumes (1M+ rows).

🎯 Q5: How do I validate data before load?
SELECT * FROM Staging
WHERE
Name IS NULL -- Missing required
OR ClassName NOT IN (...) -- Invalid value
OR MarksObtained < 0 -- Out of range
OR RollNumber IN (SELECT RollNumber FROM Student); -- Duplicate

Test before inserting.

🎯 Q6: Should migration be wrapped in transaction?

Yes:

BEGIN TRANSACTION;
BEGIN TRY
INSERT INTO Student ...;
INSERT INTO Fee ...;
COMMIT;
END TRY
BEGIN CATCH
ROLLBACK;
END CATCH

All-or-nothing. Prevent partial inserts.


🤖Use AI to Learn Faster
⚠️ Important for beginners: Do NOT use AI to write your code yet. Type every example yourself. Your brain learns by doing, not by reading AI output. Use AI only to explain and quiz you — not to code for you. Once you have strong fundamentals, AI becomes a powerful productivity tool for repetitive tasks.

Use ChatGPT, Claude, or Copilot to go deeper on Data migration and ETL in SQL Server. Try these prompts:

  • "Show BULK INSERT from CSV with validation"
  • "How do I use staging table pattern (3 steps)?"
  • "What's ETL? Explain Extract, Transform, Load"
  • "How do I handle duplicates during import?"
  • "Quiz me: ask 5 questions about migration, validation, staging tables"

💡 Tip: After reading this article, paste your own code into AI and ask "What could go wrong here and why?" — fastest way to find edge cases and deepen understanding.

Next Article

35. Best Practices & Database Design -- Building Robust Systems ->

nexcoding.in