34. Data Migration & Integration -- Moving Data Safely
Level: Advanced SQL Server for real projects
- ETL (Extract, Transform, Load) concepts
- BULK INSERT syntax
- BCP (Bulk Copy Program) utility
- Importing from CSV/Excel
- Data validation before load
- Handling duplicates
- Rollback on error
- Common migration mistakes
- Best practices
Data migration risks data loss. Planning prevents it. This article teaches safe imports.
Why Migration Matters for Backend Developers
Bulk operations need care:
Import 50K student records from Excel
v
Without validation: Duplicates, invalid data, corrupted
WITH validation: Clean, consistent, auditable
v
Migration success: Data integrity maintained
Migration = ETL pipeline.
ETL Concept
Extract: Get data (CSV, API, database). Transform: Clean, validate, standardize. Load: Insert into target.
Example 1: BULK INSERT from CSV
Import student list:
-- Create staging table (temporary)
CREATE TABLE Student_Staging (
Name NVARCHAR(100),
RollNumber NVARCHAR(20),
ClassName NVARCHAR(10),
ParentName NVARCHAR(100),
ParentPhone NVARCHAR(20)
);
-- BULK INSERT from CSV
BULK INSERT Student_Staging
FROM 'C:\Data\students.csv'
WITH (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2, -- Skip header
TABLOCK
);
-- Check staged data
SELECT * FROM Student_Staging;
CSV imported to staging table. Review before moving to production.
Example 2: Data Validation
Validate before loading:
-- Identify invalid rows
SELECT * FROM Student_Staging
WHERE
Name IS NULL OR Name = ''
OR RollNumber IS NULL
OR ClassName NOT IN ('10-A', '10-B', '11-A', '11-B')
OR ParentPhone NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]';
-- Show rows with errors
-- Count valid rows
SELECT COUNT(*) AS ValidRows FROM Student_Staging
WHERE
Name IS NOT NULL AND Name != ''
AND RollNumber IS NOT NULL
AND ClassName IN ('10-A', '10-B', '11-A', '11-B')
AND ParentPhone LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]';
Validate: Empty fields, invalid values, format errors.
Example 3: Handle Duplicates
Identify and skip duplicates:
-- Find duplicate roll numbers
SELECT RollNumber, ClassName, COUNT(*) AS Count
FROM Student_Staging
GROUP BY RollNumber, ClassName
HAVING COUNT(*) > 1;
-- Remove duplicates (keep first)
DELETE FROM Student_Staging
WHERE StudentId NOT IN (
SELECT MIN(StudentId) FROM Student_Staging
GROUP BY RollNumber, ClassName
);
-- Check for existing students (in production)
SELECT * FROM Student_Staging st
WHERE EXISTS (
SELECT 1 FROM Student s
WHERE s.RollNumber = st.RollNumber
AND s.ClassName = st.ClassName
);
-- Shows students already in system
Remove staging duplicates, check for existing.
Example 4: Transaction-based Load
Insert with rollback on error:
BEGIN TRANSACTION;
BEGIN TRY
-- Validate count
DECLARE @ValidCount INT;
SELECT @ValidCount = COUNT(*) FROM Student_Staging
WHERE Name IS NOT NULL AND RollNumber IS NOT NULL;
IF @ValidCount = 0
BEGIN
THROW 50001, 'No valid rows to load', 1;
END;
-- Insert
INSERT INTO Student (Name, RollNumber, ClassName, ParentName, ParentPhone, Status)
SELECT Name, RollNumber, ClassName, ParentName, ParentPhone, 'Active'
FROM Student_Staging
WHERE Name IS NOT NULL AND RollNumber IS NOT NULL;
-- Log migration
INSERT INTO AuditLog (Action, EntityName, ChangedBy, ChangedAt)
VALUES ('BulkImport', 'Student', SYSTEM_USER, GETDATE());
COMMIT TRANSACTION;
SELECT CAST(@@ROWCOUNT AS VARCHAR) + ' students loaded successfully' AS Result;
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION;
SELECT ERROR_MESSAGE() AS ErrorMessage;
END CATCH
Transaction: All or nothing. Rollback on error.
Example 5: BCP Utility (Command Line)
Export/import using BCP:
REM Export to file
bcp SchoolDB.dbo.Student out C:\backup\students.bcp -S localhost -U sa -P password -T
REM Import from file
bcp SchoolDB.dbo.Student in C:\backup\students.bcp -S localhost -U sa -P password -T -h "TABLOCK"
BCP: Command-line tool. Fast for large volumes.
Example 6: Staging Table Pattern
Safe 3-step import:
-- Step 1: BULK INSERT to staging
BULK INSERT Student_Staging FROM 'C:\data.csv' WITH (FIELDTERMINATOR = ',');
-- Step 2: Validate and clean
DELETE FROM Student_Staging WHERE Name IS NULL;
DELETE FROM Student_Staging WHERE RollNumber IN (SELECT RollNumber FROM Student);
-- Step 3: Load to production
INSERT INTO Student (Name, RollNumber, ClassName, Status)
SELECT Name, RollNumber, ClassName, 'Active' FROM Student_Staging;
-- Cleanup
DROP TABLE Student_Staging;
Staging -> Validation -> Production.
Example 7: Handle Data Type Conversion
Convert during import:
CREATE TABLE ExamResult_Staging (
StudentId NVARCHAR(20),
ExamId NVARCHAR(20),
MarksObtained NVARCHAR(10),
ImportDate NVARCHAR(10)
);
BULK INSERT ExamResult_Staging FROM 'C:\exam_results.csv';
-- Convert and insert
INSERT INTO ExamResult (StudentId, ExamId, MarksObtained)
SELECT
CAST(StudentId AS INT),
CAST(ExamId AS INT),
CAST(MarksObtained AS DECIMAL(5,2))
FROM ExamResult_Staging
WHERE
ISNUMERIC(StudentId) = 1
AND ISNUMERIC(ExamId) = 1
AND ISNUMERIC(MarksObtained) = 1
AND TRY_CAST(MarksObtained AS DECIMAL(5,2)) IS NOT NULL;
Validate types before casting.
Common Migration Mistakes
Mistake 1: No staging table (direct import)
Wrong:
BULK INSERT Student FROM 'file.csv';
-- Bad data mixed with production immediately
Result: Corrupted production data.
Fix: Use staging:
BULK INSERT Student_Staging FROM 'file.csv';
-- Validate, clean
INSERT INTO Student SELECT * FROM Student_Staging WHERE valid;
Mistake 2: No validation (garbage in)
Wrong:
INSERT INTO Student SELECT * FROM Staging;
-- No checks: NULLs, invalid class names, duplicates
Result: Data integrity violations.
Fix: Validate before insert:
INSERT INTO Student
SELECT * FROM Staging
WHERE Name IS NOT NULL AND ClassName IN (...)
AND RollNumber NOT IN (SELECT RollNumber FROM Student);
Mistake 3: No rollback on error
Wrong:
INSERT INTO Student SELECT * FROM Staging;
INSERT INTO Fee SELECT * FROM FeeStagin; -- Typo, fails
-- Student inserted, Fee not (inconsistent)
Fix: Transaction:
BEGIN TRANSACTION;
INSERT INTO Student ...;
INSERT INTO Fee ...;
COMMIT;
-- All or nothing
Mistake 4: Large batch (out of memory)
Wrong:
INSERT INTO Student SELECT * FROM StagingWith1M rows;
-- Memory error, partial insert
Fix: Batch process:
WHILE EXISTS (SELECT 1 FROM Staging WHERE Processed = 0)
BEGIN
INSERT INTO Student
SELECT TOP 1000 * FROM Staging WHERE Processed = 0;
UPDATE Staging SET Processed = 1 WHERE StudentId IN (SELECT TOP 1000 StudentId ...);
END
Mistake 5: Not testing migration
Wrong:
Import 50K rows to production (first time)
-- Unexpected errors, data corrupted
Fix: Test first:
-- Test on copy of production
BULK INSERT Student_Test FROM 'file.csv';
-- Validate, measure time, check errors
-- Then run on production
Best Practices for Migration
- Use staging tables -- Separate validation from production
- Validate before load -- Check data quality
- Transaction-based -- All or nothing
- Batch large loads -- Avoid memory errors
- Handle duplicates -- Check existing data
- Test first -- On test database
- Log migration -- Track what loaded
- Rollback plan -- Know how to undo
Extract: Get source data (CSV, API, database). Transform: Clean, validate, standardize. Load: Insert into target.
Migration pipeline: Extract -> Transform -> Load.
No, use staging:
File -> Staging (validate, clean) -> Production
Staging: Safe testing ground. Production: Validated only.
-- Find duplicates
SELECT RollNumber, COUNT(*) FROM Staging
GROUP BY RollNumber HAVING COUNT(*) > 1;
-- Keep first, delete others
DELETE FROM Staging WHERE StudentId NOT IN (
SELECT MIN(StudentId) FROM Staging
GROUP BY RollNumber
);
Remove duplicates in staging before production insert.
Bulk Copy Program (command-line):
bcp Table in file.bcp -S server -U user -P password
Fast import/export. Use for large volumes (1M+ rows).
SELECT * FROM Staging
WHERE
Name IS NULL -- Missing required
OR ClassName NOT IN (...) -- Invalid value
OR MarksObtained < 0 -- Out of range
OR RollNumber IN (SELECT RollNumber FROM Student); -- Duplicate
Test before inserting.
Yes:
BEGIN TRANSACTION;
BEGIN TRY
INSERT INTO Student ...;
INSERT INTO Fee ...;
COMMIT;
END TRY
BEGIN CATCH
ROLLBACK;
END CATCH
All-or-nothing. Prevent partial inserts.
Use ChatGPT, Claude, or Copilot to go deeper on Data migration and ETL in SQL Server. Try these prompts:
"Show BULK INSERT from CSV with validation""How do I use staging table pattern (3 steps)?""What's ETL? Explain Extract, Transform, Load""How do I handle duplicates during import?""Quiz me: ask 5 questions about migration, validation, staging tables"
💡 Tip: After reading this article, paste your own code into AI and ask "What could go wrong here and why?" — fastest way to find edge cases and deepen understanding.
Next Article
35. Best Practices & Database Design -- Building Robust Systems ->