Codex CLI Refactoring Guide: Safe Workflow & Real-World Techniques
Code refactoring is one of the highest-risk operations in software engineering — modifying working code to improve its internal structure without changing external behavior. Codex CLI reduces refactoring risk through sandbox isolation, step-by-step review, and automated testing. This guide covers everything from single-function extraction to 100K-line TypeScript migrations.
1. Why Use AI for Code Refactoring?
The core risk in manual refactoring is that humans struggle to modify code while keeping all behavior identical — especially for cross-file renames, large function splits, and similar operations where omissions are easy. Codex CLI reduces this risk through several mechanisms:
- Sandbox isolation: Every operation runs in an isolated sandbox with no direct impact on your codebase until you explicitly confirm
- Global consistency: Cross-file renames search for references across the whole codebase rather than one file at a time, which reduces the chance of a missed call site
- Test-driven verification: A single task can generate tests, refactor, then run tests — a complete closed loop
- Reviewable diffs: In suggest mode, every change is presented as a diff for human confirmation before being applied
| Dimension | Manual Refactoring | Codex-Assisted |
|---|---|---|
| Cross-file renames | IDE global replace — misses references in string literals and comments | Scans all files including strings and comments; handles uniformly |
| Large function splits | Manually identify responsibility boundaries; easy to miss edge cases | Auto-identifies responsibilities, generates extraction plan, preserves call signatures |
| TypeScript migration | File-by-file annotation; repetitive and slow | Bulk type inference, auto-annotation, flags positions it cannot infer |
| Regression risk | Relies on manual testing or existing test coverage | Can generate missing tests in the same task, then verify immediately |
| Reversibility | Requires manually creating git stash or branch | Recommends git branches; Codex can generate the full branching workflow |
2. The 5-Step Safe Refactoring Workflow
Regardless of refactoring scope, following this five-step workflow minimizes risk while maintaining maximum efficiency.
Step 1: Use dry-run to Assess Scope
Before making any modifications, use the --dry-run flag to preview what Codex plans to do (run codex exec --help first to confirm the flag is available in your installed version):
# Preview the refactoring plan without making any changes
codex exec --dry-run "Refactor the UserService class:
- Split createUser, updateUser, deleteUser into separate modules
- Extract common parameter validation logic into a validateUserInput function
- Keep all public method signatures unchanged"
# See which files will be touched
codex exec --dry-run "Extract all functions over 50 lines in src/utils.ts into separate files"
The dry-run output tells you: which files will be modified, which will be created, which will be deleted, and the estimated line count of changes. This lets you judge whether the refactoring plan is sound before committing to it.
Step 2: Create a git Branch to Protect Existing Code
# Create a dedicated refactoring branch
git checkout -b refactor/extract-user-service
# Or let Codex create the branch and run the refactoring
codex exec "Create a git branch refactor/user-service, then refactor src/services/user.ts:
split the processUserData function (80+ lines) into three separate functions,
each handling data validation, data transformation, and data persistence"
Step 3: Review Incrementally in Suggest Mode
# Suggest mode (default) — shows diff and waits for confirmation
codex "Extract the duplicate error-handling code in src/api/users.ts
into a handleApiError function, place it in src/utils/errors.ts,
and update all call sites with the correct import"
# Review the diff, then choose to apply or skip each change
Step 4: Run Tests to Verify Behavior Is Unchanged
# After applying Codex's changes, immediately run tests
npm test
# Or instruct Codex to run tests automatically after refactoring
codex exec "Refactor UserService, then run npm test after completing.
If any tests fail, list the failure reasons and fix them"
Step 5: Code Review Then Merge
# Review the full branch diff
git diff main..refactor/extract-user-service
# Create a PR (if using GitHub)
gh pr create --title "refactor: extract UserService into modules" \
--body "Codex-assisted refactoring. All tests pass. No behavior changes."
# Merge after approval (switch to the target branch first)
git checkout main
git merge refactor/extract-user-service
3. Three Core Refactoring Patterns
Pattern 1: Extract Function
Extracting a segment of logic from a large function into its own named function is the most common refactoring operation. Here is a typical extraction workflow:
# Identify: ask Codex to find extractable logic blocks
codex "Analyze the processPayment function in src/checkout/payment.ts (~120 lines),
identify blocks of logic that can be extracted into standalone functions,
and generate an extraction plan (don't execute — just list recommendations)"
# Execute: extract one block at a time
codex "Extract the card validation logic from processPayment into
validateCardNumber(cardNumber: string): boolean,
keep processPayment calling validateCardNumber,
do not change the function's external behavior"
codex "Extract the amount calculation logic from processPayment into
calculateFinalAmount(items, discounts, tax): number,
including discount application and tax rate computation,
keep processPayment calling the new function"
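To make the target of these prompts concrete, here is a minimal sketch of what processPayment might look like once both extractions are applied. The Item shape, the digit-count card check, the treatment of discounts as fractional rates, and the return value are all assumptions made for illustration; the real function in your codebase will differ.

```typescript
// Hypothetical item shape; not taken from any real codebase.
interface Item {
  price: number;    // unit price
  quantity: number;
}

// Extracted helper 1: card validation.
// Illustrative check only: 13 to 19 digits after removing spaces and dashes.
function validateCardNumber(cardNumber: string): boolean {
  return /^\d{13,19}$/.test(cardNumber.replace(/[\s-]/g, ""));
}

// Extracted helper 2: amount calculation.
// Assumes `discounts` are fractional rates applied in sequence and `tax` is a rate.
function calculateFinalAmount(items: Item[], discounts: number[], tax: number): number {
  const subtotal = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  const discounted = discounts.reduce((amount, rate) => amount * (1 - rate), subtotal);
  // Round to two decimal places.
  return Math.round(discounted * (1 + tax) * 100) / 100;
}

// The public function keeps its signature and only orchestrates the helpers.
function processPayment(cardNumber: string, items: Item[], discounts: number[], tax: number): number {
  if (!validateCardNumber(cardNumber)) {
    throw new Error("Invalid card number");
  }
  return calculateFinalAmount(items, discounts, tax);
}
```

Extracting one block per task, as the prompts above do, keeps each diff small enough to review against a shape like this.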
Pattern 2: Rename Refactoring (Enforce Naming Conventions)
When a codebase has accumulated inconsistent naming styles, cross-file renaming is the operation most likely to introduce bugs — and also where Codex has the greatest advantage:
# Enforce camelCase variable naming
codex "Scan all TypeScript files under src/, convert all snake_case
variable names and function names to camelCase.
Do NOT modify: object keys (may come from APIs), file names, class names"
# Rename a specific function across all files
codex "Rename the getUserData function to fetchUserProfile everywhere in src/
including test files, and update related comments and JSDoc strings"
# Enforce I-prefix on all interfaces
codex "Add an I prefix to all interface names in src/types/ that don't already have one
(e.g. User becomes IUser), then update every file that references these interfaces"
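The exclusion for object keys in the first prompt matters because keys often mirror an external contract. The sketch below shows the intended end state under that rule: local variable and function names are camelCase, while keys of a server-defined payload stay untouched. ApiUserResponse, UserProfile, and toUserProfile are hypothetical names used only for this illustration.

```typescript
// Hypothetical API payload: the server defines these keys, so they stay snake_case.
interface ApiUserResponse {
  user_id: number;
  display_name: string;
}

// Internal type that follows the project's camelCase convention.
interface UserProfile {
  userId: number;
  displayName: string;
}

// After the rename: locals are camelCase, payload keys are read as-is.
function toUserProfile(apiResponse: ApiUserResponse): UserProfile {
  const userId = apiResponse.user_id;            // previously: const user_id = ...
  const displayName = apiResponse.display_name;  // previously: const display_name = ...
  return { userId, displayName };
}
```

If the rename had also touched user_id or display_name on the payload type, the code would still compile but would read undefined at runtime, which is why the prompt spells out the exclusion.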
Pattern 3: Split Large Files
When a single file exceeds 1,000 lines, maintenance cost rises sharply. Here is the standard workflow for splitting a large file into focused modules:
# Step 1: analyze responsibilities
codex "Analyze src/api.ts (~1200 lines), identify how many distinct
responsibility domains it contains, and suggest how to split it into
multiple files by responsibility — give a proposed file structure (don't execute)"
# Step 2: extract one responsibility at a time
codex "Extract all user-related API functions (createUser, getUser, updateUser, deleteUser)
from src/api.ts into src/api/users.ts,
keep re-exports in src/api.ts for backward compatibility"
codex "Extract all product-related API functions from src/api.ts into src/api/products.ts,
keeping the same re-exports in src/api.ts"
# Step 3: clean up the entry file
codex "src/api.ts now contains only re-exports. Convert it into a proper barrel file.
Update any call sites that import specific functions directly from src/api.ts
to import from the relevant submodule instead"
4. TypeScript Migration Refactoring
Migrating a JavaScript project to TypeScript is a special kind of refactoring: runtime behavior stays identical while static type information is added. Codex CLI can dramatically accelerate this process.
Adding Type Annotations
# Batch-add basic type annotations
codex "Convert src/utils/string.js to TypeScript:
- Rename the file to string.ts
- Add type annotations to all function parameters and return values
- Use precise types (no any)
- Do not change any business logic"
# Generate a .d.ts declaration file for a JS file (gradual migration)
codex "Generate a TypeScript declaration file payment.d.ts for src/legacy/payment.js,
inferring types by analyzing actual usage patterns in the codebase"
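As an illustration of the result the first prompt asks for, here is a sketch of two annotated string utilities. The function names truncate and capitalize are hypothetical, since the contents of string.js are not shown here; the point is only that the bodies stay the same while parameters and return values gain explicit types.

```typescript
// Before (string.js), shown as a comment for comparison:
//   function truncate(text, maxLength, suffix = "...") { ... }

// After (string.ts): same logic, explicit parameter and return types, no `any`.
function truncate(text: string, maxLength: number, suffix: string = "..."): string {
  if (text.length <= maxLength) {
    return text;
  }
  // Reserve room for the suffix when maxLength is large enough to allow it.
  const keep = Math.max(0, maxLength - suffix.length);
  return text.slice(0, keep) + suffix;
}

function capitalize(text: string): string {
  return text.length === 0 ? text : text.charAt(0).toUpperCase() + text.slice(1);
}
```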
Eliminating any Types
# Find and fix all any types
codex "Scan all TypeScript files under src/ for uses of any.
Where a concrete type can be inferred, replace with the precise type.
Where the type genuinely cannot be determined, replace with unknown
and add a TODO comment explaining why"
# Fix any in a specific file
codex "Fix all any types in src/services/api.ts:
- API response types should reference existing interfaces in src/types/api.ts
- Event handler parameters should use correct DOM event types
- Third-party callback parameters should reference the library's type definitions"
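Replacing any with unknown only helps if the code then narrows the value before using it. The following sketch shows one common way to do that with a type guard. ApiError, isApiError, and describeError are hypothetical names chosen for the example, not part of any particular library.

```typescript
// Hypothetical error shape used only for this illustration.
interface ApiError {
  code: number;
  message: string;
}

// Type guard: narrows an unknown value to ApiError after runtime checks.
function isApiError(value: unknown): value is ApiError {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return typeof candidate["code"] === "number" && typeof candidate["message"] === "string";
}

// Before: function describeError(err: any) { return err.message; }
// After: the parameter is unknown, so property access requires narrowing first.
function describeError(err: unknown): string {
  if (isApiError(err)) {
    return `API error ${err.code}: ${err.message}`;
  }
  if (err instanceof Error) {
    return err.message;
  }
  // TODO: the remaining callers pass values whose type cannot be determined statically.
  return String(err);
}
```

Unlike any, unknown forces the compiler to reject err.message until one of the checks has run, so the TODO marks a real gap rather than hiding it.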
Extracting Interfaces
# Extract an interface from an implementation
codex "Analyze src/services/UserService.ts.
Extract an IUserService interface (public methods only)
into src/interfaces/IUserService.ts.
Make UserService implement IUserService.
Update code that depends on UserService to depend on IUserService instead
(program to the interface, not the implementation)"
# Infer interfaces from JSON data
codex "Based on the data structure in src/mock/user-response.json,
generate corresponding TypeScript interfaces in src/types/user.ts.
Mark fields that may be null or undefined with ? (optional properties)"
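The outcome of the first prompt can be pictured with a small sketch. The method set (getUser, createUser), the User fields, and the greetUser consumer are assumptions made for illustration; a real UserService will expose different methods.

```typescript
interface User {
  id: number;
  name: string;
  email?: string; // optional property: may be absent in the source data
}

// Extracted interface: public methods only.
interface IUserService {
  getUser(id: number): User | undefined;
  createUser(name: string): User;
}

// The concrete class declares that it implements the interface.
class UserService implements IUserService {
  private users = new Map<number, User>();
  private nextId = 1;

  getUser(id: number): User | undefined {
    return this.users.get(id);
  }

  createUser(name: string): User {
    const user: User = { id: this.nextId++, name };
    this.users.set(user.id, user);
    return user;
  }
}

// Consumers depend on IUserService, so any implementation (real or fake) can be passed in.
function greetUser(service: IUserService, id: number): string {
  const user = service.getUser(id);
  return user ? `Hello, ${user.name}` : "User not found";
}
```

Because greetUser accepts the interface, a test can hand it a stub object with the same two methods instead of constructing the full service.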
5. Test-Driven Refactoring (TDD Refactoring)
The core principle of TDD refactoring: have test coverage first, then change the code. Tests are your safety net — with tests in place, refactoring becomes a well-protected operation.
Step 1: Have Codex Generate Tests Covering Existing Behavior
# Generate comprehensive tests for the code you're about to refactor
codex "Generate complete unit tests for the calculateOrderTotal function
in src/services/order.ts:
- Cover normal cases (with discount / no discount / multiple items)
- Cover edge cases (empty cart / zero amount / very large amounts)
- Cover error cases (invalid item ID / negative quantity)
Place the test file in src/services/__tests__/order.test.ts using Jest"
# Confirm all tests pass (validate the tests themselves are correct)
npm test src/services/__tests__/order.test.ts
Step 2: Execute the Refactoring
# Once tests pass, run the refactoring
codex "Refactor the calculateOrderTotal function in src/services/order.ts:
- Extract discount calculation logic into applyDiscounts(items, discounts)
- Extract tax calculation logic into applyTax(subtotal, taxRate)
- The main function should only orchestrate — no calculation logic inline
- Keep the function signature unchanged: calculateOrderTotal(items, options): number"
Step 3: Verify Tests Still Pass
# Run tests after refactoring to confirm behavior is unchanged
npm test src/services/__tests__/order.test.ts
# If there are failures, ask Codex to diagnose
codex "Tests are failing. The error output is: [paste error output here]
Analyze the behavioral difference between the refactored calculateOrderTotal
and the original. Fix the implementation to make tests pass.
Do not modify the tests themselves."
# Run the full test suite to ensure nothing else was affected
npm test
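The three steps above amount to pinning down current behavior and then checking that the refactored code reproduces it. The sketch below shows that idea in miniature without a test framework: an inline original, a refactored version using applyDiscounts and applyTax, and a helper that reports any input on which the two disagree. The OrderItem and OrderOptions shapes, the fractional discount rates, and findBehaviorDifferences are assumptions for illustration; in a real project the comparison would live in the Jest file generated in Step 1.

```typescript
// Hypothetical shapes for illustration only.
interface OrderItem { price: number; quantity: number; }
interface OrderOptions { discounts: number[]; taxRate: number; }

// Original implementation: all arithmetic inline.
function calculateOrderTotalOriginal(items: OrderItem[], options: OrderOptions): number {
  let subtotal = 0;
  for (const item of items) {
    subtotal += item.price * item.quantity;
  }
  for (const rate of options.discounts) {
    subtotal -= subtotal * rate;
  }
  return subtotal + subtotal * options.taxRate;
}

// Refactored helpers hold the arithmetic.
function applyDiscounts(items: OrderItem[], discounts: number[]): number {
  let subtotal = 0;
  for (const item of items) {
    subtotal += item.price * item.quantity;
  }
  for (const rate of discounts) {
    subtotal -= subtotal * rate;
  }
  return subtotal;
}

function applyTax(subtotal: number, taxRate: number): number {
  return subtotal + subtotal * taxRate;
}

// Refactored main function: same signature, orchestration only.
function calculateOrderTotal(items: OrderItem[], options: OrderOptions): number {
  return applyTax(applyDiscounts(items, options.discounts), options.taxRate);
}

// Returns the indexes of the cases where the two implementations disagree.
function findBehaviorDifferences(cases: Array<[OrderItem[], OrderOptions]>): number[] {
  const failing: number[] = [];
  cases.forEach(([items, options], index) => {
    if (calculateOrderTotalOriginal(items, options) !== calculateOrderTotal(items, options)) {
      failing.push(index);
    }
  });
  return failing;
}
```

An empty result from findBehaviorDifferences over the normal, edge, and error inputs listed in Step 1 is the condition the test run is meant to establish.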
6. AGENTS.md Refactoring Configuration
For projects with ongoing refactoring needs, configuring refactoring standards in AGENTS.md lets Codex automatically follow project conventions in every session, eliminating the need to re-explain rules each time.
## Refactoring Standards
- Run `npm test` after every refactoring task to verify behavior is unchanged
- Public APIs must not be removed — only additions are allowed (backward compatibility)
- Follow project naming conventions: camelCase variables, PascalCase classes,
UPPER_SNAKE_CASE constants
- Read CHANGELOG.md before refactoring to understand historical decisions
and avoid repeating abandoned approaches
- Extracted utility functions go in src/utils/; service classes go in src/services/
- When creating a new file, also create its test file under __tests__/
- Do not introduce new external dependencies during refactoring
## Refactoring Progress Tracking
- Completed modules are listed in the "Refactoring Progress" section at the end of this file
- In-progress modules are marked 🔄, pending are marked ⬜, completed are marked ✅
Save the above to your project's AGENTS.md (or append to the appropriate section if the file already exists). These standards will automatically apply to every subsequent Codex refactoring task.
If a subdirectory needs different rules (for example, packages/legacy/ requires more conservative strategies), create a separate AGENTS.md in that directory. Codex automatically merges both configurations, with the deeper path taking precedence.
7. Splitting Functions Over 50 Lines
Function length alone is not the problem — but beyond a certain size, functions tend to take on multiple responsibilities, and that is what actually needs refactoring. Here is a complete guide to splitting strategy by function size.
Step 1: Identify Responsibilities
# Ask Codex to analyze what a function is doing
codex "Analyze the handleCheckout function in src/controllers/checkout.ts (~180 lines).
List every responsibility it carries (describe each in one sentence).
Identify which responsibilities are tightly coupled and which can be extracted independently."
Step 2: Generate an Extraction Plan (dry-run)
# Based on the responsibility analysis, generate a concrete extraction plan
codex exec --dry-run "Based on the responsibility analysis, generate a function extraction plan
for handleCheckout:
- Each extracted function should have a clear name (verb + noun)
- Specify parameter and return types
- State the extraction order (extract leaf dependencies first)
Do not execute — output the plan only"
Step 3: Extract One Responsibility at a Time
# Extract one responsibility per task, verify after each
codex "Extract the inventory check logic from handleCheckout into
checkInventoryAvailability(items): Promise<void>.
The function should throw InsufficientStockError if stock is insufficient.
Replace the inline logic in handleCheckout with a call to the new function."
npm test # Verify after every extraction
codex "Extract the payment processing logic from handleCheckout into
processPaymentTransaction(paymentInfo, amount): Promise<string>,
returning a transaction ID. Replace inline logic in handleCheckout."
npm test
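After these two extractions, handleCheckout might reduce to something like the following sketch. The in-memory stock table, the CartItem and PaymentInfo shapes, and the transaction ID format are stand-ins invented for the example; a real implementation would call an inventory service and a payment gateway.

```typescript
class InsufficientStockError extends Error {
  readonly sku: string;

  constructor(sku: string) {
    super(`Insufficient stock for ${sku}`);
    this.name = "InsufficientStockError";
    this.sku = sku;
  }
}

// Hypothetical shapes for illustration only.
interface CartItem { sku: string; quantity: number; }
interface PaymentInfo { cardToken: string; }

// Hypothetical in-memory stock table standing in for a real inventory service.
const stock = new Map<string, number>([["widget", 5], ["gadget", 0]]);

// Extracted responsibility 1: throws if any item cannot be fulfilled.
async function checkInventoryAvailability(items: CartItem[]): Promise<void> {
  for (const item of items) {
    if ((stock.get(item.sku) ?? 0) < item.quantity) {
      throw new InsufficientStockError(item.sku);
    }
  }
}

// Extracted responsibility 2: returns a transaction ID.
async function processPaymentTransaction(paymentInfo: PaymentInfo, amount: number): Promise<string> {
  // A real implementation would call a payment gateway here.
  return `txn-${paymentInfo.cardToken}-${amount}`;
}

// The controller now only sequences the extracted steps.
async function handleCheckout(items: CartItem[], paymentInfo: PaymentInfo, amount: number): Promise<string> {
  await checkInventoryAvailability(items);
  return processPaymentTransaction(paymentInfo, amount);
}
```

Each extracted function can be tested in isolation, which is what makes running npm test after every extraction informative.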
| Function Size | Recommended Strategy | Extraction Granularity |
|---|---|---|
| 50–100 lines | Identify 2–3 responsibilities and extract directly | One responsibility per task |
| 100–300 lines | Map responsibilities first, then extract from innermost to outermost | 3–5 tasks, one responsibility each |
| 300+ lines | Consider splitting into separate classes or modules, not just functions | Extract to private methods first, then evaluate class splitting |
8. FAQ
Will Codex refactoring change my function behavior?
It should not, provided you explicitly state "preserve behavior" or "keep the function signature unchanged" in your prompt. With that instruction, Codex aims to keep function inputs and outputs identical and to restructure only the internals. Language models are not perfect, however, so the only reliable confirmation is to run your test suite after every refactoring task. Including "run npm test after refactoring and confirm all tests pass" in your prompt makes that check part of the task itself.
How do I ensure refactoring doesn't introduce bugs?
Use a three-layer safety net:
- Preview phase: Use --dry-run to review the scope of changes before applying anything
- Isolation phase: Work on a git branch so you can git checkout . to revert instantly
- Verification phase: Generate test coverage before refactoring, then run tests after to confirm behavior is unchanged
Writing "run npm test after every refactoring task" into AGENTS.md makes Codex do this automatically, with no manual reminder needed.
Is refactoring a large project (100K+ lines) feasible?
Yes, but only with an incremental strategy: never attempt to refactor the entire codebase in a single task. The recommended approach is to (1) refactor module by module in 200–500 line increments; (2) start from leaf nodes (utility functions with the fewest dependencies); (3) work inward toward core modules; and (4) track progress in AGENTS.md to maintain cross-session context. For very large codebases, using GPT-4.1's 1M token context (codex --model gpt-4.1) can help with modules that have complex dependency graphs.
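The leaf-first ordering can be computed rather than guessed. The sketch below is one way to do it, assuming you already have a map from each module to the modules it imports (how you obtain that map is outside the scope of this example). The function name leafFirstOrder and the sample module names are hypothetical.

```typescript
// Returns modules ordered so that each one appears after all of its dependencies,
// i.e. leaf modules first and core modules last (a depth-first topological sort).
// `dependencies` maps a module name to the modules it imports.
function leafFirstOrder(dependencies: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  function visit(moduleName: string): void {
    const current = state.get(moduleName);
    if (current === "done") {
      return;
    }
    if (current === "visiting") {
      // Cycles have no leaf-first order; they need to be broken up manually first.
      throw new Error(`Circular dependency involving ${moduleName}`);
    }
    state.set(moduleName, "visiting");
    for (const dep of dependencies[moduleName] ?? []) {
      visit(dep);
    }
    state.set(moduleName, "done");
    order.push(moduleName);
  }

  for (const moduleName of Object.keys(dependencies)) {
    visit(moduleName);
  }
  return order;
}

// Usage with a hypothetical three-module graph:
const refactoringOrder = leafFirstOrder({
  checkout: ["orders", "utils"],
  orders: ["utils"],
  utils: [],
});
// refactoringOrder is ["utils", "orders", "checkout"]
```

The resulting list can be pasted into the progress section of AGENTS.md so that each session picks up the next pending module.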
Which files does Codex read during refactoring?
Codex typically reads: (1) files explicitly named in your prompt; (2) their direct dependencies, followed through import chains; (3) related test files if they exist. To force-include specific files such as architecture docs or type definitions, use the --context flag; to exclude directories that don't need to be read (like node_modules), use .codexignore. Confirm that both are supported by your installed version before relying on them. In suggest mode, the complete list of files planned for modification is displayed before any changes are applied, which shows exactly what will be touched.